commitbee 0.6.0

AI-powered commit message generator using tree-sitter semantic analysis and local LLMs
<!--
SPDX-FileCopyrightText: 2026 Sephyi <me@sephy.io>

SPDX-License-Identifier: AGPL-3.0-only OR LicenseRef-Commercial
-->

# CommitBee — Product Requirements Document

**Version**: 5.4  
**Date**: 2026-04-07  
**Status**: Active  
**Author**: [Sephyi](https://github.com/Sephyi) + Junie (LLM Agent)

## Changelog

<details>
<summary>Revision history (v2.1 → v5.4)</summary>

| Version | Date       | Summary |
| ------- | ---------- | ------- |
| 5.4     | 2026-04-07 | v0.6.0: Added FR-081 (Interactive Message Refinement). Updated test count to 442. |
| 5.3     | 2026-04-07 | v0.6.0: Added FR-076 to FR-080 (Interactive editing, optimized splitter, native clipboard, secret scan accuracy, struct/enum diff completion). Updated test count to 440. |
| 5.2     | 2026-03-28 | Security hardening: `secrecy::SecretString` for API keys (F-004), overflow-checks in release profile (F-001). Updated SR-002, DR-005, DR-006. |
| 5.1     | 2026-03-28 | fix: keyring platform-native backends, API key validation ordering for set-key command. Updated FR-019 and SR-002. |
| 5.0     | 2026-03-28 | PRD structural overhaul: removed stale §3.1 Resolved Issues (all v0.2.0), removed Dependency Status table, removed dead ORCommit references. Updated §2 competitive landscape for 2026 (added IDE-native competitors: GitHub Copilot Desktop, Cursor, Windsurf; updated star counts; refreshed feature matrix). Updated §3 codebase structure (added diff.rs, differ.rs, progress.rs). Updated PE-001/PE-002 with v0.6.0 prompt sections (STRUCTURED CHANGES, IMPORTS, RELATED FILES, INTENT). Updated PR-005 with adaptive budget. Added v0.6.0 feature section §4.6 (FR-064–FR-072). Renumbered Future to §4.7. |
| 4.4     | 2026-03-27 | Added future requirements from audit: FR-073 (move detection), FR-074 (AST-based splitting), FR-075 (configurable categorization), TR-008 (LLM quality testing), PE-007 (token-accurate budgets). |
| 4.3     | 2026-03-27 | v0.6.0 deep semantic understanding: parent scope, import detection, doc-vs-code, structural AST diffs, semantic markers (FR-071), change intent (FR-072). 424 tests. |
| 4.2     | 2026-03-22 | v0.5.0 hardening: security fixes (SSRF prevention, streaming caps), prompt optimization (budget fix, evidence omission, emoji removal), eval harness (36 fixtures, per-type reporting), test coverage (15+ new tests), API hygiene (pub(crate) demotions), 5 fuzz targets. 359 tests. |
| 4.1     | 2026-03-22 | AST context overhaul (v0.5.0): full signature extraction from tree-sitter nodes, semantic change classification (whitespace vs body vs signature), old→new signature diffs, cross-file connection detection, formatting auto-detection via symbols. 359 tests. |
| 4.0     | 2026-03-13 | PRD normalization: aligned phases with shipped versions (v0.2.0/v0.3.x/v0.4.0), collapsed revision history, unified status markers, resolved stale critical issues, canonicalized test count to 308, removed dead cross-references. FR-031 (Exclude Files) and FR-033 (Copy to Clipboard) shipped. |
| 3.3     | 2026-03-13 | v0.4.0 full feature completion — FR-030 (Custom Prompt Templates), FR-032 (Multi-Language), FR-036 (Tree-sitter Query Patterns), FR-057 (Additional Languages), FR-058 (History Learning), TR-006 (Eval Harness), TR-007 (Fuzzing). 308 tests. |
| 3.2     | 2026-03-13 | FR-035 (Rename Detection), FR-037 (Expanded Secret Scanning), FR-038 (Progress Indicators). 202 tests. |
| 3.1     | 2026-03-13 | Deep codebase audit + streaming hardening: `Provider::new()` returns `Result`, 1 MB response cap, EOF buffer parsing, zero-allocation streaming, HTTP error body propagation. 188 tests. |
| 3.0     | 2026-03-08 | v0.3.1 multi-pass retry + prompt enforcement. FR-041 expanded to 7 rules. 182 tests. |
| 2.9     | 2026-03-08 | v0.3.1 patch: default model → `qwen3.5:4b`, subject length enforcement, `think` config option. |
| 2.8     | 2026-03-08 | v0.3.0 release prep — sanitizer robustness, splitter Jaccard clustering, simplified prompts, NUL-delimited git parsing. 178 tests. |
| 2.7     | 2026-03-08 | Splitter precision + subject quality + metadata breaking detection. 169 tests. |
| 2.6     | 2026-03-08 | Message quality overhaul — FR-041, FR-034 partial, FR-023 enhanced, PE-001/PE-002 updates. 168 tests. |
| 2.5     | 2026-02-22 | PRD structural cleanup (FR placement fixes). |
| 2.4     | 2026-02-22 | Conventional Commits 1.0.0 spec compliance, symbol dedup. 133 tests. |
| 2.3     | 2026-02-22 | Version alignment — v0.2.0 shipped Phase 1+2, roadmap renumbered. |
| 2.2     | 2026-02-18 | FR-023 (commit splitting), competitive matrix update, 118 tests. |
| 2.1     | 2026-02-17 | Enhancement review integration — eval harness, fallback ladder, cancellation contract, streaming trait, golden fixtures. |

</details>

## 1. Vision

> **"The commit message generator that actually understands your code."**

CommitBee is a Rust-native CLI tool that uses tree-sitter semantic analysis and LLMs to generate high-quality conventional commit messages. Unlike every competitor in the market, CommitBee doesn't just send raw `git diff` output to an LLM — it parses both the staged and HEAD versions of files, maps diff hunks to symbol spans (functions, classes, methods), and provides structured semantic context. This architectural advantage produces fundamentally better commit messages, especially for complex multi-file changes.

### Core Principles

1. **Semantic first** — AST analysis is the moat, not an afterthought
2. **Local first** — Ollama default, cloud providers opt-in, secrets never leave the machine unless explicitly configured
3. **Correct first** — No panics, no silent failures, no half-working features
4. **Fast startup** — Sub-200ms to first output, streaming for LLM responses
5. **Graceful degradation** — Works without tree-sitter, without a network, in CI, in git hooks, piped to files
6. **Zero surprise** — Explicit over implicit; debug mode (`--show-prompt`) for full transparency

### Compatibility Policy

| Release | Scope | Breaking Changes |
| ------- | ----- | ---------------- |
| v0.2.0  | Stability + polish + providers (Phase 1) | None — config format preserved, no breaking CLI changes |
| v0.3.0  | Differentiation core (splitter enhancements, validation, heuristics) | None |
| v0.3.1  | Patch — default model → `qwen3.5:4b`, subject length validation, `think` config | None |
| v0.4.0  | Feature completion (templates, languages, rename detection, history learning) | None |
| v0.5.0  | AST context overhaul (signatures, semantic classification, cross-file connections) | None |
| v0.6.0 | Deep semantic understanding (parent scope, imports, doc-vs-code, structural AST diffs, native clipboard, interactive edit) | None |

## 2. Competitive Landscape

### 2.1 Market Position

| Category | Key Players | CommitBee Advantage |
| --- | --- | --- |
| AI commit generators | opencommit (7.2K stars), aicommits (8.7K stars), aicommit2 | **Only tool with tree-sitter semantic analysis + commit splitting** |
| Rust commit tools | cocogitto (1K stars), convco, rusty-commit | Semantic analysis + AI generation (neither cocogitto nor convco has AI) |
| IDE-integrated | GitHub Copilot Desktop, Cursor, Windsurf | CLI-first, provider-agnostic, privacy-respecting, deeper analysis |

### 2.2 Unique Differentiators (No Competitor Has These)

1. **Tree-sitter semantic analysis** with structural AST diffs — every competitor sends raw diffs to LLMs
2. **Commit splitting** with semantic grouping — detects multi-concern changes, groups by code relationships
3. **Built-in secret scanning** — 25 patterns, no external tool dependency
4. **Evidence-based validation** — validates LLM output against computed evidence, retries on constraint violations
5. **Token budget management** with adaptive truncation and structured changes prioritization
6. **Streaming output** with cancellation — most competitors wait for complete response
7. **Prompt debug mode** (`--show-prompt`) — full transparency, no competitor offers this

### 2.3 Feature Status vs. Market Expectations

| Feature                                            | Market Expectation            | Status          |
| -------------------------------------------------- | ----------------------------- | --------------- |
| Cloud LLM providers (OpenAI, Anthropic)            | Universal                     | ✅ v0.2.0       |
| Git hook integration                               | Universal                     | ✅ v0.2.0       |
| Shell completions                                  | Expected for CLI tools        | ✅ v0.2.0       |
| Multiple message generation (pick from N)          | Common (aicommits, aicommit2) | ✅ v0.2.0       |
| Commit splitting (multi-concern detection)         | No competitor has this        | ✅ v0.2.0       |
| Custom prompt/instruction files                    | Growing (Copilot, aicommit2)  | ✅ v0.4.0       |
| Unit/integration tests                             | Non-negotiable for quality    | ✅ 442 tests    |

## 3. Architecture

### 3.1 Symbol Extraction Fallback Ladder

When building the LLM prompt, symbol context uses a tiered approach:

1. **AST mapping** — Tree-sitter parses both HEAD and staged versions, maps diff hunks to symbol spans, extracts full signatures, runs structural AST diffs (best quality)
2. **Hunk heuristic** — If tree-sitter grammar unavailable, extract nearest function/class from hunk header (`@@ ... @@ fn name`)
3. **File summary** — If hunk heuristic fails, include file-level summary (path, change status, line counts)
4. **Raw diff** — Final fallback, plain diff with no semantic annotation

Each tier produces progressively less useful context but ensures the pipeline never blocks on a parse failure.
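
The ladder above can be sketched as a chain of early returns. This is a minimal illustration, not the project's implementation: `SymbolContext`, `FileInputs`, `symbol_from_hunk_header`, and `build_context` are hypothetical names, and each `Option` field stands in for "this tier produced a result".

```rust
/// Which tier of the fallback ladder produced the context (hypothetical type).
#[derive(Debug, PartialEq)]
enum SymbolContext {
    Ast(String),
    HunkHeuristic(String),
    FileSummary(String),
    RawDiff(String),
}

/// Hypothetical per-file inputs; `None` means the tier was unavailable or failed.
struct FileInputs<'a> {
    ast_symbol: Option<&'a str>,  // tier 1: result of AST mapping
    hunk_header: Option<&'a str>, // tier 2: e.g. "@@ -1,4 +1,6 @@ fn parse"
    summary: Option<&'a str>,     // tier 3: path, status, line counts
    raw_diff: &'a str,            // tier 4: always available
}

/// Extracts the trailing context of a hunk header such as `@@ ... @@ fn name`.
fn symbol_from_hunk_header(header: &str) -> Option<&str> {
    let rest = header.strip_prefix("@@")?;
    let (_, tail) = rest.split_once("@@")?;
    let tail = tail.trim();
    if tail.is_empty() { None } else { Some(tail) }
}

/// Walks the tiers from best to worst and returns the first one that succeeds.
fn build_context(input: &FileInputs) -> SymbolContext {
    if let Some(sym) = input.ast_symbol {
        return SymbolContext::Ast(sym.to_string());
    }
    if let Some(sym) = input.hunk_header.and_then(symbol_from_hunk_header) {
        return SymbolContext::HunkHeuristic(sym.to_string());
    }
    if let Some(summary) = input.summary {
        return SymbolContext::FileSummary(summary.to_string());
    }
    SymbolContext::RawDiff(input.raw_diff.to_string())
}
```

Because the last tier is unconditional, `build_context` always returns a value, which mirrors the "never blocks on a parse failure" property.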

### 3.2 Codebase Structure

```
commitbee
├── src/
│   ├── main.rs              # Entry point
│   ├── lib.rs               # #![forbid(unsafe_code)] + public API
│   ├── cli.rs               # clap derive with ValueEnum, subcommands
│   ├── config.rs            # figment-based hierarchical config
│   ├── error.rs             # miette diagnostics + thiserror
│   ├── app.rs               # Orchestrator
│   ├── domain/
│   │   ├── change.rs        # FileChange, StagedChanges, FileCategory
│   │   ├── symbol.rs        # CodeSymbol, SymbolKind, SpanChangeKind
│   │   ├── context.rs       # PromptContext, ChangeIntent, IntentKind
│   │   ├── diff.rs          # SymbolDiff, ChangeDetail (25 variants)
│   │   └── commit.rs        # CommitType (single source of truth)
│   ├── queries/             # Tree-sitter .scm query files (10 languages)
│   └── services/
│       ├── git.rs           # GitService (gix discovery + git CLI)
│       ├── analyzer.rs      # AnalyzerService (parallel via rayon, returns symbols + diffs)
│       ├── differ.rs        # AstDiffer (structural function/method comparison)
│       ├── context.rs       # ContextBuilder (adaptive budget, type inference, intent detection)
│       ├── safety.rs        # Secret scanning (25 patterns, pluggable engine)
│       ├── sanitizer.rs     # CommitSanitizer + CommitValidator (7 rules)
│       ├── splitter.rs      # CommitSplitter (Jaccard + fingerprinting)
│       ├── progress.rs      # Progress indicators (indicatif, TTY-aware)
│       ├── template.rs      # TemplateService (custom prompt templates)
│       ├── history.rs       # HistoryService (commit style learning)
│       └── llm/
│           ├── mod.rs       # LlmProvider trait + shared SYSTEM_PROMPT
│           ├── ollama.rs    # Ollama (NDJSON streaming)
│           ├── openai.rs    # OpenAI-compatible (SSE streaming)
│           └── anthropic.rs # Anthropic Claude (SSE streaming)
├── tests/
│   ├── snapshots/           # insta snapshot files
│   ├── fixtures/            # Eval fixtures (36 scenarios), diff samples
│   ├── helpers.rs           # Shared test helpers (make_file_change, make_staged_changes)
│   ├── context.rs           # ContextBuilder, type inference, evidence, intents, imports, correlations
│   ├── sanitizer.rs         # CommitSanitizer + CommitValidator (unit + snapshot + proptest)
│   ├── splitter.rs          # CommitSplitter grouping and merge logic
│   ├── analyzer.rs          # AnalyzerService symbol extraction
│   ├── languages.rs         # Per-language symbol, signature, parent scope, structural diff tests
│   ├── safety.rs            # Secret scanning patterns + conflict detection
│   ├── integration.rs       # LLM provider round-trips with wiremock
│   ├── history.rs           # HistoryService with tempfile git repos
│   ├── template.rs          # TemplateService custom/default templates
│   ├── commit_type.rs       # CommitType parsing and ALL sync
│   └── eval.rs              # Eval harness fixture validation (feature-gated)
├── fuzz/                    # 5 cargo-fuzz targets
└── completions/             # Generated shell completions
```

### 3.3 Trait Design for Testability

```rust
// Services defined as traits for mockability
pub trait GitOperations: Send + Sync {
    async fn get_staged_changes(&self) -> Result<StagedChanges>;
    async fn get_file_diff(&self, path: &Path) -> Result<String>;
    async fn fetch_file_contents(&self, paths: &[PathBuf]) -> (HashMap<PathBuf, String>, HashMap<PathBuf, String>);
    async fn commit(&self, message: &str) -> Result<()>;
}

pub trait CodeAnalyzer: Send + Sync {
    fn extract_symbols(&self, changes: &[FileChange], staged: &HashMap<PathBuf, String>, head: &HashMap<PathBuf, String>) -> Vec<CodeSymbol>;
}

// LlmProvider with native async (no async_trait)
// Both generate() (buffered) and generate_stream() (streaming) required.
pub trait LlmProvider: Send + Sync {
    async fn generate(&self, system: &str, user: &str, cancel: CancellationToken) -> Result<String>;
    async fn generate_stream(
        &self,
        system: &str,
        user: &str,
        cancel: CancellationToken,
    ) -> Result<Pin<Box<dyn Stream<Item = Result<String>> + Send>>>;
    fn name(&self) -> &str;
    fn supports_streaming(&self) -> bool;
}

// App takes trait objects for testability
pub struct App {
    git: Box<dyn GitOperations>,
    analyzer: Box<dyn CodeAnalyzer>,
    llm: Box<dyn LlmProvider>,
    config: Config,
}
```

`generate_stream()` is required for all providers. Providers that do not support streaming implement it by wrapping `generate()` as a single-element stream.

## 4. Feature Requirements

### 4.1 Shipped — v0.2.0 (Stability & Providers)

All features in this section shipped in v0.2.0. Included for completeness and traceability.

#### FR-001: Fix UTF-8 Panics in Sanitizer ✅

Use `str::chars().take(69).collect::<String>()` for safe truncation. Proptest guarantees sanitizer never panics on arbitrary Unicode input.
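
As a small illustration of why character-based truncation is safe where byte slicing is not, the sketch below wraps the expression in a helper. The name `truncate_chars` is hypothetical.

```rust
/// Truncates `s` to at most `max_chars` Unicode scalar values.
/// Unlike `&s[..n]`, this cannot panic, because it never cuts
/// through the middle of a multi-byte UTF-8 sequence.
fn truncate_chars(s: &str, max_chars: usize) -> String {
    s.chars().take(max_chars).collect()
}
```

For example, `truncate_chars("日本語", 2)` yields `"日本"`, whereas the byte slice `&"日本語"[..2]` would panic because byte index 2 falls inside the first character.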

#### FR-002: Include Symbols in LLM Prompt ✅

`to_prompt()` includes a "Symbols changed" section using the fallback ladder (AST mapping → hunk heuristic → file summary → raw diff). Graceful degradation when tree-sitter parsing fails.

#### FR-003: Unit Test Suite ✅

Snapshot tests (insta) for sanitizer, diff parser, safety scanner, context builder, and file categorizer. Proptest for never-panic guarantees.

#### FR-004: Remove Unused Dependencies ✅

Removed `anyhow`, replaced `once_cell` with `std::sync::LazyLock`, replaced `async-trait` with native async traits, replaced `futures` with `tokio-stream`.

#### FR-005: Fix Dead Code ✅

All dead fields either implemented (rename detection, signature display) or removed. No compiler warnings.

#### FR-006: Reduce Tokio Features ✅

Features reduced to `["rt-multi-thread", "macros", "signal", "sync", "process"]`.

#### FR-007: CommitType Single Source of Truth ✅

`CommitType` provides `const ALL: &[&str]` used by sanitizer and validation. Compile-time test ensures sync.
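
One way such a single source of truth could look is sketched below. The variant list is an illustrative subset, and `VARIANTS`, `as_str`, `parse`, and `all_is_in_sync` are assumed names. The sync check here is a plain function that a test could assert on, which is an assumption about how the compile-time test might be approximated.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum CommitType { Feat, Fix, Refactor, Docs, Test, Ci, Chore, Style }

impl CommitType {
    /// Every variant, in the same order as `ALL` (illustrative subset).
    const VARIANTS: &'static [CommitType] = &[
        CommitType::Feat, CommitType::Fix, CommitType::Refactor, CommitType::Docs,
        CommitType::Test, CommitType::Ci, CommitType::Chore, CommitType::Style,
    ];

    /// String forms shared by the sanitizer and validation.
    const ALL: &'static [&'static str] =
        &["feat", "fix", "refactor", "docs", "test", "ci", "chore", "style"];

    /// The exhaustive match forces an update here whenever a variant is added.
    fn as_str(self) -> &'static str {
        match self {
            CommitType::Feat => "feat",
            CommitType::Fix => "fix",
            CommitType::Refactor => "refactor",
            CommitType::Docs => "docs",
            CommitType::Test => "test",
            CommitType::Ci => "ci",
            CommitType::Chore => "chore",
            CommitType::Style => "style",
        }
    }

    fn parse(s: &str) -> Option<CommitType> {
        Self::VARIANTS.iter().copied().find(|t| t.as_str() == s)
    }
}

/// Returns true when `ALL` lists exactly the string form of every variant.
fn all_is_in_sync() -> bool {
    CommitType::VARIANTS.len() == CommitType::ALL.len()
        && CommitType::VARIANTS
            .iter()
            .zip(CommitType::ALL)
            .all(|(v, s)| v.as_str() == *s)
}
```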

#### FR-010: Rich Diagnostic Errors (miette) ✅

Every error variant has a human-readable message, error code, help suggestion, and source context where applicable.

#### FR-011: OpenAI-Compatible Provider ✅

Supports any OpenAI-compatible API (OpenAI, Groq, Together, LM Studio, vLLM). Configurable `api_base_url`, `model`, `api_key`. Streaming with cancellation. Tested with wiremock.

#### FR-012: Anthropic Provider ✅

Native Anthropic Claude API support. Streaming via `generate_stream()` with cancellation. Tested with wiremock.

#### FR-013: Ollama Hardening ✅

Configurable timeout (default 300s), connection/model error differentiation with help text, configurable `temperature`/`num_predict`, health check, mid-stream error handling, streaming support.

#### FR-014: Git Hook Integration ✅

`commitbee hook install/uninstall/status`. Non-destructive (backs up existing hooks). Detects and skips merge/amend/squash commits. Atomic writes. Graceful fallback if binary not found.

#### FR-015: Shell Completions ✅

`commitbee completions <shell>` for bash, zsh, fish, powershell via `clap_complete`.

#### FR-016: Multiple Message Generation ✅

`commitbee --generate N` with `dialoguer` interactive selection in TTY mode. Non-TTY outputs all N. `--yes` auto-selects first.

#### FR-017: Hierarchical Configuration (figment) ✅

Priority: CLI args > env vars > project config (`.commitbee.toml`) > user config > defaults.

| Platform | User Config Path |
| -------- | --------------- |
| macOS    | `~/Library/Application Support/commitbee/config.toml` |
| Linux    | `~/.config/commitbee/config.toml` (XDG) |
| Windows  | `%APPDATA%\commitbee\config.toml` |

Fallback: `~/.config/commitbee/config.toml` on all platforms for backward compatibility.
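
The priority order can be illustrated without figment as a "first layer that sets the value wins" lookup. This is a sketch of the precedence rule only, not of figment's API: `Layer`, `Resolved`, and `resolve` are hypothetical, and the two defaults are the ones stated elsewhere in this document (`qwen3.5:4b`, 300s timeout).

```rust
/// One configuration source; `None` means "not set at this level".
#[derive(Default, Clone)]
struct Layer {
    model: Option<String>,
    timeout_secs: Option<u64>,
}

#[derive(Debug, PartialEq)]
struct Resolved {
    model: String,
    timeout_secs: u64,
}

/// `layers` must be ordered from highest to lowest priority:
/// CLI args, env vars, project config, user config.
fn resolve(layers: &[Layer]) -> Resolved {
    Resolved {
        model: layers
            .iter()
            .find_map(|l| l.model.clone())
            .unwrap_or_else(|| "qwen3.5:4b".to_string()),
        timeout_secs: layers
            .iter()
            .find_map(|l| l.timeout_secs)
            .unwrap_or(300),
    }
}
```

Each field is resolved independently, so a CLI flag can override the timeout while the model still comes from an environment variable.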

#### FR-018: Structured Logging (tracing) ✅

`RUST_LOG=commitbee=debug` for verbose output. `--verbose` / `-v` flag. Key functions instrumented with `#[instrument]`.

#### FR-019: Secure API Key Storage ✅

System keychain via `keyring` with platform-native backends (`apple-native` on macOS, `windows-native` on Windows, `linux-native` on Linux). Feature-gated. `commitbee config set-key/get-key <provider>`. Env var fallback. Never stores keys in plaintext config. Commands that don't need the LLM (`set-key`, `get-key`, `init`, `config`, `completions`, `hook`) skip API key validation.

#### FR-020: Async Git Operations ✅

All git CLI calls use `tokio::process::Command`. Event loop never blocked.

#### FR-021: Single-Pass Diff Parsing ✅

One `git diff --cached --no-ext-diff --unified=3` call parsed per-file. NUL-delimited name-status parsing (`-z` flag).

#### FR-022: Integration Test Suite ✅

End-to-end tests with `tempfile` git repos and `wiremock` LLM mocks. CLI tests with `assert_cmd`/`insta-cmd`.

#### FR-023: Commit Splitting ✅

Detects logically independent staged changes and splits into separate well-typed commits.

**Implementation details:**
1. Diff-shape fingerprinting + Jaccard clustering (vocabulary overlap > 0.4)
2. Symbol dependency merging via targeted caller detection
3. Category separation (docs/config get own groups)
4. Module detection with 22-entry generic directory exclusion list
5. Post-clustering sub-split for >6-file groups spanning multiple modules
6. Scored support file assignment (known pairs, stem overlap, standalone fallback)
7. Type+scope inference per group
8. Group rationale in per-group prompts (`GROUP_REASON:`)
9. Focus instruction for >5-file groups
10. Collapse check (same type+scope → suggest single commit)
11. Split execution: unstage all → stage group → commit → repeat

- **Safety**: Refuses to split when staged files also have unstaged modifications.
- **CLI**: `--no-split` disables splitting. With `--yes` or in non-TTY mode, the split suggestion is skipped and a single commit is created by default.
- **Tests**: 16 dedicated integration tests.
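
The Jaccard clustering criterion in step 1 can be sketched as follows. How the real splitter builds its vocabulary is not specified here, so `vocab` (whitespace tokenization) is an assumption, as are the names `jaccard` and `should_cluster`. Only the 0.4 threshold comes from the text above.

```rust
use std::collections::HashSet;

/// Hypothetical vocabulary extraction: the set of whitespace-separated tokens.
fn vocab(text: &str) -> HashSet<&str> {
    text.split_whitespace().collect()
}

/// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets.
fn jaccard(a: &HashSet<&str>, b: &HashSet<&str>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Two diffs are clustered together when their vocabulary overlap exceeds 0.4.
fn should_cluster(a: &HashSet<&str>, b: &HashSet<&str>) -> bool {
    jaccard(a, b) > 0.4
}
```

For instance, the token sets of `"fn parse token"` and `"fn parse lexer"` share 2 of 4 distinct tokens, giving 0.5, so they would be grouped.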

#### FR-039: Config Validation ✅

`commitbee config check` validates configuration. `commitbee doctor` checks Ollama health. URL parsing, numeric bounds, provider enum validation at config time.

### 4.2 Shipped — v0.3.x (Differentiation Core)

Features that shipped incrementally across v0.3.0 and v0.3.1.

#### FR-034: Improved Commit Type Heuristics ✅

Evidence-based deterministic commit type inference:

- Test-only → `test`, doc-only → `docs`, CI-only → `ci`, dependency-only → `chore`
- New files with substantial code → `feat`
- `fix` requires `has_bug_evidence` (bug-fix comments in diff); without → `refactor`
- API replacement detection (public APIs added AND removed → `refactor`)
- Mechanical/formatting detection → `style`/`refactor` (never `feat`/`fix`)
- Metadata-aware breaking detection: `rust-version`, `engines.node`, `requires-python` changes; removed `pub use`/`pub mod`/`export`
- Symbol tri-state: `AddedOnly`, `RemovedOnly`, `ModifiedSignature` — public modified symbols contribute to breaking risk
- Default fallback: `Refactor` (safer than `Feat` for ambiguous changes)
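
A simplified sketch of how these rules could be ordered is shown below. The rule order, the `Evidence` struct layout, and the fields `test_only`, `docs_only`, `ci_only`, and `new_files_with_code` are assumptions made for illustration. Mechanical changes are mapped to `style` here, although the rules above also allow `refactor`.

```rust
/// Hypothetical evidence summary computed from the staged changes.
#[derive(Default)]
struct Evidence {
    test_only: bool,
    docs_only: bool,
    ci_only: bool,
    is_dependency_only: bool,
    is_mechanical: bool,
    has_new_public_api: bool,
    public_api_removed_count: usize,
    new_files_with_code: bool,
    has_bug_evidence: bool,
}

/// Deterministic type inference; the first matching rule wins.
fn infer_commit_type(e: &Evidence) -> &'static str {
    if e.test_only {
        "test"
    } else if e.docs_only {
        "docs"
    } else if e.ci_only {
        "ci"
    } else if e.is_dependency_only {
        "chore"
    } else if e.is_mechanical {
        // Mechanical or formatting changes are never `feat` or `fix`.
        "style"
    } else if e.has_new_public_api && e.public_api_removed_count > 0 {
        // Public APIs both added and removed: treated as an API replacement.
        "refactor"
    } else if e.new_files_with_code {
        "feat"
    } else if e.has_bug_evidence {
        "fix"
    } else {
        // Ambiguous changes fall back to the safer default.
        "refactor"
    }
}
```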

#### FR-040: Conventional Commits 1.0.0 Spec Anchoring ✅

- Breaking changes: `!` suffix + `BREAKING CHANGE:` footer (always emitted, regardless of `include_body`)
- Footer wrapped at 72 chars with 2-space continuation indent (git-trailer compatible)
- Single shared `SYSTEM_PROMPT` constant; type list synced with `CommitType::ALL` via compile-time test
- Sanitizer normalizes `"null"` → non-breaking
- Symbol deduplication in context builder
- Cross-project file categorization: 30+ source language extensions, 40+ config patterns, dotfile auto-detection, expanded CI/build/lock file detection
- Expanded scope inference: additional source/monorepo dirs, generic next-component exclusion
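
The footer wrapping rule (72 columns, 2-space continuation indent) can be sketched with a greedy word wrapper. `wrap_footer` is a hypothetical name, and the sketch assumes the indent counts toward the line width and that a single word longer than the width is left unbroken.

```rust
/// Greedily wraps `text` so that no line exceeds `width` characters
/// (unless a single word is longer than `width`). Every line after
/// the first starts with a 2-space continuation indent.
fn wrap_footer(text: &str, width: usize) -> String {
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut has_word = false; // whether `current` already holds a word

    for word in text.split_whitespace() {
        let needed = current.chars().count() + usize::from(has_word) + word.chars().count();
        if has_word && needed > width {
            // Close the current line and start a continuation line.
            lines.push(std::mem::replace(&mut current, String::from("  ")));
            has_word = false;
        }
        if has_word {
            current.push(' ');
        }
        current.push_str(word);
        has_word = true;
    }
    if has_word {
        lines.push(current);
    }
    lines.join("\n")
}
```

Calling `wrap_footer(footer, 72)` on a long `BREAKING CHANGE:` footer keeps the token at the start of the first line and indents every following line, which is the shape git's trailer parsing expects for multi-line values.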

#### FR-041: Post-Generation Validation ✅

Evidence-based LLM output validation with multi-pass corrective retry (up to 3 attempts).

**Evidence flags** (computed before LLM generation):
- `is_mechanical`, `has_bug_evidence`, `public_api_removed_count`, `has_new_public_api`, `is_dependency_only`

**CommitValidator — 7 rules:**
1. `fix` requires `has_bug_evidence` (otherwise → `refactor`)
2. `breaking_change` required when public APIs removed
3. `breaking_change` must not copy internal field names (anti-hallucination)
4. Mechanical transforms cannot be `feat`/`fix` (→ `style`/`refactor`)
5. Dependency-only changes must be `chore`
6. Subject specificity: generic verb+noun triggers retry with instruction to name specific APIs/modules
7. Subject length: rejects subjects exceeding 72-char first line, reports char budget

**Retry behavior** (v0.3.1): Appends `CORRECTIONS` section, re-prompts, re-validates. Sanitizer rejects overlong first lines with descriptive error (no silent truncation).
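
The validate-then-retry loop can be sketched as below. This covers only rules 1, 2, 4, 5, and 7 in simplified form. `Candidate`, `validate`, and `generate_validated` are hypothetical names, the `generate` closure stands in for an LLM call, and the first line is assumed to have the shape `type: subject`.

```rust
struct Evidence {
    is_mechanical: bool,
    has_bug_evidence: bool,
    is_dependency_only: bool,
    public_api_removed_count: usize,
}

/// Hypothetical parsed LLM output.
struct Candidate {
    commit_type: String,
    subject: String,
    breaking_change: Option<String>,
}

const MAX_ATTEMPTS: usize = 3;
const MAX_FIRST_LINE: usize = 72;

/// Returns one human-readable violation per broken rule.
fn validate(c: &Candidate, e: &Evidence) -> Vec<String> {
    let mut violations = Vec::new();
    let ty = c.commit_type.as_str();
    if ty == "fix" && !e.has_bug_evidence {
        violations.push("no bug evidence: use `refactor` instead of `fix`".to_string());
    }
    if e.public_api_removed_count > 0 && c.breaking_change.is_none() {
        violations.push("public APIs were removed: `breaking_change` is required".to_string());
    }
    if e.is_mechanical && matches!(ty, "feat" | "fix") {
        violations.push("mechanical change: use `style` or `refactor`".to_string());
    }
    if e.is_dependency_only && ty != "chore" {
        violations.push("dependency-only change: type must be `chore`".to_string());
    }
    // Assumed first-line shape: "<type>: <subject>".
    let first_line = ty.chars().count() + 2 + c.subject.chars().count();
    if first_line > MAX_FIRST_LINE {
        violations.push(format!("first line is {first_line} chars, budget is {MAX_FIRST_LINE}"));
    }
    violations
}

/// Calls `generate` with the accumulated corrections until the output validates
/// or the attempt budget is exhausted.
fn generate_validated<F>(mut generate: F, e: &Evidence) -> Result<Candidate, Vec<String>>
where
    F: FnMut(&str) -> Candidate,
{
    let mut corrections = String::new();
    for attempt in 1..=MAX_ATTEMPTS {
        let candidate = generate(&corrections);
        let violations = validate(&candidate, e);
        if violations.is_empty() {
            return Ok(candidate);
        }
        if attempt == MAX_ATTEMPTS {
            return Err(violations);
        }
        corrections = format!("CORRECTIONS:\n- {}", violations.join("\n- "));
    }
    unreachable!("the loop returns on the final attempt")
}
```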

### 4.3 Shipped — v0.4.0 (Feature Completion)

#### FR-030: Custom Prompt Templates ✅

`TemplateService` in `src/services/template.rs`. Config fields: `system_prompt_path`, `template_path`. Template variables: `{{diff}}`, `{{symbols}}`, `{{files}}`, `{{type}}`, `{{scope}}`. Default templates used when no custom template specified. All LLM providers pass through custom system prompt. 7 tests.
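
To illustrate how `{{name}}` placeholders might be expanded, here is a single-pass substitution sketch. It is not the `TemplateService` implementation: `render` is a hypothetical function, and leaving unknown placeholders untouched is an assumed policy.

```rust
use std::collections::HashMap;

/// Replaces each `{{name}}` with its value from `vars` in one left-to-right pass.
/// Substituted values are never re-scanned, so a diff that happens to contain
/// `{{scope}}` is inserted verbatim.
fn render(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;

    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let name = after[..end].trim();
                match vars.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        // Unknown variable: keep the placeholder as written.
                        out.push_str("{{");
                        out.push_str(&after[..end]);
                        out.push_str("}}");
                    }
                }
                rest = &after[end + 2..];
            }
            None => {
                // Unterminated placeholder: copy the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```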

#### FR-032: Multi-Language Commit Messages ✅

`--locale <lang>` flag and `locale` config option. `LANGUAGE:` instruction injected into prompt context. ISO 639-1 codes supported. Type/scope remain in English per conventional commits spec.

#### FR-035: Rename Detection ✅

`--find-renames=N%` with configurable `rename_threshold` (default 70%, 0 disables). NUL-delimited `R<NNN>` status parsing consuming two path fields. `ChangeStatus::Renamed` variant with `old_path` on `FileChange`. Context builder formats as `old → new (N% similar)`. Split suggestions show `[R]` marker.
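
The two-path consumption for rename entries can be sketched as follows, assuming input in the format produced by `git diff --name-status -z`, where fields are NUL-separated and an `R<NNN>` status is followed by the old and new paths. `Change`, `parse_name_status_z`, and `describe` are hypothetical names. Copy entries (`C<NNN>`), which also carry two paths, are not handled in this sketch.

```rust
#[derive(Debug, PartialEq)]
enum Change {
    Renamed { old_path: String, new_path: String, similarity: u8 },
    Other { status: char, path: String },
}

/// Parses NUL-delimited name-status output. A rename status consumes
/// two path fields; every other status consumes one.
fn parse_name_status_z(raw: &str) -> Vec<Change> {
    let mut fields = raw.split('\0').filter(|f| !f.is_empty());
    let mut changes = Vec::new();

    while let Some(status) = fields.next() {
        if let Some(score) = status.strip_prefix('R') {
            let (Some(old), Some(new)) = (fields.next(), fields.next()) else {
                break; // truncated input
            };
            changes.push(Change::Renamed {
                old_path: old.to_string(),
                new_path: new.to_string(),
                similarity: score.parse().unwrap_or(0),
            });
        } else if let Some(path) = fields.next() {
            changes.push(Change::Other {
                status: status.chars().next().unwrap_or('?'),
                path: path.to_string(),
            });
        }
    }
    changes
}

/// Formats a rename as `old → new (N% similar)`; the format used for
/// other statuses is purely illustrative.
fn describe(change: &Change) -> String {
    match change {
        Change::Renamed { old_path, new_path, similarity } => {
            format!("{old_path} → {new_path} ({similarity}% similar)")
        }
        Change::Other { status, path } => format!("[{status}] {path}"),
    }
}
```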

#### FR-036: Tree-sitter Query Patterns ✅

`.scm` query files in `src/queries/` per language with `@name`/`@definition` captures. `LanguageConfig` with `query_source` field. `tree_sitter::Query` + `QueryCursor` + `StreamingIterator` replaces manual `TreeCursor` walking.

#### FR-037: Expanded Secret Scanning ✅

25 built-in `SecretPattern` structs across 13 categories:

| Category | Patterns |
| -------- | -------- |
| Cloud | AWS access/secret, GCP service account/API key, Azure storage |
| AI/ML | OpenAI (`sk-proj-`), Anthropic, HuggingFace |
| Source Control | GitHub PAT/fine-grained/OAuth, GitLab |
| Communication | Slack token/webhook, Discord webhook |
| Payment | Stripe, Twilio, SendGrid, Mailgun |
| Database | Connection strings |
| Crypto | Private keys, JWT |
| Generic | API key patterns (quoted/unquoted) |

Pluggable engine via `build_patterns(custom, disabled)`. Config: `custom_secret_patterns`, `disabled_secret_patterns`. `LazyLock` default pattern set.

#### FR-038: Progress Indicators ✅

`Progress` struct wrapping `Option<ProgressBar>` with TTY detection (`std::io::stderr().is_terminal()`). Methods: `phase()`, `info()`, `warning()`, `finish()`. `Drop` auto-clears. Non-TTY falls back to `eprintln!` with `console::style()`.

#### FR-057: Additional Language Support ✅

5 new language crates as optional dependencies: `tree-sitter-java`, `tree-sitter-c`, `tree-sitter-cpp`, `tree-sitter-ruby`, `tree-sitter-c-sharp`. Feature flags: `lang-java`, `lang-c`, `lang-cpp`, `lang-ruby`, `lang-csharp`, `all-languages`. Each language has `.scm` query files. Visibility detection for Java/C# public modifiers. 15 feature-gated tests.

#### FR-058: Commit History Style Learning (Experimental) ✅

`HistoryService` in `src/services/history.rs`. `analyze()` fetches last N commit subjects via `git log`, extracts type distribution, scope patterns, case style, conventional compliance ratio, sample subjects. `HistoryContext::to_prompt_section()` formats as `PROJECT STYLE` block.

Config: `learn_from_history` (default `false`), `history_sample_size` (default 50). Feature-gated behind `--experimental-history` or config flag. Does not override conventional commits structure — only influences scope naming and subject phrasing style. Deterministic sort order for equal-count entries.

#### TR-006: Evaluation Harness ✅

`commitbee eval` — runs full pipeline against fixture diffs with assertion-based validation. Feature-gated (`eval` feature). 36 fixtures in `tests/fixtures/eval/` covering all 11 commit types, AST features (signatures, connections, whitespace classification), and edge cases. Each fixture has `metadata.toml` (assertions for type, evidence flags, prompt content, connections, breaking changes), `diff.patch`, and optional `symbols.toml` (injected CodeSymbol data). `EvalSummary` reports per-type accuracy and overall score. `run_sync()` method for integration test access.

#### TR-007: Fuzzing ✅

5 `cargo-fuzz` targets: `fuzz_sanitizer`, `fuzz_safety`, `fuzz_diff_parser`, `fuzz_signature`, `fuzz_classify_span`. `fuzz/Cargo.toml` with `libfuzzer-sys`.

#### FR-031: Exclude Files ✅

`--exclude` CLI flag (repeatable) and `exclude_patterns` config option. Glob patterns via `globset` (e.g., `*.lock`, `**/*.generated.*`, `vendor/**`). Excluded files listed in output but not analyzed or included in diff context. CLI patterns additive with config patterns. Returns `NoStagedChanges` if all files excluded. 4 glob matching tests + 3 TOML tests + 3 CLI parsing tests.

#### FR-033: Copy to Clipboard ✅

`--clipboard` flag copies generated message to system clipboard and prints to stdout. Skips commit confirmation prompt. Uses platform-specific commands: `pbcopy` (macOS), `clip` (Windows), `xclip -selection clipboard` (Linux). Descriptive error if clipboard command unavailable. 3 CLI parsing tests.

### 4.5 Shipped — v0.5.0 (AST Context Overhaul)

#### FR-059: Full Signature Extraction ✅

Tree-sitter AST nodes now yield complete function/struct/trait signatures (e.g., `pub fn connect(host: &str, timeout: Duration) -> Result<Connection>`) instead of bare names. Two-strategy body detection: `child_by_field_name("body")` primary, `BODY_NODE_KINDS` constant fallback (12 node kinds across 10 languages), first-line final fallback. Multi-line signatures collapsed to single line, capped at 200 chars with UTF-8-safe truncation (`floor_char_boundary`). Token budget rebalanced to 30/70 symbol/diff when signatures present. 7 unit tests + 6 per-language integration tests.
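
The collapse-and-cap step can be illustrated with the sketch below. `collapse_signature` is a hypothetical helper. It measures the cap in bytes and walks back to a character boundary by hand with `str::is_char_boundary`, which has the same effect as `floor_char_boundary` for this purpose.

```rust
/// Collapses a multi-line signature to one line and caps it at `max_bytes`,
/// never cutting through a multi-byte UTF-8 character.
fn collapse_signature(sig: &str, max_bytes: usize) -> String {
    // Any run of whitespace (including newlines) becomes a single space.
    let one_line = sig.split_whitespace().collect::<Vec<_>>().join(" ");
    if one_line.len() <= max_bytes {
        return one_line;
    }
    // Walk back to the nearest char boundary; index 0 is always a boundary.
    let mut end = max_bytes;
    while !one_line.is_char_boundary(end) {
        end -= 1;
    }
    one_line[..end].to_string()
}
```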

#### FR-060: Semantic Change Classification ✅

Modified symbols (same name+kind+file in both HEAD and staged) are classified as whitespace-only or semantic via character-stream comparison of non-whitespace content within symbol spans. Dual old-file/new-file line tracking for correct span attribution. Old → new signature diffs displayed in prompt (`[~] old_sig → new_sig`). Whitespace-only symbols filtered from modified display. Formatting-only changes auto-detected as `CommitType::Style` when all modified symbols are whitespace-only. `build()` restructured to classify before `infer_commit_type`. 3 tests.
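
The character-stream comparison can be sketched in a few lines. `is_whitespace_only_change` is a hypothetical name, and the inputs are assumed to be the old and new source text of a single symbol span. Note that this simple form also ignores whitespace changes inside string literals.

```rust
/// Returns true when `old` and `new` differ only in whitespace,
/// by comparing their streams of non-whitespace characters.
fn is_whitespace_only_change(old: &str, new: &str) -> bool {
    let significant = |s: &'static str| s.len(); // unused helper removed below
    let _ = significant;
    old.chars()
        .filter(|c| !c.is_whitespace())
        .eq(new.chars().filter(|c| !c.is_whitespace()))
}
```

Because both sides are lazy iterators, the comparison stops at the first differing character and allocates nothing.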

#### FR-061: Cross-File Connection Detection ✅

Scans added diff lines for `symbol_name(` call patterns referencing symbols defined in other changed files. Connections displayed in new `CONNECTIONS:` prompt section (e.g., `validator calls parse() — both changed`). Capped at 5 connections to prevent prompt bloat. SYSTEM_PROMPT updated with connection-aware guidance. 1 test + 1 splitter integration test.
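
The call-pattern scan could look roughly like this. The `Connection` struct, the function names, and the tuple-based inputs are assumptions made for the sketch; only the `symbol_name(` pattern and the cap of 5 come from the text above.

```rust
/// A call from one changed file into a symbol defined in another changed file.
#[derive(Debug, PartialEq)]
struct Connection {
    caller_file: String,
    symbol: String,
}

const MAX_CONNECTIONS: usize = 5;

/// True if `line` contains `name(` and the match is not the tail of a
/// longer identifier (so `reparse(` does not count as a call to `parse`).
fn calls_symbol(line: &str, name: &str) -> bool {
    let pattern = format!("{name}(");
    line.match_indices(pattern.as_str()).any(|(idx, _)| {
        !line[..idx]
            .chars()
            .next_back()
            .is_some_and(|c| c.is_alphanumeric() || c == '_')
    })
}

/// `added_lines` holds (file, added line) pairs; `symbols` holds
/// (defining file, symbol name) pairs.
fn detect_connections(added_lines: &[(&str, &str)], symbols: &[(&str, &str)]) -> Vec<Connection> {
    let mut found = Vec::new();
    for (file, line) in added_lines {
        for (def_file, name) in symbols {
            // Only cross-file references are interesting.
            if file == def_file || !calls_symbol(line, name) {
                continue;
            }
            let conn = Connection { caller_file: file.to_string(), symbol: name.to_string() };
            if !found.contains(&conn) {
                found.push(conn);
            }
            if found.len() == MAX_CONNECTIONS {
                return found;
            }
        }
    }
    found
}
```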

#### FR-062: Security Hardening ✅

Project-level `.commitbee.toml` can no longer override `openai_base_url`, `anthropic_base_url`, or `ollama_host` (SSRF/exfiltration prevention). All 3 streaming LLM providers cap `line_buffer` at `MAX_RESPONSE_BYTES` (1 MB) to prevent unbounded memory growth. `reqwest::Error` display stripped of URLs via `without_url()`. OpenAI secret pattern broadened to `sk-proj-` and `sk-svcacct-` prefixes. `Box::leak` replaced with `Cow<'static, str>` for custom secret pattern names.

#### FR-063: Prompt Optimization for Small Models ✅

Subject character budget accounts for `!` suffix on breaking changes. EVIDENCE section omitted when all flags are default (~200 chars saved). Symbol marker legend added to SYSTEM_PROMPT (`[+] added, [-] removed, [~] modified`). Duplicate JSON schema removed from system prompt. Emoji replaced with text labels (`WARNING:` instead of `⚠`). CONNECTIONS instruction softened for small models. Python tree-sitter queries enhanced with `decorated_definition` support.

### 4.6 Shipped — v0.6.0 (Deep Semantic Understanding)

#### FR-064: Parent Scope Extraction ✅

Tree-sitter AST walker extracts enclosing `impl`/`class`/`trait` scope for methods, displaying `Parent > signature` format in symbol output. Walks through intermediate nodes (`declaration_list`, `class_body`). Verified across 7 languages (Rust, Python, TypeScript, Java, Go, Ruby, C#). 10 per-language tests.

#### FR-065: Import Change Detection ✅

`detect_import_changes()` scans diff lines for added/removed import statements, producing an `IMPORTS CHANGED:` prompt section with file stem and action. Supports Rust `use`, JS/TS `import`, Python `from`/`import`, Node `require()`, C/C++ `#include`. Capped at 10 entries. 5 tests.
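
As a rough illustration of the per-line check, the sketch below classifies a single unified-diff line. `ImportAction` and `classify_import_line` are hypothetical names, and the prefix list is a simplification of the syntaxes listed above (for example, it does not handle `pub use`).

```rust
#[derive(Debug, PartialEq)]
enum ImportAction {
    Added,
    Removed,
}

/// Classify one unified-diff line as an added or removed import, if it is one.
fn classify_import_line(diff_line: &str) -> Option<ImportAction> {
    // Skip the `+++` / `---` file headers.
    if diff_line.starts_with("+++") || diff_line.starts_with("---") {
        return None;
    }
    let (action, rest) = if let Some(rest) = diff_line.strip_prefix('+') {
        (ImportAction::Added, rest)
    } else if let Some(rest) = diff_line.strip_prefix('-') {
        (ImportAction::Removed, rest)
    } else {
        return None; // context line
    };
    let code = rest.trim_start();
    let is_import = ["use ", "import ", "from ", "#include"]
        .iter()
        .any(|prefix| code.starts_with(*prefix))
        || code.contains("require(");
    is_import.then_some(action)
}
```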

#### FR-066: Doc-vs-Code Change Classification ✅

`SpanChangeKind` enum (`Unchanged`, `WhitespaceOnly`, `DocsOnly`, `Mixed`, `Semantic`) replaces binary `is_whitespace_only` for richer modified-symbol classification. `classify_span_change_rich()` detects comment-line prefixes (`///`, `//!`, `#`, `"""`, `/**`). Doc-only modifications suggest `docs` type. Modified symbols show `[docs only]` or `[docs + code]` suffix in prompt output. 7 tests.
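
A simplified sketch of how the five variants might be assigned, assuming the input is the list of changed lines within one symbol span. The enum variants and comment prefixes come from the text above; `is_doc_line` and `classify_changed_lines` are hypothetical, and treating "only blank lines changed" as `WhitespaceOnly` is a shortcut taken for the example.

```rust
#[derive(Debug, PartialEq)]
enum SpanChangeKind {
    Unchanged,
    WhitespaceOnly,
    DocsOnly,
    Mixed,
    Semantic,
}

const DOC_PREFIXES: [&str; 5] = ["///", "//!", "#", "\"\"\"", "/**"];

fn is_doc_line(line: &str) -> bool {
    let trimmed = line.trim_start();
    DOC_PREFIXES.iter().any(|prefix| trimmed.starts_with(*prefix))
}

/// Classify the changed lines of one symbol span.
fn classify_changed_lines(changed: &[&str]) -> SpanChangeKind {
    if changed.is_empty() {
        return SpanChangeKind::Unchanged;
    }
    let non_blank: Vec<&str> = changed
        .iter()
        .copied()
        .filter(|line| !line.trim().is_empty())
        .collect();
    if non_blank.is_empty() {
        return SpanChangeKind::WhitespaceOnly;
    }
    let doc_lines = non_blank.iter().filter(|line| is_doc_line(line)).count();
    if doc_lines == non_blank.len() {
        SpanChangeKind::DocsOnly
    } else if doc_lines == 0 {
        SpanChangeKind::Semantic
    } else {
        SpanChangeKind::Mixed
    }
}
```

Note that a bare `#` prefix also matches Rust attributes such as `#[test]`, so a prefix check like this would need a language-aware refinement in practice.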

#### FR-067: Test-to-Code Ratio Inference ✅

In `infer_commit_type`, when >80% of additions are in `FileCategory::Test` files, returns `CommitType::Test` even with source files present. Uses cross-multiplication (`test * 100 > total * 80`) to avoid integer truncation. 2 tests.
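
The cross-multiplication check can be written as a one-liner. The function name below is hypothetical; the inequality is the one quoted above.

```rust
/// True when strictly more than 80% of added lines are in test files.
/// Comparing `test * 100` against `total * 80` avoids the truncation
/// that `test * 100 / total > 80` would introduce.
fn is_test_dominated(test_additions: u64, total_additions: u64) -> bool {
    total_additions > 0 && test_additions * 100 > total_additions * 80
}
```

For example, 161 test additions out of 200 is 80.5%. Integer division gives `161 * 100 / 200 = 80`, which fails a `> 80` check, while the cross-multiplied form compares `16100 > 16000` and passes.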

#### FR-068: Test File Correlation ✅

`detect_test_correlation()` matches staged source files to test files by file stem, producing a `RELATED FILES:` prompt section (e.g., `src/services/context.rs <-> tests/context.rs (test file)`). Capped at 5 entries. 4 tests.
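
A sketch of stem-based matching using only `std::path`. The rule that a file counts as a test when one of its path components is `tests` is an assumption for this example (the real categorization lives in `FileCategory`), and the signature shown here is illustrative.

```rust
use std::path::Path;

const MAX_CORRELATIONS: usize = 5;

/// Assumption for this sketch: a path is a test file if any component is `tests`.
fn is_test_path(path: &str) -> bool {
    Path::new(path)
        .components()
        .any(|c| c.as_os_str().to_str() == Some("tests"))
}

/// Pair each staged source file with staged test files that share its file stem.
fn detect_test_correlation(staged: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    for src in staged.iter().filter(|p| !is_test_path(p)) {
        let src_stem = Path::new(src).file_stem();
        for test in staged.iter().filter(|p| is_test_path(p)) {
            if src_stem.is_some() && Path::new(test).file_stem() == src_stem {
                out.push(format!("{src} <-> {test} (test file)"));
                if out.len() == MAX_CORRELATIONS {
                    return out;
                }
            }
        }
    }
    out
}
```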

#### FR-069: Structural AST Diffs ✅

`AstDiffer` in `src/services/differ.rs` compares old and new tree-sitter AST nodes for modified symbols, producing `SymbolDiff` with `Vec<ChangeDetail>` (15-variant enum: `ParamAdded`, `ParamRemoved`, `ParamTypeChanged`, `ReturnTypeChanged`, `VisibilityChanged`, `AttributeAdded`/`Removed`, `AsyncChanged`, `GenericChanged`, `BodyModified`, `BodyUnchanged`, `FieldAdded`/`Removed`/`TypeChanged`). Runs inside `extract_for_file()` while both Trees are alive (Node lifetime constraint). `extract_symbols()` returns `(Vec<CodeSymbol>, Vec<SymbolDiff>)`. Struct/enum field diffing stubbed for future. Whitespace-aware body comparison via character-stream stripping. 7 unit tests + 6 per-language integration tests.

#### FR-070: Structured Changes Prompt Section ✅

`STRUCTURED CHANGES:` section in LLM prompt renders `SymbolDiff::format_oneline()` descriptions (e.g., `CommitValidator::validate(): +param strict: bool, return bool → Result<()>, body modified (+5 -2)`). Omitted when no structural diffs exist. Token budget rebalanced: symbol budget reduced from 30% to 20% when structural diffs available, freeing space for raw diff. SYSTEM_PROMPT updated to guide LLM to prefer structured changes for signature details. 3 tests.

#### FR-071: Semantic Marker Detection ✅

`AstDiffer` extended with 10 marker variants in `ChangeDetail`: `UnsafeAdded`/`Removed`, `DeriveAdded`/`Removed`, `DecoratorAdded`/`Removed`, `ExportAdded`/`Removed`, `MutabilityChanged`, `GenericConstraintChanged`. Extracts unsafe keyword, derive attributes, and mutability from tree-sitter nodes during function comparison. Unsafe additions set `has_unsafe_addition` evidence flag and trigger a CONSTRAINTS rule requiring safety justification in the commit body. 4 unit tests.

#### FR-072: Change Intent Detection ✅

`detect_intents()` scans added diff lines for error handling patterns (9 patterns including `Result<>`, `?`, `Err()`, `.map_err()`), test patterns (6 patterns including `#[test]`, `assert!`), logging patterns (9 patterns including `tracing::`, `debug!()`, `info!()`), and dependency updates (version changes in manifests). `INTENT:` prompt section shows detected patterns with confidence scores. `refine_type_with_intents()` conservatively overrides base type only for high-confidence performance optimization. 7 tests.

#### FR-076: Interactive Commit Editing ✅

Added an "Edit" choice to the candidate selection and confirmation menu. Users can refine the generated message using their system editor (via the `EDITOR` env var) before committing, allowing for final manual tweaks without leaving the tool's flow.

#### FR-077: Optimized Symbol Dependency Merging ✅

Improved `CommitSplitter` performance for large commits by pre-indexing symbols and optimizing diff scanning. Reduces complexity from $O(F \times S \times L)$ to $O(F \times L)$, where F is files, S is symbols, and L is diff lines. Ensures commit splitting remains fast even for large refactorings.
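
One way to get rid of the S factor is to index symbol names in a hash set and look up each identifier-like token of a diff line, instead of testing every symbol against every line. The sketch below illustrates that idea under stated assumptions; `referenced_symbols` is a hypothetical helper, not the `CommitSplitter` code.

```rust
use std::collections::HashSet;

/// Return the indexed symbols that appear as whole identifiers in the diff lines.
/// Each line is tokenized once and each token costs one expected-O(1) lookup,
/// so the work per line no longer depends on the number of symbols.
fn referenced_symbols<'a>(symbols: &HashSet<&'a str>, diff_lines: &[&str]) -> HashSet<&'a str> {
    let mut hits = HashSet::new();
    for line in diff_lines {
        // Split on anything that cannot be part of an identifier.
        for token in line.split(|c: char| !(c.is_alphanumeric() || c == '_')) {
            if let Some(sym) = symbols.get(token) {
                hits.insert(*sym);
            }
        }
    }
    hits
}
```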

#### FR-078: Native Clipboard Implementation ✅

Replaced external command dependencies (`pbcopy`, `xclip`) with the `arboard` crate for a native, cross-platform clipboard implementation. (Improvement over `FR-033`).

#### FR-079: Accurate Secret Scan Line Numbers ✅

The secret scanner now parses `@@` hunk headers to correctly report the source line number of detected secrets in the staged file, instead of the absolute diff line number. (Improvement over `SR-001`).
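
The line-number mapping can be sketched as below, assuming a standard unified diff as input. Both function names are hypothetical. The new-file start line is read from the `+start,count` part of each `@@` header and advanced on context and added lines only.

```rust
/// Parse the new-file start line from a header such as `@@ -10,3 +12,4 @@ fn load()`.
fn parse_hunk_new_start(header: &str) -> Option<usize> {
    let rest = header.strip_prefix("@@ ")?;
    let new_range = rest.split_whitespace().find(|tok| tok.starts_with('+'))?;
    new_range.trim_start_matches('+').split(',').next()?.parse().ok()
}

/// Map every added line to its line number in the staged (new) file.
fn added_line_numbers(diff: &str) -> Vec<(usize, String)> {
    let mut out = Vec::new();
    let mut new_line = 0usize;
    let mut in_hunk = false;
    for line in diff.lines() {
        if line.starts_with("diff --git") {
            in_hunk = false; // next file: ignore its `---` / `+++` headers
        } else if line.starts_with("@@") {
            if let Some(start) = parse_hunk_new_start(line) {
                new_line = start;
                in_hunk = true;
            }
        } else if !in_hunk {
            continue;
        } else if let Some(text) = line.strip_prefix('+') {
            out.push((new_line, text.to_string()));
            new_line += 1;
        } else if line.starts_with('-') || line.starts_with('\\') {
            // Removed lines and "\ No newline at end of file" do not exist in the new file.
        } else {
            new_line += 1; // context line
        }
    }
    out
}
```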

#### FR-080: Full AST Diffs for Structs and Enums ✅

Extended `AstDiffer` to support structured diffing for structs, enums, classes, and traits, detecting added/removed fields and variants. (Completion of `FR-069`).

#### FR-081: Interactive Message Refinement ✅

Added a "Refine" option to the candidate selection and confirmation menu. Users can provide feedback to the LLM (e.g., "more detail about the API change") to regenerate the message with natural language guidance.

### 4.7 Future — v0.7.0+ (Market Leadership)

#### FR-050: MCP Server Mode

Run commitbee as an MCP server for editor integration (VS Code, Cursor, Claude Code). Emerging standard; forward-looking integration.

#### FR-051: Changelog Generation

Generate changelogs from commit history using semantic understanding. Natural extension of commit structure. Competes with git-cliff/cocogitto.

#### FR-052: Multi-Provider Concurrent Generation

Query multiple LLMs simultaneously, let user pick best result. Leverages multi-provider support.

#### FR-053: Interactive Regeneration with Feedback

User can say "make it shorter" / "focus on the API change" after seeing a generated message. Turns one-shot generation into a conversation.

#### FR-054: Monorepo Support

Detect monorepo structure, scope commits to affected packages. Required for enterprise adoption.

#### FR-055: Version Bumping

Automatic semantic version bumps based on commit types. Natural extension of conventional commits.

#### FR-056: GitHub Action

Run commitbee in CI to validate or rewrite commit messages. Key differentiator for team adoption.

#### FR-073: Function Move Detection

Detect when a function is moved between files or within a file with zero semantic changes, using AST structural fingerprinting (hash tree topology ignoring identifiers). Classify as `refactor` rather than add+delete. Significantly improves commit type accuracy for common refactoring patterns.

#### FR-074: AST-Based Dependency Analysis for Splitting

Replace hardcoded path heuristics (`GENERIC_DIRS`, `KNOWN_PAIRS`) in the commit splitter with actual code dependency analysis derived from AST imports and call patterns. Produces higher-quality split groups based on real code relationships rather than file proximity.

#### FR-075: Configurable File Categorization

Allow users to define custom file category patterns in config (e.g., `[categorization] build_patterns = ["Tiltfile", "*.bazel"]`, `source_extensions = ["rs", "ts", "custom_lang"]`). Currently all patterns are hardcoded in `FileCategory::from_path()`. Enables support for proprietary build systems and custom file types.

## 5. Security Requirements

### SR-001: Secret Scanning

- Scan all content sent to LLM, not just `+` diff lines
- 25 built-in patterns across 13 categories (see FR-037)
- Configurable pattern allowlist/blocklist
- Never send detected secrets to any LLM provider
- **Proxy/forwarding protection**: Resolve `ollama_host` to IP, verify loopback (`127.0.0.0/8` or `::1`). Reject non-loopback even if hostname looks local. Log warning on ambiguous resolution.
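
The loopback check in the last bullet can be sketched with the standard library alone. `resolves_to_loopback_only` is a hypothetical name; `IpAddr::is_loopback` covers both `127.0.0.0/8` and `::1`.

```rust
use std::net::{IpAddr, ToSocketAddrs};

/// Resolve `host` and accept it only if every resolved address is loopback.
/// A hostname that merely looks local (or resolves to a mix of addresses)
/// is rejected.
fn resolves_to_loopback_only(host: &str, port: u16) -> std::io::Result<bool> {
    let addrs: Vec<IpAddr> = (host, port)
        .to_socket_addrs()?
        .map(|addr| addr.ip())
        .collect();
    Ok(!addrs.is_empty() && addrs.iter().all(IpAddr::is_loopback))
}
```

Requiring all resolved addresses to be loopback, rather than any, is a deliberately strict choice for this sketch. A caller could log a warning when the result set is mixed.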

### SR-002: API Key Management

- **In-memory protection**: API keys stored as `secrecy::SecretString` in Config and provider structs — memory zeroed on drop, `[REDACTED]` in Debug output, only exposed at HTTP header insertion via `.expose_secret()`
- System keychain via `keyring` with platform-native backends: `apple-native` (macOS Keychain), `linux-native` (Linux Secret Service), `windows-native` (Windows Credential Manager)
- Environment variable fallback
- Never stores keys in plaintext config
- Warns if config file permissions are world-readable
- CLI `--provider` flag applies before keyring/env var lookup
- Commands that don't need the LLM (`set-key`, `get-key`, `init`, `config`, `completions`, `hook`) skip API key validation

### SR-003: Command Execution Safety

- All subprocess calls via `Command::arg()` (never shell interpolation)
- `--` separator before file paths in all git commands
- LLM output validated before use as commit message
- `#![forbid(unsafe_code)]` in `lib.rs`
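
To illustrate the first two bullets, the sketch below builds a `git diff --cached` invocation with each argument passed separately and a `--` separator before the paths. The helper name is hypothetical.

```rust
use std::process::Command;

/// Build `git diff --cached -- <paths...>` without going through a shell.
/// Every path is its own argv entry, and `--` ensures a path such as `-rf`
/// is never parsed as an option.
fn git_diff_cached(paths: &[&str]) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("diff").arg("--cached").arg("--");
    cmd.args(paths);
    cmd
}
```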

### SR-004: Input Validation

- All string truncation uses `char_indices()` or `.chars().take(n)` — never byte indexing
- Config values validated at load time (URL parsing, numeric bounds, enum validation)
- LLM JSON output validated against schema before use

### SR-005: Dependency Auditing

- `cargo audit` in CI
- `cargo deny` for license compliance
- Minimize dependency tree

## 6. Performance Requirements

### PR-001: Startup Time

Cold start to first output: < 200ms (excluding LLM generation). Measured with `hyperfine` in CI. Lazy initialization for tracing-subscriber and tree-sitter grammars.

### PR-002: Git Operations

Single `git diff --cached` call parsed per-file. Async process spawning. Target: 100 staged files in < 2s.

### PR-003: Tree-sitter Parsing

Parallel via rayon (one `Parser` per file per thread). Skip files > 100KB. Cancellation via `parser.set_cancellation_flag()`. Lazy grammar loading. Language detection: file extension primary, shebang fallback. Graceful skip for unrecognized languages.

### PR-004: LLM Generation

Streaming output. Configurable timeout (default 300s). Ctrl+C cancels generation and cleans up fully. Health check before generation.

### PR-005: Memory

- Token budget: characters (no tokenizer dependency), `max_context_chars` configurable (default 24K)
- Adaptive budget split: 20% symbols when structural diffs available, 30% with signatures only, 20% base
- Truncation priority (highest preserved first): structured changes > symbols > file list > diff hunks
- Parse trees dropped after symbol extraction
- Streaming buffer bounded: `MAX_RESPONSE_BYTES` = 1 MB (all providers)
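
The adaptive split can be expressed as a small function. This is a sketch: the function name and boolean parameters are hypothetical, and reading "24K" as 24,000 characters is an assumption.

```rust
/// Characters reserved for the symbol section of the prompt.
/// Percentages follow the list above: 20% with structural diffs,
/// 30% with signatures only, 20% otherwise.
fn symbol_budget_chars(
    max_context_chars: usize,
    has_structural_diffs: bool,
    has_signatures: bool,
) -> usize {
    let percent = if has_structural_diffs {
        20
    } else if has_signatures {
        30
    } else {
        20
    };
    max_context_chars * percent / 100
}
```

With a 24,000-character context, signatures alone reserve 7,200 characters for symbols, and the presence of structural diffs lowers that to 4,800.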

### PR-006: Binary Size

Feature-gated language support. `[profile.release]` with `lto = true`, `strip = true`, `codegen-units = 1`. Target: < 15MB with default features.

### PR-007: Cancellation Contract

Ctrl+C at any pipeline point → no partial commit, no leftover temp files. Git commit only after user confirms complete message. Temp files cleaned via RAII.

## 7. UX Requirements

### UX-001: Error Messages

Every error includes **what** went wrong, **why** it happened, and **how** to fix it:

```
x Cannot connect to Ollama at http://localhost:11434

  help: Is Ollama running? Start it with:
        ollama serve
```

```
x No staged changes found

  help: Stage your changes first:
        git add <files>
```

### UX-002: Terminal Output

- Respect `NO_COLOR`
- Spinner during analysis/generation (suppressed non-TTY)
- Streaming LLM output in real-time
- Phase indicators: "Analyzing → Generating → Done"
- ASCII fallback for limited terminals

### UX-003: Non-Interactive Mode

- `--yes` auto-confirms
- Non-TTY detection for hooks/CI
- All output to stderr except commit message (for piping)
- Exit codes: 0 success, 1 error, 2 usage error, 130 interrupted
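
The exit-code contract maps naturally onto an enum. The `Outcome` type below is hypothetical; the numeric values are the ones listed above, with 130 following the shell convention of 128 plus the SIGINT signal number (2).

```rust
use std::process::ExitCode;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Success,
    Error,
    UsageError,
    Interrupted,
}

impl Outcome {
    fn code(self) -> u8 {
        match self {
            Outcome::Success => 0,
            Outcome::Error => 1,
            Outcome::UsageError => 2,
            Outcome::Interrupted => 130,
        }
    }
}

impl From<Outcome> for ExitCode {
    fn from(outcome: Outcome) -> ExitCode {
        ExitCode::from(outcome.code())
    }
}
```

A `main` that returns `ExitCode` can then end with `outcome.into()`.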

### UX-004: CLI Design

```
commitbee [OPTIONS]                    # Generate and commit (default)
commitbee --dry-run                    # Generate, print, don't commit
commitbee --yes                        # Generate and auto-commit
commitbee --generate N                 # Generate N options
commitbee --show-prompt                # Debug: show LLM prompt
commitbee --verbose / -v               # Verbose output
commitbee --no-split                   # Disable commit split suggestions
commitbee --no-scope                   # Disable scope in commit messages
commitbee --clipboard                  # Copy to clipboard
commitbee --locale <lang>              # Commit message language (ISO 639-1)
commitbee --find-renames=N%            # Rename detection threshold
commitbee --experimental-history       # Enable commit history style learning

commitbee init                         # Create config file
commitbee config                       # Show configuration
commitbee config check                 # Validate configuration
commitbee config set-key <provider>    # Store API key in keychain
commitbee doctor                       # Check Ollama connectivity + model

commitbee hook install                 # Install git hook
commitbee hook uninstall               # Remove git hook
commitbee hook status                  # Check hook status

commitbee completions <shell>          # Generate shell completions
commitbee eval                         # Run evaluation harness (dev, feature-gated)
```

### UX-005: First-Run Experience

- Zero config with Ollama detected
- Helpful setup guidance if no Ollama and no cloud provider
- `commitbee init` creates well-commented config with all options documented

### UX-006: Output Format Contracts

| Flag | stdout | stderr | Behavior |
| ---- | ------ | ------ | -------- |
| `--dry-run` | Commit message (single line) | Spinners, diagnostics | Exit 0 |
| `--generate N` (TTY) | Selected message | N numbered options + `dialoguer` prompt | `--yes` selects first |
| `--generate N` (non-TTY) | All N messages, blank-line separated | Diagnostics | — |
| `--show-prompt` | — | Full LLM prompt (keys redacted) | Does not call LLM. Exit 0 |
| Default (interactive) | Commit hash on confirm | Message + confirm/edit/cancel prompt | — |

## 8. Testing Requirements

**Current test count: 440**

### TR-001: Unit Tests

| Module | Technique | Coverage Target |
| ------ | --------- | --------------- |
| `CommitSanitizer` | Snapshot (insta) + proptest | All code paths + never-panic guarantee |
| `DiffHunk::parse_from_diff` | Snapshot | Standard diffs, renames, binary, empty |
| `safety::scan_for_secrets` | Unit + proptest | Each pattern + false positive tests |
| `ContextBuilder` | Snapshot | Budget calculation, type inference, scope inference |
| `FileCategory::from_path` | Unit | All categories, edge cases |
| `CommitType` | Unit | Verify `ALL` matches enum variants |
| `CommitValidator` | Unit | All 7 rules, boundary cases, corrections formatting |
| `TemplateService` | Unit | Custom/default templates, variable substitution |
| `HistoryService` | Unit | Style analysis, prompt section formatting |

#### Golden Semantic Fixtures

Stored in `tests/fixtures/golden/` — before/after file pairs with expected diff and symbol extraction output:

- Moved function (diff shows delete + add, symbols show single move)
- Signature change (parameter/return type modified)
- Refactor extract (new function + modified caller)
- Rename symbol (across multiple sites)
- Multi-file change (shared symbol references)

### TR-002: Integration Tests

| Scenario | Setup | Mock |
| -------- | ----- | ---- |
| Normal commit flow | tempfile git repo | wiremock Ollama |
| Empty staging area | tempfile git repo | None |
| Binary files mixed with text | tempfile git repo | wiremock Ollama |
| Large diff (truncation) | tempfile git repo | wiremock Ollama |
| Unicode file paths | tempfile git repo | wiremock Ollama |
| LLM returns invalid JSON | tempfile git repo | wiremock Ollama |
| LLM returns error mid-stream | tempfile git repo | wiremock Ollama |
| Ollama not running | None | Real connection refused |
| Secret detected | tempfile git repo | None |
| Non-TTY mode | tempfile + piped stdin | wiremock Ollama |

### TR-003: CLI Tests

Snapshot tests with `insta-cmd` for all flag combinations: `--dry-run`, `--show-prompt`, `--help`, error formatting, exit codes.

### TR-004: Property-Based Tests

```rust
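use proptest::prelude::*;

// `CommitSanitizer` and `scan_for_secrets` are assumed to be imported from the crate under test.
// The `\\PC*` strategy generates arbitrary strings of non-control Unicode characters.
proptest! {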
    #[test]
    fn sanitizer_never_panics(s in "\\PC*") {
        let _ = CommitSanitizer::sanitize(&s);
    }

    #[test]
    fn secret_scanner_never_panics(s in "\\PC*") {
        let _ = scan_for_secrets(&s);
    }
}
```

### TR-005: CI Pipeline

- `cargo check` → `cargo clippy -- -D warnings` → `cargo test` → `cargo audit` → `cargo deny check`
- Triggers: push to `development`, all PRs
- Matrix: stable Rust + MSRV 1.94
- Edition 2024 (stable since Rust 1.85; let chains require Rust 1.88; the project MSRV is 1.94)

### TR-006: Evaluation Harness ✅

`commitbee eval` — fixture-based pipeline regression testing. Feature-gated. See §4.3.

### TR-007: Fuzzing ✅

5 `cargo-fuzz` targets. See §4.3.

### TR-008: LLM Output Quality Testing

End-to-end commit message quality validation. Two modes: (1) wiremock-based deterministic testing with canned LLM responses through the full pipeline (sanitizer + validator), (2) optional live Ollama regression testing with majority-vote scoring and baseline comparison. Extends the eval harness (TR-006) from pre-LLM pipeline testing to actual output quality assurance.

## 9. Distribution Requirements

### DR-001: cargo install

`cargo install commitbee` on all tier-1 platforms. Published on crates.io.

### DR-002: Prebuilt Binaries

GitHub Releases via `cargo-dist`. Platforms: macOS ARM64/x86_64, Linux x86_64/ARM64, Windows x86_64. Shell installer, checksums, GitHub attestations.

### DR-003: Homebrew

`brew install sephyi/tap/commitbee` (generated by `cargo-dist`).

### DR-004: Shell Completions

bash, zsh, fish, powershell via `clap_complete`. Documented installation per shell in README.

### DR-005: Release Profile

```toml
[profile.release]
lto = true
strip = true
opt-level = "z"  # or "s" — benchmark both
codegen-units = 1
overflow-checks = true  # ANSSI-FR compliance
```

### DR-006: Feature Flags

Default features: `secure-storage` (system keychain via `keyring`) and `all-languages` (10 tree-sitter grammars). Build without optional features to reduce binary size or avoid platform-specific dependencies:

```bash
# Without secure storage (no keyring dependency)
cargo install commitbee --no-default-features --features all-languages

# Minimal (no keyring, specific languages only)
cargo install commitbee --no-default-features --features lang-rust,lang-python
```

`secure-storage` uses platform-native keychain backends automatically: macOS Keychain, Windows Credential Manager, Linux Secret Service.

## 10. Prompt Engineering Requirements

### PE-001: System Prompt

- Defines persona, rules, and output format
- JSON schema template with nullable fields and 2 micro few-shot examples (API replacement, style-only change) optimized for <4B parameter models
- **Concrete entity rule**: Subject must name at least one concrete entity from the diff
- Negative examples (BAD/GOOD pairs): flags vague and multi-concern subjects
- Anti-hallucination rules: "Never copy labels, field names, or evidence tags from the prompt"
- API replacement rule: added + removed public APIs → `refactor`
- Breaking change guidance: only when existing users/dependents must change code, config, or scripts
- **Structured changes guidance**: prefer STRUCTURED CHANGES for signature-level details over raw diff lines
- Single shared `SYSTEM_PROMPT` constant in `llm/mod.rs`; type list synced with `CommitType::ALL` via compile-time test

### PE-002: User Prompt

- File list with change status, semantic symbols, truncated diff
- Symbols with tri-state: "Added", "Removed", "Modified (signature changed)" with parent scope prefix
- Suggested type/scope from heuristics (hints, not requirements)
- **STRUCTURED CHANGES**: Per-symbol semantic diffs (parameter added, return type changed, visibility, async, body modification)
- **IMPORTS CHANGED**: Added/removed import statements across 6 language syntaxes
- **RELATED FILES**: Source-to-test file correlations
- **INTENT**: Detected change patterns (error handling, test, logging, dependency update) with confidence scores
- **Evidence flags**: Natural language labels (not snake_case) to prevent model copying
- **Subject budget**: Exact remaining characters after `type(scope): ` prefix
- **PRIMARY_CHANGE**: Anchors subject to most significant change (new public API > removed > largest file)
- **CONSTRAINTS**: Dynamic rules from evidence (e.g., "No bug-fix comments — do not use type fix", "Unsafe code added — mention safety justification")
- **PUBLIC API REMOVED** warning with listed symbols
- **Metadata breaking signals** (MSRV, engines.node, requires-python)
- **GROUP_REASON** per split group
- **Focus instruction** for >5-file groups
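
The subject budget item above is simple arithmetic, sketched here under stated assumptions: `subject_budget` is a hypothetical helper, and the first-line limit is passed in because its value is not stated in this section (72 is used below purely as an example).

```rust
/// Characters left for the subject after the `type(scope)!: ` prefix.
fn subject_budget(
    max_first_line: usize,
    commit_type: &str,
    scope: Option<&str>,
    breaking: bool,
) -> usize {
    let mut prefix = commit_type.chars().count();
    if let Some(scope) = scope {
        prefix += scope.chars().count() + 2; // the two parentheses
    }
    if breaking {
        prefix += 1; // the `!` suffix
    }
    prefix += 2; // ": "
    max_first_line.saturating_sub(prefix)
}
```

For `feat(llm): ` the prefix is 11 characters, so a 72-character line leaves 61 for the subject, or 60 when the change is breaking.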

### PE-003: Multi-Stage for Large Diffs

When diff exceeds 50% of token budget: two-stage (per-file summary → commit message). Fallback: single-stage with aggressive truncation.

### PE-004: Model-Specific Tuning

- Temperature: 0.0–0.3 (configurable)
- `num_predict` / `max_tokens`: 256 default (configurable)
- Model-appropriate stop sequences
- System prompt complexity scaled to model size

### PE-005: Binary File Handling

Binary files never included as diff content. Listed in file list with change status and size delta (e.g., `+ assets/logo.png (binary, +24KB)`).
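
A sketch of how such a file-list entry could be formatted. The helper name, its parameters, and the choice to truncate the delta to whole kilobytes are assumptions.

```rust
/// Format a binary file entry such as `+ assets/logo.png (binary, +24KB)`.
/// `status` is the change marker; sizes are in bytes.
fn format_binary_entry(status: char, path: &str, old_bytes: u64, new_bytes: u64) -> String {
    // Signed delta in whole kilobytes (truncated toward zero).
    let delta_kb = (new_bytes as i64 - old_bytes as i64) / 1024;
    // `{:+}` always prints the sign, so growth shows as `+24` and shrinkage as `-3`.
    format!("{status} {path} (binary, {delta_kb:+}KB)")
}
```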

### PE-006: JSON Parse Failure Recovery

Invalid JSON → retry once with repair prompt. Second failure → heuristic extraction (type from file categories, first coherent sentence as description). Never retry more than once.
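
The recovery flow can be sketched with closures standing in for the LLM call, the JSON parser, and the heuristic extractor. All three parameters and the function name are hypothetical. The point is the control flow: one normal attempt, one repair attempt, then the heuristic.

```rust
/// `generate(is_repair)` calls the model (with the repair prompt when
/// `is_repair` is true), `parse` returns the commit message if the output
/// is valid JSON, and `heuristic` builds a fallback message.
fn generate_with_recovery<G, P, H>(mut generate: G, parse: P, heuristic: H) -> String
where
    G: FnMut(bool) -> String,
    P: Fn(&str) -> Option<String>,
    H: FnOnce() -> String,
{
    // Exactly two attempts: the original prompt, then a single repair retry.
    for is_repair in [false, true] {
        if let Some(message) = parse(&generate(is_repair)) {
            return message;
        }
    }
    heuristic()
}
```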

### PE-007: Token-Accurate Budget Management

Replace character-based budget estimation (~4:1 char-to-token ratio approximation) with actual BPE/tiktoken token counting for accurate LLM context window utilization. Maximizes prompt quality by filling available tokens precisely rather than under/over-estimating. Consider lightweight Rust BPE implementation or pre-computed token tables per model family.

## 11. Roadmap Summary

| Phase | Version | Status | Focus |
| ----- | ------- | ------ | ----- |
| 1 | v0.2.0 | ✅ Shipped | Stability, correctness, providers, developer experience |
| 2 | v0.3.x | ✅ Shipped | Differentiation — heuristics, validation, spec compliance |
| 3 | v0.4.0 | ✅ Shipped | Feature completion — templates, languages, rename, history, eval, fuzzing |
| 4 | v0.4.x | ✅ Shipped | Remaining polish — exclude files (FR-031), clipboard (FR-033) |
| 5 | v0.5.0 | ✅ Shipped | AST context overhaul — full signatures, semantic change classification, cross-file connections. 367 tests. |
| 6 | v0.6.0 | ✅ Shipped | Deep semantic understanding — parent scope, import detection, doc-vs-code classification, structural AST diffs, structured changes prompt section, semantic markers, change intent detection. 442 tests. |
| 7 | v0.7.0+ | 📋 Planned | Market leadership — MCP server, changelog, monorepo, version bumping, GitHub Action |

## 12. Success Metrics

| Metric | Target | Measurement |
| ------ | ------ | ----------- |
| Runtime panics | 0 | proptest + fuzzing, no `unwrap()` on user-facing paths |
| Test coverage | > 80% on services/ | `cargo tarpaulin` |
| CI green rate | > 99% | GitHub Actions dashboard |
| Cold startup | < 200ms | `hyperfine` in CI |
| Binary size (default features) | < 15MB | CI artifact size tracking |
| Commit message quality | > 80% "good enough" first try | Manual evaluation + `commitbee eval` |
| Secret leak rate | 0 | Integration tests with known patterns |
| MSRV | Rust 1.94 (edition 2024) | CI matrix (stable + 1.94) |
| Test count | ≥ 308 | `cargo test` (current: 440) |

## 13. Non-Goals

- **GUI/TUI** — CLI-first. Editor integration via MCP server mode.
- **General-purpose code review** — Commit messages only.
- **Git client replacement** — Wraps git for commits, doesn't replace add/push/etc.
- **WASM plugin system** — Configuration-driven extensibility first.
- **Non-git VCS** — Git covers > 95% of the market.
- **Shell snippet detection** — Commit messages never executed by git; standard sanitization sufficient.

## Appendix A: Competitive Feature Matrix

| Feature | commitbee | opencommit | aicommits | aicommit2 | cocogitto | GitHub Copilot Desktop |
| --- | --- | --- | --- | --- | --- | --- |
| **Tree-sitter AST** | ✅ | — | — | — | — | — |
| **Commit splitting** | ✅ | — | — | — | — | — |
| **Secret scanning** | ✅ | — | — | — | — | — |
| **Token budget** | ✅ | — | — | — | N/A | — |
| **Streaming** | ✅ | — | — | — | N/A | ✅ |
| **Local LLM** | ✅ | ✅ | ✅ | ✅ | N/A | — |
| **OpenAI** | ✅ | ✅ | ✅ | ✅ | N/A | N/A |
| **Anthropic** | ✅ | ✅ | — | ✅ | N/A | N/A |
| **Git hooks** | ✅ | ✅ | ✅ | — | ✅ | — |
| **Multi-generate** | ✅ | ✅ | ✅ | — | — | — |
| **Shell completions** | ✅ | — | — | — | ✅ | N/A |
| **MCP server** | Planned | — | — | — | — | N/A |
| **Changelog** | Future | — | — | — | ✅ | — |
| **Version bumping** | Future | — | — | — | ✅ | — |
| **Monorepo** | Future | — | — | — | ✅ | — |
| **IDE integration** | — | — | — | — | — | ✅ |
| **Code review** | — | — | — | ✅ | — | ✅ |

## Appendix B: Research Sources

1. **Codebase analysis** — Line-by-line review of all source files
2. **Competitor analysis** — 30+ tools across TypeScript, Rust, Python, Go
3. **Best practices** — Rust CLI patterns, LLM prompt engineering, tree-sitter techniques, security, testing, distribution