<!--
SPDX-FileCopyrightText: 2026 Sephyi <me@sephy.io>
SPDX-License-Identifier: PolyForm-Noncommercial-1.0.0
-->
# CommitBee
**The commit message generator that actually understands your code.**
CommitBee is a Rust-native CLI tool that uses **tree-sitter semantic analysis** and LLMs to generate high-quality [conventional commit](https://www.conventionalcommits.org/) messages. Rather than piping raw `git diff` output to an LLM, it parses both the staged and HEAD versions of your files, maps diff hunks to symbol spans (functions, classes, methods), and provides structured semantic context. The result is more accurate commit messages, especially for complex multi-file changes.
## What Makes CommitBee Different
| Feature                       | CommitBee | Others         |
| ----------------------------- | --------- | -------------- |
| Tree-sitter semantic analysis | **Yes**   | No             |
| Automatic commit splitting    | **Yes**   | No             |
| Evidence-based type inference | **Yes**   | No             |
| Built-in secret scanning      | **Yes**   | Rarely         |
| Token budget management       | **Yes**   | No             |
| Streaming LLM output          | **Yes**   | Rarely         |
| Prompt debug mode             | **Yes**   | No             |
| Local-first (Ollama default)  | **Yes**   | Cloud-first    |
| Single static binary          | **Yes**   | Node.js/Python |
Other tools typically send raw diffs to LLMs. CommitBee sends **semantic context**: which functions changed, what was added or removed, and why the change matters structurally.
### Commit splitting
When your staged changes contain logically independent work (e.g., a bugfix in one module and a refactor in another), CommitBee detects this and offers to split them into separate, well-typed commits. The splitter uses diff-shape fingerprinting with Jaccard similarity clustering: files are grouped not just by directory but by the actual shape and vocabulary of their changes.
```txt
⚡ Commit split suggested - 2 logical change groups detected:
Group 1: feat(llm) [2 files]
[M] src/services/llm/anthropic.rs (+20 -5)
[M] src/services/llm/openai.rs (+8 -3)
Group 2: fix(sanitizer) [1 file]
[M] src/services/sanitizer.rs (+3 -1)
? Split into separate commits? (Y/n)
```
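
To make the clustering idea concrete, here is a minimal sketch of Jaccard-based grouping. It is not CommitBee's actual splitter: `fingerprint`, `jaccard`, `group_files`, the token-set fingerprint, and the greedy first-match strategy are all assumptions chosen for illustration.

```rust
use std::collections::HashSet;

/// Jaccard similarity |A ∩ B| / |A ∪ B| between two token sets.
/// Two empty sets are treated as identical (similarity 1.0).
fn jaccard(a: &HashSet<&str>, b: &HashSet<&str>) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    let intersection = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    intersection / union
}

/// Hypothetical fingerprint: the set of identifier-like tokens on changed lines.
fn fingerprint(changed_lines: &str) -> HashSet<&str> {
    changed_lines
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|token| !token.is_empty())
        .collect()
}

/// Greedy single-pass grouping (an assumption, not the real algorithm):
/// a file joins the first group whose representative fingerprint is at
/// least `threshold` similar, otherwise it starts a new group.
fn group_files<'a>(files: &[(&'a str, &'a str)], threshold: f64) -> Vec<Vec<&'a str>> {
    let mut groups: Vec<(HashSet<&'a str>, Vec<&'a str>)> = Vec::new();
    for &(path, diff) in files {
        let fp = fingerprint(diff);
        let pos = groups
            .iter()
            .position(|(rep, _)| jaccard(rep, &fp) >= threshold);
        match pos {
            Some(i) => groups[i].1.push(path),
            None => groups.push((fp, vec![path])),
        }
    }
    groups.into_iter().map(|(_, members)| members).collect()
}
```

Calling `group_files` with `(path, changed_text)` pairs and a threshold such as `0.5` returns one `Vec` of paths per detected group. A real fingerprint would likely also encode the shape of the change (for example added versus removed line counts), which this sketch leaves out.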
## Installation
### From source
```bash
cargo install commitbee
```
### Build from repository
```bash
git clone https://github.com/sephyi/commitbee.git
cd commitbee
cargo build --release
```
The binary will be at `./target/release/commitbee`.
### Requirements
- **Rust** 1.94+ (edition 2024)
- **Ollama** running locally (default provider): [Install Ollama](https://ollama.ai)
- A model pulled in Ollama (recommended: `qwen3:4b`)
```bash
ollama pull qwen3:4b
```
## Quick Start
```bash
# Stage your changes
git add src/feature.rs
# Generate and commit interactively
commitbee
# Preview without committing
commitbee --dry-run
# Auto-confirm and commit
commitbee --yes
# See what the LLM sees
commitbee --show-prompt
```
That's it. CommitBee works with zero configuration if Ollama is running locally.
## Configuration
CommitBee stores configuration in a platform-specific directory. Create a config with:
```bash
commitbee init
```
### Example config
```toml
provider = "ollama"
model = "qwen3:4b"
ollama_host = "http://localhost:11434"
max_diff_lines = 500
max_file_lines = 100
max_context_chars = 24000
[format]
include_body = true
include_scope = true
lowercase_subject = true
```
### Environment variables
| Variable | Description | Default |
| ------------------------ | ------------------------ | -------------------------- |
| `COMMITBEE_PROVIDER` | LLM provider | `ollama` |
| `COMMITBEE_MODEL` | Model name | `qwen3:4b` |
| `COMMITBEE_OLLAMA_HOST` | Ollama server URL | `http://localhost:11434` |
| `COMMITBEE_API_KEY`      | API key (cloud providers) | (none)                    |
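
The v0.2.0 highlights describe configuration as layered in the order CLI > Env > File > Defaults. The sketch below illustrates that precedence with plain `Option` merging. It is not the real `figment`-based implementation: the `Layer` type and the `env_layer` and `resolve` functions are hypothetical, and only the variable names and default values come from the table above.

```rust
/// One configuration layer; `None` means "not set at this layer".
/// Hypothetical type for illustration only.
#[derive(Debug, Default, Clone, PartialEq)]
struct Layer {
    provider: Option<String>,
    model: Option<String>,
    ollama_host: Option<String>,
}

impl Layer {
    /// Fields set in `self` win; unset fields fall back to `lower`.
    fn or(self, lower: Layer) -> Layer {
        Layer {
            provider: self.provider.or(lower.provider),
            model: self.model.or(lower.model),
            ollama_host: self.ollama_host.or(lower.ollama_host),
        }
    }
}

/// Defaults as listed in the environment variable table.
fn defaults() -> Layer {
    Layer {
        provider: Some("ollama".into()),
        model: Some("qwen3:4b".into()),
        ollama_host: Some("http://localhost:11434".into()),
    }
}

/// Builds the environment layer through a lookup function, so the logic
/// can be exercised without touching the real process environment.
fn env_layer(get: impl Fn(&str) -> Option<String>) -> Layer {
    Layer {
        provider: get("COMMITBEE_PROVIDER"),
        model: get("COMMITBEE_MODEL"),
        ollama_host: get("COMMITBEE_OLLAMA_HOST"),
    }
}

/// Applies the documented precedence: CLI > Env > File > Defaults.
fn resolve(cli: Layer, env: Layer, file: Layer) -> Layer {
    cli.or(env).or(file).or(defaults())
}
```

In a real program the lookup would be `env_layer(|key| std::env::var(key).ok())`. Passing the lookup as a closure keeps the precedence logic easy to test.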
## Usage
```bash
commitbee [OPTIONS] [COMMAND]
```
### Options
| Flag | Description |
| ------------------ | -------------------------------------- |
| `--dry-run` | Print message only, don't commit |
| `--yes` | Auto-confirm and commit |
| `-n, --generate` | Generate N candidates (1-5, default 1) |
| `--no-split` | Disable commit split suggestions |
| `--no-scope` | Disable scope in commit messages |
| `--allow-secrets` | Allow committing with detected secrets |
| `--verbose` | Show symbol extraction details |
| `--show-prompt` | Debug: display the full LLM prompt |
### Commands
| Command | Description |
| --------------------- | -------------------------------------- |
| `init` | Create a config file |
| `config` | Show current configuration |
| `doctor` | Check configuration and connectivity |
| `completions <shell>` | Generate shell completions |
| `hook install` | Install prepare-commit-msg hook |
| `hook uninstall` | Remove prepare-commit-msg hook |
| `hook status` | Check if hook is installed |
## How It Works
CommitBee's pipeline goes beyond simple diff forwarding:
```txt
┌─────────┐   ┌─────────┐   ┌─────────────┐   ┌───────────┐   ┌──────────────┐   ┌────────────┐
│  Stage  │ → │   Git   │ → │ Tree-sitter │ → │   Split   │ → │   Context    │ → │    LLM     │
│ Changes │   │ Service │   │  Analyzer   │   │ Detector  │   │   Builder    │   │  Provider  │
└─────────┘   └─────────┘   └─────────────┘   └───────────┘   └──────────────┘   └────────────┘
                   │               │                │                │                 │
              Staged diff   Symbol spans      Group files     Budget-aware       Commit message
              + file list   (functions,       by module,      prompt with        (conventional
                            classes, etc.)    suggest split   semantic context   format)
```
1. **Git Service**: Discovers the repo via gix, reads staged changes and diffs (NUL-delimited for path safety)
2. **Tree-sitter Analyzer**: Parses both staged and HEAD file versions in parallel (via rayon), maps diff hunks to symbol spans (functions, structs, methods) with tri-state tracking (added/removed/modified-signature)
3. **Commit Splitter**: Groups files using diff-shape fingerprinting + Jaccard similarity clustering, detects multi-concern changes, offers to split into separate commits
4. **Context Builder**: Assembles a budget-aware prompt with evidence flags, constraint rules, primary change detection, and metadata-aware breaking change signals
5. **Safety Scanner**: Checks for secrets and merge conflicts (added-line-only, with self-detection prevention) before anything leaves your machine
6. **LLM Provider**: Streams the prompt to your chosen model and parses the response
7. **Commit Sanitizer**: Validates the output as proper conventional commit format, handles JSON extraction from noisy LLM output (thought blocks, conversational preambles, code fences), wraps body at 72 chars
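
Step 2 above, mapping diff hunks to symbol spans, boils down to an interval-overlap check. The following is a simplified sketch under stated assumptions: `SymbolSpan`, `Hunk`, and `touched_symbols` are hypothetical names, and the spans are assumed to have already been extracted by the parser as 1-based inclusive line ranges.

```rust
/// A symbol's line range in one version of a file (1-based, inclusive).
/// Hypothetical type; the real analyzer's types may differ.
#[derive(Debug, Clone, PartialEq)]
struct SymbolSpan {
    name: String,
    start_line: usize,
    end_line: usize,
}

/// A diff hunk's line range in the same file version (1-based, inclusive).
#[derive(Debug, Clone, Copy)]
struct Hunk {
    start_line: usize,
    end_line: usize,
}

/// Returns the names of symbols whose span overlaps at least one hunk.
/// Two inclusive ranges overlap when each starts before the other ends.
fn touched_symbols<'a>(symbols: &'a [SymbolSpan], hunks: &[Hunk]) -> Vec<&'a str> {
    symbols
        .iter()
        .filter(|s| {
            hunks
                .iter()
                .any(|h| h.start_line <= s.end_line && s.start_line <= h.end_line)
        })
        .map(|s| s.name.as_str())
        .collect()
}
```

Running the same check against the HEAD spans and the staged spans separately is one plausible way to tell added, removed, and modified symbols apart, though the actual tri-state logic may work differently.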
### Supported languages
| Language | Parser |
| ------------ | ------------------------ |
| Rust | `tree-sitter-rust` |
| TypeScript | `tree-sitter-typescript` |
| JavaScript | `tree-sitter-javascript` |
| Python | `tree-sitter-python` |
| Go | `tree-sitter-go` |
Files in unsupported languages are still included in the diff context; they just don't get semantic symbol extraction.
## Security
CommitBee scans all content before it's sent to any LLM provider:
- **API key detection**: AWS keys, OpenAI keys, generic secrets
- **Private key detection**: PEM-encoded private keys
- **Connection string detection**: Database URLs with credentials
- **Merge conflict detection**: Prevents committing unresolved conflicts
The default provider (Ollama) runs entirely on your machine. No data leaves your network unless you explicitly configure a cloud provider.
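
As a rough illustration of added-line-only scanning, the sketch below checks only the `+` lines of a unified diff for three kinds of findings. It is an assumption-laden simplification, not CommitBee's scanner: the `Finding` enum and both function names are hypothetical, and the AWS check uses the commonly cited `AKIA` plus 16 uppercase alphanumerics shape for access key IDs.

```rust
/// Kinds of problems this sketch can report. Hypothetical type.
#[derive(Debug, PartialEq)]
enum Finding {
    AwsAccessKey,
    PrivateKey,
    MergeConflict,
}

/// True if `line` contains "AKIA" followed by 16 uppercase letters or digits.
fn has_aws_access_key(line: &str) -> bool {
    let bytes = line.as_bytes();
    line.match_indices("AKIA").any(|(i, _)| {
        bytes.len() >= i + 20
            && bytes[i + 4..i + 20]
                .iter()
                .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
    })
}

/// Scans only the added lines of a unified diff and returns
/// (1-based line number within the diff, finding) pairs.
fn scan_added_lines(diff: &str) -> Vec<(usize, Finding)> {
    let mut findings = Vec::new();
    for (idx, line) in diff.lines().enumerate() {
        // "+++" is the file header, not content. This shortcut also skips
        // added lines whose own text begins with "++".
        if line.starts_with("+++") {
            continue;
        }
        // Removed and context lines are ignored entirely.
        let Some(added) = line.strip_prefix('+') else {
            continue;
        };
        let line_no = idx + 1;
        if has_aws_access_key(added) {
            findings.push((line_no, Finding::AwsAccessKey));
        }
        if added.contains("-----BEGIN") && added.contains("PRIVATE KEY-----") {
            findings.push((line_no, Finding::PrivateKey));
        }
        if added.starts_with("<<<<<<< ") || added == "=======" || added.starts_with(">>>>>>> ") {
            findings.push((line_no, Finding::MergeConflict));
        }
    }
    findings
}
```

Restricting the scan to added lines means a secret that is being removed does not block the commit, which matches the "added-line-only" behavior described in the pipeline.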
## Architecture
```txt
src/
├── main.rs              # Entry point
├── lib.rs               # Library exports
├── app.rs               # Application orchestrator
├── cli.rs               # CLI arguments (clap)
├── config.rs            # Configuration (figment layered)
├── error.rs             # Error types (thiserror + miette)
├── domain/
│   ├── change.rs        # FileChange, StagedChanges, ChangeStatus
│   ├── symbol.rs        # CodeSymbol, SymbolKind
│   ├── context.rs       # PromptContext (semantic prompt assembly)
│   └── commit.rs        # CommitType (single source of truth)
└── services/
    ├── git.rs           # GitService (gix + git CLI, concurrent content fetching)
    ├── analyzer.rs      # AnalyzerService (tree-sitter, parallel via rayon)
    ├── context.rs       # ContextBuilder (token budget, evidence flags)
    ├── safety.rs        # Secret scanning, conflict detection
    ├── sanitizer.rs     # CommitSanitizer (JSON + plain text, BREAKING CHANGE footer)
    ├── splitter.rs      # CommitSplitter (diff-shape + Jaccard clustering)
    └── llm/
        ├── mod.rs       # LlmProvider trait + enum dispatch + shared SYSTEM_PROMPT
        ├── ollama.rs    # OllamaProvider (streaming NDJSON)
        ├── openai.rs    # OpenAiProvider (SSE streaming)
        └── anthropic.rs # AnthropicProvider (SSE streaming)
```
## Testing
```bash
cargo test # All tests (178 tests)
cargo test --test sanitizer # CommitSanitizer tests
cargo test --test splitter # CommitSplitter tests
cargo test --test safety # Secret scanner tests
cargo test --test context # ContextBuilder tests
cargo test --test commit_type # CommitType tests
cargo test --test integration # LLM provider integration tests
```
The test suite includes snapshot tests ([insta](https://insta.rs/)), property-based tests ([proptest](https://proptest-rs.github.io/proptest/)), never-panic guarantees for all user-facing parsers, and integration tests using [wiremock](https://docs.rs/wiremock) for LLM provider mocking.
## Roadmap
| Phase                   | Version   | Status   |
| ----------------------- | --------- | -------- |
| Stability & Correctness | `v0.2.0`  | Complete |
| Polish & Providers      | `v0.2.0`  | Complete |
| Differentiation         | `v0.3.0`  | Complete |
| Market Leadership       | `v0.4.0+` | Future   |
### v0.3.0 highlights (current)
- **Diff-shape fingerprinting + Jaccard clustering**: Splitter groups files by change shape and content vocabulary, not just directory
- **Evidence-based type inference**: Constraint rules from code analysis drive commit type selection (bug evidence → fix, mechanical → style, dependency-only → chore)
- **Robust LLM output parsing**: Sanitizer handles `<think>`/`<thought>` blocks, conversational preambles, noisy JSON extraction
- **Metadata-aware breaking change detection**: Detects MSRV bumps, engines.node, requires-python changes
- **Symbol tri-state tracking**: Added/removed/modified-signature differentiation in tree-sitter analysis
- **Primary change detection**: Identifies the single most significant change for subject anchoring
- **Post-generation validation**: Subject specificity validator ensures concrete entity naming
- **NUL-delimited git parsing**: Safe handling of paths with special characters
- **Parallel tree-sitter parsing**: rayon for CPU-bound parsing, tokio JoinSet for concurrent git fetching
- **Anti-hallucination prompt engineering**: EVIDENCE/CONSTRAINTS sections, negative examples, anti-copy rules
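
The evidence-to-type mapping listed above (bug evidence → fix, mechanical → style, dependency-only → chore) can be pictured as a small rule table. This is only a sketch: the `Evidence` fields, the trimmed-down `CommitType` enum, and especially the order in which the rules are checked are assumptions, not the project's actual constraint logic.

```rust
/// Signals gathered from code analysis. Field names are hypothetical.
#[derive(Debug, Default, Clone, Copy)]
struct Evidence {
    has_bug_evidence: bool,
    is_mechanical: bool,
    dependency_only: bool,
}

/// Only the three types named in the highlights; the real enum has more.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CommitType {
    Fix,
    Style,
    Chore,
}

/// Returns the commit type forced by the evidence, or `None` when no
/// rule applies and the choice is left to the model.
/// The precedence used here is an assumption made for this sketch.
fn constrained_type(e: Evidence) -> Option<CommitType> {
    if e.dependency_only {
        Some(CommitType::Chore)
    } else if e.is_mechanical {
        Some(CommitType::Style)
    } else if e.has_bug_evidence {
        Some(CommitType::Fix)
    } else {
        None
    }
}
```

Returning `Option` keeps the rule layer advisory: when nothing matches, the prompt can simply omit the constraint.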
### v0.2.0 highlights
- **Cloud providers**: OpenAI-compatible and Anthropic streaming support
- **Commit splitting**: Automatic detection and splitting of multi-concern staged changes
- **Git hook integration**: `commitbee hook install/uninstall/status`
- **Shell completions**: bash, zsh, fish, powershell via `clap_complete`
- **Rich error diagnostics**: `miette` for actionable error messages
- **Multiple message generation**: `--generate N` with interactive candidate selection
- **Hierarchical config**: `figment`-based layering (CLI > Env > File > Defaults)
- **Structured logging**: `tracing` with `COMMITBEE_LOG` env filter
- **Doctor command**: `commitbee doctor` for connectivity and config checks
- **Secure key storage**: OS keychain via `keyring` (optional feature)
- **Body line wrapping**: Commit body text wrapped at 72 characters
See [`PRD.md`](PRD.md) for the full product requirements document.
## Contributing
Contributions are welcome! By contributing, you agree to the [Contributor License Agreement](CLA.md). You'll be asked to sign it when you open your first pull request.
The project uses:
- **Rust edition 2024** (MSRV 1.94)
- **Conventional commits** for all commit messages
- **REUSE/SPDX** for license compliance
```bash
# Development workflow
cargo fmt # Format code
cargo clippy -- -D warnings # Lint (must pass clean)
cargo test # Run all tests
# Manual testing
git add some-file.rs
cargo run -- --dry-run # Preview commit message
cargo run -- --show-prompt # Debug the LLM prompt
```
## Sponsor
If you find CommitBee useful, consider [sponsoring my work](https://github.com/sponsors/Sephyi).
## License
This project is licensed under [PolyForm-Noncommercial-1.0.0](LICENSES/PolyForm-Noncommercial-1.0.0.txt).
REUSE compliant: every file carries SPDX headers.
Copyright 2026 [Sephyi](https://sephy.io)