<!--
SPDX-FileCopyrightText: 2026 Sephyi <me@sephy.io>
SPDX-License-Identifier: AGPL-3.0-only OR LicenseRef-Commercial
-->
# 🐝 CommitBee [![Build Status]][ci] [![MSRV]][rust-1.94] [![License]][license-file] [![Total Downloads]][crates-io]
[Build Status]: https://github.com/sephyi/commitbee/actions/workflows/ci.yml/badge.svg?branch=main
[ci]: https://github.com/sephyi/commitbee/actions/workflows/ci.yml
[MSRV]: https://img.shields.io/badge/MSRV-1.94-orange.svg
[rust-1.94]: https://blog.rust-lang.org/2026/03/05/Rust-1.94.0
[License]: https://img.shields.io/badge/license-AGPL--3.0-blue.svg
[license-file]: LICENSE
[Total Downloads]: https://img.shields.io/crates/d/commitbee?style=social&logo=iCloud&logoColor=black
[crates-io]: https://crates.io/crates/commitbee
**The commit message generator that actually understands your code.**
Most tools in this space pipe raw `git diff` to an LLM and hope for the best. CommitBee parses your code with [tree-sitter](https://tree-sitter.github.io/tree-sitter/), maps diff hunks to symbol spans, and gives the LLM structured semantic context. The result is fundamentally better commit messages, especially for complex multi-file changes.
## ✨ What Sets CommitBee Apart
### 🌳 It reads your code, not just your diffs
CommitBee uses tree-sitter to parse both the staged and HEAD versions of every changed file, in parallel across CPU cores. It extracts 10 symbol types (functions, methods, structs, enums, traits, impls, classes, interfaces, constants, type aliases) with **full signatures**: the LLM sees `pub fn connect(host: &str, timeout: Duration) -> Result<Connection>`, not just "Function connect." Modified symbols show old → new signature diffs so the LLM understands exactly what changed. Cross-file relationships are detected automatically: if `validator.rs` calls `parse()` and both changed, the prompt says so. Symbols are tracked in three states: **added**, **removed**, and **modified-signature**.
Supported languages: **Rust, TypeScript, JavaScript, Python, Go, Java, C, C++, Ruby, C#**. All are enabled by default and individually toggleable via Cargo feature flags. Files in other languages still get full diff context, just without symbol extraction.
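
As a rough illustration of what "maps diff hunks to symbol spans" involves, the sketch below checks which extracted symbol spans overlap the changed line ranges of a file. The `SymbolSpan` and `Hunk` types and the `touched_symbols` function are hypothetical stand-ins, not CommitBee's internal API. In the real tool, spans would come from tree-sitter nodes rather than hard-coded values.

```rust
// Hypothetical types for illustration; CommitBee's internal types may differ.
#[derive(Debug, Clone)]
struct SymbolSpan {
    kind: &'static str, // e.g. "function", "struct"
    name: String,
    start_line: usize, // 1-based, inclusive
    end_line: usize,   // 1-based, inclusive
}

#[derive(Debug, Clone, Copy)]
struct Hunk {
    start_line: usize, // first new-file line covered by the hunk (1-based)
    len: usize,        // number of new-file lines in the hunk
}

/// Returns "kind name" labels for every symbol whose span overlaps a hunk.
fn touched_symbols(symbols: &[SymbolSpan], hunks: &[Hunk]) -> Vec<String> {
    symbols
        .iter()
        .filter(|s| {
            hunks.iter().any(|h| {
                // A zero-length hunk is a pure deletion on this side; the HEAD
                // version's spans would be checked for those instead.
                if h.len == 0 {
                    return false;
                }
                let hunk_end = h.start_line + h.len - 1;
                // Two closed intervals overlap iff each starts before the other ends.
                h.start_line <= s.end_line && s.start_line <= hunk_end
            })
        })
        .map(|s| format!("{} {}", s.kind, s.name))
        .collect()
}

fn main() {
    let symbols = vec![
        SymbolSpan { kind: "function", name: "connect".into(), start_line: 10, end_line: 25 },
        SymbolSpan { kind: "struct", name: "Config".into(), start_line: 30, end_line: 40 },
        SymbolSpan { kind: "function", name: "parse".into(), start_line: 50, end_line: 70 },
    ];
    // Lines 20-22 fall inside `connect`; lines 41-42 sit between `Config` and `parse`.
    let hunks = [Hunk { start_line: 20, len: 3 }, Hunk { start_line: 41, len: 2 }];
    println!("{:?}", touched_symbols(&symbols, &hunks)); // ["function connect"]
}
```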
### 🧠 It reasons about what changed
Before the LLM generates anything, CommitBee computes deterministic evidence from your code and encodes it as hard constraints in the prompt:
- **Bug-fix evidence** in the diff → `fix`. No bug evidence → the LLM can't call it a `fix`.
- **Formatting-only changes** (whitespace, import reordering) → `style`. Detected both heuristically and via per-symbol whitespace classification.
- **Dependency-only changes** → `chore`. Always.
- **Public API removed** → breaking change flagged automatically.
- **MSRV bumps, `engines.node`, `requires-python` changes** → metadata-aware breaking detection.
Commit types are driven by code analysis, not LLM guesswork. The prompt includes computed EVIDENCE flags, CONSTRAINTS the model must follow, the primary change for subject anchoring, a character budget for the subject line, and anti-hallucination rules with negative examples.
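
The sketch below shows one way deterministic evidence could be turned into prompt constraints and a subject-line budget. The `Evidence` fields, the `Constraint` enum, and the precedence between rules are assumptions made for illustration. Only the mappings themselves come from the description above: dependency-only to `chore`, formatting-only to `style`, no bug evidence forbids `fix`, a removed public API flags a breaking change, and the first line is capped at 72 characters.

```rust
/// Hypothetical evidence flags computed from the diff before any LLM call.
#[derive(Debug, Default)]
struct Evidence {
    has_bug_evidence: bool,
    formatting_only: bool,
    dependency_only: bool,
    public_api_removed: bool,
}

#[derive(Debug, PartialEq)]
enum Constraint {
    MustUseType(&'static str),
    ForbidType(&'static str),
    MustFlagBreaking,
}

/// Maps evidence to hard constraints (the precedence order is an assumption).
fn constraints(e: &Evidence) -> Vec<Constraint> {
    let mut out = Vec::new();
    if e.dependency_only {
        out.push(Constraint::MustUseType("chore"));
    } else if e.formatting_only {
        out.push(Constraint::MustUseType("style"));
    } else if e.has_bug_evidence {
        out.push(Constraint::MustUseType("fix"));
    }
    if !e.has_bug_evidence {
        out.push(Constraint::ForbidType("fix"));
    }
    if e.public_api_removed {
        out.push(Constraint::MustFlagBreaking);
    }
    out
}

/// Characters left for the subject once `type(scope): ` is counted against
/// the 72-character first-line limit.
fn subject_budget(commit_type: &str, scope: Option<&str>) -> usize {
    let prefix = match scope {
        Some(s) => format!("{commit_type}({s}): "),
        None => format!("{commit_type}: "),
    };
    72usize.saturating_sub(prefix.chars().count())
}

fn main() {
    let evidence = Evidence { dependency_only: true, ..Default::default() };
    for c in constraints(&evidence) {
        println!("CONSTRAINT: {c:?}");
    }
    println!("subject budget: {}", subject_budget("chore", Some("deps")));
}
```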
### ✅ It validates and corrects its own output
Every generated message passes through a 7-rule validation pipeline:
1. Fix requires evidence: no bug comments, no `fix` type
2. Breaking change detection: removed public APIs must be flagged
3. Anti-hallucination: breaking change text can't copy internal field names
4. Mechanical changes must use `style`
5. Dependency-only changes must use `chore`
6. Subject specificity: rejects generic messages like "update code" or "improve things"
7. Subject length: enforces the 72-character first line limit
If any rule fails, CommitBee appends targeted correction instructions and re-prompts the LLM, re-validating after each attempt, for up to 3 attempts. The final output goes through a sanitizer that strips thinking blocks, extracts JSON from code fences, removes conversational preambles, and wraps the body at 72 characters. You get a clean, spec-compliant conventional commit or a clear error, never a silently mangled message.
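
Here is a minimal sketch of the validate-and-retry loop. It covers only the two subject-line rules: generic subjects and the 72-character limit. Several details are assumptions:

- the `GENERIC_SUBJECTS` list,
- the correction wording,
- reading "up to 3 attempts" as three total LLM calls.

The `llm` closure stands in for a real provider call.

```rust
const MAX_FIRST_LINE: usize = 72;
const MAX_ATTEMPTS: usize = 3; // assumption: 3 total LLM calls

// Hypothetical examples of subjects that say nothing specific.
const GENERIC_SUBJECTS: &[&str] = &["update code", "improve things", "misc changes"];

/// Returns one correction instruction per failed rule (empty means valid).
fn validate(message: &str) -> Vec<String> {
    let mut errors = Vec::new();
    let first_line = message.lines().next().unwrap_or("");
    let len = first_line.chars().count();
    if len > MAX_FIRST_LINE {
        errors.push(format!("First line is {len} characters; shorten it to {MAX_FIRST_LINE} or fewer."));
    }
    // Strip a conventional `type(scope): ` prefix before judging the subject.
    let subject = first_line.split_once(": ").map_or(first_line, |(_, s)| s).trim();
    if GENERIC_SUBJECTS.iter().any(|g| subject.eq_ignore_ascii_case(g)) {
        errors.push(format!("Subject \"{subject}\" is too generic; name what actually changed."));
    }
    errors
}

/// Calls `llm`, validates the result, and re-prompts with corrections on failure.
fn generate_with_retries<F>(base_prompt: &str, mut llm: F) -> Result<String, Vec<String>>
where
    F: FnMut(&str) -> String,
{
    let mut prompt = base_prompt.to_string();
    let mut last_errors = Vec::new();
    for _ in 0..MAX_ATTEMPTS {
        let message = llm(&prompt);
        let errors = validate(&message);
        if errors.is_empty() {
            return Ok(message);
        }
        // Append targeted corrections to the original prompt and try again.
        prompt = format!("{base_prompt}\n\nCORRECTIONS:\n- {}", errors.join("\n- "));
        last_errors = errors;
    }
    Err(last_errors)
}

fn main() {
    // Fake provider: answers generically until it sees corrections.
    let fake_llm = |prompt: &str| {
        if prompt.contains("CORRECTIONS") {
            "feat(sanitizer): strip thinking blocks before parsing".to_string()
        } else {
            "chore: update code".to_string()
        }
    };
    match generate_with_retries("<prompt>", fake_llm) {
        Ok(msg) => println!("accepted: {msg}"),
        Err(errors) => eprintln!("giving up: {errors:?}"),
    }
}
```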
### 🔀 It splits multi-concern commits
When your staged changes mix independent work (a bugfix in one module plus a refactor in another), CommitBee detects it and offers to split them into separate, well-typed commits. The splitter uses diff-shape fingerprinting combined with Jaccard similarity on content vocabulary: files are grouped by the actual shape and language of their changes, not just by directory. Symbol dependency merging keeps related files together even when their diff shapes differ. For example, if `foo()` is removed from one file and added in another, they stay in the same commit.
```txt
⚡ Commit split suggested - 2 logical change groups detected:
  Group 1: feat(llm) [2 files]
    [M] src/services/llm/anthropic.rs (+20 -5)
    [M] src/services/llm/openai.rs (+8 -3)
  Group 2: fix(sanitizer) [1 file]
    [M] src/services/sanitizer.rs (+3 -1)
? Split into separate commits? (Y/n)
```
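
To make the Jaccard part concrete, here is a minimal sketch that groups files whose changed-line vocabularies overlap. It gates on file extension as a crude stand-in for the language part of the fingerprint. It leaves out diff-shape fingerprinting and symbol-dependency merging entirely. The tokenizer, the `0.3` threshold, and all function names are assumptions, not CommitBee's actual splitter.

```rust
use std::collections::HashSet;

/// Lower-cased identifiers (3+ chars) from added/removed lines of a unified diff.
fn vocabulary(diff: &str) -> HashSet<String> {
    diff.lines()
        .filter(|l| (l.starts_with('+') || l.starts_with('-')) && !l.starts_with("+++") && !l.starts_with("---"))
        .flat_map(|l| l[1..].split(|c: char| !c.is_alphanumeric() && c != '_'))
        .filter(|tok| tok.len() > 2)
        .map(str::to_lowercase)
        .collect()
}

/// |A ∩ B| / |A ∪ B|, defined as 1.0 when both sets are empty.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    let inter = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    inter / union
}

fn extension(path: &str) -> &str {
    path.rsplit('.').next().unwrap_or("")
}

/// Greedy grouping: a file joins the first group with the same extension whose
/// accumulated vocabulary is at least `threshold` similar; otherwise it starts a new group.
fn group_files<'a>(files: &[(&'a str, &str)], threshold: f64) -> Vec<Vec<&'a str>> {
    let mut groups: Vec<(String, HashSet<String>, Vec<&'a str>)> = Vec::new();
    for &(path, diff) in files {
        let ext = extension(path).to_string();
        let vocab = vocabulary(diff);
        match groups.iter_mut().find(|(e, v, _)| *e == ext && jaccard(v, &vocab) >= threshold) {
            Some((_, v, paths)) => {
                v.extend(vocab);
                paths.push(path);
            }
            None => groups.push((ext, vocab, vec![path])),
        }
    }
    groups.into_iter().map(|(_, _, paths)| paths).collect()
}

fn main() {
    let files = [
        ("src/services/llm/anthropic.rs", "-    let response = client.send(request).await?;\n+    let stream = client.stream_messages(request).await?;"),
        ("src/services/llm/openai.rs", "-    let response = client.send(request).await?;\n+    let stream = client.stream_chat(request).await?;"),
        ("src/services/sanitizer.rs", "-    text.trim()\n+    strip_think_blocks(text).trim()"),
    ];
    for (i, group) in group_files(&files, 0.3).iter().enumerate() {
        println!("Group {}: {:?}", i + 1, group);
    }
}
```

A greedy first-match pass keeps the sketch short. A real splitter would likely compare against all groups and combine several signals before deciding.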
### The pipeline
```txt
┌─────────┐   ┌──────────┐   ┌─────────────┐   ┌──────────┐   ┌───────────┐   ┌──────────┐
│  Stage  │ → │   Git    │ → │ Tree-sitter │ → │  Split   │ → │  Context  │ → │   LLM    │
│ Changes │   │ Service  │   │  Analyzer   │   │ Detector │   │  Builder  │   │ Provider │
└─────────┘   └──────────┘   └─────────────┘   └──────────┘   └───────────┘   └──────────┘
                    │               │                │              │               │
               Staged diff    Symbol spans      Group files    Budget-aware      Commit message
               + file list    (functions,       by module,     prompt with       (conventional
                              classes, etc.)    suggest split  semantic context  format)
```
### And there's more
- **🔒 Local-first**: Ollama by default. Your code never leaves your machine. No API keys needed.
- **🔐 Secret scanning**: 24 built-in patterns across 13 categories (cloud keys, AI/ML tokens, payment, database, crypto). Add custom patterns or disable built-ins via config.
- **⚡ Streaming**: Real-time token display from all 3 providers (Ollama, OpenAI, Anthropic) with Ctrl+C cancellation.
- **📊 Token budget**: Smart truncation that prioritizes the most important files within ~6K tokens.
- **🎯 Multi-candidate**: Generate up to 5 messages and pick the best one interactively.
- **🪝 Git hooks**: `prepare-commit-msg` hook with TTY detection for safe non-interactive fallback.
- **🔍 Prompt debug**: `--show-prompt` shows exactly what the LLM sees. Full transparency.
- **🩺 Doctor**: `commitbee doctor` checks config, connectivity, and model availability.
- **🐚 Shell completions**: bash, zsh, fish, powershell via `commitbee completions`.
- **⚙️ 5-level config**: Defaults → project `.commitbee.toml` → user config → env vars → CLI flags.
- **📦 Single binary**: ~18K lines of Rust. Compiles to one static binary with LTO. No runtime dependencies.
- **🧪 367 tests**: Unit, snapshot, property (proptest for never-panic guarantees), and integration (wiremock).
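
A greedy version of budget-aware truncation might look like the sketch below. Several pieces are assumptions: the importance ordering (touched symbols first, then diff size), the rough four-characters-per-token estimate, and the `FileChange` and `fit_to_budget` names. CommitBee's actual prioritization and token counting may differ.

```rust
/// Hypothetical per-file input to the budgeter.
struct FileChange {
    path: String,
    diff: String,
    symbols_touched: usize,
}

/// Rough heuristic (assumption): about four characters per token.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Keeps the most important diffs whole while they fit, truncates the first one
/// that does not, and drops the rest once the budget is spent.
fn fit_to_budget(mut files: Vec<FileChange>, budget: usize) -> Vec<(String, String)> {
    files.sort_by(|a, b| {
        b.symbols_touched
            .cmp(&a.symbols_touched)
            .then_with(|| b.diff.len().cmp(&a.diff.len()))
    });
    let mut remaining = budget;
    let mut kept = Vec::new();
    for f in files {
        let cost = estimate_tokens(&f.diff);
        if cost <= remaining {
            remaining -= cost;
            kept.push((f.path, f.diff));
        } else if remaining > 0 {
            // Cut on a char boundary at roughly the remaining token allowance.
            let head: String = f.diff.chars().take(remaining * 4).collect();
            kept.push((f.path, format!("{head}\n[truncated]")));
            remaining = 0;
        }
    }
    kept
}

fn main() {
    let files = vec![
        FileChange { path: "README.md".into(), diff: "+docs\n".repeat(50), symbols_touched: 0 },
        FileChange { path: "src/git.rs".into(), diff: "+code\n".repeat(400), symbols_touched: 3 },
    ];
    for (path, diff) in fit_to_budget(files, 6_000) {
        println!("{path}: ~{} tokens", estimate_tokens(&diff));
    }
}
```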
## 📦 Installation
### From source
```bash
cargo install commitbee
```
### Build from repository
```bash
git clone https://github.com/sephyi/commitbee.git
cd commitbee
cargo build --release
```
The binary will be at `./target/release/commitbee`.
### Requirements
- **Rust** 1.94+ (edition 2024)
- **Ollama** running locally (default provider): [Install Ollama](https://ollama.ai)
- A model pulled in Ollama (recommended: `qwen3.5:4b`)
```bash
ollama pull qwen3.5:4b
```
## 🚀 Quick Start
```bash
# Stage your changes
git add src/feature.rs
# Generate and commit interactively
commitbee
# Preview without committing
commitbee --dry-run
# Auto-confirm and commit
commitbee --yes
# See what the LLM sees
commitbee --show-prompt
```
That's it. CommitBee works with zero configuration if Ollama is running locally.
> If CommitBee saves you time, consider [**sponsoring the project**](https://github.com/sponsors/Sephyi) 💛
## 📚 Documentation
- **[Full Guide](DOCS.md)**: configuration, providers, splitting, validation, troubleshooting
- **[PRD & Roadmap](PRD.md)**: product requirements and future plans
## 🔧 Configuration
Run `commitbee init` to create a config file. No config file is required if Ollama is running locally.
See [Configuration](DOCS.md#-configuration) for the full config reference, environment variables, and layering priority.
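
The layering can be sketched as a fold over partially filled layers, where a later (higher-priority) layer wins field by field. The `PartialConfig` fields are hypothetical. The built-in defaults simply reuse values mentioned elsewhere in this README (Ollama, `qwen3.5:4b`, a ~6K token budget) and may not match CommitBee's real defaults or keys.

```rust
/// One configuration layer; `None` means "not set at this level".
#[derive(Debug, Clone, Default)]
struct PartialConfig {
    provider: Option<String>,
    model: Option<String>,
    max_context_tokens: Option<usize>,
}

#[derive(Debug, PartialEq)]
struct Config {
    provider: String,
    model: String,
    max_context_tokens: usize,
}

impl PartialConfig {
    /// Fields set in `higher` override fields set in `self`.
    fn overlay(self, higher: PartialConfig) -> PartialConfig {
        PartialConfig {
            provider: higher.provider.or(self.provider),
            model: higher.model.or(self.model),
            max_context_tokens: higher.max_context_tokens.or(self.max_context_tokens),
        }
    }
}

/// `layers` is ordered lowest to highest priority:
/// project file, user config, environment, CLI flags.
/// Built-in defaults fill whatever no layer set.
fn resolve(layers: Vec<PartialConfig>) -> Config {
    let merged = layers
        .into_iter()
        .fold(PartialConfig::default(), PartialConfig::overlay);
    Config {
        provider: merged.provider.unwrap_or_else(|| "ollama".to_string()),
        model: merged.model.unwrap_or_else(|| "qwen3.5:4b".to_string()),
        max_context_tokens: merged.max_context_tokens.unwrap_or(6_000),
    }
}

fn main() {
    let project = PartialConfig { model: Some("project-model".into()), ..Default::default() };
    let user = PartialConfig { provider: Some("openai".into()), ..Default::default() };
    let env = PartialConfig::default();
    let cli = PartialConfig { model: Some("cli-model".into()), ..Default::default() };
    println!("{:#?}", resolve(vec![project, user, env, cli]));
}
```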
## 💻 Usage
```bash
commitbee [OPTIONS] [COMMAND]
```
### Options
| Flag | Description |
| --- | --- |
| `--dry-run` | Print message only, don't commit |
| `--yes` | Auto-confirm and commit |
| `-n, --generate` | Generate N candidates (1-5, default 1) |
| `--no-split` | Disable commit split suggestions |
| `--no-scope` | Disable scope in commit messages |
| `--clipboard` | Copy message to clipboard instead of committing |
| `--exclude <GLOB>` | Exclude files matching glob pattern (repeatable) |
| `--allow-secrets` | Allow committing with detected secrets |
| `--verbose` | Show symbol extraction details |
| `--show-prompt` | Debug: display the full LLM prompt |
### Commands
| Command | Description |
| --- | --- |
| `init` | Create a config file |
| `config` | Show current configuration |
| `doctor` | Check configuration and connectivity |
| `completions <shell>` | Generate shell completions |
| `hook install` | Install prepare-commit-msg hook |
| `hook uninstall` | Remove prepare-commit-msg hook |
| `hook status` | Check if hook is installed |
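
For the hook commands, here is a hedged sketch of the decision a `prepare-commit-msg` hook has to make. Git passes the message file path and, optionally, a message source: `message`, `template`, `merge`, `squash`, or `commit`. Two parts of the sketch are assumptions rather than CommitBee's documented behavior: skipping when a message already exists, and checking stderr for a terminal.

```rust
use std::io::{self, IsTerminal};

#[derive(Debug, PartialEq)]
enum HookAction {
    /// A message already exists (-m/-F, merge, squash, amend): leave it alone.
    Skip,
    /// A terminal is available, so prompting the user is safe.
    Interactive,
    /// No terminal (IDE, CI, GUI client): write a message without prompting.
    NonInteractive,
}

/// `source` is git's optional second hook argument.
fn decide(source: Option<&str>, has_tty: bool) -> HookAction {
    match source {
        Some("message" | "merge" | "squash" | "commit") => HookAction::Skip,
        _ if has_tty => HookAction::Interactive,
        _ => HookAction::NonInteractive,
    }
}

fn main() {
    // git invokes: prepare-commit-msg <msg-file> [<source> [<sha>]]
    let args: Vec<String> = std::env::args().collect();
    let source = args.get(2).map(String::as_str);
    // Assumption: probe stderr; git may not connect stdin to the terminal for hooks.
    let has_tty = io::stderr().is_terminal();
    println!("{:?}", decide(source, has_tty));
}
```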
## 🔒 Security
CommitBee scans all content before it's sent to any LLM provider with **24 built-in patterns** across 13 categories:
- ☁️ **Cloud providers**: AWS access/secret keys, GCP service accounts & API keys, Azure storage keys
- 🤖 **AI/ML**: OpenAI, Anthropic, HuggingFace tokens
- 🔧 **Source control**: GitHub (PAT, fine-grained, OAuth), GitLab tokens
- 💬 **Communication**: Slack tokens & webhooks, Discord webhooks
- 💳 **Payment & SaaS**: Stripe, Twilio, SendGrid, Mailgun keys
- 🗄️ **Database**: MongoDB, PostgreSQL, MySQL, Redis, AMQP connection strings
- 🔑 **Cryptographic**: PEM private keys, JWT tokens
- 🔍 **Generic**: API key assignments, quoted/unquoted secrets
- ⚠️ **Merge conflict detection**: Prevents committing unresolved conflicts
Add custom patterns or disable built-ins in your config:
```toml
custom_secret_patterns = ["CUSTOM_KEY_[a-zA-Z0-9]{32}"]
disabled_secret_patterns = ["Generic Secret (unquoted)"]
```
The default provider (Ollama) runs entirely on your machine. No data leaves your network unless you explicitly configure a cloud provider.
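
As an illustration of how named patterns and `disabled_secret_patterns` could interact, the std-only sketch below scans added diff lines with two hand-written checks. The first matches the AWS access key ID shape (`AKIA` plus 16 uppercase letters or digits). The second matches merge conflict markers. The custom patterns in the config above are regular expressions; this sketch hand-codes its checks to avoid a regex dependency. The pattern names and the `scan` function are hypothetical.

```rust
/// One hit: which named check fired and on which diff line (1-based).
#[derive(Debug, PartialEq)]
struct Finding {
    pattern: &'static str,
    line: usize,
}

/// AWS access key ID shape: "AKIA" followed by 16 uppercase letters or digits.
fn looks_like_aws_access_key(line: &str) -> bool {
    line.as_bytes().windows(20).any(|w| {
        w.starts_with(b"AKIA") && w[4..].iter().all(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
    })
}

/// Unresolved conflict markers left in a file.
fn is_conflict_marker(line: &str) -> bool {
    line.starts_with("<<<<<<< ") || line == "=======" || line.starts_with(">>>>>>> ")
}

/// Scans only added lines, skipping any check whose name is in `disabled`.
fn scan(diff: &str, disabled: &[&str]) -> Vec<Finding> {
    // Hypothetical pattern names, chosen for this sketch.
    let checks: [(&'static str, fn(&str) -> bool); 2] = [
        ("AWS Access Key ID", looks_like_aws_access_key),
        ("Merge Conflict Marker", is_conflict_marker),
    ];
    let mut findings = Vec::new();
    for (idx, raw) in diff.lines().enumerate() {
        if raw.starts_with("+++") {
            continue; // file header, not content
        }
        let Some(added) = raw.strip_prefix('+') else {
            continue; // context or removed line
        };
        for (name, check) in checks {
            if !disabled.contains(&name) && check(added) {
                findings.push(Finding { pattern: name, line: idx + 1 });
            }
        }
    }
    findings
}

fn main() {
    // AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    let diff = "+let key = \"AKIAIOSFODNN7EXAMPLE\";\n context\n+<<<<<<< HEAD\n";
    for f in scan(diff, &[]) {
        println!("line {}: {}", f.line, f.pattern);
    }
}
```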
## 🧪 Testing
```bash
cargo test # 367 tests: unit, snapshot (insta), property (proptest), integration (wiremock)
```
See [Testing Strategy](DOCS.md#testing-strategy) for the full breakdown.
## 🗺️ Changelog
See [`CHANGELOG.md`](CHANGELOG.md) for the full version history.
**Current:** `v0.5.0` *Beyond the Diff*: full signature extraction, semantic change classification, cross-file connections, security hardening, and a 36-fixture eval harness.
## 🤝 Contributing
Contributions are welcome! By contributing, you agree to the [Contributor License Agreement](CLA.md); you'll be asked to sign it when you open your first pull request.
## 💛 Sponsor
If you find CommitBee useful, consider [**sponsoring my work**](https://github.com/sponsors/Sephyi); it helps keep the project going.
## 📄 License
This project is dual-licensed under [AGPL-3.0-only](LICENSES/AGPL-3.0-only.txt) and a commercial license. See [LICENSE](LICENSE) for details.
REUSE compliant β every file carries SPDX headers.
Copyright 2026 [Sephyi](https://sephy.io)