<!--
SPDX-FileCopyrightText: 2026 Sephyi <me@sephy.io>

SPDX-License-Identifier: AGPL-3.0-only OR LicenseRef-Commercial
-->

# 🐝 CommitBee &emsp; [![Build Status]][ci] [![MSRV]][rust-1.94] [![License]][license-file] [![Total Downloads]][crates-io]

[Build Status]: https://github.com/sephyi/commitbee/actions/workflows/ci.yml/badge.svg?branch=main
[ci]: https://github.com/sephyi/commitbee/actions/workflows/ci.yml
[MSRV]: https://img.shields.io/badge/MSRV-1.94-orange.svg
[rust-1.94]: https://blog.rust-lang.org/2026/03/05/Rust-1.94.0
[License]: https://img.shields.io/badge/license-AGPL--3.0-blue.svg
[license-file]: LICENSE
[Total Downloads]: https://img.shields.io/crates/d/commitbee?style=social&logo=iCloud&logoColor=black
[crates-io]: https://crates.io/crates/commitbee

**The commit message generator that actually understands your code.**

Most tools in this space pipe raw `git diff` to an LLM and hope for the best. CommitBee parses your code with [tree-sitter](https://tree-sitter.github.io/tree-sitter/), maps diff hunks to symbol spans, and gives the LLM structured semantic context — producing fundamentally better commit messages, especially for complex multi-file changes.

## ✨ What Sets CommitBee Apart

### 🌳 It reads your code, not just your diffs

CommitBee uses tree-sitter to parse both the staged and HEAD versions of every changed file — in parallel across CPU cores. It extracts 10 symbol types (functions, methods, structs, enums, traits, impls, classes, interfaces, constants, type aliases) with **full signatures** — the LLM sees `pub fn connect(host: &str, timeout: Duration) -> Result<Connection>`, not just "Function connect." Modified symbols show old → new signature diffs so the LLM understands exactly what changed. Cross-file relationships are detected automatically: if `validator.rs` calls `parse()` and both changed, the prompt says so. Symbols are tracked in three states: **added**, **removed**, and **modified-signature**.
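
The hunk-to-symbol mapping reduces to interval overlap between a diff hunk's line range and each extracted symbol's span. A minimal sketch of that idea, with illustrative types (not CommitBee's actual data model):

```rust
/// A symbol span extracted from a parsed file (illustrative type,
/// not CommitBee's real data model).
struct Symbol {
    name: &'static str,
    start_line: usize, // inclusive
    end_line: usize,   // inclusive
}

/// Return the names of symbols whose line span overlaps a diff hunk.
fn symbols_touched<'a>(
    symbols: &'a [Symbol],
    hunk_start: usize,
    hunk_end: usize,
) -> Vec<&'a str> {
    symbols
        .iter()
        // Two closed intervals overlap iff each starts before the other ends.
        .filter(|s| s.start_line <= hunk_end && hunk_start <= s.end_line)
        .map(|s| s.name)
        .collect()
}

fn main() {
    let symbols = [
        Symbol { name: "connect", start_line: 10, end_line: 40 },
        Symbol { name: "disconnect", start_line: 42, end_line: 60 },
    ];
    // A hunk touching lines 35..=45 overlaps both functions.
    assert_eq!(symbols_touched(&symbols, 35, 45), vec!["connect", "disconnect"]);
}
```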

Supported languages: **Rust, TypeScript, JavaScript, Python, Go, Java, C, C++, Ruby, C#** — all enabled by default, individually toggleable via Cargo feature flags. Files in other languages still get full diff context — just without symbol extraction.

### 🧠 It reasons about what changed

Before the LLM generates anything, CommitBee computes deterministic evidence from your code and encodes it as hard constraints in the prompt:

- **Bug-fix evidence** in the diff → `fix`. No bug evidence → the LLM can't call it a `fix`.
- **Formatting-only changes** (whitespace, import reordering) → `style`. Detected both heuristically and via per-symbol whitespace classification.
- **Dependency-only changes** → `chore`. Always.
- **Public API removed** → breaking change flagged automatically.
- **MSRV bumps, `engines.node`, `requires-python` changes** → metadata-aware breaking detection.

Commit types are driven by code analysis, not LLM guesswork. The prompt includes computed EVIDENCE flags, CONSTRAINTS the model must follow, the primary change for subject anchoring, a character budget for the subject line, and anti-hallucination rules with negative examples.
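
As a mental model, this constraint layer behaves like a deterministic precedence over evidence flags. A minimal Rust sketch, with illustrative names and ordering (not CommitBee's actual types):

```rust
/// Illustrative evidence flags; CommitBee's real analysis computes
/// these from the diff and parse trees.
#[derive(Default)]
struct Evidence {
    deps_only: bool,
    formatting_only: bool,
    bug_fix: bool,
}

/// Pick a commit type deterministically from evidence, mirroring the
/// rules above: dependency-only and formatting-only changes are
/// classified before anything else, and `fix` requires bug evidence.
fn commit_type(e: &Evidence) -> &'static str {
    if e.deps_only {
        "chore"
    } else if e.formatting_only {
        "style"
    } else if e.bug_fix {
        "fix"
    } else {
        "feat" // in practice the remaining choice is left to the LLM
    }
}

fn main() {
    let e = Evidence { formatting_only: true, ..Default::default() };
    assert_eq!(commit_type(&e), "style");
}
```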

### ✅ It validates and corrects its own output

Every generated message passes through a 7-rule validation pipeline:

1. Fix requires evidence — no bug comments, no `fix` type
2. Breaking change detection — removed public APIs must be flagged
3. Anti-hallucination — breaking change text can't copy internal field names
4. Mechanical changes must use `style`
5. Dependency-only changes must use `chore`
6. Subject specificity — rejects generic messages like "update code" or "improve things"
7. Subject length — enforces the 72-character first line limit

If any rule fails, CommitBee appends targeted correction instructions and re-prompts the LLM — up to 3 attempts, re-validating after each. The final output goes through a sanitizer that strips thinking blocks, extracts JSON from code fences, removes conversational preambles, and wraps the body at 72 characters. You get a clean, spec-compliant conventional commit or a clear error — never a silently mangled message.
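
The correction loop can be pictured like this; `Verdict`, `validate`, and `generate` are illustrative stand-ins (only the subject-length rule is modeled here):

```rust
const MAX_ATTEMPTS: u32 = 3;

/// Illustrative verdict type; names are assumptions, not CommitBee's API.
enum Verdict {
    Ok,
    Retry(&'static str),
}

/// Stand-in for the 7-rule pipeline, checking only rule 7
/// (the 72-character subject limit).
fn validate(msg: &str) -> Verdict {
    match msg.lines().next() {
        Some(subject) if subject.len() <= 72 => Verdict::Ok,
        _ => Verdict::Retry("shorten the subject line to 72 characters"),
    }
}

/// Stand-in for the LLM call; a real provider would receive the
/// correction hint appended to the prompt.
fn generate(correction: Option<&str>) -> String {
    match correction {
        None => "feat: add a subject line that keeps going well past the seventy-two character limit".into(),
        Some(_) => "feat: add request retry with backoff".into(),
    }
}

fn main() {
    let mut correction = None;
    for _ in 0..MAX_ATTEMPTS {
        let msg = generate(correction);
        match validate(&msg) {
            Verdict::Ok => {
                println!("{msg}");
                return;
            }
            Verdict::Retry(hint) => correction = Some(hint),
        }
    }
    eprintln!("error: no valid message after {MAX_ATTEMPTS} attempts");
}
```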

### 🔀 It splits multi-concern commits

When your staged changes mix independent work (a bugfix in one module + a refactor in another), CommitBee detects it and offers to split them into separate, well-typed commits. The splitter uses diff-shape fingerprinting combined with Jaccard similarity on content vocabulary — files are grouped by the actual shape and language of their changes, not just by directory. Symbol dependency merging keeps related files together even when their diff shapes differ: if `foo()` is removed from one file and added in another, they stay in the same commit.
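
Jaccard similarity on content vocabulary is the standard set ratio |A ∩ B| / |A ∪ B|. A self-contained sketch, leaving out CommitBee's actual tokenization and grouping threshold:

```rust
use std::collections::HashSet;

/// Jaccard similarity between two token sets: |A ∩ B| / |A ∪ B|.
fn jaccard(a: &HashSet<&str>, b: &HashSet<&str>) -> f64 {
    let inter = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    if union == 0.0 { 0.0 } else { inter / union }
}

fn main() {
    let a: HashSet<_> = ["fn", "connect", "timeout"].into();
    let b: HashSet<_> = ["fn", "connect", "retry"].into();
    // 2 shared tokens out of 4 distinct ones -> 0.5.
    assert!((jaccard(&a, &b) - 0.5).abs() < 1e-9);
}
```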

```txt
⚡ Commit split suggested — 2 logical change groups detected:

  Group 1: feat(llm)  [2 files]
    [M] src/services/llm/anthropic.rs (+20 -5)
    [M] src/services/llm/openai.rs (+8 -3)

  Group 2: fix(sanitizer)  [1 file]
    [M] src/services/sanitizer.rs (+3 -1)

? Split into separate commits? (Y/n)
```

### The pipeline

```txt
┌─────────┐    ┌─────────┐    ┌─────────────┐    ┌──────────┐    ┌─────────┐    ┌──────────┐
│  Stage  │ →  │   Git   │ →  │ Tree-sitter │ →  │  Split   │ →  │ Context │ →  │   LLM    │
│ Changes │    │ Service │    │  Analyzer   │    │ Detector │    │ Builder │    │ Provider │
└─────────┘    └─────────┘    └─────────────┘    └──────────┘    └─────────┘    └──────────┘
                    │                │                │               │              │
               Staged diff     Symbol spans      Group files    Budget-aware  Commit message
               + file list     (functions,       by module,     prompt with   (conventional
                               classes, etc.)    suggest split  semantic context format)
```

### And there's more

- **🏠 Local-first** — Ollama by default. Your code never leaves your machine. No API keys needed.
- **🔒 Secret scanning** — 24 built-in patterns across 13 categories (cloud keys, AI/ML tokens, payment, database, crypto). Add custom patterns or disable built-ins via config.
- **⚡ Streaming** — Real-time token display from all 3 providers (Ollama, OpenAI, Anthropic) with Ctrl+C cancellation.
- **📊 Token budget** — Smart truncation that prioritizes the most important files within ~6K tokens.
- **🎯 Multi-candidate** — Generate up to 5 messages and pick the best one interactively.
- **🪝 Git hooks** — `prepare-commit-msg` hook with TTY detection for safe non-interactive fallback.
- **🔍 Prompt debug** — `--show-prompt` shows exactly what the LLM sees. Full transparency.
- **🩺 Doctor** — `commitbee doctor` checks config, connectivity, and model availability.
- **🐚 Shell completions** — bash, zsh, fish, powershell via `commitbee completions`.
- **⚙️ 5-level config** — Defaults → project `.commitbee.toml` → user config → env vars → CLI flags.
- **🦀 Single binary** — ~18K lines of Rust. Compiles to one static binary with LTO. No runtime dependencies.
- **🧪 367 tests** — Unit, snapshot, property (proptest for never-panic guarantees), and integration (wiremock).
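
For instance, budget-aware truncation (the token-budget bullet above) amounts to a greedy cut over files ranked by importance. A sketch under assumed names; the real ranking heuristics are not shown:

```rust
/// Illustrative per-file context entry; not CommitBee's real type.
struct FileCtx {
    path: &'static str,
    tokens: usize, // estimated token cost of this file's context
}

/// Greedily keep files that fit the budget. `files` is assumed to be
/// pre-sorted by importance, most important first.
fn fit_budget(files: &[FileCtx], budget: usize) -> Vec<&'static str> {
    let mut used = 0;
    let mut kept = Vec::new();
    for f in files {
        if used + f.tokens <= budget {
            used += f.tokens;
            kept.push(f.path);
        }
    }
    kept
}

fn main() {
    let files = [
        FileCtx { path: "src/core.rs", tokens: 4000 },
        FileCtx { path: "src/util.rs", tokens: 3000 },
        FileCtx { path: "tests/it.rs", tokens: 1500 },
    ];
    // With a ~6K budget, the third file still fits after the second is skipped.
    assert_eq!(fit_budget(&files, 6000), vec!["src/core.rs", "tests/it.rs"]);
}
```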

## 📦 Installation

### From source

```bash
cargo install commitbee
```

### Build from repository

```bash
git clone https://github.com/sephyi/commitbee.git
cd commitbee
cargo build --release
```

The binary will be at `./target/release/commitbee`.

### Requirements

- **Rust** 1.94+ (edition 2024)
- **Ollama** running locally (default provider) — [Install Ollama](https://ollama.ai)
- A model pulled in Ollama (recommended: `qwen3.5:4b`)

```bash
ollama pull qwen3.5:4b
```

## 🚀 Quick Start

```bash
# Stage your changes
git add src/feature.rs

# Generate and commit interactively
commitbee

# Preview without committing
commitbee --dry-run

# Auto-confirm and commit
commitbee --yes

# See what the LLM sees
commitbee --show-prompt
```

That's it. CommitBee works with zero configuration if Ollama is running locally.

> If CommitBee saves you time, consider [**sponsoring the project**](https://github.com/sponsors/Sephyi) 💛

## 📖 Documentation

- **[Full Guide](DOCS.md)** — configuration, providers, splitting, validation, troubleshooting
- **[PRD & Roadmap](PRD.md)** — product requirements and future plans

## 🔧 Configuration

Run `commitbee init` to create a config file. Works out of the box with zero config if Ollama is running locally.

See [Configuration](DOCS.md#-configuration) for the full config reference, environment variables, and layering priority.

## 💻 Usage

```bash
commitbee [OPTIONS] [COMMAND]
```

### Options

| Flag | Description |
| --- | --- |
| `--dry-run` | Print message only, don't commit |
| `--yes` | Auto-confirm and commit |
| `-n, --generate` | Generate N candidates (1-5, default 1) |
| `--no-split` | Disable commit split suggestions |
| `--no-scope` | Disable scope in commit messages |
| `--clipboard` | Copy message to clipboard instead of committing |
| `--exclude <GLOB>` | Exclude files matching glob pattern (repeatable) |
| `--allow-secrets` | Allow committing with detected secrets |
| `--verbose` | Show symbol extraction details |
| `--show-prompt` | Debug: display the full LLM prompt |

### Commands

| Command | Description |
| --- | --- |
| `init` | Create a config file |
| `config` | Show current configuration |
| `doctor` | Check configuration and connectivity |
| `completions <shell>` | Generate shell completions |
| `hook install` | Install prepare-commit-msg hook |
| `hook uninstall` | Remove prepare-commit-msg hook |
| `hook status` | Check if hook is installed |

## 🔒 Security

Before anything is sent to an LLM provider, CommitBee scans all content against **24 built-in patterns** across 13 categories:

- ☁️ **Cloud providers** — AWS access/secret keys, GCP service accounts & API keys, Azure storage keys
- 🤖 **AI/ML** — OpenAI, Anthropic, HuggingFace tokens
- 🔧 **Source control** — GitHub (PAT, fine-grained, OAuth), GitLab tokens
- 💬 **Communication** — Slack tokens & webhooks, Discord webhooks
- 💳 **Payment & SaaS** — Stripe, Twilio, SendGrid, Mailgun keys
- 🗄️ **Database** — MongoDB, PostgreSQL, MySQL, Redis, AMQP connection strings
- 🔐 **Cryptographic** — PEM private keys, JWT tokens
- 🔑 **Generic** — API key assignments, quoted/unquoted secrets
- ⚠️ **Merge conflict detection** — Prevents committing unresolved conflicts
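
For a flavor of what one such pattern checks, here is a std-only sketch for a single well-known shape, AWS access key IDs (`AKIA` plus 16 uppercase alphanumerics); the real scanner is pattern-driven and configurable:

```rust
/// Illustrative check for one pattern shape: AWS access key IDs,
/// "AKIA" followed by 16 uppercase alphanumerics. CommitBee's real
/// scanner covers 24 configurable patterns.
fn looks_like_aws_key_id(token: &str) -> bool {
    token.len() == 20
        && token.starts_with("AKIA")
        && token[4..].bytes().all(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
}

/// Split content into alphanumeric tokens and report suspicious ones.
fn scan(content: &str) -> Vec<&str> {
    content
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|t| looks_like_aws_key_id(t))
        .collect()
}

fn main() {
    // AWS's documented example key ID, not a live credential.
    let diff = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE";
    assert_eq!(scan(diff), vec!["AKIAIOSFODNN7EXAMPLE"]);
}
```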

Add custom patterns or disable built-ins in your config:

```toml
custom_secret_patterns = ["CUSTOM_KEY_[a-zA-Z0-9]{32}"]
disabled_secret_patterns = ["Generic Secret (unquoted)"]
```

The default provider (Ollama) runs entirely on your machine. No data leaves your network unless you explicitly configure a cloud provider.

## 🧪 Testing

```bash
cargo test   # 367 tests — unit, snapshot (insta), property (proptest), integration (wiremock)
```

See [Testing Strategy](DOCS.md#testing-strategy) for the full breakdown.

## 🗺️ Changelog

See [`CHANGELOG.md`](CHANGELOG.md) for the full version history.

**Current:** `v0.5.0` *Beyond the Diff* — Full signature extraction, semantic change classification, cross-file connections, security hardening, and 36-fixture eval harness.

## 🤝 Contributing

Contributions are welcome! By contributing, you agree to the [Contributor License Agreement](CLA.md) — you'll be asked to sign it when you open your first pull request.

## 💛 Sponsor

If you find CommitBee useful, consider [**sponsoring my work**](https://github.com/sponsors/Sephyi) — it helps keep the project going.

## 📄 License

This project is dual-licensed under [AGPL-3.0-only](LICENSES/AGPL-3.0-only.txt) and a commercial license. See [LICENSE](LICENSE) for details.

REUSE compliant — every file carries SPDX headers.

Copyright 2026 [Sephyi](https://sephy.io)