commitbee 0.3.0

AI-powered commit message generator using tree-sitter semantic analysis and local LLMs
🐝 CommitBee

License: PolyForm Noncommercial | MSRV: 1.94 | REUSE compliant

The commit message generator that actually understands your code.

CommitBee is a Rust-native CLI tool that uses tree-sitter semantic analysis and LLMs to generate high-quality conventional commit messages. Unlike every other tool in this space, CommitBee doesn't just pipe raw git diff output to an LLM: it parses both the staged and HEAD versions of your files, maps diff hunks to symbol spans (functions, classes, methods), and provides structured semantic context. This produces fundamentally better commit messages, especially for complex multi-file changes.

✨ What Makes CommitBee Different

| Feature | CommitBee | Others |
| --- | --- | --- |
| 🌳 Tree-sitter semantic analysis | Yes | No |
| 🔀 Automatic commit splitting | Yes | No |
| 🧠 Evidence-based type inference | Yes | No |
| 🔒 Built-in secret scanning | Yes | Rarely |
| 📊 Token budget management | Yes | No |
| ⚡ Streaming LLM output | Yes | Rarely |
| 🔍 Prompt debug mode | Yes | No |
| 🏠 Local-first (Ollama default) | Yes | Cloud-first |
| 🦀 Single static binary | Yes | Node.js/Python |

Every competitor sends raw diffs to LLMs. CommitBee sends semantic context: which functions changed, what was added or removed, and why the change matters structurally.

Commit splitting

When your staged changes contain logically independent work (e.g., a bugfix in one module + a refactor in another), CommitBee detects this and offers to split them into separate, well-typed commits automatically. The splitter uses diff-shape fingerprinting with Jaccard similarity clustering: files are grouped not just by directory but by the actual shape and vocabulary of their changes.

⚡ Commit split suggested — 2 logical change groups detected:

  Group 1: feat(llm)  [2 files]
    [M] src/services/llm/anthropic.rs (+20 -5)
    [M] src/services/llm/openai.rs (+8 -3)

  Group 2: fix(sanitizer)  [1 file]
    [M] src/services/sanitizer.rs (+3 -1)

? Split into separate commits? (Y/n)
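
To make the clustering idea concrete, here is a minimal sketch of Jaccard similarity over per-file token sets, followed by a greedy grouping pass. It is not CommitBee's actual splitter: the function names, the token-set representation, and the 0.5 threshold used in the usage note are assumptions chosen for illustration.

```rust
use std::collections::HashSet;

/// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets count as identical.
fn jaccard(a: &HashSet<&str>, b: &HashSet<&str>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Greedy single-pass clustering (hypothetical strategy): each file joins the
/// first group whose representative (its first member) is at least
/// `threshold` similar; otherwise it starts a new group.
/// Returns groups of indices into `files`.
fn cluster(files: &[HashSet<&str>], threshold: f64) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    for (i, tokens) in files.iter().enumerate() {
        let home = groups
            .iter()
            .position(|g| jaccard(&files[g[0]], tokens) >= threshold);
        match home {
            Some(g) => groups[g].push(i),
            None => groups.push(vec![i]),
        }
    }
    groups
}
```

With three files whose change vocabularies are {stream, retry, timeout}, {stream, retry, backoff}, and {escape, quote}, the first two share half of their combined tokens and land in one group at a threshold of 0.5, while the third forms its own group.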

📦 Installation

From source

cargo install commitbee

Build from repository

git clone https://github.com/sephyi/commitbee.git
cd commitbee
cargo build --release

The binary will be at ./target/release/commitbee.

Requirements

  • Rust 1.94+ (edition 2024)
  • Ollama running locally (default provider); see Ollama's installation instructions
  • A model pulled in Ollama (recommended: qwen3:4b)

ollama pull qwen3:4b

🚀 Quick Start

# Stage your changes
git add src/feature.rs

# Generate and commit interactively
commitbee

# Preview without committing
commitbee --dry-run

# Auto-confirm and commit
commitbee --yes

# See what the LLM sees
commitbee --show-prompt

That's it. CommitBee works with zero configuration if Ollama is running locally.

🔧 Configuration

CommitBee stores configuration in a platform-specific directory. Create a config with:

commitbee init

Example config

provider = "ollama"
model = "qwen3:4b"
ollama_host = "http://localhost:11434"
max_diff_lines = 500
max_file_lines = 100
max_context_chars = 24000

[format]
include_body = true
include_scope = true
lowercase_subject = true

Environment variables

| Variable | Description | Default |
| --- | --- | --- |
| COMMITBEE_PROVIDER | LLM provider | ollama |
| COMMITBEE_MODEL | Model name | qwen3:4b |
| COMMITBEE_OLLAMA_HOST | Ollama server URL | http://localhost:11434 |
| COMMITBEE_API_KEY | API key (cloud providers) | (none) |
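
Configuration is layered (CLI > Env > File > Defaults), so a value set in a higher layer overrides the lower ones. The sketch below shows that precedence for a single setting using only the standard library. It is an illustration, not CommitBee's figment-based loader: `resolve` and `resolve_model` are hypothetical helpers.

```rust
use std::env;

/// Resolve one setting through the layers, highest priority first:
/// CLI flag, then environment variable, then config file, then the default.
fn resolve(
    cli: Option<String>,
    env_value: Option<String>,
    file: Option<String>,
    default: &str,
) -> String {
    cli.or(env_value)
        .or(file)
        .unwrap_or_else(|| default.to_string())
}

/// Hypothetical helper: resolve the model name, reading COMMITBEE_MODEL
/// from the process environment.
fn resolve_model(cli: Option<String>, file: Option<String>) -> String {
    // An unset or non-UTF-8 variable is treated as "not provided".
    let from_env = env::var("COMMITBEE_MODEL").ok();
    resolve(cli, from_env, file, "qwen3:4b")
}
```

So if the config file sets a model but COMMITBEE_MODEL is also exported, the environment value is used, and an explicit command-line value would override both.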

📖 Usage

commitbee [OPTIONS] [COMMAND]

Options

| Flag | Description |
| --- | --- |
| --dry-run | Print message only, don't commit |
| --yes | Auto-confirm and commit |
| -n, --generate | Generate N candidates (1-5, default 1) |
| --no-split | Disable commit split suggestions |
| --no-scope | Disable scope in commit messages |
| --allow-secrets | Allow committing with detected secrets |
| --verbose | Show symbol extraction details |
| --show-prompt | Debug: display the full LLM prompt |

Commands

| Command | Description |
| --- | --- |
| init | Create a config file |
| config | Show current configuration |
| doctor | Check configuration and connectivity |
| completions <shell> | Generate shell completions |
| hook install | Install prepare-commit-msg hook |
| hook uninstall | Remove prepare-commit-msg hook |
| hook status | Check if hook is installed |

🌳 How It Works

CommitBee's pipeline goes beyond simple diff forwarding:

┌─────────┐    ┌──────────┐    ┌────────────┐    ┌──────────┐    ┌───────────┐    ┌─────────┐
│  Stage  │ →  │   Git    │ →  │ Tree-sitter│ →  │  Split   │ →  │  Context  │ →  │   LLM   │
│ Changes │    │  Service │    │  Analyzer  │    │ Detector │    │  Builder  │    │Provider │
└─────────┘    └──────────┘    └────────────┘    └──────────┘    └───────────┘    └─────────┘
                    │                │                 │                │               │
               Staged diff      Symbol spans     Group files      Budget-aware     Commit message
               + file list      (functions,      by module,       prompt with      (conventional
                                classes, etc.)   suggest split    semantic context    format)

  1. Git Service: Discovers the repo via gix, reads staged changes and diffs (NUL-delimited for path safety)
  2. Tree-sitter Analyzer: Parses both staged and HEAD file versions in parallel (via rayon), maps diff hunks to symbol spans (functions, structs, methods) with tri-state tracking (added/removed/modified-signature)
  3. Commit Splitter: Groups files using diff-shape fingerprinting + Jaccard similarity clustering, detects multi-concern changes, offers to split into separate commits
  4. Context Builder: Assembles a budget-aware prompt with evidence flags, constraint rules, primary change detection, and metadata-aware breaking change signals
  5. Safety Scanner: Checks for secrets and merge conflicts (added-line-only, with self-detection prevention) before anything leaves your machine
  6. LLM Provider: Streams the prompt to your chosen model and parses the response
  7. Commit Sanitizer: Validates the output as proper conventional commit format, handles JSON extraction from noisy LLM output (thought blocks, conversational preambles, code fences), wraps body at 72 chars
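
Step 2 is the heart of the pipeline, so here is a minimal sketch of mapping a diff hunk to the symbols it touches. It assumes symbol spans have already been extracted (for example by a tree-sitter pass) and reduces the problem to interval overlap. The `SymbolSpan` and `Hunk` types and the `symbols_touched` function are hypothetical, not CommitBee's real types.

```rust
/// A symbol's extent in a file, as 1-based inclusive line numbers.
#[derive(Debug)]
struct SymbolSpan {
    name: String,
    start_line: usize,
    end_line: usize,
}

/// A diff hunk's extent on the new side of the diff (1-based, inclusive).
struct Hunk {
    start_line: usize,
    end_line: usize,
}

/// Return the names of all symbols whose span overlaps the hunk.
/// Two closed intervals overlap when each one starts before the other ends.
fn symbols_touched<'a>(hunk: &Hunk, symbols: &'a [SymbolSpan]) -> Vec<&'a str> {
    symbols
        .iter()
        .filter(|s| s.start_line <= hunk.end_line && hunk.start_line <= s.end_line)
        .map(|s| s.name.as_str())
        .collect()
}
```

A hunk covering lines 24-32 in a file where `parse` spans lines 10-25 and `render` spans lines 30-48 touches both symbols, while a hunk covering lines 26-29 touches neither. A prompt built this way can name the changed functions instead of only showing raw hunks.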

Supported languages

Language Parser
Rust tree-sitter-rust
TypeScript tree-sitter-typescript
JavaScript tree-sitter-javascript
Python tree-sitter-python
Go tree-sitter-go

Files in unsupported languages are still included in the diff context; they just don't get semantic symbol extraction.

🔒 Security

CommitBee scans all content before it's sent to any LLM provider:

  • 🔑 API key detection: AWS keys, OpenAI keys, generic secrets
  • 🔐 Private key detection: PEM-encoded private keys
  • 🔗 Connection string detection: Database URLs with credentials
  • ⚠️ Merge conflict detection: Prevents committing unresolved conflicts

The default provider (Ollama) runs entirely on your machine. No data leaves your network unless you explicitly configure a cloud provider.
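
As a rough idea of what added-line-only scanning looks like, the sketch below checks only the `+` lines of a unified diff for two patterns: the common shape of an AWS access key ID (`AKIA` plus 16 uppercase letters or digits) and a PEM private key header. The rule set, the `Finding` type, and the function names are assumptions for illustration. CommitBee's real scanner covers more patterns and is not reproduced here.

```rust
/// One scan hit: the 1-based line number within the diff text and a short
/// label for the rule that matched.
#[derive(Debug, PartialEq)]
struct Finding {
    diff_line: usize,
    rule: &'static str,
}

/// True if `line` contains "AKIA" followed by 16 uppercase letters or digits.
fn has_aws_key_id(line: &str) -> bool {
    line.match_indices("AKIA").any(|(i, _)| {
        let tail = &line.as_bytes()[i + 4..];
        tail.len() >= 16
            && tail[..16]
                .iter()
                .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
    })
}

/// Scan only the added lines of a unified diff, so a secret that is being
/// removed does not trigger. Simplification: any line starting with "+++" is
/// treated as a file header and skipped.
fn scan_added_lines(diff: &str) -> Vec<Finding> {
    let mut findings = Vec::new();
    for (idx, line) in diff.lines().enumerate() {
        if !line.starts_with('+') || line.starts_with("+++") {
            continue;
        }
        let added = &line[1..];
        if has_aws_key_id(added) {
            findings.push(Finding { diff_line: idx + 1, rule: "aws-access-key-id" });
        }
        if added.contains("-----BEGIN") && added.contains("PRIVATE KEY-----") {
            findings.push(Finding { diff_line: idx + 1, rule: "pem-private-key" });
        }
    }
    findings
}
```

Restricting the scan to added lines keeps the check focused on what the commit introduces, which is why deleting a leaked key from a file would not block the commit in this sketch.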

🏗️ Architecture

src/
├── main.rs              # Entry point
├── lib.rs               # Library exports
├── app.rs               # Application orchestrator
├── cli.rs               # CLI arguments (clap)
├── config.rs            # Configuration (figment layered)
├── error.rs             # Error types (thiserror + miette)
├── domain/
│   ├── change.rs        # FileChange, StagedChanges, ChangeStatus
│   ├── symbol.rs        # CodeSymbol, SymbolKind
│   ├── context.rs       # PromptContext (semantic prompt assembly)
│   └── commit.rs        # CommitType (single source of truth)
└── services/
    ├── git.rs           # GitService (gix + git CLI, concurrent content fetching)
    ├── analyzer.rs      # AnalyzerService (tree-sitter, parallel via rayon)
    ├── context.rs       # ContextBuilder (token budget, evidence flags)
    ├── safety.rs        # Secret scanning, conflict detection
    ├── sanitizer.rs     # CommitSanitizer (JSON + plain text, BREAKING CHANGE footer)
    ├── splitter.rs      # CommitSplitter (diff-shape + Jaccard clustering)
    └── llm/
        ├── mod.rs       # LlmProvider trait + enum dispatch + shared SYSTEM_PROMPT
        ├── ollama.rs    # OllamaProvider (streaming NDJSON)
        ├── openai.rs    # OpenAiProvider (SSE streaming)
        └── anthropic.rs # AnthropicProvider (SSE streaming)
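
The "trait + enum dispatch" pattern noted for llm/mod.rs can be sketched as follows. Only the type names and stream formats come from the tree above. The `stream_format` method, the `Provider` enum, and the "openai" and "anthropic" config strings are assumptions for illustration, not CommitBee's actual API.

```rust
/// Common interface each backend implements (illustrative method only).
trait LlmProvider {
    /// Wire format of the streamed response.
    fn stream_format(&self) -> &'static str;
}

struct OllamaProvider;
struct OpenAiProvider;
struct AnthropicProvider;

impl LlmProvider for OllamaProvider {
    fn stream_format(&self) -> &'static str { "ndjson" }
}
impl LlmProvider for OpenAiProvider {
    fn stream_format(&self) -> &'static str { "sse" }
}
impl LlmProvider for AnthropicProvider {
    fn stream_format(&self) -> &'static str { "sse" }
}

/// Enum dispatch: a closed set of providers behind one concrete type,
/// resolved with a `match` instead of a trait-object vtable call.
enum Provider {
    Ollama(OllamaProvider),
    OpenAi(OpenAiProvider),
    Anthropic(AnthropicProvider),
}

impl Provider {
    /// Pick a provider from a config string such as `provider = "ollama"`.
    fn from_name(name: &str) -> Option<Provider> {
        match name {
            "ollama" => Some(Provider::Ollama(OllamaProvider)),
            "openai" => Some(Provider::OpenAi(OpenAiProvider)),
            "anthropic" => Some(Provider::Anthropic(AnthropicProvider)),
            _ => None,
        }
    }
}

impl LlmProvider for Provider {
    fn stream_format(&self) -> &'static str {
        match self {
            Provider::Ollama(p) => p.stream_format(),
            Provider::OpenAi(p) => p.stream_format(),
            Provider::Anthropic(p) => p.stream_format(),
        }
    }
}
```

The trade-off is that adding a provider means touching the enum and each `match`, but callers get a sized, concrete type and the compiler flags any arm that was missed.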

🧪 Testing

cargo test                    # All tests (178 tests)
cargo test --test sanitizer   # CommitSanitizer tests
cargo test --test splitter    # CommitSplitter tests
cargo test --test safety      # Secret scanner tests
cargo test --test context     # ContextBuilder tests
cargo test --test commit_type # CommitType tests
cargo test --test integration # LLM provider integration tests

The test suite includes snapshot tests (insta), property-based tests (proptest), never-panic guarantees for all user-facing parsers, and integration tests using wiremock for LLM provider mocking.

🗺️ Roadmap

| Phase | Version | Status |
| --- | --- | --- |
| 🔧 Stability & Correctness | v0.2.0 | ✅ Complete |
| ✨ Polish & Providers | v0.2.0 | ✅ Complete |
| 🚀 Differentiation | v0.3.0 | ✅ Complete |
| 👑 Market Leadership | v0.4.0+ | 🔮 Future |

v0.3.0 highlights (current)

  • Diff-shape fingerprinting + Jaccard clustering: Splitter groups files by change shape and content vocabulary, not just directory
  • Evidence-based type inference: Constraint rules from code analysis drive commit type selection (bug evidence → fix, mechanical → style, dependency-only → chore)
  • Robust LLM output parsing: Sanitizer handles <think>/<thought> blocks, conversational preambles, noisy JSON extraction
  • Metadata-aware breaking change detection: Detects MSRV bumps, engines.node, requires-python changes
  • Symbol tri-state tracking: Added/removed/modified-signature differentiation in tree-sitter analysis
  • Primary change detection: Identifies the single most significant change for subject anchoring
  • Post-generation validation: Subject specificity validator ensures concrete entity naming
  • NUL-delimited git parsing: Safe handling of paths with special characters
  • Parallel tree-sitter parsing: rayon for CPU-bound parsing, tokio JoinSet for concurrent git fetching
  • Anti-hallucination prompt engineering: EVIDENCE/CONSTRAINTS sections, negative examples, anti-copy rules
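
To illustrate the robust output parsing item, here is a minimal sketch that strips `<think>`/`<thought>` blocks and then pulls the outermost `{ ... }` slice out of whatever text remains. The function names and the outermost-brace heuristic are assumptions. The real sanitizer is more thorough and is not reproduced here.

```rust
/// Remove every `<tag>...</tag>` block for the given tag names. An opening
/// tag with no closing tag drops the rest of the text, on the assumption that
/// a truncated reasoning block has no usable answer after it.
fn strip_blocks(input: &str, tags: &[&str]) -> String {
    let mut text = input.to_string();
    for tag in tags {
        let open = format!("<{}>", tag);
        let close = format!("</{}>", tag);
        while let Some(start) = text.find(&open) {
            match text[start..].find(&close) {
                Some(rel) => text.replace_range(start..start + rel + close.len(), ""),
                None => text.truncate(start),
            }
        }
    }
    text
}

/// Strip reasoning blocks, then return the slice from the first '{' to the
/// last '}' (inclusive), ignoring any preamble or trailing chatter.
fn extract_json(raw: &str) -> Option<String> {
    let cleaned = strip_blocks(raw, &["think", "thought"]);
    let start = cleaned.find('{')?;
    let end = cleaned.rfind('}')?;
    if end < start {
        return None;
    }
    Some(cleaned[start..=end].to_string())
}
```

Stripping the reasoning blocks first matters because models often write braces inside them, which would otherwise be mistaken for the start of the JSON payload. The extracted slice still needs to go through a real JSON parser before it is trusted.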

v0.2.0 highlights

  • Cloud providers: OpenAI-compatible and Anthropic streaming support
  • Commit splitting: Automatic detection and splitting of multi-concern staged changes
  • Git hook integration: commitbee hook install/uninstall/status
  • Shell completions: bash, zsh, fish, powershell via clap_complete
  • Rich error diagnostics: miette for actionable error messages
  • Multiple message generation: --generate N with interactive candidate selection
  • Hierarchical config: figment-based layering (CLI > Env > File > Defaults)
  • Structured logging: tracing with COMMITBEE_LOG env filter
  • Doctor command: commitbee doctor for connectivity and config checks
  • Secure key storage: OS keychain via keyring (optional feature)
  • Body line wrapping: Commit body text wrapped at 72 characters
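
Body wrapping is simple enough to show in full. The sketch below is a greedy word wrap under stated assumptions: each input line is treated as its own paragraph, and a single word longer than the width is left on its own line rather than split. `wrap_body` is a hypothetical name, not CommitBee's function.

```rust
/// Greedy word wrap: pack whitespace-separated words into lines of at most
/// `width` characters. Blank input lines (paragraph breaks) are preserved.
fn wrap_body(body: &str, width: usize) -> String {
    let mut out: Vec<String> = Vec::new();
    for paragraph in body.lines() {
        let mut line = String::new();
        let mut line_len = 0; // length in chars, not bytes
        for word in paragraph.split_whitespace() {
            let word_len = word.chars().count();
            // Start a new line if this word (plus a space) would overflow.
            if line_len > 0 && line_len + 1 + word_len > width {
                out.push(std::mem::take(&mut line));
                line_len = 0;
            }
            if line_len > 0 {
                line.push(' ');
                line_len += 1;
            }
            line.push_str(word);
            line_len += word_len;
        }
        out.push(line); // also emits "" for a blank input line
    }
    out.join("\n")
}
```

Calling `wrap_body(text, 72)` gives the 72-column limit mentioned above. Counting characters rather than bytes keeps non-ASCII text from wrapping too early.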

See PRD.md for the full product requirements document.

🤝 Contributing

Contributions are welcome! By contributing, you agree to the Contributor License Agreement; you'll be asked to sign it when you open your first pull request.

The project uses:

  • Rust edition 2024 (MSRV 1.94)
  • Conventional commits for all commit messages
  • REUSE/SPDX for license compliance

# Development workflow
cargo fmt                     # Format code
cargo clippy -- -D warnings   # Lint (must pass clean)
cargo test                    # Run all tests

# Manual testing
git add some-file.rs
cargo run -- --dry-run        # Preview commit message
cargo run -- --show-prompt    # Debug the LLM prompt

💛 Sponsor

If you find CommitBee useful, consider sponsoring my work.

📄 License

This project is licensed under PolyForm-Noncommercial-1.0.0.

REUSE compliant: every file carries SPDX headers.

Copyright 2026 Sephyi