gitsem
A semantic search layer for Git repositories that augments commits with vector embeddings, enabling AI agents and developers to search code by meaning rather than text patterns.
Features
- Semantic Commit Notes: Automatically attach embeddings and context to commits
- Vector Search: Search code using natural language queries
- Git-Native: Uses Git notes (refs/notes/semantic) for storage
- Team Collaboration: Share semantic indexes via git push/pull
- Retroactive Indexing: Add semantic notes to existing commit history
- Idempotent Operations: Safe to run after regular git commands
Installation
Prerequisites
- Rust 1.65 or higher
- Git 2.0 or higher
From crates.io
Build from Source
The binary will be installed to ~/.cargo/bin/gitsem.
Verify Installation
# OR use as git subcommand
Both gitsem and git-semantic binaries are installed, so you can use either:
- gitsem <command> - Standalone command
- git semantic <command> - Git subcommand style
If the command isn't found, ensure ~/.cargo/bin is in your PATH:
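A minimal sketch of that PATH change, assuming Cargo's default install directory (~/.cargo/bin) noted above:

```shell
# Prepend Cargo's bin directory to PATH for the current shell session
export PATH="$HOME/.cargo/bin:$PATH"
```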
Add this to your ~/.bashrc or ~/.zshrc to make it permanent.
How It Works
Architecture
┌─────────────────┐
│   Git Commits   │
└────────┬────────┘
         │ gitsem commit/reindex
         ▼
┌─────────────────────────────────┐
│ Git Notes (refs/notes/semantic) │ ← Source of Truth
│ - Commit metadata               │
│ - Diffs                         │
│ - Vector embeddings (768-dim)   │
└────────┬────────────────────────┘
         │ gitsem pull
         ▼
┌─────────────────┐
│ SQLite (.git/   │ ← Search Index
│ semantic.db)    │
│ - vec0 virtual  │
│   table         │
└────────┬────────┘
         │ gitsem grep
         ▼
┌─────────────────┐
│  Vector Search  │
│     Results     │
└─────────────────┘
Data Flow
- Create Semantic Notes: gitsem commit or reindex generates embeddings and stores them as Git notes
- Sync Across Team: git push origin refs/notes/semantic shares notes with teammates
- Build Search Index: gitsem pull fetches notes and populates the local SQLite database
- Search: gitsem grep performs KNN vector similarity search
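The storage step can be tried by hand with plain Git. The sketch below attaches a note under refs/notes/semantic in a throwaway repository. The JSON body is an invented stand-in for illustration, not gitsem's actual note format.

```shell
# Throwaway repository, so the demo never touches real work
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'fn main() {}' > main.rs
git add main.rs
git commit -q -m "feat: add entry point"

# Attach a note under refs/notes/semantic.
# The JSON body is a made-up stand-in, NOT gitsem's actual note format.
git notes --ref=semantic add -m '{"summary":"add entry point","embedding":[0.1,0.2,0.3]}' HEAD

# Read the note back from the same ref
git notes --ref=semantic show HEAD

# The notes live on their own ref, separate from branches
git show-ref refs/notes/semantic
```

Because the note sits on its own ref, attaching it leaves the commit hash and branch history unchanged. This is why notes can be added to existing commits retroactively.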
Commands
gitsem commit
Create a commit with semantic notes attached.
# Commit with all changes
# Commit staged changes
# Interactive (prompts for message)
What it does:
- Creates a regular Git commit
- Generates embeddings from the diff
- Attaches semantic note to the commit in refs/notes/semantic
gitsem reindex <range>
Add semantic notes to existing commits retroactively.
# Index last 3 commits
# Index all commits since main
# Index specific range
What it does:
- Fetches all commits in the range
- Generates embeddings for each commit's diff
- Attaches semantic notes to existing commits
gitsem pull [remote]
Pull code changes and sync semantic notes.
# Pull from origin (default)
gitsem pull
# Pull from upstream
gitsem pull upstream
What it does:
- Executes git pull
- Fetches refs/notes/semantic from remote
- Rebuilds local SQLite database from notes
gitsem grep <query>
Search code semantically using natural language.
# Basic search
# Limit results
What it does:
- Generates embedding for the query
- Performs KNN vector similarity search
- Returns semantically similar code chunks
gitsem show [commit]
View semantic note attached to a commit.
# Show note for HEAD
gitsem show
# Show note for specific commit
gitsem show <commit>
# Show note for HEAD~2
gitsem show HEAD~2
What it does:
- Displays formatted semantic note
- Shows embedding dimensions
- Previews commit content and diff
Examples
Example 1: New Repository Setup
# Clone a repository
# Index the last 10 commits (use either style)
gitsem reindex HEAD~10..HEAD
# OR: git semantic reindex HEAD~10..HEAD
# Share semantic notes with team
git push origin refs/notes/semantic
Example 2: Daily Development Workflow
# Make changes
# Create commit with semantic notes
gitsem commit -a -m "feat: add JWT token validation"
# OR: git semantic commit -a -m "feat: add JWT token validation"
# Pull teammate's changes and sync semantics
gitsem pull
# OR: git semantic pull
# Search for related code
gitsem grep "token validation logic"
# OR: git semantic grep "token validation logic"
Example 3: Code Review
# View semantic context of a commit
# Search for similar patterns
Example 4: Team Collaboration
# Developer A: Create semantic commits
# Developer B: Pull and sync
Configuration
Environment Variables
OPENAI_API_KEY (Required for real embeddings)
Currently, the embedding generator is a placeholder. To use real embeddings:
- Set your OpenAI API key:
- Add to your shell config (~/.bashrc or ~/.zshrc):
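Both steps use the same export line, sketched here with a placeholder value (substitute your own key). Running it in a terminal sets the key for the current session only. Adding the same line to the shell config file makes it persist.

```shell
# Placeholder value for illustration; replace with a real key
export OPENAI_API_KEY="your-api-key"
```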
Git Configuration
Semantic notes are stored in refs/notes/semantic. To automatically fetch notes:
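One way to do this with plain Git is to add the notes ref to the remote's fetch refspecs. The helper name track_semantic_notes below is hypothetical (it is not a gitsem command). It only wraps a standard git config call.

```shell
# Hypothetical helper (not part of gitsem): register the notes refspec
# on a remote so that every fetch from it also updates refs/notes/semantic.
# Usage: track_semantic_notes [remote]   (remote defaults to origin)
track_semantic_notes() {
  git config --add "remote.${1:-origin}.fetch" "+refs/notes/semantic:refs/notes/semantic"
}
```

Run track_semantic_notes once inside a clone. Afterwards, git config --get-all remote.origin.fetch should list the notes refspec alongside the default branch refspec. Note that this covers fetching the notes only; rebuilding the local search index is still done by gitsem pull.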
Current Limitations
- Placeholder Embeddings: The current implementation uses dummy embeddings (768-dimensional vectors with sequential values). Real LLM API integration (OpenAI, Cohere, etc.) needs to be implemented in src/embed.rs.
- SQLite-vec Integration: The vec0 virtual table is defined but requires the sqlite-vec extension to be loaded at runtime for production vector search.
- No Automatic Sync: Semantic notes must be manually pushed/pulled via git push origin refs/notes/semantic.
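Until automatic syncing exists, the manual round trip looks roughly like the sketch below. It uses only plain Git in throwaway directories. The note text is a stand-in (real notes are written by gitsem commit or reindex), and the explicit git fetch is one way to retrieve notes by hand, not a description of what gitsem pull does internally.

```shell
# Throwaway bare "remote" plus a working repository for Developer A
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git init -q "$work/dev-a"
cd "$work/dev-a"
git config user.name "Developer A"
git config user.email "dev-a@example.com"
git remote add origin "$work/remote.git"

echo "hello" > file.txt
git add file.txt
git commit -q -m "feat: add file"
commit="$(git rev-parse HEAD)"

# Stand-in note; a real one would be written by gitsem commit or reindex
git notes --ref=semantic add -m "placeholder semantic note" HEAD

# Developer A: push the branch, then push the notes ref explicitly
git push -q origin HEAD
git push -q origin refs/notes/semantic

# Developer B: clone, then fetch the notes ref explicitly
git clone -q "$work/remote.git" "$work/dev-b"
cd "$work/dev-b"
git fetch -q origin refs/notes/semantic:refs/notes/semantic
git notes --ref=semantic show "$commit"
```

A plain git clone or git push does not transfer refs/notes/semantic, so both directions need the notes ref named explicitly.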
Development
Project Structure
gitsem/
├── src/
│ ├── main.rs # CLI and command handlers
│ ├── models.rs # CodeChunk data structure
│ ├── db.rs # SQLite database with vec0 table
│ ├── git.rs # Git notes read/write operations
│ └── embed.rs # Embedding generation (placeholder)
├── Cargo.toml
└── README.md
Building
Testing
Installing Locally
Roadmap
- Real embedding API integration (OpenAI, Cohere, local models)
- Load sqlite-vec extension for production vector search
- Automatic note syncing on push/pull
- Support for multiple embedding models
- Web UI for browsing semantic history
- VS Code extension
- GitHub Action for CI/CD integration
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
License
MIT OR Apache-2.0
Acknowledgments
- Built with gix - Pure Rust Git implementation
- Inspired by the need for semantic code search in AI-assisted development