git-indexer
A Rust library for extracting git repository information and indexing it into a Helix DB graph database.
Installation
Add this to your Cargo.toml:
[dependencies]
git-indexer = "0.1.0"
Usage
Standalone Extraction
Extract git information without pushing to Helix DB:
use git_indexer::extract;
use std::path::Path;
Helix DB Integration
Index a repository into a Helix DB instance:
use git_indexer::GitIndexerClient;
async
What It Extracts
Branches (BranchInfo)
- Branch name (local and remote)
- Current HEAD indicator
- Commit SHA the branch points to
Commits (CommitInfo)
- Commit SHA
- Message (title)
- Author (name and email)
- Timestamp
- Parent commit SHAs
- File changes
File Changes (FileChange)
- File path
- Change type (Added, Deleted, Modified, Renamed, Copied)
- Diff hunks with line-by-line changes
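
The record types listed above might be modeled roughly as follows. This is a minimal sketch, not the crate's actual definitions: the names `BranchInfo`, `CommitInfo`, and `FileChange` come from this README, while every field name, the `ChangeType` and `DiffLine` enums, the `Hunk` struct, and the `is_merge` helper are assumptions made for illustration.

```rust
/// Kind of change applied to a file (variants taken from the list above).
#[derive(Debug, Clone, PartialEq)]
pub enum ChangeType {
    Added,
    Deleted,
    Modified,
    Renamed,
    Copied,
}

/// One line inside a diff hunk (hypothetical representation).
#[derive(Debug, Clone, PartialEq)]
pub enum DiffLine {
    Context(String),
    Added(String),
    Removed(String),
}

/// A contiguous block of changes (hypothetical field names).
#[derive(Debug, Clone, PartialEq)]
pub struct Hunk {
    pub old_start: u32,
    pub new_start: u32,
    pub lines: Vec<DiffLine>,
}

/// A single file touched by a commit.
#[derive(Debug, Clone, PartialEq)]
pub struct FileChange {
    pub path: String,
    pub change_type: ChangeType,
    pub hunks: Vec<Hunk>,
}

/// Commit metadata plus the files it changed.
#[derive(Debug, Clone, PartialEq)]
pub struct CommitInfo {
    pub sha: String,
    pub message: String,
    pub author_name: String,
    pub author_email: String,
    pub timestamp: i64, // assumed: seconds since the Unix epoch
    pub parent_shas: Vec<String>,
    pub file_changes: Vec<FileChange>,
}

/// A local or remote branch and the commit it points to.
#[derive(Debug, Clone, PartialEq)]
pub struct BranchInfo {
    pub name: String,
    pub is_remote: bool,
    pub is_head: bool,
    pub commit_sha: String,
}

impl CommitInfo {
    /// A commit with more than one parent is a merge commit.
    pub fn is_merge(&self) -> bool {
        self.parent_shas.len() > 1
    }
}
```

Keeping parent SHAs as a list rather than a single optional value lets the same record describe root commits (no parents), ordinary commits (one parent), and merges (two or more).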
Helix DB Graph Structure
When indexed, the data is stored as:
Nodes:
- Branch - Branch metadata
- Commit - Commit metadata
- FileChange - File modification with diff
Edges:
- Commit -> Commit (parent relationship)
- Branch -> Commit (branch tip)
- Commit -> FileChange (files changed)
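
To make the edge kinds concrete, the following sketch derives them from in-memory records. It is only an illustration under stated assumptions: the `Commit`, `Branch`, and `Edge` types and the `build_edges` function are hypothetical stand-ins, not part of the crate, and nothing is sent to Helix DB here.

```rust
/// Minimal stand-in for an extracted commit (hypothetical fields).
pub struct Commit {
    pub sha: String,
    pub parents: Vec<String>,
    pub files: Vec<String>,
}

/// Minimal stand-in for an extracted branch (hypothetical fields).
pub struct Branch {
    pub name: String,
    pub tip: String,
}

/// The three edge kinds described in this section.
#[derive(Debug, Clone, PartialEq)]
pub enum Edge {
    /// Commit -> Commit, pointing from child to parent.
    Parent { child: String, parent: String },
    /// Branch -> Commit, pointing at the branch tip.
    BranchTip { branch: String, commit: String },
    /// Commit -> FileChange, one per changed file.
    Changed { commit: String, path: String },
}

/// Walks the records once and emits every edge they imply.
pub fn build_edges(commits: &[Commit], branches: &[Branch]) -> Vec<Edge> {
    let mut edges = Vec::new();
    for c in commits {
        for p in &c.parents {
            edges.push(Edge::Parent {
                child: c.sha.clone(),
                parent: p.clone(),
            });
        }
        for f in &c.files {
            edges.push(Edge::Changed {
                commit: c.sha.clone(),
                path: f.clone(),
            });
        }
    }
    for b in branches {
        edges.push(Edge::BranchTip {
            branch: b.name.clone(),
            commit: b.tip.clone(),
        });
    }
    edges
}
```

A merge commit produces one `Parent` edge per parent, so the commit subgraph is a directed acyclic graph rather than a simple chain.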
API Reference
Client Methods
| Method | Description |
|---|---|
| index_repository(path) | Extract and index entire repository |
| create_commit_node(commit) | Create a commit node |
| create_branch_node(branch) | Create a branch node |
| create_file_node(file, commit_id) | Create a file change node |
| create_parent_edge(child, parent) | Create parent relationship |
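
When the lower-level methods are called directly, ordering matters: a parent edge can only connect nodes that already exist, and git history is usually walked from children to parents. The sketch below shows one way to handle that with two passes. It uses a synchronous in-memory stand-in, `RecordingClient`, which is hypothetical; only the method names `create_commit_node` and `create_parent_edge` come from the table above, and their signatures here are assumptions. Whether `index_repository` works this way internally is not stated in this README.

```rust
use std::collections::HashSet;

/// Hypothetical commit record with only the fields this sketch needs.
pub struct Commit {
    pub sha: String,
    pub parents: Vec<String>,
}

/// In-memory stand-in for a graph client. Method names mirror the table
/// above; the signatures are assumptions, and a real client would be async.
#[derive(Default)]
pub struct RecordingClient {
    pub nodes: HashSet<String>,
    pub parent_edges: Vec<(String, String)>,
}

impl RecordingClient {
    /// Records a commit node keyed by its SHA.
    pub fn create_commit_node(&mut self, commit: &Commit) {
        self.nodes.insert(commit.sha.clone());
    }

    /// Refuses to link nodes that have not been created yet.
    pub fn create_parent_edge(&mut self, child: &str, parent: &str) -> Result<(), String> {
        if !self.nodes.contains(child) || !self.nodes.contains(parent) {
            return Err(format!("missing node for edge {} -> {}", child, parent));
        }
        self.parent_edges.push((child.to_string(), parent.to_string()));
        Ok(())
    }
}

/// Two-pass indexing: create every node first, then connect them.
pub fn index_commits(client: &mut RecordingClient, commits: &[Commit]) -> Result<(), String> {
    for c in commits {
        client.create_commit_node(c);
    }
    for c in commits {
        for p in &c.parents {
            client.create_parent_edge(&c.sha, p)?;
        }
    }
    Ok(())
}
```

Because all nodes exist before any edge is created, the input order of `commits` does not matter. A single-pass alternative would need the commits sorted so that parents come first.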
Technical Details
- Pure Rust git implementation via gix
- Efficient diff generation with imara-diff (Histogram algorithm)
- Binary file detection (skips diff for binary files)
- File size limit: 10MB per file
- Async-ready for Helix DB API calls
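
The binary check and size limit above could be combined into a single gate that runs before diffing. The sketch below is an assumption-laden illustration, not the crate's code: it borrows the heuristic git itself uses (a NUL byte within the first 8000 bytes marks a file as binary), reads "10MB" as 10 * 1024 * 1024 bytes, and assumes oversized files are skipped. The names `looks_binary`, `decide`, and `DiffDecision` are hypothetical.

```rust
/// Assumed interpretation of the 10MB limit: 10 * 1024 * 1024 bytes.
pub const MAX_FILE_SIZE: usize = 10 * 1024 * 1024;

/// Number of leading bytes to scan, matching git's own heuristic.
const SNIFF_LEN: usize = 8000;

/// Returns true if a NUL byte appears in the first `SNIFF_LEN` bytes.
pub fn looks_binary(content: &[u8]) -> bool {
    content.iter().take(SNIFF_LEN).any(|&b| b == 0)
}

/// Outcome of the pre-diff gate (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum DiffDecision {
    Diff,
    SkipBinary,
    SkipTooLarge,
}

/// Checks the size limit first so large blobs are never scanned.
pub fn decide(content: &[u8]) -> DiffDecision {
    if content.len() > MAX_FILE_SIZE {
        DiffDecision::SkipTooLarge
    } else if looks_binary(content) {
        DiffDecision::SkipBinary
    } else {
        DiffDecision::Diff
    }
}
```

Scanning only a fixed prefix keeps the check cheap, at the cost of misclassifying a file whose first NUL byte appears later than the scanned window.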
License
MIT License - see LICENSE for details.