# git-indexer
A Rust library for extracting git repository information and indexing it into a Helix DB graph database.
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
git-indexer = "0.1.0"
```
## Usage
### Standalone Extraction
Extract git information without pushing to Helix DB:
```rust
use git_indexer::extraction::extract;
use std::path::Path;

fn main() -> git_indexer::Result<()> {
    // Read branches and commits from a local repository.
    let git_info = extract(Path::new("/path/to/repo"))?;

    println!("Branches: {}", git_info.branches.len());
    println!("Commits: {}", git_info.commits.len());

    // Print an abbreviated SHA and the message for each commit.
    for commit in &git_info.commits {
        println!("{}: {}", &commit.id[..8], commit.message);
    }

    Ok(())
}
```
### Helix DB Integration
Index a repository into a Helix DB instance:
```rust
use git_indexer::GitIndexerClient;

#[tokio::main]
async fn main() -> git_indexer::Result<()> {
    // Point the client at a running Helix DB instance.
    let client = GitIndexerClient::builder()
        .endpoint("http://localhost:6969")
        .build()?;

    // Extract and push to Helix DB
    let git_info = client.index_repository("/path/to/repo").await?;
    println!("Indexed {} commits", git_info.commits.len());

    Ok(())
}
```
## What It Extracts
### Branches (`BranchInfo`)
- Branch name (local and remote)
- Current HEAD indicator
- Commit SHA the branch points to
### Commits (`CommitInfo`)
- Commit SHA
- Message (title)
- Author (name and email)
- Timestamp
- Parent commit SHAs
- File changes
### File Changes (`FileChange`)
- File path
- Change type (Added, Deleted, Modified, Renamed, Copied)
- Diff hunks with line-by-line changes
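The lists above can be pictured as plain Rust data. The following is a minimal sketch, not the crate's actual definitions: only `commit.id` and `commit.message` appear in the usage examples, so every other field name, the `timestamp` type, and the helpers `short_id` and `count_changes` are hypothetical and chosen for illustration. Diff hunks are left out to keep the sketch short.

```rust
// Hypothetical data model. Only `id` and `message` are confirmed by the
// usage examples; all other names and types are illustrative assumptions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeType {
    Added,
    Deleted,
    Modified,
    Renamed,
    Copied,
}

#[derive(Debug, Clone)]
pub struct FileChange {
    pub path: String,
    pub change_type: ChangeType,
}

#[derive(Debug, Clone)]
pub struct BranchInfo {
    pub name: String,
    pub is_head: bool,     // assumed name for the "current HEAD" indicator
    pub commit_id: String, // SHA the branch points to
}

#[derive(Debug, Clone)]
pub struct CommitInfo {
    pub id: String,
    pub message: String,
    pub author_name: String,
    pub author_email: String,
    pub timestamp: i64, // assumed: seconds since the Unix epoch
    pub parent_ids: Vec<String>,
    pub file_changes: Vec<FileChange>,
}

/// Returns at most the first 8 characters of a SHA.
/// Unlike `&id[..8]`, this does not panic on input shorter than 8 bytes.
pub fn short_id(id: &str) -> &str {
    id.get(..8).unwrap_or(id)
}

/// Counts the file changes of a given type in one commit.
pub fn count_changes(commit: &CommitInfo, kind: ChangeType) -> usize {
    commit
        .file_changes
        .iter()
        .filter(|change| change.change_type == kind)
        .count()
}
```

With types shaped like these, `count_changes(&commit, ChangeType::Added)` would give the number of files a commit added, and `short_id(&commit.id)` gives a safe abbreviated SHA for display.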
## Helix DB Graph Structure
When indexed, the data is stored as:

**Nodes:**
- `Branch` - Branch metadata
- `Commit` - Commit metadata
- `FileChange` - File modification with diff

**Edges:**
- `Commit -> Commit` (parent relationship)
- `Branch -> Commit` (branch tip)
- `Commit -> FileChange` (files changed)
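To make the three edge kinds concrete, here is a small in-memory sketch that derives an edge list from extracted data. It does not talk to Helix DB and does not use the crate's types: `Commit`, `Branch`, `Edge`, and `build_edges` are hypothetical names introduced only for this illustration.

```rust
// Hypothetical, simplified inputs; not the crate's own types.
pub struct Commit {
    pub id: String,
    pub parent_ids: Vec<String>,
    pub changed_paths: Vec<String>,
}

pub struct Branch {
    pub name: String,
    pub commit_id: String,
}

/// One variant per edge kind listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Edge {
    /// Commit -> Commit (parent relationship)
    Parent { child: String, parent: String },
    /// Branch -> Commit (branch tip)
    BranchTip { branch: String, commit: String },
    /// Commit -> FileChange (files changed)
    Changed { commit: String, path: String },
}

/// Derives every edge implied by the given branches and commits.
pub fn build_edges(branches: &[Branch], commits: &[Commit]) -> Vec<Edge> {
    let mut edges = Vec::new();

    // Each branch points at exactly one tip commit.
    for branch in branches {
        edges.push(Edge::BranchTip {
            branch: branch.name.clone(),
            commit: branch.commit_id.clone(),
        });
    }

    for commit in commits {
        // A merge commit has several parents, a root commit has none.
        for parent in &commit.parent_ids {
            edges.push(Edge::Parent {
                child: commit.id.clone(),
                parent: parent.clone(),
            });
        }
        // One edge per file touched by the commit.
        for path in &commit.changed_paths {
            edges.push(Edge::Changed {
                commit: commit.id.clone(),
                path: path.clone(),
            });
        }
    }

    edges
}
```

For a repository with one branch and two linear commits that each touch one file, this yields four edges: one branch tip, one parent link, and two file-change links.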
## API Reference
### Client Methods
| Method | Description |
| --- | --- |
| `index_repository(path)` | Extract and index entire repository |
| `create_commit_node(commit)` | Create a commit node |
| `create_branch_node(branch)` | Create a branch node |
| `create_file_node(file, commit_id)` | Create a file change node |
| `create_parent_edge(child, parent)` | Create parent relationship |
## Technical Details
- Pure Rust git implementation via `gix`
- Efficient diff generation with `imara-diff` (Histogram algorithm)
- Binary file detection (skips diff for binary files)
- File size limit: 10 MB per file
- Async-ready for Helix DB API calls
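The binary-file and size checks listed above could look roughly like the sketch below. This is an assumption about the general approach, not the crate's actual code: scanning a leading window for NUL bytes is a common heuristic for spotting binary content, and the names `MAX_FILE_SIZE`, `SNIFF_LEN`, `looks_binary`, and `should_diff` are hypothetical. Whether "10MB" means 10 × 1024 × 1024 bytes or 10,000,000 bytes is also not stated; the sketch assumes the former.

```rust
/// Assumed interpretation of the 10 MB limit (10 * 1024 * 1024 bytes).
pub const MAX_FILE_SIZE: usize = 10 * 1024 * 1024;

/// Number of leading bytes inspected for NUL bytes (illustrative value).
pub const SNIFF_LEN: usize = 8000;

/// Heuristic: treat content as binary if a NUL byte appears
/// within the first `SNIFF_LEN` bytes.
pub fn looks_binary(content: &[u8]) -> bool {
    content.iter().take(SNIFF_LEN).any(|&byte| byte == 0)
}

/// Returns true when a diff should be generated for this content:
/// it must be within the size limit and must not look binary.
pub fn should_diff(content: &[u8]) -> bool {
    content.len() <= MAX_FILE_SIZE && !looks_binary(content)
}
```

Checking the size first keeps the common rejection path cheap, and limiting the NUL scan to a fixed window bounds the cost for large text files.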
## License
MIT License - see [LICENSE](LICENSE) for details.