git-indexer 0.1.0

Git repository indexer for Helix DB graph database

git-indexer

A Rust library for extracting git repository information and indexing it into a Helix DB graph database.

Installation

Add this to your Cargo.toml:

[dependencies]
git-indexer = "0.1.0"

Usage

Standalone Extraction

Extract git information without pushing to Helix DB:

use git_indexer::extraction::extract;
use std::path::Path;

fn main() -> git_indexer::Result<()> {
    let git_info = extract(Path::new("/path/to/repo"))?;

    println!("Branches: {}", git_info.branches.len());
    println!("Commits: {}", git_info.commits.len());

    for commit in &git_info.commits {
        println!("{}: {}", &commit.id[..8], commit.message);
    }

    Ok(())
}

Helix DB Integration

Index a repository into a Helix DB instance:

use git_indexer::GitIndexerClient;

#[tokio::main]
async fn main() -> git_indexer::Result<()> {
    let client = GitIndexerClient::builder()
        .endpoint("http://localhost:6969")
        .build()?;

    // Extract and push to Helix DB
    let git_info = client.index_repository("/path/to/repo").await?;
    
    println!("Indexed {} commits", git_info.commits.len());
    Ok(())
}

What It Extracts

Branches (BranchInfo)

  • Branch name (local and remote)
  • Current HEAD indicator
  • Commit SHA the branch points to

Commits (CommitInfo)

  • Commit SHA
  • Message (title)
  • Author (name and email)
  • Timestamp
  • Parent commit SHAs
  • File changes

File Changes (FileChange)

  • File path
  • Change type (Added, Deleted, Modified, Renamed, Copied)
  • Diff hunks with line-by-line changes
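
To make the lists above concrete, here is a minimal sketch of how the extracted data could be modelled. Only branches, commits, id and message appear in the usage examples; every other field name and type below (author_name, parent_ids, hunks, is_remote and so on), along with the helpers is_merge and paths_with, is a hypothetical stand-in and may not match the crate's actual definitions.

```rust
// Hypothetical shapes mirroring the lists above. Field names and types are
// assumptions for illustration, not the crate's real definitions.

#[derive(Debug, Clone, PartialEq)]
pub enum ChangeType {
    Added,
    Deleted,
    Modified,
    Renamed,
    Copied,
}

#[derive(Debug, Clone, PartialEq)]
pub struct FileChange {
    pub path: String,
    pub change_type: ChangeType,
    /// One inner Vec per diff hunk, holding that hunk's changed lines.
    pub hunks: Vec<Vec<String>>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CommitInfo {
    pub id: String,
    pub message: String,
    pub author_name: String,
    pub author_email: String,
    /// Seconds since the Unix epoch (assumed representation).
    pub timestamp: i64,
    pub parent_ids: Vec<String>,
    pub file_changes: Vec<FileChange>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct BranchInfo {
    pub name: String,
    pub is_remote: bool,
    pub is_head: bool,
    /// SHA of the commit the branch points to.
    pub commit_id: String,
}

/// A commit with more than one parent is a merge commit.
pub fn is_merge(commit: &CommitInfo) -> bool {
    commit.parent_ids.len() > 1
}

/// Paths of all file changes in `commit` that have the given change type.
pub fn paths_with<'a>(commit: &'a CommitInfo, kind: &ChangeType) -> Vec<&'a str> {
    commit
        .file_changes
        .iter()
        .filter(|change| &change.change_type == kind)
        .map(|change| change.path.as_str())
        .collect()
}
```

With shapes like these, questions such as "which commits are merges" or "which files did this commit add" become simple filters over the extracted data, without any database involved.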

Helix DB Graph Structure

When indexed, the data is stored as:

Nodes:

  • Branch - Branch metadata
  • Commit - Commit metadata
  • FileChange - File modification with diff

Edges:

  • Commit -> Commit (parent relationship)
  • Branch -> Commit (branch tip)
  • Commit -> FileChange (files changed)
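
The three edge kinds listed above can be derived directly from the extracted data. The sketch below shows one way to do that in memory. The Commit, Branch and Edge types and the build_edges function are hypothetical and are not part of the crate. The child-to-parent direction of the parent edge is an assumption based on the create_parent_edge(child, parent) signature.

```rust
// Minimal stand-ins for the extracted data (hypothetical, for illustration only).
pub struct Commit {
    pub id: String,
    pub parent_ids: Vec<String>,
    pub changed_paths: Vec<String>,
}

pub struct Branch {
    pub name: String,
    pub commit_id: String,
}

/// One variant per edge kind in the graph structure described above.
#[derive(Debug, Clone, PartialEq)]
pub enum Edge {
    /// Commit -> Commit (assumed direction: child points at parent)
    Parent { child: String, parent: String },
    /// Branch -> Commit (branch tip)
    BranchTip { branch: String, commit: String },
    /// Commit -> FileChange (identified here by path)
    Changed { commit: String, path: String },
}

/// Derive every edge implied by the given branches and commits.
pub fn build_edges(branches: &[Branch], commits: &[Commit]) -> Vec<Edge> {
    let mut edges = Vec::new();
    for commit in commits {
        // One parent edge per parent SHA; merge commits produce several.
        for parent in &commit.parent_ids {
            edges.push(Edge::Parent {
                child: commit.id.clone(),
                parent: parent.clone(),
            });
        }
        // One edge per file touched by the commit.
        for path in &commit.changed_paths {
            edges.push(Edge::Changed {
                commit: commit.id.clone(),
                path: path.clone(),
            });
        }
    }
    // Each branch contributes exactly one edge to its tip commit.
    for branch in branches {
        edges.push(Edge::BranchTip {
            branch: branch.name.clone(),
            commit: branch.commit_id.clone(),
        });
    }
    edges
}
```

A root commit has no parents and therefore produces no parent edge, so walking parent edges from any branch tip ends at the start of the history.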

API Reference

Client Methods

Method                              Description
index_repository(path)              Extract and index entire repository
create_commit_node(commit)          Create a commit node
create_branch_node(branch)          Create a branch node
create_file_node(file, commit_id)   Create a file change node
create_parent_edge(child, parent)   Create parent relationship

Technical Details

  • Pure Rust git implementation via gix
  • Efficient diff generation with imara-diff (Histogram algorithm)
  • Binary file detection (skips diff for binary files)
  • File size limit: 10 MB per file
  • Async-ready for Helix DB API calls
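
The binary check and the size limit together decide whether a file gets a diff. The sketch below shows one plausible way to make that decision; it is not the crate's actual code. It assumes the limit means 10 * 1024 * 1024 bytes, that files over the limit get no diff, and that binary detection uses the heuristic Git itself uses (a NUL byte in the first 8000 bytes). The names MAX_FILE_SIZE, looks_binary, DiffDecision and decide are hypothetical.

```rust
/// Assumed interpretation of the 10 MB limit (binary megabytes).
pub const MAX_FILE_SIZE: usize = 10 * 1024 * 1024;

/// Number of leading bytes inspected for NUL, as in Git's own heuristic.
/// Whether git-indexer uses the same heuristic is an assumption.
const BINARY_SNIFF_LEN: usize = 8000;

/// Returns true if a NUL byte appears in the first BINARY_SNIFF_LEN bytes.
pub fn looks_binary(content: &[u8]) -> bool {
    let end = content.len().min(BINARY_SNIFF_LEN);
    content[..end].contains(&0)
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DiffDecision {
    /// Text file within the size limit: generate diff hunks.
    Diff,
    /// Binary content: record the change but skip the diff.
    SkipBinary,
    /// Larger than MAX_FILE_SIZE: skip the diff (assumed behaviour).
    SkipTooLarge,
}

/// Decide how a file's content should be handled before diffing.
pub fn decide(content: &[u8]) -> DiffDecision {
    // Check size first so oversized files are never scanned.
    if content.len() > MAX_FILE_SIZE {
        DiffDecision::SkipTooLarge
    } else if looks_binary(content) {
        DiffDecision::SkipBinary
    } else {
        DiffDecision::Diff
    }
}
```

Checking the size before sniffing for NUL bytes keeps the cheap test first. A file exactly at the limit is still diffed in this sketch, because the comparison is strict.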

License

MIT License - see LICENSE for details.