# Git Module - High-Performance Pure Rust Git Operations 🦀
This module provides **microsecond-level Git operations** using pure Rust via
[gitoxide (gix)](https://github.com/Byron/gitoxide). It handles repository discovery and
hook-related file discovery in-process, without spawning `git` commands.
## 🚀 Performance Benefits
Our pure gix implementation delivers **massive performance improvements** over traditional
approaches:

| Operation | External `git` command | libgit2 | gix | Speedup vs `git` command |
|---|---|---|---|---|
| Staged Files Discovery | 7.31ms | 140μs | **42μs** | **175x faster** |
| Repository Discovery | 2.5ms | N/A | **350μs** | **7x faster** |
| All Files Listing | 3-5ms | ~500μs | **100-500μs** | **6-50x faster** |

**Real-world impact in Guardy hooks:**
- **Before**: 20.9ms total hook execution
- **After**: 16.5ms total hook execution (**21% faster**)
- Git operations: **~1ms total** (was ~6ms)
## 🏗️ Architecture & Caching
### Intelligent Multi-Level Caching
```rust
use std::path::PathBuf;
use std::sync::{Arc, LazyLock, OnceLock}; // LazyLock is stable since Rust 1.80

// Application-wide repository cache: discovery runs once, on first access.
// `None` means no repository was found from the current directory.
static GIT_REPO: LazyLock<Option<Arc<GitRepo>>> = LazyLock::new(|| {
    GitRepo::discover().ok().map(Arc::new)
});

// Per-hook execution cache: each list is computed on first access, then reused.
pub struct HookFileCache {
    staged_files: OnceLock<Vec<PathBuf>>,
    all_files: OnceLock<Vec<PathBuf>>,
    push_files: OnceLock<Vec<PathBuf>>,
}
```
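`OnceLock::get_or_init` is what makes the per-hook layer compute-once: the initializer runs on the first call and later calls return the stored value. A minimal std-only model, not this module's code; `FileCache`, its loader closure, and the `loads` counter are hypothetical stand-ins for `HookFileCache` and a real gix query:

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for `HookFileCache`: one OnceLock per file list.
pub struct FileCache {
    staged_files: OnceLock<Vec<PathBuf>>,
    /// Counts loader invocations so the caching behaviour is observable.
    pub loads: AtomicUsize,
}

impl FileCache {
    pub fn new() -> Self {
        Self { staged_files: OnceLock::new(), loads: AtomicUsize::new(0) }
    }

    /// Returns the cached list, running `load` only on the very first call.
    /// Loaders passed on later calls are dropped without being executed.
    pub fn staged_files(&self, load: impl FnOnce() -> Vec<PathBuf>) -> &[PathBuf] {
        self.staged_files.get_or_init(|| {
            self.loads.fetch_add(1, Ordering::SeqCst);
            load()
        })
    }
}
```

Calling `staged_files` twice with different loaders returns the first result both times and leaves `loads` at 1, which is the behaviour the `OnceLock` fields above rely on.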
### Thread-Safe Design
- Uses `gix::ThreadSafeRepository` for concurrent access
- `Arc<GitRepo>` for safe sharing across threads
- Optimized for parallel hook execution
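With shared data behind an `Arc`, parallel hooks can read it without copying: each thread receives a clone of the pointer, not of the `Vec`. A std-only sketch under that assumption; `run_checks_in_parallel` and the plain `fn` checks are hypothetical and do not touch gix:

```rust
use std::path::PathBuf;
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: runs every check on its own thread against one shared,
/// read-only file list. Only the `Arc` pointer is cloned per thread.
pub fn run_checks_in_parallel(
    files: Arc<Vec<PathBuf>>,
    checks: Vec<fn(&[PathBuf]) -> usize>,
) -> Vec<usize> {
    let handles: Vec<_> = checks
        .into_iter()
        .map(|check| {
            let files = Arc::clone(&files); // refcount bump, no data copy
            thread::spawn(move || check(files.as_slice()))
        })
        .collect();
    // Join in spawn order so each result lines up with its check.
    handles
        .into_iter()
        .map(|h| h.join().expect("check thread panicked"))
        .collect()
}
```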
## 📁 Module Structure
### Core Implementation
- **`mod.rs`** - Main `GitRepo` struct with gix integration
- **`operations.rs`** - High-performance file discovery operations
- **`remote.rs`** - Remote repository operations
### Future Extensions
- **`hooks.rs`** - Git hook installation and management (planned)
- **`commit.rs`** - Commit-related operations (planned)
## 🔧 Current API
### GitRepo (mod.rs)
```rust
impl GitRepo {
pub fn discover() -> Result<Self> // Find repo from current directory
pub fn open(path: &Path) -> Result<Self> // Open repo at specific path
pub fn current_branch(&self) -> Result<String> // Get current branch name
pub fn git_dir(&self) -> PathBuf // Get .git directory path
}
```
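`GitRepo::discover()` finds the repository from the current directory. As a rough illustration of the idea only, the sketch below walks up the ancestors of a path until one contains a `.git` entry. `find_repo_root` is a hypothetical helper; a full implementation also has to handle bare repositories, `GIT_DIR`, and ceiling directories, which this sketch ignores:

```rust
use std::path::{Path, PathBuf};

/// Simplified model of repository discovery: walk from `start` up through its
/// ancestors (lexically, without resolving symlinks) and return the first
/// directory that contains a `.git` entry.
pub fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| dir.join(".git").exists())
        .map(Path::to_path_buf)
}
```

Starting from a nested directory, the nearest ancestor containing `.git` wins.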
### High-Performance Operations (operations.rs)
```rust
impl GitRepo {
// 🚀 42μs - Primary use case for pre-commit hooks
pub fn get_staged_files(&self) -> Result<Vec<PathBuf>>
// 🚀 100-500μs - All tracked files in repository
pub fn get_all_files(&self) -> Result<Vec<PathBuf>>
// For pre-push hooks (TODO: implement proper diff)
pub fn get_push_files(&self, remote: &str, branch: &str) -> Result<Vec<PathBuf>>
// TODO: Implement with gix status API
pub fn get_modified_files(&self) -> Result<Vec<PathBuf>>
}
```
## ⚡ Implementation Details
### Staged Files Detection Algorithm
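Our implementation compares each Git index entry directly with the HEAD tree; the working tree is never read, which keeps the operation fast:
```rust
pub fn get_staged_files(&self) -> Result<Vec<PathBuf>> {
    let repo = self.gix_repo();
    let index = repo.index()?;
    let mut staged_files = Vec::new();

    // Unborn HEAD (no commits yet): there is no tree, so every index entry is staged.
    // Note: this arm also swallows any other error from the HEAD lookup.
    let head_tree = match repo.head_tree_id() {
        Ok(tree_id) => Some(repo.find_tree(tree_id)?),
        Err(_) => None,
    };

    // Method names/signatures follow recent gix releases and may differ by version.
    for entry in index.entries() {
        // Index paths are repo-relative byte strings; convert to a filesystem path.
        let path = gix::path::from_bstr(entry.path(&index)).into_owned();
        let is_staged = match &head_tree {
            None => true, // initial commit: everything in the index is new
            Some(tree) => match tree.lookup_entry_by_path(&path) {
                // Same path in HEAD: staged only if the content id differs.
                Ok(Some(tree_entry)) => entry.id != tree_entry.object_id(),
                // Not in HEAD (or lookup failed): treat as newly added.
                Ok(None) | Err(_) => true,
            },
        };
        if is_staged {
            staged_files.push(path);
        }
    }
    // Only index entries are visited, so staged deletions (paths in HEAD but
    // removed from the index) are not reported by this loop.
    Ok(staged_files)
}
```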
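The loop above visits index entries only, so a file removed with `git rm` (present in HEAD, absent from the index) is never reported. The std-only model below makes the comparison explicit with flat maps from path to blob id. `staged_paths` and the `Oid` alias are hypothetical, the HEAD tree is flattened rather than walked, and mode-only changes are ignored; it adds a second pass over HEAD to catch deletions:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical 20-byte object id; in gix this role is played by `gix::ObjectId`.
type Oid = [u8; 20];

/// Std-only model of the staged-files comparison. `head` maps HEAD-tree paths to
/// blob ids (`None` = unborn branch, i.e. initial commit); `index` maps staged
/// paths to blob ids. Returns added, modified, and deleted paths, sorted.
pub fn staged_paths(
    head: Option<&BTreeMap<String, Oid>>,
    index: &BTreeMap<String, Oid>,
) -> BTreeSet<String> {
    let empty = BTreeMap::new();
    let head = head.unwrap_or(&empty);
    let mut staged = BTreeSet::new();
    // Added or modified: in the index, and absent from HEAD or with a different id.
    for (path, id) in index {
        if head.get(path) != Some(id) {
            staged.insert(path.clone());
        }
    }
    // Deleted: in HEAD but no longer in the index. An index-driven loop cannot
    // see these, so they need this second pass over HEAD.
    for path in head.keys() {
        if !index.contains_key(path) {
            staged.insert(path.clone());
        }
    }
    staged
}
```

With HEAD = {`kept.rs`, `edited.rs`, `gone.rs`} and an index where `edited.rs` has a new id and `new.rs` was added, the result is `edited.rs`, `gone.rs`, and `new.rs`; passing `None` for HEAD reports every index path, matching the initial-commit case.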
### Why Not External Commands?
- **Process spawning overhead**: roughly 2.5-7ms per `git` command in the measurements above
- **Parsing overhead**: Command output must be decoded and parsed back into paths
- **Repeated setup**: Every `git` process rediscovers the repository and re-reads config and the index
- **Error handling complexity**: Process exit codes and stderr parsing
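For contrast, the external-command approach looks roughly like this. It is a hypothetical sketch, not code from this module: each call spawns a process, and the caller still has to check the exit status, surface stderr, and decode raw bytes into paths. `-z` asks `git diff` for NUL-separated output so paths containing spaces or newlines parse correctly:

```rust
use std::io;
use std::path::PathBuf;
use std::process::Command;

/// Baseline the list above argues against: spawn `git` and parse its output.
pub fn staged_files_via_cli() -> io::Result<Vec<PathBuf>> {
    let out = Command::new("git")
        .args(["diff", "--cached", "--name-only", "-z"])
        .output()?;
    if !out.status.success() {
        // Exit code and stderr have to be interpreted by hand.
        return Err(io::Error::new(
            io::ErrorKind::Other,
            String::from_utf8_lossy(&out.stderr).into_owned(),
        ));
    }
    Ok(parse_nul_separated(&out.stdout))
}

/// Splits NUL-terminated git output into paths, skipping the trailing empty chunk.
/// Non-UTF-8 file names are converted lossily in this sketch.
pub fn parse_nul_separated(stdout: &[u8]) -> Vec<PathBuf> {
    stdout
        .split(|&b| b == 0)
        .filter(|chunk| !chunk.is_empty())
        .map(|chunk| PathBuf::from(String::from_utf8_lossy(chunk).into_owned()))
        .collect()
}
```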
### Why gix Over libgit2?
- ✅ **~3.3x faster** than libgit2 for staged files (42μs vs 140μs)
- ✅ **Pure Rust** - no C dependencies or FFI overhead
- ✅ **Memory safe** - avoids inheriting memory-safety bugs from C code
- ✅ **Better error handling** - operation-specific Rust error types instead of C error codes
- ✅ **Thread safety** - `ThreadSafeRepository` can be shared across threads
- ✅ **Maintenance** - Active pure-Rust development
## 🧪 Performance Benchmarking
Run the included benchmark to compare all approaches:
```bash
# Run performance comparison
cargo run --example git_timing_test --release
# Sample output (timings vary with hardware and repository size):
# Command: 7.307ms
# libgit2: 140.213μs (52x faster)
# gix: 41.77μs (175x faster) ← Winner!
```
For comprehensive benchmarks:
```bash
cargo bench --bench git_performance_comparison
```
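The benchmark sources are not shown in this README. As a hedged sketch of how per-call figures like these are commonly collected, the hypothetical `median_time` helper below discards one warm-up call, times repeated calls with `Instant`, and reports the median, which is less sensitive than the mean to occasional scheduling or page-fault stalls:

```rust
use std::time::{Duration, Instant};

/// Hypothetical micro-benchmark helper: runs `f` once to warm up, then `runs`
/// more times, and returns the median wall-clock duration of the timed runs.
pub fn median_time<T>(runs: usize, mut f: impl FnMut() -> T) -> Duration {
    assert!(runs > 0, "need at least one timed run");
    std::hint::black_box(f()); // warm-up: file cache, lazy statics, allocations
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            std::hint::black_box(f()); // keep the result from being optimized away
            start.elapsed()
        })
        .collect();
    samples.sort();
    median_of_sorted(&samples)
}

/// Median of a sorted, non-empty slice (mean of the two middle values for even lengths).
pub fn median_of_sorted(samples: &[Duration]) -> Duration {
    let mid = samples.len() / 2;
    if samples.len() % 2 == 0 {
        (samples[mid - 1] + samples[mid]) / 2
    } else {
        samples[mid]
    }
}
```

For example, `median_time(100, || repo.get_staged_files())` would time the staged-files query, assuming a `repo` value is in scope.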
## 🎯 Design Principles
1. **Microsecond Performance** - Sub-millisecond operations for all Git queries
2. **Pure Rust** - Zero C dependencies, maximum safety and performance
3. **Smart Caching** - Multi-level caching for optimal repeated access
4. **Thread Safety** - Designed for concurrent hook execution
5. **Correctness First** - Proper Git semantics, not shortcuts
6. **Commercial Grade** - Production-ready error handling and logging
## 🔄 Integration Patterns
### Recommended Usage
```rust
// Get cached repository (fast - uses LazyLock)
let repo = get_cached_git_repo()?;
// Create per-hook cache (enables OnceLock caching)
let cache = HookFileCache::new(repo, "pre-commit");
cache.precompute(); // Start background loading
// Get files (42μs on the first call; later calls are a cached OnceLock read)
let staged_files = cache.get_staged_files();
```
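`precompute()` is described as starting background loading. One way that can work with `OnceLock` is sketched here with hypothetical names (`Precomputed`, a plain `fn` loader) rather than the real `HookFileCache`: a background thread calls `get_or_init`, and a foreground `get` either returns the finished value or blocks until the in-flight initialization completes. `OnceLock` guarantees the loader runs only once either way:

```rust
use std::path::PathBuf;
use std::sync::{Arc, OnceLock};
use std::thread::{self, JoinHandle};

/// Hypothetical model of `precompute()`: warm a OnceLock on a background thread.
pub struct Precomputed {
    staged: OnceLock<Vec<PathBuf>>,
    loader: fn() -> Vec<PathBuf>,
}

impl Precomputed {
    pub fn new(loader: fn() -> Vec<PathBuf>) -> Arc<Self> {
        Arc::new(Self { staged: OnceLock::new(), loader })
    }

    /// Starts loading on a background thread and returns immediately.
    pub fn precompute(this: &Arc<Self>) -> JoinHandle<()> {
        let this = Arc::clone(this);
        thread::spawn(move || {
            let _ = this.get();
        })
    }

    /// Returns the list. If the background load is still running, this blocks
    /// until it finishes; `get_or_init` never runs the loader twice.
    pub fn get(&self) -> &[PathBuf] {
        self.staged.get_or_init(self.loader)
    }
}
```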
### Anti-Patterns to Avoid
```rust
// ❌ DON'T: Create new GitRepo instances repeatedly
let repo = GitRepo::discover()?; // 350μs each time
// ❌ DON'T: Call get_staged_files() multiple times without caching
for _ in 0..10 {
let files = repo.get_staged_files()?; // 42μs × 10 = 420μs
}
// ✅ DO: Reuse the cached repository and the per-hook caching layer
let repo = get_cached_git_repo()?;
let cache = HookFileCache::new(repo, "pre-commit");
let files = cache.get_staged_files(); // 42μs once, then cached
```
## 🚧 Future Enhancements
### Planned Features
- **Modified files detection** using gix status API
- **Proper push files diff** between local and remote branches
- **Git hooks management** for programmatic hook installation
- **Commit operations** for automated commit workflows
- **Branch operations** for advanced Git workflows
### Performance Targets
- [ ] Sub-10μs staged file detection for small repositories
- [ ] Sub-100μs operations for repositories with 10k+ files
- [ ] Parallel file discovery for large monorepos
- [ ] Memory-mapped index reading for extreme performance
## 🔍 When to Extend This Module
### Add to `mod.rs` when:
- Adding fundamental repository operations
- Extending GitRepo with new core capabilities
- Adding repository discovery/initialization logic
### Add to `operations.rs` when:
- Adding new file discovery operations
- Implementing Git status/diff functionality
- Creating file listing optimizations
### Add to `remote.rs` when:
- Adding remote repository operations
- Implementing push/pull functionality
- Managing remote branch comparisons
---
**This module powers Guardy's commercial-grade Git operations with microsecond-level performance.
The pure Rust implementation ensures memory safety, thread safety, and exceptional speed for
production workloads.** 🚀