Loregrep
Fast Repository Indexing Library for Coding Assistants
Loregrep is a Rust library with Python bindings that parses codebases into fast, searchable in-memory indexes. It's designed to provide coding assistants and AI tools with structured access to code functions, structures, dependencies, and call graphs.
Table of Contents
- Installation
- Quick Start
- What It Does
- Use Cases
- Language Support
- API Examples
- Available Tools
- Performance
- Architecture
- Examples
- Development Setup
- Contributing
- License
Installation
Rust Users
Python Users
System Requirements
- Rust: 1.70 or later
- Python: 3.8 or later
- Memory: 100MB+ available RAM (scales with repository size)
- OS: Linux, macOS, Windows
Quick Start
30-Second Example
First, clone the loregrep repository to try the examples:
Rust:
use loregrep::LoreGrep;
use serde_json::json;
async
Python:
# Create and scan the current repository
=
= await
=
# Search for functions containing "main"
= await
Expected Output (when run in loregrep repository):
Scanned loregrep repository
42 files analyzed, 156 functions found
Found functions:
main (src/main.rs:15) - Entry point for CLI application
cli_main (src/cli_main.rs:8) - CLI entry point wrapper
async_main (src/internal/ai/conversation.rs:45) - Main conversation loop
...
Try Your Own Repository
Once you understand the basics, scan your own projects:
# Scan any repository
= await
What It Does
- Parses code files using tree-sitter for accurate syntax analysis
- Indexes functions, structs, imports, exports, and relationships in memory
- Provides 6 standardized tools that coding assistants can call to query the codebase
- Enables AI systems to understand code structure without re-parsing
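The "index once, query many times" idea in the list above can be illustrated with a toy in-memory index. This is only a sketch: `FunctionEntry`, `FunctionIndex`, and the `parse_config` entry are hypothetical names made up for illustration and are not part of Loregrep's API. The `main` and `cli_main` entries reuse the locations shown in the Quick Start output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionEntry:
    """One indexed function (hypothetical record shape)."""
    name: str
    file_path: str
    line: int


class FunctionIndex:
    """Toy in-memory index: built once, then queried without re-parsing."""

    def __init__(self):
        self._by_name = defaultdict(list)

    def add(self, entry: FunctionEntry) -> None:
        self._by_name[entry.name].append(entry)

    def search(self, fragment: str) -> list:
        """Return entries whose name contains `fragment`, sorted by name then file."""
        hits = [
            entry
            for name, entries in self._by_name.items()
            if fragment in name
            for entry in entries
        ]
        return sorted(hits, key=lambda e: (e.name, e.file_path))


index = FunctionIndex()
index.add(FunctionEntry("main", "src/main.rs", 15))
index.add(FunctionEntry("cli_main", "src/cli_main.rs", 8))
index.add(FunctionEntry("parse_config", "src/config.rs", 3))  # hypothetical entry

for hit in index.search("main"):
    print(f"{hit.name} ({hit.file_path}:{hit.line})")
```

Every query after the initial `add` calls is a dictionary scan, so no source file is read or parsed again.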
What It's NOT
- Not an AI tool itself (provides data TO AI systems)
- Not a traditional code analysis tool (no linting, metrics, complexity analysis)
- Not a replacement for LSP servers (different use case)
Use Cases
Build Your Own Code Assistant
Create AI-powered tools that understand your codebase:
# Your custom coding assistant
=
= await
# Assistant can now search functions, analyze files, and understand relationships
Enhance Existing AI Tools
Add structured code understanding to chatbots and AI applications:
- Before: AI sees code as plain text
- After: AI understands functions, classes, imports, and relationships
Smart Code Search
Build advanced code search beyond text matching:
# Find all functions that call a specific function
= await
Repository Analysis
Understand codebase structure and complexity:
# Get complete repository overview
= await
Language Support
Language | Status | Functions | Structs/Classes | Imports | Calls |
---|---|---|---|---|---|
Rust | Full | Yes | Yes | Yes | Yes |
Python | Planned | Planned | Planned | Planned | Planned |
TypeScript | Planned | Planned | Planned | Planned | Planned |
JavaScript | Planned | Planned | Planned | Planned | Planned |
Go | Future | Future | Future | Future | Future |
Want to contribute a language analyzer? See Contributing
API Examples
Rust API
use ;
use json;
// Configure with builder pattern
let mut loregrep = builder
.max_file_size // 1MB max file size
.max_depth // Maximum 10 directory levels
.file_patterns // Include only these files
.exclude_patterns // Skip these dirs
.respect_gitignore // Honor .gitignore
.build?;
// Scan repository (use "." for current directory)
let scan_result = loregrep.scan.await?;
println!;
// Get available tools for LLM integration
let tools: = get_tool_definitions;
// Execute tools
let functions = loregrep.execute_tool.await?;
let file_analysis = loregrep.execute_tool.await?;
Python API
# Configure with builder pattern
=
# Scan repository (use "." for current directory)
= await
# Get available tools
=
# Execute tools
= await
= await
Available Tools
Loregrep provides 6 standardized tools designed for LLM integration:
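On the assistant side, tool calls usually arrive as a tool name plus a JSON argument string. A minimal dispatch sketch might look like the following. The handler functions and the `pattern` and `max_depth` argument names are assumptions for illustration only; the real argument schemas come from the library's tool definitions.

```python
import json


# Hypothetical handlers; a real integration would query the index here.
def handle_search_functions(args: dict) -> dict:
    return {"tool": "search_functions", "pattern": args.get("pattern", "")}


def handle_get_repository_tree(args: dict) -> dict:
    return {"tool": "get_repository_tree", "max_depth": args.get("max_depth", 1)}


TOOL_HANDLERS = {
    "search_functions": handle_search_functions,
    "get_repository_tree": handle_get_repository_tree,
}


def dispatch_tool_call(name: str, raw_args: str) -> str:
    """Route an LLM tool call (name + JSON argument string) to its handler."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid arguments: {exc.msg}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    return json.dumps(handler(args))


print(dispatch_tool_call("search_functions", '{"pattern": "main"}'))
```

Returning errors as JSON instead of raising lets the model see what went wrong and retry with corrected arguments.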
1. search_functions
Find functions by name or pattern across the codebase.
Input:
Output:
Use Case: Find entry points, locate specific functionality, discover API patterns.
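To make "by name or pattern" concrete, here is a small sketch of regex matching over a list of function names. The `match_functions` helper and its `limit` parameter are hypothetical and only show one plausible matching strategy. The first three names reuse the Quick Start output, and `parse_config` is made up.

```python
import re

FUNCTION_NAMES = ["main", "cli_main", "async_main", "parse_config"]


def match_functions(pattern: str, names, limit: int = 20):
    """Return names matching a regex pattern, capped at `limit` results."""
    try:
        regex = re.compile(pattern)
    except re.error:
        # Invalid regex: fall back to a literal substring match.
        regex = re.compile(re.escape(pattern))
    return [name for name in names if regex.search(name)][:limit]


print(match_functions("main$", FUNCTION_NAMES))
print(match_functions("^main", FUNCTION_NAMES))
```

Falling back to a literal match keeps the tool usable when a model passes something like `(` that is not a valid regular expression.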
2. search_structs
Find structures, classes, and types by name or pattern.
Input:
Output:
Use Case: Understand data structures, find models, discover type definitions.
3. analyze_file
Get comprehensive analysis of a specific file.
Input:
Output:
Use Case: Deep dive into specific files, understand file structure, code review.
4. get_dependencies
Find imports, exports, and dependencies for a file.
Input:
Output:
Use Case: Understand module relationships, track dependencies, refactoring impact.
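As a rough illustration of import extraction, the sketch below walks a syntax tree and collects imported module names. Note the swap: it uses Python's standard `ast` module on Python source, not tree-sitter and not Loregrep's Rust analyzer, so it shows the general idea and nothing about the library's internals. `list_imports` and `SAMPLE` are hypothetical names.

```python
import ast


def list_imports(source: str) -> list:
    """Return the sorted module names imported by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have level > 0 and may have module=None.
            modules.add("." * node.level + (node.module or ""))
    return sorted(modules)


SAMPLE = "import os\nfrom collections import defaultdict\nfrom . import utils\n"
print(list_imports(SAMPLE))
```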
5. find_callers
Find where specific functions are called.
Input:
Output:
Use Case: Impact analysis, understand function usage, refactoring safety.
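Caller lookup is essentially a reverse index over call edges. The sketch below assumes call sites have already been extracted as `(caller, callee, file, line)` tuples. The edge data, `build_caller_index`, and this standalone `find_callers` function are hypothetical and only illustrate the data structure.

```python
from collections import defaultdict

# Hypothetical call edges: (caller, callee, file, line).
CALL_EDGES = [
    ("main", "cli_main", "src/main.rs", 16),
    ("cli_main", "run", "src/cli_main.rs", 12),
    ("async_main", "run", "src/internal/ai/conversation.rs", 50),
]


def build_caller_index(edges):
    """Map each callee to the list of call sites that invoke it."""
    index = defaultdict(list)
    for caller, callee, file_path, line in edges:
        index[callee].append({"caller": caller, "file": file_path, "line": line})
    return index


def find_callers(index, function_name):
    """Return call sites for `function_name`, or an empty list if none are known."""
    return index.get(function_name, [])


caller_index = build_caller_index(CALL_EDGES)
for site in find_callers(caller_index, "run"):
    print(f'{site["caller"]} calls run at {site["file"]}:{site["line"]}')
```

Building the reverse map once makes each lookup a single dictionary access, which is what makes refactoring impact checks cheap.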
6. get_repository_tree
Get repository structure and overview.
Input:
Output:
Use Case: Repository overview, architecture understanding, documentation generation.
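A repository overview can be derived from nothing more than the list of indexed file paths. The following sketch folds paths into a nested dictionary and renders it as an indented tree. `build_tree` and `render_tree` are hypothetical helpers, and the paths are taken from the Module Structure section.

```python
def build_tree(paths):
    """Fold slash-separated file paths into a nested dict (files map to None)."""
    root = {}
    for path in sorted(paths):
        node = root
        parts = path.split("/")
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = None
    return root


def render_tree(node, indent=0):
    """Render the nested dict as indented lines; directories get a trailing slash."""
    lines = []
    for name, child in node.items():
        suffix = "/" if child is not None else ""
        lines.append("  " * indent + name + suffix)
        if child is not None:
            lines.extend(render_tree(child, indent + 1))
    return lines


PATHS = [
    "src/lib.rs",
    "src/loregrep.rs",
    "src/analyzers/rust.rs",
    "src/storage/repo_map.rs",
]
print("\n".join(render_tree(build_tree(PATHS))))
```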
Performance
Benchmarks vs Alternatives
Repository Size | Loregrep | ripgrep | ast-grep | Advantage |
---|---|---|---|---|
Small (100 files) | 0.8s | 0.2s | 1.2s | 1.5x faster than ast-grep |
Medium (1,000 files) | 3.2s | 1.1s | 8.5s | 2.7x faster than ast-grep |
Large (10,000 files) | 28s | 8s | 95s | 3.4x faster than ast-grep |
Benchmarks run on MacBook Pro M1, 16GB RAM. Times include parsing + indexing.
Note: ripgrep is faster for simple text search, but loregrep provides structured analysis that ripgrep cannot do.
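The "Advantage" column is a ratio: the ast-grep time divided by the Loregrep time for the same repository size. A short sketch for recomputing it from the timings in the table (the `speedup` helper is just illustrative arithmetic):

```python
# Timings in seconds, copied from the benchmark table.
TIMINGS = {
    "Small (100 files)": {"loregrep": 0.8, "ripgrep": 0.2, "ast-grep": 1.2},
    "Medium (1,000 files)": {"loregrep": 3.2, "ripgrep": 1.1, "ast-grep": 8.5},
    "Large (10,000 files)": {"loregrep": 28.0, "ripgrep": 8.0, "ast-grep": 95.0},
}


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


for size, times in TIMINGS.items():
    ratio = speedup(times["ast-grep"], times["loregrep"])
    print(f"{size}: {ratio:.1f}x faster than ast-grep")
```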
Memory Usage
Repository Size | Peak Memory | Steady State | Index Size |
---|---|---|---|
Small (100 files) | 45MB | 12MB | 0.8MB |
Medium (1,000 files) | 180MB | 65MB | 8MB |
Large (10,000 files) | 850MB | 320MB | 85MB |
Performance Tips
// For large repositories
let loregrep = builder
.max_file_size // Skip very large files
.exclude_patterns
.max_depth // Limit directory depth
.build?;
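To show what size, depth, and exclude filters do during file discovery, here is a standalone sketch using only the Python standard library. It is not how the library's scanner is implemented. `discover_files` and its parameters are hypothetical, and the excluded directory names (`target`, `.git`) are assumptions. The 1MB and 10-level defaults echo the builder comments in the API Examples section.

```python
import fnmatch
import os


def discover_files(root, include=("*.rs",), exclude_dirs=("target", ".git"),
                   max_depth=10, max_file_size=1024 * 1024):
    """Return relative paths under `root` that pass pattern, depth, and size filters."""
    root = os.path.abspath(root)
    found = []
    for current, dirs, files in os.walk(root):
        rel_dir = os.path.relpath(current, root)
        depth = 0 if rel_dir == "." else rel_dir.count(os.sep) + 1
        # Prune in place so os.walk never descends into excluded directories.
        dirs[:] = [d for d in dirs if d not in exclude_dirs]
        if depth >= max_depth:
            dirs[:] = []  # Stop descending once the depth limit is reached.
        for name in files:
            if not any(fnmatch.fnmatch(name, pattern) for pattern in include):
                continue
            path = os.path.join(current, name)
            if os.path.getsize(path) > max_file_size:
                continue  # Skip very large files.
            found.append(os.path.relpath(path, root).replace(os.sep, "/"))
    return sorted(found)
```

Pruning directories before descending matters more than filtering files afterwards: build output directories can hold far more files than the source tree itself.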
Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Code Files    │───▶│   Tree-sitter    │───▶│   In-Memory     │
│  (.rs, .py,     │    │    Parsing       │    │    RepoMap      │
│   .ts, etc.)    │    │                  │    │    Indexes      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
                                                        ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Coding Assistant│◀───│  6 Query Tools   │◀───│   Fast Lookups  │
│ (Claude, GPT,   │    │ (search, analyze,│    │  (functions,    │
│  Cursor, etc.)  │    │  dependencies)   │    │   structs, etc.)│
└─────────────────┘    └──────────────────┘    └─────────────────┘
Core Components
- LoreGrep: Main API facade with builder pattern
- RepoMap: Fast in-memory indexes with lookup optimization
- RepositoryScanner: File discovery with gitignore support
- Language Analyzers: Tree-sitter based parsing (currently Rust only)
- Tool System: 6 standardized tools for AI integration
Module Structure
src/
├── lib.rs               # Public API exports and documentation
├── loregrep.rs          # Main LoreGrep struct and builder
├── core/                # Core types and errors
├── types/               # Data structures (FunctionSignature, etc.)
├── analyzers/           # Language-specific parsers
│   └── rust.rs          # Rust analyzer (tree-sitter based)
├── storage/             # In-memory indexing
│   └── repo_map.rs      # RepoMap with fast lookups
├── scanner/             # File discovery
├── internal/            # Internal CLI and AI implementation
│   ├── cli.rs           # CLI application
│   ├── ai/              # Anthropic client and conversation
│   └── ui/              # Progress indicators and theming
└── cli_main.rs          # CLI binary entry point
Technical Notes
Memory Management:
- Indexes built in memory for fast access
- Thread-safe with Arc<Mutex<>> design
- Memory usage scales linearly with codebase size
- No external dependencies required at runtime
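For readers less familiar with Rust, the `Arc<Mutex<>>` pattern means "one shared value, guarded by a lock, reachable from many threads". A loose Python analogy, with a hypothetical `SharedIndex` class that does not reflect the library's actual types, could look like this:

```python
import threading


class SharedIndex:
    """Dict-backed index guarded by a lock, loosely analogous to Arc<Mutex<...>>."""

    def __init__(self):
        self._lock = threading.Lock()
        self._functions = {}

    def insert(self, name, location):
        with self._lock:
            self._functions.setdefault(name, []).append(location)

    def lookup(self, name):
        with self._lock:
            # Return a copy so callers cannot mutate shared state outside the lock.
            return list(self._functions.get(name, []))


def index_concurrently(index, batches):
    """Insert each batch of (name, location) pairs from its own thread."""
    def worker(batch):
        for name, location in batch:
            index.insert(name, location)

    threads = [threading.Thread(target=worker, args=(batch,)) for batch in batches]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

In Rust the compiler enforces that the data is only touched while the lock is held; in this Python sketch that discipline is by convention only.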
Error Handling:
Uses comprehensive error types from core::errors::LoreGrepError:
- IO errors (file access, permissions)
- Parse errors (malformed code)
- Configuration errors
- API errors (for AI integration)
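The four categories above map naturally onto an exception hierarchy in a caller's own code. The sketch below is purely illustrative: `RepoIndexError` and its subclasses are hypothetical names invented here and are not the exceptions exposed by the Python bindings.

```python
class RepoIndexError(Exception):
    """Base class for this sketch (hypothetical; not the library's Python API)."""


class IoError(RepoIndexError):
    """File access or permission problems."""


class ParseError(RepoIndexError):
    """Malformed source code."""


class ConfigError(RepoIndexError):
    """Invalid configuration values."""


class ApiError(RepoIndexError):
    """Failures talking to an AI provider."""


def read_source(path):
    """Read a file, translating OS-level failures into the sketch's IoError."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError as exc:
        raise IoError(f"cannot read {path}: {exc.strerror}") from exc


def safe_read(path):
    """Return a tool-friendly result dict and never raise a RepoIndexError."""
    try:
        return {"ok": True, "content": read_source(path)}
    except RepoIndexError as exc:
        return {"ok": False, "kind": type(exc).__name__, "message": str(exc)}
```

Catching the shared base class at the tool boundary lets an assistant report a categorized failure without crashing the conversation loop.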
Examples
The project includes comprehensive examples for different use cases:
Rust Examples (examples/)
- basic_scan.rs - Simple repository scanning and basic queries
- tool_execution.rs - Complete LLM tool integration patterns
- file_watcher.rs - File watching for automatic re-indexing
- coding_assistant.rs - Full coding assistant implementation
- advanced_queries.rs - Complex search and analysis patterns
Python Examples (python/examples/)
- basic_usage.py - Python API introduction and common patterns
- ai_integration.py - Integration with OpenAI/Anthropic APIs
- web_server.py - REST API wrapper for web applications
- batch_analysis.py - Processing multiple repositories
Running Examples
Rust:
Python:
Development Setup
Prerequisites
- Rust 1.70 or later
- Python 3.8+ (for Python bindings)
- For AI integration tests: Anthropic API key
Building from Source
# Build Rust library
# Build Python package (requires maturin)
Testing
# Run all tests
# Run specific test suites
# Run with output
# Test Python bindings
&&
Development CLI Usage
The CLI is primarily for development and testing:
# Build development binary
# Basic commands for testing
Known Test Status
- 60+ tests passing across core functionality
- Known issue: 8 pre-existing test failures in older modules (technical debt)
- 100% pass rate on new Phase 3B+ tests
Contributing
We welcome contributions! Loregrep prioritizes the library API over CLI functionality, so contributions should focus on the core indexing and analysis capabilities that enable AI coding assistants.
High Priority Areas
1. Language Support (Most Needed)
Help expand beyond Rust to support more programming languages:
- Python analyzer (src/analyzers/python.rs)
  - Functions, classes, decorators
  - Import/export analysis
  - Method calls and inheritance
- TypeScript analyzer (src/analyzers/typescript.rs)
  - Interfaces, types, generics
  - ES6+ features, async/await
  - Module system analysis
- JavaScript analyzer (src/analyzers/javascript.rs)
  - Functions, classes, arrow functions
  - CommonJS and ES6 modules
  - Dynamic features handling
2. Performance Improvements
- Memory optimization for large repositories (>100k files)
- Incremental update detection when files change
- Query result caching improvements
- Parallel parsing optimizations
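Incremental update detection, one of the open items above, is not implemented here. One common approach is to store a content fingerprint per file and re-index only files whose fingerprint changed. The sketch below assumes SHA-256 hashes, and `fingerprint` and `detect_changes` are hypothetical helpers.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to decide whether a file needs re-indexing."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous: dict, current_files: dict) -> dict:
    """Compare stored fingerprints (path -> hash) with current contents (path -> bytes)."""
    current = {path: fingerprint(data) for path, data in current_files.items()}
    return {
        "added": sorted(p for p in current if p not in previous),
        "removed": sorted(p for p in previous if p not in current),
        "modified": sorted(
            p for p in current if p in previous and previous[p] != current[p]
        ),
    }


stored = {
    "a.rs": fingerprint(b"fn a() {}"),
    "b.rs": fingerprint(b"fn b() {}"),
}
on_disk = {"a.rs": b"fn a() {}", "b.rs": b"fn b() { 1 }", "c.rs": b""}
print(detect_changes(stored, on_disk))
```

Hashing content is slower than comparing modification times but avoids false positives from tools that rewrite files without changing them. A real implementation might check mtime first and hash only on a mismatch.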
3. Advanced Analysis Features
- Call graph visualization and analysis
- Dependency impact analysis
- Cross-language project support
- Code complexity metrics
Development Workflow
- Fork and Clone
- Create Feature Branch
- Develop and Test
- Test Integration: test the CLI, the public API, and the Python bindings (if applicable)
- Submit Pull Request
  - Include tests for new functionality
  - Update documentation for public API changes
  - Follow existing code style and patterns
Code Style Guidelines
- Use rustfmt for formatting: cargo fmt
- Use clippy for linting: cargo clippy
- Add comprehensive tests for new functionality
- Update documentation for any public API changes
- Follow existing module organization patterns
- Include examples for new features
Getting Help
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Design discussions and questions
- Discord: Real-time chat (link in issues)
Roadmap
v0.4.0 - Multi-Language Support (Q1 2024)
- Python Analyzer: Full Python support with classes, functions, imports
- TypeScript Analyzer: Complete TS/JS support including modern features
- Performance: 2x improvement in scanning speed for large repositories
v0.5.0 - Advanced Analysis (Q2 2024)
- Call Graph Analysis: Function call extraction and visualization
- Dependency Tracking: Advanced import/export analysis
- Incremental Updates: Smart re-indexing when files change
v1.0.0 - Production Ready (Q3 2024)
- API Stability: Stable public API with semantic versioning
- Memory Optimization: Improved handling of very large repositories
- MCP Server Integration: Standard Model Context Protocol interface
- Editor Integrations: VS Code and IntelliJ plugins
Future
- Go Language Support: Package analysis and interface tracking
- Database Persistence: Optional disk-based storage for massive codebases
- Distributed Analysis: Support for monorepos and multi-service codebases
Want to help with any roadmap item? Check out our Contributing Guide!
License
This project is dual-licensed under either:
Choose the license that best fits your use case. Most users prefer MIT for simplicity.
Ready to get started? Jump to Installation or try the Quick Start example!