Loregrep
Fast Repository Indexing Library for Coding Assistants
Loregrep is a Rust library that parses codebases into fast, searchable in-memory indexes. It's designed to provide coding assistants and AI tools with structured access to code functions, structures, dependencies, and call graphs.
What It Does
- Parses code files using tree-sitter for accurate syntax analysis
- Indexes functions, structs, imports, exports, and relationships in memory
- Provides 6 standardized tools that coding assistants can call to query the codebase
- Enables AI systems to understand code structure without re-parsing
What It's NOT
- ❌ Not an AI tool itself (provides data TO AI systems)
- ❌ Not a traditional code analysis tool (no linting, metrics, or complexity analysis)
Current Status
Language Support:
- ✅ Rust - Full support (functions, structs, imports, calls)
- Python, TypeScript, JavaScript, Go - On the roadmap (coming soon)
Core Features:
- ✅ Repository scanning with gitignore support
- ✅ In-memory indexing with fast lookups
- ✅ 6-tool interface for LLM integration
- ✅ Thread-safe API with builder pattern
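The builder pattern mentioned above can be sketched with the standard library alone. This is a minimal illustration that assumes a hypothetical `ScanConfig` type with `max_file_size` and `respect_gitignore` settings and an assumed 1 MiB default. It is not Loregrep's actual builder.

```rust
/// Hypothetical configuration type; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct ScanConfig {
    pub max_file_size: u64,
    pub respect_gitignore: bool,
}

/// Builder that starts from defaults and lets callers override them one at a time.
#[derive(Debug, Default)]
pub struct ScanConfigBuilder {
    max_file_size: Option<u64>,
    respect_gitignore: Option<bool>,
}

impl ScanConfigBuilder {
    /// Upper bound (in bytes) on files that will be parsed.
    pub fn max_file_size(mut self, bytes: u64) -> Self {
        self.max_file_size = Some(bytes);
        self
    }

    /// Whether ignored files should be skipped during scanning.
    pub fn respect_gitignore(mut self, yes: bool) -> Self {
        self.respect_gitignore = Some(yes);
        self
    }

    /// Applies defaults, validates, and produces the final config.
    pub fn build(self) -> Result<ScanConfig, String> {
        let max_file_size = self.max_file_size.unwrap_or(1024 * 1024); // assumed 1 MiB default
        if max_file_size == 0 {
            return Err("max_file_size must be greater than zero".to_string());
        }
        Ok(ScanConfig {
            max_file_size,
            respect_gitignore: self.respect_gitignore.unwrap_or(true),
        })
    }
}

impl ScanConfig {
    pub fn builder() -> ScanConfigBuilder {
        ScanConfigBuilder::default()
    }
}
```

Validating once in `build()` keeps the chained setters infallible and reports configuration problems at a single point, which fits the `.build()?` call style used in the library API section.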
Performance (Typical):
- Small repos (100 files): <1s analysis, <1MB memory
- Medium repos (1,000 files): <10s analysis, <10MB memory
- Large repos (10,000 files): <60s analysis, <100MB memory
Development Setup
Prerequisites
- Rust 1.70 or later
- For AI integration tests: Anthropic API key
Building
cargo build
Testing
# Run all tests
cargo test
# Run specific test suites (filter by test name)
cargo test <test_name>
# Run with output
cargo test -- --nocapture
Development CLI Usage
The CLI is primarily for development and testing:
# Build development binary
# Basic commands for testing
Architecture Overview
Code Files (.rs, .py, .ts, etc.)
  ▼
Tree-sitter Parsing
  ▼
In-Memory RepoMap Indexes
  ▼
Fast Lookups (functions, structs, etc.)
  ▼
6 Query Tools (search, analyze, dependencies)
  ▼
Coding Assistant (Claude, GPT, Cursor, etc.)
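To make the pipeline concrete, here is a toy version of the parse, index, and query stages. It is a sketch under loud assumptions: a whitespace-token scan for `fn name` stands in for tree-sitter, and the hypothetical `FunctionIndex` is a plain `HashMap`, not the real `RepoMap`.

```rust
use std::collections::HashMap;

/// Where a function was found: (file path, 1-based line number).
pub type Location = (String, usize);

/// Toy "parse" stage: finds the identifier after an `fn` token on each line.
/// Loregrep uses tree-sitter; this text scan misses multi-line signatures and
/// would also match `fn` inside comments or strings.
pub fn extract_fn_names(source: &str) -> Vec<(String, usize)> {
    let mut found = Vec::new();
    for (i, line) in source.lines().enumerate() {
        let tokens: Vec<&str> = line.split_whitespace().collect();
        if let Some(pos) = tokens.iter().position(|t| *t == "fn") {
            if let Some(next) = tokens.get(pos + 1) {
                // Keep only the identifier, e.g. "main()" -> "main".
                let name: String = next
                    .chars()
                    .take_while(|c| c.is_alphanumeric() || *c == '_')
                    .collect();
                if !name.is_empty() {
                    found.push((name, i + 1));
                }
            }
        }
    }
    found
}

/// Toy "index" stage (hypothetical): function name -> every location seen.
#[derive(Default)]
pub struct FunctionIndex {
    by_name: HashMap<String, Vec<Location>>,
}

impl FunctionIndex {
    /// Parses one file and records its functions in the index.
    pub fn add_file(&mut self, path: &str, source: &str) {
        for (name, line) in extract_fn_names(source) {
            self.by_name.entry(name).or_default().push((path.to_string(), line));
        }
    }

    /// Toy "query" stage: substring match on function names.
    pub fn search(&self, pattern: &str) -> Vec<(&str, &Location)> {
        let mut hits: Vec<(&str, &Location)> = self
            .by_name
            .iter()
            .filter(|(name, _)| name.contains(pattern))
            .flat_map(|(name, locs)| locs.iter().map(move |loc| (name.as_str(), loc)))
            .collect();
        hits.sort(); // HashMap order is arbitrary; sort for stable output
        hits
    }
}
```

Building the index once and answering every query from memory is the point of the design: repeated queries never re-read or re-parse files.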
Core Components
- LoreGrep: Main API facade with builder pattern
- RepoMap: Fast in-memory indexes with lookup optimization
- RepositoryScanner: File discovery with gitignore support
- Language Analyzers: Tree-sitter based parsing (currently Rust only)
- Tool System: 6 standardized tools for AI integration
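A stripped-down file-discovery pass in the spirit of `RepositoryScanner` might look like the following. The hard-coded `SKIPPED_DIRS` list is an assumption standing in for real `.gitignore` matching, and `discover_files` is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Directory names skipped outright. This fixed list is a stand-in for real
/// .gitignore handling; it is NOT a gitignore parser.
const SKIPPED_DIRS: &[&str] = &[".git", "target", "node_modules"];

/// Iteratively walks `root`, collecting files whose extension is in
/// `extensions` and whose size is at most `max_file_size` bytes.
/// Symlinks are neither files nor dirs here, so they are skipped (no cycles).
pub fn discover_files(
    root: &Path,
    extensions: &[&str],
    max_file_size: u64,
) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                let name = entry.file_name();
                let skipped = name.to_str().map_or(false, |n| SKIPPED_DIRS.contains(&n));
                if !skipped {
                    stack.push(path);
                }
            } else if file_type.is_file() {
                let ext_ok = path
                    .extension()
                    .and_then(|e| e.to_str())
                    .map_or(false, |e| extensions.contains(&e));
                if ext_ok && entry.metadata()?.len() <= max_file_size {
                    found.push(path);
                }
            }
        }
    }
    found.sort(); // read_dir order is platform-dependent
    Ok(found)
}
```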
Module Structure
src/
├── lib.rs           # Public API exports and documentation
├── loregrep.rs      # Main LoreGrep struct and builder
├── core/            # Core types and errors
├── types/           # Data structures (FunctionSignature, etc.)
├── analyzers/       # Language-specific parsers
│   └── rust.rs      # Rust analyzer (tree-sitter based)
├── storage/         # In-memory indexing
│   └── repo_map.rs  # RepoMap with fast lookups
├── scanner/         # File discovery
├── internal/        # Internal CLI and AI implementation
│   ├── cli.rs       # CLI application
│   ├── ai/          # Anthropic client and conversation
│   └── ui/          # Progress indicators and theming
└── cli_main.rs      # CLI binary entry point
Library API for Integrators
The public API is designed for external integration:
use ;
use json;
// Initialize and scan
let mut loregrep = builder
.max_file_size
.build?;
let scan_result = loregrep.scan.await?;
// Get tool definitions for LLM
let tools: = get_tool_definitions;
// Execute tool calls (from LLM)
let result = loregrep.execute_tool.await?;
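The tool-calling loop has a simple shape: the LLM names a tool and supplies arguments, the integrator routes the call, and the result (or an error message) goes back to the model. The sketch below shows only that routing step, using a hypothetical `ToolCall` struct, string-only arguments, and one toy `search_functions` handler over hypothetical `ToyRepo` data. It does not reproduce the real signature of `execute_tool`.

```rust
use std::collections::HashMap;

/// A tool call as an LLM might emit it (hypothetical shape; real LLM APIs
/// send JSON arguments).
pub struct ToolCall {
    pub name: String,
    pub args: HashMap<String, String>,
}

/// Toy data standing in for the indexed repository: (function name, file).
pub struct ToyRepo {
    pub functions: Vec<(String, String)>,
}

impl ToyRepo {
    /// Routes a call to the matching handler. Unknown tools and missing
    /// arguments become errors that can be reported back to the model
    /// instead of panicking.
    pub fn execute_tool(&self, call: &ToolCall) -> Result<String, String> {
        match call.name.as_str() {
            "search_functions" => {
                let pattern = call
                    .args
                    .get("pattern")
                    .ok_or("search_functions requires a 'pattern' argument")?;
                let hits: Vec<String> = self
                    .functions
                    .iter()
                    .filter(|(name, _)| name.contains(pattern.as_str()))
                    .map(|(name, file)| format!("{name} ({file})"))
                    .collect();
                Ok(hits.join("\n"))
            }
            other => Err(format!("unknown tool: {other}")),
        }
    }
}
```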
Available Tools for LLM Integration
- search_functions - Find functions by name/pattern
- search_structs - Find structures by name/pattern
- analyze_file - Get detailed file analysis
- get_dependencies - Find imports/exports for a file
- find_callers - Get function call sites
- get_repository_tree - Get repository structure and overview
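When these tools are advertised to an LLM, each needs at least a name and a description. The following sketch renders the six tools above into a compact JSON array using only the standard library. The descriptions come from the list, but the two-field JSON shape is a hypothetical minimum, not the schema that `get_tool_definitions` actually returns (tool schemas for LLM APIs usually also declare input parameters).

```rust
/// Name/description pairs for the six tools, copied from the list above.
pub const TOOLS: [(&str, &str); 6] = [
    ("search_functions", "Find functions by name/pattern"),
    ("search_structs", "Find structures by name/pattern"),
    ("analyze_file", "Get detailed file analysis"),
    ("get_dependencies", "Find imports/exports for a file"),
    ("find_callers", "Get function call sites"),
    ("get_repository_tree", "Get repository structure and overview"),
];

/// Escapes the characters that would break a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders the tools as a compact JSON array of {"name", "description"} objects.
pub fn tool_definitions_json() -> String {
    let items: Vec<String> = TOOLS
        .iter()
        .map(|(name, desc)| {
            format!(
                r#"{{"name":"{}","description":"{}"}}"#,
                json_escape(name),
                json_escape(desc)
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}
```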
Examples and Integration
The examples/ directory contains integration patterns:
- basic_scan.rs: Simple repository scanning
- tool_execution.rs: LLM tool integration patterns
- file_watcher.rs: File watching for automatic re-indexing
- coding_assistant.rs: Complete coding assistant implementation
- basic_usage.rs: Public API usage patterns
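Re-indexing on file changes, as in `file_watcher.rs`, comes down to knowing which files changed since the last index. One simple approach, assumed here rather than taken from that example (which may use OS notifications instead), is to compare modification-time snapshots. `Snapshot`, `Changes`, and `diff_snapshots` are hypothetical names.

```rust
use std::collections::{BTreeSet, HashMap};
use std::path::PathBuf;
use std::time::SystemTime;

/// File path -> last-modified time, as recorded at the previous index build.
pub type Snapshot = HashMap<PathBuf, SystemTime>;

/// Which files a re-index would need to touch.
#[derive(Debug, Default, PartialEq)]
pub struct Changes {
    pub added: BTreeSet<PathBuf>,
    pub modified: BTreeSet<PathBuf>,
    pub removed: BTreeSet<PathBuf>,
}

/// Compares two snapshots so only changed files are re-parsed, instead of
/// rescanning the whole repository.
pub fn diff_snapshots(old: &Snapshot, new: &Snapshot) -> Changes {
    let mut changes = Changes::default();
    for (path, new_time) in new {
        match old.get(path) {
            None => {
                changes.added.insert(path.clone());
            }
            Some(old_time) if old_time != new_time => {
                changes.modified.insert(path.clone());
            }
            Some(_) => {} // unchanged
        }
    }
    for path in old.keys() {
        if !new.contains_key(path) {
            changes.removed.insert(path.clone());
        }
    }
    changes
}
```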
Testing Strategy
Test Structure
Running Tests
# All tests
cargo test
# Specific test categories
# With environment setup for AI tests
ANTHROPIC_API_KEY=your-key
Known Test Status
- ✅ 60+ tests passing across core functionality
- ⚠️ 8 pre-existing test failures in older modules (technical debt)
- ✅ 100% pass rate on new Phase 3B+ tests
Contributing
Areas Needing Help
- Language Support (High Priority)
  - Python analyzer in src/analyzers/python.rs
  - TypeScript analyzer in src/analyzers/typescript.rs
  - JavaScript and Go analyzers
- Performance (Medium Priority)
  - Memory optimization for large repositories
  - Incremental update detection
  - Query result caching improvements
- Advanced Features (Future)
  - Call graph visualization
  - Dependency impact analysis
  - MCP (Model Context Protocol) server interface
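New language analyzers need a common interface so the scanner can pick one per file. The sketch below assumes a hypothetical `LanguageAnalyzer` trait and `AnalyzerRegistry`; they are not Loregrep's internal interface. The toy Python analyzer uses line-prefix matching purely for illustration, whereas a real contribution would build on a tree-sitter grammar, as the Rust analyzer does.

```rust
/// Hypothetical extension point that each language analyzer would implement.
pub trait LanguageAnalyzer {
    /// Language identifier, e.g. "rust" or "python".
    fn language(&self) -> &'static str;
    /// File extensions (without the dot) this analyzer handles.
    fn extensions(&self) -> &'static [&'static str];
    /// Names of top-level functions found in `source`.
    fn function_names(&self, source: &str) -> Vec<String>;
}

/// Toy Python analyzer: picks up `def name(` at the start of a line only.
pub struct ToyPythonAnalyzer;

impl LanguageAnalyzer for ToyPythonAnalyzer {
    fn language(&self) -> &'static str {
        "python"
    }
    fn extensions(&self) -> &'static [&'static str] {
        &["py"]
    }
    fn function_names(&self, source: &str) -> Vec<String> {
        source
            .lines()
            .filter_map(|line| line.strip_prefix("def "))
            .filter_map(|rest| rest.split('(').next())
            .map(|name| name.trim().to_string())
            .collect()
    }
}

/// Holds registered analyzers and dispatches by file extension.
#[derive(Default)]
pub struct AnalyzerRegistry {
    analyzers: Vec<Box<dyn LanguageAnalyzer>>,
}

impl AnalyzerRegistry {
    pub fn register(&mut self, analyzer: Box<dyn LanguageAnalyzer>) {
        self.analyzers.push(analyzer);
    }

    /// Finds the analyzer for a file name by its extension, if any.
    pub fn for_file(&self, file_name: &str) -> Option<&dyn LanguageAnalyzer> {
        let ext = file_name.rsplit_once('.')?.1;
        self.analyzers
            .iter()
            .find(|a| a.extensions().contains(&ext))
            .map(|a| a.as_ref())
    }
}
```

Keeping the trait small means a new language touches only its own module plus one `register` call.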
Development Workflow
- Fork and Clone
- Create Feature Branch
- Develop and Test
- Test Integration
  # Test CLI
  # Test public API
Code Style Guidelines
- Use rustfmt for formatting: cargo fmt
- Use clippy for linting: cargo clippy
- Add tests for new functionality
- Update documentation for public API changes
- Follow existing module organization patterns
Implementation Status
✅ Completed (Production Ready):
- Foundation & Core Architecture
- Enhanced In-Memory Storage
- CLI Foundation
- AI Integration
- Enhanced CLI Experience
- Public API Implementation
In Progress:
- Advanced Analysis Features (call graphs, dependency tracking)
Planned:
- MCP Server Architecture
- Multi-Language Support (Python, TypeScript, JavaScript, Go)
- Advanced Features (incremental updates, performance optimization)
Technical Notes
Memory Management
- Indexes built in memory for fast access
- Thread-safe with Arc<Mutex<>> design
- Memory usage scales linearly with codebase size
- No external dependencies required at runtime
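The `Arc<Mutex<>>` design lets several threads write into one shared index. A minimal sketch follows, assuming a simplified index that maps function name to file, plus a hypothetical `index_in_parallel` helper.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Shared index handle: clones are cheap and all point at the same map.
pub type SharedIndex = Arc<Mutex<HashMap<String, String>>>;

/// Inserts each (file, function) pair from its own thread, then returns the
/// number of entries. Each insert holds the lock only briefly, so no thread
/// ever observes a half-updated map.
pub fn index_in_parallel(entries: Vec<(String, String)>) -> usize {
    let index: SharedIndex = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = entries
        .into_iter()
        .map(|(file, func)| {
            let index = Arc::clone(&index);
            thread::spawn(move || {
                // Parsing would happen here, outside the lock; only the insert is locked.
                index.lock().expect("index mutex poisoned").insert(func, file);
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("indexing thread panicked");
    }
    let len = index.lock().expect("index mutex poisoned").len();
    len
}
```

Spawning one thread per entry keeps the example short. A real scanner would split work into a few chunks per CPU core rather than one thread per file.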
Performance Considerations
- Scanning parallelized across CPU cores
- Query results cached for repeated access
- Tree-sitter parsers reused to avoid recreation overhead
- Gitignore support to skip irrelevant files
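Query result caching can be as simple as memoizing results per pattern and discarding the cache whenever the index changes. The following is an illustrative design with a hypothetical `CachedIndex` type, not Loregrep's caching code.

```rust
use std::collections::HashMap;

/// Toy index whose search results are memoized per pattern. The cache is
/// cleared whenever the data changes, so stale results are never returned.
pub struct CachedIndex {
    functions: Vec<String>,
    cache: HashMap<String, Vec<String>>,
    /// Counts real (uncached) searches, which makes cache hits observable.
    pub misses: usize,
}

impl CachedIndex {
    pub fn new(functions: Vec<String>) -> Self {
        CachedIndex { functions, cache: HashMap::new(), misses: 0 }
    }

    /// Returns function names containing `pattern`, computing them at most
    /// once per pattern until the index changes.
    pub fn search(&mut self, pattern: &str) -> Vec<String> {
        if let Some(hit) = self.cache.get(pattern) {
            return hit.clone();
        }
        self.misses += 1;
        let result: Vec<String> = self
            .functions
            .iter()
            .filter(|f| f.contains(pattern))
            .cloned()
            .collect();
        self.cache.insert(pattern.to_string(), result.clone());
        result
    }

    /// Adding data invalidates every cached answer.
    pub fn add_function(&mut self, name: &str) {
        self.functions.push(name.to_string());
        self.cache.clear();
    }
}
```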
Error Handling
Uses comprehensive error types from core::errors::LoreGrepError:
- IO errors (file access, permissions)
- Parse errors (malformed code)
- Configuration errors
- API errors (for AI integration)
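An error type covering these categories typically implements `Display`, `std::error::Error`, and `From<io::Error>` so that `?` works on file operations. The sketch below uses a hypothetical `SketchError` enum. Its variants mirror the list above, but the real `LoreGrepError` variants and fields may differ.

```rust
use std::fmt;
use std::io;

/// Hypothetical error enum mirroring the four categories listed above.
#[derive(Debug)]
pub enum SketchError {
    Io(io::Error),
    Parse { file: String, message: String },
    Config(String),
    Api(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Io(e) => write!(f, "I/O error: {e}"),
            SketchError::Parse { file, message } => write!(f, "parse error in {file}: {message}"),
            SketchError::Config(msg) => write!(f, "configuration error: {msg}"),
            SketchError::Api(msg) => write!(f, "API error: {msg}"),
        }
    }
}

impl std::error::Error for SketchError {
    /// Exposes the underlying I/O error so callers can inspect the cause.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            SketchError::Io(e) => Some(e),
            _ => None,
        }
    }
}

/// Lets `?` convert io::Error into SketchError automatically.
impl From<io::Error> for SketchError {
    fn from(e: io::Error) -> Self {
        SketchError::Io(e)
    }
}

/// Example: a missing file surfaces as SketchError::Io via `?`.
pub fn read_source(path: &str) -> Result<String, SketchError> {
    Ok(std::fs::read_to_string(path)?)
}
```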
License
This project is dual-licensed under either:
- MIT License (LICENSE-MIT)
- Apache License 2.0 (LICENSE-APACHE)
Roadmap
Language Support
- Python Analyzer: Full Python support with functions, classes, imports, and method calls
- TypeScript/JavaScript Analyzers: Support for modern JS/TS features including interfaces, types, and ES6+ syntax
- Go Analyzer: Package declarations, interfaces, and Go-specific function signatures
Advanced Analysis Features
- Call Graph Analysis: Function call extraction and visualization across files
- Dependency Tracking: Advanced import/export analysis and impact assessment
- Incremental Updates: Smart re-indexing when files change to avoid full rescans
Performance & Optimization
- Memory Optimization: Improved handling of large repositories with better memory management
- Query Performance: Enhanced caching and lookup optimization for faster results
- Database Persistence: Optional disk-based storage for very large codebases
Integration & Architecture
- MCP Server Integration: Standard Model Context Protocol interface for tool calling
- Editor Integrations: VS Code, IntelliJ, and other popular editor plugins
- API Enhancements: Additional tools and query capabilities for LLM integration
Note for Contributors: This project prioritizes the library API over CLI functionality. The CLI exists primarily for development and testing. Focus contributions on the core indexing and analysis capabilities that enable AI coding assistants.