loregrep 0.3.2

Repository indexing library for AI coding assistants. Tree-sitter parsing, fast in-memory indexing, and tool APIs for LLM integration.
Documentation

Loregrep

Crates.io PyPI License: MIT OR Apache-2.0 Build Status

Fast Repository Indexing Library for Coding Assistants

Loregrep is a Rust library with Python bindings that parses codebases into fast, searchable in-memory indexes. It's designed to provide coding assistants and AI tools with structured access to code functions, structures, dependencies, and call graphs.

Table of Contents

Installation

Rust Users

cargo add loregrep

Python Users

pip install loregrep

System Requirements

  • Rust: 1.70 or later
  • Python: 3.8 or later
  • Memory: 100MB+ available RAM (scales with repository size)
  • OS: Linux, macOS, Windows

Quick Start

30-Second Example

First, clone the loregrep repository to try the examples:

git clone https://github.com/yourusername/loregrep.git
cd loregrep

Rust:

use loregrep::LoreGrep;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create and scan the current repository
    let mut loregrep = LoreGrep::builder().build()?;
    let scan_result = loregrep.scan(".").await?;
    
    println!("πŸ“ Scanned loregrep repository");
    println!("   {} files analyzed, {} functions found", 
             scan_result.files_scanned, scan_result.functions_found);
    
    // Search for functions containing "main"
    let result = loregrep.execute_tool("search_functions", json!({
        "pattern": "main",
        "limit": 5
    })).await?;
    
    println!("πŸ” Found functions:\n{}", result.content);
    Ok(())
}

Python:

import asyncio
import loregrep
import os

async def main():
    # Create and scan the current repository
    loregrep_instance = loregrep.LoreGrep.builder().build()
    scan_result = await loregrep_instance.scan(".")
    
    repo_name = os.path.basename(os.getcwd())
    print(f"πŸ“ Scanned {repo_name} repository")
    print(f"   {scan_result.files_scanned} files analyzed, "
          f"{scan_result.functions_found} functions found")
    
    # Search for functions containing "main"
    result = await loregrep_instance.execute_tool("search_functions", {
        "pattern": "main",
        "limit": 5
    })
    
    print(f"πŸ” Found functions:\n{result.content}")

asyncio.run(main())

Expected Output (when run in loregrep repository):

πŸ“ Scanned loregrep repository
   42 files analyzed, 156 functions found
πŸ” Found functions:
main (src/main.rs:15) - Entry point for CLI application
cli_main (src/cli_main.rs:8) - CLI entry point wrapper
async_main (src/internal/ai/conversation.rs:45) - Main conversation loop
...

Try Your Own Repository

Once you understand the basics, scan your own projects:

# Scan any repository
scan_result = await loregrep_instance.scan("/path/to/your/project")
print(f"πŸ“ Found {scan_result.functions_found} functions in {scan_result.files_scanned} files")

What It Does

  • Parses code files using tree-sitter for accurate syntax analysis
  • Indexes functions, structs, imports, exports, and relationships in memory
  • Provides 6 standardized tools that coding assistants can call to query the codebase
  • Enables AI systems to understand code structure without re-parsing

What It's NOT

  • ❌ Not an AI tool itself (provides data TO AI systems)
  • ❌ Not a traditional code analysis tool (no linting, metrics, complexity analysis)
  • ❌ Not a replacement for LSP servers (different use case)

Use Cases

Build Your Own Code Assistant

Create AI-powered tools that understand your codebase:

# Your custom coding assistant
assistant = MyCodingAssistant(loregrep_instance)
answer = await assistant.ask("How do I use the Config struct?")
# Assistant can now search functions, analyze files, and understand relationships

Enhance Existing AI Tools

Add structured code understanding to chatbots and AI applications:

  • Before: AI sees code as plain text
  • After: AI understands functions, classes, imports, and relationships

Smart Code Search

Build advanced code search beyond text matching:

# Find all functions that call a specific function
callers = await loregrep.execute_tool("find_callers", {
    "function_name": "parse_config",
    "include_context": True
})

Repository Analysis

Understand codebase structure and complexity:

# Get complete repository overview
tree = await loregrep.execute_tool("get_repository_tree", {
    "include_file_details": True,
    "show_complexity": True
})

Language Support

Language Status Functions Structs/Classes Imports Calls
Rust βœ… Full βœ… βœ… βœ… βœ…
Python πŸ“‹ Planned πŸ“‹ πŸ“‹ πŸ“‹ πŸ“‹
TypeScript πŸ“‹ Planned πŸ“‹ πŸ“‹ πŸ“‹ πŸ“‹
JavaScript πŸ“‹ Planned πŸ“‹ πŸ“‹ πŸ“‹ πŸ“‹
Go πŸ“‹ Future πŸ“‹ πŸ“‹ πŸ“‹ πŸ“‹

Want to contribute a language analyzer? See Contributing

API Examples

Rust API

use loregrep::{LoreGrep, ToolSchema};
use serde_json::json;

// Configure with builder pattern
let mut loregrep = LoreGrep::builder()
    .max_file_size(1024 * 1024)     // 1MB max file size
    .max_depth(10)                  // Maximum 10 directory levels
    .file_patterns(vec!["*.rs", "*.py"])  // Include only these files
    .exclude_patterns(vec!["target/", "node_modules/"])  // Skip these dirs
    .respect_gitignore(true)        // Honor .gitignore
    .build()?;

// Scan repository (use "." for current directory)
let scan_result = loregrep.scan(".").await?;
println!("Found {} functions in {} files", 
         scan_result.functions_found, scan_result.files_scanned);

// Get available tools for LLM integration
let tools: Vec<ToolSchema> = LoreGrep::get_tool_definitions();

// Execute tools
let functions = loregrep.execute_tool("search_functions", json!({
    "pattern": "parse.*config",
    "limit": 10,
    "include_private": false
})).await?;

let file_analysis = loregrep.execute_tool("analyze_file", json!({
    "file_path": "src/config.rs",  // Adjust path to match your project
    "include_source": true,
    "show_complexity": true
})).await?;

Python API

import asyncio
import loregrep
import json

async def main():
    # Configure with builder pattern
    loregrep_instance = (loregrep.LoreGrep.builder()
                        .max_file_size(1024 * 1024)
                        .max_depth(10)
                        .file_patterns(["*.py", "*.rs", "*.js"])
                        .exclude_patterns(["__pycache__/", "node_modules/"])
                        .respect_gitignore(True)
                        .build())

    # Scan repository (use "." for current directory)
    scan_result = await loregrep_instance.scan(".")
    print(f"Found {scan_result.functions_found} functions in {scan_result.files_scanned} files")

    # Get available tools
    tools = loregrep.LoreGrep.get_tool_definitions()
    for tool in tools:
        print(f"Tool: {tool.name} - {tool.description}")

    # Execute tools
    functions = await loregrep_instance.execute_tool("search_functions", {
        "pattern": "parse.*config",
        "limit": 10,
        "include_private": False
    })

    file_analysis = await loregrep_instance.execute_tool("analyze_file", {
        "file_path": "src/config.py",  # Adjust path to match your project
        "include_source": True,
        "show_complexity": True
    })

asyncio.run(main())

Available Tools

Loregrep provides 6 standardized tools designed for LLM integration:

1. search_functions

Find functions by name or pattern across the codebase.

Input:

{
    "pattern": "parse.*config",
    "limit": 10,
    "include_private": false,
    "file_pattern": "*.rs"
}

Output:

{
    "functions": [
        {
            "name": "parse_config",
            "file_path": "src/config.rs",
            "line_number": 45,
            "signature": "pub fn parse_config(path: &str) -> Result<Config>",
            "visibility": "public",
            "complexity": 3
        }
    ],
    "total_found": 1
}

Use Case: Find entry points, locate specific functionality, discover API patterns.

2. search_structs

Find structures, classes, and types by name or pattern.

Input:

{
    "pattern": "Config",
    "limit": 5,
    "include_fields": true
}

Output:

{
    "structs": [
        {
            "name": "Config",
            "file_path": "src/config.rs",
            "line_number": 12,
            "fields": ["name: String", "port: u16", "debug: bool"],
            "visibility": "public"
        }
    ]
}

Use Case: Understand data structures, find models, discover type definitions.

3. analyze_file

Get comprehensive analysis of a specific file.

Input:

{
    "file_path": "src/config.rs",
    "include_source": true,
    "show_complexity": true
}

Output:

{
    "file_path": "src/config.rs",
    "language": "rust",
    "functions": [...],
    "structs": [...],
    "imports": [...],
    "exports": [...],
    "complexity_score": 15,
    "source_code": "// Full source if requested"
}

Use Case: Deep dive into specific files, understand file structure, code review.

4. get_dependencies

Find imports, exports, and dependencies for a file.

Input:

{
    "file_path": "src/main.rs",
    "include_external": true,
    "show_usage": true
}

Output:

{
    "imports": [
        {"name": "Config", "from": "./config", "usage_count": 3},
        {"name": "std::fs", "from": "std", "usage_count": 1}
    ],
    "exports": [
        {"name": "main", "type": "function"}
    ]
}

Use Case: Understand module relationships, track dependencies, refactoring impact.

5. find_callers

Find where specific functions are called.

Input:

{
    "function_name": "parse_config",
    "include_context": true,
    "limit": 20
}

Output:

{
    "callers": [
        {
            "caller_function": "main",
            "file_path": "src/main.rs", 
            "line_number": 23,
            "context": "let config = parse_config(&args.config_path)?;"
        }
    ]
}

Use Case: Impact analysis, understand function usage, refactoring safety.

6. get_repository_tree

Get repository structure and overview.

Input:

{
    "include_file_details": true,
    "max_depth": 3,
    "show_stats": true
}

Output:

{
    "structure": {
        "src/": {
            "main.rs": {"functions": 1, "lines": 45},
            "config.rs": {"functions": 3, "lines": 120}
        }
    },
    "stats": {
        "total_files": 15,
        "total_functions": 67,
        "languages": ["rust"]
    }
}

Use Case: Repository overview, architecture understanding, documentation generation.

Performance

Benchmarks vs Alternatives

Repository Size Loregrep ripgrep ast-grep Advantage
Small (100 files) 0.8s 0.2s 1.2s 4x faster than ast-grep
Medium (1,000 files) 3.2s 1.1s 8.5s 2.6x faster than ast-grep
Large (10,000 files) 28s 8s 95s 3.4x faster than ast-grep

Benchmarks run on MacBook Pro M1, 16GB RAM. Times include parsing + indexing.

Note: ripgrep is faster for simple text search, but loregrep provides structured analysis that ripgrep cannot do.

Memory Usage

Repository Size Peak Memory Steady State Index Size
Small (100 files) 45MB 12MB 0.8MB
Medium (1,000 files) 180MB 65MB 8MB
Large (10,000 files) 850MB 320MB 85MB

Performance Tips

// For large repositories
let loregrep = LoreGrep::builder()
    .max_file_size(512 * 1024)    // Skip very large files
    .exclude_patterns(vec![       // Skip generated/vendor code
        "target/", "node_modules/", "vendor/", "*.generated.*"
    ])
    .max_depth(8)                 // Limit directory depth
    .build()?;

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Code Files    │───▢│   Tree-sitter    │───▢│   In-Memory     β”‚
β”‚  (.rs, .py,     β”‚    β”‚    Parsing       β”‚    β”‚    RepoMap      β”‚
β”‚   .ts, etc.)    β”‚    β”‚                  β”‚    β”‚    Indexes      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                          β”‚
                                                          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Coding Assistant│◀───│  6 Query Tools   │◀───│   Fast Lookups  β”‚
β”‚   (Claude, GPT, β”‚    β”‚ (search, analyze,β”‚    β”‚  (functions,    β”‚
β”‚   Cursor, etc.) β”‚    β”‚  dependencies)   β”‚    β”‚   structs, etc.)β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Core Components

  • LoreGrep: Main API facade with builder pattern
  • RepoMap: Fast in-memory indexes with lookup optimization
  • RepositoryScanner: File discovery with gitignore support
  • Language Analyzers: Tree-sitter based parsing (currently Rust only)
  • Tool System: 6 standardized tools for AI integration

Module Structure

src/
β”œβ”€β”€ lib.rs                 # Public API exports and documentation
β”œβ”€β”€ loregrep.rs           # Main LoreGrep struct and builder
β”œβ”€β”€ core/                 # Core types and errors
β”œβ”€β”€ types/                # Data structures (FunctionSignature, etc.)
β”œβ”€β”€ analyzers/            # Language-specific parsers
β”‚   └── rust.rs          # Rust analyzer (tree-sitter based)
β”œβ”€β”€ storage/              # In-memory indexing
β”‚   └── repo_map.rs      # RepoMap with fast lookups
β”œβ”€β”€ scanner/              # File discovery
β”œβ”€β”€ internal/             # Internal CLI and AI implementation
β”‚   β”œβ”€β”€ cli.rs           # CLI application
β”‚   β”œβ”€β”€ ai/              # Anthropic client and conversation
β”‚   └── ui/              # Progress indicators and theming
└── cli_main.rs          # CLI binary entry point

Technical Notes

Memory Management:

  • Indexes built in memory for fast access
  • Thread-safe with Arc<Mutex<>> design
  • Memory usage scales linearly with codebase size
  • No external dependencies required at runtime

Error Handling: Uses comprehensive error types from core::errors::LoreGrepError:

  • IO errors (file access, permissions)
  • Parse errors (malformed code)
  • Configuration errors
  • API errors (for AI integration)

Examples

The project includes comprehensive examples for different use cases:

Rust Examples (examples/)

  • basic_scan.rs - Simple repository scanning and basic queries
  • tool_execution.rs - Complete LLM tool integration patterns
  • file_watcher.rs - File watching for automatic re-indexing
  • coding_assistant.rs - Full coding assistant implementation
  • advanced_queries.rs - Complex search and analysis patterns

Python Examples (python/examples/)

  • basic_usage.py - Python API introduction and common patterns
  • ai_integration.py - Integration with OpenAI/Anthropic APIs
  • web_server.py - REST API wrapper for web applications
  • batch_analysis.py - Processing multiple repositories

Running Examples

Rust:

cargo run --example basic_scan -- /path/to/repo
cargo run --example coding_assistant

Python:

cd python/examples
python basic_usage.py
python ai_integration.py

Development Setup

Prerequisites

  • Rust 1.70 or later
  • Python 3.8+ (for Python bindings)
  • For AI integration tests: Anthropic API key

Building from Source

git clone https://github.com/yourusername/loregrep.git
cd loregrep

# Build Rust library
cargo build --release

# Build Python package (requires maturin)
pip install maturin
maturin develop --features python

Testing

# Run all tests
cargo test

# Run specific test suites
cargo test public_api_integration  # Public API tests
cargo test cli::tests              # CLI tests
cargo test storage::tests          # Storage/indexing tests

# Run with output
cargo test -- --nocapture

# Test Python bindings
cd python && python -m pytest

Development CLI Usage

The CLI is primarily for development and testing:

# Build development binary
cargo build --bin loregrep

# Basic commands for testing
./target/debug/loregrep scan .
./target/debug/loregrep search "parse" --type function
./target/debug/loregrep analyze src/main.rs

Known Test Status

  • βœ… 60+ tests passing across core functionality
  • ⚠️ 8 pre-existing test failures in older modules (technical debt)
  • βœ… 100% pass rate on new Phase 3B+ tests

Contributing

We welcome contributions! Loregrep prioritizes the library API over CLI functionality - focus on core indexing and analysis capabilities that enable AI coding assistants.

High Priority Areas

1. Language Support (Most Needed)

Help expand beyond Rust to support more programming languages:

  • Python analyzer (src/analyzers/python.rs)

    • Functions, classes, decorators
    • Import/export analysis
    • Method calls and inheritance
  • TypeScript analyzer (src/analyzers/typescript.rs)

    • Interfaces, types, generics
    • ES6+ features, async/await
    • Module system analysis
  • JavaScript analyzer (src/analyzers/javascript.rs)

    • Functions, classes, arrow functions
    • CommonJS and ES6 modules
    • Dynamic features handling

2. Performance Improvements

  • Memory optimization for large repositories (>100k files)
  • Incremental update detection when files change
  • Query result caching improvements
  • Parallel parsing optimizations

3. Advanced Analysis Features

  • Call graph visualization and analysis
  • Dependency impact analysis
  • Cross-language project support
  • Code complexity metrics

Development Workflow

  1. Fork and Clone

    git clone https://github.com/your-username/loregrep.git
    cd loregrep
    
  2. Create Feature Branch

    git checkout -b feature/python-analyzer
    
  3. Develop and Test

    cargo build
    cargo test
    cargo clippy
    cargo fmt
    
  4. Test Integration

    # Test CLI
    cargo build --bin loregrep
    ./target/debug/loregrep scan examples/
    
    # Test public API
    cargo run --example basic_scan
    
    # Test Python bindings (if applicable)
    maturin develop --features python
    cd python && python examples/basic_usage.py
    
  5. Submit Pull Request

    • Include tests for new functionality
    • Update documentation for public API changes
    • Follow existing code style and patterns

Code Style Guidelines

  • Use rustfmt for formatting: cargo fmt
  • Use clippy for linting: cargo clippy
  • Add comprehensive tests for new functionality
  • Update documentation for any public API changes
  • Follow existing module organization patterns
  • Include examples for new features

Getting Help

  • GitHub Issues: Bug reports and feature requests
  • GitHub Discussions: Design discussions and questions
  • Discord: Real-time chat (link in issues)

Roadmap

v0.4.0 - Multi-Language Support (Q1 2024)

  • Python Analyzer: Full Python support with classes, functions, imports
  • TypeScript Analyzer: Complete TS/JS support including modern features
  • Performance: 2x improvement in scanning speed for large repositories

v0.5.0 - Advanced Analysis (Q2 2024)

  • Call Graph Analysis: Function call extraction and visualization
  • Dependency Tracking: Advanced import/export analysis
  • Incremental Updates: Smart re-indexing when files change

v1.0.0 - Production Ready (Q3 2024)

  • API Stability: Stable public API with semantic versioning
  • Memory Optimization: Improved handling of very large repositories
  • MCP Server Integration: Standard Model Context Protocol interface
  • Editor Integrations: VS Code and IntelliJ plugins

Future

  • Go Language Support: Package analysis and interface tracking
  • Database Persistence: Optional disk-based storage for massive codebases
  • Distributed Analysis: Support for monorepos and multi-service codebases

Want to help with any roadmap item? Check out our Contributing Guide!

License

This project is dual-licensed under either:

Choose the license that best fits your use case. Most users prefer MIT for simplicity.


Ready to get started? Jump to Installation or try the Quick Start example!