§Infiniloom Engine - Repository Context Generation for LLMs
infiniloom_engine is a high-performance library for generating repository context optimized for Large Language Models. It transforms codebases into structured formats tailored to Claude, GPT-4, Gemini, and other LLMs.
§Features
- AST-based symbol extraction via Tree-sitter (21 programming languages)
- PageRank-based importance ranking for intelligent code prioritization
- Model-specific output formats (XML for Claude, Markdown for GPT, YAML for Gemini)
- Automatic secret detection and redaction (API keys, credentials, tokens)
- Exact token counting via tiktoken-rs for OpenAI models, with calibrated estimation (~95% accuracy) for other model families
- Full dependency resolution with transitive dependency analysis
- Remote Git repository support (GitHub, GitLab, Bitbucket)
- Incremental scanning with content-addressed caching
- Semantic compression for intelligent code summarization
- Token budget enforcement with smart truncation strategies
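The budget-enforcement idea in the last bullet can be pictured with a minimal, self-contained sketch (not the crate's API): estimate tokens with a naive chars/4 heuristic and binary-search for the largest prefix of lines that fits a budget.

```rust
/// Naive token estimate: roughly 4 characters per token (a common rule
/// of thumb; the crate itself uses tiktoken-rs for exact OpenAI counts).
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Binary-search the largest number of leading lines whose joined
/// text stays within `budget` tokens.
fn truncate_to_budget(lines: &[&str], budget: usize) -> usize {
    let (mut lo, mut hi) = (0usize, lines.len());
    while lo < hi {
        let mid = (lo + hi + 1) / 2;
        if estimate_tokens(&lines[..mid].join("\n")) <= budget {
            lo = mid; // prefix fits: try keeping more lines
        } else {
            hi = mid - 1; // over budget: keep fewer lines
        }
    }
    lo
}

fn main() {
    let lines = ["fn main() {", "    println!(\"hi\");", "}"];
    let keep = truncate_to_budget(&lines, 8);
    println!("kept {keep} of {} lines", lines.len());
}
```

Binary search makes the cut-point search logarithmic in the number of lines rather than linear, which matters for large files.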
§Quick Start
use infiniloom_engine::{Repository, RepoMapGenerator, OutputFormatter, OutputFormat};
// Create a repository from scanned files
let repo = Repository::new("my-project", "/path/to/project");
// Generate a repository map with key symbols ranked by importance
let map = RepoMapGenerator::new(2000).generate(&repo);
// Format for Claude (XML output)
let formatter = OutputFormatter::by_format(OutputFormat::Xml);
let output = formatter.format(&repo, &map);
§Output Formats
Each LLM has an optimal input format:
| Format | Best For | Notes |
|---|---|---|
| XML | Claude | Optimized structure, CDATA sections |
| Markdown | GPT-4 | Fenced code blocks with syntax highlighting |
| YAML | Gemini | Query at end (Gemini best practice) |
| TOON | All | Token-efficient, 30-40% fewer tokens |
| JSON | APIs | Machine-readable, fully structured |
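The table's model-to-format pairing can be illustrated with a self-contained sketch (using a local enum, not the crate's OutputFormat) that picks a preferred format from a model name:

```rust
#[derive(Debug, PartialEq)]
enum PreferredFormat {
    Xml,      // Claude
    Markdown, // GPT-4
    Yaml,     // Gemini
    Toon,     // token-efficient fallback for any model
}

/// Map a model-family name to the format the table above recommends.
fn preferred_format(model: &str) -> PreferredFormat {
    let m = model.to_ascii_lowercase();
    if m.contains("claude") {
        PreferredFormat::Xml
    } else if m.contains("gpt") {
        PreferredFormat::Markdown
    } else if m.contains("gemini") {
        PreferredFormat::Yaml
    } else {
        PreferredFormat::Toon
    }
}

fn main() {
    println!("{:?}", preferred_format("claude-sonnet")); // Xml
}
```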
§Token Counting
The library provides accurate token counts for multiple LLM families:
use infiniloom_engine::{Tokenizer, TokenModel};
let tokenizer = Tokenizer::new();
let content = "fn main() { println!(\"Hello\"); }";
// Exact counts via tiktoken for OpenAI models
let gpt4o_tokens = tokenizer.count(content, TokenModel::Gpt4o);
// Calibrated estimation for other models
let claude_tokens = tokenizer.count(content, TokenModel::Claude);
§Security Scanning
Automatically detect and redact sensitive information:
use infiniloom_engine::SecurityScanner;
let scanner = SecurityScanner::new();
let content = "AWS_KEY=AKIAIOSFODNN7EXAMPLE";
// Check if content is safe
if !scanner.is_safe(content, "config.env") {
// Redact sensitive content
let redacted = scanner.redact_content(content, "config.env");
}
§Feature Flags
Enable optional functionality:
- `async` - Async/await support with Tokio
- `embeddings` - Character-frequency similarity (NOT neural - see semantic module docs)
- `watch` - File watching for incremental updates
- `full` - All features enabled
Note: Git operations use the system git CLI via std::process::Command.
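The note above follows the standard pattern for shelling out to a CLI; here is a generic sketch of that pattern (not the crate's internal helper):

```rust
use std::process::Command;

/// Run a program with arguments and return trimmed stdout on success,
/// or None if the program is missing or exits non-zero.
fn run_cmd(program: &str, args: &[&str]) -> Option<String> {
    let out = Command::new(program).args(args).output().ok()?;
    if out.status.success() {
        Some(String::from_utf8_lossy(&out.stdout).trim().to_string())
    } else {
        None
    }
}

fn main() {
    // Git integration built this way issues calls like:
    if let Some(v) = run_cmd("git", &["--version"]) {
        println!("{v}");
    }
}
```

Using the system git binary avoids linking libgit2, at the cost of requiring git on PATH at runtime.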
§Module Overview
| Module | Description |
|---|---|
| parser | AST-based symbol extraction using Tree-sitter |
| repomap | PageRank-based symbol importance ranking |
| output | Model-specific formatters (XML, Markdown, etc.) |
| security | Secret detection and redaction |
| tokenizer | Multi-model token counting |
| chunking | Semantic code chunking |
| budget | Token budget enforcement |
| incremental | Caching and incremental scanning |
| semantic | Heuristic-based compression (char-frequency, NOT neural) |
| error | Unified error types |
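The PageRank-style ranking used for symbol importance can be sketched as a toy power iteration over a reference graph in plain Rust (an illustration of the algorithm, not the crate's implementation):

```rust
/// Toy PageRank by power iteration. `edges[i]` lists the nodes that
/// node `i` references; heavily-referenced nodes end up ranked highest.
fn pagerank(edges: &[Vec<usize>], iters: usize, damping: f64) -> Vec<f64> {
    let n = edges.len();
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iters {
        let mut next = vec![(1.0 - damping) / n as f64; n];
        for (i, outs) in edges.iter().enumerate() {
            if outs.is_empty() {
                // Dangling node: spread its rank uniformly.
                for r in next.iter_mut() {
                    *r += damping * rank[i] / n as f64;
                }
            } else {
                for &j in outs {
                    next[j] += damping * rank[i] / outs.len() as f64;
                }
            }
        }
        rank = next;
    }
    rank
}

fn main() {
    // Files 1 and 2 both reference file 0, so file 0 ranks highest.
    let edges = vec![vec![], vec![0], vec![0]];
    println!("{:?}", pagerank(&edges, 20, 0.85));
}
```

In a repository map, nodes are files or symbols and edges come from resolved imports and call references; the rank then decides which symbols fit the token budget first.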
Re-exports§
pub use chunking::{Chunk, ChunkStrategy, Chunker};
pub use output::{OutputFormat, OutputFormatter};
pub use parser::{Language, Parser, ParserError};
pub use ranking::{count_symbol_references, rank_files, sort_files_by_importance, SymbolRanker};
pub use repomap::{RepoMap, RepoMapGenerator};
pub use security::SecurityScanner;
pub use budget::{BudgetConfig, BudgetEnforcer, EnforcementResult, TruncationStrategy};
pub use config::{Config, OutputConfig, PerformanceConfig, ScanConfig, SecurityConfig, SymbolConfig};
pub use dependencies::{DependencyEdge, DependencyGraph, DependencyNode, ResolvedImport};
pub use git::{ChangedFile, Commit, FileStatus, GitError, GitRepo};
pub use incremental::{CacheError, CacheStats, CachedFile, CachedSymbol, RepoCache};
pub use mmap_scanner::{MappedFile, MmapScanner, ScanStats, ScannedFile, StreamingProcessor};
pub use remote::{GitProvider, RemoteError, RemoteRepo};
pub use semantic::{CodeChunk, HeuristicCompressionConfig, HeuristicCompressor, SemanticCompressor, SemanticConfig, SemanticError};
pub use tokenizer::{TokenCounts as AccurateTokenCounts, Tokenizer};
pub use error::{InfiniloomError, Result as InfiniloomResult};
pub use types::*;
Modules§
- budget
- Smart token budget enforcement with binary search truncation
- chunking
- Intelligent code chunking for LLM context windows
- config
- Configuration file support for Infiniloom
- default_ignores
- Default ignore patterns for Infiniloom
- dependencies
- Full AST-based dependency resolution and import graph
- error
- Unified error types for Infiniloom
- git
- Git integration for diff/log analysis
- incremental
- Incremental scanning with file watching and caching
- index
- Git context index module.
- mmap_scanner
- Memory-mapped file scanner for high-performance large repository scanning
- output
- Output formatters for different LLM models
- parser
- Tree-sitter based code parser for extracting symbols from source files
- ranking
- Symbol importance ranking
- remote
- Remote repository support
- repomap
- Repository map generation with PageRank-based symbol ranking
- security
- Security scanning for secrets and sensitive data
- semantic
- Semantic analysis and compression module
- tokenizer
- Accurate token counting using actual BPE tokenizers
- types
- Core type definitions for Infiniloom
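Content-addressed caching, as used by the incremental module, keys cache entries by a hash of file contents rather than path or mtime, so unchanged files are skipped on rescan. A minimal sketch with std's hasher (a real cache would use a stable cryptographic hash, since DefaultHasher is not guaranteed stable across runs):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Content address: a hash of the file's bytes only.
fn content_key(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

fn main() {
    let mut cache: HashMap<u64, &str> = HashMap::new();
    let file = b"fn main() {}";
    // First scan: miss, so parse and store the (stand-in) result.
    cache.entry(content_key(file)).or_insert("parsed symbols");
    // Rescan with identical contents: same key, cache hit, no re-parse.
    assert!(cache.contains_key(&content_key(file)));
    println!("cached entries: {}", cache.len());
}
```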
Constants§
- DEFAULT_CHUNK_SIZE
- Default chunk size in tokens
- DEFAULT_MAP_BUDGET
- Default token budget for repository maps
- VERSION
- Library version