Crate infiniloom_engine

§Infiniloom Engine - Repository Context Generation for LLMs

infiniloom_engine is a high-performance library for generating optimized repository context for Large Language Models. It transforms codebases into structured formats tailored to Claude, GPT-4, Gemini, and other LLMs.

§Features

  • AST-based symbol extraction via Tree-sitter (21 programming languages)
  • PageRank-based importance ranking for intelligent code prioritization
  • Model-specific output formats (XML for Claude, Markdown for GPT, YAML for Gemini)
  • Automatic secret detection and redaction (API keys, credentials, tokens)
  • Accurate token counting: exact counts via tiktoken-rs for OpenAI models, calibrated estimation (~95%) for other model families
  • Full dependency resolution with transitive dependency analysis
  • Remote Git repository support (GitHub, GitLab, Bitbucket)
  • Incremental scanning with content-addressed caching
  • Semantic compression for intelligent code summarization
  • Token budget enforcement with smart truncation strategies

§Quick Start

use infiniloom_engine::{Repository, RepoMapGenerator, OutputFormatter, OutputFormat};

// Create a repository from scanned files
let repo = Repository::new("my-project", "/path/to/project");

// Generate a repository map with key symbols ranked by importance
let map = RepoMapGenerator::new(2000).generate(&repo);

// Format for Claude (XML output)
let formatter = OutputFormatter::by_format(OutputFormat::Xml);
let output = formatter.format(&repo, &map);

§Output Formats

Each LLM has an optimal input format:

Format   | Best For | Notes
XML      | Claude   | Optimized structure, CDATA sections
Markdown | GPT-4    | Fenced code blocks with syntax highlighting
YAML     | Gemini   | Query at end (Gemini best practice)
TOON     | All      | Token-efficient, 30-40% fewer tokens
JSON     | APIs     | Machine-readable, fully structured
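
A minimal sketch of selecting a formatter per target model. Only the Xml variant appears in Quick Start above; the Markdown, Yaml, and Toon variant names are assumed from the table:

use infiniloom_engine::{OutputFormat, OutputFormatter};

// Variant names other than Xml are assumptions based on the table above
let target_model = "gpt-4";
let format = match target_model {
    "claude" => OutputFormat::Xml,      // CDATA-friendly nested structure
    "gpt-4" => OutputFormat::Markdown,  // fenced code blocks
    "gemini" => OutputFormat::Yaml,     // query placed at the end
    _ => OutputFormat::Toon,            // token-efficient default
};
let formatter = OutputFormatter::by_format(format);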

§Token Counting

The library provides accurate token counts for multiple LLM families:

use infiniloom_engine::{Tokenizer, TokenModel};

let tokenizer = Tokenizer::new();
let content = "fn main() { println!(\"Hello\"); }";

// Exact counts via tiktoken for OpenAI models
let gpt4o_tokens = tokenizer.count(content, TokenModel::Gpt4o);

// Calibrated estimation for other models
let claude_tokens = tokenizer.count(content, TokenModel::Claude);

§Security Scanning

Automatically detect and redact sensitive information:

use infiniloom_engine::SecurityScanner;

let scanner = SecurityScanner::new();
let content = "AWS_KEY=AKIAIOSFODNN7EXAMPLE";

// Check if content is safe
if !scanner.is_safe(content, "config.env") {
    // Redact sensitive content
    let redacted = scanner.redact_content(content, "config.env");
}

§Feature Flags

Enable optional functionality:

  • async - Async/await support with Tokio
  • embeddings - Character-frequency similarity (NOT neural - see semantic module docs)
  • watch - File watching for incremental updates
  • full - All features enabled
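
For example, in Cargo.toml (the package name and version below are assumptions for illustration; check crates.io for the published name and latest release):

[dependencies]
# Hypothetical package name and version shown for illustration only
infiniloom-engine = { version = "0.1", features = ["async", "watch"] }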

Note: Git operations use the system git CLI via std::process::Command.

§Module Overview

Module      | Description
parser      | AST-based symbol extraction using Tree-sitter
repomap     | PageRank-based symbol importance ranking
output      | Model-specific formatters (XML, Markdown, etc.)
security    | Secret detection and redaction
tokenizer   | Multi-model token counting
chunking    | Semantic code chunking
budget      | Token budget enforcement
incremental | Caching and incremental scanning
semantic    | Heuristic-based compression (char-frequency, NOT neural)
error       | Unified error types

Re-exports§

pub use chunking::Chunk;
pub use chunking::ChunkStrategy;
pub use chunking::Chunker;
pub use constants::budget as budget_constants;
pub use constants::compression as compression_constants;
pub use constants::files as file_constants;
pub use constants::index as index_constants;
pub use constants::pagerank as pagerank_constants;
pub use constants::parser as parser_constants;
pub use constants::repomap as repomap_constants;
pub use constants::security as security_constants;
pub use constants::timeouts as timeout_constants;
pub use newtypes::ByteOffset;
pub use newtypes::FileSize;
pub use newtypes::ImportanceScore;
pub use newtypes::LineNumber;
pub use newtypes::SymbolId;
pub use newtypes::TokenCount;
pub use output::OutputFormat;
pub use output::OutputFormatter;
pub use parser::detect_file_language;
pub use parser::Language;
pub use parser::Parser;
pub use parser::ParserError;
pub use ranking::count_symbol_references;
pub use ranking::rank_files;
pub use ranking::sort_files_by_importance;
pub use ranking::SymbolRanker;
pub use repomap::RepoMap;
pub use repomap::RepoMapGenerator;
pub use security::SecurityScanner;
pub use budget::BudgetConfig;
pub use budget::BudgetEnforcer;
pub use budget::EnforcementResult;
pub use budget::TruncationStrategy;
pub use config::Config;
pub use config::OutputConfig;
pub use config::PerformanceConfig;
pub use config::ScanConfig;
pub use config::SecurityConfig;
pub use config::SymbolConfig;
pub use dependencies::DependencyEdge;
pub use dependencies::DependencyGraph;
pub use dependencies::DependencyNode;
pub use dependencies::ResolvedImport;
pub use git::ChangedFile;
pub use git::Commit;
pub use git::FileStatus;
pub use git::GitError;
pub use git::GitRepo;
pub use incremental::CacheError;
pub use incremental::CacheStats;
pub use incremental::CachedFile;
pub use incremental::CachedSymbol;
pub use incremental::RepoCache;
pub use mmap_scanner::MappedFile;
pub use mmap_scanner::MmapScanner;
pub use mmap_scanner::ScanStats;
pub use mmap_scanner::ScannedFile;
pub use mmap_scanner::StreamingProcessor;
pub use remote::GitProvider;
pub use remote::RemoteError;
pub use remote::RemoteRepo;
pub use semantic::CodeChunk;
pub use semantic::HeuristicCompressionConfig;
pub use semantic::HeuristicCompressor;
pub use semantic::SemanticCompressor;
pub use semantic::SemanticConfig;
pub use semantic::SemanticError;
pub use tokenizer::TokenCounts as AccurateTokenCounts;
pub use tokenizer::Tokenizer;
pub use error::InfiniloomError;
pub use error::Result as InfiniloomResult;
pub use types::*;

Modules§

budget
Smart token budget enforcement with binary search truncation
chunking
Intelligent code chunking for LLM context windows
config
Configuration file support for Infiniloom
constants
Centralized constants for Infiniloom
default_ignores
Default ignore patterns for Infiniloom
dependencies
Full AST-based dependency resolution and import graph
error
Unified error types for Infiniloom
git
Git integration for diff/log analysis
incremental
Incremental scanning with file watching and caching
index
Git context index
mmap_scanner
Memory-mapped file scanner for high-performance large repository scanning
newtypes
Type-safe wrappers for primitive types
output
Output formatters for different LLM models
parser
Tree-sitter based code parser for extracting symbols from source files
ranking
Symbol importance ranking
remote
Remote repository support
repomap
Repository map generation with PageRank-based symbol ranking
scanner
Unified scanner module for repository scanning
security
Security scanning for secrets and sensitive data
semantic
Semantic analysis and compression module
tokenizer
Accurate token counting using actual BPE tokenizers
types
Core type definitions for Infiniloom

Constants§

DEFAULT_CHUNK_SIZE
Default chunk size in tokens
DEFAULT_MAP_BUDGET
Default token budget for repository maps
VERSION
Library version