§TurboProp - Fast Semantic Code Search and Indexing
TurboProp is a Rust library and CLI tool that enables fast semantic search across codebases using machine learning embeddings. It indexes your code files and allows you to search for functionality using natural language queries.
Now includes MCP server support for real-time integration with coding agents.
§Features
- Semantic Search: Find code by meaning, not just keywords
- Git Integration: Automatically respects .gitignore and only indexes tracked files
- Watch Mode: Monitor file changes and automatically update the index
- File Filtering: Filter by file type, size, and custom patterns
- Multiple Output Formats: JSON for tools, human-readable text for reading
- Performance Optimized: Handles codebases with 50-10,000+ files efficiently
- Configurable Models: Use any HuggingFace sentence-transformer model
- MCP Server: Real-time integration with coding agents via Model Context Protocol
§Quick Start
§CLI Usage
# Index your codebase
tp index --repo . --max-filesize 2mb
# Search for code
tp search "jwt authentication" --repo .
# Filter by file type
tp search --filetype .js "error handling" --repo .
# Get human-readable output
tp search "database queries" --repo . --output text
§Library Usage
The library provides both high-level convenience functions and low-level components for building custom search solutions.
§Basic Indexing and Search
use turboprop::{config::TurboPropConfig, build_persistent_index, search_with_config};
use std::path::Path;
// Build an index with default settings
let config = TurboPropConfig::default();
let index = build_persistent_index(Path::new("./src"), &config).await?;
// Search the index
let results = search_with_config(
    "error handling patterns",
    Path::new("./src"),
    Some(10),  // limit results
    Some(0.7), // similarity threshold
).await?;
for result in results {
    println!("{}: {}", result.location_display(), result.content_preview(80));
}
§Custom Configuration
use turboprop::{
    config::TurboPropConfig,
    embeddings::EmbeddingConfig,
    types::FileDiscoveryConfig,
    build_persistent_index,
};
use std::path::Path;
// Configure embedding model
let embedding_config = EmbeddingConfig::with_model("sentence-transformers/all-mpnet-base-v2")
    .with_batch_size(16);
// Configure file discovery
let file_config = FileDiscoveryConfig::default()
    .with_max_filesize(5_000_000) // 5 MB limit
    .with_gitignore_respect(true)
    .with_untracked(false);
// Create complete configuration
let config = TurboPropConfig {
    embedding: embedding_config,
    file_discovery: file_config,
    ..Default::default()
};
// Build index with custom configuration
let index = build_persistent_index(Path::new("./project"), &config).await?;
§Incremental Updates
use turboprop::{config::TurboPropConfig, update_persistent_index, index_exists};
use std::path::Path;
let path = Path::new("./src");
let config = TurboPropConfig::default();
if index_exists(path) {
    // Update existing index incrementally
    let (updated_index, update_result) = update_persistent_index(path, &config).await?;
    println!(
        "Index updated: {} files added, {} files modified, {} files removed",
        update_result.added_files,
        update_result.updated_files,
        update_result.removed_files
    );
} else {
    println!("No existing index found, create one first");
}
§Architecture
TurboProp uses a multi-stage pipeline for indexing:
- File Discovery: Finds files to index based on git status and filters
- Content Processing: Reads and preprocesses file content
- Chunking: Breaks large files into smaller, searchable chunks
- Embedding Generation: Creates vector embeddings using ML models
- Index Storage: Stores embeddings and metadata for fast retrieval
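The chunking stage can be pictured with a small, self-contained sketch. It is not TurboProp's actual chunking strategy, which this page does not describe: the `Chunk` struct, the `chunk_by_lines` helper, and the line-window-with-overlap approach are all assumptions chosen for illustration.

```rust
/// Hypothetical chunk record: the text plus 1-based, inclusive line bounds.
#[derive(Debug, PartialEq)]
struct Chunk {
    start_line: usize,
    end_line: usize,
    text: String,
}

/// Split `content` into windows of at most `max_lines` lines.
/// Consecutive windows share `overlap` lines so context is not cut mid-thought.
fn chunk_by_lines(content: &str, max_lines: usize, overlap: usize) -> Vec<Chunk> {
    assert!(
        max_lines > 0 && overlap < max_lines,
        "overlap must be smaller than max_lines"
    );
    let lines: Vec<&str> = content.lines().collect();
    let step = max_lines - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < lines.len() {
        let end = (start + max_lines).min(lines.len());
        chunks.push(Chunk {
            start_line: start + 1,
            end_line: end,
            text: lines[start..end].join("\n"),
        });
        if end == lines.len() {
            break; // the last window already reaches the end of the file
        }
        start += step;
    }
    chunks
}

fn main() {
    let source = "fn a() {}\nfn b() {}\nfn c() {}\nfn d() {}\nfn e() {}";
    for chunk in chunk_by_lines(source, 3, 1) {
        println!("lines {}-{}:\n{}\n", chunk.start_line, chunk.end_line, chunk.text);
    }
}
```

Overlapping windows trade a somewhat larger index for a better chance that a function split across a boundary still appears whole in at least one chunk.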
Searching follows a four-stage pipeline:
- Query Embedding: Converts the search query to a vector
- Similarity Search: Finds the most similar code chunks using cosine similarity
- Result Ranking: Sorts results by relevance score
- Output Formatting: Presents results in the requested format
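The similarity and ranking stages above can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the crate's implementation: `cosine_similarity` and `rank_chunks` are hypothetical helpers, and the `limit` and `threshold` parameters simply mirror the two optional arguments passed to `search_with_config` in the earlier example.

```rust
/// Cosine similarity between two vectors.
/// Returns 0.0 when the lengths differ or either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical ranking step: score every chunk embedding against the query,
/// drop scores below `threshold`, sort best-first, and keep at most `limit`.
/// Returns (chunk index, score) pairs.
fn rank_chunks(
    query: &[f32],
    chunks: &[Vec<f32>],
    limit: usize,
    threshold: f32,
) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = chunks
        .iter()
        .enumerate()
        .map(|(i, c)| (i, cosine_similarity(query, c)))
        .filter(|&(_, score)| score >= threshold)
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}

fn main() {
    // Toy 2-dimensional "embeddings"; real models produce hundreds of dimensions.
    let query = vec![1.0, 0.0];
    let chunks = vec![vec![0.0, 1.0], vec![1.0, 0.0], vec![1.0, 1.0]];
    for (idx, score) in rank_chunks(&query, &chunks, 10, 0.5) {
        println!("chunk {idx}: {score:.3}");
    }
}
```

A brute-force scan like this is linear in the number of chunks. Whether TurboProp scans exhaustively or uses an approximate index is not stated on this page.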
§Performance Characteristics
- Indexing Speed: ~100-500 files/second (varies by file size and hardware)
- Search Speed: ~10-50ms per query (after model loading)
- Memory Usage: ~50-200MB (varies with model and index size)
- Index Size: Typically 10-30% of source code size
§Supported Models
TurboProp supports any HuggingFace sentence-transformer model:
- sentence-transformers/all-MiniLM-L6-v2 (default, 384 dims, ~90MB)
- sentence-transformers/all-MiniLM-L12-v2 (384 dims, ~130MB)
- sentence-transformers/all-mpnet-base-v2 (768 dims, ~420MB, highest quality)
- sentence-transformers/paraphrase-MiniLM-L6-v2 (384 dims, ~90MB)
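The embedding dimension affects how much space the vectors themselves occupy. The estimate below is a back-of-the-envelope sketch that assumes each component is stored as an uncompressed `f32`. The crate also ships a `compression` module, so real index sizes may differ. `raw_embedding_bytes` and the chunk count are hypothetical.

```rust
/// Raw size in bytes of `chunk_count` embeddings with `dims` f32 components each.
/// Ignores metadata, chunk text, and any compression.
fn raw_embedding_bytes(chunk_count: u64, dims: u64) -> u64 {
    chunk_count * dims * std::mem::size_of::<f32>() as u64
}

fn main() {
    let chunks = 20_000; // hypothetical number of chunks in an index
    for (model, dims) in [("all-MiniLM-L6-v2", 384u64), ("all-mpnet-base-v2", 768)] {
        let mib = raw_embedding_bytes(chunks, dims) as f64 / (1024.0 * 1024.0);
        println!("{model}: {mib:.1} MiB of raw vectors for {chunks} chunks");
    }
}
```

Under this assumption, doubling the dimension from 384 to 768 doubles the raw vector storage for the same number of chunks.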
§Error Handling
Fallible functions return anyhow::Result<T>, so errors carry context and can be propagated with the ? operator.
Common error categories include:
- I/O Errors: File access, permission issues
- Model Errors: Download failures, model loading issues
- Configuration Errors: Invalid settings, malformed config files
- Index Errors: Corrupted index, version mismatches
§Thread Safety
Most operations are thread-safe and designed for concurrent use:
- Index building uses multiple worker threads for parallel processing
- Search operations are read-only and fully concurrent
- File watching runs in a separate background thread
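The read-only, concurrent search pattern described above can be illustrated with the standard library alone. This sketch does not use TurboProp's types: `ToyIndex` and `search_concurrently` are hypothetical, and the "search" is a plain substring match standing in for a real similarity query. The point is only that a `&self` search method on an `Arc`-shared index needs no locking.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a loaded index: a list of (path, text) chunks.
struct ToyIndex {
    chunks: Vec<(String, String)>,
}

impl ToyIndex {
    /// Read-only lookup. It takes `&self`, so many threads may call it at once.
    fn search(&self, needle: &str) -> Vec<String> {
        self.chunks
            .iter()
            .filter(|(_, text)| text.contains(needle))
            .map(|(path, _)| path.clone())
            .collect()
    }
}

/// Run each query on its own thread against one shared, immutable index.
/// Results are returned in the same order as `queries`.
fn search_concurrently(index: Arc<ToyIndex>, queries: Vec<String>) -> Vec<Vec<String>> {
    let handles: Vec<_> = queries
        .into_iter()
        .map(|query| {
            let index = Arc::clone(&index);
            thread::spawn(move || index.search(&query))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("search thread panicked"))
        .collect()
}

fn main() {
    let index = Arc::new(ToyIndex {
        chunks: vec![
            ("src/auth.rs".to_string(), "fn verify_jwt(token: &str)".to_string()),
            ("src/db.rs".to_string(), "fn run_query(sql: &str)".to_string()),
        ],
    });
    let queries = vec!["jwt".to_string(), "query".to_string()];
    for (query, hits) in queries.iter().zip(search_concurrently(index, queries.clone())) {
        println!("{query}: {hits:?}");
    }
}
```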
§Module Organization
- cli: Command-line interface definitions
- commands: CLI command implementations
- config: Configuration structures and loading
- embeddings: ML embedding generation
- files: File discovery and git integration
- index: Core indexing and storage functionality
- search: Search algorithms and result processing
- types: Common data structures and utilities
Re-exports§
pub use mcp::McpServer;
Modules§
- backends
- Backend implementations for different embedding model types.
- chunking
- cli
- commands
- Command implementations for the TurboProp CLI.
- compression
- Vector compression algorithms for efficient index storage.
- config
- Configuration management for TurboProp.
- constants
- Constants used throughout the TurboProp codebase.
- content
- embeddings
- Embedding generation module for converting text chunks to vector representations.
- error
- Structured error types for TurboProp application.
- error_classification
- Error classification and user-friendly error handling.
- error_utils
- Common error handling utilities for consistent error contexts across the codebase.
- files
- filters
- Search result filtering functionality.
- git
- incremental
- Incremental index update logic for efficient file change processing.
- index
- Vector index management with persistence capabilities.
- mcp
- Model Context Protocol (MCP) implementation for TurboProp
- metrics
- Performance metrics collection for embedding operations.
- model_validation
- Model validation utilities for command execution.
- models
- Model management and caching functionality.
- output
- Output formatting for search results.
- parallel
- Parallel processing utilities for high-performance file operations.
- pipeline
- Processing pipeline coordination for indexing operations.
- progress
- Progress reporting utilities for indexing operations.
- query
- Query processing and embedding generation for search functionality.
- recovery
- Index recovery and validation functionality.
- retry
- Retry logic with exponential backoff for handling transient failures.
- search
- Main similarity search engine implementation.
- storage
- Persistent storage operations for vector indexes.
- streaming
- Streaming operations for memory-efficient processing of large datasets.
- types
- Type definitions for TurboProp.
- validation
- Configuration validation module for TurboProp.
- warnings
- Resource usage warnings and recommendations system.
- watcher
- File system watching implementation for incremental index updates.
Constants§
- DEFAULT_INDEX_PATH
- Default path for indexing when no path is specified
Functions§
- build_persistent_index
- Build a persistent vector index for the specified path.
- index_exists
- Check if a persistent index exists at the specified path.
- index_files
- Index files in the specified path for fast searching.
- index_files_with_config
- Index files with embedding generation using the provided configuration.
- load_persistent_index
- Load an existing persistent vector index from disk.
- search_files
- Search through indexed files using the specified query.
- search_with_config
- Advanced search function with configurable parameters.
- update_persistent_index
- Update an existing persistent index incrementally based on file changes.