Crate turboprop


§TurboProp - Fast Semantic Code Search and Indexing

TurboProp is a Rust library and CLI tool that enables fast semantic search across codebases using machine learning embeddings. It indexes your code files and allows you to search for functionality using natural language queries.

TurboProp also ships a Model Context Protocol (MCP) server for real-time integration with coding agents.

§Features

Semantic Search: Find code by meaning, not just keywords
Git Integration: Automatically respects .gitignore and only indexes tracked files
Watch Mode: Monitor file changes and automatically update the index
File Filtering: Filter by file type, size, and custom patterns
Multiple Output Formats: JSON for tools, human-readable text for reading
Performance Optimized: Handles codebases with 50-10,000+ files efficiently
Configurable Models: Use any HuggingFace sentence-transformer model
MCP Server: Real-time integration with coding agents via Model Context Protocol

§Quick Start

§CLI Usage

# Index your codebase
tp index --repo . --max-filesize 2mb

# Search for code
tp search "jwt authentication" --repo .

# Filter by file type
tp search --filetype .js "error handling" --repo .

# Get human-readable output
tp search "database queries" --repo . --output text
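The `--filetype` and `--max-filesize` flags narrow which files are indexed or searched. The sketch below shows one plausible way such filters could be applied. The helper names `parse_size` and `passes_filters` are hypothetical. Treating `2mb` as decimal megabytes is an assumption, chosen to match the `5_000_000  // 5MB limit` comment in the library example. Accepting the extension with or without a leading dot is also an assumption. This is not TurboProp's actual code.

```rust
use std::path::Path;

/// Hypothetical parser for size strings such as "2mb".
/// Decimal units are an assumption (1 MB = 1_000_000 bytes).
fn parse_size(s: &str) -> Option<u64> {
    let s = s.trim().to_ascii_lowercase();
    // Check multi-letter suffixes before the bare "b" suffix.
    let (digits, factor) = if let Some(n) = s.strip_suffix("gb") {
        (n, 1_000_000_000)
    } else if let Some(n) = s.strip_suffix("mb") {
        (n, 1_000_000)
    } else if let Some(n) = s.strip_suffix("kb") {
        (n, 1_000)
    } else if let Some(n) = s.strip_suffix('b') {
        (n, 1)
    } else {
        (s.as_str(), 1)
    };
    digits.trim().parse::<u64>().ok()?.checked_mul(factor)
}

/// Hypothetical filter mirroring `--filetype` and `--max-filesize`.
fn passes_filters(path: &Path, size_bytes: u64, filetype: Option<&str>, max_bytes: u64) -> bool {
    if size_bytes > max_bytes {
        return false;
    }
    match filetype {
        // Accept ".js" or "js" and compare it against the path's extension.
        Some(ft) => {
            let wanted = ft.trim_start_matches('.');
            path.extension().and_then(|e| e.to_str()) == Some(wanted)
        }
        None => true,
    }
}
```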

§Library Usage

The library provides both high-level convenience functions and low-level components for building custom search solutions.

§Basic Indexing and Search

use turboprop::{config::TurboPropConfig, build_persistent_index, search_with_config};
use std::path::Path;

// Run inside an async fn returning anyhow::Result<()> so that `.await?` compiles.
// Build an index with default settings
let config = TurboPropConfig::default();
let index = build_persistent_index(Path::new("./src"), &config).await?;

// Search the index
let results = search_with_config(
    "error handling patterns",
    Path::new("./src"),
    Some(10),  // limit results
    Some(0.7)  // similarity threshold
).await?;

for result in results {
    println!("{}: {}", result.location_display(), result.content_preview(80));
}
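`search_with_config` takes an optional result limit and an optional similarity threshold. The minimal sketch below shows how those two settings typically combine. It assumes the threshold is inclusive and that hits below it are dropped before the limit is applied. `Hit` and `rank` are hypothetical names, not TurboProp types.

```rust
/// A scored search hit (hypothetical type, not TurboProp's result type).
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    location: String,
    score: f32,
}

/// Keeps hits scoring at or above `threshold`, sorts them by descending
/// score, and returns at most `limit` of them.
fn rank(mut hits: Vec<Hit>, limit: Option<usize>, threshold: Option<f32>) -> Vec<Hit> {
    if let Some(t) = threshold {
        hits.retain(|h| h.score >= t);
    }
    // total_cmp gives a total order over f32, so a NaN score cannot panic the sort.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    if let Some(n) = limit {
        hits.truncate(n);
    }
    hits
}
```

With `Some(1)` and `Some(0.7)`, only the single highest-scoring hit at or above 0.7 survives.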

§Custom Configuration

use turboprop::{
    config::TurboPropConfig,
    embeddings::EmbeddingConfig,
    types::FileDiscoveryConfig,
    build_persistent_index
};
use std::path::Path;

// Configure embedding model
let embedding_config = EmbeddingConfig::with_model("sentence-transformers/all-mpnet-base-v2")
    .with_batch_size(16);

// Configure file discovery
let file_config = FileDiscoveryConfig::default()
    .with_max_filesize(5_000_000)  // 5MB limit
    .with_gitignore_respect(true)
    .with_untracked(false);

// Create complete configuration
let config = TurboPropConfig {
    embedding: embedding_config,
    file_discovery: file_config,
    ..Default::default()
};

// Build index with custom configuration
let index = build_persistent_index(Path::new("./project"), &config).await?;
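`with_batch_size(16)` sets how many inputs are embedded per model call. The sketch below assumes the batch size simply bounds each slice of chunks sent to the model; that is an assumption about the internals, and `batches` is a hypothetical helper. Under it, 100 chunks at batch size 16 produce six full batches and one final batch of four.

```rust
/// Splits `items` into consecutive batches of at most `batch_size` elements.
/// The last batch holds the remainder when the length is not a multiple.
fn batches<T>(items: &[T], batch_size: usize) -> Vec<&[T]> {
    assert!(batch_size > 0, "batch size must be positive");
    items.chunks(batch_size).collect()
}
```

Larger batches usually improve throughput on the same hardware, at the cost of higher peak memory per model call.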

§Incremental Updates

use turboprop::{config::TurboPropConfig, update_persistent_index, index_exists};
use std::path::Path;

let path = Path::new("./src");
let config = TurboPropConfig::default();

if index_exists(path) {
    // Update existing index incrementally
    let (updated_index, update_result) = update_persistent_index(path, &config).await?;
     
    println!("Index updated: {} files added, {} files modified, {} files removed",
             update_result.added_files,
             update_result.updated_files,
             update_result.removed_files);
} else {
    println!("No existing index found, create one first");
}
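`update_persistent_index` reports counts of added, modified, and removed files. Counts like these can be derived by comparing a stored snapshot of the indexed files with the current state. The sketch below assumes each snapshot maps a path to a content hash. This page does not say whether TurboProp compares hashes, modification times, or both, and `diff_snapshots` is a hypothetical helper. The field names mirror the example above.

```rust
use std::collections::HashMap;

/// Change counts; field names mirror the example above.
#[derive(Debug, Default, PartialEq)]
struct UpdateCounts {
    added_files: usize,
    updated_files: usize,
    removed_files: usize,
}

/// Compares the previous and current snapshots (path -> content hash).
fn diff_snapshots(old: &HashMap<String, u64>, new: &HashMap<String, u64>) -> UpdateCounts {
    let mut counts = UpdateCounts::default();
    for (path, new_hash) in new {
        match old.get(path) {
            None => counts.added_files += 1,
            Some(old_hash) if old_hash != new_hash => counts.updated_files += 1,
            Some(_) => {} // unchanged: nothing to re-embed
        }
    }
    // Paths present before but missing now were removed.
    counts.removed_files = old.keys().filter(|p| !new.contains_key(*p)).count();
    counts
}
```

Only added and modified files need fresh embeddings. Removed files only need their entries deleted, which is what makes an incremental update cheaper than a full rebuild.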

§Architecture

TurboProp uses a multi-stage pipeline for indexing:

File Discovery: Finds files to index based on git status and filters
Content Processing: Reads and preprocesses file content
Chunking: Breaks large files into smaller, searchable chunks
Embedding Generation: Creates vector embeddings using ML models
Index Storage: Stores embeddings and metadata for fast retrieval
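The chunking stage splits large files so that each embedding covers a focused span of code. This page does not describe TurboProp's actual chunking strategy. The sketch below assumes fixed-size line windows with overlap, so context that straddles a boundary appears in two chunks. `chunk_lines`, `size`, and `overlap` are illustrative names.

```rust
/// Splits `text` into windows of `size` lines. Each window overlaps the
/// previous one by `overlap` lines. Returns (1-based start line, chunk text).
fn chunk_lines(text: &str, size: usize, overlap: usize) -> Vec<(usize, String)> {
    assert!(size > overlap, "overlap must be smaller than the chunk size");
    let lines: Vec<&str> = text.lines().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < lines.len() {
        let end = (start + size).min(lines.len());
        chunks.push((start + 1, lines[start..end].join("\n")));
        if end == lines.len() {
            break; // the last window reached the end of the file
        }
        start += step;
    }
    chunks
}
```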

Searching runs through these stages:

Query Embedding: Converts the search query to a vector
Similarity Search: Finds the most similar code chunks using cosine similarity
Result Ranking: Sorts results by relevance score
Output Formatting: Presents results in the requested format
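The similarity step scores each chunk embedding against the query embedding using cosine similarity, cos(a, b) = (a · b) / (‖a‖ ‖b‖). The sketch below is a brute-force version that scans every chunk, costing O(n·d) per query for n chunks of dimension d. This page does not say whether TurboProp scans exhaustively or uses an approximate index. `cosine` and `top_k` are hypothetical helpers.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns 0.0 if either vector is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Exhaustive search: scores every chunk and returns the top `k`
/// as (chunk index, score), highest score first.
fn top_k(query: &[f32], chunks: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = chunks
        .iter()
        .enumerate()
        .map(|(i, c)| (i, cosine(query, c)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```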

§Performance Characteristics

Indexing Speed: ~100-500 files/second (varies by file size and hardware)
Search Speed: ~10-50ms per query (after model loading)
Memory Usage: ~50-200MB (varies with model and index size)
Index Size: Typically 10-30% of source code size

§Supported Models

TurboProp supports any HuggingFace sentence-transformer model. Commonly used options include:

sentence-transformers/all-MiniLM-L6-v2 (default, 384 dims, ~90MB)
sentence-transformers/all-MiniLM-L12-v2 (384 dims, ~130MB)
sentence-transformers/all-mpnet-base-v2 (768 dims, ~420MB, highest quality)
sentence-transformers/paraphrase-MiniLM-L6-v2 (384 dims, ~90MB)
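The dimension of a model determines how much raw storage each chunk's embedding needs. The arithmetic below assumes vectors are stored as uncompressed 32-bit floats; the crate's `compression` module suggests the on-disk form may be smaller. Under that assumption, a 384-dimension vector takes 1,536 bytes and a 768-dimension vector takes 3,072 bytes. Switching from all-MiniLM-L6-v2 to all-mpnet-base-v2 therefore doubles per-chunk vector storage. The helper names are illustrative.

```rust
/// Bytes for one uncompressed f32 embedding with `dims` dimensions.
fn vector_bytes(dims: usize) -> usize {
    dims * std::mem::size_of::<f32>()
}

/// Raw vector storage for `chunks` embeddings, excluding metadata and compression.
fn raw_index_bytes(chunks: usize, dims: usize) -> usize {
    chunks * vector_bytes(dims)
}
```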

§Error Handling

Fallible functions return anyhow::Result<T>; predicates such as index_exists, used above as a plain condition, return bool. Common error categories include:

I/O Errors: File access, permission issues
Model Errors: Download failures, model loading issues
Configuration Errors: Invalid settings, malformed config files
Index Errors: Corrupted index, version mismatches
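anyhow::Result<T> can carry any error type that implements std::error::Error + Send + Sync + 'static. A structured enum like the one sketched below can therefore flow into it with `?`. The enum mirrors the categories above, but `TpError` and `read_config` are hypothetical. They are not the types in TurboProp's own `error` module.

```rust
use std::{error::Error, fmt, fs, io};

/// Hypothetical structured error mirroring the categories above.
#[derive(Debug)]
enum TpError {
    Io(io::Error),
    Model(String),
    Config(String),
    Index(String),
}

impl fmt::Display for TpError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TpError::Io(e) => write!(f, "I/O error: {e}"),
            TpError::Model(msg) => write!(f, "model error: {msg}"),
            TpError::Config(msg) => write!(f, "configuration error: {msg}"),
            TpError::Index(msg) => write!(f, "index error: {msg}"),
        }
    }
}

impl Error for TpError {
    // Expose the underlying I/O error so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            TpError::Io(e) => Some(e),
            _ => None,
        }
    }
}

// Lets `?` convert io::Error into TpError automatically.
impl From<io::Error> for TpError {
    fn from(e: io::Error) -> Self {
        TpError::Io(e)
    }
}

/// Hypothetical loader: I/O failures and empty files map to different categories.
fn read_config(path: &str) -> Result<String, TpError> {
    let text = fs::read_to_string(path)?;
    if text.trim().is_empty() {
        return Err(TpError::Config(format!("{path} is empty")));
    }
    Ok(text)
}
```

Inside a function returning anyhow::Result<T>, a further `?` on read_config would convert TpError into anyhow::Error and keep the source chain intact.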

§Thread Safety

Most operations are thread-safe and designed for concurrent use:

Index building uses multiple worker threads for parallel processing
Search operations are read-only and fully concurrent
File watching runs in a separate background thread
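Because search only reads the index, many queries can share one index across threads without a lock. The sketch below shows that general pattern with the standard library's `Arc`. `SharedIndex` and its substring match are hypothetical stand-ins for a real index and a real similarity search.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical read-only index. It is never mutated after construction,
/// so an Arc alone (no Mutex) is enough to share it between threads.
struct SharedIndex {
    chunks: Vec<String>,
}

impl SharedIndex {
    /// Naive substring match standing in for a similarity search.
    fn count_matches(&self, query: &str) -> usize {
        self.chunks.iter().filter(|c| c.contains(query)).count()
    }
}

/// Runs each query on its own thread against the same shared index.
/// Results come back in query order because handles are joined in order.
fn search_concurrently(index: Arc<SharedIndex>, queries: Vec<String>) -> Vec<usize> {
    let handles: Vec<_> = queries
        .into_iter()
        .map(|q| {
            let idx = Arc::clone(&index);
            thread::spawn(move || idx.count_matches(&q))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("search thread panicked"))
        .collect()
}
```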

§Module Organization

cli: Command-line interface definitions
commands: CLI command implementations
config: Configuration structures and loading
embeddings: ML embedding generation
files: File discovery and git integration
index: Core indexing and storage functionality
search: Search algorithms and result processing
types: Common data structures and utilities

Re-exports§

pub use mcp::McpServer;

Modules§

backends: Backend implementations for different embedding model types.
chunking
cli
commands: Command implementations for the TurboProp CLI.
compression: Vector compression algorithms for efficient index storage.
config: Configuration management for TurboProp.
constants: Constants used throughout the TurboProp codebase.
content
embeddings: Embedding generation module for converting text chunks to vector representations.
error: Structured error types for TurboProp application.
error_classification: Error classification and user-friendly error handling.
error_utils: Common error handling utilities for consistent error contexts across the codebase.
files
filters: Search result filtering functionality.
git
incremental: Incremental index update logic for efficient file change processing.
index: Vector index management with persistence capabilities.
mcp: Model Context Protocol (MCP) implementation for TurboProp.
metrics: Performance metrics collection for embedding operations.
model_validation: Model validation utilities for command execution.
models: Model management and caching functionality.
output: Output formatting for search results.
parallel: Parallel processing utilities for high-performance file operations.
pipeline: Processing pipeline coordination for indexing operations.
progress: Progress reporting utilities for indexing operations.
query: Query processing and embedding generation for search functionality.
recovery: Index recovery and validation functionality.
retry: Retry logic with exponential backoff for handling transient failures.
search: Main similarity search engine implementation.
storage: Persistent storage operations for vector indexes.
streaming: Streaming operations for memory-efficient processing of large datasets.
types: Type definitions for TurboProp.
validation: Configuration validation module for TurboProp.
warnings: Resource usage warnings and recommendations system.
watcher: File system watching implementation for incremental index updates.

Constants§

DEFAULT_INDEX_PATH: Default path for indexing when no path is specified

Functions§

build_persistent_index: Build a persistent vector index for the specified path.
index_exists: Check if a persistent index exists at the specified path.
index_files: Index files in the specified path for fast searching.
index_files_with_config: Index files with embedding generation using the provided configuration.
load_persistent_index: Load an existing persistent vector index from disk.
search_files: Search through indexed files using the specified query.
search_with_config: Advanced search function with configurable parameters.
update_persistent_index: Update an existing persistent index incrementally based on file changes.
