TurboProp
TurboProp (tp) is a fast semantic code search and indexing tool written in Rust. It uses machine learning embeddings to enable intelligent code search across your codebase, making it easy to find relevant code snippets based on natural language queries.
Key Features
- Semantic Search: Find code by meaning, not just keywords
- Git Integration: Respects .gitignore and only indexes files under source control
- Watch Mode: Automatically updates the index when files change
- File Type Filtering: Search within specific file types
- Multiple Output Formats: JSON for tools, human-readable text for reading
- Performance Optimized: Handles codebases from 50 to 10,000+ files
- Easy Configuration: Optional .turboprop.yml configuration file
- MCP Server Integration: Built-in MCP server for coding agents like Claude Code, Cursor, and Windsurf
MCP Server for Coding Agents
What is MCP? MCP (Model Context Protocol) is a standard way for AI coding agents to access external tools. Think of it as a bridge that lets your AI assistant search through your code in real-time.
Before MCP: "Find JWT authentication code" → Agent can only see files you've shared
With MCP: "Find JWT authentication code" → Agent searches your entire codebase semantically
TurboProp's MCP server works like a librarian for your codebase - it catalogs all your code, keeps it up-to-date, and helps agents find relevant code instantly.
Quick Start (< 2 minutes)
- Start the MCP server: tp mcp --repo .
- Configure your coding agent (see integration examples below)
- Ask your agent: "Find the JWT authentication implementation"
That's it! Your agent can now search your entire codebase semantically.
Agent Integration
Claude Code - Add to .claude.json in your project:
Cursor - Add to .cursor/mcp.json in your project:
Other Agents (GitHub Copilot, Windsurf, etc.) - Use these settings:
- Command: tp
- Arguments: ["mcp", "--repo", "."]
✓ Verify Setup: Restart your agent and ask: "Search for error handling code"
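For agents that read a JSON config file, the settings above can be written out programmatically. The Python sketch below is only an illustration: the command and arguments come from this README, while the top-level "mcpServers" key and the "turboprop" server name are assumptions based on common MCP client configs, so check your agent's documentation for the exact schema.

```python
import json


def mcp_server_entry(repo="."):
    """Command and arguments exactly as listed in this README."""
    return {"command": "tp", "args": ["mcp", "--repo", repo]}


def agent_config(server_name="turboprop", repo="."):
    # ASSUMPTION: many MCP clients expect a top-level "mcpServers" object.
    # Neither that key nor the server name is confirmed by this README.
    return {"mcpServers": {server_name: mcp_server_entry(repo)}}


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the agent's config file.
    print(json.dumps(agent_config(), indent=2))
```

Printing the config rather than writing it to disk avoids overwriting any servers you already have configured.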
What You Can Ask Your Agent
Once configured, you can ask natural language questions like:
- "Find the JWT authentication implementation" - Locates authentication code
- "Show me error handling patterns" - Finds error handling across the codebase
- "Where is database connection logic?" - Discovers database-related code
- "Find all tests for user login" - Locates relevant test files
- "How does the API rate limiting work?" - Finds rate limiting implementation
Advanced Search Options
Your agent can also use these parameters to refine searches:
- limit: Maximum results (default: 10)
- filetype: Filter by extension (.rs, .js, .py)
- filter: Glob pattern (src/**/*.rs, tests/**)
- threshold: Similarity threshold (0.0-1.0)
Example: "Find authentication code, limit to 5 results, only in Rust files"
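To make the limit and threshold parameters concrete, here is a small Python sketch of how a similarity cutoff and a result cap might interact. It assumes cosine similarity over embedding vectors, which this README does not confirm; the function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vec, indexed, limit=10, threshold=0.0):
    """Score every indexed vector, drop those below threshold, return the best `limit`."""
    scored = [(cosine_similarity(query_vec, vec), path) for path, vec in indexed.items()]
    kept = [(score, path) for score, path in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    # Toy "embeddings" purely for illustration.
    index = {"auth.rs": [1.0, 0.0], "db.rs": [0.0, 1.0], "session.rs": [1.0, 1.0]}
    for score, path in top_matches([1.0, 0.0], index, limit=5, threshold=0.5):
        print(f"{score:.2f} {path}")
```

In this sketch the threshold is applied before the limit, so a high threshold can return fewer results than the limit allows.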
Configuration & Advanced Usage
Custom Model & Settings:
Project Configuration (.turboprop.yml):
model: "sentence-transformers/all-MiniLM-L6-v2"
max_filesize: "2mb"
similarity_threshold: 0.3
📖 Complete Guide: MCP User Guide
🔧 Troubleshooting: Common Issues & Solutions
⚡ Performance: Tips for large repositories and team usage
Quick Start
Installation
Via Cargo (Recommended)
From Source
# Binary will be in target/release/tp
Basic Usage
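- Index your codebase: tp index --repo .
- Search for code: tp search <QUERY> --repo .
- Filter by file type: add --filetype <EXT> to the search (e.g., --filetype .rs)
- Get human-readable output: add --output text to the search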
Model Support
TurboProp now supports multiple embedding models to optimize for different use cases:
Available Models
Sentence Transformer Models (FastEmbed)
- sentence-transformers/all-MiniLM-L6-v2 (default)
  - Fast and lightweight, good for general use
  - 384 dimensions, ~23MB
  - Automatic download and caching
- sentence-transformers/all-MiniLM-L12-v2
  - Better accuracy with slightly more compute
  - 384 dimensions, ~44MB
Specialized Code Models
- nomic-embed-code.Q5_K_S.gguf
  - Specialized for code search and retrieval
  - 768 dimensions, ~2.5GB
  - Supports multiple programming languages
  - Quantized for efficient inference
Multilingual Models
- Qwen/Qwen3-Embedding-0.6B
  - State-of-the-art multilingual support (100+ languages)
  - 1024 dimensions, ~600MB
  - Supports instruction-based embeddings
  - Excellent for code and text retrieval
Model Selection Guide
Choose your model based on your use case:
| Use Case | Recommended Model | Why |
|---|---|---|
| General code search | sentence-transformers/all-MiniLM-L6-v2 | Fast, reliable, good balance |
| Specialized code search | nomic-embed-code.Q5_K_S.gguf | Optimized for code understanding |
| Multilingual projects | Qwen/Qwen3-Embedding-0.6B | Best multilingual support |
| Low resource environments | sentence-transformers/all-MiniLM-L6-v2 | Smallest memory footprint |
| Maximum accuracy | Qwen/Qwen3-Embedding-0.6B | State-of-the-art performance |
Usage Examples
Basic Model Selection
# List available models
# Get model information
# Download a model before use
Indexing with Different Models
# Use default model
# Use specialized code model
# Use multilingual model with instruction
Searching with Model Consistency
# Search using the same model used for indexing
# Use instruction for context-aware search (Qwen3 only)
Configuration File Support
Create .turboprop.yml in your project root:
# Default model for all operations
default_model: "sentence-transformers/all-MiniLM-L6-v2"
# Model-specific configurations
models:
  "Qwen/Qwen3-Embedding-0.6B":
    instruction: "Represent this code for semantic search"
    cache_dir: "~/.turboprop/qwen3-cache"
  "nomic-embed-code.Q5_K_S.gguf":
    cache_dir: "~/.turboprop/nomic-cache"
# Performance settings
embedding:
  batch_size: 32
  cache_embeddings: true
# Resource limits
max_memory_usage: "8GB"
warn_large_models: true
Complete Usage Guide
Indexing Command
The index command creates a searchable index of your codebase:
Options:
- --repo <PATH>: Repository path to index (default: current directory)
- --max-filesize <SIZE>: Maximum file size to index (e.g., "2mb", "500kb", "1gb")
- --watch: Monitor file changes and update index automatically
- --model <MODEL>: Embedding model to use (default: "sentence-transformers/all-MiniLM-L6-v2")
- --cache-dir <DIR>: Cache directory for models and data
- --worker-threads <N>: Number of worker threads for processing
- --batch-size <N>: Batch size for embedding generation (default: 32)
- --verbose: Enable verbose output
Examples:
# Basic indexing
# Index with size limit and watch mode
# Use custom model and cache directory
# Index with custom performance settings
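The --max-filesize values shown above ("2mb", "500kb", "1gb") are human-readable sizes. As a rough illustration of how such strings map to byte counts, here is a Python sketch. It assumes binary multiples (1 kb = 1024 bytes); TurboProp's own parser may use decimal units or accept other suffixes, and parse_size is a hypothetical helper, not part of the tool.

```python
import re

# ASSUMPTION: binary multiples. The README does not say which convention tp uses.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
_SIZE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(b|kb|mb|gb)\s*")


def parse_size(text):
    """Convert a size string such as '2mb' into a number of bytes."""
    match = _SIZE.fullmatch(text.lower())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def should_index(file_size_bytes, max_filesize="2mb"):
    """True when a file is small enough to be indexed under the given limit."""
    return file_size_bytes <= parse_size(max_filesize)
```

For example, under these assumptions a 3 MiB file is skipped with the "2mb" limit and accepted once the limit is raised to "5mb".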
Search Command
The search command finds relevant code using semantic similarity:
Options:
- <QUERY>: Search query (natural language or keywords)
- --repo <PATH>: Repository path to search in (default: current directory)
- --limit <N>: Maximum number of results to return (default: 10)
- --threshold <FLOAT>: Minimum similarity threshold (0.0 to 1.0)
- --output <FORMAT>: Output format: 'json' (default) or 'text'
- --filetype <EXT>: Filter results by file extension (e.g., '.rs', '.js', '.py')
- --filter <PATTERN>: Filter results by glob pattern (e.g., '*.rs', 'src/**/*.js')
Examples:
# Basic search
# Search with filters and limits
# Get human-readable output
# High-precision search
# Search in specific directory
# Filter by glob pattern
# Recursive glob patterns
# Combine filters
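When calling the search command from a script, it can help to assemble the argument list from the options documented above rather than concatenating strings. The Python sketch below does that. It uses only flags listed in this README; the tp search invocation with a positional query is inferred from the options list, and build_search_command is a hypothetical helper.

```python
def build_search_command(query, repo=".", limit=None, threshold=None,
                         output=None, filetype=None, glob_filter=None):
    """Return an argv list for the search command using flags from this README."""
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if output is not None and output not in ("json", "text"):
        raise ValueError("output must be 'json' or 'text'")

    # ASSUMPTION: the query is passed positionally after the subcommand name.
    argv = ["tp", "search", query, "--repo", repo]
    if limit is not None:
        argv += ["--limit", str(limit)]
    if threshold is not None:
        argv += ["--threshold", str(threshold)]
    if output is not None:
        argv += ["--output", output]
    if filetype is not None:
        argv += ["--filetype", filetype]
    if glob_filter is not None:
        argv += ["--filter", glob_filter]
    return argv


if __name__ == "__main__":
    # Show the command instead of running it; pass the list to subprocess.run
    # if tp is installed and on PATH.
    print(build_search_command("jwt authentication", limit=5, filetype=".rs"))
```

Passing the list to subprocess.run without a shell keeps glob patterns such as src/**/*.rs from being expanded before they reach the tool.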
Glob Pattern Filtering
TurboProp supports powerful glob pattern filtering to search within specific files or directories. Glob patterns use Unix shell-style wildcards to match file paths.
Basic Wildcards
| Wildcard | Description | Example |
|---|---|---|
| * | Match any characters within a directory | *.rs matches all Rust files |
| ? | Match exactly one character | file?.rs matches file1.rs, fileA.rs |
| ** | Match any characters across directories | **/*.js matches JS files anywhere |
| [abc] | Match any character in the set | file[123].rs matches file1.rs, file2.rs, file3.rs |
| [!abc] | Match any character NOT in the set | file[!0-9].rs matches filea.rs but not file1.rs |
| {a,b} | Match any of the alternatives | *.{js,ts} matches both .js and .ts files |
Common Pattern Examples
File Type Filtering
# All Rust files anywhere in the codebase
# All JavaScript and TypeScript files
# All configuration files
Directory-Specific Filtering
# Files only in the src directory
# Files only in tests directory
# Files in specific subdirectories
Recursive Directory Filtering
# Python files anywhere in the project
# Test files in any subdirectory
# Source files in src and all subdirectories
# Handler files in nested API directories
Advanced Pattern Examples
# Test files with specific naming patterns
# Source files excluding certain directories
# Files in multiple specific directories
# Files with numeric suffixes
Pattern Behavior
Path Matching: Patterns match against the entire file path, not just the filename:
- *.rs matches main.rs, src/main.rs, and lib/nested/file.rs
- src/*.rs matches src/main.rs but not src/nested/file.rs
- src/**/*.rs matches both src/main.rs and src/nested/file.rs
Case Sensitivity: Patterns are case-sensitive by default:
- *.RS matches FILE.RS but not file.rs
- *.rs matches file.rs but not FILE.RS
Path Separators: Always use forward slashes (/) in patterns:
- ✅ src/api/*.js (correct)
- ❌ src\\api\\*.js (incorrect)
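The matching rules described in this section can be checked against a small reference matcher. The Python sketch below is an approximation written for illustration, not TurboProp's actual implementation. In particular, it assumes that a pattern without a slash is compared against the file name only, which is one way to reproduce the *.rs examples above.

```python
import re


def expand_braces(pattern):
    """Expand {a,b} alternatives into a list of plain patterns."""
    group = re.search(r"\{([^{}]*)\}", pattern)
    if not group:
        return [pattern]
    expanded = []
    for alt in group.group(1).split(","):
        expanded.extend(expand_braces(pattern[:group.start()] + alt + pattern[group.end():]))
    return expanded


def glob_to_regex(pattern):
    """Translate *, ?, **, [abc] and [!abc] into a regular expression."""
    i, parts = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")      # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")            # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")         # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")          # exactly one non-slash character
            i += 1
        elif pattern[i] == "[" and "]" in pattern[i + 1:]:
            end = pattern.index("]", i + 1)
            body = pattern[i + 1:end]
            if body.startswith("!"):
                body = "^" + body[1:]     # glob negation to regex negation
            parts.append("[" + body + "]")
            i = end + 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return "".join(parts)


def matches(pattern, path):
    """Case-sensitive match of a forward-slash path against a glob pattern."""
    for plain in expand_braces(pattern):
        # ASSUMPTION: slash-free patterns are tested against the file name only.
        target = path if "/" in plain else path.rsplit("/", 1)[-1]
        if re.fullmatch(glob_to_regex(plain), target):
            return True
    return False
```

Running the examples from this section through matches is a quick way to sanity-check a pattern before passing it to --filter, keeping in mind that the real matcher may differ in edge cases.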
Combining with File Type Filter: You can use both --filter and --filetype together:
# Search for Rust files in src directory only
Performance Tips
- Simple patterns are faster: *.rs is faster than **/*.rs
- Be specific when possible: src/*.js is faster than **/*.js if you know files are in src/
- Avoid excessive wildcards: Patterns with many ** can be slower on large codebases
- Use file type filter for extensions: --filetype .rs is optimized compared to --filter "*.rs"
Troubleshooting Glob Patterns
Pattern doesn't match expected files:
- Check case sensitivity: *.RS vs *.rs
- Verify path structure: src/*.js only matches direct children of src/
- Use ** for recursive matching: src/**/*.js matches nested files
Pattern matching too many files:
- Be more specific: use src/*.js instead of *.js
- Add more path components: src/components/*.jsx
- Use character classes: test_[0-9]*.rs instead of test_*.rs
Complex patterns not working:
- Test simpler patterns first: start with *.ext then add complexity
- Check for typos in braces: {js,ts} not {js, ts} (no spaces)
- Validate bracket expressions: [a-z] not [a-Z]
For more pattern examples and troubleshooting, see the TROUBLESHOOTING.md file.
Configuration
TurboProp supports optional configuration via a .turboprop.yml file in your repository root:
# .turboprop.yml
max_filesize: "2mb"
model: "sentence-transformers/all-MiniLM-L6-v2"
cache_dir: "~/.turboprop-cache"
worker_threads: 4
batch_size: 32
default_output: "json"
similarity_threshold: 0.3
Output Formats
JSON Output (Default)
Text Output
Score: 0.82 | src/auth.rs
fn authenticate_user(token: &str) -> Result<User, AuthError> {
// JWT token validation logic
...
}
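JSON output is the better choice for tooling, but the text format above is simple enough to parse when only text is available. The Python sketch below assumes that every result starts with a header line of the form "Score: <number> | <path>", as in the sample. Real output may contain additional fields, so treat the pattern as a guess.

```python
import re

# ASSUMPTION: each result begins with a "Score: <float> | <path>" header line.
HEADER = re.compile(r"^Score:\s*(?P<score>\d+(?:\.\d+)?)\s*\|\s*(?P<path>.+)$")


def parse_text_output(text):
    """Group text output into results with a score, a path and snippet lines."""
    results = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            results.append({
                "score": float(header["score"]),
                "path": header["path"].strip(),
                "snippet": [],
            })
        elif results:
            # Any other line belongs to the most recent result's snippet.
            results[-1]["snippet"].append(line)
    return results
```

Lines that appear before the first header are ignored, which keeps banners or log messages from being mistaken for snippet content.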
Performance Characteristics
- Indexing Speed: ~100-500 files/second (depending on file size and hardware)
- Search Speed: ~10-50ms per query (after initial model loading)
- Memory Usage: ~50-200MB (varies with model and index size)
- Storage: Index size is typically 10-30% of source code size
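The figures above translate into a quick back-of-the-envelope estimate. The Python sketch below only does that arithmetic: it takes the 100-500 files/second and 10-30% ranges from this README at face value and ignores model download and load time. The estimate function is a hypothetical helper.

```python
def estimate(file_count, source_mb):
    """Return rough (low, high) ranges for indexing time and index size."""
    # 500 files/second is the fast end of the range, 100 files/second the slow end.
    index_seconds = (file_count / 500, file_count / 100)
    # Index size is quoted as 10-30% of source size.
    index_mb = (source_mb / 10, source_mb * 3 / 10)
    return {"index_seconds": index_seconds, "index_mb": index_mb}


if __name__ == "__main__":
    est = estimate(file_count=10_000, source_mb=500)
    print("indexing time (s):", est["index_seconds"])
    print("index size (MB):", est["index_mb"])
```

For a codebase of 10,000 files and 500MB of source, this works out to roughly 20 to 100 seconds of indexing and a 50 to 150MB index.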
Recommended Limits
- File Count: Up to 10,000 files (tested)
- File Size: Up to 2MB per file (configurable)
- Total Codebase: Up to 500MB of source code
Supported File Types
TurboProp works with any text-based file but is optimized for common programming languages:
- Web: .js, .ts, .jsx, .tsx, .html, .css, .scss, .vue
- Backend: .py, .rs, .go, .java, .kt, .scala, .rb, .php
- Systems: .c, .cpp, .h, .hpp, .cs, .swift
- Data: .sql, .json, .yaml, .yml, .xml, .toml
- Docs: .md, .txt, .rst
- Config: .env, .ini, .conf, .cfg
Integration Examples
With Git Hooks
Add to .git/hooks/post-commit:
#!/bin/bash
With IDEs
Many IDEs can be configured to run external tools. Add TurboProp as a custom search tool.
With CI/CD
# In your CI script
Troubleshooting
Common Issues
Index not found
Solution: Run tp index --repo . first to create an index.
Model download fails
Solution: Check internet connection or specify a local cache directory with --cache-dir.
Large files skipped
Solution: Increase limit with --max-filesize 5mb or exclude large files.
Out of memory
Solution: Reduce --batch-size or --worker-threads, or exclude large files.
Getting Help
Development
Building from Source
Running Tests
Dependencies
- clap: CLI parsing and help generation
- tokio: Async runtime for I/O operations
- serde: JSON serialization
- fastembed: Machine learning embeddings
- git2: Git repository integration
- notify: File system watching
- walkdir: Directory traversal
See Also
For more detailed information:
- Installation Guide - Comprehensive installation instructions for all platforms
- Model Documentation - Complete guide to available embedding models and selection criteria
- Configuration Guide - Advanced configuration options and .turboprop.yml setup
- API Reference - Library API documentation for programmatic usage
- Troubleshooting Guide - Solutions to common issues and performance problems
- Migration Guide - Upgrading from previous versions
Contributing
- Fork the repository
- Create a feature branch
- Add tests for your changes
- Ensure all tests pass: cargo test
- Submit a pull request
License
Licensed under either of:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
at your option.