§Project RAG - RAG-based Codebase Indexing and Semantic Search
A dual-purpose Rust library and MCP server for semantic code search using RAG (Retrieval-Augmented Generation).
§Overview
Project RAG combines vector embeddings with BM25 keyword search to enable semantic code search across large projects. It supports incremental indexing, git history search, and provides both a Rust library API and an MCP server for AI assistant integration.
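To make the vector side of the search concrete, here is a minimal, std-only sketch of ranking embeddings by cosine similarity and then applying a `limit` and `min_score`, mirroring the fields of the same name in `QueryRequest`. It is an illustration under assumptions, not the crate's implementation: the function names `cosine_similarity` and `top_matches` are hypothetical, and whether Project RAG's scores are cosine similarities is not stated in this documentation.

```rust
/// Cosine similarity between two vectors.
/// Returns None when the lengths differ, the input is empty, or a norm is zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Hypothetical helper: indices and scores of the best `limit` documents
/// whose similarity to `query` is at least `min_score`, best first.
fn top_matches(
    query: &[f32],
    docs: &[Vec<f32>],
    limit: usize,
    min_score: f32,
) -> Vec<(usize, f32)> {
    let mut hits: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .filter_map(|(i, d)| cosine_similarity(query, d).map(|s| (i, s)))
        .filter(|(_, s)| *s >= min_score)
        .collect();
    // Sort by descending score; incomparable (NaN) pairs are treated as equal.
    hits.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    hits.truncate(limit);
    hits
}
```

Calling `top_matches(&query, &docs, 10, 0.7)` would keep at most ten documents scoring 0.7 or higher. A real vector database uses an index rather than this linear scan.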
§Architecture
- RagClient: Core library containing all functionality (embeddings, vector DB, indexing, search)
- RagMcpServer: Thin wrapper around RagClient that exposes functionality via MCP protocol
- The library and the MCP server are always built together; no feature flags are needed
§Key Features
- Semantic Search: FastEmbed (all-MiniLM-L6-v2) for local embeddings
- Hybrid Search: Combines vector similarity with BM25 keyword matching, fused via Reciprocal Rank Fusion (RRF)
- Dual Database Support: LanceDB (embedded, default) or Qdrant (external server)
- Smart Indexing: Auto-detects full vs incremental updates with persistent caching
- AST-Based Chunking: Tree-sitter parsing for 12 programming languages
- Git History Search: Semantic search over commit history with on-demand indexing
- Dual API: Use as a Rust library or as an MCP server for AI assistants
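The hybrid search feature above names Reciprocal Rank Fusion (RRF). As a rough sketch of the technique, each document receives `1 / (k + rank)` from every ranked list it appears in, and the contributions are summed. The function name `rrf_fuse`, the file names in the usage note, and the choice of `k = 60` are assumptions for illustration; the crate's actual fusion code and constants are not shown in this documentation.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids with Reciprocal Rank Fusion.
/// Ranks are 1-based; `k` is the smoothing constant (60 is a common choice).
/// Returns (id, fused score) pairs sorted by descending score,
/// with ties broken by id so the output is deterministic.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap_or(std::cmp::Ordering::Equal)
            .then_with(|| a.0.cmp(&b.0))
    });
    fused
}
```

For example, fusing a vector ranking `["auth.rs", "login.rs", "db.rs"]` with a BM25 ranking `["login.rs", "session.rs", "auth.rs"]` places `login.rs` first, because it ranks highly in both lists. RRF only needs ranks, so the two scoring scales never have to be normalized against each other.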
§Library Usage Example
use project_rag::{RagClient, IndexRequest, QueryRequest};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create client with default configuration
    let client = RagClient::new().await?;

    // Index a codebase
    let index_req = IndexRequest {
        path: "/path/to/codebase".to_string(),
        project: Some("my-project".to_string()),
        include_patterns: vec!["**/*.rs".to_string()],
        exclude_patterns: vec!["**/target/**".to_string()],
        max_file_size: 1_048_576,
    };
    let index_response = client.index_codebase(index_req).await?;
    println!("Indexed {} files", index_response.files_indexed);

    // Query the codebase
    let query_req = QueryRequest {
        query: "authentication logic".to_string(),
        project: Some("my-project".to_string()),
        limit: 10,
        min_score: 0.7,
        hybrid: true,
    };
    let query_response = client.query_codebase(query_req).await?;
    for result in query_response.results {
        println!("Found in {}: score {}", result.file_path, result.score);
    }

    Ok(())
}

§MCP Server Usage Example
The MCP server wraps RagClient and exposes it via the MCP protocol:
use project_rag::mcp_server::RagMcpServer;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create server (internally creates a RagClient)
    let server = RagMcpServer::new().await?;

    // Serve over stdio (MCP protocol)
    server.serve_stdio().await?;

    Ok(())
}

Or you can create a server with an existing client:
use project_rag::{RagClient, mcp_server::RagMcpServer};
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create the client up front (RagClient::new() uses the default configuration)
    let client = RagClient::new().await?;

    // Wrap the shared client in an MCP server
    let server = RagMcpServer::with_client(Arc::new(client))?;
    server.serve_stdio().await?;

    Ok(())
}

§Modules
- client: Core library client API with all functionality
- mcp_server: MCP protocol server implementation that wraps the client
- embedding: Embedding generation using FastEmbed
- vector_db: Vector database abstraction (LanceDB and Qdrant)
- bm25_search: BM25 keyword search using Tantivy
- indexer: File walking, AST parsing, and code chunking
- git: Git history walking and commit chunking
- cache: Persistent hash cache for incremental updates
- git_cache: Git commit tracking cache
- config: Configuration management with environment variable support
- types: Request/response types with validation
- error: Error types and result aliases
- paths: Path normalization utilities
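The cache module is described as a persistent hash cache for incremental updates. The sketch below shows one way such a cache could drive change detection: hash each file's contents, compare against the hashes stored from the previous run, and re-index only what differs. Everything here is hypothetical (`content_hash`, `ChangeSet`, `diff_against_cache`), and the use of `DefaultHasher` is a simplification; the crate's actual hashing scheme and cache format are not documented in this section.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash file contents. DefaultHasher keeps the sketch std-only, but its output
/// is not guaranteed stable across Rust releases, so a cache persisted to disk
/// would want a fixed algorithm instead.
fn content_hash(contents: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical result of comparing the current files against the cache.
#[derive(Debug, Default, PartialEq)]
struct ChangeSet {
    added: Vec<String>,
    modified: Vec<String>,
    removed: Vec<String>,
}

/// Compare cached hashes (path -> hash) with current files (path -> contents).
/// Paths in each list are sorted so the result is deterministic.
fn diff_against_cache(
    cache: &HashMap<String, u64>,
    current: &HashMap<String, String>,
) -> ChangeSet {
    let mut changes = ChangeSet::default();
    for (path, contents) in current {
        match cache.get(path) {
            None => changes.added.push(path.clone()),
            Some(old) if *old != content_hash(contents) => changes.modified.push(path.clone()),
            Some(_) => {} // unchanged: nothing to re-index
        }
    }
    for path in cache.keys() {
        if !current.contains_key(path) {
            changes.removed.push(path.clone());
        }
    }
    changes.added.sort();
    changes.modified.sort();
    changes.removed.sort();
    changes
}
```

With an empty cache every file lands in `added`, which corresponds to a full index; on later runs only the `added`, `modified`, and `removed` paths need work, which is the incremental case.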
Re-exports§
pub use client::RagClient;
pub use types::AdvancedSearchRequest;
pub use types::ClearRequest;
pub use types::ClearResponse;
pub use types::FindDefinitionRequest;
pub use types::FindDefinitionResponse;
pub use types::FindReferencesRequest;
pub use types::FindReferencesResponse;
pub use types::GetCallGraphRequest;
pub use types::GetCallGraphResponse;
pub use types::GitSearchResult;
pub use types::IndexRequest;
pub use types::IndexResponse;
pub use types::IndexingMode;
pub use types::LanguageStats;
pub use types::QueryRequest;
pub use types::QueryResponse;
pub use types::SearchGitHistoryRequest;
pub use types::SearchGitHistoryResponse;
pub use types::SearchResult;
pub use types::StatisticsRequest;
pub use types::StatisticsResponse;
pub use config::Config;
pub use error::RagError;
Modules§
- bm25_search: BM25 keyword search using Tantivy for hybrid search
- cache: Persistent hash cache for tracking file changes across restarts
- client: Core library client for project-rag
- config: Configuration management with environment variable overrides
- embedding: Embedding generation using FastEmbed (all-MiniLM-L6-v2)
- error: Error types and utilities
- git: Git repository walking and commit extraction, for semantic search over commit history
- git_cache: Git commit tracking cache for incremental git history indexing
- glob_utils: Glob pattern matching utilities for path filtering
- indexer: Code indexing: file walking, AST parsing, and chunking strategies
- mcp_server
- paths: Path normalization and utility functions
- relations: Code relationships: definition/reference tracking and call graphs
- types: Request/response types with validation
- vector_db: Vector database abstraction supporting LanceDB and Qdrant
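For readers unfamiliar with the keyword side of hybrid search, the following is a from-scratch sketch of Okapi BM25 scoring over pre-tokenized documents. It is not how the bm25_search module works internally (that module delegates to Tantivy); the function name `bm25_scores`, the non-negative IDF variant, and the parameter values in the usage note are assumptions chosen for illustration.

```rust
/// Score every document against the query with Okapi BM25.
/// `docs` and `query` are already tokenized; `k1` controls term-frequency
/// saturation and `b` controls document-length normalization.
fn bm25_scores(docs: &[Vec<&str>], query: &[&str], k1: f64, b: f64) -> Vec<f64> {
    if docs.is_empty() {
        return Vec::new();
    }
    let n = docs.len() as f64;
    let avg_len = docs.iter().map(|d| d.len() as f64).sum::<f64>() / n;
    let mut scores = vec![0.0; docs.len()];

    for term in query {
        // Document frequency: how many documents contain the term at all.
        let df = docs.iter().filter(|d| d.contains(term)).count() as f64;
        if df == 0.0 {
            continue; // term absent from the corpus contributes nothing
        }
        // IDF variant with +1 inside the log, so it is never negative.
        let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();

        for (i, doc) in docs.iter().enumerate() {
            let tf = doc.iter().filter(|t| *t == term).count() as f64;
            let length_norm = 1.0 - b + b * (doc.len() as f64 / avg_len);
            scores[i] += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm);
        }
    }
    scores
}
```

A call such as `bm25_scores(&docs, &["authenticate"], 1.2, 0.75)` returns one score per document, in input order. Documents that never mention the term score zero, and repeated mentions raise the score with diminishing returns.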