AST-based code indexing, semantic retrieval, and repo map generation for Zeph.
§Overview
zeph-index implements the Code RAG (Retrieval-Augmented Generation) pipeline
that gives the Zeph agent grounded awareness of a local codebase. The pipeline has
three stages:
- Chunking: chunker uses tree-sitter to parse source files into semantically meaningful AST-level chunks (functions, structs, impl blocks, …) rather than fixed-size text windows.
- Indexing: indexer embeds every chunk via the configured LLM provider and writes the vector plus rich metadata into a dual store: Qdrant for vector similarity and SQLite for exact hash deduplication.
- Retrieval: retriever classifies the incoming query as semantic, grep, or hybrid, embeds the query, searches Qdrant, applies a score threshold, and packs results within a token budget.
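The last two retrieval steps (score threshold, then packing within a token budget) can be sketched in a few lines. This is a minimal illustration, not the crate's implementation: the `Hit` struct, the `pack_within_budget` function, and the greedy highest-score-first strategy are all assumptions made for the example.

```rust
/// Hypothetical search hit; not the type zeph-index actually returns.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub path: String,
    pub score: f32,
    pub tokens: usize,
}

/// Drops hits scoring below `min_score`, then greedily packs the rest in
/// descending score order without exceeding `budget` tokens.
/// (Assumed strategy; the real retriever may pack differently.)
pub fn pack_within_budget(mut hits: Vec<Hit>, min_score: f32, budget: usize) -> Vec<Hit> {
    // Score threshold: discard weak matches first.
    hits.retain(|h| h.score >= min_score);
    // Best matches first; total_cmp gives a total order over f32.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut used = 0usize;
    let mut packed = Vec::new();
    for hit in hits {
        // Skip a chunk that does not fit; a smaller one later may still fit.
        if used + hit.tokens > budget {
            continue;
        }
        used += hit.tokens;
        packed.push(hit);
    }
    packed
}
```

With a budget of 100 tokens and a threshold of 0.5, hits of (0.9, 50 tokens), (0.8, 60 tokens), and (0.7, 40 tokens) would pack the first and third: the second is skipped because it would overflow the budget.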
§Additional subsystems
| Module | Purpose |
|---|---|
| repo_map | Compact <repo_map> for the system prompt: file paths + symbol signatures |
| mcp_server | In-process MCP server exposing symbol_definition, find_text_references, call_graph, module_summary tools |
| watcher | File-system watcher that triggers incremental re-indexing on saves |
| languages | Language detection and tree-sitter grammar registry |
| store | Qdrant + SQLite dual-write store |
| error | Unified error type IndexError |
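To make the repo_map row concrete, here is a rough sketch of how file paths and symbol signatures might be rendered into a `<repo_map>` block. The function name `render_repo_map`, the input shape, and the indentation layout are assumptions for illustration; the format the crate actually emits may differ.

```rust
use std::collections::BTreeMap;

/// Renders each file path followed by its indented symbol signatures,
/// wrapped in a `<repo_map>` element. The layout is an assumed one.
pub fn render_repo_map(files: &BTreeMap<String, Vec<String>>) -> String {
    let mut out = String::from("<repo_map>\n");
    // BTreeMap iterates in sorted key order, so the output is deterministic.
    for (path, signatures) in files {
        out.push_str(path);
        out.push('\n');
        for sig in signatures {
            out.push_str("  ");
            out.push_str(sig);
            out.push('\n');
        }
    }
    out.push_str("</repo_map>");
    out
}
```

Keeping only signatures (no bodies) is what keeps such a map small enough to fit in a system prompt.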
§Quick start
```rust
use std::sync::Arc;
use zeph_index::indexer::{CodeIndexer, IndexerConfig};
use zeph_index::retriever::{CodeRetriever, RetrievalConfig};
use zeph_index::store::CodeStore;

// `store` (a CodeStore) and `provider` (an Arc-wrapped LLM provider) are
// assumed to be constructed beforehand. The snippet also assumes it runs
// inside an async function that returns a Result.

// Build and run initial project index.
let indexer = CodeIndexer::new(store.clone(), Arc::clone(&provider), IndexerConfig::default());
let report = indexer.index_project(std::path::Path::new("."), None).await?;
println!("{} chunks indexed", report.chunks_created);

// Retrieve relevant code for a query.
let retriever = CodeRetriever::new(store, Arc::clone(&provider), RetrievalConfig::default());
let result = retriever.retrieve("how does authentication work?", 8_000).await?;
println!("{} chunks, {} tokens", result.chunks.len(), result.total_tokens);
```
Re-exports§
pub use error::IndexError;
pub use error::Result;
pub use indexer::IndexProgress;
pub use mcp_server::IndexMcpServer;
Modules§
- chunker: AST-based chunking via tree-sitter with greedy sibling merge.
- context: Contextualized embedding text generation.
- error: Error types for zeph-index.
- indexer: Project indexing orchestrator: walk → chunk → embed → store.
- languages: Language detection and tree-sitter grammar registry.
- mcp_server: In-process MCP server exposing AST-based code navigation tools.
- repo_map: Lightweight structural map of a project (signatures only).
- retriever: Hybrid code retrieval: query classification, semantic search, budget packing.
- store: Qdrant collection + SQLite metadata for code chunks.
- watcher: File-system watcher for incremental re-indexing on save.
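Incremental re-indexing pairs naturally with the exact hash deduplication mentioned in the overview: a chunk whose content hash is already recorded does not need to be embedded again. The sketch below shows the idea with an in-memory set. `SeenChunks` and `insert_if_new` are hypothetical names, and the standard library's `DefaultHasher` is only a stand-in; a persistent store would presumably use a hash that is stable across builds and keep it in SQLite.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory stand-in for the hash records kept in SQLite.
#[derive(Default)]
pub struct SeenChunks {
    hashes: HashSet<u64>,
}

impl SeenChunks {
    /// Returns `true` if the chunk content is new (and should be embedded),
    /// `false` if identical content has already been recorded.
    pub fn insert_if_new(&mut self, chunk_text: &str) -> bool {
        let mut hasher = DefaultHasher::new();
        chunk_text.hash(&mut hasher);
        // HashSet::insert returns false when the value was already present.
        self.hashes.insert(hasher.finish())
    }
}
```

On a file save, only chunks for which `insert_if_new` returns `true` would go through the embedding step, which is typically the expensive part of indexing.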