Crate zeph_index

AST-based code indexing, semantic retrieval, and repo map generation for Zeph.

§Overview

zeph-index implements the Code RAG (Retrieval-Augmented Generation) pipeline that gives the Zeph agent grounded awareness of a local codebase. The pipeline has three stages:

  1. Chunking: chunker uses tree-sitter to parse source files into semantically meaningful AST-level chunks (functions, structs, impl blocks, …) rather than fixed-size text windows.
  2. Indexing: indexer embeds every chunk via the configured LLM provider and writes the vector and its rich metadata into a dual store: Qdrant for vector similarity and SQLite for exact hash deduplication.
  3. Retrieval: retriever classifies the incoming query as semantic, grep, or hybrid, embeds the query, searches Qdrant, applies a score threshold, and packs results within a token budget.
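
The last two steps of the retrieval stage (score threshold, then packing within a token budget) can be illustrated with a minimal sketch. The ScoredChunk type, the pack_within_budget function, and the greedy highest-score-first strategy are assumptions made for illustration; they are not taken from the zeph_index API, and the crate's actual packing policy may differ.

```rust
/// A retrieved chunk with its similarity score and estimated token cost.
/// Hypothetical type for illustration; not part of the zeph_index API.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredChunk {
    pub id: u32,
    pub score: f32,
    pub tokens: usize,
}

/// Drop chunks scoring below `min_score`, then greedily pack the
/// highest-scoring survivors without exceeding `budget` tokens.
/// Returns the packed chunks and the number of tokens they use.
pub fn pack_within_budget(
    mut hits: Vec<ScoredChunk>,
    min_score: f32,
    budget: usize,
) -> (Vec<ScoredChunk>, usize) {
    // Score threshold: discard weak matches before packing.
    hits.retain(|c| c.score >= min_score);
    // Highest score first; total_cmp gives a total order over f32.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut packed = Vec::new();
    let mut total = 0usize;
    for chunk in hits {
        // Skip a chunk that does not fit; a smaller, lower-ranked
        // chunk further down the list may still fit.
        if total + chunk.tokens <= budget {
            total += chunk.tokens;
            packed.push(chunk);
        }
    }
    (packed, total)
}
```

Skipping an oversized chunk instead of stopping at it trades strict rank order for better use of the budget; stopping at the first chunk that does not fit would be an equally plausible policy.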

§Additional subsystems

Module      Purpose
repo_map    Compact <repo_map> for the system prompt: file paths + symbol signatures
mcp_server  In-process MCP server exposing symbol_definition, find_text_references, call_graph, module_summary tools
watcher     File-system watcher that triggers incremental re-indexing on saves
languages   Language detection and tree-sitter grammar registry
store       Qdrant + SQLite dual-write store
error       Unified error type IndexError
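
Incremental re-indexing (watcher) combined with exact hash deduplication (store) suggests that chunks whose content has not changed need not be embedded again. A minimal sketch of that idea follows. SeenHashes, filter_new, and content_hash are hypothetical names, an in-memory HashSet stands in for the SQLite lookup, and std's DefaultHasher stands in for whatever content hash the crate actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash a chunk's text. DefaultHasher is a stand-in (assumption);
/// a persistent index would need a hash that is stable across runs.
fn content_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// In-memory stand-in for the hash lookup that the docs place in SQLite.
#[derive(Default)]
pub struct SeenHashes {
    seen: HashSet<u64>,
}

impl SeenHashes {
    /// Return only the chunks whose hash has not been recorded yet,
    /// recording each new hash as a side effect.
    pub fn filter_new<'a>(&mut self, chunks: &[&'a str]) -> Vec<&'a str> {
        chunks
            .iter()
            .copied()
            // HashSet::insert returns true only for unseen hashes.
            .filter(|text| self.seen.insert(content_hash(text)))
            .collect()
    }
}
```

On a second pass over a file where only one function changed, filter_new would return just that function's chunk, so only one embedding request would be needed.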

§Quick start

use std::sync::Arc;
use zeph_index::indexer::{CodeIndexer, IndexerConfig};
use zeph_index::retriever::{CodeRetriever, RetrievalConfig};
use zeph_index::store::CodeStore;

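// `store` (a CodeStore) and `provider` (an Arc-wrapped LLM provider) are assumed
// to be constructed elsewhere; the snippet runs inside an async fn returning a Result.

// Build and run initial project index.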
let indexer = CodeIndexer::new(store.clone(), Arc::clone(&provider), IndexerConfig::default());
let report = indexer.index_project(std::path::Path::new("."), None).await?;
println!("{} chunks indexed", report.chunks_created);

// Retrieve relevant code for a query.
let retriever = CodeRetriever::new(store, Arc::clone(&provider), RetrievalConfig::default());
let result = retriever.retrieve("how does authentication work?", 8_000).await?;
println!("{} chunks, {} tokens", result.chunks.len(), result.total_tokens);

Re-exports§

pub use error::IndexError;
pub use error::Result;
pub use indexer::IndexProgress;
pub use mcp_server::IndexMcpServer;

Modules§

chunker
AST-based chunking via tree-sitter with greedy sibling merge.
context
Contextualized embedding text generation.
error
Error types for zeph-index.
indexer
Project indexing orchestrator: walk → chunk → embed → store.
languages
Language detection and tree-sitter grammar registry.
mcp_server
In-process MCP server exposing AST-based code navigation tools.
repo_map
Lightweight structural map of a project (signatures only).
retriever
Hybrid code retrieval: query classification, semantic search, budget packing.
store
Qdrant collection + SQLite metadata for code chunks.
watcher
File-system watcher for incremental re-indexing on save.
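
The chunker entry above mentions a "greedy sibling merge". One plausible reading is that small adjacent AST nodes are combined into a single chunk as long as the merged chunk stays under a size limit. The sketch below illustrates that reading only; Span, merge_siblings, and the byte-length limit are assumptions, not the crate's implementation.

```rust
/// Byte range of one sibling AST node in a source file.
/// Hypothetical type for illustration; not part of the zeph_index API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Greedily merge adjacent siblings while the merged span stays within
/// `max_len` bytes. Assumes `siblings` is sorted by position and
/// non-overlapping, as sibling nodes of one parent would be.
pub fn merge_siblings(siblings: &[Span], max_len: usize) -> Vec<Span> {
    let mut chunks: Vec<Span> = Vec::new();
    for &node in siblings {
        match chunks.last_mut() {
            // Extend the current chunk if the merged span still fits.
            Some(cur) if node.end - cur.start <= max_len => cur.end = node.end,
            // Otherwise start a new chunk; an oversized node stays whole.
            _ => chunks.push(node),
        }
    }
    chunks
}
```

With a 100-byte limit, two short functions spanning bytes 0..40 and 41..90 would merge into one chunk covering 0..90, while a 200-byte function that follows would start its own chunk.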