zeph 0.21.2

Lightweight AI agent with hybrid inference, skills-first architecture, and multi-channel I/O
# Code Indexing

AST-based code indexing and semantic retrieval for project-aware context. The `zeph-index` crate parses source files via tree-sitter, chunks them by AST structure, embeds the chunks in Qdrant, and retrieves relevant code via hybrid search (semantic + grep routing) for injection into the agent context window.

`zeph-index` is always compiled in; no feature flag is required. Indexing itself is off by default: enable it at runtime with `[index] enabled = true` in the config.

## Why Code RAG

Cloud models with 200K-token windows can afford multi-round agentic grep. Local models with 8K-32K windows cannot: a single grep cycle costs ~2K tokens (25% of an 8K budget), so 5 rounds (~10K tokens) would exceed an entire 8K context. RAG retrieves 6-8 relevant chunks in ~3K tokens, preserving budget for history and response.

For cloud models, code RAG serves as pre-fill context alongside agentic search. For local models, it is the primary code retrieval mechanism.

## Setup

1. **Start Qdrant** (required for vector storage):

   ```bash
   docker compose up -d qdrant
   ```

2. **Enable indexing in config:**

   ```toml
   [index]
   enabled = true
   ```

3. **Index your project:**

   ```bash
   zeph index
   ```

   Or let auto-indexing handle it on startup when `auto_index = true` (default).

## Architecture

The `zeph-index` crate contains 7 modules:

| Module | Purpose |
|--------|---------|
| `languages` | Language detection from file extensions, tree-sitter grammar registry |
| `chunker` | AST-based chunking with greedy sibling merge (cAST-inspired algorithm) |
| `context` | Contextualized embedding text generation (file path + scope + imports + code) |
| `store` | Dual-write storage: Qdrant vectors + SQLite chunk metadata |
| `indexer` | Orchestrator: walk project tree, chunk files, embed, store with incremental change detection |
| `retriever` | Query classification, semantic search, budget-aware chunk packing |
| `repo_map` | Compact structural map of the project (signatures only, no function bodies) |

### Pipeline

```text
Source files
    |
    v
[languages.rs] detect language, load grammar
    |
    v
[chunker.rs] parse AST, split into chunks (target: ~600 non-ws chars)
    |
    v
[context.rs] prepend file path, scope chain, imports, language tag
    |
    v
[indexer.rs] embed via LlmProvider, skip unchanged (content hash)
    |
    v
[store.rs] upsert to Qdrant (vectors) + SQLite (metadata)
```

### Retrieval

```text
User query
    |
    v
[retriever.rs] classify_query()
    |
    +--> Semantic  --> embed query --> Qdrant search --> budget pack --> inject
    |
    +--> Grep      --> return empty (agent uses bash tools)
    |
    +--> Hybrid    --> semantic search + hint to agent
```

## Query Classification

The retriever classifies each query to route it to the appropriate search strategy:

| Strategy | Trigger | Action |
|----------|---------|--------|
| **Grep** | Exact symbols: `::`, `fn `, `struct `, CamelCase, snake_case identifiers | Agent handles via shell grep/ripgrep |
| **Semantic** | Conceptual queries: "how", "where", "why", "explain" | Vector similarity search in Qdrant |
| **Hybrid** | Both symbol patterns and conceptual words | Semantic search + hint that grep may also help |

Default (no pattern match): Semantic.
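
The routing table above can be approximated with a few string heuristics. The following is a minimal sketch, not the actual `classify_query()` from `retriever.rs`: the `Strategy` enum, the `tokens` and `looks_like_symbol` helpers, and the exact matching rules (keywords matched as whole tokens, CamelCase detected by a lowercase-to-uppercase transition) are assumptions made for illustration.

```rust
/// Search strategy chosen for a query; variant names follow the table above.
#[derive(Debug, PartialEq, Eq)]
pub enum Strategy {
    Grep,
    Semantic,
    Hybrid,
}

/// Splits a query into identifier-like tokens (letters, digits, underscores).
fn tokens(query: &str) -> Vec<&str> {
    query
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        .collect()
}

/// Hypothetical heuristic: does this token look like an exact code symbol?
fn looks_like_symbol(token: &str) -> bool {
    let is_keyword = token == "fn" || token == "struct";
    let is_snake = token.contains('_') && token.chars().any(|c| c.is_alphanumeric());
    let chars: Vec<char> = token.chars().collect();
    // CamelCase: leading uppercase plus a later lowercase-to-uppercase step.
    let is_camel = chars.first().map_or(false, |c| c.is_uppercase())
        && chars.windows(2).any(|w| w[0].is_lowercase() && w[1].is_uppercase());
    is_keyword || is_snake || is_camel
}

/// Routes a query: symbols only -> Grep, symbols + concept words -> Hybrid,
/// everything else (including no match at all) -> Semantic.
pub fn classify_query(query: &str) -> Strategy {
    const CONCEPTUAL: [&str; 4] = ["how", "where", "why", "explain"];
    let toks = tokens(query);
    let symbol = query.contains("::") || toks.iter().any(|t| looks_like_symbol(t));
    let conceptual = toks
        .iter()
        .any(|t| CONCEPTUAL.contains(&t.to_lowercase().as_str()));
    match (symbol, conceptual) {
        (true, true) => Strategy::Hybrid,
        (true, false) => Strategy::Grep,
        _ => Strategy::Semantic,
    }
}
```

Under these assumed rules, `Agent::prepare_context` routes to Grep, "where is auth handled?" to Semantic, and "how does prepare_context work" to Hybrid.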

## AST-Based Chunking

Files are parsed via tree-sitter into AST, then chunked by entity boundaries (functions, structs, classes, impl blocks). The algorithm uses greedy sibling merge:

- **Target size:** 600 non-whitespace characters (~300-400 tokens)
- **Max size:** 1200 non-ws chars (forced recursive split)
- **Min size:** 100 non-ws chars (merge with adjacent sibling)

Config files (TOML, JSON, Markdown, Bash) are indexed as single file-level chunks since they lack named entities.
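
To make the three size thresholds concrete, here is a simplified sketch of a greedy sibling merge. It is not the `chunker` module's implementation: the `Node` type (which only tracks a non-whitespace size and child nodes), the `chunk_siblings` function, and the forward-only merge rule are assumptions. A real chunker would carry source ranges and would also need a fallback for oversized leaves, which this sketch emits unchanged.

```rust
/// Size thresholds in non-whitespace characters (defaults from the list above).
pub struct ChunkerConfig {
    pub target_size: usize,
    pub max_size: usize,
    pub min_size: usize,
}

impl Default for ChunkerConfig {
    fn default() -> Self {
        Self { target_size: 600, max_size: 1200, min_size: 100 }
    }
}

/// Hypothetical AST node: only its non-whitespace size and children matter here.
pub struct Node {
    pub size: usize,
    pub children: Vec<Node>,
}

/// Returns the sizes of the chunks produced for a run of sibling nodes.
pub fn chunk_siblings(siblings: &[Node], cfg: &ChunkerConfig) -> Vec<usize> {
    let mut chunks = Vec::new();
    let mut current = 0usize; // size of the chunk being accumulated
    for node in siblings {
        // Oversized node with children: flush, then split it recursively.
        if node.size > cfg.max_size && !node.children.is_empty() {
            if current > 0 {
                chunks.push(current);
                current = 0;
            }
            chunks.extend(chunk_siblings(&node.children, cfg));
            continue;
        }
        let merged = current + node.size;
        // A pending chunk below min_size keeps absorbing siblings past the
        // target, but never past max_size.
        let over_target = current >= cfg.min_size && merged > cfg.target_size;
        if current > 0 && (over_target || merged > cfg.max_size) {
            chunks.push(current);
            current = 0;
        }
        current += node.size;
    }
    if current > 0 {
        chunks.push(current);
    }
    chunks
}
```

With the default thresholds, siblings of sizes 50, 200, 300, and 400 become two chunks of 550 and 400: the first three merge because they stay under the 600-character target, and the fourth starts a new chunk.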

Each chunk carries rich metadata: file path, language, AST node type, entity name, line range, scope chain (e.g. `MyStruct > impl MyStruct > my_method`), imports, and a BLAKE3 content hash for change detection.

## Contextualized Embeddings

Embedding raw code alone yields poor retrieval quality for conceptual queries. Before embedding, each chunk is prepended with:

- File path (`# src/agent.rs`)
- Scope chain (`# Scope: Agent > prepare_context`)
- Language tag (`# Language: rust`)
- First 5 import/use statements

This contextualized form improves retrieval for queries like "where is auth handled?" where the code alone might not contain the word "auth".
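
As an illustration of the prepended header, the sketch below builds the text that would be sent to the embedding model. The `Chunk` struct, the `embedding_text` function, and the exact line order are assumptions; only the header fields and the five-import limit come from the list above.

```rust
/// Hypothetical view of a chunk with the fields needed for contextualization.
pub struct Chunk<'a> {
    pub file_path: &'a str,
    pub scope_chain: &'a [&'a str],
    pub language: &'a str,
    pub imports: &'a [&'a str],
    pub code: &'a str,
}

/// Builds the contextualized text: header lines, up to 5 imports, then the code.
pub fn embedding_text(chunk: &Chunk<'_>) -> String {
    let mut out = String::new();
    out.push_str(&format!("# {}\n", chunk.file_path));
    if !chunk.scope_chain.is_empty() {
        out.push_str(&format!("# Scope: {}\n", chunk.scope_chain.join(" > ")));
    }
    out.push_str(&format!("# Language: {}\n", chunk.language));
    // Only the first 5 import/use statements are included.
    for import in chunk.imports.iter().take(5) {
        out.push_str(import);
        out.push('\n');
    }
    out.push_str(chunk.code);
    out
}
```

The raw code is still stored as the chunk payload; only the text passed to the embedding model carries the extra header, so the injected context is not inflated by it.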

## Storage

Chunks are dual-written to two stores:

| Store | Data | Purpose |
|-------|------|---------|
| Qdrant (`zeph_code_chunks`) | Embedding vectors + payload (code, metadata) | Semantic similarity search |
| SQLite (`chunk_metadata`) | File path, content hash, line range, language, node type | Change detection, cleanup of deleted files |

The Qdrant collection uses INT8 scalar quantization for ~4x memory reduction with minimal accuracy loss. Payload indexes on `language`, `file_path`, and `node_type` enable filtered search.
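
The ~4x figure follows from storing one byte per dimension instead of a 4-byte `f32`. The sketch below shows a generic linear scalar quantization to illustrate the idea; it is not Qdrant's implementation (which runs server-side and has its own calibration options), and the `Quantized`, `quantize`, and `dequantize` names are hypothetical.

```rust
/// A vector stored as one byte per dimension plus two f32 parameters.
pub struct Quantized {
    pub min: f32,
    pub scale: f32,
    pub codes: Vec<u8>,
}

/// Linearly maps each component from [min, max] onto the 0..=255 range.
pub fn quantize(v: &[f32]) -> Quantized {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Avoid a zero scale when all components are equal (or the input is empty).
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v
        .iter()
        .map(|&x| ((x - min) / scale).round() as u8)
        .collect();
    Quantized { min, scale, codes }
}

/// Reconstructs approximate f32 values; error is at most about scale / 2.
pub fn dequantize(q: &Quantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

For a 1024-dimensional embedding this means 1024 bytes of codes instead of 4096 bytes of floats, at the cost of a small rounding error per component.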

## Incremental Indexing

On subsequent runs, the indexer skips unchanged chunks by checking BLAKE3 content hashes in SQLite. Only modified or new files are re-embedded. Deleted files are detected by comparing the current file set against the SQLite index, and their chunks are removed from both stores.
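
The change-detection step can be sketched as a comparison between stored hashes and the current file set. This is an illustrative simplification: it works per file rather than per chunk, uses the standard library's `DefaultHasher` as a stand-in for BLAKE3 to stay dependency-free, and the `ReindexPlan` and `plan_reindex` names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash (the real index stores BLAKE3 hashes).
pub fn content_hash(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Paths to (re-)embed and paths whose chunks should be deleted.
#[derive(Debug, Default, PartialEq)]
pub struct ReindexPlan {
    pub reindex: Vec<String>,
    pub remove: Vec<String>,
}

/// `stored` maps path -> hash from the last run; `current` maps path -> content.
pub fn plan_reindex(
    stored: &HashMap<String, u64>,
    current: &HashMap<String, String>,
) -> ReindexPlan {
    let mut plan = ReindexPlan::default();
    for (path, content) in current {
        // New file (no stored hash) or modified file (hash differs).
        if stored.get(path) != Some(&content_hash(content)) {
            plan.reindex.push(path.clone());
        }
    }
    for path in stored.keys() {
        // Present in the index but gone from disk.
        if !current.contains_key(path) {
            plan.remove.push(path.clone());
        }
    }
    plan.reindex.sort();
    plan.remove.sort();
    plan
}
```

Unchanged files appear in neither list, so they cost no embedding calls on a repeat run.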

## File Watcher

When `watch = true` (default), an `IndexWatcher` monitors project files for changes during the session. When a file is modified, it is automatically re-indexed via `reindex_file()` without rebuilding the entire index. The watcher uses a 500 ms debounce to batch rapid changes and only processes files with indexable extensions.

Disable with:

```toml
[index]
watch = false
```
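
The debounce behaviour of the watcher can be pictured as follows. This is a sketch of the general technique, not the `IndexWatcher` internals: the `Debouncer` type and its methods are hypothetical, and timestamps are passed in as `Duration` values since session start so the logic stays deterministic.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Coalesces repeated change events per path until the path has been quiet
/// for at least `window`.
pub struct Debouncer {
    window: Duration,
    pending: HashMap<String, Duration>, // path -> time of the latest event
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, pending: HashMap::new() }
    }

    /// Records a change event; a newer event replaces the previous timestamp.
    pub fn record(&mut self, path: &str, now: Duration) {
        self.pending.insert(path.to_string(), now);
    }

    /// Removes and returns the paths whose latest event is at least `window` old.
    pub fn drain_ready(&mut self, now: Duration) -> Vec<String> {
        let window = self.window;
        let mut ready: Vec<String> = self
            .pending
            .iter()
            .filter(|(_, last)| now.saturating_sub(**last) >= window)
            .map(|(path, _)| path.clone())
            .collect();
        for path in &ready {
            self.pending.remove(path);
        }
        ready.sort();
        ready
    }
}
```

With a 500 ms window, two saves of the same file 300 ms apart lead to a single re-index, 500 ms after the second save.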

## Task Supervision

Code indexing integrates with the `TaskSupervisor` for observability of concurrent embedding operations. When `embed_concurrency > 1`, each chunk embedding is registered as a separate supervised task (`chunk_file_{N}`), making individual embedding progress visible in the TUI task registry and tracing systems.

Access the task registry via the TUI command palette:

```text
Ctrl+P -> /tasks
```

This displays a live table of all supervised tasks, including:

- **Chunk embeddings**: Individual file chunks being embedded
- **Background indexers**: Automatic re-indexing of modified files
- **Refresh cycles**: Periodic re-index operations

Each task shows: name, state (Running/Waiting), uptime since last restart, and restart count. This enables fine-grained debugging of indexing performance bottlenecks.

## Repo Map

A lightweight structural map of the project, generated with tree-sitter queries. It is included in the system prompt and cached with a configurable TTL (default: 5 minutes) to avoid per-message filesystem traversal.

For each supported language, tree-sitter queries extract `SymbolInfo` records — name, kind (function, struct, class, impl, etc.), visibility (pub/private), and line number — directly from the AST. This replaces the previous heuristic regex approach and adds accurate multi-language support.

The repo map is injected unconditionally for all providers (Claude, OpenAI, Ollama, and others). Qdrant semantic retrieval remains provider-dependent and only runs when embeddings are available.

Example output:

```text
<repo_map>
  src/agent.rs :: pub struct Agent (line 12), pub fn new (line 45), pub fn run (line 78), fn prepare_context (line 110)
  src/config.rs :: pub struct Config (line 5), pub fn load (line 30)
  src/main.rs :: pub fn main (line 1), fn setup_logging (line 15)
  ... and 12 more files
</repo_map>
```

The map is budget-constrained (default: 1024 tokens) and sorted by symbol count (files with more symbols appear first). It gives the model a structural overview of the project without consuming significant context.
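
The sorting and budget cut-off can be sketched as below. The `Symbol` and `FileSymbols` types, the `render_repo_map` function, and the 4-bytes-per-token estimate are assumptions for illustration; the real `repo_map` module may count tokens and truncate differently.

```rust
/// Hypothetical symbol record: a signature such as "pub fn new" and its line.
pub struct Symbol {
    pub signature: String,
    pub line: usize,
}

pub struct FileSymbols {
    pub path: String,
    pub symbols: Vec<Symbol>,
}

/// Rough token estimate (assumption: about 4 bytes per token).
fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Renders files with the most symbols first and stops at the token budget.
pub fn render_repo_map(mut files: Vec<FileSymbols>, budget: usize) -> String {
    files.sort_by(|a, b| {
        b.symbols
            .len()
            .cmp(&a.symbols.len())
            .then_with(|| a.path.cmp(&b.path))
    });
    let mut out = String::from("<repo_map>\n");
    let mut used = 0;
    let mut shown = 0;
    for file in &files {
        let symbols: Vec<String> = file
            .symbols
            .iter()
            .map(|s| format!("{} (line {})", s.signature, s.line))
            .collect();
        let line = format!("  {} :: {}\n", file.path, symbols.join(", "));
        let cost = estimate_tokens(&line);
        if used + cost > budget {
            break;
        }
        used += cost;
        shown += 1;
        out.push_str(&line);
    }
    if shown < files.len() {
        out.push_str(&format!("  ... and {} more files\n", files.len() - shown));
    }
    out.push_str("</repo_map>");
    out
}
```

Because symbol-rich files are emitted first, a tight budget drops the least informative files and summarizes them in the trailing "... and N more files" line.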

## LSP Hover Pre-filter

When the `lsp-context` feature is enabled, `zeph-index` pre-filters hover requests before forwarding them to the language server. Previously this filter used a Rust-only regex; it now uses tree-sitter to identify the symbol under the cursor for all supported languages (Rust, Python, JavaScript, TypeScript, Go).

The tree-sitter hover pre-filter:

1. Parses the file with the appropriate grammar.
2. Finds the AST node at the cursor position.
3. Walks up the tree to the nearest named symbol (identifier, field expression, call expression, etc.).
4. Passes the resolved symbol to the MCP LSP server for a hover lookup.

This makes hover-based context injection accurate across all indexed languages, not just Rust.
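
Steps 2 and 3 amount to finding the smallest node that contains the cursor and then climbing to the nearest symbol-like ancestor. The sketch below shows that walk on a toy flattened tree instead of the real tree-sitter API; the `Node` struct, the `symbol_at` function, and the `SYMBOL_KINDS` list are assumptions for illustration.

```rust
/// Toy AST node stored in a flat slice; `parent` is an index into that slice.
pub struct Node {
    pub kind: &'static str,
    pub start: usize, // byte offset, inclusive
    pub end: usize,   // byte offset, exclusive
    pub parent: Option<usize>,
}

/// Node kinds treated as hover-worthy symbols (hypothetical selection).
const SYMBOL_KINDS: [&str; 3] = ["identifier", "field_expression", "call_expression"];

/// Returns the index of the nearest symbol node at or above the cursor.
pub fn symbol_at(nodes: &[Node], cursor: usize) -> Option<usize> {
    // Step 2: the smallest node whose byte range contains the cursor.
    let mut current = nodes
        .iter()
        .enumerate()
        .filter(|(_, n)| n.start <= cursor && cursor < n.end)
        .min_by_key(|(_, n)| n.end - n.start)
        .map(|(index, _)| index)?;
    // Step 3: walk up until a symbol-like node is found or the root is passed.
    loop {
        if SYMBOL_KINDS.contains(&nodes[current].kind) {
            return Some(current);
        }
        current = nodes[current].parent?;
    }
}
```

For the expression `foo.bar()`, a cursor on `bar` resolves to that identifier, while a cursor on the `.` token climbs to the enclosing field expression. A cursor outside every node yields `None`, so no hover request is sent.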

## Budget-Aware Retrieval

Retrieved chunks are packed into a token budget (default: 40% of available context for code). Chunks are sorted by similarity score and greedily packed until the budget is exhausted. A minimum score threshold (default: 0.25) filters low-relevance results.

Retrieved code is injected as a transient `<code_context>` XML block before the conversation history. It is re-generated on every turn and never persisted.
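
The packing step can be sketched as a filter, a sort, and a greedy loop. The `ScoredChunk` type and `pack_chunks` function are hypothetical, and stopping at the first chunk that does not fit (rather than skipping it and trying smaller ones) is an assumption about what "greedily packed" means here.

```rust
/// A search hit with its similarity score and estimated token cost.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredChunk {
    pub id: u32,
    pub score: f32,
    pub tokens: usize,
}

/// Keeps the highest-scoring chunks that fit into `budget_ratio` of the
/// available context, dropping hits below `score_threshold`.
pub fn pack_chunks(
    mut hits: Vec<ScoredChunk>,
    available_tokens: usize,
    budget_ratio: f32,
    score_threshold: f32,
) -> Vec<ScoredChunk> {
    let budget = (available_tokens as f32 * budget_ratio) as usize;
    hits.retain(|h| h.score >= score_threshold);
    // Highest similarity first.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut used = 0;
    let mut packed = Vec::new();
    for hit in hits {
        if used + hit.tokens > budget {
            break; // budget exhausted (assumed stop-at-first-miss policy)
        }
        used += hit.tokens;
        packed.push(hit);
    }
    packed
}
```

With 1000 available tokens and the default ratio of 0.40, the budget is 400 tokens: hits costing 150 and 200 tokens are packed, and a third hit costing 100 tokens is left out.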

## Context Window Layout (with Code RAG)

When code indexing is enabled, the context window includes two additional sections:

```text
+---------------------------------------------------+
| System prompt + environment + ZEPH.md             |
+---------------------------------------------------+
| <repo_map> (structural overview, cached)          |  <= 1024 tokens
+---------------------------------------------------+
| <available_skills>                                |
+---------------------------------------------------+
| <code_context> (per-query RAG chunks, transient)  |  <= 40% available
+---------------------------------------------------+
| [semantic recall] past messages                   |  <= 10% available
+---------------------------------------------------+
| Recent message history                            |  <= 50% available
+---------------------------------------------------+
| [response reserve]                                |  20% of total
+---------------------------------------------------+
```

## Configuration

```toml
[index]
# Enable codebase indexing for semantic code search.
# Requires Qdrant running (uses separate collection "zeph_code_chunks").
enabled = false

# Auto-index on startup and re-index changed files during session.
auto_index = true

# Directories to index (relative to cwd).
paths = ["."]

# Patterns to exclude (in addition to .gitignore).
exclude = ["target", "node_modules", ".git", "vendor", "dist", "build", "__pycache__"]

# Token budget for repo map in system prompt (0 = no repo map).
repo_map_budget = 1024

# Cache TTL for repo map in seconds (avoids per-message regeneration).
repo_map_ttl_secs = 300

[index.chunker]
# Target chunk size in non-whitespace characters (~300-400 tokens).
target_size = 600
# Maximum chunk size before forced split.
max_size = 1200
# Minimum chunk size — smaller chunks merge with siblings.
min_size = 100

[index.retrieval]
# Maximum chunks to fetch from Qdrant (before budget packing).
max_chunks = 12
# Minimum cosine similarity score to accept.
score_threshold = 0.25
# Maximum fraction of available context budget for code chunks.
budget_ratio = 0.40
```

## Automatic Code RAG Injection

When `[index]` is enabled, a Qdrant backend is available, and `mcp_enabled = false`, code context is injected automatically at context-assembly time. The retriever queries the code chunk collection using the current user message as the retrieval key, fetches the top-scoring chunks up to `budget_ratio` of the available context window, and adds them to the prompt as a `<code_context>` block.

**Activation conditions:**

- `[index] enabled = true`
- `[index.retrieval] budget_ratio > 0`
- Qdrant is available and accessible
- MCP tool exposure is disabled (`mcp_enabled = false`; when both are enabled, MCP tools take priority to avoid duplication)

**Example context injection:**

When you write "implement a cache invalidation function", the agent's context assembly:

1. Embeds "implement a cache invalidation function" using the configured embedding model
2. Queries Qdrant's `zeph_code_chunks` collection for semantically relevant code
3. Fetches up to 12 results (`max_chunks = 12`) with a similarity score of at least 0.25 (`score_threshold = 0.25`)
4. Packs chunks into a `<code_context>` block (up to 40% of available tokens)
5. Injects the block into the prompt

The retrieval is fail-open: if embedding, Qdrant queries, or scoring errors occur, the injection is silently skipped and the turn continues. No special tooling is required from the agent.
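
The activation conditions and the fail-open behaviour can be combined in a small sketch. The `IndexSettings` struct, the `should_inject` and `code_context` functions, and the closure-based retrieval hook are hypothetical; they only mirror the conditions listed above and do not reflect the actual context-assembly code.

```rust
/// Subset of the `[index]` settings relevant to automatic injection.
pub struct IndexSettings {
    pub enabled: bool,
    pub mcp_enabled: bool,
    pub budget_ratio: f32,
}

/// True only when every activation condition holds.
pub fn should_inject(cfg: &IndexSettings, qdrant_available: bool) -> bool {
    cfg.enabled && cfg.budget_ratio > 0.0 && qdrant_available && !cfg.mcp_enabled
}

/// Fail-open wrapper: any retrieval error yields `None` and the turn continues.
pub fn code_context<F>(
    cfg: &IndexSettings,
    qdrant_available: bool,
    retrieve: F,
) -> Option<String>
where
    F: FnOnce() -> Result<String, String>,
{
    if !should_inject(cfg, qdrant_available) {
        return None;
    }
    retrieve()
        .ok()
        .map(|chunks| format!("<code_context>\n{chunks}\n</code_context>"))
}
```

Returning `Option<String>` instead of propagating the error keeps the caller simple: a missing block just means the prompt is assembled without code context.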

Use `budget_ratio = 0` to disable automatic injection while keeping the code index available for manual MCP tool queries via `symbol_definition`, `find_text_references`, etc.

## Supported Languages

All tree-sitter grammars are compiled into every build. Language sub-features on `zeph-index` (`lang-rust`, `lang-python`, `lang-js`, `lang-go`, `lang-config`) are all enabled by default and cannot be individually disabled in the standard build.

| Language | Feature | Extensions |
|----------|---------|------------|
| Rust | `lang-rust` | `.rs` |
| Python | `lang-python` | `.py`, `.pyi` |
| JavaScript | `lang-js` | `.js`, `.jsx`, `.mjs`, `.cjs` |
| TypeScript | `lang-js` | `.ts`, `.tsx`, `.mts`, `.cts` |
| Go | `lang-go` | `.go` |
| Bash | `lang-config` | `.sh`, `.bash`, `.zsh` |
| TOML | `lang-config` | `.toml` |
| JSON | `lang-config` | `.json`, `.jsonc` |
| Markdown | `lang-config` | `.md`, `.markdown` |

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `ZEPH_INDEX_ENABLED` | Enable code indexing | `false` |
| `ZEPH_INDEX_AUTO_INDEX` | Auto-index on startup | `true` |
| `ZEPH_INDEX_REPO_MAP_BUDGET` | Token budget for repo map | `1024` |
| `ZEPH_INDEX_REPO_MAP_TTL_SECS` | Cache TTL for repo map in seconds | `300` |

## Code Index as MCP Tools

When `index.mcp_enabled = true`, the code index is exposed as an in-process MCP server (`IndexMcpServer`) that registers four navigation tools directly into the tool executor pipeline. No JSON-RPC transport is involved — the tools run in-process alongside external MCP servers.

### Exposed Tools

| Tool | Input | Description |
|------|-------|-------------|
| `symbol_definition` | `name: String` | Returns file path and line number for all definitions of a symbol (function, struct, enum, trait, etc.) found via tree-sitter AST |
| `find_text_references` | `name: String` | Textual search for references to a symbol across all indexed files; may include false positives from comments and strings |
| `call_graph` | `fn_name: String` | Returns a heuristic call graph rooted at the given function, derived from child symbol relationships in the AST |
| `module_summary` | `path: String` | Lists all symbols (name, kind, visibility, line number) defined in a given source file |

### How This Differs from Repo Map Injection

The repo map (`repo_map_budget`) is a static overview injected once per system prompt. It lists symbol names and locations but does not answer specific queries. The MCP tools are dynamic: the LLM calls them on demand to answer precise navigation questions, similar to IDE "go to definition" or "find references". This is more token-efficient for targeted lookups and avoids injecting an entire structural overview when only one symbol matters.

| Capability | Repo Map | MCP Tools |
|-----------|----------|-----------|
| Always present in context | Yes | No (on-demand) |
| Find definition of one symbol | No | Yes (`symbol_definition`) |
| List all symbols in a file | No | Yes (`module_summary`) |
| Find all usages of a symbol | No | Yes (`find_text_references`) |
| Call chain from a function | No | Yes (`call_graph`) |

### Configuration

```toml
[index]
enabled     = true
mcp_enabled = true   # expose index as MCP tools
```

`mcp_enabled` defaults to `false`. Enabling it does not require Qdrant — the tool index is built directly from tree-sitter AST parsing and held in memory.

### When to Use

Enable `mcp_enabled` for IDE-like workflows where the LLM needs to navigate the codebase interactively: tracing a call chain, checking where a struct is defined, or listing all symbols in a module. For large codebases where a full repo map would exceed the context budget, MCP tools provide targeted lookups without the token overhead.

The two mechanisms complement each other: repo map gives the model a high-level structural overview, and MCP tools let it drill into specific locations on demand.

## Embedding Model Recommendations

The indexer uses the same `LlmProvider::embed()` call as semantic memory, so any embedding model works. For code-heavy workloads, consider:

| Model | Dims | Notes |
|-------|------|-------|
| `qwen3-embedding` | 1024 | Current Zeph default, good general performance |
| `nomic-embed-text` | 768 | Lightweight universal model |
| `nomic-embed-code` | 768 | Optimized for code, higher RAM (~7.5GB) |