synwire-index
Semantic indexing pipeline for Synwire VFS providers. Orchestrates directory walking, AST-aware chunking, embedding, vector storage, and background file watching into a single SemanticIndex entry point.
Pipeline
walk(path) → chunk_file() → embed_documents() → vector_store::add() → meta.json
- Walk: Collect files matching include/exclude globs, up to max_file_size (default 1 MiB).
- Hash check: Skip files whose xxHash128 content hash matches the stored hash, so unchanged files are never re-embedded.
- Chunk: Each changed file is split into Documents by synwire-chunker (AST or text splitter).
- Embed: Document texts are batch-embedded via synwire-embeddings-local.
- Store: Vectors are written to synwire-vectorstore-lancedb.
- Cache: meta.json and hashes.json are written to the index cache directory.
Quick start
use ;
use Chunker;
use ;
use LanceDbVectorStore;
use Arc;
use Path;
let embeddings = new;
let reranker = Some;
let store_factory: StoreFactory = Boxnew;
let index = new;
// Start indexing (non-blocking — returns a handle)
let handle = index.index.await?;
// Poll for completion
use IndexStatus;
loop
// Search
let results = index.search.await?;
Configuration
use IndexConfig;
use PathBuf;
let config = IndexConfig ;
chunk_size and chunk_overlap apply only to the text splitter path. AST-chunked files always produce one chunk per definition.
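To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal sketch of an overlapping window splitter. It is not the synwire-chunker implementation: the function name `split_with_overlap` is hypothetical, and it assumes sizes are counted in characters, whereas the real splitter may count bytes or tokens.

```rust
/// Split `text` into windows of at most `chunk_size` characters. Each window
/// starts `chunk_size - chunk_overlap` characters after the previous one, so
/// neighbouring chunks share `chunk_overlap` characters.
/// Assumption: sizes are measured in characters (hypothetical; see lead-in).
fn split_with_overlap(text: &str, chunk_size: usize, chunk_overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(
        chunk_overlap < chunk_size,
        "chunk_overlap must be smaller than chunk_size"
    );

    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - chunk_overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;

    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // the current window already reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With a chunk size of 4 and an overlap of 2, the ten-character input "abcdefghij" yields four chunks, each sharing two characters with its neighbour. A larger overlap preserves more context across chunk boundaries at the cost of embedding more text.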
Feature flags
| Feature | Default | Description |
|---|---|---|
| hybrid-search | No | BM25 (tantivy) + vector hybrid search |
| code-graph | No | Cross-file call/import/inherit dependency graph |
| community-detection | No | HIT-Leiden clustering over code graph |
Hybrid search (hybrid-search)
Combines BM25 lexical scoring with vector semantic scoring using a configurable alpha parameter:
score = alpha * bm25_score + (1 - alpha) * vector_score
- alpha = 1.0: pure BM25 (exact keyword match)
- alpha = 0.0: pure vector (semantic match)
- alpha = 0.5: balanced hybrid (default)
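The blending formula can be sketched in a few lines. The function names `hybrid_score` and `rank_by_hybrid_score` and the `(id, bm25, vector)` tuple layout are hypothetical, not part of the crate's API. The sketch also assumes both scores are already normalised to a comparable range; how the crate normalises them is not stated here.

```rust
/// Blend a lexical and a semantic score:
/// score = alpha * bm25_score + (1 - alpha) * vector_score.
/// Assumption: both inputs are normalised to the same range, e.g. 0.0..=1.0.
fn hybrid_score(alpha: f32, bm25_score: f32, vector_score: f32) -> f32 {
    alpha * bm25_score + (1.0 - alpha) * vector_score
}

/// Rank hypothetical `(id, bm25_score, vector_score)` candidates by their
/// blended score, best first.
fn rank_by_hybrid_score<'a>(
    alpha: f32,
    candidates: &[(&'a str, f32, f32)],
) -> Vec<(&'a str, f32)> {
    let mut ranked: Vec<(&'a str, f32)> = candidates
        .iter()
        .map(|&(id, bm25, vector)| (id, hybrid_score(alpha, bm25, vector)))
        .collect();
    // Sort descending by score; total_cmp gives a total order over f32.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}
```

Moving alpha towards 1.0 favours candidates that contain the exact query terms, while moving it towards 0.0 favours candidates that are semantically close even if they share no keywords with the query.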
use ;
let config = HybridSearchConfig ;
let results = hybrid_search.await?;
Code graph (code-graph)
Builds a cross-file dependency graph from tree-sitter ASTs. Node types: file, symbol. Edge types: calls, imports, contains, inherits.
use ;
let xrefs = xref_query.await?;
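As an illustration of the graph shape described above, the following sketch models the two node types and four edge types with a plain adjacency list and answers a simple "who points at this node" query. All type and method names (`Node`, `EdgeKind`, `CodeGraph`, `incoming`) are hypothetical; the crate's actual graph representation and query API may look quite different.

```rust
use std::collections::HashMap;

/// The two node types named above.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Node {
    File(String),
    Symbol(String),
}

/// The four edge types named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EdgeKind {
    Calls,
    Imports,
    Contains,
    Inherits,
}

/// Hypothetical adjacency-list graph: source node -> outgoing (kind, target) edges.
#[derive(Default)]
struct CodeGraph {
    edges: HashMap<Node, Vec<(EdgeKind, Node)>>,
}

impl CodeGraph {
    fn add_edge(&mut self, from: Node, kind: EdgeKind, to: Node) {
        self.edges.entry(from).or_default().push((kind, to));
    }

    /// Return every node that has an edge of `kind` pointing at `target`,
    /// sorted so the result order is deterministic.
    fn incoming(&self, target: &Node, kind: EdgeKind) -> Vec<&Node> {
        let mut sources: Vec<&Node> = self
            .edges
            .iter()
            .filter(|(_, outs)| outs.iter().any(|(k, to)| *k == kind && to == target))
            .map(|(from, _)| from)
            .collect();
        sources.sort();
        sources
    }
}
```

Calling `incoming` with `EdgeKind::Calls` on a symbol node returns its callers, which is the kind of cross-reference lookup a dependency graph makes cheap. A linear scan is enough for a sketch; a real index would likely keep a reverse adjacency map as well.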
Community detection (community-detection)
Applies HIT-Leiden clustering to the code graph to identify cohesive modules. Community state is persisted via StorageLayout::communities_dir().
File watcher
After a successful index, a background file watcher starts automatically:
- Platform-native: inotify (Linux), FSEvents (macOS), ReadDirectoryChangesW (Windows)
- Events within a 300 ms window are coalesced (debounced)
- Only files with changed content (by xxHash128) trigger re-indexing
- The watcher stops when SemanticIndex is dropped or unwatch(path) is called
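The debouncing behaviour in the list above can be sketched with the standard library alone. The `Debouncer` type below is hypothetical and assumes trailing-edge semantics, meaning a path is released only after it has been quiet for the full window. The crate's watcher may coalesce events differently.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::{Duration, Instant};

/// Coalesces bursts of file events: a path becomes "ready" only once no new
/// event for it has arrived within `window` (assumed trailing-edge debounce).
struct Debouncer {
    window: Duration,
    last_seen: HashMap<PathBuf, Instant>,
}

impl Debouncer {
    fn new(window: Duration) -> Self {
        Debouncer {
            window,
            last_seen: HashMap::new(),
        }
    }

    /// Record an event. Repeated events for one path only refresh its timestamp,
    /// so a burst of saves collapses into a single pending entry.
    fn record(&mut self, path: PathBuf, at: Instant) {
        self.last_seen.insert(path, at);
    }

    /// Remove and return, in sorted order, the paths that have been quiet for
    /// at least `window` as of `now`.
    fn drain_ready(&mut self, now: Instant) -> Vec<PathBuf> {
        let window = self.window;
        let mut ready = Vec::new();
        self.last_seen.retain(|path, seen| {
            let quiet = now.saturating_duration_since(*seen) >= window;
            if quiet {
                ready.push(path.clone());
            }
            !quiet // keep only the paths that are still inside their window
        });
        ready.sort();
        ready
    }
}
```

Passing timestamps in explicitly, rather than calling `Instant::now()` inside the methods, keeps the logic testable without sleeping. Each drained path would then go through the content hash check before any re-indexing happens.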
Incremental updates
Only changed files are re-processed on subsequent index() calls or watcher events. The hash table (hashes.json) tracks content hashes per file; unchanged files are skipped entirely.
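The skip logic amounts to comparing each file's content hash with the stored one. The sketch below shows the idea with hypothetical names (`content_hash`, `changed_files`) and an in-memory map standing in for hashes.json. It substitutes the standard library's `DefaultHasher` for xxHash128 purely to avoid a dependency, so the hash values are not compatible with the crate's.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash. The crate uses xxHash128; `DefaultHasher` is used
/// here only so the sketch has no external dependencies.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Return the paths whose content hash differs from the stored one (or that
/// have no stored hash yet), updating `stored` along the way.
/// `stored` plays the role of hashes.json in this sketch.
fn changed_files<'a>(
    files: &[(&'a str, &[u8])],
    stored: &mut HashMap<String, u64>,
) -> Vec<&'a str> {
    let mut changed = Vec::new();
    for &(path, bytes) in files {
        let hash = content_hash(bytes);
        if stored.get(path) != Some(&hash) {
            stored.insert(path.to_string(), hash);
            changed.push(path); // only these files need chunking and embedding
        }
    }
    changed
}
```

On a first run every file is reported as changed. Running again over identical content reports nothing, and editing a single file reports only that file, which is why repeated index() calls stay cheap.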
See also
- synwire-chunker — chunking strategies
- synwire-embeddings-local — local embedding models
- synwire-vectorstore-lancedb — vector storage backend
- synwire-storage — cache path management