synwire-index 0.1.0

synwire-index

Semantic indexing pipeline for Synwire VFS providers. Orchestrates directory walking, AST-aware chunking, embedding, vector storage, and background file watching into a single SemanticIndex entry point.

Pipeline

walk(path) → chunk_file() → embed_documents() → vector_store::add() → meta.json
  1. Walk: Collect files matching include/exclude globs, up to max_file_size (default 1 MiB).
  2. Hash check: Skip files whose xxHash128 content hash matches the stored hash, so unchanged files are never re-embedded.
  3. Chunk: Each changed file is split into Documents by synwire-chunker (AST or text splitter).
  4. Embed: Document texts are batch-embedded via synwire-embeddings-local.
  5. Store: Vectors are written to synwire-vectorstore-lancedb.
  6. Cache: meta.json and hashes.json are written to the index cache directory.
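
The hash check in step 2 can be sketched as follows. This is a minimal illustration, not the crate's implementation: `content_hash` and `files_to_reindex` are hypothetical names, and the standard library's `DefaultHasher` stands in for xxHash128 only to keep the sketch dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

/// Stand-in content hash (assumption: the real pipeline uses xxHash128).
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Returns the paths whose current hash differs from the stored hash,
/// or that have no stored hash yet. Unchanged files are skipped.
fn files_to_reindex(
    stored: &HashMap<PathBuf, u64>,
    current: &[(PathBuf, Vec<u8>)],
) -> Vec<PathBuf> {
    current
        .iter()
        .filter(|(path, bytes)| stored.get(path) != Some(&content_hash(bytes)))
        .map(|(path, _)| path.clone())
        .collect()
}
```

Only the paths returned by `files_to_reindex` would go on to chunking and embedding. The stored map would then be updated with the new hashes before being written back.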

Quick start

use synwire_index::{SemanticIndex, IndexConfig, StoreFactory};
use synwire_chunker::Chunker;
use synwire_embeddings_local::{LocalEmbeddings, LocalReranker};
use synwire_vectorstore_lancedb::LanceDbVectorStore;
use std::sync::Arc;
use std::path::Path;

let embeddings = Arc::new(LocalEmbeddings::new()?);
let reranker = Some(Arc::new(LocalReranker::new()?));

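// The factory is synchronous, so it bridges to the async LanceDB open call.
// Note: tokio's Handle::block_on panics if called from within an async task,
// so this assumes the factory is invoked from a blocking context.
let store_factory: StoreFactory = Box::new(|path: &Path| {
    let handle = tokio::runtime::Handle::current();
    handle.block_on(LanceDbVectorStore::open(
        path.join("lance").to_string_lossy(),
        "chunks", // table name
        384,      // embedding dimension
    ))
});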

let index = SemanticIndex::new(
    Chunker::new(),
    embeddings,
    reranker,
    store_factory,
    IndexConfig::default(),
    None, // optional event sender
);

// Start indexing (non-blocking — returns a handle)
let handle = index.index(Path::new("/path/to/project"), Default::default()).await?;

// Poll for completion
use synwire_index::IndexStatus;
loop {
    match index.status(&handle.index_id).await {
        IndexStatus::Ready(_) => break,
        IndexStatus::Failed(e) => return Err(e.into()),
        _ => tokio::time::sleep(std::time::Duration::from_millis(500)).await,
    }
}

// Search
let results = index.search(
    Path::new("/path/to/project"),
    "authentication logic",
    Default::default(),
).await?;

Configuration

use synwire_index::IndexConfig;
use std::path::PathBuf;

let config = IndexConfig {
    cache_base: Some(PathBuf::from(".myapp-cache")),  // default: OS cache dir
    chunk_size: 2000,    // default: 1500
    chunk_overlap: 300,  // default: 200
};

chunk_size and chunk_overlap apply only to the text splitter path. AST-chunked files always produce one chunk per definition.
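
To make the interaction between the two settings concrete, here is a minimal sketch of a sliding-window splitter. It assumes both values are measured in characters, which the text above does not state, and `split_with_overlap` is a hypothetical helper rather than the synwire-chunker API.

```rust
/// Splits `text` into windows of at most `chunk_size` characters. Each window
/// starts `chunk_size - chunk_overlap` characters after the previous one, so
/// consecutive windows share `chunk_overlap` characters.
fn split_with_overlap(text: &str, chunk_size: usize, chunk_overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(chunk_overlap < chunk_size, "chunk_overlap must be smaller than chunk_size");

    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - chunk_overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // the final window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

Under this reading, the defaults (chunk_size 1500, chunk_overlap 200) advance the window by 1300 characters per chunk.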

Feature flags

Feature              Default  Description
hybrid-search        No       BM25 (tantivy) + vector hybrid search
code-graph           No       Cross-file call/import/inherit dependency graph
community-detection  No       HIT-Leiden clustering over code graph
Hybrid search (hybrid-search)

Combines BM25 lexical scoring with vector semantic scoring using a configurable alpha parameter:

score = alpha * bm25_score + (1 - alpha) * vector_score
  • alpha = 1.0: pure BM25 (exact keyword match)
  • alpha = 0.0: pure vector (semantic match)
  • alpha = 0.5: balanced hybrid (default)

#[cfg(feature = "hybrid-search")]
use synwire_index::{HybridSearchConfig, hybrid_search};

let config = HybridSearchConfig { alpha: 0.5, top_k: 10 };
let results = hybrid_search(&bm25_index, &vector_store, &embeddings, "auth", config).await?;
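
The scoring formula itself can be sketched in isolation. The example below assumes both scores are already normalized to [0, 1] (raw BM25 scores are unbounded, and how the crate normalizes them is not stated here). `hybrid_score` and `rank` are hypothetical helpers, not part of the synwire-index API.

```rust
/// Blends the two scores: alpha * bm25 + (1 - alpha) * vector.
/// Assumption: both inputs are already normalized to [0, 1].
fn hybrid_score(alpha: f32, bm25_score: f32, vector_score: f32) -> f32 {
    let alpha = alpha.clamp(0.0, 1.0);
    alpha * bm25_score + (1.0 - alpha) * vector_score
}

/// Ranks (id, bm25, vector) candidates by descending blended score
/// and keeps the best `top_k`.
fn rank<'a>(candidates: &[(&'a str, f32, f32)], alpha: f32, top_k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = candidates
        .iter()
        .map(|&(id, bm25, vector)| (id, hybrid_score(alpha, bm25, vector)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}
```

With alpha = 1.0 the ranking follows the BM25 column alone, and with alpha = 0.0 it follows the vector column alone, matching the two extremes listed above.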

Code graph (code-graph)

Builds a cross-file dependency graph from tree-sitter ASTs. Node types: file, symbol. Edge types: calls, imports, contains, inherits.

#[cfg(feature = "code-graph")]
use synwire_index::{XrefGraph, xref_query, XrefDirection};

let xrefs = xref_query(&graph, "MyStruct::authenticate", 2, XrefDirection::Incoming).await?;
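
For intuition, an incoming cross-reference query can be viewed as a depth-limited breadth-first traversal over reversed edges. The sketch below assumes the numeric argument in the call above is a maximum traversal depth, which is not confirmed here. `IncomingEdges` and `incoming_within` are hypothetical and unrelated to the real XrefGraph type.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical adjacency map: symbol -> symbols that reference it.
type IncomingEdges<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Collects every symbol that reaches `start` within `max_depth` incoming
/// hops, paired with the hop count at which it was first found.
fn incoming_within<'a>(
    edges: &IncomingEdges<'a>,
    start: &'a str,
    max_depth: usize,
) -> Vec<(&'a str, usize)> {
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue = VecDeque::from([(start, 0usize)]);
    let mut found = Vec::new();

    while let Some((symbol, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // do not expand beyond the depth limit
        }
        for &caller in edges.get(symbol).into_iter().flatten() {
            if seen.insert(caller) {
                found.push((caller, depth + 1));
                queue.push_back((caller, depth + 1));
            }
        }
    }
    found
}
```

The `seen` set keeps the traversal from looping on cyclic call graphs and reports each symbol once, at its shortest distance from the start.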

Community detection (community-detection)

Applies HIT-Leiden clustering to the code graph to identify cohesive modules. Community state is persisted via StorageLayout::communities_dir().

File watcher

After indexing completes successfully, a background file watcher starts automatically:

  • Platform-native: inotify (Linux), FSEvents (macOS), ReadDirectoryChangesW (Windows)
  • Events within a 300 ms window are coalesced (debounced)
  • Only files with changed content (by xxHash128) trigger re-indexing
  • The watcher stops when SemanticIndex is dropped or unwatch(path) is called
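
The debounce behaviour can be illustrated with a pure function over timestamped events. This sketch assumes the window is measured from the first event of each batch and that events arrive in timestamp order. The watcher's actual policy may differ, and `coalesce` is a hypothetical name.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Groups (timestamp in ms, path) events into batches. A batch closes once an
/// event arrives more than `window_ms` after the batch's first event.
/// Duplicate paths within a batch collapse into one entry.
fn coalesce(events: &[(u64, PathBuf)], window_ms: u64) -> Vec<BTreeSet<PathBuf>> {
    let mut batches = Vec::new();
    let mut current: BTreeSet<PathBuf> = BTreeSet::new();
    let mut batch_start: Option<u64> = None;

    for (timestamp, path) in events {
        match batch_start {
            // Still inside the current window: keep accumulating.
            Some(start) if timestamp.saturating_sub(start) <= window_ms => {}
            // First event, or the window has elapsed: start a new batch.
            _ => {
                if !current.is_empty() {
                    batches.push(std::mem::take(&mut current));
                }
                batch_start = Some(*timestamp);
            }
        }
        current.insert(path.clone());
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Each resulting batch would then go through the content-hash check, so a burst of saves to one file produces at most one re-index of that file.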

Incremental updates

Only changed files are re-processed on subsequent index() calls or watcher events. The hash table (hashes.json) tracks content hashes per file; unchanged files are skipped entirely.

See also