Module ann

Per-project HNSW Approximate-Nearest-Neighbor index.

Wraps hnsw_rs::hnsw::Hnsw so the retrieval path can swap the O(N) linear cosine scan for an O(log N) graph query on big projects. The design is deliberately cautious:

  1. Additive — every public entry point returns None / an empty result on any failure. The retrieval fallback MUST always work, so this module never panics and never blocks rule writes.
  2. Persistent, per-project — each project hash has its own HNSW graph file under ~/.difflore/projects/{hash}/hnsw.* plus a sidecar hnsw.meta.json that carries the dim + element count so we can detect a stale / wrong-dim index on reload.
  3. Incremental upsert: hnsw_rs supports runtime insertions, so upsert_rule_chunks can stream new embeddings into the graph without a full rebuild. Replacements (same chunk_id, new vector) are modelled as “shadow” entries: the old internal id stays in the graph but is hidden from search results by a tombstones set. A full build_from_chunks rebuild periodically cleans these out.
  4. Dim mismatch => fallback — if the query dim doesn’t match the index dim we return an empty hit set; the caller sees this as “ANN gave nothing” and uses the linear scan.
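The shadow-entry and dim-mismatch rules above can be illustrated with a small, dependency-free sketch. It is not the module's implementation: `ToyIndex`, its fields, and its methods are hypothetical names, and a brute-force cosine scan stands in for the hnsw_rs graph so that only the tombstone bookkeeping and the "return nothing instead of failing" behaviour are on show.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for the real index (all names hypothetical). A brute-force
/// scan replaces the HNSW graph so the sketch needs no external crates.
pub struct ToyIndex {
    dim: usize,
    vectors: Vec<Vec<f32>>,       // position in this Vec = internal id
    chunk_of: Vec<String>,        // internal id -> chunk id
    live: HashMap<String, usize>, // chunk id -> current internal id
    tombstones: HashSet<usize>,   // internal ids superseded by a replacement
}

impl ToyIndex {
    pub fn new(dim: usize) -> Self {
        ToyIndex {
            dim,
            vectors: Vec::new(),
            chunk_of: Vec::new(),
            live: HashMap::new(),
            tombstones: HashSet::new(),
        }
    }

    /// Insert or replace a chunk. A wrong-dim vector is rejected with `false`
    /// rather than a panic.
    pub fn upsert(&mut self, chunk_id: &str, v: &[f32]) -> bool {
        if v.len() != self.dim {
            return false;
        }
        let id = self.vectors.len();
        self.vectors.push(v.to_vec());
        self.chunk_of.push(chunk_id.to_string());
        // The old point stays stored; it is only hidden from search results.
        if let Some(old) = self.live.insert(chunk_id.to_string(), id) {
            self.tombstones.insert(old);
        }
        true
    }

    /// Top-k by cosine similarity. A query of the wrong dim yields an empty
    /// hit set, which the caller treats as "use the linear scan instead".
    pub fn search(&self, q: &[f32], k: usize) -> Vec<(String, f32)> {
        if q.len() != self.dim {
            return Vec::new();
        }
        let mut hits: Vec<(String, f32)> = self
            .vectors
            .iter()
            .enumerate()
            .filter(|(id, _)| !self.tombstones.contains(id))
            .map(|(id, v)| (self.chunk_of[id].clone(), cosine(q, v)))
            .collect();
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits.truncate(k);
        hits
    }

    pub fn tombstone_count(&self) -> usize {
        self.tombstones.len()
    }
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```

In this sketch a rebuild would simply re-insert the `live` entries into a fresh `ToyIndex`, which is the role the module docs assign to build_from_chunks.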

The translation between internal ids and chunk ids is tracked on the Rust side because hnsw_rs’s DataId is a usize and we want to key on String chunk ids. Both maps (chunk id to internal id, and internal id back to chunk id) are serialised alongside the graph in the sidecar meta file.
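As a rough illustration of that bookkeeping, the sketch below keeps both maps next to the dim and element count and uses them to validate a reloaded graph and to translate raw hits. `SidecarMeta`, its field names, and its methods are assumptions made for this example; the real sidecar layout and its serialisation are not shown in these docs, and plain `(usize, f32)` pairs stand in for whatever the graph search returns.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of the sidecar metadata.
pub struct SidecarMeta {
    pub dim: usize,
    pub element_count: usize,
    pub chunk_to_internal: HashMap<String, usize>,
    pub internal_to_chunk: HashMap<usize, String>,
}

impl SidecarMeta {
    pub fn new(dim: usize) -> Self {
        SidecarMeta {
            dim,
            element_count: 0,
            chunk_to_internal: HashMap::new(),
            internal_to_chunk: HashMap::new(),
        }
    }

    /// Record a freshly inserted point under the next internal id.
    pub fn register(&mut self, chunk_id: &str) -> usize {
        let id = self.element_count;
        self.element_count += 1;
        self.chunk_to_internal.insert(chunk_id.to_string(), id);
        self.internal_to_chunk.insert(id, chunk_id.to_string());
        id
    }

    /// A reloaded graph is trusted only if dim and point count both match.
    pub fn matches(&self, expected_dim: usize, graph_points: usize) -> bool {
        self.dim == expected_dim && self.element_count == graph_points
    }

    /// Translate raw (internal id, distance) hits back to chunk ids,
    /// silently dropping ids the map does not know.
    pub fn resolve(&self, raw: &[(usize, f32)]) -> Vec<(String, f32)> {
        raw.iter()
            .filter_map(|(id, d)| self.internal_to_chunk.get(id).map(|c| (c.clone(), *d)))
            .collect()
    }
}
```

Dropping unknown ids in `resolve` rather than returning an error keeps the sketch in line with the module's rule that failures degrade to an empty result.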

Structs§

AnnIndex
A project-scoped, disk-persisted HNSW index. See the module docs for the overall approach.

Functions§

ann_files_for_project
Convenience helper for the on-disk files belonging to a project index.
get_ann_for_project
Get-or-load the ANN for a project + embedding dimension. Cheap on the hot path (cache hit); cold-path cost is whatever load_or_empty pays (either an empty struct alloc or an hnsw_rs file reload). Never errors.
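The get-or-load behaviour described for get_ann_for_project could look roughly like the following. This is a sketch under assumptions, not the actual function: `CachedIndex`, `get_or_load`, the `(project hash, dim)` cache key, and the placeholder body of `load_or_empty` are all invented for illustration, and the real signature may differ.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for the loaded index.
pub struct CachedIndex {
    pub project_hash: String,
    pub dim: usize,
}

/// Placeholder for the cold path. The real one would try to reload the
/// on-disk graph and fall back to an empty index; this one is always empty.
fn load_or_empty(project_hash: &str, dim: usize) -> CachedIndex {
    CachedIndex { project_hash: project_hash.to_string(), dim }
}

type Key = (String, usize);
static CACHE: OnceLock<Mutex<HashMap<Key, Arc<CachedIndex>>>> = OnceLock::new();

/// Cache hit: one lock plus one map lookup. Cache miss: pays for
/// `load_or_empty` once, then stores the result for later callers.
pub fn get_or_load(project_hash: &str, dim: usize) -> Arc<CachedIndex> {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    // Recover the map from a poisoned lock instead of panicking.
    let mut guard = cache.lock().unwrap_or_else(|e| e.into_inner());
    let entry = guard
        .entry((project_hash.to_string(), dim))
        .or_insert_with(|| Arc::new(load_or_empty(project_hash, dim)));
    Arc::clone(entry)
}
```

Keying the cache on the dimension as well as the project hash means a change of embedding model gets a fresh entry instead of reusing an index built for the old dim. Holding the lock during the cold load keeps the sketch short; an implementation that cares about contention would likely load outside the lock.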