Per-project HNSW Approximate-Nearest-Neighbor index.
Wraps hnsw_rs::hnsw::Hnsw so the retrieval path can swap the O(N)
linear cosine scan for an O(log N) graph query on big projects. The
design is deliberately cautious:
- Additive: every public entry point returns None or an empty result on any failure. The retrieval fallback MUST always work, so this module never panics and never blocks rule writes.
- Persistent, per-project: each project hash has its own HNSW graph file under ~/.difflore/projects/{hash}/hnsw.* plus a sidecar hnsw.meta.json that carries the dim + element count so we can detect a stale / wrong-dim index on reload.
- Incremental upsert: hnsw_rs supports runtime insertions, so upsert_rule_chunks can stream new embeddings into the graph without a full rebuild. Replacements (same chunk_id, new vector) are modelled as “shadow” entries: the old internal id stays in the graph but is hidden from search results by a tombstones set. A full build_from_chunks rebuild periodically cleans these out.
- Dim mismatch => fallback: if the query dim doesn’t match the index dim we return an empty hit set; the caller sees this as “ANN gave nothing” and uses the linear scan.
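The shadow-entry and dim-mismatch rules above can be sketched with a toy index. This is only an illustration of the bookkeeping, not the module's code: ToyIndex, its fields, and the brute-force cosine search are hypothetical stand-ins, and a plain Vec takes the place of the hnsw_rs graph.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the index; a Vec replaces the HNSW graph.
struct ToyIndex {
    dim: usize,
    vectors: Vec<Vec<f32>>,              // internal id = position in this Vec
    id_to_chunk: Vec<String>,            // internal id -> chunk id
    chunk_to_id: HashMap<String, usize>, // chunk id -> live internal id
    tombstones: HashSet<usize>,          // shadowed internal ids
}

impl ToyIndex {
    fn new(dim: usize) -> Self {
        ToyIndex {
            dim,
            vectors: Vec::new(),
            id_to_chunk: Vec::new(),
            chunk_to_id: HashMap::new(),
            tombstones: HashSet::new(),
        }
    }

    /// Appends a vector. Returns false (and changes nothing) on a dim mismatch.
    fn upsert(&mut self, chunk_id: &str, vector: Vec<f32>) -> bool {
        if vector.len() != self.dim {
            return false;
        }
        let new_id = self.vectors.len();
        self.vectors.push(vector);
        self.id_to_chunk.push(chunk_id.to_string());
        // A replacement leaves the old entry in place but tombstones it.
        if let Some(old_id) = self.chunk_to_id.insert(chunk_id.to_string(), new_id) {
            self.tombstones.insert(old_id);
        }
        true
    }

    /// Returns an empty hit set on a dim mismatch so the caller can fall back.
    fn search(&self, query: &[f32], k: usize) -> Vec<(String, f32)> {
        if query.len() != self.dim {
            return Vec::new();
        }
        let mut hits: Vec<(String, f32)> = self
            .vectors
            .iter()
            .enumerate()
            .filter(|(id, _)| !self.tombstones.contains(id))
            .map(|(id, v)| (self.id_to_chunk[id].clone(), cosine(query, v)))
            .collect();
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits.truncate(k);
        hits
    }
}

/// Cosine similarity; returns 0.0 when either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```

In this sketch the tombstone set only grows; a rebuild in the spirit of build_from_chunks would start from a fresh index and re-insert the live vectors, which drops the shadowed entries.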
The translation between internal ids and chunk ids is tracked on the Rust side because
hnsw_rs’s DataId is a usize and we want to key on String
chunk ids. Both maps are serialised alongside the graph in the
sidecar meta file.
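As a rough illustration of how the sidecar could gate a reload, the sketch below checks the dim, the element count, and the agreement of the two id maps before accepting the metadata. SidecarMeta, its field names, and check_meta are assumptions made for this example; the real file layout and (de)serialisation are not shown here.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the sidecar metadata; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct SidecarMeta {
    dim: usize,
    count: usize,
    chunk_to_internal: HashMap<String, usize>,
    internal_to_chunk: HashMap<usize, String>,
}

/// Returns Some(meta) only when the on-disk graph looks safe to reuse.
/// `graph_points` is the number of points the reloaded graph reports.
fn check_meta(
    meta: SidecarMeta,
    expected_dim: usize,
    graph_points: usize,
) -> Option<SidecarMeta> {
    // Wrong-dim index: built with a different embedding size.
    if meta.dim != expected_dim {
        return None;
    }
    // Stale index: the sidecar and the graph disagree on the element count.
    if meta.count != graph_points {
        return None;
    }
    // Every live chunk id must round-trip through both maps.
    let consistent = meta.chunk_to_internal.iter().all(|(chunk, id)| {
        meta.internal_to_chunk.get(id).map(String::as_str) == Some(chunk.as_str())
    });
    if consistent { Some(meta) } else { None }
}
```

Returning an Option rather than an error keeps the sketch in line with the additive rule: a rejected sidecar simply means the caller starts from an empty index.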
Structs§
- AnnIndex
- A project-scoped, disk-persisted HNSW index. See the module docs for the overall approach.
Functions§
- ann_files_for_project - Convenience helper for the on-disk files belonging to a project index.
- get_ann_for_project - Get-or-load the ANN for a project + embedding dimension. Cheap on the hot path (cache hit); cold-path cost is whatever load_or_empty pays (either an empty struct alloc or an hnsw_rs file reload). Never errors.
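The get-or-load behaviour described for get_ann_for_project can be sketched as a process-wide cache keyed on project hash and dimension. Everything named here (ToyAnn, load_or_empty_stub, get_or_load, the key shape) is hypothetical; the real function's signature and return type are not given in these docs.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical placeholder for the real index type.
#[derive(Debug)]
struct ToyAnn {
    dim: usize,
}

/// Stand-in for the cold path: a real version would try a disk reload first.
fn load_or_empty_stub(_project_hash: &str, dim: usize) -> ToyAnn {
    ToyAnn { dim }
}

type Cache = Mutex<HashMap<(String, usize), Arc<Mutex<ToyAnn>>>>;

/// Lazily initialised global cache.
fn cache() -> &'static Cache {
    static CACHE: OnceLock<Cache> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Cache hit: one map lookup and an Arc clone. Cache miss: pays the load cost once.
fn get_or_load(project_hash: &str, dim: usize) -> Arc<Mutex<ToyAnn>> {
    // Recover the guard from a poisoned lock instead of panicking.
    let mut map = cache().lock().unwrap_or_else(|e| e.into_inner());
    let entry = map
        .entry((project_hash.to_string(), dim))
        .or_insert_with(|| Arc::new(Mutex::new(load_or_empty_stub(project_hash, dim))));
    Arc::clone(entry)
}
```

Keying on the dimension as well as the project hash means a change of embedding model yields a separate entry rather than reusing a wrong-dim index. The sketch holds the cache lock during the cold load for simplicity; a production version may want to avoid that.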