Expand description
HNSW (Hierarchical Navigable Small World) approximate-nearest-neighbor index. Pure algorithm; no SQL integration in this module.
HNSW is the industry-standard ANN algorithm for in-memory vector search: a multi-layer graph where each node lives at some randomly-assigned max layer; higher layers are sparser, layer 0 contains every node. Search starts at the entry point (the node at the current top layer), greedily descends layer-by-layer, then does a beam search at layer 0.
layer 2: [A] -- [E] sparse
| |
layer 1: [A] -- [E] -- [G] -- [J] mid
| / | \ | \ |
layer 0: [A,B,C,D,E,F,G,H,I,J,...] dense (every node)§What this module is responsible for
- The graph (per-node, per-layer neighbor lists)
- Layer assignment for new nodes (geometric distribution)
- Insertion: greedy descent + beam search + neighbor pruning
- Query: greedy descent + beam search at layer 0, return top-k
§What it is NOT responsible for (yet)
- Storing vectors. The algorithm calls a
get_vec(node_id) -> &[f32]closure to fetch the vector for any node it touches. In Phase 7d.2 that closure will read from the SQL table holding the indexed column; in tests it reads from an in-memoryVec<Vec<f32>>. - Persistence. The graph lives in
HashMap<i64, Node>for now. Phase 7d.3 wires it into the cell-encoded page format. - DELETE / UPDATE. Pre-existing nodes can’t be removed today. Soft-delete + lazy rebuild is the planned approach for 7d.2/7d.3.
§Parameters (per Phase 7 plan Q2 — fixed defaults)
M = 16— max neighbors per node at layers > 0m_max0 = 32(= 2·M) — max neighbors at layer 0ef_construction = 200— beam width during INSERTef_search = 50— default beam width during querym_l = 1/ln(M) ≈ 0.36— layer-assignment scale
§Invariants
- Every
node.layersVec has lengthnode_max_layer + 1for that node. node.layers[i]contains node_ids of neighbors at layer i. Each neighbor is itself a node innodes; symmetrical (if A → B at layer i then B → A at layer i, modulo pruning).entry_pointisSome(id)iffnodesis non-empty. The entry node has the highest max-layer of any node currently in the graph.
Structs§
- Hnsw
Index - In-memory HNSW graph. See module docs for the model.
- Hnsw
Params - HNSW algorithm parameters. Phase 7 ships fixed defaults (Q2 in the
plan); this struct is
Clone + Copyso callers wanting to fork an experimental tuning can do so without touching the index itself. - Node
- Per-node metadata: a list of neighbor IDs for each layer this node
lives in.
layers[0]is layer 0 (densest);layers[layers.len() - 1]is the highest layer this node reaches.
Enums§
- Distance
Metric - Distance metric used by the HNSW index. Must match what the
surrounding
vec_distance_*SQL function would compute on the same pair of vectors — otherwise the index probe and the brute-force fallback would disagree on which rows are “nearest”. Seesrc/sql/executor.rs’svec_distance_l2/_cosine/_dotfor the canonical implementations.