Module hnsw

Expand description

HNSW (Hierarchical Navigable Small World) approximate-nearest-neighbor index. Pure algorithm; no SQL integration in this module.

HNSW is the industry-standard ANN algorithm for in-memory vector search: a multi-layer graph where each node lives at some randomly-assigned max layer; higher layers are sparser, layer 0 contains every node. Search starts at the entry point (the node at the current top layer), greedily descends layer-by-layer, then does a beam search at layer 0.

    layer 2:   [A] -- [E]                    sparse
                |       |
    layer 1:   [A] -- [E] -- [G] -- [J]      mid
                |  /  |  \   |  \   |
    layer 0:   [A,B,C,D,E,F,G,H,I,J,...]     dense (every node)

§What this module is responsible for

The graph (per-node, per-layer neighbor lists)
Layer assignment for new nodes (geometric distribution)
Insertion: greedy descent + beam search + neighbor pruning
Query: greedy descent + beam search at layer 0, return top-k

§What it is NOT responsible for (yet)

Storing vectors. The algorithm calls a get_vec(node_id) -> &[f32] closure to fetch the vector for any node it touches. In Phase 7d.2 that closure will read from the SQL table holding the indexed column; in tests it reads from an in-memory Vec<Vec<f32>>.
Persistence. The graph lives in HashMap<i64, Node> for now. Phase 7d.3 wires it into the cell-encoded page format.
DELETE / UPDATE. Pre-existing nodes can’t be removed today. Soft-delete + lazy rebuild is the planned approach for 7d.2/7d.3.

§Parameters (per Phase 7 plan Q2 — fixed defaults)

M = 16 — max neighbors per node at layers > 0
m_max0 = 32 (= 2·M) — max neighbors at layer 0
ef_construction = 200 — beam width during INSERT
ef_search = 50 — default beam width during query
m_l = 1/ln(M) ≈ 0.36 — layer-assignment scale

§Invariants

Every node.layers Vec has length node_max_layer + 1 for that node.
node.layers[i] contains node_ids of neighbors at layer i. Each neighbor is itself a node in nodes; symmetrical (if A → B at layer i then B → A at layer i, modulo pruning).
entry_point is Some(id) iff nodes is non-empty. The entry node has the highest max-layer of any node currently in the graph.

Structs§

HnswIndex: In-memory HNSW graph. See module docs for the model.
HnswParams: HNSW algorithm parameters. Phase 7 ships fixed defaults (Q2 in the plan); this struct is Clone + Copy so callers wanting to fork an experimental tuning can do so without touching the index itself.
Node: Per-node metadata: a list of neighbor IDs for each layer this node lives in. layers[0] is layer 0 (densest); layers[layers.len() - 1] is the highest layer this node reaches.

Enums§

DistanceMetric: Distance metric used by the HNSW index. Must match what the surrounding vec_distance_* SQL function would compute on the same pair of vectors — otherwise the index probe and the brute-force fallback would disagree on which rows are “nearest”. See src/sql/executor.rs’s vec_distance_l2 / _cosine / _dot for the canonical implementations.

Module hnsw

Module hnsw Copy item path

§What this module is responsible for

§What it is NOT responsible for (yet)

§Parameters (per Phase 7 plan Q2 — fixed defaults)

§Invariants

Structs§

Enums§

Module hnsw