Skip to main content

Module hnsw

Module hnsw 

Source
Expand description

HNSW (Hierarchical Navigable Small World) approximate-nearest-neighbor index. Pure algorithm; no SQL integration in this module.

HNSW is the industry-standard ANN algorithm for in-memory vector search: a multi-layer graph where each node lives at some randomly-assigned max layer; higher layers are sparser, layer 0 contains every node. Search starts at the entry point (the node at the current top layer), greedily descends layer-by-layer, then does a beam search at layer 0.

    layer 2:   [A] -- [E]                    sparse
                |       |
    layer 1:   [A] -- [E] -- [G] -- [J]      mid
                |  /  |  \   |  \   |
    layer 0:   [A,B,C,D,E,F,G,H,I,J,...]     dense (every node)

§What this module is responsible for

  • The graph (per-node, per-layer neighbor lists)
  • Layer assignment for new nodes (geometric distribution)
  • Insertion: greedy descent + beam search + neighbor pruning
  • Query: greedy descent + beam search at layer 0, return top-k

§What it is NOT responsible for (yet)

  • Storing vectors. The algorithm calls a get_vec(node_id) -> &[f32] closure to fetch the vector for any node it touches. In Phase 7d.2 that closure will read from the SQL table holding the indexed column; in tests it reads from an in-memory Vec<Vec<f32>>.
  • Persistence. The graph lives in HashMap<i64, Node> for now. Phase 7d.3 wires it into the cell-encoded page format.
  • DELETE / UPDATE. Pre-existing nodes can’t be removed today. Soft-delete + lazy rebuild is the planned approach for 7d.2/7d.3.

§Parameters (per Phase 7 plan Q2 — fixed defaults)

  • M = 16 — max neighbors per node at layers > 0
  • m_max0 = 32 (= 2·M) — max neighbors at layer 0
  • ef_construction = 200 — beam width during INSERT
  • ef_search = 50 — default beam width during query
  • m_l = 1/ln(M) ≈ 0.36 — layer-assignment scale

§Invariants

  • Every node.layers Vec has length node_max_layer + 1 for that node.
  • node.layers[i] contains node_ids of neighbors at layer i. Each neighbor is itself a node in nodes; symmetrical (if A → B at layer i then B → A at layer i, modulo pruning).
  • entry_point is Some(id) iff nodes is non-empty. The entry node has the highest max-layer of any node currently in the graph.

Structs§

HnswIndex
In-memory HNSW graph. See module docs for the model.
HnswParams
HNSW algorithm parameters. Phase 7 ships fixed defaults (Q2 in the plan); this struct is Clone + Copy so callers wanting to fork an experimental tuning can do so without touching the index itself.
Node
Per-node metadata: a list of neighbor IDs for each layer this node lives in. layers[0] is layer 0 (densest); layers[layers.len() - 1] is the highest layer this node reaches.

Enums§

DistanceMetric
Distance metric used by the HNSW index. Must match what the surrounding vec_distance_* SQL function would compute on the same pair of vectors — otherwise the index probe and the brute-force fallback would disagree on which rows are “nearest”. See src/sql/executor.rs’s vec_distance_l2 / _cosine / _dot for the canonical implementations.