# sqlitegraph-core/src/hnsw/ — Hierarchical Navigable Small World Index
HNSW vector similarity search: the primary index for approximate nearest-neighbor lookup
in high-dimensional vector space. Used for semantic code search, sparse-inference
neuron selection, and embedding-based retrieval.
## Architecture
- `index.rs` — `HnswIndex<V: Vector>` — the core struct. Insert, search, delete, stats.
- `index_api.rs` — Public API methods on HnswIndex.
- `builder.rs` — `HnswConfigBuilder` — fluent config constructor.
- `config.rs` — `HnswConfig`, `hnsw_config()` — M (max connections), ef_construction, ef_search.
- `storage.rs` — Vector storage backends (in-memory, mmap, SQLite-backed).
- `layer.rs` — Multi-layer graph structure (bottom layer = all nodes, upper = sparse).
- `multilayer.rs` — `MultiLayerNodeManager`, `LevelDistributor`, `LayerMappings`.
- `distance_metric.rs` — `DistanceMetric` trait + `compute_distance()` dispatch.
- `distance_functions.rs` — Cosine, Euclidean, inner product, dot product implementations.
- `batch_filter.rs` — Batch filtering for filtered search (label/property predicates).
- `errors.rs` — `HnswError` variants.
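
The layer layout described for `layer.rs` and `multilayer.rs` (every node in the bottom layer, progressively fewer nodes above) usually comes from an exponentially decaying level draw. The sketch below shows the textbook HNSW rule, `floor(-ln(u) * mL)` with `mL = 1 / ln(M)`. It is an assumption that `LevelDistributor` follows this rule; `assign_level` and its parameters are hypothetical names, and the uniform sample `u` is passed in so that the function stays deterministic.

```rust
/// Hypothetical helper, not the crate's API: maps a uniform sample `u` in (0, 1]
/// to the top layer of a new node using the standard HNSW rule
/// floor(-ln(u) * mL), where mL = 1 / ln(M).
pub fn assign_level(u: f64, m: usize, max_level: usize) -> usize {
    assert!(u > 0.0 && u <= 1.0, "u must be in (0, 1]");
    assert!(m >= 2, "M must be at least 2 so that ln(M) is positive");

    // Normalization factor: a larger M gives fewer nodes in the upper layers.
    let m_l = 1.0 / (m as f64).ln();

    // -ln(u) is exponentially distributed, so high levels are exponentially rare.
    let level = (-u.ln() * m_l).floor() as usize;

    // Clamp to keep the number of layers bounded.
    level.min(max_level)
}
```

A node assigned level `L` is then linked into layers `0` through `L`, which is what keeps the bottom layer complete and the upper layers sparse.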
## Key Parameters
- `M` — Max edges per node per layer. Higher = better recall, more memory. Default: 16.
- `ef_construction` — Beam width during index build. Higher = better index quality, slower build. Default: 200.
- `ef_search` — Beam width during query. Higher = better recall, slower query. Default: 50.
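
As a rough illustration of how these three parameters and their defaults might be carried by a fluent builder, here is a minimal self-contained sketch. `SketchConfig` and `SketchConfigBuilder` are hypothetical stand-ins, not the real `HnswConfig` / `HnswConfigBuilder`; the method names and the non-zero validation are assumptions. Only the default values (16, 200, 50) come from the list above.

```rust
/// Hypothetical stand-in for `HnswConfig`; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchConfig {
    pub m: usize,
    pub ef_construction: usize,
    pub ef_search: usize,
}

impl Default for SketchConfig {
    fn default() -> Self {
        // Defaults as documented under Key Parameters.
        Self { m: 16, ef_construction: 200, ef_search: 50 }
    }
}

/// Hypothetical fluent builder in the spirit of `HnswConfigBuilder`.
#[derive(Debug, Default)]
pub struct SketchConfigBuilder {
    config: SketchConfig,
}

impl SketchConfigBuilder {
    /// Starts from the documented defaults.
    pub fn new() -> Self {
        Self::default()
    }

    pub fn m(mut self, m: usize) -> Self {
        self.config.m = m;
        self
    }

    pub fn ef_construction(mut self, ef: usize) -> Self {
        self.config.ef_construction = ef;
        self
    }

    pub fn ef_search(mut self, ef: usize) -> Self {
        self.config.ef_search = ef;
        self
    }

    /// Assumed validation: every parameter must be non-zero.
    pub fn build(self) -> Result<SketchConfig, String> {
        let c = self.config;
        if c.m == 0 || c.ef_construction == 0 || c.ef_search == 0 {
            return Err("m, ef_construction and ef_search must be non-zero".to_string());
        }
        Ok(c)
    }
}
```

With this shape, `SketchConfigBuilder::new().ef_search(100).build()` raises query-time recall while leaving `M` and `ef_construction` at their defaults, so an index built earlier is unaffected.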
## Key Conventions
- Vectors are generic via `Vector` trait — not hardcoded to f32.
- Distance metric is configurable per-index. Cosine for embeddings, Euclidean for raw weights.
- Thread safety: HNSW operations acquire internal locks. Do NOT hold external locks while calling into HNSW; nesting them around the internal locks risks deadlock.
- Storage backends are pluggable: in-memory HashMap (default), mmap for large datasets, SQLite for persistence.
- `batch_filter.rs` pre-filters search candidates by graph properties (label/property predicates) before any distance computation.
- Delete is a soft delete: the node is marked and compacted lazily. Call `compact()` to force a hard delete.
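
The soft-delete convention can be pictured with a small tombstone store. This is a sketch under stated assumptions, not the crate's implementation: `SketchStore`, its `u64` ids, and all method names other than `compact()` are hypothetical, and the real index also has to deal with graph edges, which are omitted here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory store illustrating soft delete plus `compact()`.
pub struct SketchStore {
    vectors: HashMap<u64, Vec<f32>>,
    deleted: HashSet<u64>, // tombstones: ids marked deleted but still stored
}

impl SketchStore {
    pub fn new() -> Self {
        Self { vectors: HashMap::new(), deleted: HashSet::new() }
    }

    /// Inserts or replaces a vector; re-inserting an id clears its tombstone.
    pub fn insert(&mut self, id: u64, vector: Vec<f32>) {
        self.deleted.remove(&id);
        self.vectors.insert(id, vector);
    }

    /// Marks `id` as deleted without freeing its storage.
    /// Returns true only if the id was live before the call.
    pub fn soft_delete(&mut self, id: u64) -> bool {
        if self.vectors.contains_key(&id) {
            self.deleted.insert(id)
        } else {
            false
        }
    }

    /// Lookups skip tombstoned ids, so a soft-deleted node is invisible to readers.
    pub fn get(&self, id: u64) -> Option<&[f32]> {
        if self.deleted.contains(&id) {
            None
        } else {
            self.vectors.get(&id).map(|v| v.as_slice())
        }
    }

    /// Hard delete: physically removes every tombstoned vector.
    /// Returns how many entries were reclaimed.
    pub fn compact(&mut self) -> usize {
        let reclaimed = self.deleted.len();
        for id in self.deleted.drain() {
            self.vectors.remove(&id);
        }
        reclaimed
    }

    /// Number of vectors physically stored, tombstoned ones included.
    pub fn stored_len(&self) -> usize {
        self.vectors.len()
    }
}
```

The point of the split is that `soft_delete` is cheap and safe to call on the hot path, while `compact` does the expensive reclamation in one batch.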
## When Editing Here
- Distance functions MUST be SIMD-optimized (AVX2/AVX-512) for any vector dimension >= 64.
- New distance metrics: implement `DistanceMetric` trait, register in `distance_functions.rs`.
- Breaking the layer structure invariant (a node in layer N must also exist in every layer below it, 0 through N-1) causes panics.
- `storage.rs` changes affect all vector persistence — test with mmap AND in-memory backends.
- Config changes must preserve backward-compatible defaults (existing indices keep working).
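
When touching layer code, a cheap debug-time check of the layer invariant can catch corruption before it turns into a panic. The following is a minimal sketch that assumes each layer can be viewed as a set of `u64` node ids with index 0 as the bottom layer; `find_layer_violation` is a hypothetical helper, not part of the crate.

```rust
use std::collections::HashSet;

/// Hypothetical checker for the invariant "a node in layer N exists in all layers below N".
/// `layers[0]` is the bottom layer. Returns the first violation found as
/// (node_id, layer_containing_it, lower_layer_missing_it), or `None` if the structure is valid.
pub fn find_layer_violation(layers: &[HashSet<u64>]) -> Option<(u64, usize, usize)> {
    // Layer 0 has nothing below it, so start checking from layer 1.
    for (n, layer) in layers.iter().enumerate().skip(1) {
        for &id in layer {
            // Look for the lowest layer below `n` that does not contain this node.
            if let Some(missing) = (0..n).find(|&k| !layers[k].contains(&id)) {
                return Some((id, n, missing));
            }
        }
    }
    None
}
```

A check like this fits naturally in a `debug_assert!` after insert or compaction paths, or in a test helper, so that release builds pay no cost.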