Matryoshka adaptive-dim querying.
For embedding models trained with Matryoshka Representation Learning (MRL), prefix-truncated vectors retain most of their semantic structure. We exploit this for memory-bandwidth-bound HNSW search: a coarse pass at low dimension → an exact rerank at full dimension.
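The two-stage flow can be sketched as follows. This is an illustrative sketch, not this crate's actual API: `dist_prefix`, `two_stage_search`, and the brute-force candidate scan (standing in for HNSW traversal) are all hypothetical names.

```rust
// Illustrative two-stage Matryoshka search. A brute-force scan stands in
// for the HNSW coarse pass; the names here are hypothetical, not this
// crate's API.

/// Squared L2 distance over the first `dim` components of each vector.
fn dist_prefix(a: &[f32], b: &[f32], dim: usize) -> f32 {
    a[..dim]
        .iter()
        .zip(&b[..dim])
        .map(|(x, y)| (x - y) * (x - y))
        .sum()
}

/// Stage 1: rank candidates by low-dim prefix distance, keep `k * oversample`.
/// Stage 2: rerank the survivors at full dimension and return the top `k`.
fn two_stage_search(
    query: &[f32],
    corpus: &[Vec<f32>],
    coarse_dim: usize,
    k: usize,
    oversample: usize,
) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..corpus.len()).collect();
    // Coarse pass: cheap prefix distances (touches only `coarse_dim` floats).
    ids.sort_by(|&i, &j| {
        dist_prefix(query, &corpus[i], coarse_dim)
            .total_cmp(&dist_prefix(query, &corpus[j], coarse_dim))
    });
    ids.truncate((k * oversample).min(ids.len()));
    // Exact rerank: full-dim distances on the small surviving candidate set.
    ids.sort_by(|&i, &j| {
        dist_prefix(query, &corpus[i], query.len())
            .total_cmp(&dist_prefix(query, &corpus[j], query.len()))
    });
    ids.truncate(k);
    ids
}

fn main() {
    let corpus = vec![
        vec![0.0, 0.0, 0.0, 0.0],
        vec![1.0, 0.0, 0.0, 0.0],
        vec![5.0, 5.0, 5.0, 5.0],
    ];
    let query = vec![0.9, 0.0, 0.0, 0.0];
    // Coarse pass at dim 2, rerank at full dim 4.
    let top = two_stage_search(&query, &corpus, 2, 1, 2);
    println!("{top:?}");
}
```

The oversample factor controls the recall/cost trade-off: a larger candidate pool survives the coarse pass, so fewer true neighbors are lost to prefix-distance error, at the cost of more full-dim reranking work.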
§Cache-efficiency
A 256-dim coarse pass reads 256 × 4 = 1 KiB per vector vs 1536 × 4 = 6 KiB for full-dim traversal — a 6× reduction in L1/L2 memory bandwidth during the dominant HNSW graph traversal phase.
§Supported models
- text-embedding-3 (OpenAI): [256, 512, 1024, 1536]
- Gemini Embedding: [256, 512, 768, 3072]
- Nomic Embed: [64, 128, 256, 512, 768]
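Prefix truncation itself is a slice followed by L2 renormalization, which is the common convention for comparing MRL embeddings at reduced dimension. The sketch below is hypothetical and illustrative; `truncate_renorm` is not this crate's `truncate`, and the renormalization step is an assumption about how downstream distance comparisons are kept consistent.

```rust
// Hypothetical sketch: prefix-truncate an MRL embedding to `dim`
// components and L2-renormalize, a common convention so that cosine /
// dot-product comparisons remain meaningful at the reduced dimension.
// Not this crate's actual `truncate` implementation.
fn truncate_renorm(v: &[f32], dim: usize) -> Vec<f32> {
    let mut out = v[..dim.min(v.len())].to_vec();
    let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut out {
            *x /= norm;
        }
    }
    out
}

fn main() {
    // Keep the first 2 of 3 components, then renormalize to unit length.
    let v = truncate_renorm(&[3.0, 4.0, 100.0], 2);
    println!("{v:?}");
}
```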
Structs§
- MatryoshkaSearchOptions - Options for a two-stage Matryoshka search.
- MatryoshkaSpec - Per-collection Matryoshka configuration.
Functions§
- matryoshka_search - Two-stage Matryoshka search.
- truncate - Stride-truncate a vector to its first `dim` components.