
Module matryoshka


Matryoshka adaptive-dim querying.

For embedding models trained with Matryoshka Representation Learning (MRL), prefix-truncated vectors retain semantic structure. This module exploits that property for memory-bandwidth-bound HNSW traversal: a coarse pass over low-dimensional prefixes, followed by an exact rerank at full dimension.
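The coarse-then-rerank idea can be sketched as follows. This is a minimal illustration, not the crate's `matryoshka_search` implementation: it brute-forces over a flat list instead of an HNSW graph, and all names (`two_stage_search`, `shortlist`) are hypothetical.

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Stage 1: score everything on the first `coarse_dim` components and
/// keep a shortlist. Stage 2: rerank the shortlist at full dimension.
fn two_stage_search(
    query: &[f32],
    vectors: &[Vec<f32>],
    coarse_dim: usize,
    shortlist: usize,
) -> Option<usize> {
    let mut ids: Vec<usize> = (0..vectors.len()).collect();
    // Coarse pass: only the prefix of each vector is read.
    ids.sort_by(|&a, &b| {
        let sa = dot(&query[..coarse_dim], &vectors[a][..coarse_dim]);
        let sb = dot(&query[..coarse_dim], &vectors[b][..coarse_dim]);
        sb.partial_cmp(&sa).unwrap()
    });
    ids.truncate(shortlist);
    // Exact rerank: full-dimension scores, but only over the shortlist.
    ids.into_iter().max_by(|&a, &b| {
        dot(query, &vectors[a])
            .partial_cmp(&dot(query, &vectors[b]))
            .unwrap()
    })
}

fn main() {
    let vectors = vec![
        vec![1.0, 0.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0, 0.0],
    ];
    let query = [1.0_f32, 0.0, 0.0, 0.0];
    let best = two_stage_search(&query, &vectors, 2, 2);
    println!("best = {:?}", best); // best = Some(0)
}
```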

§Cache-efficiency

A 256-dim coarse pass reads 256 × 4 = 1 KiB per vector vs 1536 × 4 = 6 KiB for full-dim traversal — a 6× reduction in L1/L2 memory bandwidth during the dominant HNSW graph-traversal phase.
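A back-of-envelope check of those figures, assuming f32 (4-byte) components:

```rust
const BYTES_PER_F32: usize = 4;

// Bytes read per vector at a given dimensionality.
fn bytes_per_vector(dim: usize) -> usize {
    dim * BYTES_PER_F32
}

fn main() {
    assert_eq!(bytes_per_vector(256), 1024);  // 1 KiB coarse read
    assert_eq!(bytes_per_vector(1536), 6144); // 6 KiB full-dim read
    println!("reduction: {}x", bytes_per_vector(1536) / bytes_per_vector(256));
    // prints "reduction: 6x"
}
```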

§Supported models

  • text-embedding-3 (OpenAI): [256, 512, 1024, 1536]
  • Gemini Embedding: [256, 512, 768, 3072]
  • Nomic Embed: [64, 128, 256, 512, 768]
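Given one of these dimension lists, a caller must pick a valid coarse dimension. A hypothetical helper (not part of this module's API) that selects the smallest supported prefix meeting a target, assuming the list is sorted ascending:

```rust
// Hypothetical: pick the smallest supported MRL prefix dim >= target.
fn pick_coarse_dim(supported: &[usize], target: usize) -> Option<usize> {
    supported.iter().copied().find(|&d| d >= target)
}

fn main() {
    // Nomic Embed dims from the list above.
    let nomic = [64, 128, 256, 512, 768];
    assert_eq!(pick_coarse_dim(&nomic, 200), Some(256));
    assert_eq!(pick_coarse_dim(&nomic, 1024), None); // target exceeds full dim
}
```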

Structs§

MatryoshkaSearchOptions
Options for a two-stage Matryoshka search.
MatryoshkaSpec
Per-collection Matryoshka configuration.

Functions§

matryoshka_search
Two-stage Matryoshka search.
truncate
Stride-truncate a vector to its first dim components.
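A plausible shape for `truncate`, assuming the common MRL convention of taking a prefix slice; the crate's actual signature may differ (e.g. it may copy or renormalize):

```rust
/// Prefix-truncate a vector to its first `dim` components. MRL-trained
/// prefixes retain semantic structure, so the slice is usable directly.
fn truncate(v: &[f32], dim: usize) -> &[f32] {
    &v[..dim.min(v.len())]
}

fn main() {
    let v = [0.1_f32, 0.2, 0.3, 0.4];
    assert_eq!(truncate(&v, 2), &[0.1, 0.2]);
    assert_eq!(truncate(&v, 8), &v[..]); // clamped, never panics
}
```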