
Module matryoshka


Matryoshka adaptive-dim querying.

For embedding models trained with Matryoshka Representation Learning (MRL), prefix-truncated vectors retain semantic structure. This module exploits that property for memory-bandwidth-bound HNSW traversal: a coarse pass over low-dimensional prefixes, followed by an exact rerank at full dimension.
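The coarse-then-rerank idea can be sketched as follows. This is a minimal illustration, not the crate's `matryoshka_search` implementation: it brute-forces over a flat list instead of an HNSW graph, and all names (`two_stage_search`, `shortlist`) are hypothetical.

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Stage 1: score everything on the first `coarse_dim` components and
/// keep a shortlist. Stage 2: rerank the shortlist at full dimension.
fn two_stage_search(
    query: &[f32],
    vectors: &[Vec<f32>],
    coarse_dim: usize,
    shortlist: usize,
) -> Option<usize> {
    let mut ids: Vec<usize> = (0..vectors.len()).collect();
    // Coarse pass: only the prefix of each vector is read.
    ids.sort_by(|&a, &b| {
        let sa = dot(&query[..coarse_dim], &vectors[a][..coarse_dim]);
        let sb = dot(&query[..coarse_dim], &vectors[b][..coarse_dim]);
        sb.partial_cmp(&sa).unwrap()
    });
    ids.truncate(shortlist);
    // Exact rerank: full-dimension scores, but only over the shortlist.
    ids.into_iter().max_by(|&a, &b| {
        dot(query, &vectors[a])
            .partial_cmp(&dot(query, &vectors[b]))
            .unwrap()
    })
}

fn main() {
    let vectors = vec![
        vec![1.0, 0.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0, 0.0],
    ];
    let query = [1.0_f32, 0.0, 0.0, 0.0];
    let best = two_stage_search(&query, &vectors, 2, 2);
    println!("best = {:?}", best); // best = Some(0)
}
```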

§Cache-efficiency

A 256-dim coarse pass reads 256 × 4 = 1 KiB per vector vs 1536 × 4 = 6 KiB for full-dim traversal — a 6× reduction in L1/L2 memory bandwidth during the dominant HNSW graph-traversal phase.
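A back-of-envelope check of those figures, assuming f32 (4-byte) components:

```rust
const BYTES_PER_F32: usize = 4;

// Bytes read per vector at a given dimensionality.
fn bytes_per_vector(dim: usize) -> usize {
    dim * BYTES_PER_F32
}

fn main() {
    assert_eq!(bytes_per_vector(256), 1024);  // 1 KiB coarse read
    assert_eq!(bytes_per_vector(1536), 6144); // 6 KiB full-dim read
    println!("reduction: {}x", bytes_per_vector(1536) / bytes_per_vector(256));
    // prints "reduction: 6x"
}
```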

§Supported models

  • text-embedding-3 (OpenAI): [256, 512, 1024, 1536]
  • Gemini Embedding: [256, 512, 768, 3072]
  • Nomic Embed: [64, 128, 256, 512, 768]
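Given one of these dimension lists, a caller must pick a valid coarse dimension. A hypothetical helper (not part of this module's API) that selects the smallest supported prefix meeting a target, assuming the list is sorted ascending:

```rust
// Hypothetical: pick the smallest supported MRL prefix dim >= target.
fn pick_coarse_dim(supported: &[usize], target: usize) -> Option<usize> {
    supported.iter().copied().find(|&d| d >= target)
}

fn main() {
    // Nomic Embed dims from the list above.
    let nomic = [64, 128, 256, 512, 768];
    assert_eq!(pick_coarse_dim(&nomic, 200), Some(256));
    assert_eq!(pick_coarse_dim(&nomic, 1024), None); // target exceeds full dim
}
```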

Structs§

MatryoshkaSearchOptions
Options for a two-stage Matryoshka search.
MatryoshkaSpec
Per-collection Matryoshka configuration.

Functions§

matryoshka_search
Two-stage Matryoshka search.
truncate
Stride-truncate a vector to its first dim components.
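A plausible shape for `truncate`, assuming the common MRL convention of taking a prefix slice; the crate's actual signature may differ (e.g. it may copy or renormalize):

```rust
/// Prefix-truncate a vector to its first `dim` components. MRL-trained
/// prefixes retain semantic structure, so the slice is usable directly.
fn truncate(v: &[f32], dim: usize) -> &[f32] {
    &v[..dim.min(v.len())]
}

fn main() {
    let v = [0.1_f32, 0.2, 0.3, 0.4];
    assert_eq!(truncate(&v, 2), &[0.1, 0.2]);
    assert_eq!(truncate(&v, 8), &v[..]); // clamped, never panics
}
```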