In-memory search index for real-time re-ranking.
Stores all chunk embeddings as a contiguous ndarray matrix so that
re-ranking is a single BLAS matrix-vector multiply via crate::similarity::rank_all.
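The idea of scoring every chunk with one matrix-vector product can be sketched in plain Rust. This is only an illustration under stated assumptions: `FlatMatrix` and `rank_rows` are hypothetical names, the storage is a row-major `Vec<f32>` rather than an ndarray matrix, and the loop stands in for the BLAS call; it is not the signature or implementation of `crate::similarity::rank_all`.

```rust
/// Hypothetical stand-in for the embedding matrix: `rows * dim` floats
/// stored contiguously in row-major order (one row per chunk).
pub struct FlatMatrix {
    data: Vec<f32>,
    dim: usize,
}

impl FlatMatrix {
    pub fn new(data: Vec<f32>, dim: usize) -> Self {
        assert!(dim > 0 && data.len() % dim == 0, "data must hold rows * dim values");
        Self { data, dim }
    }

    /// Score every row against `query` (a matrix-vector product written
    /// as a loop) and return (row index, score) pairs, best score first.
    pub fn rank_rows(&self, query: &[f32]) -> Vec<(usize, f32)> {
        assert_eq!(query.len(), self.dim, "query dimension mismatch");
        let mut scored: Vec<(usize, f32)> = self
            .data
            .chunks_exact(self.dim)
            .enumerate()
            .map(|(i, row)| {
                let score = row.iter().zip(query).map(|(a, b)| a * b).sum::<f32>();
                (i, score)
            })
            .collect();
        // Sort by descending score; total_cmp gives a total order on f32.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored
    }
}
```

Because each row is contiguous, the scan walks memory sequentially, which is the property a BLAS matrix-vector routine exploits as well.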
Optionally uses TurboQuant compression for fast approximate
scanning at monorepo scale (100K+ chunks). TurboQuant compresses each 768-dim
embedding from 3072 bytes (FP32) to ~386 bytes (4-bit), giving a ~5× faster
scan through sequential memory access and centroid-table lookups.
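The size figures can be checked with a small sketch: 768 FP32 values take 768 × 4 = 3072 bytes, while 768 4-bit codes pack into 384 bytes. The code below is a generic 4-bit scalar quantizer with a 16-entry centroid table, written only to illustrate the arithmetic and the lookup-based scoring. It is not the TurboQuant algorithm, and every function name in it is hypothetical. The gap between 384 and the quoted ~386 bytes is presumably small per-vector metadata, which is a guess and is not modeled here.

```rust
/// Hypothetical codebook: 16 centroids evenly spaced on [-1, 1].
pub fn uniform_centroids() -> [f32; 16] {
    let mut c = [0.0f32; 16];
    for (i, slot) in c.iter_mut().enumerate() {
        *slot = -1.0 + 2.0 * i as f32 / 15.0;
    }
    c
}

/// Map each value to the index (0..=15) of its nearest centroid.
pub fn quantize(v: &[f32], centroids: &[f32; 16]) -> Vec<u8> {
    v.iter()
        .map(|x| {
            let mut best = 0u8;
            let mut best_dist = f32::INFINITY;
            for (i, c) in centroids.iter().enumerate() {
                let dist = (x - c).abs();
                if dist < best_dist {
                    best_dist = dist;
                    best = i as u8;
                }
            }
            best
        })
        .collect()
}

/// Pack 4-bit codes two per byte, low nibble first.
pub fn pack_nibbles(codes: &[u8]) -> Vec<u8> {
    codes
        .chunks(2)
        .map(|pair| {
            let lo = pair[0] & 0x0F;
            let hi = pair.get(1).copied().unwrap_or(0) & 0x0F;
            lo | (hi << 4)
        })
        .collect()
}

/// Approximate dot product: look each code up in the centroid table
/// and multiply by the matching query component.
pub fn approx_dot(packed: &[u8], centroids: &[f32; 16], query: &[f32]) -> f32 {
    query
        .iter()
        .enumerate()
        .map(|(d, q)| {
            let byte = packed[d / 2];
            let code = if d % 2 == 0 { byte & 0x0F } else { byte >> 4 };
            centroids[code as usize] * q
        })
        .sum()
}
```

Scanning packed codes touches roughly an eighth of the memory of the FP32 matrix, which is consistent with the faster scan described above; the actual speedup depends on the real implementation.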
Structs§
- SearchIndex - Pre-computed embedding matrix for fast re-ranking.