frankensearch-index
FSVI vector index, SIMD dot product, and top-k search for frankensearch.
Overview
This crate implements the FSVI (FrankenSearch Vector Index) binary format for storing and searching dense vector embeddings. It provides memory-mapped I/O for zero-copy access, brute-force exact top-k search with SIMD acceleration, and an optional HNSW approximate nearest-neighbor index for large-scale deployments.
The FSVI format is designed for cache-line and SIMD friendliness, with 64-byte aligned vector slabs supporting both f32 and f16 quantization.
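As a rough illustration of what brute-force exact top-k search involves, the sketch below scores every row of a flat f32 slab against a query with a plain scalar dot product and keeps the best `k` in a bounded min-heap. It is not the crate's implementation: `Scored`, `dot`, and `top_k` are hypothetical names, and the real code uses SIMD kernels and memory-mapped data.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Hypothetical (score, row index) pair with a total order over f32 scores.
#[derive(Debug, Clone, Copy)]
struct Scored {
    score: f32,
    index: usize,
}

impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher score is "greater"; on ties the lower index is "greater".
        self.score
            .total_cmp(&other.score)
            .then_with(|| other.index.cmp(&self.index))
    }
}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Scored {}

/// Scalar dot product; stands in for a SIMD kernel.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Exact top-k over a row-major slab of `dim`-wide f32 vectors.
/// Returns (row index, score) pairs, best first.
fn top_k(slab: &[f32], dim: usize, query: &[f32], k: usize) -> Vec<(usize, f32)> {
    assert!(dim > 0 && query.len() == dim && slab.len() % dim == 0);
    if k == 0 {
        return Vec::new();
    }
    // Min-heap of the k best candidates seen so far.
    let mut heap: BinaryHeap<Reverse<Scored>> = BinaryHeap::with_capacity(k + 1);
    for (index, row) in slab.chunks_exact(dim).enumerate() {
        heap.push(Reverse(Scored { score: dot(row, query), index }));
        if heap.len() > k {
            heap.pop(); // evict the current worst candidate
        }
    }
    let mut hits: Vec<Scored> = heap.into_iter().map(|Reverse(s)| s).collect();
    hits.sort_by(|a, b| b.cmp(a)); // best first
    hits.into_iter().map(|s| (s.index, s.score)).collect()
}
```

The bounded heap keeps the scan at O(n log k) time and O(k) extra memory, which is why exact search stays practical for moderately sized indexes.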
FSVI File Layout
+-------------------------------------------+
| Header (variable length)                  |
|   magic: b"FSVI" (4 bytes)                |
|   version: u16 (2 bytes)                  |
|   embedder_id + revision (variable)       |
|   dimension: u32 (4 bytes)                |
|   quantization: u8 (1 byte)               |
|   record_count: u64 (8 bytes)             |
|   vectors_offset: u64 (8 bytes)           |
|   header_crc32: u32 (4 bytes)             |
+-------------------------------------------+
| Record Table (16 bytes per record)        |
+-------------------------------------------+
| String Table (UTF-8 doc IDs)              |
+-------------------------------------------+
| Padding (to 64-byte alignment)            |
+-------------------------------------------+
| Vector Slab (record_count x dim x elem)   |
+-------------------------------------------+
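The arithmetic implied by this layout can be sketched as follows. The 16-byte record size, the 64-byte alignment, and the `record_count x dim x elem` slab size come from the diagram; the function names, the local `Quantization` stand-in, and the assumption that the slab begins at the first 64-byte boundary after the string table are illustrative only and may not match how the crate computes `vectors_offset`.

```rust
/// Alignment of the vector slab, per the layout diagram.
const SLAB_ALIGN: u64 = 64;
/// Size of one record table entry, per the layout diagram.
const RECORD_SIZE: u64 = 16;

/// Local stand-in for the crate's element type enum.
#[derive(Debug, Clone, Copy)]
enum Quantization {
    F32,
    F16,
}

impl Quantization {
    /// Bytes per vector element.
    fn elem_size(self) -> u64 {
        match self {
            Quantization::F32 => 4,
            Quantization::F16 => 2,
        }
    }
}

/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Hypothetical helper: where the slab would start if it directly
/// follows header, record table, and string table, padded to 64 bytes.
fn vectors_offset(header_len: u64, record_count: u64, string_table_len: u64) -> u64 {
    let unaligned = header_len + RECORD_SIZE * record_count + string_table_len;
    align_up(unaligned, SLAB_ALIGN)
}

/// Slab size in bytes: record_count x dim x elem, or None on overflow.
fn slab_len(record_count: u64, dimension: u32, q: Quantization) -> Option<u64> {
    record_count
        .checked_mul(u64::from(dimension))?
        .checked_mul(q.elem_size())
}
```

For example, a 40-byte header, 3 records, and a 10-byte string table end at byte 98, so the slab would start at byte 128; with 384-dimensional f16 vectors it would occupy 3 x 384 x 2 = 2304 bytes.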
Key Types
VectorIndex - memory-mapped FSVI index reader with brute-force search
VectorIndexWriter - builder for writing new FSVI files
TwoTierIndex / TwoTierIndexBuilder - manages a fast + quality index pair for two-tier search
VectorMetadata - parsed header metadata (embedder ID, dimension, quantization, record count)
Quantization - element type enum (F32 or F16)
VacuumStats - statistics from tombstone compaction
SearchParams - search configuration (top-k, parallel thresholds)
ScalarQuantizer - f32-to-f16 quantization utilities
HnswIndex / HnswConfig - HNSW approximate nearest-neighbor index (feature: ann)
MrlConfig - Matryoshka Representation Learning configuration for progressive dimensionality
WalConfig / CompactionStats - write-ahead log for atomic index updates
WarmUpConfig / WarmUpStrategy - adaptive index warmup with heat map tracking
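To make "progressive dimensionality" concrete: with Matryoshka-style embeddings, a common approach is to keep only the first `target_dim` components of a vector and renormalize the prefix to unit length before comparing. The sketch below shows that general technique only. `truncate_and_normalize` is a hypothetical helper, and whether `MrlConfig` drives exactly this procedure is an assumption.

```rust
/// Keep the first `target_dim` components of `full` and rescale the
/// prefix to unit L2 norm. Returns None if `target_dim` is zero, exceeds
/// the vector length, or the prefix has zero norm.
/// Hypothetical helper, not part of the crate's API.
fn truncate_and_normalize(full: &[f32], target_dim: usize) -> Option<Vec<f32>> {
    if target_dim == 0 || target_dim > full.len() {
        return None;
    }
    let prefix = &full[..target_dim];
    let norm = prefix.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return None;
    }
    Some(prefix.iter().map(|x| x / norm).collect())
}
```

Truncating `[3.0, 4.0, 12.0]` to two dimensions, for instance, yields approximately `[0.6, 0.8]`. Shorter prefixes trade some accuracy for cheaper dot products, which is what makes a coarse first pass followed by a full-dimension rescoring attractive.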
SIMD Functions
dot_product_f16_f32 - SIMD dot product between f16 and f32 vectors
dot_product_f32_f32 - SIMD dot product between two f32 vectors
cosine_similarity_f16 - cosine similarity for f16 vectors
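For reference, the scalar computation that an f16-by-f32 dot product has to reproduce can be sketched with the standard library alone. The function names and signatures below are hypothetical and do not mirror the crate's API; stored vectors are modeled as raw IEEE 754 binary16 bit patterns (`u16`), and the crate's SIMD kernels may accumulate in a different order and so differ in the last bits.

```rust
/// Decode an IEEE 754 binary16 bit pattern into an f32.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = u32::from(bits >> 15) << 31;
    let exp = u32::from((bits >> 10) & 0x1f);
    let frac = u32::from(bits & 0x3ff);
    match (exp, frac) {
        // Signed zero.
        (0, 0) => f32::from_bits(sign),
        // Subnormal: value is frac * 2^-24.
        (0, _) => {
            let v = frac as f32 / 16_777_216.0;
            if sign != 0 { -v } else { v }
        }
        // Infinity.
        (0x1f, 0) => f32::from_bits(sign | 0x7f80_0000),
        // NaN (forced quiet, payload preserved).
        (0x1f, _) => f32::from_bits(sign | 0x7fc0_0000 | (frac << 13)),
        // Normal: rebias the exponent from 15 to 127.
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (frac << 13)),
    }
}

/// Scalar reference: dot product of an f16 stored vector and an f32 query.
fn dot_f16_f32_scalar(stored: &[u16], query: &[f32]) -> f32 {
    assert_eq!(stored.len(), query.len());
    stored
        .iter()
        .zip(query)
        .map(|(&h, &q)| f16_bits_to_f32(h) * q)
        .sum()
}

/// Scalar reference: cosine similarity of an f16 stored vector and an
/// f32 query. Returns 0.0 if either vector has zero norm.
fn cosine_f16_f32_scalar(stored: &[u16], query: &[f32]) -> f32 {
    let dot = dot_f16_f32_scalar(stored, query);
    let norm_s: f32 = stored
        .iter()
        .map(|&h| f16_bits_to_f32(h).powi(2))
        .sum::<f32>()
        .sqrt();
    let norm_q: f32 = query.iter().map(|q| q * q).sum::<f32>().sqrt();
    if norm_s == 0.0 || norm_q == 0.0 {
        0.0
    } else {
        dot / (norm_s * norm_q)
    }
}
```

A scalar version like this is mainly useful as a test oracle: a SIMD kernel can be checked against it within a small tolerance on random inputs.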
Usage
use frankensearch_index::{TwoTierIndex, VectorIndex};
use …::TwoTierConfig;
// Open a two-tier index (fast + quality)
let index = TwoTierIndex::open(…)
    .expect(…);
// Low-level: open a single FSVI file
let vi = VectorIndex::open(…)
    .expect(…);
println!(…);
Dependency Graph Position
frankensearch-core
    ^
    |
frankensearch-index
    ^
    |-- frankensearch-fusion
    |-- frankensearch-fsfs
    |-- frankensearch (root)
License
MIT