pub enum VectorStorage {
    FullPrecision {
        vectors: Vec<f32>,
        norms: Vec<f32>,
        count: usize,
        dimensions: usize,
    },
    BinaryQuantized {
        quantized: Vec<Vec<u8>>,
        original: Option<Vec<Vec<f32>>>,
        thresholds: Vec<f32>,
        dimensions: usize,
    },
    RaBitQQuantized {
        quantizer: Option<RaBitQ>,
        params: RaBitQParams,
        quantized_data: Vec<u8>,
        quantized_scales: Vec<f32>,
        code_size: usize,
        original: Vec<f32>,
        original_count: usize,
        dimensions: usize,
    },
    ScalarQuantized {
        params: ScalarParams,
        quantized: Vec<u8>,
        norms: Vec<f32>,
        sums: Vec<i32>,
        training_buffer: Vec<f32>,
        count: usize,
        dimensions: usize,
    },
}
Vector storage (quantized or full precision)
Variants
FullPrecision
Full precision f32 vectors - FLAT CONTIGUOUS STORAGE
Memory: dimensions * 4 bytes per vector + 4 bytes for the norm. Example: 1536D = 6148 bytes per vector.
Vectors are stored in a single contiguous array for cache efficiency. Access: vectors[id * dimensions..(id + 1) * dimensions]
Norms (||v||²) are stored separately for the L2 decomposition optimization: ||a-b||² = ||a||² + ||b||² - 2⟨a,b⟩. This reduces L2 distance from 3N FLOPs to 2N+3 FLOPs (~7% faster).
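A minimal scalar sketch of this decomposition (illustrative only; the crate's real path uses SIMD, see distance_l2_decomposed below):

fn l2_decomposed(query: &[f32], query_norm: f32, vector: &[f32], vector_norm: f32) -> f32 {
    // ||q||² and ||v||² are precomputed; only the dot product (~2N FLOPs) runs per candidate.
    let dot: f32 = query.iter().zip(vector).map(|(q, v)| q * v).sum();
    query_norm + vector_norm - 2.0 * dot
}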
Fields
BinaryQuantized
Binary quantized vectors
Memory: dimensions / 8 bytes per vector (1 bit per dimension). Example: 1536D = 192 bytes per vector (32x compression).
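A minimal sketch of the 1-bit encoding against per-dimension thresholds (the thresholds field, trained as medians by train_quantization). Pairing this with Hamming distance via XOR + popcount is an assumption; the distance metric is not stated on this page:

fn binarize(vector: &[f32], thresholds: &[f32]) -> Vec<u8> {
    // One bit per dimension, packed 8 dimensions per byte.
    let mut bits = vec![0u8; (vector.len() + 7) / 8];
    for (d, (&v, &t)) in vector.iter().zip(thresholds).enumerate() {
        if v > t {
            bits[d / 8] |= 1 << (d % 8);
        }
    }
    bits
}

fn hamming(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}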
Fields
RaBitQQuantized
RaBitQ quantized vectors for asymmetric search (CLOUD MOAT)
Memory: dimensions * bits / 8 bytes per vector (4-bit = 8x compression). Example: 1536D @ 4-bit = 768 bytes per vector.
Key optimization: during search, the query stays full precision while candidates use the quantized representation. This gives 2-3x throughput by avoiding decompression while maintaining accuracy.
Reranking with original vectors restores recall to near full-precision.
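A hedged sketch of that two-pass flow using methods documented below (distance_asymmetric_l2 and get); the oversampling factor and rerank policy are illustrative, not the crate's actual search code:

fn search_and_rerank(storage: &VectorStorage, query: &[f32], candidates: &[u32], k: usize) -> Vec<(u32, f32)> {
    // Pass 1: cheap asymmetric distances against the quantized codes.
    let mut scored: Vec<(u32, f32)> = candidates
        .iter()
        .filter_map(|&id| storage.distance_asymmetric_l2(query, id).map(|d| (id, d)))
        .collect();
    scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    scored.truncate(k * 4); // oversample before reranking (factor is illustrative)

    // Pass 2: rerank survivors against the stored originals (see get).
    for (id, d) in scored.iter_mut() {
        if let Some(orig) = storage.get(*id) {
            *d = query.iter().zip(orig).map(|(q, v)| (q - v) * (q - v)).sum();
        }
    }
    scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    scored.truncate(k);
    scored
}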
Fields
params: RaBitQParams
RaBitQ parameters (for serialization)
quantized_data: Vec<u8>
Quantized codes in a flat contiguous array for cache efficiency. Access: quantized_data[id * code_size..(id + 1) * code_size]
quantized_scales: Vec<f32>
Per-vector rescaling factors, contiguous for cache efficiency. Access: quantized_scales[id]
code_size: usize
Bytes per quantized vector (computed from dimensions and bits). For 4-bit: code_size = dimensions / 2
ScalarQuantized
Scalar quantized vectors (SQ8) - 4x compression, ~97% recall, 2-3x faster
Memory: 1x (quantized only, no originals stored). Trade-off: 4x RAM savings for ~3% recall loss.
Uses uniform min/max scaling with integer SIMD distance computation. Lazy training: buffers the first 256 vectors, then trains and quantizes.
Note: No rescore support - originals not stored to save memory.
Use RaBitQ if you need rescore with originals on disk.
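A sketch of uniform min/max scalar quantization with a global scale/offset, matching the dequantization formula documented on the norms field below (dequant(code) = code * scale + offset). The function and parameter names are illustrative assumptions:

fn sq8_train(samples: &[f32]) -> (f32, f32) {
    let min = samples.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = samples.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // Guard the degenerate all-equal case so quantization never divides by zero.
    let scale = ((max - min) / 255.0).max(f32::EPSILON);
    (scale, min) // (scale, offset)
}

fn sq8_quantize(x: f32, scale: f32, offset: f32) -> u8 {
    (((x - offset) / scale).round().clamp(0.0, 255.0)) as u8
}

fn sq8_dequantize(code: u8, scale: f32, offset: f32) -> f32 {
    code as f32 * scale + offset
}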
Fields
params: ScalarParams
Trained quantization parameters (global scale/offset)
quantized: Vec<u8>
Quantized vectors as a flat contiguous u8 array. Empty until training completes (after 256 vectors). Access: quantized[id * dimensions..(id + 1) * dimensions]
norms: Vec<f32>
Pre-computed squared norms of the dequantized vectors for L2 decomposition: ||dequant(q)||² = Σ(code[d] * scale + offset)². Enables the fast distance ||a-b||² = ||a||² + ||b||² - 2⟨a,b⟩
sums: Vec<i32>
Pre-computed sums of quantized values for fast integer dot products: sum = Σ quantized[d]
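A worked expansion shows what the sums buy, assuming the dequantization dequant(c)[d] = code[d] · scale + offset documented above. For two quantized vectors a and b of dimension D:
⟨dequant(a), dequant(b)⟩ = scale² · Σ a[d]·b[d] + scale · offset · (Σ a[d] + Σ b[d]) + D · offset²
The first term is a pure integer dot product over u8 codes (SIMD-friendly), and the stored sums supply Σ a[d] and Σ b[d] without revisiting the codes. That this symmetric case (e.g. neighbor-to-neighbor distances during graph construction) is the intended use is an inference from the field docs, not stated explicitly here.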
Implementations
impl VectorStorage
pub fn new_full_precision(dimensions: usize) -> Self
Create empty full precision storage
pub fn new_binary_quantized(dimensions: usize, keep_original: bool) -> Self
Create empty binary quantized storage
pub fn new_rabitq_quantized(dimensions: usize, params: RaBitQParams) -> Self
Create empty RaBitQ quantized storage for asymmetric search (CLOUD MOAT)
Arguments
- dimensions - Vector dimensionality
- params - RaBitQ quantization parameters (typically 4-bit for 8x compression)
Performance
- Search: 2-3x faster than full precision (asymmetric distance)
- Memory: 8x smaller storage (4-bit quantization)
- Recall: 98%+ with reranking
pub fn new_sq8_quantized(dimensions: usize) -> Self
Create empty SQ8 (Scalar Quantized) storage
Arguments
- dimensions - Vector dimensionality
Performance
- Search: 2-3x faster than f32 (integer SIMD)
- Memory: 4x smaller (quantized only, no originals)
- Recall: ~97% (no rescore support)
Lazy Training
Quantization parameters are trained automatically after 256 vectors. Before training completes, search falls back to f32 distance on the training buffer.
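A hedged usage sketch of the four constructors plus insert and get; RaBitQParams construction is not shown on this page, so it is left as a comment:

fn build_stores() -> Result<(), String> {
    let mut full = VectorStorage::new_full_precision(1536);
    let id = full.insert(vec![0.0f32; 1536])?; // insert returns the new vector's id
    assert!(full.get(id).is_some()); // zero-copy slice into contiguous storage

    let _binary = VectorStorage::new_binary_quantized(1536, false); // keep_original = false
    let _sq8 = VectorStorage::new_sq8_quantized(1536); // trains lazily after 256 inserts
    // let _rabitq = VectorStorage::new_rabitq_quantized(1536, params); // needs RaBitQParams
    Ok(())
}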
pub fn is_asymmetric(&self) -> bool
Check if this storage uses asymmetric search (RaBitQ and SQ8)
Both RaBitQ and SQ8 use direct asymmetric L2 distance for search. This gives ~99.9% recall on SIFT-50K.
The mono path with L2 decomposition has 10% recall regression due to floating point ordering differences during HNSW graph traversal. Even increasing ef doesn’t recover the missing candidates.
pub fn is_binary_quantized(&self) -> bool
Check if this storage uses binary quantization
pub fn dimensions(&self) -> usize
Get dimensions
pub fn insert(&mut self, vector: Vec<f32>) -> Result<u32, String>
Insert a full precision vector
pub fn get(&self, id: u32) -> Option<&[f32]>
Get a vector by ID (full precision)
Returns a slice directly into the contiguous storage - zero-copy, cache-friendly.
For RaBitQQuantized, returns the original vector (used for reranking).
pub fn get_dequantized(&self, id: u32) -> Option<Vec<f32>>
Get a vector by ID, dequantizing if necessary (returns owned Vec)
For full precision storage, clones the slice. For quantized storage (SQ8), dequantizes the quantized bytes to f32. Used for neighbor-to-neighbor distance calculations during graph construction.
pub fn distance_asymmetric_l2(&self, query: &[f32], id: u32) -> Option<f32>
Compute asymmetric L2 distance (query full precision, candidate quantized)
This is the HOT PATH for asymmetric search. Works with RaBitQQuantized and
ScalarQuantized storage. Returns None if storage is not quantized, not trained,
or if id is out of bounds.
Performance (Apple Silicon M3 Max, 768D)
- SQ8: Similar speed to full precision (1.07x)
- RaBitQ: ~0.5x speed (ADC + interleaving overhead)
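A sketch of honoring the documented None contract: try the asymmetric hot path first, then fall back to an exact f32 distance via get_dequantized (the fallback policy itself is illustrative, not prescribed by this API):

fn distance(storage: &VectorStorage, query: &[f32], id: u32) -> Option<f32> {
    if let Some(d) = storage.distance_asymmetric_l2(query, id) {
        return Some(d); // hot path: query stays f32, candidate stays quantized
    }
    // Fallback for untrained or non-quantized storage: exact L2².
    let v = storage.get_dequantized(id)?;
    Some(query.iter().zip(&v).map(|(q, x)| (q - x) * (q - x)).sum())
}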
pub fn get_norm(&self, id: u32) -> Option<f32>
Get the pre-computed squared norm (||v||²) for a vector
Only available for FullPrecision storage. Used for L2 decomposition optimization.
pub fn supports_l2_decomposition(&self) -> bool
Check if L2 decomposition is available for this storage
Returns true for:
- FullPrecision storage (always has pre-computed norms)
- ScalarQuantized storage when trained (uses multiversion dot_product)
The decomposition path uses dot_product with #[multiversion], which provides better cross-compilation compatibility than raw NEON intrinsics.
pub fn distance_l2_decomposed(&self, query: &[f32], query_norm: f32, id: u32) -> Option<f32>
Compute L2 squared distance using decomposition: ||a-b||² = ||a||² + ||b||² - 2⟨a,b⟩
This is ~7-15% faster than direct L2/asymmetric computation because:
- Vector norms are pre-computed during insert
- Query norm is computed once per search (passed in)
- Only dot product is computed per-vector (2N FLOPs vs 3N)
Works for both FullPrecision and trained ScalarQuantized storage. Returns None if decomposition is not available.
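A usage sketch: the query norm is computed once per search and reused for every candidate, which is where the per-vector savings come from. Guarding with supports_l2_decomposition is assumed to be the caller's job:

fn scan_decomposed(storage: &VectorStorage, query: &[f32], ids: &[u32]) -> Vec<(u32, f32)> {
    let query_norm: f32 = query.iter().map(|q| q * q).sum(); // ||q||², once per search
    if !storage.supports_l2_decomposition() {
        return Vec::new(); // a real caller would pick another distance path here
    }
    ids.iter()
        .filter_map(|&id| storage.distance_l2_decomposed(query, query_norm, id).map(|d| (id, d)))
        .collect()
}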
pub fn get_quantized(&self, id: u32) -> Option<QuantizedVector>
Get the quantized vector for a given ID (reconstructed from flat storage)
Note: Returns an owned QuantizedVector reconstructed from flat storage.
Prefer using distance_adc or distance_asymmetric_l2 for distance computation.
pub fn quantizer(&self) -> Option<&RaBitQ>
Get the RaBitQ quantizer (for external asymmetric distance computation)
pub fn build_adc_table(&self, query: &[f32]) -> Option<UnifiedADC>
Build ADC lookup table for a query
Only used for RaBitQ (4-bit); SQ8 uses asymmetric SIMD instead. SQ8 does not use ADC tables because:
- A 768D ADC table is 768KB, which doesn't fit in L1 cache
- ADC's scattered memory access pattern (d×256+code stride) causes cache misses
- Direct asymmetric SIMD is pure compute (dequantize + L2), and Apple Silicon's high SIMD throughput makes it ~10x faster
Returns None for full-precision, SQ8, or not-yet-trained storage.
pub fn distance_adc(&self, adc: &UnifiedADC, id: u32) -> Option<f32>
Compute distance using precomputed ADC table
Note: SQ8 uses integer SIMD distance via distance_asymmetric_l2 instead of ADC.
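A sketch of the ADC flow: build the lookup table once per query, then score candidates with it, bailing out when the table is unavailable (full-precision, SQ8, or untrained storage, per the docs above):

fn adc_scan(storage: &VectorStorage, query: &[f32], ids: &[u32]) -> Option<Vec<(u32, f32)>> {
    let adc = storage.build_adc_table(query)?; // built once per query
    Some(
        ids.iter()
            .filter_map(|&id| storage.distance_adc(&adc, id).map(|d| (id, d)))
            .collect(),
    )
}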
pub fn prefetch(&self, id: u32)
Prefetch a vector’s data into CPU cache (for HNSW search optimization)
This hints to the CPU to load the vector data into cache before it’s needed. Call this on neighbor[j+1] while computing distance to neighbor[j]. ~10% search speedup per hnswlib benchmarks.
NOTE: This obtains the pointer without loading the data, so the prefetch hint can be issued before the data is needed.
This is a simple single-cache-line prefetch (64 bytes) into L1; the hardware prefetcher handles subsequent cache lines.
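A sketch of the documented pattern: hint neighbor j+1 into cache while computing the distance to neighbor j, overlapping memory latency with compute:

fn scan_with_prefetch(storage: &VectorStorage, query: &[f32], neighbors: &[u32]) -> Vec<Option<f32>> {
    let mut out = Vec::with_capacity(neighbors.len());
    for j in 0..neighbors.len() {
        if j + 1 < neighbors.len() {
            storage.prefetch(neighbors[j + 1]); // hint the next neighbor's data
        }
        out.push(storage.distance_asymmetric_l2(query, neighbors[j]));
    }
    out
}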
pub fn prefetch_quantized(&self, id: u32)
Prefetch quantized vector data for asymmetric search
More efficient than prefetch() for RaBitQ mode as it only fetches
the quantized representation, not the full precision original.
pub fn rabitq_code_size(&self) -> Option<usize>
Get RaBitQ code_size (bytes per quantized vector)
Returns None if not using RaBitQ quantization.
pub fn get_rabitq_code(&self, id: u32) -> Option<&[u8]>
Get quantized code for a vector (RaBitQ only)
Returns a slice of the quantized code bytes for the given vector ID. Returns None if the vector doesn't exist or the storage is not using RaBitQ.
pub fn build_interleaved_codes(&self, neighbors: &[u32], output: &mut [u8]) -> usize
Build interleaved codes for FastScan from a batch of neighbor IDs
For 32 neighbors with code_size bytes each, produces:
- 32 bytes for sub-quantizer 0 (one byte from each neighbor)
- 32 bytes for sub-quantizer 1
- … etc
Total output size: code_size * 32 bytes
Arguments
- neighbors - Up to 32 neighbor IDs to interleave
- output - Pre-allocated buffer of size code_size * 32
Returns the number of valid neighbors (the rest are zero-padded).
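An illustrative model of the interleaved layout described above (not this method's implementation, which reads codes from flat storage by ID): byte sq of neighbor n lands at output[sq * 32 + n], so FastScan can load the same sub-quantizer byte of all 32 neighbors with one contiguous read. Each slice in codes is assumed to hold code_size bytes:

fn interleave(codes: &[&[u8]], code_size: usize, output: &mut [u8]) -> usize {
    assert_eq!(output.len(), code_size * 32);
    output.fill(0); // zero-pad slots for missing neighbors
    let n_valid = codes.len().min(32);
    for (n, code) in codes.iter().take(n_valid).enumerate() {
        for sq in 0..code_size {
            output[sq * 32 + n] = code[sq];
        }
    }
    n_valid
}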
pub fn train_quantization(&mut self, sample_vectors: &[Vec<f32>]) -> Result<(), String>
Compute quantization thresholds from sample vectors
Uses the median of each dimension as the threshold.
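An illustrative sketch of the per-dimension median rule (the real method mutates the enum's own thresholds rather than returning them; assumes non-empty samples of equal dimensionality):

fn median_thresholds(samples: &[Vec<f32>], dims: usize) -> Vec<f32> {
    (0..dims)
        .map(|d| {
            let mut col: Vec<f32> = samples.iter().map(|v| v[d]).collect();
            col.sort_by(|a, b| a.partial_cmp(b).unwrap());
            col[col.len() / 2] // median (upper median for even counts)
        })
        .collect()
}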
pub fn memory_usage(&self) -> usize
Get memory usage in bytes (approximate)
Trait Implementations
impl Clone for VectorStorage
fn clone(&self) -> VectorStorage
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for VectorStorage
impl<'de> Deserialize<'de> for VectorStorage
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error> where __D: Deserializer<'de>
Auto Trait Implementations
impl Freeze for VectorStorage
impl RefUnwindSafe for VectorStorage
impl Send for VectorStorage
impl Sync for VectorStorage
impl Unpin for VectorStorage
impl UnwindSafe for VectorStorage
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
impl<T> CloneToUninit for T where T: Clone
impl<T> Downcast for T where T: Any
impl<T> DowncastSend for T
impl<T> DowncastSync for T
impl<T> Instrument for T
impl<T> IntoEither for T