VectorStorage

Enum VectorStorage 

Source
pub enum VectorStorage {
    FullPrecision {
        vectors: Vec<f32>,
        norms: Vec<f32>,
        count: usize,
        dimensions: usize,
    },
    BinaryQuantized {
        quantized: Vec<Vec<u8>>,
        original: Option<Vec<Vec<f32>>>,
        thresholds: Vec<f32>,
        dimensions: usize,
    },
    RaBitQQuantized {
        quantizer: Option<RaBitQ>,
        params: RaBitQParams,
        quantized_data: Vec<u8>,
        quantized_scales: Vec<f32>,
        code_size: usize,
        original: Vec<f32>,
        original_count: usize,
        dimensions: usize,
    },
    ScalarQuantized {
        params: ScalarParams,
        quantized: Vec<u8>,
        norms: Vec<f32>,
        sums: Vec<i32>,
        training_buffer: Vec<f32>,
        count: usize,
        dimensions: usize,
        trained: bool,
    },
}

Vector storage (quantized or full precision)

Variants§

FullPrecision

Full precision f32 vectors - FLAT CONTIGUOUS STORAGE

Memory: dimensions * 4 bytes per vector + 4 bytes for the norm. Example: 1536D = 6148 bytes per vector.

Vectors are stored in a single contiguous array for cache efficiency. Access: vectors[id * dimensions..(id + 1) * dimensions]

Norms (||v||²) are stored separately for the L2 decomposition optimization: ||a-b||² = ||a||² + ||b||² - 2⟨a,b⟩. This reduces L2 distance from 3N FLOPs to 2N+3 FLOPs (~7% faster).
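As a sketch, the flat layout plus the decomposition can be written in plain Rust (illustrative free function, not this crate's API):

```rust
// Hypothetical sketch of the flat-storage L2 decomposition.
// `vectors` holds all vectors concatenated; `norms[id]` is the
// pre-computed squared norm of vector `id`.
fn l2_decomposed(query: &[f32], query_norm: f32, vectors: &[f32],
                 norms: &[f32], id: usize, dims: usize) -> f32 {
    // Zero-copy slice into the contiguous storage.
    let v = &vectors[id * dims..(id + 1) * dims];
    let dot: f32 = query.iter().zip(v).map(|(a, b)| a * b).sum();
    // ||a-b||^2 = ||a||^2 + ||b||^2 - 2<a,b>
    query_norm + norms[id] - 2.0 * dot
}
```

The query norm is computed once per search, so only the dot product (2N FLOPs) is per-vector work.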

Fields

§vectors: Vec<f32>

Flat contiguous vector data (all vectors concatenated)

§norms: Vec<f32>

Pre-computed squared norms (||v||²) for L2 decomposition

§count: usize

Number of vectors stored

§dimensions: usize

Dimensions per vector

BinaryQuantized

Binary quantized vectors

Memory: dimensions / 8 bytes per vector (1 bit per dimension). Example: 1536D = 192 bytes per vector (32x compression).
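A minimal sketch of the packing, assuming per-dimension thresholds as in the `thresholds` field (illustrative, not the crate's implementation; MSB-first bit order is an assumption):

```rust
// Hypothetical sketch: 1 bit per dimension, packed MSB-first into bytes.
// A dimension's bit is set when its value exceeds that dimension's threshold.
fn binary_quantize(v: &[f32], thresholds: &[f32]) -> Vec<u8> {
    let mut out = vec![0u8; (v.len() + 7) / 8];
    for (d, (&x, &t)) in v.iter().zip(thresholds).enumerate() {
        if x > t {
            out[d / 8] |= 1u8 << (7 - (d % 8));
        }
    }
    out
}
```

With 1536 dimensions this yields the 192-byte codes quoted above.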

Fields

§quantized: Vec<Vec<u8>>

Quantized vectors (1 bit per dimension, packed into bytes)

§original: Option<Vec<Vec<f32>>>

Original vectors for reranking (optional)

If present, memory = quantized + original. If absent, search is faster but recall is lower.

§thresholds: Vec<f32>

Quantization thresholds (one per dimension)

§dimensions: usize

Vector dimensions

RaBitQQuantized

RaBitQ quantized vectors for asymmetric search (CLOUD MOAT)

Memory: dimensions * bits / 8 bytes per vector (4-bit = 8x compression). Example: 1536D @ 4-bit = 768 bytes per vector.

Key optimization: during search, the query stays full precision while candidates use their quantized representation. This yields 2-3x throughput by avoiding decompression while maintaining accuracy.

Reranking with original vectors restores recall to near full-precision.
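A hedged sketch of the asymmetric idea with 4-bit codes and a per-vector scale (loosely mirroring `quantized_data` / `quantized_scales`; the real RaBitQ transform is richer, and the nibble layout and reconstruction grid here are assumptions):

```rust
// Hypothetical sketch of asymmetric L2: the query stays f32, each candidate
// is a 4-bit code per dimension (two per byte, low nibble first) plus a
// per-vector rescaling factor. Codes are dequantized on the fly.
fn asymmetric_l2(query: &[f32], code: &[u8], scale: f32) -> f32 {
    let mut dist = 0.0f32;
    for d in 0..query.len() {
        let nibble = if d % 2 == 0 { code[d / 2] & 0x0F } else { code[d / 2] >> 4 };
        // Illustrative uniform grid centered at zero: levels -7.5..=7.5.
        let approx = (nibble as f32 - 7.5) * scale;
        let diff = query[d] - approx;
        dist += diff * diff;
    }
    dist
}
```

No decompressed buffer is materialized, which is where the throughput win comes from.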

Fields

§quantizer: Option<RaBitQ>

RaBitQ quantizer (contains params)

§params: RaBitQParams

RaBitQ parameters (for serialization)

§quantized_data: Vec<u8>

Quantized codes - flat contiguous array for cache efficiency. Access: quantized_data[id * code_size..(id + 1) * code_size]

§quantized_scales: Vec<f32>

Per-vector rescaling factors - contiguous for cache efficiency. Access: quantized_scales[id]

§code_size: usize

Bytes per quantized vector (computed from dimensions and bits). For 4-bit: code_size = dimensions / 2

§original: Vec<f32>

Original vectors for reranking (required for final accuracy). Stored as a flat contiguous array for cache efficiency.

§original_count: usize

Number of original vectors stored

§dimensions: usize

Vector dimensions

ScalarQuantized

Scalar quantized vectors (SQ8) - 4x compression, ~97% recall, 2-3x faster

Memory: 1x (quantized only, no originals stored). Trade-off: 4x RAM savings for ~3% recall loss.

Uses uniform min/max scaling with integer SIMD distance computation. Lazy training: buffers the first 256 vectors, then trains and quantizes.

Note: no rescore support - originals are not stored, to save memory. Use RaBitQ if you need rescoring with originals on disk.
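A minimal SQ8 sketch under an assumed global scale/offset layout for the trained parameters (the crate's `ScalarParams` is not shown here), illustrating how the integer-domain dot product folds back into the L2 decomposition:

```rust
// Hypothetical stand-in for ScalarParams: dequant(c) = c * scale + offset.
struct Sq8 { scale: f32, offset: f32 }

// Uniform min/max scalar quantization to one byte per dimension.
fn quantize(v: &[f32], p: &Sq8) -> Vec<u8> {
    v.iter()
        .map(|&x| ((x - p.offset) / p.scale).round().clamp(0.0, 255.0) as u8)
        .collect()
}

// ||q - dequant(c)||^2 via decomposition; `code_norm` is the pre-computed
// squared norm of the dequantized vector (as in the `norms` field).
fn decomposed_l2(query: &[f32], query_norm: f32, code: &[u8],
                 code_norm: f32, p: &Sq8) -> f32 {
    // <q, dequant(c)> = scale * sum(q_d * c_d) + offset * sum(q_d)
    let int_dot: f32 = query.iter().zip(code).map(|(&q, &c)| q * c as f32).sum();
    let q_sum: f32 = query.iter().sum();
    let dot = p.scale * int_dot + p.offset * q_sum;
    query_norm + code_norm - 2.0 * dot
}
```

The inner sum over `q_d * c_d` is the part the real implementation runs with integer SIMD.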

Fields

§params: ScalarParams

Trained quantization parameters (global scale/offset)

§quantized: Vec<u8>

Quantized vectors as a flat contiguous u8 array. Empty until training completes (after 256 vectors). Access: quantized[id * dimensions..(id + 1) * dimensions]

§norms: Vec<f32>

Pre-computed squared norms of the dequantized vectors for L2 decomposition: ||dequant(q)||² = Σ(code[d] * scale + offset)². Enables the fast distance ||a-b||² = ||a||² + ||b||² - 2⟨a,b⟩.

§sums: Vec<i32>

Pre-computed sums of quantized values for the fast integer dot product: sum = Σ quantized[d]

§training_buffer: Vec<f32>

Buffer for training vectors (cleared after training). During the training phase, stores f32 vectors until enough have accumulated to train.

§count: usize

Number of vectors stored

§dimensions: usize

Vector dimensions

§trained: bool

Whether the quantization parameters have been trained. Training happens automatically after 256 vectors are inserted.

Implementations§

Source§

impl VectorStorage

Source

pub fn new_full_precision(dimensions: usize) -> Self

Create empty full precision storage

Source

pub fn new_binary_quantized(dimensions: usize, keep_original: bool) -> Self

Create empty binary quantized storage

Source

pub fn new_rabitq_quantized(dimensions: usize, params: RaBitQParams) -> Self

Create empty RaBitQ quantized storage for asymmetric search (CLOUD MOAT)

§Arguments
  • dimensions - Vector dimensionality
  • params - RaBitQ quantization parameters (typically 4-bit for 8x compression)
§Performance
  • Search: 2-3x faster than full precision (asymmetric distance)
  • Memory: 8x smaller storage (4-bit quantization)
  • Recall: 98%+ with reranking
Source

pub fn new_sq8_quantized(dimensions: usize) -> Self

Create empty SQ8 (Scalar Quantized) storage

§Arguments
  • dimensions - Vector dimensionality
§Performance
  • Search: 2-3x faster than f32 (integer SIMD)
  • Memory: 4x smaller (quantized only, no originals)
  • Recall: ~97% (no rescore support)
§Lazy Training

Quantization parameters are trained automatically after 256 vectors. Before training completes, search falls back to f32 distance on the training buffer.

Source

pub fn is_asymmetric(&self) -> bool

Check if this storage uses asymmetric search (RaBitQ and SQ8)

Both RaBitQ and SQ8 use direct asymmetric L2 distance for search. This gives ~99.9% recall on SIFT-50K.

The mono path with L2 decomposition has a ~10% recall regression due to floating-point ordering differences during HNSW graph traversal; even increasing ef doesn't recover the missing candidates.

Source

pub fn is_binary_quantized(&self) -> bool

Check if this storage uses binary quantization

Source

pub fn is_sq8(&self) -> bool

Check if this storage uses SQ8 quantization

Source

pub fn len(&self) -> usize

Get number of vectors stored

Source

pub fn is_empty(&self) -> bool

Check if empty

Source

pub fn dimensions(&self) -> usize

Get dimensions

Source

pub fn insert(&mut self, vector: Vec<f32>) -> Result<u32, String>

Insert a full precision vector

Source

pub fn get(&self, id: u32) -> Option<&[f32]>

Get a vector by ID (full precision)

Returns slice directly into contiguous storage - zero-copy, cache-friendly. For RaBitQQuantized, returns the original vector (used for reranking).

Source

pub fn get_dequantized(&self, id: u32) -> Option<Vec<f32>>

Get a vector by ID, dequantizing if necessary (returns owned Vec)

For full precision storage, clones the slice. For quantized storage (SQ8), dequantizes the quantized bytes to f32. Used for neighbor-to-neighbor distance calculations during graph construction.

Source

pub fn distance_asymmetric_l2(&self, query: &[f32], id: u32) -> Option<f32>

Compute asymmetric L2 distance (query full precision, candidate quantized)

This is the HOT PATH for asymmetric search. Works with RaBitQQuantized and ScalarQuantized storage. Returns None if storage is not quantized, not trained, or if id is out of bounds.

§Performance (Apple Silicon M3 Max, 768D)
  • SQ8: Similar speed to full precision (1.07x)
  • RaBitQ: ~0.5x speed (ADC + interleaving overhead)
Source

pub fn get_norm(&self, id: u32) -> Option<f32>

Get the pre-computed squared norm (||v||²) for a vector

Only available for FullPrecision storage. Used for L2 decomposition optimization.

Source

pub fn supports_l2_decomposition(&self) -> bool

Check if L2 decomposition is available for this storage

Returns true for:

  • FullPrecision storage (always has pre-computed norms)
  • ScalarQuantized storage when trained (uses multiversion dot_product)

The decomposition path uses dot_product with #[multiversion] which provides better cross-compilation compatibility than raw NEON intrinsics.

Source

pub fn distance_l2_decomposed( &self, query: &[f32], query_norm: f32, id: u32, ) -> Option<f32>

Compute L2 squared distance using decomposition: ||a-b||² = ||a||² + ||b||² - 2⟨a,b⟩

This is ~7-15% faster than direct L2/asymmetric computation because:

  • Vector norms are pre-computed during insert
  • Query norm is computed once per search (passed in)
  • Only dot product is computed per-vector (2N FLOPs vs 3N)

Works for both FullPrecision and trained ScalarQuantized storage. Returns None if decomposition is not available.

Source

pub fn get_quantized(&self, id: u32) -> Option<QuantizedVector>

Get the quantized vector for a given ID (reconstructed from flat storage)

Note: Returns an owned QuantizedVector reconstructed from flat storage. Prefer using distance_adc or distance_asymmetric_l2 for distance computation.

Source

pub fn quantizer(&self) -> Option<&RaBitQ>

Get the RaBitQ quantizer (for external asymmetric distance computation)

Source

pub fn build_adc_table(&self, query: &[f32]) -> Option<UnifiedADC>

Build ADC lookup table for a query

Only used for RaBitQ (4-bit); SQ8 uses asymmetric SIMD instead. SQ8 does NOT use ADC tables because:

  • A 768D ADC table is 768KB and does not fit in L1 cache
  • The scattered memory access pattern (d×256+code stride) causes cache misses
  • Direct asymmetric SIMD is pure compute (dequantize + L2), which Apple Silicon's high SIMD throughput makes ~10x faster

Returns None for full-precision storage, SQ8, or untrained quantization.
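For intuition, a sketch of what a 4-bit ADC table looks like (illustrative uniform reconstruction levels and nibble layout, not RaBitQ's actual codebook):

```rust
// Hypothetical 4-bit ADC: for each dimension, precompute the squared
// difference between the query coordinate and all 16 reconstruction levels.
// Scanning a candidate's code then reduces to pure table lookups.
fn build_adc(query: &[f32], scale: f32) -> Vec<[f32; 16]> {
    query.iter().map(|&q| {
        let mut row = [0.0f32; 16];
        for code in 0..16usize {
            // Illustrative uniform grid centered at zero.
            let recon = (code as f32 - 7.5) * scale;
            row[code] = (q - recon) * (q - recon);
        }
        row
    }).collect()
}

// Sum one lookup per dimension; each packed byte holds two 4-bit codes
// (low nibble first, an assumed layout).
fn distance_adc(table: &[[f32; 16]], codes: &[u8]) -> f32 {
    table.iter().enumerate().map(|(d, row)| {
        let byte = codes[d / 2];
        let code = if d % 2 == 0 { byte & 0x0F } else { byte >> 4 };
        row[code as usize]
    }).sum()
}
```

The per-dimension rows are what blow up to 256 entries for SQ8 (hence the 768KB table that motivates skipping ADC there).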

Source

pub fn distance_adc(&self, adc: &UnifiedADC, id: u32) -> Option<f32>

Compute distance using precomputed ADC table

Note: SQ8 uses integer SIMD distance via distance_asymmetric_l2 instead of ADC.

Source

pub fn prefetch(&self, id: u32)

Prefetch a vector’s data into CPU cache (for HNSW search optimization)

This hints to the CPU to load the vector's data into cache before it is needed. Call it on neighbor[j+1] while computing the distance to neighbor[j]; hnswlib benchmarks show ~10% search speedup from this pattern.

NOTE: This obtains the pointer directly without loading the data, so the prefetch hint can be issued before the data is needed. It issues a simple single-cache-line prefetch (64 bytes); the hardware prefetcher handles subsequent cache lines.

Source

pub fn prefetch_quantized(&self, id: u32)

Prefetch quantized vector data for asymmetric search

More efficient than prefetch() for RaBitQ mode as it only fetches the quantized representation, not the full precision original.

Source

pub fn rabitq_code_size(&self) -> Option<usize>

Get RaBitQ code_size (bytes per quantized vector)

Returns None if not using RaBitQ quantization.

Source

pub fn get_rabitq_code(&self, id: u32) -> Option<&[u8]>

Get quantized code for a vector (RaBitQ only)

Returns a slice of the quantized code bytes for the given vector ID. Returns None if vector doesn’t exist or not using RaBitQ.

Source

pub fn build_interleaved_codes( &self, neighbors: &[u32], output: &mut [u8], ) -> usize

Build interleaved codes for FastScan from a batch of neighbor IDs

For 32 neighbors with code_size bytes each, produces:

  • 32 bytes for sub-quantizer 0 (one byte from each neighbor)
  • 32 bytes for sub-quantizer 1
  • … etc

Total output size: code_size * 32 bytes

§Arguments
  • neighbors - Up to 32 neighbor IDs to interleave
  • output - Pre-allocated buffer of size code_size * 32

Returns number of valid neighbors (rest are zero-padded)
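A sketch of the interleaving over a flat code array (illustrative, not the crate's implementation):

```rust
// Hypothetical FastScan interleave: byte b of each of up to 32 neighbor
// codes becomes one contiguous 32-byte group, zero-padded when fewer than
// 32 neighbors are supplied. Returns the number of valid neighbors.
fn interleave_codes(flat: &[u8], code_size: usize,
                    neighbors: &[u32], out: &mut [u8]) -> usize {
    assert!(out.len() >= code_size * 32);
    out[..code_size * 32].fill(0); // zero-pad unused slots
    let n = neighbors.len().min(32);
    for (slot, &id) in neighbors.iter().take(n).enumerate() {
        let code = &flat[id as usize * code_size..(id as usize + 1) * code_size];
        for (b, &byte) in code.iter().enumerate() {
            out[b * 32 + slot] = byte;
        }
    }
    n
}
```

Grouping the same byte position across neighbors is what lets SIMD shuffle-based lookups process a whole batch per instruction.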

Source

pub fn train_quantization( &mut self, sample_vectors: &[Vec<f32>], ) -> Result<(), String>

Compute quantization thresholds from sample vectors

Uses the median of each dimension as that dimension's threshold
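A sketch of the median-threshold computation (illustrative; the upper-median tie-breaking for even sample counts is an assumption):

```rust
// Hypothetical per-dimension median thresholds from a training sample.
// Each dimension's threshold is the median of that dimension's values.
fn median_thresholds(samples: &[Vec<f32>], dims: usize) -> Vec<f32> {
    (0..dims).map(|d| {
        let mut col: Vec<f32> = samples.iter().map(|v| v[d]).collect();
        col.sort_by(|a, b| a.partial_cmp(b).unwrap());
        col[col.len() / 2] // upper median for even counts
    }).collect()
}
```

The median splits each dimension roughly in half, so each bit of the binary code carries close to one bit of information.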

Source

pub fn memory_usage(&self) -> usize

Get memory usage in bytes (approximate)

Source

pub fn reorder(&mut self, old_to_new: &[u32])

Reorder vectors based on node ID mapping

old_to_new[old_id] = new_id. This reorders vectors to match the BFS-reordered neighbor lists.

Trait Implementations§

Source§

impl Clone for VectorStorage

Source§

fn clone(&self) -> VectorStorage

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for VectorStorage

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl<'de> Deserialize<'de> for VectorStorage

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
Source§

impl Serialize for VectorStorage

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> Downcast for T
where T: Any,

Source§

fn into_any(self: Box<T>) -> Box<dyn Any>

Converts Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>, which can then be downcast into Box<dyn ConcreteType> where ConcreteType implements Trait.
Source§

fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>

Converts Rc<Trait> (where Trait: Downcast) to Rc<Any>, which can then be further downcast into Rc<ConcreteType> where ConcreteType implements Trait.
Source§

fn as_any(&self) -> &(dyn Any + 'static)

Converts &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &Any’s vtable from &Trait’s.
Source§

fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)

Converts &mut Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &mut Any’s vtable from &mut Trait’s.
Source§

impl<T> DowncastSend for T
where T: Any + Send,

Source§

fn into_any_send(self: Box<T>) -> Box<dyn Any + Send>

Converts Box<Trait> (where Trait: DowncastSend) to Box<dyn Any + Send>, which can then be downcast into Box<ConcreteType> where ConcreteType implements Trait.
Source§

impl<T> DowncastSync for T
where T: Any + Send + Sync,

Source§

fn into_any_sync(self: Box<T>) -> Box<dyn Any + Sync + Send>

Converts Box<Trait> (where Trait: DowncastSync) to Box<dyn Any + Send + Sync>, which can then be downcast into Box<ConcreteType> where ConcreteType implements Trait.
Source§

fn into_any_arc(self: Arc<T>) -> Arc<dyn Any + Sync + Send>

Converts Arc<Trait> (where Trait: DowncastSync) to Arc<Any>, which can then be downcast into Arc<ConcreteType> where ConcreteType implements Trait.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a pointer with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,

Source§

impl<T> Fruit for T
where T: Send + Downcast,