int8 Quantization for Vector Embeddings
Compresses fp32 vectors to int8 (8 bits per dimension) for efficient storage and fast approximate distance computation.
§Compression Ratio
- fp32: 4 bytes per dimension
- int8: 1 byte per dimension = 4x compression
Example: 1024-dim vector
- fp32: 4096 bytes
- int8: 1024 bytes
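The storage arithmetic above can be checked with a trivial sketch (helper names are illustrative, not part of the crate):

```rust
// Storage cost per vector: bytes-per-dimension times dimensions.
// A real int8 vector also carries a small fixed overhead for its
// scale factor, ignored here.
fn fp32_bytes(dims: usize) -> usize { dims * 4 }
fn int8_bytes(dims: usize) -> usize { dims * 1 }

fn main() {
    let dims = 1024;
    println!("fp32: {} bytes", fp32_bytes(dims)); // 4096
    println!("int8: {} bytes", int8_bytes(dims)); // 1024
    println!("ratio: {}x", fp32_bytes(dims) / int8_bytes(dims)); // 4
}
```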
§Quantization Methods
§Symmetric Quantization
Maps [-max_abs, +max_abs] → [-127, +127]
- scale = max(|v|) / 127
- quantized = round(v / scale)
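The two formulas above translate directly into a scalar sketch. `quantize_symmetric` is an illustrative helper, not the crate's actual API:

```rust
// Symmetric int8 quantization sketch: scale = max(|v|) / 127,
// quantized = round(v / scale), clamped to the i8 range.
fn quantize_symmetric(v: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = v.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    // Guard against division by zero for the all-zero vector.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = v
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn main() {
    let (q, scale) = quantize_symmetric(&[0.5, -1.0, 0.25]);
    // max_abs = 1.0, so scale = 1/127; q = [64, -127, 32]
    println!("q = {:?}, scale = {}", q, scale);
}
```

The scale factor is stored alongside the codes so distances computed on the int8 codes can be mapped back to the fp32 range.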
§Asymmetric Quantization
Maps [min, max] → [0, 255]
- scale = (max - min) / 255
- zero_point = round(-min / scale)
- quantized = round(v / scale) + zero_point
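Likewise, the asymmetric formulas can be sketched as a scalar helper (illustrative names, not necessarily the crate's API):

```rust
// Asymmetric quantization sketch: map [min, max] onto the u8 range
// [0, 255] using a scale and a zero_point offset.
fn quantize_asymmetric(v: &[f32]) -> (Vec<u8>, f32, u8) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    // Guard against division by zero for a constant vector.
    let scale = if range == 0.0 { 1.0 } else { range / 255.0 };
    let zero_point = (-min / scale).round() as u8;
    let q = v
        .iter()
        .map(|&x| ((x / scale).round() + zero_point as f32).clamp(0.0, 255.0) as u8)
        .collect();
    (q, scale, zero_point)
}

fn main() {
    let (q, scale, zp) = quantize_asymmetric(&[0.0, 1.0, 2.0]);
    // q = [0, 127, 255], zero_point = 0
    println!("q = {:?}, scale = {}, zero_point = {}", q, scale, zp);
}
```

Asymmetric quantization spends all 256 levels on the actual value range, which helps when embeddings are not centered around zero.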
§Usage
// Quantize a vector (symmetric)
let int8 = Int8Vector::from_f32(&embedding);
// Compute dot product (SIMD accelerated)
let dot = int8.dot_product(&other);
// Rescore binary search candidates
let rescored = int8.rescore_candidates(&binary_results, &query);

Structs§
- Int8Index - Index of int8 vectors for batch operations
- Int8Vector - int8 quantized vector with scale factor
Functions§
- dot_product_i8_f32_simd - Compute dot product of i8 vector with f32 query (asymmetric)
- dot_product_i8_simd - Compute dot product of two i8 vectors using SIMD
- l2_squared_i8_simd - Compute L2 squared distance between two i8 vectors
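A scalar reference for the symmetric i8×i8 dot product helps make the semantics concrete: products are accumulated in i32 to avoid i8 overflow, then rescaled by both vectors' scale factors to approximate the fp32 dot product. The helper name is illustrative, and the SIMD variants (and the mixed i8/f32 asymmetric path) only accelerate or generalize this computation:

```rust
// Scalar equivalent of an i8 dot product with scale recovery:
// accumulate integer products in i32, then multiply by both scales.
fn dot_i8(a: &[i8], b: &[i8], scale_a: f32, scale_b: f32) -> f32 {
    let acc: i32 = a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum();
    acc as f32 * scale_a * scale_b
}

fn main() {
    // With unit scales the result is just the integer dot product.
    let d = dot_i8(&[10, 20], &[3, 4], 1.0, 1.0);
    println!("dot = {}", d); // 110
}
```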