§MinMax Quantization
MinMax quantization provides memory-efficient vector compression by converting floating-point values to small n-bit integers on a per-vector basis.
§Core Concept
Each vector is independently quantized using the formula:
X' = round((X - s) * (2^n - 1) / c).clamp(0, 2^n - 1)
where s is a shift value and c is a scaling parameter computed from the range of values.
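The formula can be sketched in a few lines of Rust. This is a minimal illustration, not the crate's implementation: the function names quantize and dequantize, the one-code-per-u8 storage, and the 8-bit limit are all assumptions made for the example. Decoding uses the algebraic inverse of the formula, X = X' * c / (2^n - 1) + s.

```rust
/// Encode one vector into n-bit codes for a given shift `s` and scale `c`.
/// Hypothetical helper, not the crate's API; supports 1 to 8 bits per code.
fn quantize(x: &[f32], s: f32, c: f32, n_bits: u32) -> Vec<u8> {
    assert!((1..=8).contains(&n_bits), "this sketch stores each code in a u8");
    assert!(c > 0.0, "the scale must be positive");
    let levels = ((1u32 << n_bits) - 1) as f32; // 2^n - 1
    x.iter()
        .map(|&v| ((v - s) * levels / c).round().clamp(0.0, levels) as u8)
        .collect()
}

/// Decode with the inverse formula X = X' * c / (2^n - 1) + s.
fn dequantize(codes: &[u8], s: f32, c: f32, n_bits: u32) -> Vec<f32> {
    let levels = ((1u32 << n_bits) - 1) as f32;
    codes.iter().map(|&q| q as f32 * c / levels + s).collect()
}
```

Values outside the interval [s, s + c] are clamped to the lowest or highest code, so decoding returns an approximation of the input rather than the exact original.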
For bit widths greater than 1, given a positive scaling parameter grid_scale: f32,
these are computed as:
- m = (max_i X[i] + min_i X[i]) / 2.0
- w = max_i X[i] - min_i X[i]
- s = m - w * grid_scale
- c = 2 * w * grid_scale
For 1-bit quantization, to avoid outliers, s and c are derived differently:
i) Values are first split into two groups: those below and above the mean.
ii) s is the average of values below the mean.
iii) c is the difference between the average of values above the mean and s.
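Both ways of deriving s and c can be sketched as follows. The function names minmax_params and one_bit_params are hypothetical, and two details are assumptions that the description above does not specify: values exactly equal to the mean are placed in the upper group, and a vector with no value below the mean yields None.

```rust
/// Shift `s` and scale `c` for bit widths greater than 1 (hypothetical helper).
fn minmax_params(x: &[f32], grid_scale: f32) -> (f32, f32) {
    let max = x.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let min = x.iter().copied().fold(f32::INFINITY, f32::min);
    let m = (max + min) / 2.0; // midpoint of the value range
    let w = max - min; // width of the value range
    (m - w * grid_scale, 2.0 * w * grid_scale)
}

/// Shift `s` and scale `c` for 1-bit quantization (hypothetical helper).
/// Assumption: values equal to the mean count as "above" it.
fn one_bit_params(x: &[f32]) -> Option<(f32, f32)> {
    if x.is_empty() {
        return None;
    }
    let mean = x.iter().sum::<f32>() / x.len() as f32;
    let (mut lo_sum, mut lo_n) = (0.0f32, 0usize);
    let (mut hi_sum, mut hi_n) = (0.0f32, 0usize);
    for &v in x {
        if v < mean {
            lo_sum += v;
            lo_n += 1;
        } else {
            hi_sum += v;
            hi_n += 1;
        }
    }
    if lo_n == 0 || hi_n == 0 {
        return None; // all values equal: no two groups to separate
    }
    let s = lo_sum / lo_n as f32; // average of values below the mean
    let c = hi_sum / hi_n as f32 - s; // distance between the two group averages
    Some((s, c))
}
```

With grid_scale = 0.5 the first function reduces to s = min and c = max - min, so the grid spans exactly the observed range. In the 1-bit case, an input such as [0, 1, 2, 3, 100] gives s = 1.5 and c = 98.5, so the two levels sit at the group averages instead of at the raw extremes.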
This encoding is similar to scalar quantization. However, because both s and c are computed on a per-vector basis, the quantizer can be applied in a streaming setting, which makes it qualitatively different from scalar quantization.
§Module Components
- MinMaxQuantizer: Handles vector encoding and decoding.
- Data: Stores quantized vectors with compensation parameters.
- Distance functions:
  - MinMaxIP: Inner product distance for quantized vectors.
  - MinMaxL2Squared: L2 (Euclidean) distance for quantized vectors.
  - MinMaxCosine: Cosine similarity for quantized vectors.
  - MinMaxCosineNormalized: Cosine similarity for quantized vectors assuming the original full-precision vectors were normalized.
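To illustrate why per-vector parameters can be useful for distance computation, consider two vectors decoded as x[i] = a_x * qx[i] + s_x and y[i] = a_y * qy[i] + s_y, where a = c / (2^n - 1). Their inner product expands into an integer dot product of the codes plus terms that depend only on per-vector quantities. The sketch below is an assumption about how such terms might be used, not a description of this module's internals: the Quantized struct, its fields, and the inner_product function are all hypothetical.

```rust
/// Hypothetical per-vector record: codes plus values precomputed at encoding time.
struct Quantized {
    codes: Vec<u8>,
    a: f32,        // step size c / (2^n - 1)
    s: f32,        // shift
    code_sum: f32, // sum of all codes
}

impl Quantized {
    fn from_codes(codes: Vec<u8>, s: f32, c: f32, n_bits: u32) -> Self {
        let a = c / ((1u32 << n_bits) - 1) as f32;
        let code_sum: f32 = codes.iter().map(|&q| q as f32).sum();
        Quantized { codes, a, s, code_sum }
    }
}

/// Inner product of the two decoded vectors, computed without decoding them:
/// <x, y> = a_x a_y <qx, qy> + a_x s_y sum(qx) + a_y s_x sum(qy) + d s_x s_y
fn inner_product(x: &Quantized, y: &Quantized) -> f32 {
    assert_eq!(x.codes.len(), y.codes.len(), "dimension mismatch");
    let dot: u32 = x
        .codes
        .iter()
        .zip(&y.codes)
        .map(|(&p, &q)| p as u32 * q as u32)
        .sum();
    let d = x.codes.len() as f32;
    x.a * y.a * dot as f32
        + x.a * y.s * x.code_sum
        + y.a * x.s * y.code_sum
        + d * x.s * y.s
}
```

Only the first term touches the codes themselves, so the per-element work is integer arithmetic and the floating-point corrections are applied once per pair of vectors.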
To reconstruct the original vector, the inverse operation is applied:
X = X' * c / (2^n - 1) + s
Structs§
- FullQueryMeta: A meta struct storing the sum and norm_squared of a full query after transformation is applied to it.
- MinMaxCompensation: Per-vector precomputed coefficients that help compute inner products and squared L2 distances for MinMax quantized vectors.
- MinMaxCosine
- MinMaxCosineNormalized
- MinMaxIP
- MinMaxKernel: Kernel for computing MaxSim and Chamfer distance using MinMax quantized vectors.
- MinMaxL2Squared
- MinMaxMeta: Metadata for MinMax quantized multi-vectors.
- MinMaxQuantizer: Recall from the module-level documentation that MinMaxQuantizer quantizes X into n-bit vectors as follows -
- Recompressor: Recompression utilities for MinMax quantized vectors.
Enums§
- DecompressError
- L2Loss: A struct defining Euclidean loss from quantization.
- MetaParseError: Error type for parsing a slice of bytes as a DataRef and returning the corresponding dimension.
- RecompressError: Error type for recompression operations.
Type Aliases§
- Data: An owning compressed data vector.
- DataMutRef: A mutable borrowed Data vector.
- DataRef: A borrowed Data vector.
- FullQuery: A full precision query.
- FullQueryMut: A mutable borrowed full precision query.
- FullQueryRef: A borrowed full precision query.