Expand description
Optimized Outlier Encoding
This module provides efficient outlier representation that avoids
.contains() calls in hot loops by using:
- Bitvector: O(1) membership for dense outliers (k/D > threshold)
- Sorted list + binary search: O(log k) for sparse outliers
§Problem
Original approach:
for dim in 0..D {
if outliers.contains(dim) { // O(k) linear scan!
use_outlier_value(dim);
} else {
use_quantized_value(dim);
}
}This is poison in tight loops: O(D × k) per vector decode.
§Solution
Hybrid representation with automatic crossover:
if k/D > BITVEC_THRESHOLD {
// Dense: use bitvector, O(1) per check
bitvec.test(dim)
} else {
// Sparse: sorted list + binary search, O(log k) per check
sorted_dims.binary_search(&dim).is_ok()
}§Crossover Analysis
For dimension D and k outliers:
- Bitvector: D/8 bytes storage, O(1) access
- Sorted list: k × 2 bytes storage (u16), O(log k) access
Crossover when: k × 2 ≈ D/8, i.e., k ≈ D/16 For D=768, crossover at k ≈ 48 outliers.
Structs§
- Batch
Outlier Lookup - Batch outlier lookup for SIMD-friendly decode.
- Dense
Iterator - Iterator over dense bitvector.
- Dense
Outliers - Dense outlier representation using bitvector.
- Outlier
Storage - Complete outlier storage for a vector.
- Outlier
Value - Outlier entry with dimension and value.
- Sparse
Outliers - Sparse outlier representation using sorted list.
Enums§
- Outlier
Iterator - Iterator over outlier dimensions.
- Outlier
Set - Optimized outlier set with hybrid representation.
Constants§
- BITVEC_
THRESHOLD - Threshold for switching from sorted list to bitvector. When k/D > this ratio, use bitvector.
- MAX_
DIMENSION - Maximum dimension supported (u16 indices).