Skip to main content

Module outlier_encoding

Module outlier_encoding 

Source
Expand description

Optimized Outlier Encoding

This module provides efficient outlier representation that avoids .contains() calls in hot loops by using:

  • Bitvector: O(1) membership for dense outliers (k/D > threshold)
  • Sorted list + binary search: O(log k) for sparse outliers

§Problem

Original approach:

for dim in 0..D {
    if outliers.contains(dim) {  // O(k) linear scan!
        use_outlier_value(dim);
    } else {
        use_quantized_value(dim);
    }
}

This is poison in tight loops: O(D × k) per vector decode.

§Solution

Hybrid representation with automatic crossover:

if k/D > BITVEC_THRESHOLD {
    // Dense: use bitvector, O(1) per check
    bitvec.test(dim)
} else {
    // Sparse: sorted list + binary search, O(log k) per check
    sorted_dims.binary_search(&dim).is_ok()
}

§Crossover Analysis

For dimension D and k outliers:

  • Bitvector: D/8 bytes storage, O(1) access
  • Sorted list: k × 2 bytes storage (u16), O(log k) access

Crossover when: k × 2 ≈ D/8, i.e., k ≈ D/16 For D=768, crossover at k ≈ 48 outliers.

Structs§

BatchOutlierLookup
Batch outlier lookup for SIMD-friendly decode.
DenseIterator
Iterator over dense bitvector.
DenseOutliers
Dense outlier representation using bitvector.
OutlierStorage
Complete outlier storage for a vector.
OutlierValue
Outlier entry with dimension and value.
SparseOutliers
Sparse outlier representation using sorted list.

Enums§

OutlierIterator
Iterator over outlier dimensions.
OutlierSet
Optimized outlier set with hybrid representation.

Constants§

BITVEC_THRESHOLD
Threshold for switching from sorted list to bitvector. When k/D > this ratio, use bitvector.
MAX_DIMENSION
Maximum dimension supported (u16 indices).