ALP-RD (Real Doubles) codec for truly arbitrary f64 values.
For full-precision doubles (scientific data, vector embeddings) where ALP can’t find a lossless decimal mapping, ALP-RD exploits the structure of IEEE 754: the front bits (sign + exponent + high mantissa) take only a few distinct patterns across a column, while the tail bits (low mantissa) are effectively noise.
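To make the front/tail split concrete, here is a minimal sketch of splitting one f64 at bit position `cut` and reassembling it. The names `split` and `join` are hypothetical illustration helpers, not this crate's API, and `cut` is assumed to be in 0..=63.

```rust
/// Split an f64 into its "front" (the bits above `cut`) and its "tail"
/// (the low `cut` bits). Hypothetical helper; assumes 0 <= cut <= 63.
pub fn split(value: f64, cut: u32) -> (u64, u64) {
    let bits = value.to_bits();
    let front = bits >> cut;
    let tail = bits & ((1u64 << cut) - 1);
    (front, tail)
}

/// Reassemble the original bit pattern. Purely bitwise, so it is lossless
/// for every f64, including NaN payloads and signed zeros.
pub fn join(front: u64, tail: u64, cut: u32) -> f64 {
    f64::from_bits((front << cut) | tail)
}
```

With `cut = 52` the front is exactly the sign and biased exponent, so every value in [1, 2) maps to the same front (0x3FF). A smaller `cut` moves high mantissa bits into the front, trading more dictionary entries for narrower tails.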
Approach:
- Right-shift each f64’s bits by `cut` positions (typically 44-52 bits), producing a “front” value that takes only a few distinct values across the column.
- Dictionary-encode the front values (usually fewer than 256 unique patterns).
- Store the tail bits raw (the bottom `cut` bits of each value).
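The three steps can be sketched as an in-memory encoder and decoder. This is a hedged illustration, not the crate's implementation: `Encoded`, `encode`, and `decode` are hypothetical names, the caller supplies `cut` rather than the codec choosing it, and indices are always held as u16 here (the wire format narrows them to u8 when the dictionary is small).

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of an ALP-RD style encoding.
pub struct Encoded {
    pub cut: u32,
    pub dict: Vec<u64>,    // distinct front values, in first-seen order
    pub indices: Vec<u16>, // one dictionary index per input value
    pub tails: Vec<u64>,   // low `cut` bits of each value, stored raw
}

/// Encode `values` with a fixed `cut` (assumed 0..=63). Panics if there are
/// more than 65_535 distinct fronts, since the dictionary size is a u16.
pub fn encode(values: &[f64], cut: u32) -> Encoded {
    let mask = (1u64 << cut) - 1;
    let mut dict = Vec::new();
    let mut lookup: HashMap<u64, u16> = HashMap::new();
    let mut indices = Vec::with_capacity(values.len());
    let mut tails = Vec::with_capacity(values.len());
    for &v in values {
        let bits = v.to_bits();
        let front = bits >> cut;
        // Dictionary-encode the front: reuse its index or append a new entry.
        let idx = *lookup.entry(front).or_insert_with(|| {
            assert!(dict.len() < usize::from(u16::MAX), "too many distinct fronts");
            dict.push(front);
            (dict.len() - 1) as u16
        });
        indices.push(idx);
        tails.push(bits & mask);
    }
    Encoded { cut, dict, indices, tails }
}

/// Invert `encode`: look up each front and re-attach its tail bits.
pub fn decode(enc: &Encoded) -> Vec<f64> {
    enc.indices
        .iter()
        .zip(&enc.tails)
        .map(|(&i, &tail)| f64::from_bits((enc.dict[i as usize] << enc.cut) | tail))
        .collect()
}
```

Because the split and the reassembly are pure bit operations, decoding reproduces every input bit pattern exactly; the only lossy-looking step, dictionary lookup, is a one-to-one mapping.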
Compression: ~54 bits/f64 vs 64 raw (~15% reduction). Modest but consistent and lossless.
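How close a column gets to that figure depends on how `cut` balances dictionary size against tail width. The sketch below estimates the encoded size for a candidate `cut` and picks the cheapest one in the typical 44-52 range. Both functions are hypothetical: the description does not say how the codec actually selects `cut`, and this estimate charges tails at whole bytes (ceil(cut/8) per value, as in the wire format), so its per-value numbers need not match the ~54 bits quoted above.

```rust
use std::collections::HashSet;

/// Estimated encoded size in bits for one `cut`: 7-byte header, 8 bytes per
/// dictionary entry, u8 or u16 indices, and ceil(cut/8) bytes per tail.
pub fn estimated_bits(values: &[f64], cut: u32) -> u64 {
    let fronts: HashSet<u64> = values.iter().map(|v| v.to_bits() >> cut).collect();
    let dict_len = fronts.len() as u64;
    let n = values.len() as u64;
    let header_bits = 8 * (4 + 1 + 2);
    let index_bits = if dict_len <= 256 { 8 } else { 16 };
    let tail_bits = 8 * ((u64::from(cut) + 7) / 8);
    header_bits + 64 * dict_len + n * (index_bits + tail_bits)
}

/// Hypothetical selection rule: try every cut in 44..=52 and keep the one
/// with the smallest estimate (the first one on ties).
pub fn choose_cut(values: &[f64]) -> u32 {
    (44..=52).min_by_key(|&cut| estimated_bits(values, cut)).unwrap()
}
```

A larger `cut` shrinks the dictionary but widens every tail; a smaller one does the opposite. Because the dictionary cost is paid once while tail and index costs are paid per value, long columns favor whichever `cut` minimizes the per-value term.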
Wire format:
[4 bytes] value count (LE u32)
[1 byte] cut position (number of tail bits, 0-63)
[2 bytes] dictionary size (LE u16)
[dict_size × 8 bytes] dictionary entries (LE u64 front values)
[count × 1-2 bytes] dictionary indices (u8 if dict ≤ 256, u16 otherwise)
[count × ceil(cut/8) bytes] tail bits (packed, little-endian)
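A minimal serializer and parser for the layout above, as a sketch. `write_alprd`, `read_alprd`, `take`, and `Decoded` are hypothetical names, and one detail is an assumption because the description leaves it open: u16 indices are written little-endian like the other multi-byte fields. Each tail occupies its own ceil(cut/8) bytes, taken from the low end of the value's little-endian representation.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical container for the parsed fields (not the crate's type).
pub struct Decoded {
    pub cut: u8,
    pub dict: Vec<u64>,
    pub indices: Vec<u16>,
    pub tails: Vec<u64>,
}

/// Serialize in wire-format order. Assumes every index is valid for `dict`
/// and every tail fits in `cut` bits.
pub fn write_alprd(cut: u8, dict: &[u64], indices: &[u16], tails: &[u64]) -> Vec<u8> {
    assert!(cut <= 63 && indices.len() == tails.len());
    assert!(dict.len() <= usize::from(u16::MAX));
    let count = u32::try_from(indices.len()).expect("count must fit in u32");
    let mut out = Vec::new();
    out.extend_from_slice(&count.to_le_bytes());
    out.push(cut);
    out.extend_from_slice(&(dict.len() as u16).to_le_bytes());
    for &front in dict {
        out.extend_from_slice(&front.to_le_bytes());
    }
    // u8 indices when the dictionary has at most 256 entries, else u16
    // (little-endian is an assumption for the u16 case).
    let narrow = dict.len() <= 256;
    for &i in indices {
        if narrow {
            out.push(i as u8);
        } else {
            out.extend_from_slice(&i.to_le_bytes());
        }
    }
    // Each tail keeps only its low ceil(cut/8) little-endian bytes.
    let tail_bytes = (usize::from(cut) + 7) / 8;
    for &t in tails {
        out.extend_from_slice(&t.to_le_bytes()[..tail_bytes]);
    }
    out
}

/// Take `n` bytes at `*pos` and advance it; None if the buffer is too short.
fn take<'a>(buf: &'a [u8], pos: &mut usize, n: usize) -> Option<&'a [u8]> {
    let bytes = buf.get(*pos..pos.checked_add(n)?)?;
    *pos += n;
    Some(bytes)
}

/// Parse the wire format back into its fields. Returns None on truncated
/// input or cut > 63; does not check that indices are within the dictionary.
pub fn read_alprd(buf: &[u8]) -> Option<Decoded> {
    let mut pos = 0;
    let count = u32::from_le_bytes(take(buf, &mut pos, 4)?.try_into().ok()?) as usize;
    let cut = take(buf, &mut pos, 1)?[0];
    if cut > 63 {
        return None;
    }
    let dict_len = usize::from(u16::from_le_bytes(take(buf, &mut pos, 2)?.try_into().ok()?));
    let mut dict = Vec::with_capacity(dict_len);
    for _ in 0..dict_len {
        dict.push(u64::from_le_bytes(take(buf, &mut pos, 8)?.try_into().ok()?));
    }
    let narrow = dict_len <= 256;
    let mut indices = Vec::new();
    for _ in 0..count {
        indices.push(if narrow {
            u16::from(take(buf, &mut pos, 1)?[0])
        } else {
            u16::from_le_bytes(take(buf, &mut pos, 2)?.try_into().ok()?)
        });
    }
    let tail_bytes = (usize::from(cut) + 7) / 8;
    let mut tails = Vec::new();
    for _ in 0..count {
        // Zero-extend the stored low bytes back to a full u64.
        let mut le = [0u8; 8];
        le[..tail_bytes].copy_from_slice(take(buf, &mut pos, tail_bytes)?);
        tails.push(u64::from_le_bytes(le));
    }
    Some(Decoded { cut, dict, indices, tails })
}
```

The vectors for indices and tails are grown incrementally rather than pre-sized from `count`, so a corrupted header cannot trigger a huge up-front allocation. A real decoder would also reject indices that fall outside the dictionary before looking them up.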