Module alp_rd

ALP-RD (Real Doubles) codec for truly arbitrary f64 values.

For full-precision doubles (scientific data, vector embeddings) where ALP can’t find a lossless decimal mapping, ALP-RD instead exploits the structure of IEEE 754: the front bits (sign, exponent, and high mantissa bits) take only a few distinct patterns across a dataset, while the tail bits (low mantissa) are noisy.
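To make the front/tail split concrete, here is a minimal Rust sketch. The helpers split and distinct_fronts are illustrative only, not this module's API. split cuts an f64's bit pattern at a given position into its high "front" and low "tail" bits, and distinct_fronts counts how many different fronts a batch of values produces.

```rust
use std::collections::HashSet;

/// Illustrative helper (hypothetical): split the IEEE 754 bit pattern of `x`
/// into (front, tail), where tail = the low `cut` bits and front = the rest.
fn split(x: f64, cut: u32) -> (u64, u64) {
    assert!(cut < 64, "cut must be in 0..=63, as in the wire format");
    let bits = x.to_bits();
    let tail_mask = (1u64 << cut) - 1; // cut < 64, so this shift is in range
    (bits >> cut, bits & tail_mask)
}

/// Illustrative helper (hypothetical): number of distinct fronts at `cut`.
fn distinct_fronts(values: &[f64], cut: u32) -> usize {
    values
        .iter()
        .map(|&v| split(v, cut).0)
        .collect::<HashSet<u64>>()
        .len()
}
```

For 1,000 evenly spaced values in [1, 2), every value shares the same sign and exponent, so a cut of 52 leaves a single distinct front; a cut of 44 also keeps the top 8 mantissa bits in the front and yields 256 distinct fronts. The tails, by contrast, differ for almost every value.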

Approach:

  1. Right-shift each f64’s bit pattern by cut positions (typically 44-52 bits), producing a “front” value of 64 - cut bits that takes only a few distinct values.
  2. Dictionary-encode the front values (usually fewer than 256 distinct patterns, so each index fits in one byte).
  3. Store the tail bits raw (the low cut bits of each value).
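The three steps above can be sketched in Rust roughly as follows. This is an illustrative sketch under stated assumptions, not this module's encode: the names RdParts, split_rd, and join_rd are hypothetical, cut is passed in rather than chosen automatically (how the module selects cut is not described here), and the dictionary is capped at 65,535 entries because the wire format stores its size as a u16.

```rust
use std::collections::HashMap;

/// In-memory form of the three ALP-RD streams (hypothetical type).
struct RdParts {
    cut: u32,
    dict: Vec<u64>,    // distinct front values, in first-seen order
    indices: Vec<u16>, // per-value position in `dict`
    tails: Vec<u64>,   // per-value low `cut` bits
}

/// Steps 1-3: split every value at `cut` and dictionary-encode the fronts.
/// Returns None if there are more than 65,535 distinct fronts.
fn split_rd(values: &[f64], cut: u32) -> Option<RdParts> {
    assert!(cut < 64, "cut must be in 0..=63");
    let tail_mask = (1u64 << cut) - 1;
    let mut lookup: HashMap<u64, u16> = HashMap::new();
    let mut dict = Vec::new();
    let mut indices = Vec::with_capacity(values.len());
    let mut tails = Vec::with_capacity(values.len());
    for &v in values {
        let bits = v.to_bits();
        let front = bits >> cut; // step 1: keep the high 64 - cut bits
        let idx = match lookup.get(&front).copied() {
            Some(i) => i,
            None => {
                // Assumed cap: the dictionary size is a u16 on the wire.
                if dict.len() == u16::MAX as usize {
                    return None;
                }
                let i = dict.len() as u16;
                dict.push(front);
                lookup.insert(front, i);
                i
            }
        };
        indices.push(idx); // step 2: dictionary index
        tails.push(bits & tail_mask); // step 3: raw low bits
    }
    Some(RdParts { cut, dict, indices, tails })
}

/// Inverse: look up each front and reattach its tail bits.
fn join_rd(p: &RdParts) -> Vec<f64> {
    p.indices
        .iter()
        .zip(&p.tails)
        .map(|(&i, &t)| f64::from_bits((p.dict[i as usize] << p.cut) | t))
        .collect()
}
```

For example, splitting [1.5, 1.25, 2.0, -3.75, 1.5] at cut 48 gives the dictionary [0x3FF8, 0x3FF4, 0x4000, 0xC00E] and indices [0, 1, 2, 3, 0], and join_rd restores the original values bit for bit.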

Compression: ~54 bits/f64 vs 64 raw (~15% reduction). Modest but consistent and lossless.
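As a rough check on that figure, consider an idealized per-value cost. The symbols are introduced here for illustration: D is the dictionary size, c the cut, and n the value count. Each index is assumed to cost ⌈log₂ D⌉ bits and each tail exactly c bits (bit-packed; byte-aligned tails would cost 8⌈c/8⌉ instead), with the 7-byte header and 8D-byte dictionary amortized over n values:

$$
b \approx \lceil \log_2 D \rceil + c + \frac{8\,(7 + 8D)}{n}
$$

With D = 256, c = 46, and n = 10^6, this gives b ≈ 8 + 46 + 0.016 ≈ 54 bits per value, i.e. 1 - 54/64 ≈ 15.6% smaller than raw f64.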

Wire format:

[4 bytes] value count (LE u32)
[1 byte]  cut position (number of tail bits, 0-63)
[2 bytes] dictionary size (LE u16)
[dict_size × 8 bytes] dictionary entries (LE u64 front values)
[count × 1-2 bytes] dictionary indices (u8 if dict ≤ 256, u16 otherwise)
[count × ceil(cut/8) bytes] tail bits (packed, little-endian)
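A serializer for the layout above might look like the sketch below. The function write_alp_rd is hypothetical, not this module's API. It assumes u16 indices are little-endian (the layout does not say) and that each tail occupies the low ceil(cut/8) bytes of its little-endian u64, one value after another.

```rust
/// Hypothetical serializer: emit the ALP-RD wire format from its three streams.
fn write_alp_rd(cut: u8, dict: &[u64], indices: &[u16], tails: &[u64]) -> Vec<u8> {
    assert!(cut < 64, "cut must be in 0..=63");
    assert!(dict.len() <= u16::MAX as usize, "dictionary size is a u16");
    assert!(indices.len() <= u32::MAX as usize, "value count is a u32");
    assert_eq!(indices.len(), tails.len());
    let tail_bytes = (cut as usize + 7) / 8; // ceil(cut / 8)
    let mut out = Vec::new();
    // Header: count (LE u32), cut (u8), dictionary size (LE u16).
    out.extend_from_slice(&(indices.len() as u32).to_le_bytes());
    out.push(cut);
    out.extend_from_slice(&(dict.len() as u16).to_le_bytes());
    // Dictionary entries as LE u64 front values.
    for &front in dict {
        out.extend_from_slice(&front.to_le_bytes());
    }
    // Indices: one byte each if the dictionary has at most 256 entries.
    let narrow = dict.len() <= 256;
    for &i in indices {
        if narrow {
            out.push(i as u8);
        } else {
            out.extend_from_slice(&i.to_le_bytes()); // assumed LE
        }
    }
    // Tails: the low ceil(cut/8) bytes of each little-endian value.
    for &t in tails {
        out.extend_from_slice(&t.to_le_bytes()[..tail_bytes]);
    }
    out
}
```

With cut 48, a 2-entry dictionary, and 3 values, the output is 4 + 1 + 2 + 16 + 3 + 3 × 6 = 44 bytes.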

Functions

decode
Decode ALP-RD compressed data back to f64 values.
encode
Encode f64 values using ALP-RD (front-bit dictionary + raw tail bits).
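The signatures of decode and encode are not shown on this page, so the following is only a sketch of what decoding the wire format involves. read_alp_rd, take, and le_u64 are hypothetical names. The sketch makes the same assumptions as the layout notes: u16 indices are little-endian, and each tail is stored in ceil(cut/8) little-endian bytes whose bits above cut are ignored. It returns None on truncated input or out-of-range indices instead of panicking.

```rust
/// Take `n` bytes starting at `*pos`, advancing the cursor; None if truncated.
fn take<'a>(buf: &'a [u8], pos: &mut usize, n: usize) -> Option<&'a [u8]> {
    let end = pos.checked_add(n)?;
    let s = buf.get(*pos..end)?;
    *pos = end;
    Some(s)
}

/// Read up to 8 little-endian bytes as a u64 (missing high bytes are zero).
fn le_u64(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a[..b.len()].copy_from_slice(b);
    u64::from_le_bytes(a)
}

/// Hypothetical decoder for the ALP-RD wire format.
fn read_alp_rd(buf: &[u8]) -> Option<Vec<f64>> {
    let mut pos = 0;
    let count = le_u64(take(buf, &mut pos, 4)?) as usize;
    let cut = take(buf, &mut pos, 1)?[0] as u32;
    if cut > 63 {
        return None;
    }
    let dict_size = le_u64(take(buf, &mut pos, 2)?) as usize;
    let mut dict = Vec::with_capacity(dict_size);
    for _ in 0..dict_size {
        dict.push(le_u64(take(buf, &mut pos, 8)?));
    }
    // Index and tail widths follow from the header.
    let iw = if dict_size <= 256 { 1 } else { 2 };
    let tw = (cut as usize + 7) / 8;
    let idx_bytes = take(buf, &mut pos, count.checked_mul(iw)?)?;
    let tail_bytes = take(buf, &mut pos, count.checked_mul(tw)?)?;
    let tail_mask = (1u64 << cut) - 1;
    let mut out = Vec::with_capacity(count);
    for k in 0..count {
        let idx = le_u64(&idx_bytes[k * iw..(k + 1) * iw]) as usize;
        let front = *dict.get(idx)?; // reject out-of-range indices
        let tail = le_u64(&tail_bytes[k * tw..(k + 1) * tw]) & tail_mask;
        out.push(f64::from_bits((front << cut) | tail));
    }
    Some(out)
}
```

Because the slices for indices and tails are validated before the output buffer is allocated, a corrupt count field cannot trigger an oversized allocation.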