Module byte_stream_split

§Byte Stream Split (BSS) Miniblock Format

Byte Stream Split is a data transformation technique optimized for floating-point data compression. It improves compression ratios by reorganizing data to group similar byte patterns together.

§How It Works

BSS splits floating-point values by byte position, creating separate streams for each byte position across all values. This transformation exploits the fact that floating-point data often has patterns in specific byte positions (e.g., similar exponents or mantissa patterns).

§Example

Input data (f32): [1.0, 2.0, 3.0, 4.0]

In little-endian bytes:

  • 1.0 = [00, 00, 80, 3F]
  • 2.0 = [00, 00, 00, 40]
  • 3.0 = [00, 00, 40, 40]
  • 4.0 = [00, 00, 80, 40]

After BSS transformation:

  • Byte stream 0: [00, 00, 00, 00] (all first bytes)
  • Byte stream 1: [00, 00, 00, 00] (all second bytes)
  • Byte stream 2: [80, 00, 40, 80] (all third bytes)
  • Byte stream 3: [3F, 40, 40, 40] (all fourth bytes)

Output: [00, 00, 00, 00, 00, 00, 00, 00, 80, 00, 40, 80, 3F, 40, 40, 40]
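The transformation in this example can be sketched in a few lines of Rust. This is an illustrative sketch only, not the module's actual implementation: the function names `bss_encode_f32` and `bss_decode_f32` are hypothetical, and the sketch assumes little-endian byte order as in the example above.

```rust
/// Hypothetical helper: split each f32 into its 4 little-endian bytes and
/// write byte `b` of value `i` into stream `b`, with the streams laid out
/// back to back in the output buffer.
fn bss_encode_f32(values: &[f32]) -> Vec<u8> {
    const WIDTH: usize = std::mem::size_of::<f32>();
    let n = values.len();
    let mut out = vec![0u8; n * WIDTH];
    for (i, v) in values.iter().enumerate() {
        for (b, byte) in v.to_le_bytes().iter().enumerate() {
            // Stream `b` starts at offset `b * n`.
            out[b * n + i] = *byte;
        }
    }
    out
}

/// Hypothetical helper: inverse of `bss_encode_f32`. Gathers byte `b` of
/// value `i` from stream `b` and reassembles the original f32 values.
fn bss_decode_f32(bytes: &[u8]) -> Vec<f32> {
    const WIDTH: usize = std::mem::size_of::<f32>();
    assert!(bytes.len() % WIDTH == 0, "input must hold whole f32 values");
    let n = bytes.len() / WIDTH;
    (0..n)
        .map(|i| {
            let mut raw = [0u8; WIDTH];
            for b in 0..WIDTH {
                raw[b] = bytes[b * n + i];
            }
            f32::from_le_bytes(raw)
        })
        .collect()
}
```

Calling `bss_encode_f32(&[1.0, 2.0, 3.0, 4.0])` yields the 16-byte output listed above, and passing that buffer to `bss_decode_f32` recovers the original four values.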

§Compression Benefits

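BSS itself doesn’t compress data; it only reorders bytes, so the output is the same size as the input. The compression benefit comes when BSS is combined with general-purpose compression (e.g., LZ4), which can then exploit the long runs of similar bytes in each stream: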

  1. Timestamps: Sequential timestamps have similar high-order bytes
  2. Sensor data: Readings often vary in a small range, sharing exponent bits
  3. Financial data: Prices may cluster around certain values
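One rough way to see why the reordered bytes suit a general-purpose compressor is to measure the longest run of identical bytes before and after the split. The sketch below is only an illustration; `longest_run` is a hypothetical helper, and run length is a crude proxy rather than a measure of what LZ4 actually achieves.

```rust
/// Hypothetical helper: length of the longest run of identical
/// consecutive bytes in `bytes` (0 for an empty slice).
fn longest_run(bytes: &[u8]) -> usize {
    let mut best = 0;
    let mut current = 0;
    let mut prev: Option<u8> = None;
    for &b in bytes {
        if Some(b) == prev {
            current += 1;
        } else {
            // A new run starts at this byte.
            current = 1;
            prev = Some(b);
        }
        best = best.max(current);
    }
    best
}
```

For the `[1.0, 2.0, 3.0, 4.0]` example, the original interleaved little-endian bytes have a longest run of 3, while the BSS output has a longest run of 8 (the two all-zero streams sit next to each other).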

§Supported Types

  • 32-bit floating point (f32)
  • 64-bit floating point (f64)

§Chunk Handling

  • Maximum chunk size depends on data type:
    • f32: 1024 values (4 KiB per chunk)
    • f64: 512 values (4 KiB per chunk)
  • All chunks share a single global buffer
  • Every chunk except the last contains a power-of-2 number of values
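The chunk limits above follow from dividing a 4096-byte budget by the width of the value type. The sketch below shows that arithmetic and one possible way to split a column into chunks. It rests on assumptions: the constant `CHUNK_BYTES` and the functions `max_values_per_chunk` and `chunk_lengths` are hypothetical names, and filling every non-last chunk to its maximum is just one policy that satisfies the power-of-2 rule, not necessarily the one the encoder uses.

```rust
/// Assumed per-chunk byte budget, inferred from the limits listed above.
const CHUNK_BYTES: usize = 4096;

/// Hypothetical helper: maximum values per chunk for a given value width.
/// `bytes_per_value` must be non-zero (4 for f32, 8 for f64).
fn max_values_per_chunk(bytes_per_value: usize) -> usize {
    CHUNK_BYTES / bytes_per_value
}

/// Hypothetical helper: split `total_values` into chunk lengths where every
/// non-last chunk is full (a power of 2) and the last holds the remainder.
fn chunk_lengths(total_values: usize, bytes_per_value: usize) -> Vec<usize> {
    let max = max_values_per_chunk(bytes_per_value);
    let mut lengths = Vec::new();
    let mut remaining = total_values;
    while remaining > max {
        lengths.push(max);
        remaining -= max;
    }
    if remaining > 0 {
        lengths.push(remaining);
    }
    lengths
}
```

Under these assumptions, 2500 f32 values would be split into chunks of 1024, 1024, and 452 values.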

Structs§

ByteStreamSplitDecompressor
Byte Stream Split decompressor
ByteStreamSplitEncoder
Byte Stream Split encoder for floating point values