§Byte Stream Split (BSS) Miniblock Format
Byte Stream Split is a data transformation technique that improves compression by reorganizing multi-byte values to group bytes from the same position together. This is particularly effective for data where some byte positions have low entropy.
§How It Works
BSS splits multi-byte values by byte position, creating separate streams for each byte position across all values. This transformation is most beneficial when certain byte positions have low entropy (e.g., high-order bytes that are mostly zeros, sign-extended bytes, or floating-point sign/exponent bytes that cluster around common values).
§Example
Input data (f32): [1.0, 2.0, 3.0, 4.0]
In little-endian bytes:
- 1.0 = [00, 00, 80, 3F]
- 2.0 = [00, 00, 00, 40]
- 3.0 = [00, 00, 40, 40]
- 4.0 = [00, 00, 80, 40]
After BSS transformation:
- Byte stream 0: [00, 00, 00, 00] (all first bytes)
- Byte stream 1: [00, 00, 00, 00] (all second bytes)
- Byte stream 2: [80, 00, 40, 80] (all third bytes)
- Byte stream 3: [3F, 40, 40, 40] (all fourth bytes)
Output: [00, 00, 00, 00, 00, 00, 00, 00, 80, 00, 40, 80, 3F, 40, 40, 40]
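The transformation in this example can be sketched in a few lines of Rust. This is a minimal illustration, not this module's encoder: the function names `bss_encode_f32` and `bss_decode_f32` are hypothetical, and the sketch assumes the whole input is transformed as one unit with no chunking.

```rust
/// Hypothetical sketch: split `values` into one byte stream per byte
/// position (little-endian order) and concatenate the four streams.
fn bss_encode_f32(values: &[f32]) -> Vec<u8> {
    let n = values.len();
    let mut out = vec![0u8; n * 4];
    for (i, v) in values.iter().enumerate() {
        for (pos, byte) in v.to_le_bytes().iter().enumerate() {
            // Stream `pos` occupies out[pos * n .. (pos + 1) * n].
            out[pos * n + i] = *byte;
        }
    }
    out
}

/// Hypothetical inverse: gather byte `i` of each stream to rebuild value `i`.
/// Assumes `bytes.len()` is a multiple of 4.
fn bss_decode_f32(bytes: &[u8]) -> Vec<f32> {
    let n = bytes.len() / 4;
    (0..n)
        .map(|i| {
            f32::from_le_bytes([bytes[i], bytes[n + i], bytes[2 * n + i], bytes[3 * n + i]])
        })
        .collect()
}
```

Calling `bss_encode_f32(&[1.0, 2.0, 3.0, 4.0])` yields the 16 output bytes listed above, and `bss_decode_f32` applied to that output restores the original four values.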
§Compression Benefits
BSS itself does not compress data; it only reorders bytes. The benefit appears when BSS is combined with general-purpose compression (e.g., LZ4), which can exploit the long runs of similar bytes that the reordering produces:
- Timestamps: Sequential timestamps have similar high-order bytes
- Sensor data: Readings often vary in a small range, sharing exponent bits
- Financial data: Prices may cluster around certain values
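One rough way to see why the reordering helps a downstream compressor is to compare the longest run of identical bytes before and after the split. The sketch below uses that run length as a stand-in for compressibility; it is an illustrative proxy rather than a measurement of LZ4 output, and the helpers `bss_split` and `longest_run` are hypothetical names, not part of this module.

```rust
/// Hypothetical helper: regroup fixed-width values so that all bytes at
/// position 0 come first, then all bytes at position 1, and so on.
/// Assumes `width > 0`; trailing bytes of an incomplete value are dropped.
fn bss_split(bytes: &[u8], width: usize) -> Vec<u8> {
    let n = bytes.len() / width;
    let mut out = Vec::with_capacity(n * width);
    for pos in 0..width {
        out.extend(bytes.iter().skip(pos).step_by(width).take(n));
    }
    out
}

/// Length of the longest run of identical consecutive bytes.
fn longest_run(bytes: &[u8]) -> usize {
    let mut best = 0usize;
    let mut current = 0usize;
    let mut prev: Option<u8> = None;
    for &b in bytes {
        current = if prev == Some(b) { current + 1 } else { 1 };
        best = best.max(current);
        prev = Some(b);
    }
    best
}
```

For the f32 values `[1.0, 2.0, 3.0, 4.0]` from the example, the interleaved little-endian bytes contain a longest run of 3 identical bytes, while the split layout starts with a run of 8 zero bytes.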
§Supported Types
- 32-bit floating point (f32)
- 64-bit floating point (f64)
§Chunk Handling
- Maximum chunk size depends on data type:
  - f32: 1024 values (4KB per chunk)
  - f64: 512 values (4KB per chunk)
- All chunks share a single global buffer
- Every chunk except the last contains a power-of-2 number of values
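The chunk sizes above follow from dividing a 4KB budget by the value width. The sketch below works through that arithmetic under stated assumptions: the constant `CHUNK_BYTES` and the helpers `max_values_per_chunk` and `chunk_lengths` are hypothetical, and the sketch assumes every chunk except the last is filled to the maximum, which may not match how the encoder actually sizes chunks.

```rust
/// Assumed byte budget per chunk, taken from the sizes listed above (4KB).
const CHUNK_BYTES: usize = 4096;

/// Maximum number of values per chunk for values that are `width` bytes wide.
/// Gives 1024 for f32 (width 4) and 512 for f64 (width 8).
fn max_values_per_chunk(width: usize) -> usize {
    CHUNK_BYTES / width
}

/// Hypothetical chunking: split `num_values` into chunk lengths where every
/// chunk except possibly the last holds the maximum number of values.
fn chunk_lengths(num_values: usize, width: usize) -> Vec<usize> {
    let max = max_values_per_chunk(width);
    let mut lengths = Vec::new();
    let mut remaining = num_values;
    while remaining > 0 {
        let take = remaining.min(max);
        lengths.push(take);
        remaining -= take;
    }
    lengths
}
```

Under these assumptions, 2500 f32 values would be split into chunks of 1024, 1024, and 452 values; only the final chunk falls short of a power of 2.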
Structs§
- ByteStreamSplitDecompressor - Byte Stream Split decompressor
- ByteStreamSplitEncoder - Byte Stream Split encoder for floating point values
Functions§
- should_use_bss - Determine if BSS should be used based on mode and data characteristics