Crate streamvbyte64

source ·
Expand description

Fast Byte-Aligned Integer Coding

This crate is a Rust port of Daniel Lemire’s streamvbyte library. It contains multiple implementations of this format aimed at different integer sizes and value distributions.

Each Coder implementation produces different formats that are incompatible with one another. Names provide the length of each of the 4 possible tags for each value so Coder1234 encodes each entry as 1, 2, 3, or 4 bytes. A scalar implementation is always available at a large speed penalty but the implementation will automatically use an accelerated implementation for the target if available.

At the moment group implementations only have acceleration on little-endian aarch64 targets with NEON instruction support.

Example without delta-coding

use streamvbyte64::{Coder, Coder1234};

let coder = Coder1234::new();
let values = vec![
    0u32, 128, 256, 1024, 70, 36, 1000000,
    378, 45, 888, 26, 262144, 88, 89, 90, 16777216
];
let (tag_len, data_len) = Coder1234::max_compressed_bytes(values.len());
let mut encoded = vec![0u8; tag_len + data_len];
let (tags, data) = encoded.split_at_mut(tag_len);
// This is the number of bytes in data actually used. If you're writing the encoded data
// to an output you would write tags and &data[..encoded_data_len].
let encoded_data_len = coder.encode(&values, tags, data);

let mut decoded = vec![0u32; values.len()];
coder.decode(tags, &data[..encoded_data_len], &mut decoded);
assert_eq!(values, decoded);

// You can also use this a bit like an array using data_len() to skip whole groups
// at the cost of reading all the tags before the target group.
let mut group = [0u32; 4];
coder.decode(&tags[2..3], &data[coder.data_len(&tags[..2])..], &mut group);
assert_eq!(values[11], group[3]);

Example with delta coding

use streamvbyte64::{Coder, Coder1234};

let coder = Coder1234::new();
let mut sum = 0u32;
let values = [
    0u32, 128, 256, 1024, 70, 36, 1000000,
    378, 45, 888, 26, 262144, 88, 89, 90, 16777216
].iter().map(|x| {
    sum += x;
    sum
}).collect::<Vec<_>>();
let (tag_len, data_len) = Coder1234::max_compressed_bytes(values.len());
let mut encoded = vec![0u8; tag_len + data_len];
let (tags, data) = encoded.split_at_mut(tag_len);
let nodelta_data_len = coder.encode(&values, tags, data);
// encode_deltas() and decode_deltas() both accept an initial value that is subtracted from
// or added to all the value in the stream. At the beginning of the stream this is usually
// zero but may be non-zero if you are encoding/decoding in the middle of a stream.
let encoded_data_len = coder.encode_deltas(0, &values, tags, data);
// Fewer bytes are written with the delta encoding as we only record the distance between
// each value and not the value itself.
assert!(encoded_data_len < nodelta_data_len);

let mut decoded = vec![0u32; values.len()];
coder.decode_deltas(0, tags, &data[..encoded_data_len], &mut decoded);
assert_eq!(values, decoded);

// You can also use this a bit like an array using skip_deltas() to skip whole groups
// at a cost that is less than straight decoding.
let (skip_data_len, initial) = coder.skip_deltas(&tags[..2], data);
let mut group = [0u32; 4];
coder.decode_deltas(initial, &tags[2..3], &data[skip_data_len..], &mut group);
assert_eq!(values[11], group[3]);

Structs

  • Coder0124 packs 32-bit integers into lengths of 0, 1, 2, or 4 bytes.
  • Coder1234 packs 32-bit integers into lengths of 1, 2, 3, or 4 bytes.
  • Coder1248 packs 64-bit integers into lengths of 1, 2, 4, or 8 bytes.

Traits

  • Coder compresses and decompresses integers in a byte-aligned format compose of two streams.
  • Generic trait for primitive integers.
  • Performs addition that wraps around on overflow.
  • Performs subtraction that wraps around on overflow.