Crate streamvbyte64
source ·Expand description
Fast Byte-Aligned Integer Coding
This crate is a Rust port of Daniel Lemire’s streamvbyte
library.
It contains multiple implementations of this format aimed at different integer sizes and value distributions.
Each Coder
implementation produces different formats that are incompatible with one another. Names
provide the length of each of the 4 possible tags for each value so Coder1234
encodes each entry
as 1, 2, 3, or 4 bytes. A scalar implementation is always available at a large speed penalty but
the implementation will automatically use an accelerated implementation for the target if available.
At the moment group implementations only have acceleration on little-endian aarch64
targets with
NEON
instruction support.
Example without delta-coding
use streamvbyte64::{Coder, Coder1234};
let coder = Coder1234::new();
let values = vec![
0u32, 128, 256, 1024, 70, 36, 1000000,
378, 45, 888, 26, 262144, 88, 89, 90, 16777216
];
let (tag_len, data_len) = Coder1234::max_compressed_bytes(values.len());
let mut encoded = vec![0u8; tag_len + data_len];
let (tags, data) = encoded.split_at_mut(tag_len);
// This is the number of bytes in data actually used. If you're writing the encoded data
// to an output you would write tags and &data[..encoded_data_len].
let encoded_data_len = coder.encode(&values, tags, data);
let mut decoded = vec![0u32; values.len()];
coder.decode(tags, &data[..encoded_data_len], &mut decoded);
assert_eq!(values, decoded);
// You can also use this a bit like an array using data_len() to skip whole groups
// at the cost of reading all the tags before the target group.
let mut group = [0u32; 4];
coder.decode(&tags[2..3], &data[coder.data_len(&tags[..2])..], &mut group);
assert_eq!(values[11], group[3]);
Example with delta coding
use streamvbyte64::{Coder, Coder1234};
let coder = Coder1234::new();
let mut sum = 0u32;
let values = [
0u32, 128, 256, 1024, 70, 36, 1000000,
378, 45, 888, 26, 262144, 88, 89, 90, 16777216
].iter().map(|x| {
sum += x;
sum
}).collect::<Vec<_>>();
let (tag_len, data_len) = Coder1234::max_compressed_bytes(values.len());
let mut encoded = vec![0u8; tag_len + data_len];
let (tags, data) = encoded.split_at_mut(tag_len);
let nodelta_data_len = coder.encode(&values, tags, data);
// encode_deltas() and decode_deltas() both accept an initial value that is subtracted from
// or added to all the value in the stream. At the beginning of the stream this is usually
// zero but may be non-zero if you are encoding/decoding in the middle of a stream.
let encoded_data_len = coder.encode_deltas(0, &values, tags, data);
// Fewer bytes are written with the delta encoding as we only record the distance between
// each value and not the value itself.
assert!(encoded_data_len < nodelta_data_len);
let mut decoded = vec![0u32; values.len()];
coder.decode_deltas(0, tags, &data[..encoded_data_len], &mut decoded);
assert_eq!(values, decoded);
// You can also use this a bit like an array using skip_deltas() to skip whole groups
// at a cost that is less than straight decoding.
let (skip_data_len, initial) = coder.skip_deltas(&tags[..2], data);
let mut group = [0u32; 4];
coder.decode_deltas(initial, &tags[2..3], &data[skip_data_len..], &mut group);
assert_eq!(values[11], group[3]);
Structs
Coder0124
packs 32-bit integers into lengths of 0, 1, 2, or 4 bytes.Coder1234
packs 32-bit integers into lengths of 1, 2, 3, or 4 bytes.Coder1248
packs 64-bit integers into lengths of 1, 2, 4, or 8 bytes.
Traits
Coder
compresses and decompresses integers in a byte-aligned format compose of two streams.- Generic trait for primitive integers.
- Performs addition that wraps around on overflow.
- Performs subtraction that wraps around on overflow.