svb

Pure-Rust StreamVByte covering all major codec variants for u16, u32, and u64 integers. Delta and zigzag encoding are composable layers on top. SIMD back-ends are available for x86-64 (SSSE3, AVX2) and AArch64 (NEON).

Documentation | API reference

StreamVByte stores each integer in the minimum number of bytes its value requires — 1, 2, 3, or 4 bytes for a u32; 1 or 2 for a u16 — and keeps the per-integer width metadata (the control stream) separate from the integer bytes (the data stream). That two-stream layout is what makes SIMD decode fast: a single shuffle instruction can unpack 4–8 values at once, once the widths are known.

Delta encoding replaces each value with its difference from the previous one. For sequences where adjacent values are close — sorted data, slowly-drifting measurements, oscillating signals — the differences are much smaller than the raw values. Smaller values encode to fewer bytes.

Zigzag encoding maps signed integers to unsigned so that small absolute values stay small: 0→0, −1→1, 1→2, −2→3, 2→4. This matters when the data has signed deltas: without zigzag, a delta of −1 would encode as 4 bytes (0xFFFFFFFF) rather than 1. With zigzag it encodes as a single byte (0x01).

The three compose naturally: delta shrinks value magnitudes, zigzag keeps the result non-negative and compact, and StreamVByte encodes each small value as efficiently as possible. See the encoding guide for a full walkthrough.

Codec variants

Variant	Element	Byte widths	Notes
`Svb16`	`u16`	1/2	ONT VBZ format
`U32Classic`	`u32`	1/2/3/4	Lemire / C library compatible
`U32Variant0124`	`u32`	0/1/2/4	Better compression for sparse data
`U64Coder1234`	`u64`	1/2/3/4	Values up to `u32::MAX`
`U64Coder1248`	`u64`	1/2/4/8	Full u64 range

Installation

[dependencies]
svb = { version = "0.1", features = ["simd-auto"] }

Quick start

use svb::u32::U32Classic;

let values: Vec<u32> = vec![1, 500, 70_000, 16_000_000];
let encoded = U32Classic.encode(&values);
let decoded = U32Classic.decode(&encoded, values.len()).unwrap();
assert_eq!(decoded, values);

For the VBZ pipeline (Oxford Nanopore POD5 signal data):

use svb::{encode_vbz, decode_vbz};

let samples: Vec<i16> = vec![100, 101, 103, 102, 98];
let encoded = encode_vbz(&samples);
let decoded = decode_vbz(&encoded, samples.len()).unwrap();
assert_eq!(decoded, samples);

It's also pretty damn fast

Benchmarked with simd-auto on an Intel i7-11800H (AVX2), 8192-element slices:

Benchmark	svb	streamvbyte64
Svb16 encode	4.91 GB/s	—
Svb16 decode	4.51 GB/s	—
VBZ encode (delta + zigzag + SVB16)	3.14 GB/s	—
VBZ decode (3-pass)	1.88 GB/s	—
VBZ decode fused (single SIMD pass)	2.77 GB/s	—
VBZ2 decode fused (2-chain, single thread)	3.00 GB/s	—
U32Classic decode	4.07 GB/s	1.67 GB/s
U32Classic encode	2.08 GB/s	1.09 GB/s
U64Coder1248 decode	1.90 GB/s	1.32 GB/s
U64Coder1248 encode	1.25 GB/s	0.73 GB/s

VBZ is ~2.5x slower than SVB16 alone. Breaking down the pipeline (8192 i16 elements):

Stage	encode	decode
delta	11.02 GB/s	3.75 GB/s
zigzag	18.75 GB/s	14.83 GB/s
SVB16	4.91 GB/s	4.51 GB/s
VBZ combined (3-pass)	3.14 GB/s	1.88 GB/s
VBZ fused decode	—	2.77 GB/s

Around 2x faster on average than streamvbyte64 across all variants and sizes (range: 1.4x–2.7x). Full stage-by-stage breakdowns, fused decoder analysis, and VBZ-K parallel decode numbers are in the Performance docs.

If you run the benchmarks on another system (especially ARM with NEON) I'd love to see the results. Run:

cargo bench --features simd-auto

and open an issue or drop the output in.

MSRV

1.87 (edition 2024; SIMD intrinsics require target_feature_11, stabilised in 1.87).

svb 0.1.0

svb

Codec variants

Installation

Quick start

It's also pretty damn fast

MSRV

License