stream_vbyte/
lib.rs

1//! Encode `u32`s to bytes and decode them back again with the Stream VByte format.
2//!
3//! To encode all your numbers to a `&[u8]`, or decode all your bytes to a `&[u32]`, see `encode()`
4//! and `decode()` respectively. For more sophisticated decoding functionality, see `DecodeCursor`.
5//!
6//! There are two traits, `Encoder` and `Decoder`, that allow you to choose what logic to use in the
7//! inner hot loops.
8//!
9//! A terminology note - Stream VByte groups encoded numbers into clusters of four, which are
10//! referred to as "quads" in this project.
11//!
12//! # The simple, pretty fast way
13//!
14//! Use `Scalar` for your `Encoder` and `Decoder`. It will work on all hardware, and is fast enough
15//! that most people will probably never notice the time taken to encode/decode.
16//!
17//! # The more complex, really fast way
18//!
19//! If you can use nightly Rust (currently needed for SIMD) and you know which hardware you'll be
20//! running on, or you can add runtime detection of CPU features, you can choose to use an
21//! implementation that takes advantage of your hardware. Something like
22//! [raw-cpuid](https://crates.io/crates/raw-cpuid) or [auxv](https://crates.io/crates/auxv) will
23//! probably be useful for runtime CPU feature detection.
24//!
25//! Performance numbers are calculated on an E5-1650v3 on encoding/decoding 1 million random numbers
26//! at a time. You can run the benchmarks yourself to see how your hardware does.
27//!
28//! Both `feature`s and `target_feature`s are used because `target_feature` is not in stable Rust
29//! yet and this library should remain usable by stable Rust, so non-stable-friendly things are
30//! hidden behind `feature`s.
31//!
32//! ## Encoders
33//!
34//! | Type           | Performance    | Hardware                                 | `target_feature` | `feature`   |
35//! | -------------- | ---------------| ---------------------------------------- | ---------------- | ----------- |
36//! | `Scalar`       | ≈140 million/s | All                                      | none             | none        |
37//! | `x86::Sse41`   | ≈1 billion/s   | x86 with SSE4.1 (Penryn and above, 2008) | `sse4.1`         | `x86_sse41` |
38//!
39//! ## Decoders
40//!
41//! | Type           | Performance    | Hardware                                   | `target_feature` | `feature`   |
42//! | -------------- | ---------------| ------------------------------------------ | ---------------- | ----------- |
43//! | `Scalar`       | ≈140 million/s | All                                        | none             | none        |
44//! | `x86::Ssse3`   | ≈2.7 billion/s | x86 with SSSE3 (Woodcrest and above, 2006) | `ssse3`          | `x86_ssse3` |
45//!
46//! If you have a modern x86 and you want to use the all SIMD accelerated versions, you would use
47//! `target_feature` in a compiler invocation like this:
48//!
49//! ```sh
50//! RUSTFLAGS='-C target-feature=+ssse3,+sse4.1' cargo ...
51//! ```
52//!
53//! Meanwhile, `feature`s for your dependency on this crate are specified
54//! [in your project's Cargo.toml](http://doc.crates.io/manifest.html#the-features-section).
55//!
56//! # Examples
57//!
58//! Encode some numbers to bytes, then decode them in different ways.
59//!
60//! ```
61//! use stream_vbyte::{
62//!     encode::encode,
63//!     decode::{decode, cursor::DecodeCursor},
64//!     scalar::Scalar
65//! };
66//!
67//! let nums: Vec<u32> = (0..12_345).collect();
68//! let mut encoded_data = Vec::new();
69//! // make some space to encode into
70//! encoded_data.resize(5 * nums.len(), 0x0);
71//!
72//! // use Scalar implementation that works on any hardware
73//! let encoded_len = encode::<Scalar>(&nums, &mut encoded_data);
74//! println!("Encoded {} u32s into {} bytes", nums.len(), encoded_len);
75//!
76//! // decode all the numbers at once
77//! let mut decoded_nums = Vec::new();
78//! decoded_nums.resize(nums.len(), 0);
79//! let bytes_decoded = decode::<Scalar>(&encoded_data, nums.len(), &mut decoded_nums);
80//! assert_eq!(nums, decoded_nums);
81//! assert_eq!(encoded_len, bytes_decoded);
82//!
83//! // or maybe you want to skip some of the numbers while decoding
84//! decoded_nums.clear();
85//! decoded_nums.resize(nums.len(), 0);
86//! let mut cursor = DecodeCursor::new(&encoded_data, nums.len());
87//! cursor.skip(10_000);
88//! let count = cursor.decode_slice::<Scalar>(&mut decoded_nums);
89//! assert_eq!(12_345 - 10_000, count);
90//! assert_eq!(&nums[10_000..], &decoded_nums[0..count]);
91//! assert_eq!(encoded_len, cursor.input_consumed());
92//! ```
93//!
94//! # Panics
95//!
96//! If you use undersized slices (e.g. encoding 10 numbers into 5 bytes), you will get the normal
97//! slice bounds check panics.
98//!
99//! # Safety
100//!
101//! SIMD code uses unsafe internally because many of the SIMD intrinsics are unsafe. However, SIMD
102//! intrinsics are used only on appropriately sized slices to essentially manually apply
103//! slice index checking before use.
104//!
105//! Since this is human-maintained code, it could do the bounds checking incorrectly, of course. To
106//! mitigate those risks, there are various forms of randomized testing in the test suite to shake
107//! out any lurking bugs.
108//!
109//! The `Scalar` codec does not use unsafe.
110
111#![cfg_attr(
112    any(feature = "x86_ssse3", feature = "x86_sse41"),
113    feature(portable_simd)
114)]
115
116mod tables;
117
118pub mod decode;
119pub mod encode;
120
121pub mod scalar;
122pub mod x86;
123
124#[cfg(test)]
125mod random_varint;
126
127#[derive(Debug, PartialEq)]
128struct EncodedShape {
129    control_bytes_len: usize,
130    complete_control_bytes_len: usize,
131    leftover_numbers: usize,
132}
133
134fn encoded_shape(count: usize) -> EncodedShape {
135    EncodedShape {
136        control_bytes_len: (count + 3) / 4,
137        complete_control_bytes_len: count / 4,
138        leftover_numbers: count % 4,
139    }
140}
141
142fn cumulative_encoded_len(control_bytes: &[u8]) -> usize {
143    // sum could only overflow with an invalid encoding because the sum can be no larger than
144    // the complete length of the encoded data, which fits in a usize
145    control_bytes
146        .iter()
147        .map(|&b| tables::DECODE_LENGTH_PER_QUAD_TABLE[b as usize] as usize)
148        .sum()
149}
150
151#[cfg(test)]
152mod tests;