Crate dsi_bitstream


§dsi-bitstream


A Rust implementation of bit streams supporting several types of instantaneous codes for compression.

This library mimics the behavior of the analogous classes in the DSI Utilities, but it aims at being much more flexible and (hopefully) efficient.

The two main traits are BitRead and BitWrite, whose main implementations are BufBitReader and BufBitWriter, respectively. Additional traits make it possible to read and write instantaneous codes, like the exponential Golomb codes used in H.264 (MPEG-4) and H.265.

use dsi_bitstream::prelude::*;
// To write a bit stream, we need first a WordWrite around an output backend
// (in this case, a vector), which is word-based for efficiency.
// It could be a file, etc.
#[cfg(feature = "alloc")]
let mut word_write = MemWordWriterVec::new(Vec::<u64>::new());
#[cfg(not(feature = "alloc"))]
let mut word_write = MemWordWriterSlice::new([0_u64; 10]);
// Let us create a little-endian bit writer. The write word size will be inferred.
let mut writer = BufBitWriter::<LE, _>::new(word_write);
// Write 0 using 10 bits
writer.write_bits(0, 10)?;
// Write 0 in unary code
writer.write_unary(0)?;
// Write 1 in γ code
writer.write_gamma(1)?;
// Write 2 in δ code
writer.write_delta(2)?;
writer.flush()?;

// Let's recover the data
let data = writer.into_inner()?.into_inner();

// Reading back the data is similar, but since a reader has a bit buffer
// twice as large as the read word size, it is more efficient to use a
// u32 as read word, so we need to reinterpret the data.
let data = unsafe { data.align_to::<u32>().1 };
let mut reader = BufBitReader::<LE, _>::new(MemWordReader::new_inf(data));
assert_eq!(reader.read_bits(10)?, 0);
assert_eq!(reader.read_unary()?, 0);
assert_eq!(reader.read_gamma()?, 1);
assert_eq!(reader.read_delta()?, 2);

In this case, the backend is already word-based, but if you have a byte-based backend such as a file, WordAdapter can be used to adapt it to a word-based backend.

You can also use references to backends instead of owned values, but this approach is less efficient:

use dsi_bitstream::prelude::*;
#[cfg(feature = "alloc")]
let mut word_write = MemWordWriterVec::new(Vec::<u64>::new());
#[cfg(not(feature = "alloc"))]
let mut word_write = MemWordWriterSlice::new([0_u64; 10]);
let mut writer = BufBitWriter::<LE, _>::new(word_write);
writer.write_bits(0, 10)?;
writer.write_unary(0)?;
writer.write_gamma(1)?;
writer.write_delta(2)?;
writer.flush()?;

// Let's recover the data
let data = writer.into_inner()?.into_inner();

// As in the example above, convert to u32 for better read performance
let data = unsafe { data.align_to::<u32>().1 };
let mut reader = BufBitReader::<LE, _>::new(MemWordReader::new_inf(&data));
assert_eq!(reader.read_bits(10)?, 0);
assert_eq!(reader.read_unary()?, 0);
assert_eq!(reader.read_gamma()?, 1);
assert_eq!(reader.read_delta()?, 2);

Please read the documentation of the traits module and the impls module for more details.
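
To make the codes used in the examples above concrete, the following standalone sketch prints the bit patterns of unary, γ, and δ codes. It uses only the standard library and is not the crate's implementation: the function names are hypothetical, bits are shown as a logical string in stream order (ignoring endianness and word packing), and it assumes the convention that the unary code of n is n zeros followed by a one, and that γ and δ encode n + 1 so that zero is representable.

```rust
/// Appends the `count` lowest bits of `value` to `out`, most significant first.
fn push_low_bits(out: &mut String, value: u64, count: u64) {
    for i in (0..count).rev() {
        out.push(if (value >> i) & 1 == 1 { '1' } else { '0' });
    }
}

/// Unary code of `n` (assumed convention): `n` zeros followed by a one.
fn unary(n: u64) -> String {
    let mut s = "0".repeat(n as usize);
    s.push('1');
    s
}

/// γ code of `n`: with m = n + 1 and λ = floor(log2(m)),
/// the unary code of λ followed by the λ lowest bits of m.
/// Assumes n < u64::MAX so that n + 1 does not overflow.
fn gamma(n: u64) -> String {
    let m = n + 1;
    let lambda = u64::from(m.ilog2());
    let mut s = unary(lambda);
    push_low_bits(&mut s, m, lambda);
    s
}

/// δ code of `n`: like γ, but λ itself is written in γ code instead of unary.
fn delta(n: u64) -> String {
    let m = n + 1;
    let lambda = u64::from(m.ilog2());
    let mut s = gamma(lambda);
    push_low_bits(&mut s, m, lambda);
    s
}
```

Under these assumptions, gamma(1) yields 010 and delta(2) yields 0101, which are the values written with write_gamma(1) and write_delta(2) in the examples above. No codeword is a prefix of another, which is what makes the codes instantaneous.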

§Options

There are a few options to modify the behavior of the bit read/write traits:

Endianness can be selected using the BE or LE types as the first parameter. The native endianness is usually the best choice, albeit sometimes the lack of some low-level instructions (first bit set, last bit, etc.) may make the non-native endianness more efficient.
Data is read from or written to the backend one word at a time. The word size can be selected using the second parameter, but it must match the word size of the backend, so it is usually inferred. Currently, we suggest usize for writing and a type half the size of usize for reading.
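
The remark about low-level instructions can be illustrated with unary decoding, which amounts to counting the zeros that precede the first one. The sketch below is a standalone illustration, not the crate's code: the function names are hypothetical, and it assumes that a big-endian stream places its first bit in the most significant bit of a word while a little-endian stream places it in the least significant bit.

```rust
/// Decodes a unary code whose first bit is the most significant bit of `word`
/// (assumed big-endian layout). Returns `None` if the terminating one is not
/// contained in this word.
fn read_unary_be(word: u64) -> Option<u32> {
    if word == 0 {
        None
    } else {
        // Counting zeros from the top maps to a "count leading zeros" instruction.
        Some(word.leading_zeros())
    }
}

/// Decodes a unary code whose first bit is the least significant bit of `word`
/// (assumed little-endian layout).
fn read_unary_le(word: u64) -> Option<u32> {
    if word == 0 {
        None
    } else {
        // Counting zeros from the bottom maps to a "count trailing zeros" instruction.
        Some(word.trailing_zeros())
    }
}
```

If a target provides only one of the two counting instructions natively, the bit order that relies on the other may be slower, which is one reason the non-native endianness can occasionally win.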

More in-depth (and much more complicated) tuning can be obtained by modifying the default values for the parameters of instantaneous codes. Methods reading or writing instantaneous codes are defined in supporting traits and usually have const parameters, in particular, whether to use decoding tables or not (e.g., GammaReadParam::read_gamma_param). Such traits are implemented for BitRead/BitWrite. The only exception is unary code, which is implemented by BitRead::read_unary and BitWrite::write_unary.
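
As a rough illustration of what a const parameter selecting table-based decoding can look like, here is a toy γ decoder over a 64-bit window. Everything in it is an assumption made for the sketch: the names, the 4-bit table, and the window layout (first stream bit in the most significant bit) are hypothetical and do not reflect the crate's actual tables or signatures.

```rust
const TABLE_BITS: u32 = 4; // arbitrary size chosen for this sketch
const TABLE_SIZE: usize = 16; // 2^TABLE_BITS

/// Decodes a γ code starting at the most significant bit of `window`.
/// Returns (value, code length in bits), or `None` if no code fits.
fn decode_gamma_bitwise(window: u64) -> Option<(u64, u32)> {
    let lambda = window.leading_zeros();
    if lambda >= 32 {
        return None; // a code of 2λ + 1 bits would not fit in 64 bits
    }
    // The λ + 1 bits starting at the terminating one of the unary part are n + 1.
    let m = (window << lambda) >> (63 - lambda);
    Some((m - 1, 2 * lambda + 1))
}

/// Precomputes (value, length) for every TABLE_BITS-bit prefix that contains
/// a complete code; length 0 marks prefixes the table cannot resolve.
fn build_gamma_table() -> [(u8, u8); TABLE_SIZE] {
    let mut table = [(0u8, 0u8); TABLE_SIZE];
    for (prefix, entry) in table.iter_mut().enumerate() {
        let window = (prefix as u64) << (64 - TABLE_BITS);
        if let Some((value, len)) = decode_gamma_bitwise(window) {
            if len <= TABLE_BITS {
                *entry = (value as u8, len as u8);
            }
        }
    }
    table
}

/// The const parameter chooses at compile time whether the table is consulted;
/// codes longer than TABLE_BITS fall back to bitwise decoding.
fn decode_gamma<const USE_TABLE: bool>(
    window: u64,
    table: &[(u8, u8); TABLE_SIZE],
) -> Option<(u64, u32)> {
    if USE_TABLE {
        let (value, len) = table[(window >> (64 - TABLE_BITS)) as usize];
        if len != 0 {
            return Some((u64::from(value), u32::from(len)));
        }
    }
    decode_gamma_bitwise(window)
}
```

Because USE_TABLE is a const parameter, the compiler can remove the unused branch entirely, so the choice costs nothing at run time. Both decode_gamma::<true> and decode_gamma::<false> return the same result; only the decoding path differs.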

However, there are traits with non-parametric methods (e.g., GammaRead::read_gamma) that are the standard entry points for the user. These traits are implemented for BufBitReader/BufBitWriter depending on a selector type implementing ReadParams/WriteParams, respectively. The default value for the parameter is DefaultReadParams/DefaultWriteParams, which uses choices we tested on several platforms and that we believe are good defaults, but by passing a different implementation of ReadParams/WriteParams you can change the default behavior. See params for more details.
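
The selector-type pattern described here can be sketched in a few lines. This is a simplified model under stated assumptions, not the crate's definitions: ToyParams, DefaultToyParams, NoTables, and ToyReader are hypothetical names, and the methods return the name of the chosen strategy instead of decoding anything.

```rust
use std::marker::PhantomData;

/// Hypothetical selector trait: each implementor fixes compile-time choices.
trait ToyParams {
    const USE_GAMMA_TABLE: bool;
}

/// Stand-in for a set of defaults.
struct DefaultToyParams;
impl ToyParams for DefaultToyParams {
    const USE_GAMMA_TABLE: bool = true;
}

/// An alternative selector that disables tables.
struct NoTables;
impl ToyParams for NoTables {
    const USE_GAMMA_TABLE: bool = false;
}

/// A reader carrying the selector as a type parameter with a default.
struct ToyReader<P: ToyParams = DefaultToyParams>(PhantomData<P>);

impl<P: ToyParams> ToyReader<P> {
    fn new() -> Self {
        ToyReader(PhantomData)
    }

    /// Parametric method: the caller picks the strategy explicitly.
    fn read_gamma_param<const USE_TABLE: bool>(&self) -> &'static str {
        if USE_TABLE { "table" } else { "bitwise" }
    }

    /// Non-parametric entry point: forwards to the parametric method
    /// using the constant supplied by the selector type.
    fn read_gamma(&self) -> &'static str {
        if P::USE_GAMMA_TABLE {
            self.read_gamma_param::<true>()
        } else {
            self.read_gamma_param::<false>()
        }
    }
}
```

With this layout, a value of type ToyReader (using the default selector) reports "table" from read_gamma, while ToyReader::<NoTables> reports "bitwise"; user code calling read_gamma does not change when the selector does.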

Finally, if you choose to use tables, the size of the tables is hardwired in the source code (in particular, in the files *_tables.rs in the codes source directory) and can be changed only by regenerating the tables using the script gen_code_tables.py in the python directory. You will need to modify the values hardwired at the end of the script.

§Dispatching

We provide several options to dispatch codes dynamically.
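
As a generic illustration of what choosing a code at run time involves, the sketch below computes code lengths for a code selected by an enum value, either by matching on every call or by resolving a function pointer once. It is not the crate's dispatch API: ToyCode and the helper functions are hypothetical, and the lengths assume the usual definitions in which γ and δ encode n + 1.

```rust
/// Hypothetical run-time code selector.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyCode {
    Unary,
    Gamma,
    Delta,
}

/// Unary: n zeros plus a terminating one.
fn len_unary(n: u64) -> u64 {
    n + 1
}

/// γ: λ in unary (λ + 1 bits) plus λ further bits, with λ = floor(log2(n + 1)).
fn len_gamma(n: u64) -> u64 {
    2 * u64::from((n + 1).ilog2()) + 1
}

/// δ: λ in γ code plus λ further bits.
fn len_delta(n: u64) -> u64 {
    let lambda = u64::from((n + 1).ilog2());
    lambda + len_gamma(lambda)
}

impl ToyCode {
    /// Dispatch by matching on every call.
    fn len(self, n: u64) -> u64 {
        match self {
            ToyCode::Unary => len_unary(n),
            ToyCode::Gamma => len_gamma(n),
            ToyCode::Delta => len_delta(n),
        }
    }

    /// Dispatch once, then reuse the resolved function pointer.
    fn resolve(self) -> fn(u64) -> u64 {
        match self {
            ToyCode::Unary => len_unary,
            ToyCode::Gamma => len_gamma,
            ToyCode::Delta => len_delta,
        }
    }
}
```

Resolving the function pointer up front moves the selection out of the inner loop, at the price of an indirect call per value; which variant is faster depends on the workload and the hardware.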

§Benchmarks

To evaluate the performance on your hardware, you can run the benchmarks in the benches directory, which test the speed of read/write operations under several combinations of parameters.

Full table-size sweeps with plot generation are available via the Python scripts in the python directory (see benches/README.md for details). They can be used to decide whether to use tables on specific hardware, or to generate tables of different lengths. The svg directory contains reference results of these benchmarks on a few architectures.

§Features

checks: enables additional runtime checks on some parameters (in particular, that a written value fits within the provided bit width).
std (default): enables standard library support, including WordAdapter and convenience functions such as from_path. Implies alloc.
alloc: enables heap allocation without full std (e.g., MemWordWriterVec). This feature is sufficient for no_std environments with a global allocator.
mem_dbg (default): derives MemDbg and MemSize from the mem_dbg crate on most structs, making it possible to inspect their heap memory usage.
serde (default): enables serde serialization and deserialization support for Codes and CodesStats.
implied: enables the implied module in utils, which provides sample_implied_distribution and get_implied_distribution. This feature pulls in the rand dependency and implies alloc.

§Testing

Besides unit tests, we provide zipped precomputed corpora generated by fuzzing. You can run the tests on the zipped precomputed corpora by enabling the fuzz feature:

cargo test --features fuzz

When the feature is enabled, tests will also be run on local corpora found in the top-level fuzz directory, if any are present.

§Acknowledgments

This software has been partially supported by project SERICS (PE00000014) under the NRRP MUR program funded by the EU - NGEU, and by project ANR COREGRAPHIE, grant ANR-20-CE23-0002 of the French Agence Nationale de la Recherche. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the Italian MUR. Neither the European Union nor the Italian MUR can be held responsible for them.

Modules§

codes: Traits for reading and writing instantaneous codes.
dispatch: Programmable static and dynamic dispatch for codes.
impls: Implementations of bit and word (seekable) streams.
prelude
traits: Traits for operating on streams of bits.
utils: Helpers and statistics.