//! Entropy Coding Primitives for Research and Production
//!
//! The `constriction` crate provides a set of composable entropy coding algorithms with a
//! focus on correctness, versatility, ease of use, compression performance, and
//! computational efficiency. The goals of `constriction` are three-fold:
//!
//! 1. **to facilitate research on novel lossless and lossy compression methods** by
//! providing a *composable* set of entropy coding primitives rather than a rigid
//! implementation of a single preconfigured method;
//! 2. **to simplify the transition from research code to production software** by exposing
//! the exact same functionality via both a Python API (for rapid prototyping on research
//! code) and a Rust API (for turning successful prototypes into production); and
//! 3. **to serve as a teaching resource** by providing a wide range of entropy coding
//! algorithms within a single consistent framework, thus making the various algorithms
//! easily discoverable and comparable on example models and data. [Additional teaching
//! material](https://robamler.github.io/teaching/compress21/) is being made publicly
//! available as a by-product of an ongoing university course on data compression with
//! deep probabilistic models.
//!
//! For an example of a compression codec that started as research code in Python and was
//! then deployed as a fast and dependency-free WebAssembly module using `constriction`'s
//! Rust API, have a look at [The Linguistic Flux
//! Capacitor](https://robamler.github.io/linguistic-flux-capacitor).
//!
//! # Project Status
//!
//! We currently provide implementations of the following entropy coding algorithms:
//!
//! - **Asymmetric Numeral Systems (ANS):** a fast modern entropy coder with near-optimal
//! compression effectiveness that supports advanced use cases like bits-back coding.
//! - **Range Coding:** a computationally efficient variant of Arithmetic Coding that has
//! essentially the same compression effectiveness as ANS Coding but operates as a queue
//! ("first in first out"), which makes it preferable for autoregressive models.
//! - **Chain Coding:** an experimental new entropy coder that combines the (net)
//! effectiveness of stream codes with the locality of symbol codes; it is intended for
//! new compression approaches that perform joint inference, quantization, and bits-back
//! coding in an end-to-end optimization. This experimental coder is mainly provided to
//! prove to ourselves that the encoding and decoding API, which is shared across all
//! stream coders, is flexible enough to express such complex novel tasks.
//! - **Huffman Coding:** a well-known symbol code, mainly provided here for teaching
//! purposes; you'll usually want to use a stream code like ANS or Range Coding instead
//! since symbol codes can have a considerable overhead on the bitrate, especially in the
//! regime of low entropy per symbol, which is common in machine-learning based
//! compression methods.
//!
//! Further, `constriction` provides implementations of common probability distributions in
//! fixed-point arithmetic, which can be used as entropy models in any of the above stream
//! codes. The crate also provides adapters for turning custom probability distributions
//! into exactly invertible fixed-point arithmetic.
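//!
//! For example, the following sketch (using APIs that also appear in the examples further
//! below; the concrete numbers are made up) shows the two typical ways of obtaining such a
//! fixed-point entropy model:
//!
//! ```
//! use constriction::stream::model::{
//!     DefaultContiguousCategoricalEntropyModel, DefaultLeakyQuantizer,
//! };
//! use probability::distribution::Gaussian;
//!
//! // Either quantize a continuous distribution (here: a Gaussian from the `probability`
//! // crate) to a fixed-point model over the integer symbols -100..=100 ...
//! let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//! let gaussian_model = quantizer.quantize(Gaussian::new(10.3, 7.8));
//!
//! // ... or build a categorical model directly from floating-point probabilities:
//! let probabilities = vec![0.1, 0.3, 0.4, 0.2];
//! let categorical_model = DefaultContiguousCategoricalEntropyModel
//!     ::from_floating_point_probabilities(&probabilities).unwrap();
//! ```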
//!
//! The provided implementations of entropy coding algorithms and probability distributions
//! are extensively tested and should be considered reliable (except for the still
//! experimental Chain Coder). However, their APIs may change in future versions of
//! `constriction` if user feedback reveals shortcomings of the current APIs in
//! terms of ergonomics. Please [file an
//! issue](https://github.com/bamler-lab/constriction/issues) if you run into a scenario
//! where the current APIs are suboptimal.
//!
//! # Quick Start With the Rust API
//!
//! You are currently reading the documentation of `constriction`'s Rust API. If Rust is not
//! your language of choice then head over to the [Python API
//! Documentation](https://bamler-lab.github.io/constriction/apidoc/python/). The Rust API
//! provides efficient and composable entropy coding primitives that can be adjusted to a
//! fine degree of detail using type parameters and const generics (type aliases with sane
//! defaults for all generic parameters are provided as guidance). The Python API exposes
//! the most common use cases of these entropy coding primitives to an environment that
//! feels more natural to many data scientists.
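//!
//! For a taste of this flexibility, here is a small sketch (assuming the
//! `AnsCoder<Word, State>` type parameters documented in the [`stream::stack`] module)
//! that instantiates an ANS coder with a smaller word and state size than the `u32`/`u64`
//! defaults used by the `DefaultAnsCoder` type alias:
//!
//! ```
//! use constriction::stream::stack::AnsCoder;
//!
//! // An ANS coder that emits `u16` words and keeps a `u32` internal state:
//! let coder = AnsCoder::<u16, u32>::new();
//! assert!(coder.is_empty());
//! ```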
//!
//! ## Setup
//!
//! To use `constriction` in your Rust project, just add the following line to the
//! `[dependencies]` section of your `Cargo.toml`:
//!
//! ```toml
//! [dependencies]
//! constriction = "0.3.0"
//! ```
//!
//! ## System Requirements
//!
//! `constriction` requires Rust version 1.51 or later for its use of the
//! `min_const_generics` feature. If you have an older version of Rust, update to the latest
//! version by running `rustup update stable`.
//!
//! ## Encoding Example
//!
//! In this example, we'll encode some symbols using a quantized Gaussian distribution as
//! entropy model. Each symbol will be modeled by a quantized Gaussian with a different
//! mean and standard deviation (so that the example is not too simplistic). We'll use the
//! `probability` crate for the Gaussian distributions, so also add the following dependency
//! to your `Cargo.toml`:
//!
//! ```toml
//! probability = "0.17"
//! ```
//!
//! Now, let's encode (i.e., compress) some symbols. We'll use an Asymmetric Numeral Systems
//! (ANS) Coder here for its speed and compression performance. We'll discuss how you could
//! replace the ANS Coder with a Range Coder or a symbol code like Huffman Coding
//! [below](#exercise).
//!
//! ```
//! use constriction::stream::{stack::DefaultAnsCoder, model::DefaultLeakyQuantizer};
//! use probability::distribution::Gaussian;
//!
//! fn encode_sample_data() -> Vec<u32> {
//!     // Create an empty ANS Coder with default word and state size:
//!     let mut coder = DefaultAnsCoder::new();
//!
//!     // Some made-up data and entropy models for demonstration purposes:
//!     let symbols = [23i32, -15, 78, 43, -69];
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!
//!     // Create an adapter that integrates 1-d probability density functions over bins
//!     // `[n - 0.5, n + 0.5)` for all integers `n` from `-100` to `100` using fixed-point
//!     // arithmetic with default precision, guaranteeing a nonzero probability for each bin:
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Encode the data (in reverse order, since ANS Coding operates as a stack):
//!     coder.encode_symbols_reverse(
//!         symbols.iter().zip(&means).zip(&stds).map(
//!             |((&sym, &mean), &std)| (sym, quantizer.quantize(Gaussian::new(mean, std)))
//!     )).unwrap();
//!
//!     // Retrieve the compressed representation (filling it up to full words with zero bits).
//!     coder.into_compressed().unwrap()
//! }
//!
//! assert_eq!(encode_sample_data(), [0x421C_7EC3, 0x000B_8ED1]);
//! ```
//!
//! ## Decoding Example
//!
//! Now, let's reconstruct the sample data from its compressed representation.
//!
//! ```
//! use constriction::stream::{stack::DefaultAnsCoder, model::DefaultLeakyQuantizer, Decode};
//! use probability::distribution::Gaussian;
//!
//! fn decode_sample_data(compressed: Vec<u32>) -> Vec<i32> {
//!     // Create an ANS Coder with default word and state size from the compressed data:
//!     // (ANS uses the same type for encoding and decoding, which makes it very flexible
//!     // and allows interleaving small encoding and decoding chunks, e.g., for bits-back
//!     // coding.)
//!     let mut coder = DefaultAnsCoder::from_compressed(compressed).unwrap();
//!
//!     // Same entropy models and quantizer we used for encoding:
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Decode the data:
//!     coder.decode_symbols(
//!         means.iter().zip(&stds).map(
//!             |(&mean, &std)| quantizer.quantize(Gaussian::new(mean, std))
//!     )).collect::<Result<Vec<_>, _>>().unwrap()
//! }
//!
//! assert_eq!(decode_sample_data(vec![0x421C_7EC3, 0x000B_8ED1]), [23, -15, 78, 43, -69]);
//! ```
//!
//! ## Exercise
//!
//! Try out the above examples and verify that decoding reconstructs the original data. Then
//! see how easy `constriction` makes it to replace the ANS Coder with a Range Coder by
//! making the following substitutions:
//!
//! **In the encoder,**
//!
//! - replace `constriction::stream::stack::DefaultAnsCoder` with
//! `constriction::stream::queue::DefaultRangeEncoder`; and
//! - replace `coder.encode_symbols_reverse` with `coder.encode_symbols` (you no longer need
//! to encode symbols in reverse order since Range Coding operates as a queue, i.e.,
//! first-in-first-out). You'll also have to add the line
//! `use constriction::stream::Encode;` to the top of the file to bring the trait method
//! `encode_symbols` into scope.
//!
//! **In the decoder,**
//!
//! - replace `constriction::stream::stack::DefaultAnsCoder` with
//! `constriction::stream::queue::DefaultRangeDecoder` (note that Range Coding
//! distinguishes between an encoder and a decoder type since the encoder writes to the
//! back while the decoder reads from the front; by contrast, ANS Coding is a stack, i.e.,
//! it reads and writes at the same position and allows interleaving reads and writes).
//!
//! *Remark:* You could also use a symbol code like Huffman Coding (see module [`symbol`])
//! but that would have considerably worse compression performance, especially on large
//! files, since symbol codes always emit an integer number of bits per compressed symbol,
//! even if the information content of the symbol is a fractional number (stream codes like
//! ANS and Range Coding *effectively* emit a fractional number of bits per symbol since
//! they amortize over several symbols).
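//!
//! For a concrete sense of the overhead, here is a small back-of-the-envelope sketch
//! (plain Rust, independent of any `constriction` API):
//!
//! ```
//! // A symbol with probability 0.9 carries only about 0.152 bits of information ...
//! let info_content = -0.9f64.log2();
//! assert!(info_content > 0.15 && info_content < 0.16);
//!
//! // ... but a symbol code has to emit at least 1 full bit for it, i.e., more than six
//! // times its information content. Stream codes amortize fractional bits over symbols.
//! ```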
//!
//! The above replacements should lead you to something like this:
//!
//! ```
//! use constriction::stream::{
//!     model::DefaultLeakyQuantizer,
//!     queue::{DefaultRangeEncoder, DefaultRangeDecoder},
//!     Encode, Decode,
//! };
//! use probability::distribution::Gaussian;
//!
//! fn encode_sample_data() -> Vec<u32> {
//!     // Create an empty Range Encoder with default word and state size:
//!     let mut encoder = DefaultRangeEncoder::new();
//!
//!     // Same made-up data, entropy models, and quantizer as in the ANS Coding example above:
//!     let symbols = [23i32, -15, 78, 43, -69];
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Encode the data (this time in normal order, since Range Coding is a queue):
//!     encoder.encode_symbols(
//!         symbols.iter().zip(&means).zip(&stds).map(
//!             |((&sym, &mean), &std)| (sym, quantizer.quantize(Gaussian::new(mean, std)))
//!     )).unwrap();
//!
//!     // Retrieve the (sealed-up) compressed representation.
//!     encoder.into_compressed().unwrap()
//! }
//!
//! fn decode_sample_data(compressed: Vec<u32>) -> Vec<i32> {
//!     // Create a Range Decoder with default word and state size from the compressed data:
//!     let mut decoder = DefaultRangeDecoder::from_compressed(compressed).unwrap();
//!
//!     // Same entropy models and quantizer we used for encoding:
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Decode the data:
//!     decoder.decode_symbols(
//!         means.iter().zip(&stds).map(
//!             |(&mean, &std)| quantizer.quantize(Gaussian::new(mean, std))
//!     )).collect::<Result<Vec<_>, _>>().unwrap()
//! }
//!
//! let compressed = encode_sample_data();
//!
//! // We'll get a different compressed representation than in the ANS Coding
//! // example because we're using a different entropy coding algorithm ...
//! assert_eq!(compressed, [0x1C31_EFEB, 0x87B4_30DA]);
//!
//! // ... but as long as we decode with the matching algorithm we can still reconstruct the data:
//! assert_eq!(decode_sample_data(compressed), [23, -15, 78, 43, -69]);
//! ```
//!
//! # Where to Go Next?
//!
//! If you already have an entropy model and you just want to encode and decode some
//! sequence of symbols then you can probably start by adjusting the above
//! [examples](#encoding-example) to your needs. Or have a closer look at the [`stream`]
//! module.
//!
//! If you're still new to the concept of entropy coding then check out the [teaching
//! material](https://robamler.github.io/teaching/compress21/).
extern crate alloc;
extern crate std;
pub mod backends;
pub mod stream;
pub mod symbol;

use core::fmt::{Binary, Debug, LowerHex, UpperHex};
use core::num::{NonZeroU128, NonZeroU16, NonZeroU32, NonZeroU64, NonZeroU8, NonZeroUsize};

use num_traits::{AsPrimitive, PrimInt, Unsigned, WrappingAdd, WrappingSub};
// READ WRITE SEMANTICS =======================================================

/// A trait for marking how reading and writing order relate to each other.
///
/// This is currently only used in the [`backends`] module. Future versions of
/// `constriction` may expand its use to frontends.
pub trait Semantics {}

/// Zero sized marker type for last-in-first-out read/write [`Semantics`]
///
/// This type typically only comes up in advanced use cases that are generic over read/write
/// semantics. If you are looking for an entropy coder that operates as a stack, check out
/// the module [`stream::stack`].
#[derive(Debug, Default)]
pub struct Stack {}
impl Semantics for Stack {}

/// Zero sized marker type for first-in-first-out read/write [`Semantics`]
///
/// This type typically only comes up in advanced use cases that are generic over read/write
/// semantics. If you are looking for an entropy coder that operates as a queue, check out
/// the module [`stream::queue`].
#[derive(Debug, Default)]
pub struct Queue {}
impl Semantics for Queue {}
// GENERIC ERROR TYPES ========================================================

/// Error type of encoding operations that can fail either with a default frontend error
/// (e.g., when trying to encode a symbol that has zero probability under the provided
/// entropy model) or with a backend error.
pub type DefaultEncoderError<BackendError> =
    CoderError<DefaultEncoderFrontendError, BackendError>;
/// Trait for coders or backends that *might* implement [`Pos`] and/or [`Seek`]
///
/// If a type implements `PosSeek` then that doesn't necessarily mean that it also
/// implements [`Pos`] or [`Seek`]. Implementing `PosSeek` only fixes the common `Position`
/// type that is used *if* the type implements `Pos` and/or `Seek`.
pub trait PosSeek {
    type Position: Clone;
}
/// A trait for entropy coders that keep track of their current position within the
/// compressed data.
///
/// This is the counterpart of [`Seek`]. Call [`Pos::pos`] to record
/// "snapshots" of an entropy coder, and then call [`Seek::seek`] at a later time
/// to jump back to these snapshots. See examples in the documentation of [`Seek`]
/// and [`Seek::seek`].
pub trait Pos: PosSeek {
    fn pos(&self) -> Self::Position;
}
/// A trait for entropy coders that support random access.
///
/// This is the counterpart of [`Pos`]. While [`Pos::pos`] can be used to
/// record "snapshots" of an entropy coder, [`Seek::seek`] can be used to jump to these
/// recorded snapshots.
///
/// Not all entropy coders that implement `Pos` also implement `Seek`. For example,
/// [`DefaultAnsCoder`] implements `Pos` but it doesn't implement `Seek` because it
/// supports both encoding and decoding and therefore always operates at the head. In
/// such a case one can usually obtain a seekable entropy coder in return for
/// surrendering some other property. For example, `DefaultAnsCoder` provides the methods
/// [`as_seekable_decoder`] and [`into_seekable_decoder`] that return a decoder which
/// implements `Seek` but which can no longer be used for encoding (i.e., it doesn't
/// implement [`Encode`]).
///
/// # Example
///
/// ```
/// use constriction::stream::{
///     model::DefaultContiguousCategoricalEntropyModel, stack::DefaultAnsCoder, Decode
/// };
/// use constriction::{Pos, Seek};
///
/// // Create an `AnsCoder` and an entropy model:
/// let mut ans = DefaultAnsCoder::new();
/// let probabilities = vec![0.03, 0.07, 0.1, 0.1, 0.2, 0.2, 0.1, 0.15, 0.05];
/// let entropy_model = DefaultContiguousCategoricalEntropyModel
///     ::from_floating_point_probabilities(&probabilities).unwrap();
///
/// // Encode some symbols in two chunks and take a snapshot after each chunk.
/// let symbols1 = vec![8, 2, 0, 7];
/// ans.encode_iid_symbols_reverse(&symbols1, &entropy_model).unwrap();
/// let snapshot1 = ans.pos();
///
/// let symbols2 = vec![3, 1, 5];
/// ans.encode_iid_symbols_reverse(&symbols2, &entropy_model).unwrap();
/// let snapshot2 = ans.pos();
///
/// // As discussed above, `DefaultAnsCoder` doesn't impl `Seek` but we can get a decoder that does:
/// let mut seekable_decoder = ans.as_seekable_decoder();
///
/// // `seekable_decoder` is still an `AnsCoder`, so decoding would start with the items we
/// // encoded last. But since it implements `Seek` we can jump ahead to our first snapshot:
/// seekable_decoder.seek(snapshot1);
/// let decoded1 = seekable_decoder
///     .decode_iid_symbols(4, &entropy_model)
///     .collect::<Result<Vec<_>, _>>()
///     .unwrap();
/// assert_eq!(decoded1, symbols1);
///
/// // We've reached the end of the compressed data ...
/// assert!(seekable_decoder.is_empty());
///
/// // ... but we can still jump to somewhere else and continue decoding from there:
/// seekable_decoder.seek(snapshot2);
///
/// // Creating snapshots didn't mutate the coder, so we can just decode through `snapshot1`:
/// let decoded_both = seekable_decoder.decode_iid_symbols(7, &entropy_model).map(Result::unwrap);
/// assert!(decoded_both.eq(symbols2.into_iter().chain(symbols1)));
/// assert!(seekable_decoder.is_empty()); // <-- We've reached the end again.
/// ```
///
/// [`Encode`]: stream::Encode
/// [`DefaultAnsCoder`]: stream::stack::DefaultAnsCoder
/// [`as_seekable_decoder`]: stream::stack::AnsCoder::as_seekable_decoder
/// [`into_seekable_decoder`]: stream::stack::AnsCoder::into_seekable_decoder
pub trait Seek: PosSeek {
    /// Jumps to a given position in the compressed data, as previously recorded by
    /// [`Pos::pos`].
    fn seek(&mut self, pos: Self::Position) -> Result<(), ()>;
}
/// A trait for bit strings of fixed (and usually small) length.
///
/// Short fixed-length bit strings are fundamental building blocks of efficient entropy
/// coding algorithms. They are currently used for the following purposes:
/// - to represent the smallest unit of compressed data (see [`stream::Code::Word`]);
/// - to represent probabilities in fixed-point arithmetic (see
///   [`stream::model::EntropyModel::Probability`]); and
/// - to make up the internal state of entropy coders (see [`stream::Code::State`]), which
///   typically consists of one or more `BitArray`s (although this is not a requirement).
///
/// This trait is implemented on all primitive unsigned integer types. Implementing it for
/// custom types is not recommended since entropy coders assume, for performance reasons,
/// that `BitArray`s can be represented and manipulated efficiently in hardware.
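///
/// # Example
///
/// ```
/// use constriction::BitArray;
///
/// // `BitArray` is implemented on the primitive unsigned integer types, and
/// // `BitArray::BITS` reports each type's width in bits:
/// assert_eq!(<u32 as BitArray>::BITS, 32);
/// assert_eq!(<u8 as BitArray>::BITS, 8);
/// ```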
///
/// # Safety
///
/// This trait is marked `unsafe` so that entropy coders may rely on the assumption that all
/// `BitArray`s have precisely the same behavior as builtin unsigned integer types, and that
/// [`BitArray::BITS`] has the correct value.
pub unsafe trait BitArray:
    PrimInt + Unsigned + WrappingAdd + WrappingSub + Debug + LowerHex + UpperHex + Binary + 'static
{
    /// The number of bits in the `BitArray` (e.g., `32` for `u32`).
    const BITS: usize = 8 * core::mem::size_of::<Self>();

    /// The corresponding non-zero type (e.g., [`NonZeroU32`](core::num::NonZeroU32) for `u32`).
    type NonZero: NonZeroBitArray<Base = Self>;
}
/// A trait for bit strings like [`BitArray`] but with guaranteed nonzero values
///
/// # Safety
///
/// Must guarantee that the value is indeed nonzero. Failing to do so could, e.g., cause a
/// division by zero in entropy coders, which is undefined behavior.
pub unsafe trait NonZeroBitArray: Copy + Debug + Eq + 'static {
    /// The corresponding plain (i.e., possibly zero) [`BitArray`] type.
    type Base: BitArray<NonZero = Self>;

    /// Returns `Some(n)` if `n` is nonzero, and `None` otherwise.
    fn new(n: Self::Base) -> Option<Self>;

    /// Returns the value as the plain [`Base`](Self::Base) type.
    fn get(self) -> Self::Base;
}
/// Iterates from most significant to least significant bits in chunks but skips any
/// initial zero chunks.
fn bit_array_to_chunks_truncated<Data, Chunk>(
    data: Data,
) -> impl Iterator<Item = Chunk> + ExactSizeIterator + DoubleEndedIterator
where
    Data: BitArray + AsPrimitive<Chunk>,
    Chunk: BitArray,
{
    (0..(Data::BITS - data.leading_zeros() as usize))
        .step_by(Chunk::BITS)
        .rev()
        .map(move |shift| (data >> shift).as_())
}

macro_rules! unsafe_impl_bit_array {
    ($(($base:ty, $non_zero:ty)),+ $(,)?) => {
        $(
            unsafe impl BitArray for $base {
                type NonZero = $non_zero;
            }

            unsafe impl NonZeroBitArray for $non_zero {
                type Base = $base;

                fn new(n: Self::Base) -> Option<Self> {
                    Self::new(n) // Calls the inherent `new` method (e.g., `NonZeroU32::new`).
                }

                fn get(self) -> Self::Base {
                    Self::get(self) // Calls the inherent `get` method (e.g., `NonZeroU32::get`).
                }
            }
        )+
    };
}

unsafe_impl_bit_array!(
    (u8, NonZeroU8),
    (u16, NonZeroU16),
    (u32, NonZeroU32),
    (u64, NonZeroU64),
    (u128, NonZeroU128),
    (usize, NonZeroUsize),
);