constriction/lib.rs
//! Entropy Coding Primitives for Research and Production
//!
//! The `constriction` crate provides a set of composable entropy coding algorithms with a
//! focus on correctness, versatility, ease of use, compression performance, and
//! computational efficiency. The goals of `constriction` are three-fold:
//!
//! 1. **to facilitate research on novel lossless and lossy compression methods** by
//!    providing a *composable* set of entropy coding primitives rather than a rigid
//!    implementation of a single preconfigured method;
//! 2. **to simplify the transition from research code to production software** by exposing
//!    the exact same functionality via both a Python API (for rapid prototyping on research
//!    code) and a Rust API (for turning successful prototypes into production); and
//! 3. **to serve as a teaching resource** by providing a wide range of entropy coding
//!    algorithms within a single consistent framework, thus making the various algorithms
//!    easily discoverable and comparable on example models and data. [Additional teaching
//!    material](https://robamler.github.io/teaching/compress21/) is being made publicly
//!    available as a by-product of an ongoing university course on data compression with
//!    deep probabilistic models.
//!
//! For an example of a compression codec that started as research code in Python and was
//! then deployed as a fast and dependency-free WebAssembly module using `constriction`'s
//! Rust API, have a look at [The Linguistic Flux
//! Capacitor](https://robamler.github.io/linguistic-flux-capacitor).
//!
//! # Project Status
//!
//! We currently provide implementations of the following entropy coding algorithms:
//!
//! - **Asymmetric Numeral Systems (ANS):** a fast modern entropy coder with near-optimal
//!   compression effectiveness that supports advanced use cases like bits-back coding.
//! - **Range Coding:** a computationally efficient variant of Arithmetic Coding that has
//!   essentially the same compression effectiveness as ANS Coding but operates as a queue
//!   ("first in first out"), which makes it preferable for autoregressive models.
//! - **Chain Coding:** an experimental new entropy coder that combines the (net)
//!   effectiveness of stream codes with the locality of symbol codes; it is meant for
//!   experimental new compression approaches that perform joint inference, quantization,
//!   and bits-back coding in an end-to-end optimization. This experimental coder is mainly
//!   provided to prove to ourselves that the API for encoding and decoding, which is shared
//!   across all stream coders, is flexible enough to express complex novel tasks.
//! - **Huffman Coding:** a well-known symbol code, mainly provided here for teaching
//!   purposes; you'll usually want to use a stream code like ANS or Range Coding instead
//!   since symbol codes can have a considerable bitrate overhead, especially in the
//!   regime of low entropy per symbol, which is common in machine-learning based
//!   compression methods.
//!
//! Further, `constriction` provides implementations of common probability distributions in
//! fixed-point arithmetic, which can be used as entropy models in any of the above
//! stream codes. The crate also provides adapters for turning custom probability
//! distributions into exactly invertible fixed-point arithmetic.
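//!
//! For instance, here is a minimal sketch of how one might turn a continuous distribution
//! from the `probability` crate into a fixed-point entropy model with an explicitly chosen
//! precision of 24 bits (the numeric values are made up; the full encoding and decoding
//! examples below use the same `LeakyQuantizer` API, just with default settings):
//!
//! ```
//! use constriction::stream::model::LeakyQuantizer;
//! use probability::distribution::Gaussian;
//!
//! // Quantizer with 24 bits of fixed-point precision for symbols in the range -100..=100:
//! let quantizer = LeakyQuantizer::<_, _, u32, 24>::new(-100..=100);
//!
//! // Quantize a Gaussian; the result assigns a nonzero fixed-point probability to each bin:
//! let entropy_model = quantizer.quantize(Gaussian::new(10.3, 17.9));
//! ```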
//!
//! The provided implementations of entropy coding algorithms and probability distributions
//! are extensively tested and should be considered reliable (except for the still
//! experimental Chain Coder). However, their APIs may change in future versions of
//! `constriction` if more user experience reveals any shortcomings of the current APIs in
//! terms of ergonomics. Please [file an
//! issue](https://github.com/bamler-lab/constriction/issues) if you run into a scenario
//! where the current APIs are suboptimal.
//!
//! # Quick Start With the Rust API
//!
//! You are currently reading the documentation of `constriction`'s Rust API. If Rust is not
//! your language of choice then head over to the [Python API
//! Documentation](https://bamler-lab.github.io/constriction/apidoc/python/). The Rust API
//! provides efficient and composable entropy coding primitives that can be adjusted to a
//! fine degree of detail using type parameters and const generics (type aliases with sane
//! defaults for all generic parameters are provided as guidance). The Python API exposes
//! the most common use cases of these entropy coding primitives to an environment that
//! feels more natural to many data scientists.
//!
//! ## Setup
//!
//! To use `constriction` in your Rust project, just add the following line to the
//! `[dependencies]` section of your `Cargo.toml`:
//!
//! ```toml
//! [dependencies]
//! constriction = "0.4.1"
//! ```
//!
//! ## System Requirements
//!
//! `constriction` requires Rust version 1.51 or later for its use of the
//! `min_const_generics` feature. If you have an older version of Rust, update to the latest
//! version by running `rustup update stable`.
//!
//! ## Encoding Example
//!
//! In this example, we'll encode some symbols using a quantized Gaussian distribution as
//! the entropy model. Each symbol will be modeled by a quantized Gaussian with a different
//! mean and standard deviation (so that the example is not too simplistic). We'll use the
//! `probability` crate for the Gaussian distributions, so also add the following dependency
//! to your `Cargo.toml`:
//!
//! ```toml
//! probability = "0.20"
//! ```
//!
//! Now, let's encode (i.e., compress) some symbols. We'll use an Asymmetric Numeral Systems
//! (ANS) Coder here for its speed and compression performance. We'll discuss how you could
//! replace the ANS Coder with a Range Coder or a symbol code like Huffman Coding
//! [below](#exercise).
//!
//! ```
//! use constriction::stream::{stack::DefaultAnsCoder, model::DefaultLeakyQuantizer};
//! use probability::distribution::Gaussian;
//!
//! fn encode_sample_data() -> Vec<u32> {
//!     // Create an empty ANS Coder with default word and state size:
//!     let mut coder = DefaultAnsCoder::new();
//!
//!     // Some made-up data and entropy models for demonstration purposes:
//!     let symbols = [23i32, -15, 78, 43, -69];
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!
//!     // Create an adapter that integrates 1-d probability density functions over bins
//!     // `[n - 0.5, n + 0.5)` for all integers `n` from `-100` to `100` using fixed-point
//!     // arithmetic with default precision, guaranteeing a nonzero probability for each bin:
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Encode the data (in reverse order, since ANS Coding operates as a stack):
//!     coder.encode_symbols_reverse(
//!         symbols.iter().zip(&means).zip(&stds).map(
//!             |((&sym, &mean), &std)| (sym, quantizer.quantize(Gaussian::new(mean, std)))
//!     )).unwrap();
//!
//!     // Retrieve the compressed representation (filling it up to full words with zero bits).
//!     coder.into_compressed().unwrap()
//! }
//!
//! assert_eq!(encode_sample_data(), [0x421C_7EC3, 0x000B_8ED1]);
//! ```
//!
//! ## Decoding Example
//!
//! Now, let's reconstruct the sample data from its compressed representation.
//!
//! ```
//! use constriction::stream::{stack::DefaultAnsCoder, model::DefaultLeakyQuantizer, Decode};
//! use probability::distribution::Gaussian;
//!
//! fn decode_sample_data(compressed: Vec<u32>) -> Vec<i32> {
//!     // Create an ANS Coder with default word and state size from the compressed data:
//!     // (ANS uses the same type for encoding and decoding, which makes this coder very
//!     // flexible and allows interleaving small encoding and decoding chunks, e.g., for
//!     // bits-back coding.)
//!     let mut coder = DefaultAnsCoder::from_compressed(compressed).unwrap();
//!
//!     // Same entropy models and quantizer we used for encoding:
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Decode the data:
//!     coder.decode_symbols(
//!         means.iter().zip(&stds).map(
//!             |(&mean, &std)| quantizer.quantize(Gaussian::new(mean, std))
//!     )).collect::<Result<Vec<_>, _>>().unwrap()
//! }
//!
//! assert_eq!(decode_sample_data(vec![0x421C_7EC3, 0x000B_8ED1]), [23, -15, 78, 43, -69]);
//! ```
//!
//! ## Exercise
//!
//! Try out the above examples and verify that decoding reconstructs the original data. Then
//! see how easy `constriction` makes it to replace the ANS Coder with a Range Coder by
//! making the following substitutions:
//!
//! **In the encoder,**
//!
//! - replace `constriction::stream::stack::DefaultAnsCoder` with
//!   `constriction::stream::queue::DefaultRangeEncoder`; and
//! - replace `coder.encode_symbols_reverse` with `coder.encode_symbols` (you no longer need
//!   to encode symbols in reverse order since Range Coding operates as a queue, i.e.,
//!   first-in-first-out). You'll also have to add the line
//!   `use constriction::stream::Encode;` to the top of the file to bring the trait method
//!   `encode_symbols` into scope.
//!
//! **In the decoder,**
//!
//! - replace `constriction::stream::stack::DefaultAnsCoder` with
//!   `constriction::stream::queue::DefaultRangeDecoder` (note that Range Coding
//!   distinguishes between an encoder and a decoder type since the encoder writes to the
//!   back while the decoder reads from the front; by contrast, ANS Coding is a stack, i.e.,
//!   it reads and writes at the same position and allows interleaving reads and writes).
//!
//! *Remark:* You could also use a symbol code like Huffman Coding (see module [`symbol`])
//! but that would have considerably worse compression performance, especially on large
//! files, since symbol codes always emit an integer number of bits per compressed symbol,
//! even if the information content of the symbol is a fractional number (stream codes like
//! ANS and Range Coding *effectively* emit a fractional number of bits per symbol since
//! they amortize over several symbols). For example, a symbol with probability 0.9 has an
//! information content of only about 0.15 bits, but a symbol code has to spend at least one
//! full bit on it.
//!
//! The above replacements should lead you to something like this:
//!
//! ```
//! use constriction::stream::{
//!     model::DefaultLeakyQuantizer,
//!     queue::{DefaultRangeEncoder, DefaultRangeDecoder},
//!     Encode, Decode,
//! };
//! use probability::distribution::Gaussian;
//!
//! fn encode_sample_data() -> Vec<u32> {
//!     // Create an empty Range Encoder with default word and state size:
//!     let mut encoder = DefaultRangeEncoder::new();
//!
//!     // Same made-up data, entropy models, and quantizer as in the ANS Coding example above:
//!     let symbols = [23i32, -15, 78, 43, -69];
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Encode the data (this time in normal order, since Range Coding is a queue):
//!     encoder.encode_symbols(
//!         symbols.iter().zip(&means).zip(&stds).map(
//!             |((&sym, &mean), &std)| (sym, quantizer.quantize(Gaussian::new(mean, std)))
//!     )).unwrap();
//!
//!     // Retrieve the (sealed up) compressed representation.
//!     encoder.into_compressed().unwrap()
//! }
//!
//! fn decode_sample_data(compressed: Vec<u32>) -> Vec<i32> {
//!     // Create a Range Decoder with default word and state size from the compressed data:
//!     let mut decoder = DefaultRangeDecoder::from_compressed(compressed).unwrap();
//!
//!     // Same entropy models and quantizer we used for encoding:
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Decode the data:
//!     decoder.decode_symbols(
//!         means.iter().zip(&stds).map(
//!             |(&mean, &std)| quantizer.quantize(Gaussian::new(mean, std))
//!     )).collect::<Result<Vec<_>, _>>().unwrap()
//! }
//!
//! let compressed = encode_sample_data();
//!
//! // We'll get a different compressed representation than in the ANS Coding
//! // example because we're using a different entropy coding algorithm ...
//! assert_eq!(compressed, [0x1C31EFEB, 0x87B430DA]);
//!
//! // ... but as long as we decode with the matching algorithm we can still reconstruct the data:
//! assert_eq!(decode_sample_data(compressed), [23, -15, 78, 43, -69]);
//! ```
//!
//! # Where to Go Next?
//!
//! If you already have an entropy model and you just want to encode and decode some
//! sequence of symbols then you can probably start by adjusting the above
//! [examples](#encoding-example) to your needs. Or have a closer look at the [`stream`]
//! module.
//!
//! If you're still new to the concept of entropy coding then check out the [teaching
//! material](https://robamler.github.io/teaching/compress21/).

#![no_std]
#![warn(rust_2018_idioms, missing_debug_implementations)]

extern crate alloc;

#[cfg(feature = "std")]
extern crate std;

#[cfg(feature = "pybindings")]
mod pybindings;

pub mod backends;
pub mod stream;
pub mod symbol;

use core::{
    convert::Infallible,
    fmt::{Binary, Debug, Display, LowerHex, UpperHex},
    hash::Hash,
    num::{NonZeroU16, NonZeroU32, NonZeroU64, NonZeroU8, NonZeroUsize},
};

use num_traits::{AsPrimitive, PrimInt, Unsigned, WrappingAdd, WrappingMul, WrappingSub};

// READ WRITE SEMANTICS =======================================================

/// A trait for marking how reading and writing order relate to each other.
///
/// This is currently only used in the [`backends`] module. Future versions of
/// `constriction` may expand its use to frontends.
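///
/// # Example
///
/// A minimal sketch of how code that is generic over read/write semantics might use this
/// trait as a bound; the `WordLog` type below is purely hypothetical and not part of
/// `constriction`:
///
/// ```
/// use constriction::{Semantics, Stack, Queue};
/// use core::marker::PhantomData;
///
/// // A (hypothetical) buffer of words that records, in its type, which read/write
/// // semantics it was written with:
/// #[derive(Debug, Default)]
/// struct WordLog<S: Semantics> {
///     words: Vec<u32>,
///     semantics: PhantomData<S>,
/// }
///
/// // The marker types `Stack` and `Queue` can be plugged in for the type parameter `S`:
/// let _lifo_log = WordLog::<Stack>::default();
/// let _fifo_log = WordLog::<Queue>::default();
/// ```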
pub trait Semantics: Default {}

/// Zero sized marker struct for last-in-first-out read/write [`Semantics`]
///
/// This type typically only comes up in advanced use cases that are generic over read/write
/// semantics. If you are looking for an entropy coder that operates as a stack, check out
/// the module [`stream::stack`].
#[derive(Debug, Default)]
pub struct Stack {}
impl Semantics for Stack {}

/// Zero sized marker struct for first-in-first-out read/write [`Semantics`]
///
/// This type typically only comes up in advanced use cases that are generic over read/write
/// semantics. If you are looking for an entropy coder that operates as a queue, check out
/// the module [`stream::queue`].
#[derive(Debug, Default)]
pub struct Queue {}
impl Semantics for Queue {}

// GENERIC ERROR TYPES ========================================================

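/// Error type for entropy coders.
///
/// Distinguishes between logical errors in the used coder or entropy model (the `Frontend`
/// variant) and errors reported by the backend that reads or writes the compressed data
/// (the `Backend` variant).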
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CoderError<FrontendError, BackendError> {
    Frontend(FrontendError),
    Backend(BackendError),
}

impl<FrontendError, BackendError> CoderError<FrontendError, BackendError> {
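    /// Applies `f` to the error if it is a [`CoderError::Frontend`] error; passes through
    /// a [`CoderError::Backend`] error unchanged.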
    pub fn map_frontend<E>(self, f: impl Fn(FrontendError) -> E) -> CoderError<E, BackendError> {
        match self {
            Self::Frontend(err) => CoderError::Frontend(f(err)),
            Self::Backend(err) => CoderError::Backend(err),
        }
    }

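    /// Applies `f` to the error if it is a [`CoderError::Backend`] error; passes through
    /// a [`CoderError::Frontend`] error unchanged.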
    pub fn map_backend<E>(self, f: impl Fn(BackendError) -> E) -> CoderError<FrontendError, E> {
        match self {
            Self::Backend(err) => CoderError::Backend(f(err)),
            Self::Frontend(err) => CoderError::Frontend(err),
        }
    }
}

impl<BackendError: Display, FrontendError: Display> Display
    for CoderError<FrontendError, BackendError>
{
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        match self {
            Self::Frontend(err) => write!(f, "Invalid compressed data: {err}"),
            Self::Backend(err) => write!(f, "Error while reading compressed data: {err}"),
        }
    }
}

#[cfg(feature = "std")]
impl<FrontendError: std::error::Error + 'static, BackendError: std::error::Error + 'static>
    std::error::Error for CoderError<FrontendError, BackendError>
{
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Frontend(source) => Some(source),
            Self::Backend(source) => Some(source),
        }
    }
}

impl<FrontendError, BackendError> From<BackendError> for CoderError<FrontendError, BackendError> {
    fn from(read_error: BackendError) -> Self {
        Self::Backend(read_error)
    }
}

impl<FrontendError> CoderError<FrontendError, Infallible> {
    fn into_frontend_error(self) -> FrontendError {
        #[allow(unreachable_patterns)]
        match self {
            CoderError::Frontend(frontend_error) => frontend_error,
            CoderError::Backend(infallible) => match infallible {},
        }
    }
}

type DefaultEncoderError<BackendError> = CoderError<DefaultEncoderFrontendError, BackendError>;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DefaultEncoderFrontendError {
    /// Tried to encode a symbol with zero probability under the used entropy model.
    ///
    /// This error can usually be avoided by using a "leaky" distribution as the
    /// entropy model, i.e., a distribution that assigns a nonzero probability to all
    /// symbols within a finite domain. Leaky distributions can be constructed with,
    /// e.g., a [`LeakyQuantizer`](stream/model/struct.LeakyQuantizer.html) or with
    /// [`LeakyCategorical::from_floating_point_probabilities`](
    /// stream/model/struct.LeakyCategorical.html#method.from_floating_point_probabilities).
    ImpossibleSymbol,
}

impl Display for DefaultEncoderFrontendError {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        match self {
            Self::ImpossibleSymbol => write!(
                f,
                "Tried to encode a symbol that has zero probability under the used entropy model."
            ),
        }
    }
}

#[cfg(feature = "std")]
impl std::error::Error for DefaultEncoderFrontendError {}

impl DefaultEncoderFrontendError {
    #[inline(always)]
    const fn into_coder_error<BackendError>(self) -> DefaultEncoderError<BackendError> {
        DefaultEncoderError::Frontend(self)
    }
}

/// Trait for coders or backends that *might* implement [`Pos`] and/or [`Seek`]
///
/// If a type implements `PosSeek` then that doesn't necessarily mean that it also
/// implements [`Pos`] or [`Seek`]. Implementing `PosSeek` only fixes the common `Position`
/// type that is used *if* the type implements `Pos` and/or `Seek`.
pub trait PosSeek {
    type Position: Clone;
}

/// A trait for entropy coders that keep track of their current position within the
/// compressed data.
///
/// This is the counterpart of [`Seek`]. Call [`Pos::pos`] to record
/// "snapshots" of an entropy coder, and then call [`Seek::seek`] at a later time
/// to jump back to these snapshots. See examples in the documentation of [`Seek`]
/// and [`Seek::seek`].
pub trait Pos: PosSeek {
    /// Returns the position in the compressed data, in units of `Word`s.
    ///
    /// It is up to the entropy coder to define what constitutes the beginning and end
    /// positions within the compressed data (for example, an [`AnsCoder`] begins encoding
    /// at position zero but it begins decoding at position `ans.buf().len()`).
    ///
    /// [`AnsCoder`]: stream::stack::AnsCoder
    fn pos(&self) -> Self::Position;
}

/// A trait for entropy coders that support random access.
///
/// This is the counterpart of [`Pos`]. While [`Pos::pos`] can be used to
/// record "snapshots" of an entropy coder, [`Seek::seek`] can be used to jump to these
/// recorded snapshots.
///
/// Not all entropy coders that implement `Pos` also implement `Seek`. For example,
/// [`DefaultAnsCoder`] implements `Pos` but it doesn't implement `Seek` because it
/// supports both encoding and decoding and therefore always operates at the head. In
/// such a case one can usually obtain a seekable entropy coder in return for
/// surrendering some other property. For example, `DefaultAnsCoder` provides the methods
/// [`as_seekable_decoder`] and [`into_seekable_decoder`] that return a decoder which
/// implements `Seek` but which can no longer be used for encoding (i.e., it doesn't
/// implement [`Encode`]).
///
/// # Example
///
/// ```
/// use constriction::stream::{
///     model::DefaultContiguousCategoricalEntropyModel, stack::DefaultAnsCoder, Decode
/// };
/// use constriction::{Pos, Seek};
///
/// // Create an `AnsCoder` encoder and an entropy model:
/// let mut ans = DefaultAnsCoder::new();
/// let probabilities = vec![0.03, 0.07, 0.1, 0.1, 0.2, 0.2, 0.1, 0.15, 0.05];
/// let entropy_model = DefaultContiguousCategoricalEntropyModel
///     ::from_floating_point_probabilities_fast(&probabilities, None).unwrap();
///
/// // Encode some symbols in two chunks and take a snapshot after each chunk.
/// let symbols1 = vec![8, 2, 0, 7];
/// ans.encode_iid_symbols_reverse(&symbols1, &entropy_model).unwrap();
/// let snapshot1 = ans.pos();
///
/// let symbols2 = vec![3, 1, 5];
/// ans.encode_iid_symbols_reverse(&symbols2, &entropy_model).unwrap();
/// let snapshot2 = ans.pos();
///
/// // As discussed above, `DefaultAnsCoder` doesn't impl `Seek` but we can get a decoder that does:
/// let mut seekable_decoder = ans.as_seekable_decoder();
///
/// // `seekable_decoder` is still an `AnsCoder`, so decoding would start with the items we encoded
/// // last. But since it implements `Seek` we can jump ahead to our first snapshot:
/// seekable_decoder.seek(snapshot1);
/// let decoded1 = seekable_decoder
///     .decode_iid_symbols(4, &entropy_model)
///     .collect::<Result<Vec<_>, _>>()
///     .unwrap();
/// assert_eq!(decoded1, symbols1);
///
/// // We've reached the end of the compressed data ...
/// assert!(seekable_decoder.is_empty());
///
/// // ... but we can still jump to somewhere else and continue decoding from there:
/// seekable_decoder.seek(snapshot2);
///
/// // Creating snapshots didn't mutate the coder, so we can just decode through `snapshot1`:
/// let decoded_both = seekable_decoder.decode_iid_symbols(7, &entropy_model).map(Result::unwrap);
/// assert!(decoded_both.eq(symbols2.into_iter().chain(symbols1)));
/// assert!(seekable_decoder.is_empty()); // <-- We've reached the end again.
/// ```
///
/// [`Encode`]: stream::Encode
/// [`DefaultAnsCoder`]: stream::stack::DefaultAnsCoder
/// [`as_seekable_decoder`]: stream::stack::AnsCoder::as_seekable_decoder
/// [`into_seekable_decoder`]: stream::stack::AnsCoder::into_seekable_decoder
pub trait Seek: PosSeek {
    /// Jumps to a given position in the compressed data.
    ///
    /// The argument `pos` is the same pair of values returned by
    /// [`Pos::pos`], i.e., it is a tuple of the position in the compressed
    /// data and the `State` to which the entropy coder should be restored. Both values
    /// are absolute (i.e., seeking happens independently of the current state or
    /// position of the entropy coder). The position is measured in units of
    /// `Word`s (see the second example below, where we manipulate a position
    /// obtained from `Pos::pos` in order to reflect a manual reordering of
    /// the `Word`s in the compressed data).
    ///
    /// # Examples
    ///
    /// The method takes the position and state as a tuple rather than as independent
    /// method arguments so that one can simply pass in the tuple obtained from
    /// [`Pos::pos`] as sketched below:
    ///
    /// ```
    /// // Step 1: Obtain an encoder and encode some data (omitted for brevity) ...
    /// # use constriction::{stream::stack::DefaultAnsCoder, Pos, Seek};
    /// # let encoder = DefaultAnsCoder::new();
    ///
    /// // Step 2: Take a snapshot by calling `Pos::pos`:
    /// let snapshot = encoder.pos(); // <-- Returns a tuple `(pos, state)`.
    ///
    /// // Step 3: Encode some more data and then obtain a decoder (omitted for brevity) ...
    /// # let mut decoder = encoder.as_seekable_decoder();
    ///
    /// // Step 4: Jump to snapshot by calling `Seek::seek`:
    /// decoder.seek(snapshot); // <-- No need to deconstruct `snapshot` into `(pos, state)`.
    /// ```
    ///
    /// For more fine-grained control, one may want to assemble the tuple
    /// `pos` manually. For example, a [`DefaultAnsCoder`] encodes data from
    /// front to back and then decodes the data in the reverse direction from back to
    /// front. Decoding from back to front may be inconvenient in some use cases, so one
    /// might prefer to instead reverse the order of the `Word`s once encoding
    /// is finished, and then decode them in the more natural direction from front to
    /// back. Reversing the compressed data changes the position of each
    /// `Word`, and so any positions obtained from `Pos` need to be adjusted
    /// accordingly before they may be passed to `seek`, as in the following example:
    ///
    /// ```
    /// use constriction::{
    ///     stream::{model::LeakyQuantizer, stack::{DefaultAnsCoder, AnsCoder}, Decode},
    ///     Pos, Seek
    /// };
    ///
    /// // Construct a `DefaultAnsCoder` for encoding and an entropy model:
    /// let mut encoder = DefaultAnsCoder::new();
    /// let quantizer = LeakyQuantizer::<_, _, u32, 24>::new(-100..=100);
    /// let entropy_model = quantizer.quantize(probability::distribution::Gaussian::new(0.0, 10.0));
    ///
    /// // Encode two chunks of symbols and take a snapshot in-between:
    /// encoder.encode_iid_symbols_reverse(-100..40, &entropy_model).unwrap();
    /// let (mut snapshot_pos, snapshot_state) = encoder.pos();
    /// encoder.encode_iid_symbols_reverse(50..101, &entropy_model).unwrap();
    ///
    /// // Obtain compressed data, reverse it, and create a decoder that reads it from front to back:
    /// let mut compressed = encoder.into_compressed().unwrap();
    /// compressed.reverse();
    /// snapshot_pos = compressed.len() - snapshot_pos; // <-- Adjusts the snapshot position.
    /// let mut decoder = AnsCoder::from_reversed_compressed(compressed).unwrap();
    ///
    /// // Since we chose to encode onto a stack, decoding yields the last encoded chunk first:
    /// assert_eq!(decoder.decode_symbol(entropy_model).unwrap(), 50);
    /// assert_eq!(decoder.decode_symbol(entropy_model).unwrap(), 51);
    ///
    /// // To jump to our snapshot, we have to use the adjusted `snapshot_pos`:
    /// decoder.seek((snapshot_pos, snapshot_state));
    /// assert!(decoder.decode_iid_symbols(140, &entropy_model).map(Result::unwrap).eq(-100..40));
    /// assert!(decoder.is_empty()); // <-- We've reached the end of the compressed data.
    /// ```
    ///
    /// [`DefaultAnsCoder`]: stream::stack::DefaultAnsCoder
    #[allow(clippy::result_unit_err)]
    fn seek(&mut self, pos: Self::Position) -> Result<(), ()>;
}

/// A trait for bit strings of fixed (and usually small) length.
///
/// Short fixed-length bit strings are fundamental building blocks of efficient entropy
/// coding algorithms. They are currently used for the following purposes:
/// - to represent the smallest unit of compressed data (see [`stream::Code::Word`]);
/// - to represent probabilities in fixed point arithmetic (see
///   [`stream::model::EntropyModel::Probability`]); and
/// - to make up the internal state of entropy coders (see [`stream::Code::State`]), which
///   is typically composed of one or more `BitArray`s, although this is not a requirement.
///
/// This trait is implemented on all primitive unsigned integer types. It is not recommended
/// to implement this trait for custom types since coders will assume, for performance
/// reasons, that `BitArray`s can be represented and manipulated efficiently in
/// hardware.
///
/// # Safety
///
/// This trait is marked `unsafe` so that entropy coders may rely on the assumption that all
/// `BitArray`s have precisely the same behavior as builtin unsigned integer types, and that
/// [`BitArray::BITS`] has the correct value.
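///
/// # Example
///
/// A minimal sketch of the functionality that this trait (together with its associated
/// [`NonZeroBitArray`] type) provides on primitive unsigned integers:
///
/// ```
/// use constriction::{BitArray, NonZeroBitArray};
///
/// // The bit length is available as an associated constant:
/// assert_eq!(<u32 as BitArray>::BITS, 32);
///
/// // Checked conversion to the associated `NonZero` type and back:
/// let x = 0x1234_5678_u32;
/// let n = x.into_nonzero().unwrap();
/// assert_eq!(n.get(), x);
/// assert!(0_u32.into_nonzero().is_none());
/// ```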
pub unsafe trait BitArray:
    PrimInt
    + Unsigned
    + WrappingAdd
    + WrappingSub
    + WrappingMul
    + LowerHex
    + UpperHex
    + Binary
    + Default
    + Copy
    + Display
    + Debug
    + Eq
    + Hash
    + 'static
{
    /// The (fixed) length of the `BitArray` in bits.
    ///
    /// Defaults to `8 * core::mem::size_of::<Self>()`, which is suitable for all
    /// primitive unsigned integers.
    ///
    /// This could arguably be called `LEN` instead, but that may be confusing since
    /// "lengths" are typically not measured in bits in the Rust ecosystem.
    const BITS: usize = 8 * core::mem::size_of::<Self>();

    type NonZero: NonZeroBitArray<Base = Self>;

    #[inline(always)]
    fn into_nonzero(self) -> Option<Self::NonZero> {
        Self::NonZero::new(self)
    }

    /// # Safety
    ///
    /// The provided value must be nonzero.
    #[inline(always)]
    unsafe fn into_nonzero_unchecked(self) -> Self::NonZero {
        Self::NonZero::new_unchecked(self)
    }
}

/// A trait for bit strings like [`BitArray`] but with guaranteed nonzero values
///
/// # Safety
///
/// Must guarantee that the value is indeed nonzero. Failing to do so could, e.g., cause a
/// division by zero in entropy coders, which is undefined behavior.
pub unsafe trait NonZeroBitArray: Copy + Display + Debug + Eq + Hash + 'static {
    type Base: BitArray<NonZero = Self>;

    fn new(n: Self::Base) -> Option<Self>;

    /// # Safety
    ///
    /// The provided value `n` must be nonzero.
    unsafe fn new_unchecked(n: Self::Base) -> Self;

    fn get(self) -> Self::Base;
}

macro_rules! unsafe_impl_bit_array {
    ($(($base:ty, $non_zero:ty)),+ $(,)?) => {
        $(
            unsafe impl BitArray for $base {
                type NonZero = $non_zero;
            }

            unsafe impl NonZeroBitArray for $non_zero {
                type Base = $base;

                #[inline(always)]
                fn new(n: Self::Base) -> Option<Self> {
                    Self::new(n)
                }

                #[inline(always)]
                unsafe fn new_unchecked(n: Self::Base) -> Self {
                    Self::new_unchecked(n)
                }

                #[inline(always)]
                fn get(self) -> Self::Base {
                    let non_zero = self.get();
                    unsafe {
                        // SAFETY: This is trivially safe because `non_zero` came from a
                        // `NonZero` type. We really shouldn't have to give the compiler
                        // this hint, but it turns out the compiler would otherwise emit
                        // an unnecessary check for division by zero. The unnecessary
                        // check used to have a significant impact on performance, but it
                        // doesn't seem to anymore as of Rust version 1.58.0 (although the
                        // check itself is still there).
                        if non_zero == num_traits::zero::<Self::Base>() {
                            core::hint::unreachable_unchecked();
                        } else {
                            non_zero
                        }
                    }
                }
            }
        )+
    };
}

unsafe_impl_bit_array!(
    (u8, NonZeroU8),
    (u16, NonZeroU16),
    (u32, NonZeroU32),
    (u64, NonZeroU64),
    (usize, NonZeroUsize),
);

#[cfg(feature = "std")]
unsafe_impl_bit_array!((u128, core::num::NonZeroU128),);

/// Iterates from most significant to least significant bits in chunks but skips any
/// initial zero chunks.
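///
/// For example, with `Data = u32`, `Chunk = u8`, and `data = 0x0001_2345`, this yields the
/// chunks `[0x01, 0x23, 0x45]` (the all-zero most significant chunk is skipped).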
fn bit_array_to_chunks_truncated<Data, Chunk>(
    data: Data,
) -> impl ExactSizeIterator<Item = Chunk> + DoubleEndedIterator
where
    Data: BitArray + AsPrimitive<Chunk>,
    Chunk: BitArray,
{
    (0..(Data::BITS - data.leading_zeros() as usize))
        .step_by(Chunk::BITS)
        .rev()
        .map(move |shift| (data >> shift).as_())
}

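/// Returns `2^exponent`, i.e., `1 << exponent`, except that it returns zero rather than
/// overflowing if `exponent >= T::BITS`.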
#[inline(always)]
fn wrapping_pow2<T: BitArray>(exponent: usize) -> T {
    if exponent >= T::BITS {
        T::zero()
    } else {
        T::one() << exponent
    }
}

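/// A trait for unwrapping `Result`s whose error type is [`Infallible`] (or a
/// [`CoderError`] over `Infallible`s), and which therefore can never be in the `Err`
/// state.
///
/// Unlike `Result::unwrap`, [`unwrap_infallible`](UnwrapInfallible::unwrap_infallible)
/// can never panic.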
pub trait UnwrapInfallible<T> {
    fn unwrap_infallible(self) -> T;
}

impl<T> UnwrapInfallible<T> for Result<T, Infallible> {
    #[inline(always)]
    fn unwrap_infallible(self) -> T {
        #[allow(unreachable_patterns)]
        match self {
            Ok(x) => x,
            Err(infallible) => match infallible {},
        }
    }
}

impl<T> UnwrapInfallible<T> for Result<T, CoderError<Infallible, Infallible>> {
    fn unwrap_infallible(self) -> T {
        #[allow(unreachable_patterns)]
        match self {
            Ok(x) => x,
            Err(infallible) => match infallible {
                CoderError::Backend(infallible) => match infallible {},
                CoderError::Frontend(infallible) => match infallible {},
            },
        }
    }
}

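/// Error type that signals that an operation encountered a "not a number" (NaN) value.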
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct NanError;

impl Display for NanError {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(f, "NaN encountered.")
    }
}

#[cfg(feature = "std")]
impl std::error::Error for NanError {}

/// Helper macro to express assertions that are tested at compile time
/// despite using properties of generic parameters of an outer function.
///
/// See discussion at <https://morestina.net/blog/1940>.
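///
/// # Example
///
/// A hypothetical usage sketch (the function and the assertion label below are made up for
/// illustration; this macro is only used internally in this crate):
///
/// ```ignore
/// fn set_precision<Probability: BitArray, const PRECISION: usize>() {
///     generic_static_asserts!(
///         (Probability: BitArray; const PRECISION: usize);
///         PRECISION_MUST_FIT: PRECISION <= Probability::BITS;
///     );
///     // ... code that may rely on `PRECISION <= Probability::BITS` ...
/// }
/// ```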
macro_rules! generic_static_asserts {
    (($($l:lifetime,)* $($($t:ident$(: $bound:path)?),+)? $(; $(const $c:ident:$ct:ty),+)?); $($label:ident: $test:expr);+$(;)?) => {
        #[allow(path_statements, clippy::no_effect)]
        {
            {
                struct Check<$($l,)* $($($t,)+)? $($(const $c:$ct,)+)?>($($($t,)+)?);
                impl<$($l,)* $($($t$(:$bound)?,)+)? $($(const $c:$ct,)+)?> Check<$($l,)* $($($t,)+)? $($($c,)+)?> {
                    $(
                        const $label: () = assert!($test);
                    )+
                }
                generic_static_asserts!{@nested Check::<$($l,)* $($($t,)+)? $($($c,)+)?>, $($label: $test;)+}
            }
        }
    };
    (@nested $t:ty, $($label:ident: $test:expr;)+) => {
        $(
            <$t>::$label;
        )+
    }
}

pub(crate) use generic_static_asserts;