constriction/lib.rs
//! Entropy Coding Primitives for Research and Production
//!
//! The `constriction` crate provides a set of composable entropy coding algorithms with a
//! focus on correctness, versatility, ease of use, compression performance, and
//! computational efficiency. The goals of `constriction` are three-fold:
//!
//! 1. **to facilitate research on novel lossless and lossy compression methods** by
//!    providing a *composable* set of entropy coding primitives rather than a rigid
//!    implementation of a single preconfigured method;
//! 2. **to simplify the transition from research code to production software** by exposing
//!    the exact same functionality via both a Python API (for rapid prototyping on research
//!    code) and a Rust API (for turning successful prototypes into production); and
//! 3. **to serve as a teaching resource** by providing a wide range of entropy coding
//!    algorithms within a single consistent framework, thus making the various algorithms
//!    easily discoverable and comparable on example models and data. [Additional teaching
//!    material](https://robamler.github.io/teaching/compress21/) is being made publicly
//!    available as a by-product of an ongoing university course on data compression with
//!    deep probabilistic models.
//!
//! For an example of a compression codec that started as research code in Python and was
//! then deployed as a fast and dependency-free WebAssembly module using `constriction`'s
//! Rust API, have a look at [The Linguistic Flux
//! Capacitor](https://robamler.github.io/linguistic-flux-capacitor).
//!
//! # Project Status
//!
//! We currently provide implementations of the following entropy coding algorithms:
//!
//! - **Asymmetric Numeral Systems (ANS):** a fast modern entropy coder with near-optimal
//!   compression effectiveness that supports advanced use cases like bits-back coding.
//! - **Range Coding:** a computationally efficient variant of Arithmetic Coding that has
//!   essentially the same compression effectiveness as ANS Coding but operates as a queue
//!   ("first in first out"), which makes it preferable for autoregressive models.
//! - **Chain Coding:** an experimental new entropy coder that combines the (net)
//!   effectiveness of stream codes with the locality of symbol codes; it is meant for
//!   experimental new compression approaches that perform joint inference, quantization,
//!   and bits-back coding in an end-to-end optimization. This experimental coder is mainly
//!   provided to prove to ourselves that the API for encoding and decoding, which is shared
//!   across all stream coders, is flexible enough to express complex novel tasks.
//! - **Huffman Coding:** a well-known symbol code, mainly provided here for teaching
//!   purposes; you'll usually want to use a stream code like ANS or Range Coding instead
//!   since symbol codes can have a considerable bitrate overhead, especially in the
//!   regime of low entropy per symbol, which is common in machine-learning based
//!   compression methods.
//!
//! Further, `constriction` provides implementations of common probability distributions in
//! fixed-point arithmetic, which can be used as entropy models in any of the above
//! stream codes. The crate also provides adapters for turning custom probability
//! distributions into exactly invertible fixed-point arithmetic.
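//!
//! For instance, here is a minimal sketch of how one might turn a continuous distribution
//! from the `probability` crate into a fixed-point entropy model with an explicitly chosen
//! precision of 24 bits (the numeric values are made up; the full encoding and decoding
//! examples below use the same `LeakyQuantizer` API, just with default settings):
//!
//! ```
//! use constriction::stream::model::LeakyQuantizer;
//! use probability::distribution::Gaussian;
//!
//! // Quantizer with 24 bits of fixed-point precision for symbols in the range -100..=100:
//! let quantizer = LeakyQuantizer::<_, _, u32, 24>::new(-100..=100);
//!
//! // Quantize a Gaussian; the result assigns a nonzero fixed-point probability to each bin:
//! let entropy_model = quantizer.quantize(Gaussian::new(10.3, 17.9));
//! ```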
//!
//! The provided implementations of entropy coding algorithms and probability distributions
//! are extensively tested and should be considered reliable (except for the still
//! experimental Chain Coder). However, their APIs may change in future versions of
//! `constriction` if more user experience reveals any shortcomings of the current APIs in
//! terms of ergonomics. Please [file an
//! issue](https://github.com/bamler-lab/constriction/issues) if you run into a scenario
//! where the current APIs are suboptimal.
//!
//! # Quick Start With the Rust API
//!
//! You are currently reading the documentation of `constriction`'s Rust API. If Rust is not
//! your language of choice then head over to the [Python API
//! Documentation](https://bamler-lab.github.io/constriction/apidoc/python/). The Rust API
//! provides efficient and composable entropy coding primitives that can be adjusted to a
//! fine degree of detail using type parameters and const generics (type aliases with sane
//! defaults for all generic parameters are provided as guidance). The Python API exposes
//! the most common use cases of these entropy coding primitives to an environment that
//! feels more natural to many data scientists.
//!
//! ## Setup
//!
//! To use `constriction` in your Rust project, just add the following line to the
//! `[dependencies]` section of your `Cargo.toml`:
//!
//! ```toml
//! [dependencies]
//! constriction = "0.4.1"
//! ```
//!
//! ## System Requirements
//!
//! `constriction` requires Rust version 1.51 or later for its use of the
//! `min_const_generics` feature. If you have an older version of Rust, update to the latest
//! version by running `rustup update stable`.
//!
//! ## Encoding Example
//!
//! In this example, we'll encode some symbols using a quantized Gaussian distribution as
//! the entropy model. Each symbol will be modeled by a quantized Gaussian with a different
//! mean and standard deviation (so that the example is not too simplistic). We'll use the
//! `probability` crate for the Gaussian distributions, so also add the following dependency
//! to your `Cargo.toml`:
//!
//! ```toml
//! probability = "0.20"
//! ```
//!
//! Now, let's encode (i.e., compress) some symbols. We'll use an Asymmetric Numeral Systems
//! (ANS) Coder here for its speed and compression performance. We'll discuss how you could
//! replace the ANS Coder with a Range Coder or a symbol code like Huffman Coding
//! [below](#exercise).
//!
//! ```
//! use constriction::stream::{stack::DefaultAnsCoder, model::DefaultLeakyQuantizer};
//! use probability::distribution::Gaussian;
//!
//! fn encode_sample_data() -> Vec<u32> {
//!     // Create an empty ANS Coder with default word and state size:
//!     let mut coder = DefaultAnsCoder::new();
//!
//!     // Some made-up data and entropy models for demonstration purposes:
//!     let symbols = [23i32, -15, 78, 43, -69];
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!
//!     // Create an adapter that integrates 1-d probability density functions over bins
//!     // `[n - 0.5, n + 0.5)` for all integers `n` from `-100` to `100` using fixed-point
//!     // arithmetic with default precision, guaranteeing a nonzero probability for each bin:
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Encode the data (in reverse order, since ANS Coding operates as a stack):
//!     coder.encode_symbols_reverse(
//!         symbols.iter().zip(&means).zip(&stds).map(
//!             |((&sym, &mean), &std)| (sym, quantizer.quantize(Gaussian::new(mean, std)))
//!     )).unwrap();
//!
//!     // Retrieve the compressed representation (filling it up to full words with zero bits).
//!     coder.into_compressed().unwrap()
//! }
//!
//! assert_eq!(encode_sample_data(), [0x421C_7EC3, 0x000B_8ED1]);
//! ```
//!
//! ## Decoding Example
//!
//! Now, let's reconstruct the sample data from its compressed representation.
//!
//! ```
//! use constriction::stream::{stack::DefaultAnsCoder, model::DefaultLeakyQuantizer, Decode};
//! use probability::distribution::Gaussian;
//!
//! fn decode_sample_data(compressed: Vec<u32>) -> Vec<i32> {
//!     // Create an ANS Coder with default word and state size from the compressed data:
//!     // (ANS uses the same type for encoding and decoding, which makes this coder very
//!     // flexible and allows interleaving small encoding and decoding chunks, e.g., for
//!     // bits-back coding.)
//!     let mut coder = DefaultAnsCoder::from_compressed(compressed).unwrap();
//!
//!     // Same entropy models and quantizer we used for encoding:
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Decode the data:
//!     coder.decode_symbols(
//!         means.iter().zip(&stds).map(
//!             |(&mean, &std)| quantizer.quantize(Gaussian::new(mean, std))
//!     )).collect::<Result<Vec<_>, _>>().unwrap()
//! }
//!
//! assert_eq!(decode_sample_data(vec![0x421C_7EC3, 0x000B_8ED1]), [23, -15, 78, 43, -69]);
//! ```
//!
//! ## Exercise
//!
//! Try out the above examples and verify that decoding reconstructs the original data. Then
//! see how easy `constriction` makes it to replace the ANS Coder with a Range Coder by
//! making the following substitutions:
//!
//! **In the encoder,**
//!
//! - replace `constriction::stream::stack::DefaultAnsCoder` with
//!   `constriction::stream::queue::DefaultRangeEncoder`; and
//! - replace `coder.encode_symbols_reverse` with `coder.encode_symbols` (you no longer need
//!   to encode symbols in reverse order since Range Coding operates as a queue, i.e.,
//!   first-in-first-out). You'll also have to add the line
//!   `use constriction::stream::Encode;` to the top of the file to bring the trait method
//!   `encode_symbols` into scope.
//!
//! **In the decoder,**
//!
//! - replace `constriction::stream::stack::DefaultAnsCoder` with
//!   `constriction::stream::queue::DefaultRangeDecoder` (note that Range Coding
//!   distinguishes between an encoder and a decoder type since the encoder writes to the
//!   back while the decoder reads from the front; by contrast, ANS Coding is a stack, i.e.,
//!   it reads and writes at the same position and allows interleaving reads and writes).
//!
//! *Remark:* You could also use a symbol code like Huffman Coding (see module [`symbol`])
//! but that would have considerably worse compression performance, especially on large
//! files, since symbol codes always emit an integer number of bits per compressed symbol,
//! even if the information content of the symbol is a fractional number (stream codes like
//! ANS and Range Coding *effectively* emit a fractional number of bits per symbol since
//! they amortize over several symbols). For example, a symbol with probability 0.9 has an
//! information content of only about 0.15 bits, but a symbol code has to spend at least one
//! full bit on it.
//!
//! The above replacements should lead you to something like this:
//!
//! ```
//! use constriction::stream::{
//!     model::DefaultLeakyQuantizer,
//!     queue::{DefaultRangeEncoder, DefaultRangeDecoder},
//!     Encode, Decode,
//! };
//! use probability::distribution::Gaussian;
//!
//! fn encode_sample_data() -> Vec<u32> {
//!     // Create an empty Range Encoder with default word and state size:
//!     let mut encoder = DefaultRangeEncoder::new();
//!
//!     // Same made-up data, entropy models, and quantizer as in the ANS Coding example above:
//!     let symbols = [23i32, -15, 78, 43, -69];
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Encode the data (this time in normal order, since Range Coding is a queue):
//!     encoder.encode_symbols(
//!         symbols.iter().zip(&means).zip(&stds).map(
//!             |((&sym, &mean), &std)| (sym, quantizer.quantize(Gaussian::new(mean, std)))
//!     )).unwrap();
//!
//!     // Retrieve the (sealed up) compressed representation.
//!     encoder.into_compressed().unwrap()
//! }
//!
//! fn decode_sample_data(compressed: Vec<u32>) -> Vec<i32> {
//!     // Create a Range Decoder with default word and state size from the compressed data:
//!     let mut decoder = DefaultRangeDecoder::from_compressed(compressed).unwrap();
//!
//!     // Same entropy models and quantizer we used for encoding:
//!     let means = [35.2, -1.7, 30.1, 71.2, -75.1];
//!     let stds = [10.1, 25.3, 23.8, 35.4, 3.9];
//!     let quantizer = DefaultLeakyQuantizer::new(-100..=100);
//!
//!     // Decode the data:
//!     decoder.decode_symbols(
//!         means.iter().zip(&stds).map(
//!             |(&mean, &std)| quantizer.quantize(Gaussian::new(mean, std))
//!     )).collect::<Result<Vec<_>, _>>().unwrap()
//! }
//!
//! let compressed = encode_sample_data();
//!
//! // We'll get a different compressed representation than in the ANS Coding
//! // example because we're using a different entropy coding algorithm ...
//! assert_eq!(compressed, [0x1C31EFEB, 0x87B430DA]);
//!
//! // ... but as long as we decode with the matching algorithm we can still reconstruct the data:
//! assert_eq!(decode_sample_data(compressed), [23, -15, 78, 43, -69]);
//! ```
//!
//! # Where to Go Next?
//!
//! If you already have an entropy model and you just want to encode and decode some
//! sequence of symbols then you can probably start by adjusting the above
//! [examples](#encoding-example) to your needs. Or have a closer look at the [`stream`]
//! module.
//!
//! If you're still new to the concept of entropy coding then check out the [teaching
//! material](https://robamler.github.io/teaching/compress21/).

#![no_std]
#![warn(rust_2018_idioms, missing_debug_implementations)]

extern crate alloc;

#[cfg(feature = "std")]
extern crate std;

#[cfg(feature = "pybindings")]
mod pybindings;

pub mod backends;
pub mod stream;
pub mod symbol;

use core::{
    convert::Infallible,
    fmt::{Binary, Debug, Display, LowerHex, UpperHex},
    hash::Hash,
    num::{NonZeroU16, NonZeroU32, NonZeroU64, NonZeroU8, NonZeroUsize},
};

use num_traits::{AsPrimitive, PrimInt, Unsigned, WrappingAdd, WrappingMul, WrappingSub};

// READ WRITE SEMANTICS =======================================================

/// A trait for marking how reading and writing order relate to each other.
///
/// This is currently only used in the [`backends`] module. Future versions of
/// `constriction` may expand its use to frontends.
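///
/// # Example
///
/// A minimal sketch of how code that is generic over read/write semantics might use this
/// trait as a bound; the `WordLog` type below is purely hypothetical and not part of
/// `constriction`:
///
/// ```
/// use constriction::{Semantics, Stack, Queue};
/// use core::marker::PhantomData;
///
/// // A (hypothetical) buffer of words that records, in its type, which read/write
/// // semantics it was written with:
/// #[derive(Debug, Default)]
/// struct WordLog<S: Semantics> {
///     words: Vec<u32>,
///     semantics: PhantomData<S>,
/// }
///
/// // The marker types `Stack` and `Queue` can be plugged in for the type parameter `S`:
/// let _lifo_log = WordLog::<Stack>::default();
/// let _fifo_log = WordLog::<Queue>::default();
/// ```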
pub trait Semantics: Default {}

/// Zero sized marker struct for last-in-first-out read/write [`Semantics`]
///
/// This type typically only comes up in advanced use cases that are generic over read/write
/// semantics. If you are looking for an entropy coder that operates as a stack, check out
/// the module [`stream::stack`].
#[derive(Debug, Default)]
pub struct Stack {}
impl Semantics for Stack {}

/// Zero sized marker struct for first-in-first-out read/write [`Semantics`]
///
/// This type typically only comes up in advanced use cases that are generic over read/write
/// semantics. If you are looking for an entropy coder that operates as a queue, check out
/// the module [`stream::queue`].
#[derive(Debug, Default)]
pub struct Queue {}
impl Semantics for Queue {}

// GENERIC ERROR TYPES ========================================================

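/// Error type for entropy coders.
///
/// Distinguishes between logical errors in the used coder or entropy model (the `Frontend`
/// variant) and errors reported by the backend that reads or writes the compressed data
/// (the `Backend` variant).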
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CoderError<FrontendError, BackendError> {
    Frontend(FrontendError),
    Backend(BackendError),
}

impl<FrontendError, BackendError> CoderError<FrontendError, BackendError> {
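    /// Applies `f` to the error if it is a [`CoderError::Frontend`] error; passes through
    /// a [`CoderError::Backend`] error unchanged.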
    pub fn map_frontend<E>(self, f: impl Fn(FrontendError) -> E) -> CoderError<E, BackendError> {
        match self {
            Self::Frontend(err) => CoderError::Frontend(f(err)),
            Self::Backend(err) => CoderError::Backend(err),
        }
    }

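    /// Applies `f` to the error if it is a [`CoderError::Backend`] error; passes through
    /// a [`CoderError::Frontend`] error unchanged.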
    pub fn map_backend<E>(self, f: impl Fn(BackendError) -> E) -> CoderError<FrontendError, E> {
        match self {
            Self::Backend(err) => CoderError::Backend(f(err)),
            Self::Frontend(err) => CoderError::Frontend(err),
        }
    }
}

impl<BackendError: Display, FrontendError: Display> Display
    for CoderError<FrontendError, BackendError>
{
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        match self {
            Self::Frontend(err) => write!(f, "Invalid compressed data: {err}"),
            Self::Backend(err) => write!(f, "Error while reading compressed data: {err}"),
        }
    }
}

#[cfg(feature = "std")]
impl<FrontendError: std::error::Error + 'static, BackendError: std::error::Error + 'static>
    std::error::Error for CoderError<FrontendError, BackendError>
{
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Frontend(source) => Some(source),
            Self::Backend(source) => Some(source),
        }
    }
}

impl<FrontendError, BackendError> From<BackendError> for CoderError<FrontendError, BackendError> {
    fn from(read_error: BackendError) -> Self {
        Self::Backend(read_error)
    }
}

impl<FrontendError> CoderError<FrontendError, Infallible> {
    fn into_frontend_error(self) -> FrontendError {
        #[allow(unreachable_patterns)]
        match self {
            CoderError::Frontend(frontend_error) => frontend_error,
            CoderError::Backend(infallible) => match infallible {},
        }
    }
}

type DefaultEncoderError<BackendError> = CoderError<DefaultEncoderFrontendError, BackendError>;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DefaultEncoderFrontendError {
    /// Tried to encode a symbol with zero probability under the used entropy model.
    ///
    /// This error can usually be avoided by using a "leaky" distribution as the
    /// entropy model, i.e., a distribution that assigns a nonzero probability to all
    /// symbols within a finite domain. Leaky distributions can be constructed with,
    /// e.g., a [`LeakyQuantizer`](stream/model/struct.LeakyQuantizer.html) or with
    /// [`LeakyCategorical::from_floating_point_probabilities`](
    /// stream/model/struct.LeakyCategorical.html#method.from_floating_point_probabilities).
    ImpossibleSymbol,
}

impl Display for DefaultEncoderFrontendError {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        match self {
            Self::ImpossibleSymbol => write!(
                f,
                "Tried to encode a symbol that has zero probability under the used entropy model."
            ),
        }
    }
}

#[cfg(feature = "std")]
impl std::error::Error for DefaultEncoderFrontendError {}

impl DefaultEncoderFrontendError {
    #[inline(always)]
    const fn into_coder_error<BackendError>(self) -> DefaultEncoderError<BackendError> {
        DefaultEncoderError::Frontend(self)
    }
}

/// Trait for coders or backends that *might* implement [`Pos`] and/or [`Seek`]
///
/// If a type implements `PosSeek` then that doesn't necessarily mean that it also
/// implements [`Pos`] or [`Seek`]. Implementing `PosSeek` only fixes the common `Position`
/// type that is used *if* the type implements `Pos` and/or `Seek`.
pub trait PosSeek {
    type Position: Clone;
}

/// A trait for entropy coders that keep track of their current position within the
/// compressed data.
///
/// This is the counterpart of [`Seek`]. Call [`Pos::pos`] to record
/// "snapshots" of an entropy coder, and then call [`Seek::seek`] at a later time
/// to jump back to these snapshots. See examples in the documentation of [`Seek`]
/// and [`Seek::seek`].
pub trait Pos: PosSeek {
    /// Returns the position in the compressed data, in units of `Word`s.
    ///
    /// It is up to the entropy coder to define what constitutes the beginning and end
    /// positions within the compressed data (for example, an [`AnsCoder`] begins encoding
    /// at position zero but it begins decoding at position `ans.buf().len()`).
    ///
    /// [`AnsCoder`]: stream::stack::AnsCoder
    fn pos(&self) -> Self::Position;
}

/// A trait for entropy coders that support random access.
///
/// This is the counterpart of [`Pos`]. While [`Pos::pos`] can be used to
/// record "snapshots" of an entropy coder, [`Seek::seek`] can be used to jump to these
/// recorded snapshots.
///
/// Not all entropy coders that implement `Pos` also implement `Seek`. For example,
/// [`DefaultAnsCoder`] implements `Pos` but it doesn't implement `Seek` because it
/// supports both encoding and decoding and therefore always operates at the head. In
/// such a case one can usually obtain a seekable entropy coder in return for
/// surrendering some other property. For example, `DefaultAnsCoder` provides the methods
/// [`as_seekable_decoder`] and [`into_seekable_decoder`] that return a decoder which
/// implements `Seek` but which can no longer be used for encoding (i.e., it doesn't
/// implement [`Encode`]).
///
/// # Example
///
/// ```
/// use constriction::stream::{
///     model::DefaultContiguousCategoricalEntropyModel, stack::DefaultAnsCoder, Decode
/// };
/// use constriction::{Pos, Seek};
///
/// // Create an `AnsCoder` encoder and an entropy model:
/// let mut ans = DefaultAnsCoder::new();
/// let probabilities = vec![0.03, 0.07, 0.1, 0.1, 0.2, 0.2, 0.1, 0.15, 0.05];
/// let entropy_model = DefaultContiguousCategoricalEntropyModel
///     ::from_floating_point_probabilities_fast(&probabilities, None).unwrap();
///
/// // Encode some symbols in two chunks and take a snapshot after each chunk.
/// let symbols1 = vec![8, 2, 0, 7];
/// ans.encode_iid_symbols_reverse(&symbols1, &entropy_model).unwrap();
/// let snapshot1 = ans.pos();
///
/// let symbols2 = vec![3, 1, 5];
/// ans.encode_iid_symbols_reverse(&symbols2, &entropy_model).unwrap();
/// let snapshot2 = ans.pos();
///
/// // As discussed above, `DefaultAnsCoder` doesn't impl `Seek` but we can get a decoder that does:
/// let mut seekable_decoder = ans.as_seekable_decoder();
///
/// // `seekable_decoder` is still an `AnsCoder`, so decoding would start with the items we encoded
/// // last. But since it implements `Seek` we can jump ahead to our first snapshot:
/// seekable_decoder.seek(snapshot1);
/// let decoded1 = seekable_decoder
///     .decode_iid_symbols(4, &entropy_model)
///     .collect::<Result<Vec<_>, _>>()
///     .unwrap();
/// assert_eq!(decoded1, symbols1);
///
/// // We've reached the end of the compressed data ...
/// assert!(seekable_decoder.is_empty());
///
/// // ... but we can still jump to somewhere else and continue decoding from there:
/// seekable_decoder.seek(snapshot2);
///
/// // Creating snapshots didn't mutate the coder, so we can just decode through `snapshot1`:
/// let decoded_both = seekable_decoder.decode_iid_symbols(7, &entropy_model).map(Result::unwrap);
/// assert!(decoded_both.eq(symbols2.into_iter().chain(symbols1)));
/// assert!(seekable_decoder.is_empty()); // <-- We've reached the end again.
/// ```
///
/// [`Encode`]: stream::Encode
/// [`DefaultAnsCoder`]: stream::stack::DefaultAnsCoder
/// [`as_seekable_decoder`]: stream::stack::AnsCoder::as_seekable_decoder
/// [`into_seekable_decoder`]: stream::stack::AnsCoder::into_seekable_decoder
pub trait Seek: PosSeek {
    /// Jumps to a given position in the compressed data.
    ///
    /// The argument `pos` is the same pair of values returned by
    /// [`Pos::pos`], i.e., it is a tuple of the position in the compressed
    /// data and the `State` to which the entropy coder should be restored. Both values
    /// are absolute (i.e., seeking happens independently of the current state or
    /// position of the entropy coder). The position is measured in units of
    /// `Word`s (see the second example below, where we manipulate a position
    /// obtained from `Pos::pos` in order to reflect a manual reordering of
    /// the `Word`s in the compressed data).
    ///
    /// # Examples
    ///
    /// The method takes the position and state as a tuple rather than as independent
    /// method arguments so that one can simply pass in the tuple obtained from
    /// [`Pos::pos`] as sketched below:
    ///
    /// ```
    /// // Step 1: Obtain an encoder and encode some data (omitted for brevity) ...
    /// # use constriction::{stream::stack::DefaultAnsCoder, Pos, Seek};
    /// # let encoder = DefaultAnsCoder::new();
    ///
    /// // Step 2: Take a snapshot by calling `Pos::pos`:
    /// let snapshot = encoder.pos(); // <-- Returns a tuple `(pos, state)`.
    ///
    /// // Step 3: Encode some more data and then obtain a decoder (omitted for brevity) ...
    /// # let mut decoder = encoder.as_seekable_decoder();
    ///
    /// // Step 4: Jump to snapshot by calling `Seek::seek`:
    /// decoder.seek(snapshot); // <-- No need to deconstruct `snapshot` into `(pos, state)`.
    /// ```
    ///
    /// For more fine-grained control, one may want to assemble the tuple
    /// `pos` manually. For example, a [`DefaultAnsCoder`] encodes data from
    /// front to back and then decodes the data in the reverse direction from back to
    /// front. Decoding from back to front may be inconvenient in some use cases, so one
    /// might prefer to instead reverse the order of the `Word`s once encoding
    /// is finished, and then decode them in the more natural direction from front to
    /// back. Reversing the compressed data changes the position of each
    /// `Word`, and so any positions obtained from `Pos` need to be adjusted
    /// accordingly before they may be passed to `seek`, as in the following example:
    ///
    /// ```
    /// use constriction::{
    ///     stream::{model::LeakyQuantizer, stack::{DefaultAnsCoder, AnsCoder}, Decode},
    ///     Pos, Seek
    /// };
    ///
    /// // Construct a `DefaultAnsCoder` for encoding and an entropy model:
    /// let mut encoder = DefaultAnsCoder::new();
    /// let quantizer = LeakyQuantizer::<_, _, u32, 24>::new(-100..=100);
    /// let entropy_model = quantizer.quantize(probability::distribution::Gaussian::new(0.0, 10.0));
    ///
    /// // Encode two chunks of symbols and take a snapshot in-between:
    /// encoder.encode_iid_symbols_reverse(-100..40, &entropy_model).unwrap();
    /// let (mut snapshot_pos, snapshot_state) = encoder.pos();
    /// encoder.encode_iid_symbols_reverse(50..101, &entropy_model).unwrap();
    ///
    /// // Obtain compressed data, reverse it, and create a decoder that reads it from front to back:
    /// let mut compressed = encoder.into_compressed().unwrap();
    /// compressed.reverse();
    /// snapshot_pos = compressed.len() - snapshot_pos; // <-- Adjusts the snapshot position.
    /// let mut decoder = AnsCoder::from_reversed_compressed(compressed).unwrap();
    ///
    /// // Since we chose to encode onto a stack, decoding yields the last encoded chunk first:
    /// assert_eq!(decoder.decode_symbol(entropy_model).unwrap(), 50);
    /// assert_eq!(decoder.decode_symbol(entropy_model).unwrap(), 51);
    ///
    /// // To jump to our snapshot, we have to use the adjusted `snapshot_pos`:
    /// decoder.seek((snapshot_pos, snapshot_state));
    /// assert!(decoder.decode_iid_symbols(140, &entropy_model).map(Result::unwrap).eq(-100..40));
    /// assert!(decoder.is_empty()); // <-- We've reached the end of the compressed data.
    /// ```
    ///
    /// [`DefaultAnsCoder`]: stream::stack::DefaultAnsCoder
    #[allow(clippy::result_unit_err)]
    fn seek(&mut self, pos: Self::Position) -> Result<(), ()>;
}

/// A trait for bit strings of fixed (and usually small) length.
///
/// Short fixed-length bit strings are fundamental building blocks of efficient entropy
/// coding algorithms. They are currently used for the following purposes:
/// - to represent the smallest unit of compressed data (see [`stream::Code::Word`]);
/// - to represent probabilities in fixed point arithmetic (see
///   [`stream::model::EntropyModel::Probability`]); and
/// - to make up the internal state of entropy coders (see [`stream::Code::State`]), which
///   is typically composed of one or more `BitArray`s, although this is not a requirement.
///
/// This trait is implemented on all primitive unsigned integer types. It is not recommended
/// to implement this trait for custom types since coders will assume, for performance
/// reasons, that `BitArray`s can be represented and manipulated efficiently in
/// hardware.
///
/// # Safety
///
/// This trait is marked `unsafe` so that entropy coders may rely on the assumption that all
/// `BitArray`s have precisely the same behavior as builtin unsigned integer types, and that
/// [`BitArray::BITS`] has the correct value.
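///
/// # Example
///
/// A minimal sketch of the functionality that this trait (together with its associated
/// [`NonZeroBitArray`] type) provides on primitive unsigned integers:
///
/// ```
/// use constriction::{BitArray, NonZeroBitArray};
///
/// // The bit length is available as an associated constant:
/// assert_eq!(<u32 as BitArray>::BITS, 32);
///
/// // Checked conversion to the associated `NonZero` type and back:
/// let x = 0x1234_5678_u32;
/// let n = x.into_nonzero().unwrap();
/// assert_eq!(n.get(), x);
/// assert!(0_u32.into_nonzero().is_none());
/// ```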
pub unsafe trait BitArray:
    PrimInt
    + Unsigned
    + WrappingAdd
    + WrappingSub
    + WrappingMul
    + LowerHex
    + UpperHex
    + Binary
    + Default
    + Copy
    + Display
    + Debug
    + Eq
    + Hash
    + 'static
{
    /// The (fixed) length of the `BitArray` in bits.
    ///
    /// Defaults to `8 * core::mem::size_of::<Self>()`, which is suitable for all
    /// primitive unsigned integers.
    ///
    /// This could arguably be called `LEN` instead, but that may be confusing since
    /// "lengths" are typically not measured in bits in the Rust ecosystem.
    const BITS: usize = 8 * core::mem::size_of::<Self>();

    type NonZero: NonZeroBitArray<Base = Self>;

    #[inline(always)]
    fn into_nonzero(self) -> Option<Self::NonZero> {
        Self::NonZero::new(self)
    }

    /// # Safety
    ///
    /// The provided value must be nonzero.
    #[inline(always)]
    unsafe fn into_nonzero_unchecked(self) -> Self::NonZero {
        Self::NonZero::new_unchecked(self)
    }
}

/// A trait for bit strings like [`BitArray`] but with guaranteed nonzero values
///
/// # Safety
///
/// Must guarantee that the value is indeed nonzero. Failing to do so could, e.g., cause a
/// division by zero in entropy coders, which is undefined behavior.
pub unsafe trait NonZeroBitArray: Copy + Display + Debug + Eq + Hash + 'static {
    type Base: BitArray<NonZero = Self>;

    fn new(n: Self::Base) -> Option<Self>;

    /// # Safety
    ///
    /// The provided value `n` must be nonzero.
    unsafe fn new_unchecked(n: Self::Base) -> Self;

    fn get(self) -> Self::Base;
}

macro_rules! unsafe_impl_bit_array {
    ($(($base:ty, $non_zero:ty)),+ $(,)?) => {
        $(
            unsafe impl BitArray for $base {
                type NonZero = $non_zero;
            }

            unsafe impl NonZeroBitArray for $non_zero {
                type Base = $base;

                #[inline(always)]
                fn new(n: Self::Base) -> Option<Self> {
                    Self::new(n)
                }

                #[inline(always)]
                unsafe fn new_unchecked(n: Self::Base) -> Self {
                    Self::new_unchecked(n)
                }

                #[inline(always)]
                fn get(self) -> Self::Base {
                    let non_zero = self.get();
                    unsafe {
                        // SAFETY: This is trivially safe because `non_zero` came from a
                        // `NonZero` type. We really shouldn't have to give the compiler
                        // this hint, but it turns out the compiler would otherwise emit
                        // an unnecessary check for division by zero. The unnecessary
                        // check used to have a significant impact on performance, but it
                        // doesn't seem to anymore as of Rust version 1.58.0 (although the
                        // check itself is still there).
                        if non_zero == num_traits::zero::<Self::Base>() {
                            core::hint::unreachable_unchecked();
                        } else {
                            non_zero
                        }
                    }
                }
            }
        )+
    };
}

unsafe_impl_bit_array!(
    (u8, NonZeroU8),
    (u16, NonZeroU16),
    (u32, NonZeroU32),
    (u64, NonZeroU64),
    (usize, NonZeroUsize),
);

#[cfg(feature = "std")]
unsafe_impl_bit_array!((u128, core::num::NonZeroU128),);

/// Iterates from most significant to least significant bits in chunks but skips any
/// initial zero chunks.
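///
/// For example, with `Data = u32`, `Chunk = u8`, and `data = 0x0001_2345`, this yields the
/// chunks `[0x01, 0x23, 0x45]` (the all-zero most significant chunk is skipped).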
fn bit_array_to_chunks_truncated<Data, Chunk>(
    data: Data,
) -> impl ExactSizeIterator<Item = Chunk> + DoubleEndedIterator
where
    Data: BitArray + AsPrimitive<Chunk>,
    Chunk: BitArray,
{
    (0..(Data::BITS - data.leading_zeros() as usize))
        .step_by(Chunk::BITS)
        .rev()
        .map(move |shift| (data >> shift).as_())
}

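/// Returns `2^exponent`, i.e., `1 << exponent`, except that it returns zero rather than
/// overflowing if `exponent >= T::BITS`.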
#[inline(always)]
fn wrapping_pow2<T: BitArray>(exponent: usize) -> T {
    if exponent >= T::BITS {
        T::zero()
    } else {
        T::one() << exponent
    }
}

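/// A trait for unwrapping `Result`s whose error type is [`Infallible`] (or a
/// [`CoderError`] over `Infallible`s), and which therefore can never be in the `Err`
/// state.
///
/// Unlike `Result::unwrap`, [`unwrap_infallible`](UnwrapInfallible::unwrap_infallible)
/// can never panic.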
pub trait UnwrapInfallible<T> {
    fn unwrap_infallible(self) -> T;
}

impl<T> UnwrapInfallible<T> for Result<T, Infallible> {
    #[inline(always)]
    fn unwrap_infallible(self) -> T {
        #[allow(unreachable_patterns)]
        match self {
            Ok(x) => x,
            Err(infallible) => match infallible {},
        }
    }
}

impl<T> UnwrapInfallible<T> for Result<T, CoderError<Infallible, Infallible>> {
    fn unwrap_infallible(self) -> T {
        #[allow(unreachable_patterns)]
        match self {
            Ok(x) => x,
            Err(infallible) => match infallible {
                CoderError::Backend(infallible) => match infallible {},
                CoderError::Frontend(infallible) => match infallible {},
            },
        }
    }
}

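/// Error type that signals that an operation encountered a "not a number" (NaN) value.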
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct NanError;

impl Display for NanError {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(f, "NaN encountered.")
    }
}

#[cfg(feature = "std")]
impl std::error::Error for NanError {}

/// Helper macro to express assertions that are tested at compile time
/// despite using properties of generic parameters of an outer function.
///
/// See discussion at <https://morestina.net/blog/1940>.
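///
/// # Example
///
/// A hypothetical usage sketch (the function and the assertion label below are made up for
/// illustration; this macro is only used internally in this crate):
///
/// ```ignore
/// fn set_precision<Probability: BitArray, const PRECISION: usize>() {
///     generic_static_asserts!(
///         (Probability: BitArray; const PRECISION: usize);
///         PRECISION_MUST_FIT: PRECISION <= Probability::BITS;
///     );
///     // ... code that may rely on `PRECISION <= Probability::BITS` ...
/// }
/// ```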
macro_rules! generic_static_asserts {
    (($($l:lifetime,)* $($($t:ident$(: $bound:path)?),+)? $(; $(const $c:ident:$ct:ty),+)?); $($label:ident: $test:expr);+$(;)?) => {
        #[allow(path_statements, clippy::no_effect)]
        {
            {
                struct Check<$($l,)* $($($t,)+)? $($(const $c:$ct,)+)?>($($($t,)+)?);
                impl<$($l,)* $($($t$(:$bound)?,)+)? $($(const $c:$ct,)+)?> Check<$($l,)* $($($t,)+)? $($($c,)+)?> {
                    $(
                        const $label: () = assert!($test);
                    )+
                }
                generic_static_asserts!{@nested Check::<$($l,)* $($($t,)+)? $($($c,)+)?>, $($label: $test;)+}
            }
        }
    };
    (@nested $t:ty, $($label:ident: $test:expr;)+) => {
        $(
            <$t>::$label;
        )+
    }
}

pub(crate) use generic_static_asserts;