constriction/stream/
model.rs

1//! Probability distributions that can be used as entropy models for stream codes.
2//!
3//! This module provides utilities for dealing with probabilistic models of data sources
4//! ("entropy models") in exactly invertible fixed-point arithmetic so that no rounding
5//! errors occur. As explained in the [motivation](#motivation) below, preventing rounding
6//! errors is necessary for reliable entropy coding.
7//!
8//! The types defined in this module approximate arbitrary discrete (or quantized
9//! one-dimensional continuous) probability distributions with a fixed-point representation.
10//! The fixed-point representation has customizable numeric precision and can be either
11//! explicit or implicit (i.e., lazy). While the conversion to the fixed-point approximation
12//! itself generally involves rounding, once the fixed-point representation is obtained,
13//! operations on it are exact. Therefore, the fixed-point representation can be used for
14//! entropy coding.
15//!
16//! # Module Overview
17//!
18//! This module declares the base trait [`EntropyModel`] and its subtraits [`EncoderModel`]
19//! and [`DecoderModel`], which specify the interfaces that entropy models provide and that
20//! entropy coders in the sister modules can rely on.
21//!
22//! In addition, this module provides the following three utilities for constructing entropy
23//! models:
24//! - an adapter that converts parameterized discrete distributions (e.g., [`Binomial`]) or
25//!   one-dimensional continuous probability distributions (e.g. [`Gaussian`]) from a
26//!   representation in terms of float-valued functions to an (implicit) exactly invertible
27//!   fixed-point representation; when provided with a continuous distribution (a
28//!   probability density) then this adapter also quantizes the data space into bins. See
29//!   [`DefaultLeakyQuantizer`] and [`SmallLeakyQuantizer`];
30//! - types for representing arbitrary categorical distributions in an explicit fixed-point
31//!   representation; these types are intended either as fallbacks for probability
32//!   distributions that lack an efficiently evaluable analytic expression of the cumulative
33//!   distribution function (and that therefore can't be handled by the above adaptor), or
34//!   for efficient *encoding* of i.i.d. symbols by precalculating and tabularizing the
35//!   fixed-point representation of each allowed symbol. See [`DefaultLeakyQuantizer`],
36//!   [`DefaultContiguousCategoricalEntropyModel`],
37//!   [`DefaultNonContiguousCategoricalEncoderModel`], and
38//!   [`DefaultNonContiguousCategoricalDecoderModel`] (and their respective counterparts
39//!   with the "Small" instead of "Default" preset); and
40//! - types for high-performance "lookup tables" that enable efficient
41//!   *decoding* of i.i.d. data; these types build up a lookup table with `2^PRECISION`
42//!   entries (one entry per
43//!   possible *quantile*) and are therefore only recommended to be used with relatively
44//!   small `PRECISION`. See [`ContiguousLookupDecoderModel`] and
45//!   [`NonContiguousLookupDecoderModel`].
46//!
47//! # Examples
48//!
49//! See [`LeakyQuantizer`](LeakyQuantizer#examples), [`ContiguousCategoricalEntropyModel`],
50//! [`NonContiguousCategoricalEncoderModel`]. [`NonContiguousCategoricalDecoderModel`], and
51//! [`ContiguousLookupDecoderModel`] or [`NonContiguousLookupDecoderModel`].
52//!
53//! TODO: direct links to "Examples" sections.
54//!
55//! # Motivation
56//!
57//! The general idea of entropy coding to find an optimal compression strategy by using a
58//! *probabilistic model of the data source*. Ideally, all conceivable data points would be
59//! compressed into a short bit string. However, short bit strings are a scarce commodity:
60//! for any integer `N`, there are only `2^N - 1` distinct bit strings that are shorter than
61//! `N` bits. For this reason, entropy coding algorithms assign the scarce short bit strings
62//! to data points that are most probable to appear in practice, while assigning longer bit
63//! strings to data points that may be possible in principle but that are extremely
64//! improbable in practice. More precisely, entropy coding aims to minimize the expected bit
65//! rate under a probabilistic model of the data source. We refer to this model as an
66//! "entropy model".
67//!
68//! In contrast to many other use cases of probabilistic models in computing, entropy models
69//! must be amenable to *exact* arithmetic operations. In particular, no rounding errors are
70//! allowed when inverting the cumulative distribution function. Even a single arbitrarily
71//! small rounding error could set off a chain reaction leading to arbitrarily large and
72//! arbitrarily many errors when compressing and then decompressing a sequence of symbols
73//! (see, e.g., the [motivating example for the `ChainCoder`](super::chain#motivation)).
74//! This module provides utilities for defining entropy models that can be inverted exactly
75//! without any rounding errors.
76//!
77//! # Zero Probability
78//!
79//! All entropy models provided in this module have a predictable support, i.e., it is
80//! always easy to predict exactly which symbols have nonzero probability under the model.
81//! This is an important property for entropy coding because trying to encode a symbol that
82//! has zero probability under the used entropy model would fail.
83//!
84//! When constructing an entropy model then the caller always has to provide a support
85//! (either as an integer range or as a list of symbols of arbitrary type). All entropy
86//! models in this module enforce the following constraints:
87//!
88//! 1. all symbols within the user-provided support are assigned at least the smallest
89//!    nonzero probability that is representable at the used fixed-point `PRECISION` (even
90//!    if naive rounding would lead to zero probability);
91//! 2. all symbols that are not in the user-provided support have probability zero;
92//! 3. the probabilities add up to one (this even holds when, e.g., quantizing a continuous
93//!    probability distribution on a finite support that is smaller than the continuous
94//!    distribution's possibly unbounded support); and
95//! 4. no single symbol has probability one, i.e., we disallow degenerate entropy models
96//!    that put all probability mass on a single symbol, as such models can lead to problems
97//!    in some entropy coders (if you don't know whether you may encounter degenerate
98//!    entropy models for some symbols, just check for degeneracy and encode nothing in that
99//!    case since the corresponding symbols can be trivially reconstructed).
100//!
101//! When entropy models are constructed from a floating-point representation of some
102//! probability distribution then rounding is done in such a way that the above constraints
103//! are satisfied. When entropy models are constructed by passing in probabilities that are
104//! already in fixed-point representation, then the constructor verifies the above
105//! constraints in an efficient way.
106//!
107//! While constraints (1) and (4) above are strictly enforced (for types defined in this
108//! module), constraints (2) and (3) hold in practice but must not be relied on for memory
109//! safety as they can technically be violated without the use of `unsafe` (by using a
110//! [`LeakyQuantizer`] with an invalid [`Distribution`], i.e., one whose cumulative
111//! distribution function either isn't monotonic or has an image that exceeds the interval
112//! `[0, 1]`).
113//!
114//! [`stack`]: super::stack
115//! [`queue`]: super::queue
116//! [`Binomial`]: probability::distribution::Binomial
117//! [`Gaussian`]: probability::distribution::Gaussian
118
119/// Re-export of [`probability::distribution::Distribution`].
120///
121/// Most users will never have to interact with this trait directly. When a method requires
122/// a type that implements `Distribution`, most users will likely use a predefined type from
123/// the [`probability`] crate. You only need to implement this trait if you want to use a
124/// probability distribution that is not (yet) provided by the `probability` crate.
125///
126/// # See Also
127///
128/// - [`Inverse`]
129///
130/// [`probability::distribution::Distribution`]:
131///     https://docs.rs/probability/latest/probability/distribution/trait.Distribution.html
132/// [`probability`]: https://docs.rs/probability/latest/probability/
133pub use probability::distribution::Distribution;
134
135/// Re-export of [`probability::distribution::Inverse`].
136///
137/// Most users will never have to interact with this trait directly. When a method requires
138/// a type that implements `Inverse`, most users will likely use a predefined type from
139/// the [`probability`] crate. You only need to implement this trait if you want to use a
140/// probability distribution that is not (yet) provided by the `probability` crate.
141///
142/// # See Also
143///
144/// - [`Distribution`]
145///
146/// [`probability::distribution::Inverse`]:
147///     https://docs.rs/probability/latest/probability/distribution/trait.Inverse.html
148/// [`probability`]: https://docs.rs/probability/latest/probability/
149pub use probability::distribution::Inverse;
150
151mod categorical;
152mod quantize;
153mod uniform;
154
155use core::{borrow::Borrow, hash::Hash};
156
157use alloc::{boxed::Box, vec::Vec};
158
159use num_traits::{float::FloatCore, AsPrimitive, One, Zero};
160
161use crate::{BitArray, NonZeroBitArray};
162
163/// Base trait for probabilistic models of a data source.
164///
165/// All entropy models (see [module level documentation](self)) that can be used for
166/// encoding and/or decoding with stream codes must implement this trait and at least one of
167/// [`EncoderModel`] and/or [`DecoderModel`]. This trait exposes the type of [`Symbol`]s
168/// over which the entropy model is defined, the type that is used to represent a
169/// [`Probability`] in fixed-point arithmetic, and the fixed point `PRECISION` (see
170/// [discussion of type parameters](super#type-parameters-of-entropy-models)).
171///
172/// # Blanket Implementation for `&impl EntropyModel`
173///
174/// We provide the following blanket implementation for references to `EntropyModel`s:
175///
176/// ```ignore
177/// impl<M, const PRECISION: usize> EntropyModel<PRECISION> for &M
178/// where
179///     M: EntropyModel<PRECISION> + ?Sized
180/// { ... }
181/// ```
182///
183/// This means that, if some type `M` implements `EntropyModel<PRECISION>` for some
184/// `PRECISION`, then so does the reference type `&M`. Analogous blanket implementations are
185/// provided for the traits [`EncoderModel`], [`DecoderModel`], and
186/// [`IterableEntropyModel`]. The implementations simply delegate all calls to `M` (which is
187/// possible since all methods only take an `&self` receiver). Therefore:
188/// - you don't need to (and, in fact, currently can't) implement `EntropyModel`,
189///   `EncoderModel`, or `DecoderModel` for reference types `&M`; just implement these
190///   traits for "value types" `M` and you'll get the implementation for the corresponding
191///   reference types for free.
192/// - when you write a function or method that takes a generic entropy model as an argument,
193///   always take the entropy model (formally) *by value* (i.e., declare your function as
194///   `fn f(model: impl EntropyModel<PRECISION>)` or as `f<M:
195///   EntropyModel<PRECISION>>(model: M)`). Since all references to `EntropyModel`s are also
196///   `EntropyModel`s themselves, a function with one of these signatures can be called with
197///   an entropy model passed in either by value or by reference. If your function or method
198///   needs to pass out several copies of `model` then add an extra bound `M: Copy` (see,
199///   e.g., [`Encode::encode_iid_symbols`](super::Encode::encode_iid_symbols)). This will
200///   allow users to call your function either with a reference to an entropy model (all
201///   shared references implement `Copy`), or with some cheaply copyable entropy model such
202///   as a view to a lookup model (see [`ContiguousLookupDecoderModel::as_view`] or
203///   [`NonContiguousLookupDecoderModel::as_view`]).
204///
205/// # See Also
206///
207/// - [`EncoderModel`]
208/// - [`DecoderModel`]
209///
210/// [`Symbol`]: Self::Symbol
211/// [`Probability`]: Self::Probability
212pub trait EntropyModel<const PRECISION: usize> {
213    /// The type of data over which the entropy model is defined.
214    ///
215    /// This is the type of an item of the *uncompressed* data.
216    ///
217    /// Note that, although any given `EntropyModel` has a fixed associated `Symbol` type,
218    /// this doesn't prevent you from encoding heterogeneous sequences of symbols where each
219    /// symbol has a different type. You can use a different `EntropyModel` with a different
220    /// associated `Symbol` type for each symbol.
221    type Symbol;
222
223    /// The type used to represent probabilities, cumulatives, and quantiles.
224    ///
225    /// This is a primitive unsigned integer type that must hold at least `PRECISION` bits.
226    /// An integer value `p: Probability` semantically represents the probability,
227    /// cumulative, or quantile `p * 2.0^(-PRECISION)` (where `^` denotes exponentiation and
228    /// `PRECISION` is a const generic parameter of the trait `EntropyModel`).
229    ///
230    /// In many places where `constriction`'s public API *returns* probabilities, they have
231    /// already been verified to be nonzero. In such a case, the probability is returned as
232    /// a `Probability::NonZero`, which denotes the corresponding non-zeroable type (e.g.,
233    /// if `Probability` is `u32` then `Probability::NonZero` is
234    /// [`NonZeroU32`](core::num::NonZeroU32)). The "bare" `Probability` type is mostly used
235    /// for left-cumulatives and quantiles (i.e., for points on the y-axis in the graph of a
236    /// cumulative distribution function).
237    ///
238    /// # Enforcing the Constraints
239    ///
240    /// Implementations of `EntropyModel` are encouraged to enforce the constraint
241    /// `1 <= PRECISION <= Probability::BITS`. The simplest way to do so is by stating it as an
242    /// assertion `assert!(1 <= PRECISION && PRECISION <= Probability::BITS)` at the beginning of
243    /// relevant methods. This assertion has zero runtime cost because it can be
244    /// trivially evaluated at compile time and therefore will be optimized out if it holds.
245    /// As of `constriction` 0.4, implementations provided by `constriction` include a similar
246    /// assertion that is checked at compile time using const evaluation tricks.
247    ///
248    /// # (Internal) Representation of Probability One
249    ///
250    /// The case of "probability one" is treated specially. This case does not come up in
251    /// the public API since we disallow probability one for any individual symbol under any
252    /// entropy model, and since all left-sided cumulatives always correspond to a symbol
253    /// with nonzero probability. But the value "one" still comes up internally as the
254    /// right-cumulative of the last allowed symbol for any model. Although our treatment of
255    /// "probability one" can thus be considered an implementation detail, it is likely to
256    /// become an issue in third-party implementations of `EntropyModel`, so it is worth
257    /// documenting our recommended treatment.
258    ///
259    /// We internally represent "probability one" by its normal fixed-point representation
260    /// of `p = 1 << PRECISION` (i.e., `p = 2^PRECISION` in mathematical notation) if this
261    /// value fits into `Probability`, i.e., if `PRECISION != Probability::BITS`. In the
262    /// (uncommon) case where `PRECISION == Probability::BITS`, we represent "probability
263    /// one" as the integer zero (i.e., cutting off the overflowing bit). This means that
264    /// any probability that is guaranteed to not be one can always be calculated by
265    /// subtracting its left-sided cumulative from its right-sided cumulative in wrapping
266    /// arithmetic. However, this convention means that one has to be careful not to confuse
267    /// probability zero with probabilty one. In our implementations, these two possible
268    /// interpretations of the integer `p = 0` always turned out to be easy to disambiguate
269    /// statically.
270    type Probability: BitArray;
271}
272
273/// A trait for [`EntropyModel`]s that can be used for encoding (compressing) data.
274///
275/// As discussed in the [module level documentation](self), all stream codes in
276/// `constriction` use so-called [`EntropyModel`]s for encoding and/or decoding data. Some
277/// of these `EntropyModel`s may be used only for encoding, only for decoding, or for both,
278/// depending on their internal representation.
279///
280/// This `EncoderModel` trait is implemented for all entropy models that can be used for
281/// *encoding* data. To encode data with an `EncoderModel`, construct an entropy coder that
282/// implements the [`Encode`] trait and pass the data and the entropy model to one of the
283/// methods of the [`Encode`] trait (or to an inherent method of the entropy coder, such as
284/// [`AnsCoder::encode_symbols_reverse`]).
285///
286/// # Blanket Implementation for `&impl EncoderModel`
287///
288/// We provide the following blanket implementation for references to `EncoderModel`s:
289///
290/// ```ignore
291/// impl<M, const PRECISION: usize> EncoderModel<PRECISION> for &M
292/// where
293///     M: EncoderModel<PRECISION> + ?Sized
294/// { ... }
295/// ```
296///
297/// This means that, if some type `M` implements `EncoderModel<PRECISION>` for some
298/// `PRECISION`, then so does the reference type `&M`. Therefore, generic functions or
299/// methods should never take a generic `EncoderModel` by reference. They should always take
300/// the generic `EncoderModel` *by value* because this also covers the case of references
301/// but is strictly more general. If your generic function needs to be able to cheaply copy
302/// the `EncoderModel` (as it could with a shared reference) then it should still take the
303/// generic `EncoderModel` formally by value and just add an additional `Copy` bound (see,
304/// e.g., the method signature of [`Encode::encode_iid_symbols`]. For a more elaborate
305/// explanation, please refer to the discussion of the analogous blanket implementation for
306/// [`EntropyModel`].
307///
308/// # See Also
309///
310/// - base trait: [`EntropyModel`]
311/// - sister trait: [`DecoderModel`]
312/// - used with: [`Encode`]
313///
314/// [`Encode`]: super::Encode
315/// [`AnsCoder::encode_symbols_reverse`]: super::stack::AnsCoder::encode_symbols_reverse
316/// [`Encode::encode_iid_symbols`]: super::Encode::encode_iid_symbols
317pub trait EncoderModel<const PRECISION: usize>: EntropyModel<PRECISION> {
318    /// Looks up a symbol in the entropy model.
319    ///
320    /// Takes a `symbol` either by value or by reference and looks it up in the entropy
321    /// model.
322    /// - If `symbol` has a nonzero probability under the model, then this method returns
323    ///   `Some((left_sided_cumulative, probability))`, where `probability` is the
324    ///   probability in fixed-point representation (see
325    ///   [discussion](EntropyModel::Probability)) and `left_sided_cumulative` is the sum of
326    ///   the probabilities of all symbols up to but not including `symbol` (also in
327    ///   fixed-point representation). Both `left_sided_cumulative` and `probability` are
328    ///   guaranteed to be strictly smaller than `1 << PRECISION` (which would semantically
329    ///   represent "probability one") because `probability` is nonzero and because we don't
330    ///   support degenerate entropy models that put all probability mass on a single
331    ///   symbol.
332    /// - If `symbol` has zero probability under the model, then this method returns `None`.
333    fn left_cumulative_and_probability(
334        &self,
335        symbol: impl Borrow<Self::Symbol>,
336    ) -> Option<(Self::Probability, <Self::Probability as BitArray>::NonZero)>;
337
338    /// Returns the probability of the given symbol in floating point representation.
339    ///
340    /// The trait bound `Self::Probability: Into<F>` guarantees that no rounding occurs in
341    /// the conversion. You may have to specify the return type explicitly using "turbofish"
342    /// notation `::<f64>(...)` or `::<f32>(...)`, see example below.
343    ///
344    /// Returns `0.0` if `symbol` is not in the support of the entropy model.
345    ///
346    /// This method is provided mainly as a convenience for debugging.
347    ///
348    /// # Example
349    ///
350    /// ```
351    /// use constriction::stream::model::{EncoderModel, DefaultNonContiguousCategoricalEncoderModel};
352    ///
353    /// let symbols = vec!['a', 'b', 'c', 'd'];
354    /// let probabilities = vec![1u32 << 21, 1 << 23, 1 << 22, 1 << 21];
355    /// let model = DefaultNonContiguousCategoricalEncoderModel // "Default" uses `PRECISION = 24`
356    ///     ::from_symbols_and_nonzero_fixed_point_probabilities(
357    ///         symbols.iter().copied(), probabilities.iter().copied(), false)
358    ///     .unwrap();
359    ///
360    /// assert_eq!(model.floating_point_probability::<f64>('a'), 0.125);
361    /// assert_eq!(model.floating_point_probability::<f64>('b'), 0.5);
362    /// assert_eq!(model.floating_point_probability::<f64>('c'), 0.25);
363    /// assert_eq!(model.floating_point_probability::<f64>('d'), 0.125);
364    /// assert_eq!(model.floating_point_probability::<f64>('x'), 0.0);
365    /// ```
366    ///
367    /// [`fixed_point_probabilities`]: #method.fixed_point_probabilities
368    /// [`floating_point_probabilities_lossy`]: #method.floating_point_probabilities_lossy
369    /// [`from_floating_point_probabilities`]: #method.from_floating_point_probabilities
370    /// [`from_nonzero_fixed_point_probabilities`]:
371    /// #method.from_nonzero_fixed_point_probabilities
372    #[inline]
373    fn floating_point_probability<F>(&self, symbol: Self::Symbol) -> F
374    where
375        F: FloatCore,
376        Self::Probability: Into<F>,
377    {
378        // This gets compiled to a single floating point multiplication rather than a (slow)
379        // division.
380        let whole = (F::one() + F::one()) * (Self::Probability::one() << (PRECISION - 1)).into();
381        let probability = self
382            .left_cumulative_and_probability(symbol)
383            .map_or(Self::Probability::zero(), |(_, p)| p.get());
384        probability.into() / whole
385    }
386}
387
388/// A trait for [`EntropyModel`]s that can be used for decoding (decompressing) data.
389///
390/// As discussed in the [module level documentation](self), all stream codes in
391/// `constriction` use so-called [`EntropyModel`]s for encoding and/or decoding data. Some
392/// of these `EntropyModel`s may be used only for encoding, only for decoding, or for both,
393/// depending on their internal representation.
394///
395/// This `DecoderModel` trait is implemented for all entropy models that can be used for
396/// *decoding* data. To decode data with a `DecoderModel`, construct an entropy coder that
397/// implements the [`Decode`] trait and pass the entropy model to one of the methods of the
398/// [`Decode`] trait.
399///
400/// # Blanket Implementation for `&impl DecoderModel`
401///
402/// We provide the following blanket implementation for references to `DecoderModel`s:
403///
404/// ```ignore
405/// impl<M, const PRECISION: usize> DecoderModel<PRECISION> for &M
406/// where
407///     M: DecoderModel<PRECISION> + ?Sized
408/// { ... }
409/// ```
410///
411/// This means that, if some type `M` implements `DecoderModel<PRECISION>` for some
412/// `PRECISION`, then so does the reference type `&M`. Therefore, generic functions or
413/// methods should never take a generic `DecoderModel` by reference. They should always take
414/// the generic `DecoderModel` *by value* because this also covers the case of references
415/// but is strictly more general. If your generic function needs to be able to cheaply copy
416/// the `DecoderModel` (as it could with a shared reference) then it should still take the
417/// generic `DecoderModel` formally by value and just add an additional `Copy` bound (see,
418/// e.g., the method signature of [`Decode::decode_iid_symbols`]. For a more elaborate
419/// explanation, please refer to the discussion of the analogous blanket implementation for
420/// [`EntropyModel`].
421///
422/// # See Also
423///
424/// - base trait: [`EntropyModel`]
425/// - sister trait: [`EncoderModel`]
426/// - used with: [`Decode`]
427///
428/// [`Decode`]: super::Decode
429/// [`Decode::decode_iid_symbols`]: super::Encode::encode_iid_symbols
430pub trait DecoderModel<const PRECISION: usize>: EntropyModel<PRECISION> {
431    /// Looks up the symbol for a given quantile.
432    ///
433    /// The argument `quantile` represents a number in the half-open interval `[0, 1)` in
434    /// fixed-point arithmetic, i.e., it must be strictly smaller than `1 << PRECISION`.
435    /// Think of `quantile` as a point on the vertical axis of a plot of the cumulative
436    /// distribution function of the probability model. This method evaluates the inverse of
437    /// the cumulative distribution function, which is sometimes called the *quantile
438    /// function*.
439    ///
440    /// Returns a tuple `(symbol, left_sided_cumulative, probability)` where `probability`
441    /// is the probability of `symbol` under the entropy model (in fixed-point arithmetic)
442    /// and `left_sided_cumulative` is the sum of the probabilities of all symbols up to and
443    /// not including `symbol`. The returned `symbol` is the unique symbol that satisfies
444    /// `left_sided_cumulative <= quantile < left_sided_cumulative + probability` (where the
445    /// addition on the right-hand side is non-wrapping).
446    ///
447    /// Note that, in contrast to [`EncoderModel::left_cumulative_and_probability`], this
448    /// method does *not* return an `Option`. This is because, as long as `quantile < 1 <<
449    /// PRECISION`, a valid probability distribution always has a symbol for which the range
450    /// `left_sided_cumulative..(left_sided_cumulative + quantile)` contains `quantile`, and
451    /// the probability of this symbol is guaranteed to be nonzero because the probability
452    /// is the size of the range, which contains at least the one element `quantile`.
453    ///
454    /// # Panics
455    ///
456    /// Implementations may panic if `quantile >= 1 << PRECISION`.
457    fn quantile_function(
458        &self,
459        quantile: Self::Probability,
460    ) -> (
461        Self::Symbol,
462        Self::Probability,
463        <Self::Probability as BitArray>::NonZero,
464    );
465}
466
467/// A trait for [`EntropyModel`]s that can be serialized into a common format.
468///
469/// The method [`symbol_table`] iterates over all symbols with nonzero probability under the
470/// entropy. The iteration occurs in uniquely defined order of increasing left-sided
471/// cumulative probability distribution of the symbols. All `EntropyModel`s for which such
472/// iteration can be implemented efficiently should implement this trait. `EntropyModel`s
473/// for which such iteration would require extra work (e.g., sorting symbols by left-sided
474/// cumulative distribution) should *not* implement this trait so that callers can assume
475/// that calling `symbol_table` is cheap.
476///
477/// The main advantage of implementing this trait is that it provides default
478/// implementations of conversions to various other `EncoderModel`s and `DecoderModel`s, see
479/// [`to_generic_encoder_model`], [`to_generic_decoder_model`], and
480/// [`to_generic_lookup_decoder_model`].
481///
482/// [`symbol_table`]: Self::symbol_table
483/// [`to_generic_encoder_model`]: Self::to_generic_encoder_model
484/// [`to_generic_decoder_model`]: Self::to_generic_decoder_model
485/// [`to_generic_lookup_decoder_model`]: Self::to_generic_lookup_decoder_model
486pub trait IterableEntropyModel<'m, const PRECISION: usize>: EntropyModel<PRECISION> {
487    /// Iterates over all symbols in the unique order that is consistent with the cumulative
488    /// distribution.
489    ///
490    /// The iterator iterates in order of increasing cumulative.
491    ///
492    /// This method may be used, e.g., to export the model into a serializable format. It is
493    /// also used internally by constructors that create a different but equivalent
494    /// representation of the same entropy model (e.g., to construct a
495    /// [`ContiguousLookupDecoderModel`] or [`NonContiguousLookupDecoderModel`] from some `EncoderModel`).
496    ///
497    /// # Example
498    ///
499    /// ```
500    /// use constriction::stream::model::{
501    ///     IterableEntropyModel, SmallNonContiguousCategoricalDecoderModel
502    /// };
503    ///
504    /// let symbols = vec!['a', 'b', 'x', 'y'];
505    /// let probabilities = vec![0.125, 0.5, 0.25, 0.125]; // Can all be represented without rounding.
506    /// let model = SmallNonContiguousCategoricalDecoderModel
507    ///     ::from_symbols_and_floating_point_probabilities_fast(
508    ///         symbols.iter().cloned(),
509    ///         &probabilities,
510    ///         None
511    ///     ).unwrap();
512    ///
513    /// // Print a table representation of this entropy model (e.g., for debugging).
514    /// dbg!(model.symbol_table().collect::<Vec<_>>());
515    ///
516    /// // Create a lookup model. This method is provided by the trait `IterableEntropyModel`.
517    /// let lookup_decoder_model = model.to_generic_lookup_decoder_model();
518    /// ```
519    ///
520    /// # See also
521    ///
522    /// - [`floating_point_symbol_table`](Self::floating_point_symbol_table)
523    fn symbol_table(
524        &'m self,
525    ) -> impl Iterator<
526        Item = (
527            Self::Symbol,
528            Self::Probability,
529            <Self::Probability as BitArray>::NonZero,
530        ),
531    >;
532
533    /// Similar to [`symbol_table`], but yields both cumulatives and probabilities in
534    /// floating point representation.
535    ///
536    /// The conversion to floats is guaranteed to be lossless due to the trait bound `F:
537    /// From<Self::Probability>`.
538    ///
539    /// [`symbol_table`]: Self::symbol_table
540    ///
541    /// TODO: test
542    fn floating_point_symbol_table<F>(&'m self) -> impl Iterator<Item = (Self::Symbol, F, F)>
543    where
544        F: FloatCore + From<Self::Probability> + 'm,
545        Self::Probability: Into<F>,
546    {
547        // This gets compiled into a constant, and the divisions by `whole` get compiled
548        // into floating point multiplications rather than (slower) divisions.
549        let whole = (F::one() + F::one()) * (Self::Probability::one() << (PRECISION - 1)).into();
550
551        self.symbol_table()
552            .map(move |(symbol, cumulative, probability)| {
553                (
554                    symbol,
555                    cumulative.into() / whole,
556                    probability.get().into() / whole,
557                )
558            })
559    }
560
561    /// Returns the entropy in units of bits (i.e., base 2).
562    ///
563    /// The entropy is the expected amortized bit rate per symbol of an optimal lossless
564    /// entropy coder, assuming that the data is indeed distributed according to the model.
565    ///
566    /// Note that calling this method on a [`LeakilyQuantizedDistribution`] will return the
567    /// entropy *after quantization*, not the differential entropy of the underlying
568    /// continuous probability distribution.
569    ///
570    /// # See also
571    ///
572    /// - [`cross_entropy_base2`](Self::cross_entropy_base2)
573    /// - [`reverse_cross_entropy_base2`](Self::reverse_cross_entropy_base2)
574    /// - [`kl_divergence_base2`](Self::kl_divergence_base2)
575    /// - [`reverse_kl_divergence_base2`](Self::reverse_kl_divergence_base2)
576    fn entropy_base2<F>(&'m self) -> F
577    where
578        F: num_traits::Float + core::iter::Sum,
579        Self::Probability: Into<F>,
580    {
581        let scaled_shifted = self
582            .symbol_table()
583            .map(|(_, _, probability)| {
584                let probability = probability.get().into();
585                probability * probability.log2() // probability is guaranteed to be nonzero.
586            })
587            .sum::<F>();
588
589        let whole = (F::one() + F::one()) * (Self::Probability::one() << (PRECISION - 1)).into();
590        F::from(PRECISION).unwrap() - scaled_shifted / whole
591    }
592
593    /// Returns the cross entropy between argument `p` and this model in units of bits
594    /// (i.e., base 2).
595    ///
596    /// This is the expected amortized bit rate per symbol that an optimal coder will
597    /// achieve when using this model on a data source that draws symbols from the provided
598    /// probability distribution `p`.
599    ///
600    /// The cross entropy is defined as `H(p, self) = - sum_i p[i] * log2(self[i])` where
601    /// `p` is provided as an argument and `self[i]` denotes the corresponding probabilities
602    /// of the model. Note that `self[i]` is never zero for models in the `constriction`
603    /// library, so the logarithm in the (forward) cross entropy can never be infinite.
604    ///
605    /// The argument `p` must yield a sequence of probabilities (nonnegative values that sum
606    /// to 1) with the correct length and order to be compatible with the model.
607    ///
608    /// # See also
609    ///
610    /// - [`entropy_base2`](Self::entropy_base2)
611    /// - [`reverse_cross_entropy_base2`](Self::reverse_cross_entropy_base2)
612    /// - [`kl_divergence_base2`](Self::kl_divergence_base2)
613    /// - [`reverse_kl_divergence_base2`](Self::reverse_kl_divergence_base2)
614    fn cross_entropy_base2<F>(&'m self, p: impl IntoIterator<Item = F>) -> F
615    where
616        F: num_traits::Float + core::iter::Sum,
617        Self::Probability: Into<F>,
618    {
619        let shift = F::from(PRECISION).unwrap();
620        self.symbol_table()
621            .zip(p)
622            .map(|((_, _, probability), p)| {
623                let probability = probability.get().into();
624                // Perform the shift for each item individually so that the result is
625                // reasonable even if `p` is not normalized.
626                p * (shift - probability.log2()) // probability is guaranteed to be nonzero.
627            })
628            .sum::<F>()
629    }
630
631    /// Returns the cross entropy between this model and argument `p` in units of bits
632    /// (i.e., base 2).
633    ///
634    /// This method is provided mostly for completeness. You're more likely to want to
635    /// calculate [`cross_entropy_base2`](Self::cross_entropy_base2).
636    ///
637    /// The reverse cross entropy is defined as `H(self, p) = - sum_i self[i] * log2(p[i])`
638    /// where `p` is provided as an argument and `self[i]` denotes the corresponding
639    /// probabilities of the model.
640    ///
641    /// The argument `p` must yield a sequence of *nonzero* probabilities (that sum to 1)
642    /// with the correct length and order to be compatible with the model.
643    ///
644    /// # See also
645    ///
646    /// - [`cross_entropy_base2`](Self::cross_entropy_base2)
647    /// - [`entropy_base2`](Self::entropy_base2)
648    /// - [`reverse_kl_divergence_base2`](Self::reverse_kl_divergence_base2)
649    /// - [`kl_divergence_base2`](Self::kl_divergence_base2)
650    fn reverse_cross_entropy_base2<F>(&'m self, p: impl IntoIterator<Item = F>) -> F
651    where
652        F: num_traits::Float + core::iter::Sum,
653        Self::Probability: Into<F>,
654    {
655        let scaled = self
656            .symbol_table()
657            .zip(p)
658            .map(|((_, _, probability), p)| {
659                let probability = probability.get().into();
660                probability * p.log2()
661            })
662            .sum::<F>();
663
664        let whole = (F::one() + F::one()) * (Self::Probability::one() << (PRECISION - 1)).into();
665        -scaled / whole
666    }
667
668    /// Returns Kullback-Leibler divergence `D_KL(p || self)`
669    ///
670    /// This is the expected *overhead* (due to model quantization) in bit rate per symbol
671    /// that an optimal coder will incur when using this model on a data source that draws
672    /// symbols from the provided probability distribution `p` (which this model is supposed
673    /// to approximate).
674    ///
675    /// The KL-divergence is defined as `D_KL(p || self) = - sum_i p[i] * log2(self[i] /
676    /// p[i])`, where `p` is provided as an argument and `self[i]` denotes the corresponding
677    /// probabilities of the model. Any term in the sum where `p[i]` is *exactly* zero does
678    /// not contribute (regardless of whether or not `self[i]` would also be zero).
679    ///
680    /// The argument `p` must yield a sequence of probabilities (nonnegative values that sum
681    /// to 1) with the correct length and order to be compatible with the model.
682    ///
683    /// # See also
684    ///
685    /// - [`reverse_kl_divergence_base2`](Self::reverse_kl_divergence_base2)
686    /// - [`entropy_base2`](Self::entropy_base2)
687    /// - [`cross_entropy_base2`](Self::cross_entropy_base2)
688    /// - [`reverse_cross_entropy_base2`](Self::reverse_cross_entropy_base2)
689    fn kl_divergence_base2<F>(&'m self, p: impl IntoIterator<Item = F>) -> F
690    where
691        F: num_traits::Float + core::iter::Sum,
692        Self::Probability: Into<F>,
693    {
694        let shifted = self
695            .symbol_table()
696            .zip(p)
697            .map(|((_, _, probability), p)| {
698                if p == F::zero() {
699                    F::zero()
700                } else {
701                    let probability = probability.get().into();
702                    p * (p.log2() - probability.log2())
703                }
704            })
705            .sum::<F>();
706
707        shifted + F::from(PRECISION).unwrap() // assumes that `p` is normalized
708    }
709
710    /// Returns reverse Kullback-Leibler divergence, i.e., `D_KL(self || p)`
711    ///
712    /// This method is provided mostly for completeness. You're more likely to want to
713    /// calculate [`kl_divergence_base2`](Self::kl_divergence_base2).
714    ///
715    /// The reverse KL-divergence is defined as `D_KL(self || p) = - sum_i self[i] *
716    /// log2(p[i] / self[i])` where `p`
717    /// is provided as an argument and `self[i]` denotes the corresponding probabilities of
718    /// the model.
719    ///
720    /// The argument `p` must yield a sequence of *nonzero* probabilities (that sum to 1)
721    /// with the correct length and order to be compatible with the model.
722    ///
723    /// # See also
724    ///
725    /// - [`kl_divergence_base2`](Self::kl_divergence_base2)
726    /// - [`entropy_base2`](Self::entropy_base2)
727    /// - [`cross_entropy_base2`](Self::cross_entropy_base2)
728    /// - [`reverse_cross_entropy_base2`](Self::reverse_cross_entropy_base2)
729    fn reverse_kl_divergence_base2<F>(&'m self, p: impl IntoIterator<Item = F>) -> F
730    where
731        F: num_traits::Float + core::iter::Sum,
732        Self::Probability: Into<F>,
733    {
734        let scaled_shifted = self
735            .symbol_table()
736            .zip(p)
737            .map(|((_, _, probability), p)| {
738                let probability = probability.get().into();
739                probability * (probability.log2() - p.log2())
740            })
741            .sum::<F>();
742
743        let whole = (F::one() + F::one()) * (Self::Probability::one() << (PRECISION - 1)).into();
744        scaled_shifted / whole - F::from(PRECISION).unwrap()
745    }
746
747    /// Creates an [`EncoderModel`] from this `EntropyModel`
748    ///
749    /// This is a fallback method that should only be used if no more specialized
750    /// conversions are available. It generates a [`NonContiguousCategoricalEncoderModel`]
751    /// with the same probabilities and left-sided cumulatives as `self`. Note that a
752    /// `NonContiguousCategoricalEncoderModel` is very generic and therefore not
753    /// particularly optimized. Thus, before calling this method first check:
754    /// - if the original `Self` type already implements `EncoderModel` (some types
755    ///   implement *both* `EncoderModel` and `DecoderModel`); or
756    /// - if the `Self` type has some inherent method with a name like `to_encoder_model`;
757    ///   if it does, that method probably returns an implementation of `EncoderModel` that
758    ///   is better optimized for your use case.
759    #[inline(always)]
760    fn to_generic_encoder_model(
761        &'m self,
762    ) -> NonContiguousCategoricalEncoderModel<Self::Symbol, Self::Probability, PRECISION>
763    where
764        Self::Symbol: Hash + Eq,
765    {
766        self.into()
767    }
768
769    /// Creates a [`DecoderModel`] from this `EntropyModel`
770    ///
771    /// This is a fallback method that should only be used if no more specialized
772    /// conversions are available. It generates a [`NonContiguousCategoricalDecoderModel`]
773    /// with the same probabilities and left-sided cumulatives as `self`. Note that a
774    /// `NonContiguousCategoricalEncoderModel` is very generic and therefore not
775    /// particularly optimized. Thus, before calling this method first check:
776    /// - if the original `Self` type already implements `DecoderModel` (some types
777    ///   implement *both* `EncoderModel` and `DecoderModel`); or
778    /// - if the `Self` type has some inherent method with a name like `to_decoder_model`;
779    ///   if it does, that method probably returns an implementation of `DecoderModel` that
780    ///   is better optimized for your use case.
781    #[inline(always)]
782    fn to_generic_decoder_model(
783        &'m self,
784    ) -> NonContiguousCategoricalDecoderModel<
785        Self::Symbol,
786        Self::Probability,
787        Vec<(Self::Probability, Self::Symbol)>,
788        PRECISION,
789    >
790    where
791        Self::Symbol: Clone,
792    {
793        self.into()
794    }
795
796    /// Creates a [`DecoderModel`] from this `EntropyModel`
797    ///
798    /// This is a fallback method that should only be used if no more specialized
799    /// conversions are available. It generates a [`ContiguousLookupDecoderModel`] or [`NonContiguousLookupDecoderModel`] that makes no
800    /// assumption about contiguity of the support. Thus, before calling this method first
801    /// check if the `Self` type has some inherent method with a name like
802    /// `to_lookup_decoder_model`. If it does, that method probably returns a
803    /// `LookupDecoderModel` that is better optimized for your use case.
804    #[inline(always)]
805    fn to_generic_lookup_decoder_model(
806        &'m self,
807    ) -> NonContiguousLookupDecoderModel<
808        Self::Symbol,
809        Self::Probability,
810        Vec<(Self::Probability, Self::Symbol)>,
811        Box<[Self::Probability]>,
812        PRECISION,
813    >
814    where
815        Self::Probability: Into<usize>,
816        usize: AsPrimitive<Self::Probability>,
817        Self::Symbol: Clone + Default,
818    {
819        self.into()
820    }
821}
822
823impl<M, const PRECISION: usize> EntropyModel<PRECISION> for &M
824where
825    M: EntropyModel<PRECISION> + ?Sized,
826{
827    type Probability = M::Probability;
828    type Symbol = M::Symbol;
829}
830
831impl<M, const PRECISION: usize> EncoderModel<PRECISION> for &M
832where
833    M: EncoderModel<PRECISION> + ?Sized,
834{
835    #[inline(always)]
836    fn left_cumulative_and_probability(
837        &self,
838        symbol: impl Borrow<Self::Symbol>,
839    ) -> Option<(Self::Probability, <Self::Probability as BitArray>::NonZero)> {
840        (*self).left_cumulative_and_probability(symbol)
841    }
842}
843
844impl<M, const PRECISION: usize> DecoderModel<PRECISION> for &M
845where
846    M: DecoderModel<PRECISION> + ?Sized,
847{
848    #[inline(always)]
849    fn quantile_function(
850        &self,
851        quantile: Self::Probability,
852    ) -> (
853        Self::Symbol,
854        Self::Probability,
855        <Self::Probability as BitArray>::NonZero,
856    ) {
857        (*self).quantile_function(quantile)
858    }
859}
860
861impl<'m, M, const PRECISION: usize> IterableEntropyModel<'m, PRECISION> for &'m M
862where
863    M: IterableEntropyModel<'m, PRECISION>,
864{
865    fn symbol_table(
866        &'m self,
867    ) -> impl Iterator<
868        Item = (
869            Self::Symbol,
870            Self::Probability,
871            <Self::Probability as BitArray>::NonZero,
872        ),
873    > {
874        (*self).symbol_table()
875    }
876
877    fn entropy_base2<F>(&'m self) -> F
878    where
879        F: num_traits::Float + core::iter::Sum,
880        Self::Probability: Into<F>,
881    {
882        (*self).entropy_base2()
883    }
884
885    #[inline(always)]
886    fn to_generic_encoder_model(
887        &'m self,
888    ) -> NonContiguousCategoricalEncoderModel<Self::Symbol, Self::Probability, PRECISION>
889    where
890        Self::Symbol: Hash + Eq,
891    {
892        (*self).to_generic_encoder_model()
893    }
894
895    #[inline(always)]
896    fn to_generic_decoder_model(
897        &'m self,
898    ) -> NonContiguousCategoricalDecoderModel<
899        Self::Symbol,
900        Self::Probability,
901        Vec<(Self::Probability, Self::Symbol)>,
902        PRECISION,
903    >
904    where
905        Self::Symbol: Clone,
906    {
907        (*self).to_generic_decoder_model()
908    }
909}
910
911pub use categorical::{
912    contiguous::{
913        ContiguousCategoricalEntropyModel, DefaultContiguousCategoricalEntropyModel,
914        SmallContiguousCategoricalEntropyModel,
915    },
916    lazy_contiguous::{
917        DefaultLazyContiguousCategoricalEntropyModel, LazyContiguousCategoricalEntropyModel,
918        SmallLazyContiguousCategoricalEntropyModel,
919    },
920    lookup_contiguous::{ContiguousLookupDecoderModel, SmallContiguousLookupDecoderModel},
921    lookup_noncontiguous::{NonContiguousLookupDecoderModel, SmallNonContiguousLookupDecoderModel},
922    non_contiguous::{
923        DefaultNonContiguousCategoricalDecoderModel, DefaultNonContiguousCategoricalEncoderModel,
924        NonContiguousCategoricalDecoderModel, NonContiguousCategoricalEncoderModel,
925        SmallNonContiguousCategoricalDecoderModel, SmallNonContiguousCategoricalEncoderModel,
926    },
927};
928pub use quantize::{
929    DefaultLeakyQuantizer, LeakilyQuantizedDistribution, LeakyQuantizer, SmallLeakyQuantizer,
930};
931pub use uniform::{DefaultUniformModel, SmallUniformModel, UniformModel};
932
933#[cfg(test)]
934mod tests {
935    use probability::prelude::*;
936
937    use super::*;
938
939    #[test]
940    fn entropy() {
941        #[cfg(not(miri))]
942        let (support, std_devs, means) = (-1000..=1000, [100., 200., 300.], [-10., 2.3, 50.1]);
943
944        // We use different settings when testing on miri so that the test time stays reasonable.
945        #[cfg(miri)]
946        let (support, std_devs, means) = (-100..=100, [10., 20., 30.], [-1., 0.23, 5.01]);
947
948        let quantizer = LeakyQuantizer::<_, _, u32, 24>::new(support);
949        for &std_dev in &std_devs {
950            for &mean in &means {
951                let distribution = Gaussian::new(mean, std_dev);
952                let model = quantizer.quantize(distribution);
953                let entropy = model.entropy_base2::<f64>();
954                let expected_entropy = 2.047095585180641 + std_dev.log2();
955                assert!((entropy - expected_entropy).abs() < 0.01);
956            }
957        }
958    }
959
960    pub(super) fn test_entropy_model<'m, D, const PRECISION: usize>(
961        model: &'m D,
962        support: impl Clone + Iterator<Item = D::Symbol>,
963    ) where
964        D: IterableEntropyModel<'m, PRECISION>
965            + EncoderModel<PRECISION>
966            + DecoderModel<PRECISION>
967            + 'm,
968        D::Symbol: Copy + core::fmt::Debug + PartialEq,
969        D::Probability: Into<u64>,
970        u64: AsPrimitive<D::Probability>,
971    {
972        let mut sum = 0;
973        for symbol in support.clone() {
974            let (left_cumulative, prob) = model.left_cumulative_and_probability(symbol).unwrap();
975            assert_eq!(left_cumulative.into(), sum);
976            sum += prob.get().into();
977
978            let expected = (symbol, left_cumulative, prob);
979            assert_eq!(model.quantile_function(left_cumulative), expected);
980            assert_eq!(model.quantile_function((sum - 1).as_()), expected);
981            assert_eq!(
982                model.quantile_function((left_cumulative.into() + prob.get().into() / 2).as_()),
983                expected
984            );
985        }
986        assert_eq!(sum, 1 << PRECISION);
987
988        test_iterable_entropy_model(model, support);
989    }
990
991    pub(super) fn test_iterable_entropy_model<'m, M, const PRECISION: usize>(
992        model: &'m M,
993        support: impl Clone + Iterator<Item = M::Symbol>,
994    ) where
995        M: IterableEntropyModel<'m, PRECISION> + 'm,
996        M::Symbol: Copy + core::fmt::Debug + PartialEq,
997        M::Probability: Into<u64>,
998        u64: AsPrimitive<M::Probability>,
999    {
1000        let mut expected_cumulative = 0u64;
1001        let mut count = 0;
1002        for (expected_symbol, (symbol, left_sided_cumulative, probability)) in
1003            support.clone().zip(model.symbol_table())
1004        {
1005            assert_eq!(symbol, expected_symbol);
1006            assert_eq!(left_sided_cumulative.into(), expected_cumulative);
1007            expected_cumulative += probability.get().into();
1008            count += 1;
1009        }
1010        assert_eq!(count, support.size_hint().0);
1011        assert_eq!(expected_cumulative, 1 << PRECISION);
1012    }
1013
1014    /// Verifies that the model is close to a provided probability mass function (in
1015    /// KL-divergence).
1016    pub(super) fn verify_iterable_entropy_model<'m, M, P, const PRECISION: usize>(
1017        model: &'m M,
1018        hist: &[P],
1019        tol: f64,
1020    ) -> f64
1021    where
1022        M: IterableEntropyModel<'m, PRECISION> + 'm,
1023        M::Probability: BitArray + Into<u64> + Into<f64>,
1024        P: num_traits::Zero + Into<f64> + Copy + PartialOrd,
1025    {
1026        let weights: Vec<_> = model
1027            .symbol_table()
1028            .map(|(_, _, probability)| probability.get())
1029            .collect();
1030
1031        assert_eq!(weights.len(), hist.len());
1032        assert_eq!(
1033            weights.iter().map(|&x| Into::<u64>::into(x)).sum::<u64>(),
1034            1 << PRECISION
1035        );
1036        for &w in &weights {
1037            assert!(w > M::Probability::zero());
1038        }
1039
1040        let mut weights_and_hist = weights
1041            .iter()
1042            .cloned()
1043            .zip(hist.iter().cloned())
1044            .collect::<Vec<_>>();
1045
1046        // Check that sorting by weight is compatible with sorting by hist.
1047        weights_and_hist.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap());
1048        // TODO: replace the following with
1049        // `assert!(weights_and_hist.iter().map(|&(_, x)| x).is_sorted())`
1050        // when `is_sorted` becomes stable.
1051        let mut previous = P::zero();
1052        for (_, hist) in weights_and_hist {
1053            assert!(hist >= previous);
1054            previous = hist;
1055        }
1056
1057        let normalization = hist.iter().map(|&x| x.into()).sum::<f64>();
1058        let normalized_hist = hist
1059            .iter()
1060            .map(|&x| Into::<f64>::into(x) / normalization)
1061            .collect::<Vec<_>>();
1062
1063        let kl = model.kl_divergence_base2::<f64>(normalized_hist);
1064        assert!(kl < tol);
1065
1066        kl
1067    }
1068}