constriction/stream/model.rs
1//! Probability distributions that can be used as entropy models for stream codes.
2//!
3//! This module provides utilities for dealing with probabilistic models of data sources
4//! ("entropy models") in exactly invertible fixed-point arithmetic so that no rounding
5//! errors occur. As explained in the [motivation](#motivation) below, preventing rounding
6//! errors is necessary for reliable entropy coding.
7//!
8//! The types defined in this module approximate arbitrary discrete (or quantized
9//! one-dimensional continuous) probability distributions with a fixed-point representation.
10//! The fixed-point representation has customizable numeric precision and can be either
11//! explicit or implicit (i.e., lazy). While the conversion to the fixed-point approximation
12//! itself generally involves rounding, once the fixed-point representation is obtained,
13//! operations on it are exact. Therefore, the fixed-point representation can be used for
14//! entropy coding.
15//!
16//! # Module Overview
17//!
18//! This module declares the base trait [`EntropyModel`] and its subtraits [`EncoderModel`]
19//! and [`DecoderModel`], which specify the interfaces that entropy models provide and that
20//! entropy coders in the sister modules can rely on.
21//!
22//! In addition, this module provides the following three utilities for constructing entropy
23//! models:
24//! - an adapter that converts parameterized discrete distributions (e.g., [`Binomial`]) or
25//! one-dimensional continuous probability distributions (e.g. [`Gaussian`]) from a
26//! representation in terms of float-valued functions to an (implicit) exactly invertible
27//! fixed-point representation; when provided with a continuous distribution (a
28//! probability density) then this adapter also quantizes the data space into bins. See
29//! [`DefaultLeakyQuantizer`] and [`SmallLeakyQuantizer`];
30//! - types for representing arbitrary categorical distributions in an explicit fixed-point
31//! representation; these types are intended either as fallbacks for probability
32//! distributions that lack an efficiently evaluable analytic expression of the cumulative
33//! distribution function (and that therefore can't be handled by the above adaptor), or
34//! for efficient *encoding* of i.i.d. symbols by precalculating and tabularizing the
35//! fixed-point representation of each allowed symbol. See [`DefaultLeakyQuantizer`],
36//! [`DefaultContiguousCategoricalEntropyModel`],
37//! [`DefaultNonContiguousCategoricalEncoderModel`], and
38//! [`DefaultNonContiguousCategoricalDecoderModel`] (and their respective counterparts
39//! with the "Small" instead of "Default" preset); and
40//! - types for high-performance "lookup tables" that enable efficient
41//! *decoding* of i.i.d. data; these types build up a lookup table with `2^PRECISION`
42//! entries (one entry per
43//! possible *quantile*) and are therefore only recommended to be used with relatively
44//! small `PRECISION`. See [`ContiguousLookupDecoderModel`] and
45//! [`NonContiguousLookupDecoderModel`].
46//!
47//! # Examples
48//!
49//! See [`LeakyQuantizer`](LeakyQuantizer#examples), [`ContiguousCategoricalEntropyModel`],
50//! [`NonContiguousCategoricalEncoderModel`]. [`NonContiguousCategoricalDecoderModel`], and
51//! [`ContiguousLookupDecoderModel`] or [`NonContiguousLookupDecoderModel`].
52//!
53//! TODO: direct links to "Examples" sections.
54//!
55//! # Motivation
56//!
57//! The general idea of entropy coding to find an optimal compression strategy by using a
58//! *probabilistic model of the data source*. Ideally, all conceivable data points would be
59//! compressed into a short bit string. However, short bit strings are a scarce commodity:
60//! for any integer `N`, there are only `2^N - 1` distinct bit strings that are shorter than
61//! `N` bits. For this reason, entropy coding algorithms assign the scarce short bit strings
62//! to data points that are most probable to appear in practice, while assigning longer bit
63//! strings to data points that may be possible in principle but that are extremely
64//! improbable in practice. More precisely, entropy coding aims to minimize the expected bit
65//! rate under a probabilistic model of the data source. We refer to this model as an
66//! "entropy model".
67//!
68//! In contrast to many other use cases of probabilistic models in computing, entropy models
69//! must be amenable to *exact* arithmetic operations. In particular, no rounding errors are
70//! allowed when inverting the cumulative distribution function. Even a single arbitrarily
71//! small rounding error could set off a chain reaction leading to arbitrarily large and
72//! arbitrarily many errors when compressing and then decompressing a sequence of symbols
73//! (see, e.g., the [motivating example for the `ChainCoder`](super::chain#motivation)).
74//! This module provides utilities for defining entropy models that can be inverted exactly
75//! without any rounding errors.
76//!
77//! # Zero Probability
78//!
79//! All entropy models provided in this module have a predictable support, i.e., it is
80//! always easy to predict exactly which symbols have nonzero probability under the model.
81//! This is an important property for entropy coding because trying to encode a symbol that
82//! has zero probability under the used entropy model would fail.
83//!
84//! When constructing an entropy model then the caller always has to provide a support
85//! (either as an integer range or as a list of symbols of arbitrary type). All entropy
86//! models in this module enforce the following constraints:
87//!
88//! 1. all symbols within the user-provided support are assigned at least the smallest
89//! nonzero probability that is representable at the used fixed-point `PRECISION` (even
90//! if naive rounding would lead to zero probability);
91//! 2. all symbols that are not in the user-provided support have probability zero;
92//! 3. the probabilities add up to one (this even holds when, e.g., quantizing a continuous
93//! probability distribution on a finite support that is smaller than the continuous
94//! distribution's possibly unbounded support); and
95//! 4. no single symbol has probability one, i.e., we disallow degenerate entropy models
96//! that put all probability mass on a single symbol, as such models can lead to problems
97//! in some entropy coders (if you don't know whether you may encounter degenerate
98//! entropy models for some symbols, just check for degeneracy and encode nothing in that
99//! case since the corresponding symbols can be trivially reconstructed).
100//!
101//! When entropy models are constructed from a floating-point representation of some
102//! probability distribution then rounding is done in such a way that the above constraints
103//! are satisfied. When entropy models are constructed by passing in probabilities that are
104//! already in fixed-point representation, then the constructor verifies the above
105//! constraints in an efficient way.
106//!
107//! While constraints (1) and (4) above are strictly enforced (for types defined in this
108//! module), constraints (2) and (3) hold in practice but must not be relied on for memory
109//! safety as they can technically be violated without the use of `unsafe` (by using a
110//! [`LeakyQuantizer`] with an invalid [`Distribution`], i.e., one whose cumulative
111//! distribution function either isn't monotonic or has an image that exceeds the interval
112//! `[0, 1]`).
113//!
114//! [`stack`]: super::stack
115//! [`queue`]: super::queue
116//! [`Binomial`]: probability::distribution::Binomial
117//! [`Gaussian`]: probability::distribution::Gaussian
118
119/// Re-export of [`probability::distribution::Distribution`].
120///
121/// Most users will never have to interact with this trait directly. When a method requires
122/// a type that implements `Distribution`, most users will likely use a predefined type from
123/// the [`probability`] crate. You only need to implement this trait if you want to use a
124/// probability distribution that is not (yet) provided by the `probability` crate.
125///
126/// # See Also
127///
128/// - [`Inverse`]
129///
130/// [`probability::distribution::Distribution`]:
131/// https://docs.rs/probability/latest/probability/distribution/trait.Distribution.html
132/// [`probability`]: https://docs.rs/probability/latest/probability/
133pub use probability::distribution::Distribution;
134
135/// Re-export of [`probability::distribution::Inverse`].
136///
137/// Most users will never have to interact with this trait directly. When a method requires
138/// a type that implements `Inverse`, most users will likely use a predefined type from
139/// the [`probability`] crate. You only need to implement this trait if you want to use a
140/// probability distribution that is not (yet) provided by the `probability` crate.
141///
142/// # See Also
143///
144/// - [`Distribution`]
145///
146/// [`probability::distribution::Inverse`]:
147/// https://docs.rs/probability/latest/probability/distribution/trait.Inverse.html
148/// [`probability`]: https://docs.rs/probability/latest/probability/
149pub use probability::distribution::Inverse;
150
151mod categorical;
152mod quantize;
153mod uniform;
154
155use core::{borrow::Borrow, hash::Hash};
156
157use alloc::{boxed::Box, vec::Vec};
158
159use num_traits::{float::FloatCore, AsPrimitive, One, Zero};
160
161use crate::{BitArray, NonZeroBitArray};
162
163/// Base trait for probabilistic models of a data source.
164///
165/// All entropy models (see [module level documentation](self)) that can be used for
166/// encoding and/or decoding with stream codes must implement this trait and at least one of
167/// [`EncoderModel`] and/or [`DecoderModel`]. This trait exposes the type of [`Symbol`]s
168/// over which the entropy model is defined, the type that is used to represent a
169/// [`Probability`] in fixed-point arithmetic, and the fixed point `PRECISION` (see
170/// [discussion of type parameters](super#type-parameters-of-entropy-models)).
171///
172/// # Blanket Implementation for `&impl EntropyModel`
173///
174/// We provide the following blanket implementation for references to `EntropyModel`s:
175///
176/// ```ignore
177/// impl<M, const PRECISION: usize> EntropyModel<PRECISION> for &M
178/// where
179/// M: EntropyModel<PRECISION> + ?Sized
180/// { ... }
181/// ```
182///
183/// This means that, if some type `M` implements `EntropyModel<PRECISION>` for some
184/// `PRECISION`, then so does the reference type `&M`. Analogous blanket implementations are
185/// provided for the traits [`EncoderModel`], [`DecoderModel`], and
186/// [`IterableEntropyModel`]. The implementations simply delegate all calls to `M` (which is
187/// possible since all methods only take an `&self` receiver). Therefore:
188/// - you don't need to (and, in fact, currently can't) implement `EntropyModel`,
189/// `EncoderModel`, or `DecoderModel` for reference types `&M`; just implement these
190/// traits for "value types" `M` and you'll get the implementation for the corresponding
191/// reference types for free.
192/// - when you write a function or method that takes a generic entropy model as an argument,
193/// always take the entropy model (formally) *by value* (i.e., declare your function as
194/// `fn f(model: impl EntropyModel<PRECISION>)` or as `f<M:
195/// EntropyModel<PRECISION>>(model: M)`). Since all references to `EntropyModel`s are also
196/// `EntropyModel`s themselves, a function with one of these signatures can be called with
197/// an entropy model passed in either by value or by reference. If your function or method
198/// needs to pass out several copies of `model` then add an extra bound `M: Copy` (see,
199/// e.g., [`Encode::encode_iid_symbols`](super::Encode::encode_iid_symbols)). This will
200/// allow users to call your function either with a reference to an entropy model (all
201/// shared references implement `Copy`), or with some cheaply copyable entropy model such
202/// as a view to a lookup model (see [`ContiguousLookupDecoderModel::as_view`] or
203/// [`NonContiguousLookupDecoderModel::as_view`]).
204///
205/// # See Also
206///
207/// - [`EncoderModel`]
208/// - [`DecoderModel`]
209///
210/// [`Symbol`]: Self::Symbol
211/// [`Probability`]: Self::Probability
212pub trait EntropyModel<const PRECISION: usize> {
213 /// The type of data over which the entropy model is defined.
214 ///
215 /// This is the type of an item of the *uncompressed* data.
216 ///
217 /// Note that, although any given `EntropyModel` has a fixed associated `Symbol` type,
218 /// this doesn't prevent you from encoding heterogeneous sequences of symbols where each
219 /// symbol has a different type. You can use a different `EntropyModel` with a different
220 /// associated `Symbol` type for each symbol.
221 type Symbol;
222
223 /// The type used to represent probabilities, cumulatives, and quantiles.
224 ///
225 /// This is a primitive unsigned integer type that must hold at least `PRECISION` bits.
226 /// An integer value `p: Probability` semantically represents the probability,
227 /// cumulative, or quantile `p * 2.0^(-PRECISION)` (where `^` denotes exponentiation and
228 /// `PRECISION` is a const generic parameter of the trait `EntropyModel`).
229 ///
230 /// In many places where `constriction`'s public API *returns* probabilities, they have
231 /// already been verified to be nonzero. In such a case, the probability is returned as
232 /// a `Probability::NonZero`, which denotes the corresponding non-zeroable type (e.g.,
233 /// if `Probability` is `u32` then `Probability::NonZero` is
234 /// [`NonZeroU32`](core::num::NonZeroU32)). The "bare" `Probability` type is mostly used
235 /// for left-cumulatives and quantiles (i.e., for points on the y-axis in the graph of a
236 /// cumulative distribution function).
237 ///
238 /// # Enforcing the Constraints
239 ///
240 /// Implementations of `EntropyModel` are encouraged to enforce the constraint
241 /// `1 <= PRECISION <= Probability::BITS`. The simplest way to do so is by stating it as an
242 /// assertion `assert!(1 <= PRECISION && PRECISION <= Probability::BITS)` at the beginning of
243 /// relevant methods. This assertion has zero runtime cost because it can be
244 /// trivially evaluated at compile time and therefore will be optimized out if it holds.
245 /// As of `constriction` 0.4, implementations provided by `constriction` include a similar
246 /// assertion that is checked at compile time using const evaluation tricks.
247 ///
248 /// # (Internal) Representation of Probability One
249 ///
250 /// The case of "probability one" is treated specially. This case does not come up in
251 /// the public API since we disallow probability one for any individual symbol under any
252 /// entropy model, and since all left-sided cumulatives always correspond to a symbol
253 /// with nonzero probability. But the value "one" still comes up internally as the
254 /// right-cumulative of the last allowed symbol for any model. Although our treatment of
255 /// "probability one" can thus be considered an implementation detail, it is likely to
256 /// become an issue in third-party implementations of `EntropyModel`, so it is worth
257 /// documenting our recommended treatment.
258 ///
259 /// We internally represent "probability one" by its normal fixed-point representation
260 /// of `p = 1 << PRECISION` (i.e., `p = 2^PRECISION` in mathematical notation) if this
261 /// value fits into `Probability`, i.e., if `PRECISION != Probability::BITS`. In the
262 /// (uncommon) case where `PRECISION == Probability::BITS`, we represent "probability
263 /// one" as the integer zero (i.e., cutting off the overflowing bit). This means that
264 /// any probability that is guaranteed to not be one can always be calculated by
265 /// subtracting its left-sided cumulative from its right-sided cumulative in wrapping
266 /// arithmetic. However, this convention means that one has to be careful not to confuse
267 /// probability zero with probabilty one. In our implementations, these two possible
268 /// interpretations of the integer `p = 0` always turned out to be easy to disambiguate
269 /// statically.
270 type Probability: BitArray;
271}
272
273/// A trait for [`EntropyModel`]s that can be used for encoding (compressing) data.
274///
275/// As discussed in the [module level documentation](self), all stream codes in
276/// `constriction` use so-called [`EntropyModel`]s for encoding and/or decoding data. Some
277/// of these `EntropyModel`s may be used only for encoding, only for decoding, or for both,
278/// depending on their internal representation.
279///
280/// This `EncoderModel` trait is implemented for all entropy models that can be used for
281/// *encoding* data. To encode data with an `EncoderModel`, construct an entropy coder that
282/// implements the [`Encode`] trait and pass the data and the entropy model to one of the
283/// methods of the [`Encode`] trait (or to an inherent method of the entropy coder, such as
284/// [`AnsCoder::encode_symbols_reverse`]).
285///
286/// # Blanket Implementation for `&impl EncoderModel`
287///
288/// We provide the following blanket implementation for references to `EncoderModel`s:
289///
290/// ```ignore
291/// impl<M, const PRECISION: usize> EncoderModel<PRECISION> for &M
292/// where
293/// M: EncoderModel<PRECISION> + ?Sized
294/// { ... }
295/// ```
296///
297/// This means that, if some type `M` implements `EncoderModel<PRECISION>` for some
298/// `PRECISION`, then so does the reference type `&M`. Therefore, generic functions or
299/// methods should never take a generic `EncoderModel` by reference. They should always take
300/// the generic `EncoderModel` *by value* because this also covers the case of references
301/// but is strictly more general. If your generic function needs to be able to cheaply copy
302/// the `EncoderModel` (as it could with a shared reference) then it should still take the
303/// generic `EncoderModel` formally by value and just add an additional `Copy` bound (see,
304/// e.g., the method signature of [`Encode::encode_iid_symbols`]. For a more elaborate
305/// explanation, please refer to the discussion of the analogous blanket implementation for
306/// [`EntropyModel`].
307///
308/// # See Also
309///
310/// - base trait: [`EntropyModel`]
311/// - sister trait: [`DecoderModel`]
312/// - used with: [`Encode`]
313///
314/// [`Encode`]: super::Encode
315/// [`AnsCoder::encode_symbols_reverse`]: super::stack::AnsCoder::encode_symbols_reverse
316/// [`Encode::encode_iid_symbols`]: super::Encode::encode_iid_symbols
317pub trait EncoderModel<const PRECISION: usize>: EntropyModel<PRECISION> {
318 /// Looks up a symbol in the entropy model.
319 ///
320 /// Takes a `symbol` either by value or by reference and looks it up in the entropy
321 /// model.
322 /// - If `symbol` has a nonzero probability under the model, then this method returns
323 /// `Some((left_sided_cumulative, probability))`, where `probability` is the
324 /// probability in fixed-point representation (see
325 /// [discussion](EntropyModel::Probability)) and `left_sided_cumulative` is the sum of
326 /// the probabilities of all symbols up to but not including `symbol` (also in
327 /// fixed-point representation). Both `left_sided_cumulative` and `probability` are
328 /// guaranteed to be strictly smaller than `1 << PRECISION` (which would semantically
329 /// represent "probability one") because `probability` is nonzero and because we don't
330 /// support degenerate entropy models that put all probability mass on a single
331 /// symbol.
332 /// - If `symbol` has zero probability under the model, then this method returns `None`.
333 fn left_cumulative_and_probability(
334 &self,
335 symbol: impl Borrow<Self::Symbol>,
336 ) -> Option<(Self::Probability, <Self::Probability as BitArray>::NonZero)>;
337
338 /// Returns the probability of the given symbol in floating point representation.
339 ///
340 /// The trait bound `Self::Probability: Into<F>` guarantees that no rounding occurs in
341 /// the conversion. You may have to specify the return type explicitly using "turbofish"
342 /// notation `::<f64>(...)` or `::<f32>(...)`, see example below.
343 ///
344 /// Returns `0.0` if `symbol` is not in the support of the entropy model.
345 ///
346 /// This method is provided mainly as a convenience for debugging.
347 ///
348 /// # Example
349 ///
350 /// ```
351 /// use constriction::stream::model::{EncoderModel, DefaultNonContiguousCategoricalEncoderModel};
352 ///
353 /// let symbols = vec!['a', 'b', 'c', 'd'];
354 /// let probabilities = vec![1u32 << 21, 1 << 23, 1 << 22, 1 << 21];
355 /// let model = DefaultNonContiguousCategoricalEncoderModel // "Default" uses `PRECISION = 24`
356 /// ::from_symbols_and_nonzero_fixed_point_probabilities(
357 /// symbols.iter().copied(), probabilities.iter().copied(), false)
358 /// .unwrap();
359 ///
360 /// assert_eq!(model.floating_point_probability::<f64>('a'), 0.125);
361 /// assert_eq!(model.floating_point_probability::<f64>('b'), 0.5);
362 /// assert_eq!(model.floating_point_probability::<f64>('c'), 0.25);
363 /// assert_eq!(model.floating_point_probability::<f64>('d'), 0.125);
364 /// assert_eq!(model.floating_point_probability::<f64>('x'), 0.0);
365 /// ```
366 ///
367 /// [`fixed_point_probabilities`]: #method.fixed_point_probabilities
368 /// [`floating_point_probabilities_lossy`]: #method.floating_point_probabilities_lossy
369 /// [`from_floating_point_probabilities`]: #method.from_floating_point_probabilities
370 /// [`from_nonzero_fixed_point_probabilities`]:
371 /// #method.from_nonzero_fixed_point_probabilities
372 #[inline]
373 fn floating_point_probability<F>(&self, symbol: Self::Symbol) -> F
374 where
375 F: FloatCore,
376 Self::Probability: Into<F>,
377 {
378 // This gets compiled to a single floating point multiplication rather than a (slow)
379 // division.
380 let whole = (F::one() + F::one()) * (Self::Probability::one() << (PRECISION - 1)).into();
381 let probability = self
382 .left_cumulative_and_probability(symbol)
383 .map_or(Self::Probability::zero(), |(_, p)| p.get());
384 probability.into() / whole
385 }
386}
387
388/// A trait for [`EntropyModel`]s that can be used for decoding (decompressing) data.
389///
390/// As discussed in the [module level documentation](self), all stream codes in
391/// `constriction` use so-called [`EntropyModel`]s for encoding and/or decoding data. Some
392/// of these `EntropyModel`s may be used only for encoding, only for decoding, or for both,
393/// depending on their internal representation.
394///
395/// This `DecoderModel` trait is implemented for all entropy models that can be used for
396/// *decoding* data. To decode data with a `DecoderModel`, construct an entropy coder that
397/// implements the [`Decode`] trait and pass the entropy model to one of the methods of the
398/// [`Decode`] trait.
399///
400/// # Blanket Implementation for `&impl DecoderModel`
401///
402/// We provide the following blanket implementation for references to `DecoderModel`s:
403///
404/// ```ignore
405/// impl<M, const PRECISION: usize> DecoderModel<PRECISION> for &M
406/// where
407/// M: DecoderModel<PRECISION> + ?Sized
408/// { ... }
409/// ```
410///
411/// This means that, if some type `M` implements `DecoderModel<PRECISION>` for some
412/// `PRECISION`, then so does the reference type `&M`. Therefore, generic functions or
413/// methods should never take a generic `DecoderModel` by reference. They should always take
414/// the generic `DecoderModel` *by value* because this also covers the case of references
415/// but is strictly more general. If your generic function needs to be able to cheaply copy
416/// the `DecoderModel` (as it could with a shared reference) then it should still take the
417/// generic `DecoderModel` formally by value and just add an additional `Copy` bound (see,
418/// e.g., the method signature of [`Decode::decode_iid_symbols`]. For a more elaborate
419/// explanation, please refer to the discussion of the analogous blanket implementation for
420/// [`EntropyModel`].
421///
422/// # See Also
423///
424/// - base trait: [`EntropyModel`]
425/// - sister trait: [`EncoderModel`]
426/// - used with: [`Decode`]
427///
428/// [`Decode`]: super::Decode
429/// [`Decode::decode_iid_symbols`]: super::Encode::encode_iid_symbols
430pub trait DecoderModel<const PRECISION: usize>: EntropyModel<PRECISION> {
431 /// Looks up the symbol for a given quantile.
432 ///
433 /// The argument `quantile` represents a number in the half-open interval `[0, 1)` in
434 /// fixed-point arithmetic, i.e., it must be strictly smaller than `1 << PRECISION`.
435 /// Think of `quantile` as a point on the vertical axis of a plot of the cumulative
436 /// distribution function of the probability model. This method evaluates the inverse of
437 /// the cumulative distribution function, which is sometimes called the *quantile
438 /// function*.
439 ///
440 /// Returns a tuple `(symbol, left_sided_cumulative, probability)` where `probability`
441 /// is the probability of `symbol` under the entropy model (in fixed-point arithmetic)
442 /// and `left_sided_cumulative` is the sum of the probabilities of all symbols up to and
443 /// not including `symbol`. The returned `symbol` is the unique symbol that satisfies
444 /// `left_sided_cumulative <= quantile < left_sided_cumulative + probability` (where the
445 /// addition on the right-hand side is non-wrapping).
446 ///
447 /// Note that, in contrast to [`EncoderModel::left_cumulative_and_probability`], this
448 /// method does *not* return an `Option`. This is because, as long as `quantile < 1 <<
449 /// PRECISION`, a valid probability distribution always has a symbol for which the range
450 /// `left_sided_cumulative..(left_sided_cumulative + quantile)` contains `quantile`, and
451 /// the probability of this symbol is guaranteed to be nonzero because the probability
452 /// is the size of the range, which contains at least the one element `quantile`.
453 ///
454 /// # Panics
455 ///
456 /// Implementations may panic if `quantile >= 1 << PRECISION`.
457 fn quantile_function(
458 &self,
459 quantile: Self::Probability,
460 ) -> (
461 Self::Symbol,
462 Self::Probability,
463 <Self::Probability as BitArray>::NonZero,
464 );
465}
466
467/// A trait for [`EntropyModel`]s that can be serialized into a common format.
468///
469/// The method [`symbol_table`] iterates over all symbols with nonzero probability under the
470/// entropy. The iteration occurs in uniquely defined order of increasing left-sided
471/// cumulative probability distribution of the symbols. All `EntropyModel`s for which such
472/// iteration can be implemented efficiently should implement this trait. `EntropyModel`s
473/// for which such iteration would require extra work (e.g., sorting symbols by left-sided
474/// cumulative distribution) should *not* implement this trait so that callers can assume
475/// that calling `symbol_table` is cheap.
476///
477/// The main advantage of implementing this trait is that it provides default
478/// implementations of conversions to various other `EncoderModel`s and `DecoderModel`s, see
479/// [`to_generic_encoder_model`], [`to_generic_decoder_model`], and
480/// [`to_generic_lookup_decoder_model`].
481///
482/// [`symbol_table`]: Self::symbol_table
483/// [`to_generic_encoder_model`]: Self::to_generic_encoder_model
484/// [`to_generic_decoder_model`]: Self::to_generic_decoder_model
485/// [`to_generic_lookup_decoder_model`]: Self::to_generic_lookup_decoder_model
486pub trait IterableEntropyModel<'m, const PRECISION: usize>: EntropyModel<PRECISION> {
487 /// Iterates over all symbols in the unique order that is consistent with the cumulative
488 /// distribution.
489 ///
490 /// The iterator iterates in order of increasing cumulative.
491 ///
492 /// This method may be used, e.g., to export the model into a serializable format. It is
493 /// also used internally by constructors that create a different but equivalent
494 /// representation of the same entropy model (e.g., to construct a
495 /// [`ContiguousLookupDecoderModel`] or [`NonContiguousLookupDecoderModel`] from some `EncoderModel`).
496 ///
497 /// # Example
498 ///
499 /// ```
500 /// use constriction::stream::model::{
501 /// IterableEntropyModel, SmallNonContiguousCategoricalDecoderModel
502 /// };
503 ///
504 /// let symbols = vec!['a', 'b', 'x', 'y'];
505 /// let probabilities = vec![0.125, 0.5, 0.25, 0.125]; // Can all be represented without rounding.
506 /// let model = SmallNonContiguousCategoricalDecoderModel
507 /// ::from_symbols_and_floating_point_probabilities_fast(
508 /// symbols.iter().cloned(),
509 /// &probabilities,
510 /// None
511 /// ).unwrap();
512 ///
513 /// // Print a table representation of this entropy model (e.g., for debugging).
514 /// dbg!(model.symbol_table().collect::<Vec<_>>());
515 ///
516 /// // Create a lookup model. This method is provided by the trait `IterableEntropyModel`.
517 /// let lookup_decoder_model = model.to_generic_lookup_decoder_model();
518 /// ```
519 ///
520 /// # See also
521 ///
522 /// - [`floating_point_symbol_table`](Self::floating_point_symbol_table)
523 fn symbol_table(
524 &'m self,
525 ) -> impl Iterator<
526 Item = (
527 Self::Symbol,
528 Self::Probability,
529 <Self::Probability as BitArray>::NonZero,
530 ),
531 >;
532
533 /// Similar to [`symbol_table`], but yields both cumulatives and probabilities in
534 /// floating point representation.
535 ///
536 /// The conversion to floats is guaranteed to be lossless due to the trait bound `F:
537 /// From<Self::Probability>`.
538 ///
539 /// [`symbol_table`]: Self::symbol_table
540 ///
541 /// TODO: test
542 fn floating_point_symbol_table<F>(&'m self) -> impl Iterator<Item = (Self::Symbol, F, F)>
543 where
544 F: FloatCore + From<Self::Probability> + 'm,
545 Self::Probability: Into<F>,
546 {
547 // This gets compiled into a constant, and the divisions by `whole` get compiled
548 // into floating point multiplications rather than (slower) divisions.
549 let whole = (F::one() + F::one()) * (Self::Probability::one() << (PRECISION - 1)).into();
550
551 self.symbol_table()
552 .map(move |(symbol, cumulative, probability)| {
553 (
554 symbol,
555 cumulative.into() / whole,
556 probability.get().into() / whole,
557 )
558 })
559 }
560
561 /// Returns the entropy in units of bits (i.e., base 2).
562 ///
563 /// The entropy is the expected amortized bit rate per symbol of an optimal lossless
564 /// entropy coder, assuming that the data is indeed distributed according to the model.
565 ///
566 /// Note that calling this method on a [`LeakilyQuantizedDistribution`] will return the
567 /// entropy *after quantization*, not the differential entropy of the underlying
568 /// continuous probability distribution.
569 ///
570 /// # See also
571 ///
572 /// - [`cross_entropy_base2`](Self::cross_entropy_base2)
573 /// - [`reverse_cross_entropy_base2`](Self::reverse_cross_entropy_base2)
574 /// - [`kl_divergence_base2`](Self::kl_divergence_base2)
575 /// - [`reverse_kl_divergence_base2`](Self::reverse_kl_divergence_base2)
576 fn entropy_base2<F>(&'m self) -> F
577 where
578 F: num_traits::Float + core::iter::Sum,
579 Self::Probability: Into<F>,
580 {
581 let scaled_shifted = self
582 .symbol_table()
583 .map(|(_, _, probability)| {
584 let probability = probability.get().into();
585 probability * probability.log2() // probability is guaranteed to be nonzero.
586 })
587 .sum::<F>();
588
589 let whole = (F::one() + F::one()) * (Self::Probability::one() << (PRECISION - 1)).into();
590 F::from(PRECISION).unwrap() - scaled_shifted / whole
591 }
592
593 /// Returns the cross entropy between argument `p` and this model in units of bits
594 /// (i.e., base 2).
595 ///
596 /// This is the expected amortized bit rate per symbol that an optimal coder will
597 /// achieve when using this model on a data source that draws symbols from the provided
598 /// probability distribution `p`.
599 ///
600 /// The cross entropy is defined as `H(p, self) = - sum_i p[i] * log2(self[i])` where
601 /// `p` is provided as an argument and `self[i]` denotes the corresponding probabilities
602 /// of the model. Note that `self[i]` is never zero for models in the `constriction`
603 /// library, so the logarithm in the (forward) cross entropy can never be infinite.
604 ///
605 /// The argument `p` must yield a sequence of probabilities (nonnegative values that sum
606 /// to 1) with the correct length and order to be compatible with the model.
607 ///
608 /// # See also
609 ///
610 /// - [`entropy_base2`](Self::entropy_base2)
611 /// - [`reverse_cross_entropy_base2`](Self::reverse_cross_entropy_base2)
612 /// - [`kl_divergence_base2`](Self::kl_divergence_base2)
613 /// - [`reverse_kl_divergence_base2`](Self::reverse_kl_divergence_base2)
614 fn cross_entropy_base2<F>(&'m self, p: impl IntoIterator<Item = F>) -> F
615 where
616 F: num_traits::Float + core::iter::Sum,
617 Self::Probability: Into<F>,
618 {
619 let shift = F::from(PRECISION).unwrap();
620 self.symbol_table()
621 .zip(p)
622 .map(|((_, _, probability), p)| {
623 let probability = probability.get().into();
624 // Perform the shift for each item individually so that the result is
625 // reasonable even if `p` is not normalized.
626 p * (shift - probability.log2()) // probability is guaranteed to be nonzero.
627 })
628 .sum::<F>()
629 }
630
631 /// Returns the cross entropy between this model and argument `p` in units of bits
632 /// (i.e., base 2).
633 ///
634 /// This method is provided mostly for completeness. You're more likely to want to
635 /// calculate [`cross_entropy_base2`](Self::cross_entropy_base2).
636 ///
637 /// The reverse cross entropy is defined as `H(self, p) = - sum_i self[i] * log2(p[i])`
638 /// where `p` is provided as an argument and `self[i]` denotes the corresponding
639 /// probabilities of the model.
640 ///
641 /// The argument `p` must yield a sequence of *nonzero* probabilities (that sum to 1)
642 /// with the correct length and order to be compatible with the model.
643 ///
644 /// # See also
645 ///
646 /// - [`cross_entropy_base2`](Self::cross_entropy_base2)
647 /// - [`entropy_base2`](Self::entropy_base2)
648 /// - [`reverse_kl_divergence_base2`](Self::reverse_kl_divergence_base2)
649 /// - [`kl_divergence_base2`](Self::kl_divergence_base2)
650 fn reverse_cross_entropy_base2<F>(&'m self, p: impl IntoIterator<Item = F>) -> F
651 where
652 F: num_traits::Float + core::iter::Sum,
653 Self::Probability: Into<F>,
654 {
655 let scaled = self
656 .symbol_table()
657 .zip(p)
658 .map(|((_, _, probability), p)| {
659 let probability = probability.get().into();
660 probability * p.log2()
661 })
662 .sum::<F>();
663
664 let whole = (F::one() + F::one()) * (Self::Probability::one() << (PRECISION - 1)).into();
665 -scaled / whole
666 }
667
668 /// Returns Kullback-Leibler divergence `D_KL(p || self)`
669 ///
670 /// This is the expected *overhead* (due to model quantization) in bit rate per symbol
671 /// that an optimal coder will incur when using this model on a data source that draws
672 /// symbols from the provided probability distribution `p` (which this model is supposed
673 /// to approximate).
674 ///
675 /// The KL-divergence is defined as `D_KL(p || self) = - sum_i p[i] * log2(self[i] /
676 /// p[i])`, where `p` is provided as an argument and `self[i]` denotes the corresponding
677 /// probabilities of the model. Any term in the sum where `p[i]` is *exactly* zero does
678 /// not contribute (regardless of whether or not `self[i]` would also be zero).
679 ///
680 /// The argument `p` must yield a sequence of probabilities (nonnegative values that sum
681 /// to 1) with the correct length and order to be compatible with the model.
682 ///
683 /// # See also
684 ///
685 /// - [`reverse_kl_divergence_base2`](Self::reverse_kl_divergence_base2)
686 /// - [`entropy_base2`](Self::entropy_base2)
687 /// - [`cross_entropy_base2`](Self::cross_entropy_base2)
688 /// - [`reverse_cross_entropy_base2`](Self::reverse_cross_entropy_base2)
689 fn kl_divergence_base2<F>(&'m self, p: impl IntoIterator<Item = F>) -> F
690 where
691 F: num_traits::Float + core::iter::Sum,
692 Self::Probability: Into<F>,
693 {
694 let shifted = self
695 .symbol_table()
696 .zip(p)
697 .map(|((_, _, probability), p)| {
698 if p == F::zero() {
699 F::zero()
700 } else {
701 let probability = probability.get().into();
702 p * (p.log2() - probability.log2())
703 }
704 })
705 .sum::<F>();
706
707 shifted + F::from(PRECISION).unwrap() // assumes that `p` is normalized
708 }
709
710 /// Returns reverse Kullback-Leibler divergence, i.e., `D_KL(self || p)`
711 ///
712 /// This method is provided mostly for completeness. You're more likely to want to
713 /// calculate [`kl_divergence_base2`](Self::kl_divergence_base2).
714 ///
715 /// The reverse KL-divergence is defined as `D_KL(self || p) = - sum_i self[i] *
716 /// log2(p[i] / self[i])` where `p`
717 /// is provided as an argument and `self[i]` denotes the corresponding probabilities of
718 /// the model.
719 ///
720 /// The argument `p` must yield a sequence of *nonzero* probabilities (that sum to 1)
721 /// with the correct length and order to be compatible with the model.
722 ///
723 /// # See also
724 ///
725 /// - [`kl_divergence_base2`](Self::kl_divergence_base2)
726 /// - [`entropy_base2`](Self::entropy_base2)
727 /// - [`cross_entropy_base2`](Self::cross_entropy_base2)
728 /// - [`reverse_cross_entropy_base2`](Self::reverse_cross_entropy_base2)
729 fn reverse_kl_divergence_base2<F>(&'m self, p: impl IntoIterator<Item = F>) -> F
730 where
731 F: num_traits::Float + core::iter::Sum,
732 Self::Probability: Into<F>,
733 {
734 let scaled_shifted = self
735 .symbol_table()
736 .zip(p)
737 .map(|((_, _, probability), p)| {
738 let probability = probability.get().into();
739 probability * (probability.log2() - p.log2())
740 })
741 .sum::<F>();
742
743 let whole = (F::one() + F::one()) * (Self::Probability::one() << (PRECISION - 1)).into();
744 scaled_shifted / whole - F::from(PRECISION).unwrap()
745 }
746
747 /// Creates an [`EncoderModel`] from this `EntropyModel`
748 ///
749 /// This is a fallback method that should only be used if no more specialized
750 /// conversions are available. It generates a [`NonContiguousCategoricalEncoderModel`]
751 /// with the same probabilities and left-sided cumulatives as `self`. Note that a
752 /// `NonContiguousCategoricalEncoderModel` is very generic and therefore not
753 /// particularly optimized. Thus, before calling this method first check:
754 /// - if the original `Self` type already implements `EncoderModel` (some types
755 /// implement *both* `EncoderModel` and `DecoderModel`); or
756 /// - if the `Self` type has some inherent method with a name like `to_encoder_model`;
757 /// if it does, that method probably returns an implementation of `EncoderModel` that
758 /// is better optimized for your use case.
759 #[inline(always)]
760 fn to_generic_encoder_model(
761 &'m self,
762 ) -> NonContiguousCategoricalEncoderModel<Self::Symbol, Self::Probability, PRECISION>
763 where
764 Self::Symbol: Hash + Eq,
765 {
766 self.into()
767 }
768
769 /// Creates a [`DecoderModel`] from this `EntropyModel`
770 ///
771 /// This is a fallback method that should only be used if no more specialized
772 /// conversions are available. It generates a [`NonContiguousCategoricalDecoderModel`]
773 /// with the same probabilities and left-sided cumulatives as `self`. Note that a
774 /// `NonContiguousCategoricalEncoderModel` is very generic and therefore not
775 /// particularly optimized. Thus, before calling this method first check:
776 /// - if the original `Self` type already implements `DecoderModel` (some types
777 /// implement *both* `EncoderModel` and `DecoderModel`); or
778 /// - if the `Self` type has some inherent method with a name like `to_decoder_model`;
779 /// if it does, that method probably returns an implementation of `DecoderModel` that
780 /// is better optimized for your use case.
781 #[inline(always)]
782 fn to_generic_decoder_model(
783 &'m self,
784 ) -> NonContiguousCategoricalDecoderModel<
785 Self::Symbol,
786 Self::Probability,
787 Vec<(Self::Probability, Self::Symbol)>,
788 PRECISION,
789 >
790 where
791 Self::Symbol: Clone,
792 {
793 self.into()
794 }
795
796 /// Creates a [`DecoderModel`] from this `EntropyModel`
797 ///
798 /// This is a fallback method that should only be used if no more specialized
799 /// conversions are available. It generates a [`ContiguousLookupDecoderModel`] or [`NonContiguousLookupDecoderModel`] that makes no
800 /// assumption about contiguity of the support. Thus, before calling this method first
801 /// check if the `Self` type has some inherent method with a name like
802 /// `to_lookup_decoder_model`. If it does, that method probably returns a
803 /// `LookupDecoderModel` that is better optimized for your use case.
804 #[inline(always)]
805 fn to_generic_lookup_decoder_model(
806 &'m self,
807 ) -> NonContiguousLookupDecoderModel<
808 Self::Symbol,
809 Self::Probability,
810 Vec<(Self::Probability, Self::Symbol)>,
811 Box<[Self::Probability]>,
812 PRECISION,
813 >
814 where
815 Self::Probability: Into<usize>,
816 usize: AsPrimitive<Self::Probability>,
817 Self::Symbol: Clone + Default,
818 {
819 self.into()
820 }
821}
822
823impl<M, const PRECISION: usize> EntropyModel<PRECISION> for &M
824where
825 M: EntropyModel<PRECISION> + ?Sized,
826{
827 type Probability = M::Probability;
828 type Symbol = M::Symbol;
829}
830
831impl<M, const PRECISION: usize> EncoderModel<PRECISION> for &M
832where
833 M: EncoderModel<PRECISION> + ?Sized,
834{
835 #[inline(always)]
836 fn left_cumulative_and_probability(
837 &self,
838 symbol: impl Borrow<Self::Symbol>,
839 ) -> Option<(Self::Probability, <Self::Probability as BitArray>::NonZero)> {
840 (*self).left_cumulative_and_probability(symbol)
841 }
842}
843
844impl<M, const PRECISION: usize> DecoderModel<PRECISION> for &M
845where
846 M: DecoderModel<PRECISION> + ?Sized,
847{
848 #[inline(always)]
849 fn quantile_function(
850 &self,
851 quantile: Self::Probability,
852 ) -> (
853 Self::Symbol,
854 Self::Probability,
855 <Self::Probability as BitArray>::NonZero,
856 ) {
857 (*self).quantile_function(quantile)
858 }
859}
860
861impl<'m, M, const PRECISION: usize> IterableEntropyModel<'m, PRECISION> for &'m M
862where
863 M: IterableEntropyModel<'m, PRECISION>,
864{
865 fn symbol_table(
866 &'m self,
867 ) -> impl Iterator<
868 Item = (
869 Self::Symbol,
870 Self::Probability,
871 <Self::Probability as BitArray>::NonZero,
872 ),
873 > {
874 (*self).symbol_table()
875 }
876
877 fn entropy_base2<F>(&'m self) -> F
878 where
879 F: num_traits::Float + core::iter::Sum,
880 Self::Probability: Into<F>,
881 {
882 (*self).entropy_base2()
883 }
884
885 #[inline(always)]
886 fn to_generic_encoder_model(
887 &'m self,
888 ) -> NonContiguousCategoricalEncoderModel<Self::Symbol, Self::Probability, PRECISION>
889 where
890 Self::Symbol: Hash + Eq,
891 {
892 (*self).to_generic_encoder_model()
893 }
894
895 #[inline(always)]
896 fn to_generic_decoder_model(
897 &'m self,
898 ) -> NonContiguousCategoricalDecoderModel<
899 Self::Symbol,
900 Self::Probability,
901 Vec<(Self::Probability, Self::Symbol)>,
902 PRECISION,
903 >
904 where
905 Self::Symbol: Clone,
906 {
907 (*self).to_generic_decoder_model()
908 }
909}
910
911pub use categorical::{
912 contiguous::{
913 ContiguousCategoricalEntropyModel, DefaultContiguousCategoricalEntropyModel,
914 SmallContiguousCategoricalEntropyModel,
915 },
916 lazy_contiguous::{
917 DefaultLazyContiguousCategoricalEntropyModel, LazyContiguousCategoricalEntropyModel,
918 SmallLazyContiguousCategoricalEntropyModel,
919 },
920 lookup_contiguous::{ContiguousLookupDecoderModel, SmallContiguousLookupDecoderModel},
921 lookup_noncontiguous::{NonContiguousLookupDecoderModel, SmallNonContiguousLookupDecoderModel},
922 non_contiguous::{
923 DefaultNonContiguousCategoricalDecoderModel, DefaultNonContiguousCategoricalEncoderModel,
924 NonContiguousCategoricalDecoderModel, NonContiguousCategoricalEncoderModel,
925 SmallNonContiguousCategoricalDecoderModel, SmallNonContiguousCategoricalEncoderModel,
926 },
927};
928pub use quantize::{
929 DefaultLeakyQuantizer, LeakilyQuantizedDistribution, LeakyQuantizer, SmallLeakyQuantizer,
930};
931pub use uniform::{DefaultUniformModel, SmallUniformModel, UniformModel};
932
933#[cfg(test)]
934mod tests {
935 use probability::prelude::*;
936
937 use super::*;
938
939 #[test]
940 fn entropy() {
941 #[cfg(not(miri))]
942 let (support, std_devs, means) = (-1000..=1000, [100., 200., 300.], [-10., 2.3, 50.1]);
943
944 // We use different settings when testing on miri so that the test time stays reasonable.
945 #[cfg(miri)]
946 let (support, std_devs, means) = (-100..=100, [10., 20., 30.], [-1., 0.23, 5.01]);
947
948 let quantizer = LeakyQuantizer::<_, _, u32, 24>::new(support);
949 for &std_dev in &std_devs {
950 for &mean in &means {
951 let distribution = Gaussian::new(mean, std_dev);
952 let model = quantizer.quantize(distribution);
953 let entropy = model.entropy_base2::<f64>();
954 let expected_entropy = 2.047095585180641 + std_dev.log2();
955 assert!((entropy - expected_entropy).abs() < 0.01);
956 }
957 }
958 }
959
960 pub(super) fn test_entropy_model<'m, D, const PRECISION: usize>(
961 model: &'m D,
962 support: impl Clone + Iterator<Item = D::Symbol>,
963 ) where
964 D: IterableEntropyModel<'m, PRECISION>
965 + EncoderModel<PRECISION>
966 + DecoderModel<PRECISION>
967 + 'm,
968 D::Symbol: Copy + core::fmt::Debug + PartialEq,
969 D::Probability: Into<u64>,
970 u64: AsPrimitive<D::Probability>,
971 {
972 let mut sum = 0;
973 for symbol in support.clone() {
974 let (left_cumulative, prob) = model.left_cumulative_and_probability(symbol).unwrap();
975 assert_eq!(left_cumulative.into(), sum);
976 sum += prob.get().into();
977
978 let expected = (symbol, left_cumulative, prob);
979 assert_eq!(model.quantile_function(left_cumulative), expected);
980 assert_eq!(model.quantile_function((sum - 1).as_()), expected);
981 assert_eq!(
982 model.quantile_function((left_cumulative.into() + prob.get().into() / 2).as_()),
983 expected
984 );
985 }
986 assert_eq!(sum, 1 << PRECISION);
987
988 test_iterable_entropy_model(model, support);
989 }
990
991 pub(super) fn test_iterable_entropy_model<'m, M, const PRECISION: usize>(
992 model: &'m M,
993 support: impl Clone + Iterator<Item = M::Symbol>,
994 ) where
995 M: IterableEntropyModel<'m, PRECISION> + 'm,
996 M::Symbol: Copy + core::fmt::Debug + PartialEq,
997 M::Probability: Into<u64>,
998 u64: AsPrimitive<M::Probability>,
999 {
1000 let mut expected_cumulative = 0u64;
1001 let mut count = 0;
1002 for (expected_symbol, (symbol, left_sided_cumulative, probability)) in
1003 support.clone().zip(model.symbol_table())
1004 {
1005 assert_eq!(symbol, expected_symbol);
1006 assert_eq!(left_sided_cumulative.into(), expected_cumulative);
1007 expected_cumulative += probability.get().into();
1008 count += 1;
1009 }
1010 assert_eq!(count, support.size_hint().0);
1011 assert_eq!(expected_cumulative, 1 << PRECISION);
1012 }
1013
1014 /// Verifies that the model is close to a provided probability mass function (in
1015 /// KL-divergence).
1016 pub(super) fn verify_iterable_entropy_model<'m, M, P, const PRECISION: usize>(
1017 model: &'m M,
1018 hist: &[P],
1019 tol: f64,
1020 ) -> f64
1021 where
1022 M: IterableEntropyModel<'m, PRECISION> + 'm,
1023 M::Probability: BitArray + Into<u64> + Into<f64>,
1024 P: num_traits::Zero + Into<f64> + Copy + PartialOrd,
1025 {
1026 let weights: Vec<_> = model
1027 .symbol_table()
1028 .map(|(_, _, probability)| probability.get())
1029 .collect();
1030
1031 assert_eq!(weights.len(), hist.len());
1032 assert_eq!(
1033 weights.iter().map(|&x| Into::<u64>::into(x)).sum::<u64>(),
1034 1 << PRECISION
1035 );
1036 for &w in &weights {
1037 assert!(w > M::Probability::zero());
1038 }
1039
1040 let mut weights_and_hist = weights
1041 .iter()
1042 .cloned()
1043 .zip(hist.iter().cloned())
1044 .collect::<Vec<_>>();
1045
1046 // Check that sorting by weight is compatible with sorting by hist.
1047 weights_and_hist.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap());
1048 // TODO: replace the following with
1049 // `assert!(weights_and_hist.iter().map(|&(_, x)| x).is_sorted())`
1050 // when `is_sorted` becomes stable.
1051 let mut previous = P::zero();
1052 for (_, hist) in weights_and_hist {
1053 assert!(hist >= previous);
1054 previous = hist;
1055 }
1056
1057 let normalization = hist.iter().map(|&x| x.into()).sum::<f64>();
1058 let normalized_hist = hist
1059 .iter()
1060 .map(|&x| Into::<f64>::into(x) / normalization)
1061 .collect::<Vec<_>>();
1062
1063 let kl = model.kl_divergence_base2::<f64>(normalized_hist);
1064 assert!(kl < tol);
1065
1066 kl
1067 }
1068}