Module constriction::stream::model
Probability distributions that can be used as entropy models for stream codes.
This module provides utilities for dealing with probabilistic models of data sources (“entropy models”) in exactly invertible fixed-point arithmetic so that no rounding errors occur. As explained in the motivation below, preventing rounding errors is necessary for reliable entropy coding.
The types defined in this module approximate arbitrary discrete (or quantized one-dimensional continuous) probability distributions with a fixed-point representation. The fixed-point representation has customizable numeric precision and can be either explicit or implicit (i.e., lazy). While the conversion to the fixed-point approximation itself generally involves rounding, once the fixed-point representation is obtained, operations on it are exact. Therefore, the fixed-point representation can be used for entropy coding.
Module Overview

This module declares the base trait `EntropyModel` and its subtraits `EncoderModel` and `DecoderModel`, which specify the interfaces that entropy models provide and that entropy coders in the sister modules can rely on.

In addition, this module provides the following three utilities for constructing entropy models:

- an adapter that converts parameterized discrete distributions (e.g., `Binomial`) or one-dimensional continuous probability distributions (e.g., `Gaussian`) from a representation in terms of float-valued functions to an (implicit) exactly invertible fixed-point representation; when provided with a continuous distribution (a probability density), this adapter also quantizes the data space into bins. See `DefaultLeakyQuantizer` and `SmallLeakyQuantizer`;
- types for representing arbitrary categorical distributions in an explicit fixed-point representation; these types are intended either as fallbacks for probability distributions that lack an efficiently evaluable analytic expression of the cumulative distribution function (and that therefore can’t be handled by the above adapter), or for efficient encoding of i.i.d. symbols by precalculating and tabularizing the fixed-point representation of each allowed symbol. See `DefaultContiguousCategoricalEntropyModel`, `DefaultNonContiguousCategoricalEncoderModel`, and `DefaultNonContiguousCategoricalDecoderModel` (and their respective counterparts with the “Small” instead of the “Default” preset); and
- types for high-performance “lookup tables” that enable efficient decoding of i.i.d. data; these types build up a lookup table with `2^PRECISION` entries (one entry per possible quantile) and are therefore only recommended for use with relatively small `PRECISION`. See `SmallContiguousLookupDecoderModel` and `SmallNonContiguousLookupDecoderModel`.
Examples

See `LeakyQuantizer`, `ContiguousCategoricalEntropyModel`, `NonContiguousCategoricalEncoderModel`, `NonContiguousCategoricalDecoderModel`, and `LookupDecoderModel`.
TODO: direct links to “Examples” sections.
Motivation
The general idea of entropy coding is to find an optimal compression strategy by using a probabilistic model of the data source. Ideally, all conceivable data points would be compressed into a short bit string. However, short bit strings are a scarce commodity: for any integer `N`, there are only `2^N - 1` distinct bit strings that are shorter than `N` bits. For this reason, entropy coding algorithms assign the scarce short bit strings to data points that are most probable to appear in practice, while assigning longer bit strings to data points that may be possible in principle but that are extremely improbable in practice. More precisely, entropy coding aims to minimize the expected bit rate under a probabilistic model of the data source. We refer to this model as an “entropy model”.
In contrast to many other use cases of probabilistic models in computing, entropy models
must be amenable to exact arithmetic operations. In particular, no rounding errors are
allowed when inverting the cumulative distribution function. Even a single arbitrarily
small rounding error could set off a chain reaction leading to arbitrarily large and
arbitrarily many errors when compressing and then decompressing a sequence of symbols
(see, e.g., the motivating example for the `ChainCoder`).
This module provides utilities for defining entropy models that can be inverted exactly
without any rounding errors.
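To make “exactly invertible” concrete, here is a minimal, self-contained sketch (an illustration in plain integer arithmetic, not the crate’s actual implementation; `PRECISION`, the weights, and the helper functions are made up for this example). Because the cumulative weights are integers, mapping a symbol to its quantile range and mapping a quantile back to a symbol are exact inverses of each other, with no rounding anywhere:

```rust
const PRECISION: u32 = 12;

/// Left-sided cumulative weight of `symbol`, i.e., the lower bound of its
/// quantile range. This is the quantity an encoder would use.
fn left_cumulative(weights: &[u32], symbol: usize) -> u32 {
    weights[..symbol].iter().sum()
}

/// Inverts the cumulative distribution: maps a `quantile` in
/// `0..(1 << PRECISION)` to the unique symbol whose quantile range contains
/// it. This is the operation a decoder performs.
fn invert(weights: &[u32], quantile: u32) -> usize {
    let mut cumulative = 0u32;
    for (symbol, &weight) in weights.iter().enumerate() {
        cumulative += weight;
        if quantile < cumulative {
            return symbol;
        }
    }
    unreachable!("weights must sum to 1 << PRECISION");
}

fn main() {
    // Fixed-point weights for three symbols; they sum to exactly 2^12 = 4096.
    let weights = [100u32, 3000, 996];
    assert_eq!(weights.iter().sum::<u32>(), 1 << PRECISION);

    // Every possible quantile maps to exactly one symbol, and that mapping is
    // the exact inverse of the (left_cumulative, weight) representation:
    for quantile in 0..(1u32 << PRECISION) {
        let symbol = invert(&weights, quantile);
        let left = left_cumulative(&weights, symbol);
        assert!(left <= quantile && quantile < left + weights[symbol]);
    }
}
```

Had the weights been floats, the comparison `quantile < cumulative` could flip under rounding, which is exactly the kind of error the fixed-point representation rules out.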
Zero Probability
All entropy models provided in this module have a predictable support, i.e., it is always easy to predict exactly which symbols have nonzero probability under the model. This is an important property for entropy coding because trying to encode a symbol that has zero probability under the used entropy model would fail.
When constructing an entropy model, the caller always has to provide a support (either as an integer range or as a list of symbols of arbitrary type). All entropy models in this module enforce the following constraints:
1. all symbols within the user-provided support are assigned at least the smallest nonzero probability that is representable at the used fixed-point `PRECISION` (even if naive rounding would lead to zero probability);
2. all symbols that are not in the user-provided support have probability zero;
3. the probabilities add up to one (this holds even when, e.g., quantizing a continuous probability distribution on a finite support that is smaller than the continuous distribution’s possibly unbounded support); and
4. no single symbol has probability one, i.e., we disallow degenerate entropy models that put all probability mass on a single symbol, as such models can lead to problems in some entropy coders (if you don’t know whether you may encounter degenerate entropy models for some symbols, just check for degeneracy and encode nothing in that case since the corresponding symbols can be trivially reconstructed).
When entropy models are constructed from a floating-point representation of some probability distribution, rounding is performed in such a way that the above constraints are satisfied. When entropy models are constructed by passing in probabilities that are already in fixed-point representation, the constructor verifies the above constraints in an efficient way.
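To illustrate what such a rounding scheme has to accomplish, the following self-contained sketch converts float probabilities into fixed-point weights that satisfy all four constraints. It is a simplified stand-in, not the crate’s actual constructor: `PRECISION` and the function are made up for this example, and a production implementation would distribute the rounding residual more carefully to minimize rounding error.

```rust
const PRECISION: u32 = 12;

/// Converts float probabilities (assumed nonnegative, summing to at most 1.0,
/// with at least two entries) into "leaky" fixed-point weights.
fn leaky_fixed_point(probs: &[f64]) -> Vec<u32> {
    let total = 1u32 << PRECISION;
    assert!(probs.len() >= 2 && (probs.len() as u32) < total);

    // Constraint (1): reserve a minimum weight of 1 for every symbol in the
    // support, then distribute the remaining budget by truncating rounding.
    let budget = total - probs.len() as u32;
    let mut weights: Vec<u32> = probs
        .iter()
        .map(|&p| 1 + (p * budget as f64) as u32)
        .collect();

    // Constraint (3): assign the truncation residual to the most probable
    // symbol so that the weights sum to exactly 1 << PRECISION.
    let sum: u32 = weights.iter().sum();
    let argmax = (0..weights.len()).max_by_key(|&i| weights[i]).unwrap();
    weights[argmax] += total - sum;

    // Constraint (4) then follows automatically: every other symbol keeps a
    // weight of at least 1, so no single weight can reach 1 << PRECISION.
    weights
}

fn main() {
    // The third probability would naively round to zero; the leaky scheme
    // still assigns it the smallest representable nonzero weight:
    let weights = leaky_fixed_point(&[0.7, 0.3, 1e-30]);
    assert_eq!(weights.iter().sum::<u32>(), 1 << PRECISION);
    assert!(weights.iter().all(|&w| w >= 1));
    assert!(weights.iter().all(|&w| w < (1 << PRECISION)));
}
```

Constraint (2) is handled implicitly: symbols outside the slice passed in simply have no weight at all.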
While constraints (1) and (4) above are strictly enforced (for types defined in this module), constraints (2) and (3) hold in practice but must not be relied on for memory safety, as they can technically be violated without the use of `unsafe` (by using a `LeakyQuantizer` with an invalid `Distribution`, i.e., one whose cumulative distribution function either isn’t monotonic or has an image that exceeds the interval `[0, 1]`).
Structs

- An entropy model for a categorical probability distribution over a contiguous range of integers starting at zero.
- Internal representation of `ContiguousCategoricalEntropyModel`.
- The iterator returned by `IterableEntropyModel::floating_point_symbol_table`.
- An `EntropyModel` that approximates a parameterized probability `Distribution`.
- Iterator over the `symbol_table` of a `LeakilyQuantizedDistribution`.
- Quantizes probability distributions and represents them in fixed-point precision.
- A tabularized `DecoderModel` that is optimized for fast decoding of i.i.d. symbols.
- An entropy model for a categorical probability distribution over arbitrary symbols, for decoding only.
- An entropy model for a categorical probability distribution over arbitrary symbols, for encoding only.
- Internal representation of `NonContiguousCategoricalEncoderModel` and `NonContiguousCategoricalDecoderModel`.
- Iterator over the `symbol_table` of various categorical distributions.
Traits

- A trait for `EntropyModel`s that can be used for decoding (decompressing) data.
- Re-export of `probability::distribution::Distribution`.
- A trait for `EntropyModel`s that can be used for encoding (compressing) data.
- Base trait for probabilistic models of a data source.
- Re-export of `probability::distribution::Inverse`.
- A trait for `EntropyModel`s that can be serialized into a common format.
- A trait for internal representations of various forms of categorical entropy models.
Type Aliases

- Type alias for a typical `ContiguousCategoricalEntropyModel`.
- Type alias for a typical `LeakyQuantizer`.
- Type alias for a typical `NonContiguousCategoricalDecoderModel`.
- Type alias for a typical `NonContiguousCategoricalEncoderModel`.
- Type alias for a `ContiguousCategoricalEntropyModel` optimized for compatibility with lookup decoder models.
- Type alias for a `LookupDecoderModel` over symbols `{0, 1, ..., n-1}` with sane settings.
- Type alias for a `LeakyQuantizer` optimized for compatibility with lookup decoder models.
- Type alias for a `NonContiguousCategoricalDecoderModel` optimized for compatibility with lookup decoder models.
- Type alias for a `NonContiguousCategoricalEncoderModel` optimized for compatibility with lookup decoder models.
- Type alias for a `LookupDecoderModel` over arbitrary symbols.