Struct constriction::stream::model::LeakyQuantizer
pub struct LeakyQuantizer<F, Symbol, Probability, const PRECISION: usize> { /* private fields */ }
Quantizes probability distributions and represents them in fixed-point precision.
You will usually want to use this type through one of its type aliases, DefaultLeakyQuantizer or SmallLeakyQuantizer; see the discussion of presets.
Examples
Quantizing Continuous Distributions
use constriction::{
stream::{model::DefaultLeakyQuantizer, stack::DefaultAnsCoder, Encode, Decode},
UnwrapInfallible,
};
// Create a quantizer that supports integer symbols from -5 to 20 (inclusively),
// using the "default" preset for `Probability` and `PRECISION`.
let quantizer = DefaultLeakyQuantizer::new(-5..=20);
// Quantize a normal distribution with mean 8.3 and standard deviation 4.1.
let continuous_distribution1 = probability::distribution::Gaussian::new(8.3, 4.1);
let entropy_model1 = quantizer.quantize(continuous_distribution1);
// You can reuse the same quantizer for more than one distribution, and the distributions don't
// even have to be of the same type (e.g., one can be a `Gaussian` and another a `Laplace`).
let continuous_distribution2 = probability::distribution::Laplace::new(-1.4, 2.7);
let entropy_model2 = quantizer.quantize(continuous_distribution2);
// Use the entropy models with an entropy coder.
let mut ans_coder = DefaultAnsCoder::new();
ans_coder.encode_symbol(4, entropy_model1).unwrap();
ans_coder.encode_symbol(-3, entropy_model2).unwrap();
// Decode symbols (in reverse order, since the `AnsCoder` is a stack) and verify correctness.
assert_eq!(ans_coder.decode_symbol(entropy_model2).unwrap_infallible(), -3);
assert_eq!(ans_coder.decode_symbol(entropy_model1).unwrap_infallible(), 4);
assert!(ans_coder.is_empty());
Quantizing a Discrete Distribution (That Has an Analytic Expression)
If you pass a discrete probability distribution to the method quantize, then it no longer needs to perform any quantization in the data space, but it will still perform steps 2 and 3 in the list below, i.e., it will still convert the distribution to a "leaky" fixed-point approximation that can be used by any of constriction's stream codes. In the following example, we'll quantize a Binomial distribution (as discussed below, you should not quantize a Categorical distribution since there are more efficient specialized types for this use case).
use constriction::stream::{
model::DefaultLeakyQuantizer, queue::DefaultRangeEncoder, Encode, Decode
};
let distribution = probability::distribution::Binomial::new(1000, 0.1); // arguments: `n, p`
let quantizer = DefaultLeakyQuantizer::new(0..=1000); // natural support is `0..=n`
let entropy_model = quantizer.quantize(distribution);
// Let's use a Range Coder this time, just for fun (we could as well use an ANS Coder again).
let mut range_encoder = DefaultRangeEncoder::new();
// Encode a "typical" symbol from the distribution (i.e., one with non-negligible probability).
range_encoder.encode_symbol(107, entropy_model).unwrap();
// Due to the "leakiness" of the quantizer, the following still works despite the fact that
// the symbol `1000` has a ridiculously low probability under the binomial distribution.
range_encoder.encode_symbol(1000, entropy_model).unwrap();
// Decode symbols (in forward order, since range coding operates as a queue) and verify.
let mut range_decoder = range_encoder.into_decoder().unwrap();
assert_eq!(range_decoder.decode_symbol(entropy_model).unwrap(), 107);
assert_eq!(range_decoder.decode_symbol(entropy_model).unwrap(), 1000);
assert!(range_decoder.maybe_exhausted());
Detailed Description
A LeakyQuantizer is a builder of LeakilyQuantizedDistributions. It takes an arbitrary probability distribution that implements the Distribution trait from the crate probability and turns it into a LeakilyQuantizedDistribution by performing the following three steps:
- quantization: lossless entropy coding can only be performed over discrete data. Any continuous (real-valued) data has to be approximated by some discrete set of points. If you provide a continuous distribution (i.e., a probability density function) to this builder, then it will quantize the data space by rounding values to the nearest integer. This step is optional, see below.
- approximation with fixed-point arithmetic: an entropy model that is used for compressing and decompressing has to be exactly invertible, so that its EncoderModel implementation is compatible with its DecoderModel implementation. The LeakilyQuantizedDistributions that are built by this builder represent probabilities and quantiles in fixed-point arithmetic with PRECISION bits. This allows them to avoid rounding errors when inverting the model, so that they can implement both EncoderModel and DecoderModel in such a way that one is the exact inverse of the other.
- introducing leakiness: naively approximating a probability distribution with fixed-point arithmetic could lead to problems: it could round some very small probabilities to zero. This would have the undesirable effect that the corresponding symbol then could no longer be encoded. This builder ensures that the LeakilyQuantizedDistributions that it creates assign a nonzero probability to all symbols within a user-defined range, so that these symbols can always be encoded, even if their probabilities under the original probability distribution are very low (or even zero); see the sketch after this list.
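To make the leakiness concrete, here is a minimal sketch that inspects the fixed-point probability of a far-tail symbol. It assumes that the EncoderModel trait, which provides the method left_cumulative_and_probability, is brought into scope:
use constriction::stream::model::{DefaultLeakyQuantizer, EncoderModel};
// Quantize a narrow Gaussian over a wide support.
let quantizer = DefaultLeakyQuantizer::new(-100..=100);
let model = quantizer.quantize(probability::distribution::Gaussian::new(0.0, 1.0));
// The symbol `100` lies 100 standard deviations from the mean, so its true probability
// underflows to zero in floating point; the leaky model still assigns it a nonzero
// fixed-point probability.
let (_left_cumulative, probability) = model.left_cumulative_and_probability(100).unwrap();
assert!(probability.get() > 0);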
Continuous vs. Discrete Probability Distributions
The method quantize accepts both continuous probability distributions (i.e., probability density functions, such as Gaussian) and discrete distributions that are defined only on (some) integers (i.e., probability mass functions, such as Binomial). The resulting LeakilyQuantizedDistribution will always be a discrete probability distribution. If the original probability distribution is continuous, then the quantizer implicitly creates bins of size one by rounding to the nearest integer (i.e., the bins range from i - 0.5 to i + 0.5 for each integer i). If the original probability distribution is discrete then no rounding in the symbol space occurs, but the quantizer still performs steps 2 and 3 above, i.e., it still rounds probabilities and quantiles to fixed-point arithmetic in a way that ensures that all probabilities within a user-defined range are nonzero.
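Before fixed-point rounding and leakage are applied, the probability mass that this binning assigns to an integer symbol i is simply the CDF evaluated at i + 0.5 minus the CDF evaluated at i - 0.5. A minimal sketch of this arithmetic, using only the probability crate:
use probability::distribution::{Distribution, Gaussian};
let gaussian = Gaussian::new(8.3, 4.1);
// Probability mass of the bin around the integer symbol `4` (before leakage and
// fixed-point rounding):
let bin_probability = gaussian.distribution(4.5) - gaussian.distribution(3.5);
assert!(bin_probability > 0.0 && bin_probability < 1.0);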
Don’t Quantize Categorical Distributions, Though.
Although you can use a LeakyQuantizer for discrete probability distributions, you should not use it for probability distributions of the type probability::distribution::Categorical. While this will technically work, it will lead to poor computational performance (and also to slightly suboptimal compression efficiency). If you're dealing with categorical distributions, use one of the dedicated types ContiguousCategoricalEntropyModel, NonContiguousCategoricalEncoderModel, NonContiguousCategoricalDecoderModel, or LookupDecoderModel instead.
By contrast, do use a LeakyQuantizer if the underlying probability Distribution can be described by some analytic function (e.g., the function f(x) ∝ e^{-(x-\mu)^2/2} describing the bell curve of a Gaussian distribution, or the function f_n(k) = (n choose k) p^k (1-p)^{n-k} describing the probability mass function of a binomial distribution). For such parameterized distributions, both the cumulative distribution function and its inverse can often be expressed as, or at least approximated by, some analytic expression that can be evaluated in constant time, independent of the number of possible symbols.
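For comparison, if you already have explicit floating-point probabilities for each category, constructing a dedicated categorical model looks roughly like the following sketch (it assumes the constructor from_floating_point_probabilities; check your version of constriction for the exact constructor name):
use constriction::stream::model::DefaultContiguousCategoricalEntropyModel;
// Hypothetical probabilities for four categories; they don't have to sum exactly to 1.
let probabilities = [0.3f64, 0.4, 0.2, 0.1];
let model =
    DefaultContiguousCategoricalEntropyModel::from_floating_point_probabilities(&probabilities)
        .unwrap();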
Computational Efficiency
Two things should be noted about computational efficiency:
- quantization is lazy: both the constructor of a LeakyQuantizer and the method quantize perform only a small constant amount of work, independent of the PRECISION and the number of symbols on which the resulting entropy model will be defined. The actual quantization is done once the resulting LeakilyQuantizedDistribution is used for encoding and/or decoding, and it is only done for the involved symbols.
- quantization for decoding is more expensive than for encoding: using a LeakilyQuantizedDistribution as an EncoderModel only requires evaluating the cumulative distribution function (CDF) of the underlying continuous probability distribution a constant number of times (twice, to be precise). By contrast, using it as a DecoderModel requires numerical inversion of the cumulative distribution function. This numerical inversion starts by calling Inverse::inverse from the crate probability on the underlying continuous probability distribution. But the result of this method call then has to be refined by repeatedly probing the CDF in order to deal with inevitable rounding errors in the implementation of Inverse::inverse. The number of required iterations will depend on how accurate the implementation of Inverse::inverse is.
The laziness means that it is relatively cheap to use a different LeakilyQuantizedDistribution for each symbol of the message, which is a common thing to do in machine-learning based compression methods. By contrast, if you want to use the same entropy model for many symbols then a LeakilyQuantizedDistribution can become unnecessarily expensive, especially for decoding, because you might end up calculating the inverse CDF in the same region over and over again. If this is the case, consider tabularizing the LeakilyQuantizedDistribution that you obtain from the method quantize by calling to_generic_encoder_model or to_generic_decoder_model on it (or, if you use a low PRECISION, you may even consider calling to_generic_lookup_decoder_model). You'll have to bring the trait IterableEntropyModel into scope to call these conversion methods (use constriction::stream::model::IterableEntropyModel).
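For example, tabularizing a quantized Gaussian might look like the following sketch (the conversion methods come from IterableEntropyModel; return types are elided):
use constriction::stream::model::{DefaultLeakyQuantizer, IterableEntropyModel};
let quantizer = DefaultLeakyQuantizer::new(-50..=50);
let quantized = quantizer.quantize(probability::distribution::Gaussian::new(0.0, 10.0));
// Pay the quantization cost once, up front, rather than once per encoded/decoded symbol.
let encoder_model = quantized.to_generic_encoder_model();
let decoder_model = quantized.to_generic_decoder_model();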
Requirements for Correctness
The original distribution that you pass to the method quantize can only be an approximation of a true (normalized) probability distribution because it represents probabilities with finite (floating point) precision. Despite the possibility of rounding errors in the underlying (floating point) distribution, a LeakyQuantizer is guaranteed to generate a valid entropy model with exactly compatible implementations of EncoderModel and DecoderModel as long as both of the following requirements are met (a sketch of a distribution that satisfies them follows the list):
- The cumulative distribution function (CDF) Distribution::distribution is defined on all mid points between integers that lie within the range that is provided as argument support to the new method; it is monotonically nondecreasing, and its values do not exceed the closed interval [0.0, 1.0]. It is OK if the CDF does not cover the entire interval from 0.0 to 1.0 (e.g., due to rounding errors or clipping); any remaining probability mass on the tails is added to the probability of the symbols at the respective ends of the support.
- The quantile function or inverse CDF Inverse::inverse evaluates to a finite non-NaN value everywhere on the open interval (0.0, 1.0), and it is monotonically nondecreasing on this interval. It does not have to be defined at the boundaries 0.0 or 1.0 (more precisely, it only has to be defined on the closed interval [epsilon, 1.0 - epsilon] where epsilon := 2.0^{-(PRECISION+1)} and ^ denotes mathematical exponentiation). Further, the implementation of Inverse::inverse does not actually have to be the inverse of Distribution::distribution because it is only used as an initial hint where to start a search for the true inverse. It is OK if Inverse::inverse is just some approximation of the true inverse CDF. Any deviations between Inverse::inverse and the true inverse CDF will negatively impact runtime performance but will otherwise have no observable effect.
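As an illustration of these requirements, the following sketch defines a hypothetical logistic distribution (not part of the probability crate) whose CDF and inverse CDF are both analytic and satisfy the conditions above:
use constriction::stream::model::DefaultLeakyQuantizer;
use probability::distribution::{Distribution, Inverse};
// A hypothetical logistic distribution with location `mu` and scale `s > 0`.
struct Logistic { mu: f64, s: f64 }
impl Distribution for Logistic {
    type Value = f64;
    // CDF: monotonically nondecreasing and confined to the interval [0.0, 1.0].
    fn distribution(&self, x: f64) -> f64 {
        1.0 / (1.0 + (-(x - self.mu) / self.s).exp())
    }
}
impl Inverse for Logistic {
    // Inverse CDF: finite, non-NaN, and monotonically nondecreasing on (0.0, 1.0).
    fn inverse(&self, p: f64) -> f64 {
        self.mu + self.s * (p / (1.0 - p)).ln()
    }
}
let quantizer = DefaultLeakyQuantizer::new(-30..=30);
let entropy_model = quantizer.quantize(Logistic { mu: 1.5, s: 4.0 });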
Implementations
impl<F, Symbol, Probability, const PRECISION: usize> LeakyQuantizer<F, Symbol, Probability, PRECISION> where
    Probability: BitArray + Into<F>,
    Symbol: PrimInt + AsPrimitive<Probability> + WrappingSub + WrappingAdd,
    F: FloatCore,
pub fn new(support: RangeInclusive<Symbol>) -> Self
Constructs a LeakyQuantizer with a finite support.
The support is an inclusive range (which can be expressed with the ..= notation, as in -100..=100). All LeakilyQuantizedDistributions generated by this LeakyQuantizer are then guaranteed to assign a nonzero probability to all symbols within the support, and a zero probability to all symbols outside of the support. Having a known support is often a useful property of entropy models because it ensures that all symbols within the support can indeed be encoded, even if their probability under the underlying probability distribution is extremely small.
This method takes support as a RangeInclusive because we want to support, e.g., probability distributions over the Symbol type u8 with full support 0..=255.
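For instance, the following sketch constructs such a quantizer (the explicit type annotation pins the floating-point type F, which would otherwise be inferred from a later call to quantize):
use constriction::stream::model::DefaultLeakyQuantizer;
// A quantizer with full support over all `u8` symbols.
let quantizer: DefaultLeakyQuantizer<f64, u8> = DefaultLeakyQuantizer::new(0..=255);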
Panics
Panics if any of the following conditions is met:
- support is empty; or
- support contains only a single value (we do not support degenerate probability distributions that put all probability mass on a single symbol); or
- support is larger than 1 << PRECISION (because in this case, assigning any representable nonzero probability to all elements of support would exceed our probability budget).
pub fn quantize<D: Distribution>(
    self,
    distribution: D,
) -> LeakilyQuantizedDistribution<F, Symbol, Probability, D, PRECISION>
Quantizes the given probability distribution and returns an EntropyModel.
See struct documentation for details and code examples.
Note that you can reuse the same quantizer to quantize arbitrarily many distributions (as in the examples above).
pub fn support(&self) -> RangeInclusive<Symbol>
Returns the exact range of symbols that have nonzero probability.
The returned inclusive range is the same as the one that was passed in to the constructor new. All entropy models created by the method quantize will assign a nonzero probability to all elements in the support, and they will assign a zero probability to all elements outside of the support. The support contains at least two and at most 1 << PRECISION elements.
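A quick sketch of this round trip (the type annotation only serves to pin the generic parameters):
use constriction::stream::model::DefaultLeakyQuantizer;
let quantizer: DefaultLeakyQuantizer<f64, i32> = DefaultLeakyQuantizer::new(-5..=20);
// `support` returns exactly the range that was passed to `new`.
assert_eq!(quantizer.support(), -5..=20);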
Trait Implementations
impl<F: Clone, Symbol: Clone, Probability: Clone, const PRECISION: usize> Clone for LeakyQuantizer<F, Symbol, Probability, PRECISION>
fn clone(&self) -> LeakyQuantizer<F, Symbol, Probability, PRECISION>
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source. Read more