Struct constriction::stream::model::LeakyQuantizer
pub struct LeakyQuantizer<F, Symbol, Probability, const PRECISION: usize> { /* private fields */ }
Quantizes probability distributions and represents them in fixed-point precision.
You will usually want to use this type through one of its type aliases,
DefaultLeakyQuantizer or SmallLeakyQuantizer, see discussion of
presets.
Examples
Quantizing Continuous Distributions
use constriction::{
    stream::{model::DefaultLeakyQuantizer, stack::DefaultAnsCoder, Encode, Decode},
    UnwrapInfallible,
};
// Create a quantizer that supports integer symbols from -5 to 20 (inclusively),
// using the "default" preset for `Probability` and `PRECISION`.
let quantizer = DefaultLeakyQuantizer::new(-5..=20);
// Quantize a normal distribution with mean 8.3 and standard deviation 4.1.
let continuous_distribution1 = probability::distribution::Gaussian::new(8.3, 4.1);
let entropy_model1 = quantizer.quantize(continuous_distribution1);
// You can reuse the same quantizer for more than one distribution, and the distributions don't
// even have to be of the same type (e.g., one can be a `Gaussian` and another a `Laplace`).
let continuous_distribution2 = probability::distribution::Laplace::new(-1.4, 2.7);
let entropy_model2 = quantizer.quantize(continuous_distribution2);
// Use the entropy models with an entropy coder.
let mut ans_coder = DefaultAnsCoder::new();
ans_coder.encode_symbol(4, entropy_model1).unwrap();
ans_coder.encode_symbol(-3, entropy_model2).unwrap();
// Decode symbols (in reverse order, since the `AnsCoder` is a stack) and verify correctness.
assert_eq!(ans_coder.decode_symbol(entropy_model2).unwrap_infallible(), -3);
assert_eq!(ans_coder.decode_symbol(entropy_model1).unwrap_infallible(), 4);
assert!(ans_coder.is_empty());
Quantizing a Discrete Distribution (That Has an Analytic Expression)
If you pass a discrete probability distribution to the method quantize then it no
longer needs to perform any quantization in the data space, but it will still perform
steps 2 and 3 in the list below, i.e., it will still convert to a “leaky” fixed-point
approximation that can be used by any of constriction’s stream codes. In the
following example, we’ll quantize a Binomial distribution (as discussed below, you
should not quantize a Categorical distribution since there are more efficient
specialized types for this use case).
use constriction::stream::{
    model::DefaultLeakyQuantizer, queue::DefaultRangeEncoder, Encode, Decode,
};
let distribution = probability::distribution::Binomial::new(1000, 0.1); // arguments: `n, p`
let quantizer = DefaultLeakyQuantizer::new(0..=1000); // natural support is `0..=n`
let entropy_model = quantizer.quantize(distribution);
// Let's use a Range Coder this time, just for fun (we could as well use an ANS Coder again).
let mut range_encoder = DefaultRangeEncoder::new();
// Encode a "typical" symbol from the distribution (i.e., one with non-negligible probability).
range_encoder.encode_symbol(107, entropy_model).unwrap();
// Due to the "leakiness" of the quantizer, the following still works despite the fact that
// the symbol `1000` has a ridiculously low probability under the binomial distribution.
range_encoder.encode_symbol(1000, entropy_model).unwrap();
// Decode symbols (in forward order, since range coding operates as a queue) and verify.
let mut range_decoder = range_encoder.into_decoder().unwrap();
assert_eq!(range_decoder.decode_symbol(entropy_model).unwrap(), 107);
assert_eq!(range_decoder.decode_symbol(entropy_model).unwrap(), 1000);
assert!(range_decoder.maybe_exhausted());
Detailed Description
A LeakyQuantizer is a builder of LeakilyQuantizedDistributions. It takes an
arbitrary probability distribution that implements the Distribution trait from the
crate probability and turns it into a LeakilyQuantizedDistribution by performing
the following three steps:
- quantization: lossless entropy coding can only be performed over discrete data. Any continuous (real-valued) data has to be approximated by some discrete set of points. If you provide a continuous distribution (i.e., a probability density function) to this builder, then it will quantize the data space by rounding values to the nearest integer. This step is optional, see below.
- approximation with fixed-point arithmetic: an entropy model that is used for compressing and decompressing has to be exactly invertible, so that its EncoderModel implementation is compatible with its DecoderModel implementation. The LeakilyQuantizedDistributions that are built by this builder represent probabilities and quantiles in fixed-point arithmetic with PRECISION bits. This allows them to avoid rounding errors when inverting the model, so that they can implement both EncoderModel and DecoderModel in such a way that one is the exact inverse of the other.
- introducing leakiness: naively approximating a probability distribution with fixed-point arithmetic could lead to problems: it could round some very small probabilities to zero. This would have the undesirable effect that the corresponding symbol then could no longer be encoded. This builder ensures that the LeakilyQuantizedDistributions that it creates assign a nonzero probability to all symbols within a user-defined range, so that these symbols can always be encoded, even if their probabilities under the original probability distribution are very low (or even zero); the sketch below demonstrates this guarantee.
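The following sketch demonstrates the guarantees from steps 2 and 3. It assumes that the EncoderModel trait’s method left_cumulative_and_probability returns Some fixed-point probability for symbols inside the support and None for symbols with zero probability:
use constriction::stream::model::{DefaultLeakyQuantizer, EncoderModel};
let quantizer = DefaultLeakyQuantizer::new(-5..=20);
let model = quantizer.quantize(probability::distribution::Gaussian::new(8.3, 4.1));
// "Leakiness": even the lowest symbol of the support, which lies more than three
// standard deviations below the mean, still has a nonzero probability ...
assert!(model.left_cumulative_and_probability(-5).is_some());
// ... while symbols outside of the support have exactly zero probability.
assert!(model.left_cumulative_and_probability(21).is_none());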
Continuous vs. Discrete Probability Distributions
The method quantize accepts both continuous probability distributions (i.e.,
probability density functions, such as Gaussian) and discrete distributions that are
defined only on (some) integers (i.e., probability mass functions, such as
Binomial). The resulting LeakilyQuantizedDistribution will always be a discrete
probability distribution. If the original probability distribution is continuous, then
the quantizer implicitly creates bins of size one by rounding to the nearest integer
(i.e., the bins range from i - 0.5 to i + 0.5 for each integer i). If the original
probability distribution is discrete then no rounding in the symbol space occurs, but
the quantizer still performs steps 2 and 3 above, i.e., it still rounds probabilities
and quantiles to fixed-point arithmetic in a way that ensures that all probabilities
within a user-defined range are nonzero.
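For example, the bin for the integer symbol 4 under a continuous model carries the probability mass between 3.5 and 4.5. A minimal sketch using only the probability crate (the actual LeakilyQuantizedDistribution additionally applies the fixed-point rounding and leakiness adjustments described above):
use probability::distribution::{Distribution, Gaussian};
let gaussian = Gaussian::new(8.3, 4.1);
// CDF difference across the bin from 3.5 to 4.5, i.e., the (approximate)
// probability mass that the quantized model assigns to the symbol `4`:
let mass_of_bin_4 = gaussian.distribution(4.5) - gaussian.distribution(3.5);
assert!(mass_of_bin_4 > 0.0 && mass_of_bin_4 < 1.0);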
Don’t Quantize Categorical Distributions, Though.
Although you can use a LeakyQuantizer for discrete probability distributions, you
should not use it for probability distributions of the type
probability::distribution::Categorical. While this will technically work, it will
lead to poor computational performance (and also to slightly suboptimal compression
efficiency). If you’re dealing with categorical distributions, use one of the dedicated
types ContiguousCategoricalEntropyModel, NonContiguousCategoricalEncoderModel,
NonContiguousCategoricalDecoderModel, or LookupDecoderModel instead.
By contrast, do use a LeakyQuantizer if the underlying probability Distribution
can be described by some analytic function (e.g., the function f(x) ∝ e^{-(x-μ)²/(2σ²)}
describing the bell curve of a Gaussian distribution, or the function
f_n(k) = (n choose k) · p^k · (1-p)^{n-k} describing the probability mass function of a binomial
distribution). For such parameterized distributions, both the cumulative distribution
function and its inverse can often be expressed as, or at least approximated by, some
analytic expression that can be evaluated in constant time, independent of the number of
possible symbols.
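For comparison, here is a sketch of the dedicated categorical route. It assumes a constructor named from_floating_point_probabilities on the DefaultContiguousCategoricalEntropyModel type alias; the exact constructor name may differ between constriction versions:
use constriction::stream::model::DefaultContiguousCategoricalEntropyModel;
// A categorical distribution is given by an explicit probability table; there is
// no analytic CDF (let alone an analytic inverse CDF) to evaluate in constant time.
let probabilities = [0.1, 0.4, 0.2, 0.3];
let model =
    DefaultContiguousCategoricalEntropyModel::from_floating_point_probabilities(&probabilities)
        .unwrap();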
Computational Efficiency
Two things should be noted about computational efficiency:
- quantization is lazy: both the constructor of a LeakyQuantizer and the method quantize perform only a small constant amount of work, independent of the PRECISION and the number of symbols on which the resulting entropy model will be defined. The actual quantization is done once the resulting LeakilyQuantizedDistribution is used for encoding and/or decoding, and it is only done for the involved symbols.
- quantization for decoding is more expensive than for encoding: using a LeakilyQuantizedDistribution as an EncoderModel only requires evaluating the cumulative distribution function (CDF) of the underlying continuous probability distribution a constant number of times (twice, to be precise). By contrast, using it as a DecoderModel requires numerical inversion of the cumulative distribution function. This numerical inversion starts by calling Inverse::inverse from the crate probability on the underlying continuous probability distribution. But the result of this method call then has to be refined by repeatedly probing the CDF in order to deal with inevitable rounding errors in the implementation of Inverse::inverse. The number of required iterations will depend on how accurate the implementation of Inverse::inverse is.
The laziness means that it is relatively cheap to use a different
LeakilyQuantizedDistribution for each symbol of the message, which is a common
thing to do in machine-learning based compression methods. By contrast, if you want to
use the same entropy model for many symbols then a LeakilyQuantizedDistribution can
become unnecessarily expensive, especially for decoding, because you might end up
calculating the inverse CDF in the same region over and over again. If this is the case,
consider tabularizing the LeakilyQuantizedDistribution that you obtain from the method
quantize by calling to_generic_encoder_model or to_generic_decoder_model on
it (or, if you use a low PRECISION, you may even consider calling
to_generic_lookup_decoder_model). You’ll have to bring the trait
IterableEntropyModel into scope to call these conversion methods (use constriction::stream::model::IterableEntropyModel).
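A sketch of this conversion (model size and distribution parameters are arbitrary):
use constriction::stream::model::{DefaultLeakyQuantizer, IterableEntropyModel};
let quantizer = DefaultLeakyQuantizer::new(-100..=100);
let model = quantizer.quantize(probability::distribution::Gaussian::new(0.0, 10.0));
// Pay the quantization cost once, up front, for all 201 symbols of the support ...
let tabularized_encoder_model = model.to_generic_encoder_model();
let tabularized_decoder_model = model.to_generic_decoder_model();
// ... and then reuse the precalculated tables for many encode/decode operations.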
Requirements for Correctness
The original distribution that you pass to the method quantize can only be an
approximation of a true (normalized) probability distribution because it represents
probabilities with finite (floating point) precision. Despite the possibility of
rounding errors in the underlying (floating point) distribution, a LeakyQuantizer is
guaranteed to generate a valid entropy model with exactly compatible implementations of
EncoderModel and DecoderModel as long as both of the following requirements are
met:
- The cumulative distribution function (CDF) Distribution::distribution is defined on all mid points between integers that lie within the range that is provided as argument support to the new method; it is monotonically nondecreasing, and its values do not exceed the closed interval [0.0, 1.0]. It is OK if the CDF does not cover the entire interval from 0.0 to 1.0 (e.g., due to rounding errors or clipping); any remaining probability mass on the tails is added to the probability of the symbols at the respective ends of the support.
- The quantile function or inverse CDF Inverse::inverse evaluates to a finite non-NaN value everywhere on the open interval (0.0, 1.0), and it is monotonically nondecreasing on this interval. It does not have to be defined at the boundaries 0.0 or 1.0 (more precisely, it only has to be defined on the closed interval [epsilon, 1.0 - epsilon] where epsilon := 2.0^{-(PRECISION+1)} and ^ denotes mathematical exponentiation). Further, the implementation of Inverse::inverse does not actually have to be the inverse of Distribution::distribution because it is only used as an initial hint where to start a search for the true inverse. It is OK if Inverse::inverse is just some approximation of the true inverse CDF. Any deviations between Inverse::inverse and the true inverse CDF will negatively impact runtime performance but will otherwise have no observable effect.
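To make these requirements concrete, the following sketch implements both traits for a hypothetical custom distribution (a logistic distribution, which happens to have an analytic CDF and an analytic inverse CDF; any monotonic approximation of the inverse would satisfy the second requirement just as well):
use probability::distribution::{Distribution, Inverse};

struct Logistic { mean: f64, scale: f64 }

impl Distribution for Logistic {
    type Value = f64;

    // Requirement 1: monotonically nondecreasing, with values within [0.0, 1.0].
    fn distribution(&self, x: f64) -> f64 {
        1.0 / (1.0 + (-(x - self.mean) / self.scale).exp())
    }
}

impl Inverse for Logistic {
    // Requirement 2: finite, non-NaN, and monotonically nondecreasing on the open
    // interval (0.0, 1.0). It only serves as an initial hint for the search, so an
    // approximation of the true inverse CDF would also be acceptable here.
    fn inverse(&self, p: f64) -> f64 {
        self.mean + self.scale * (p / (1.0 - p)).ln()
    }
}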
Implementations
impl<F, Symbol, Probability, const PRECISION: usize> LeakyQuantizer<F, Symbol, Probability, PRECISION>
where
    Probability: BitArray + Into<F>,
    Symbol: PrimInt + AsPrimitive<Probability> + WrappingSub + WrappingAdd,
    F: FloatCore,
pub fn new(support: RangeInclusive<Symbol>) -> Self
Constructs a LeakyQuantizer with a finite support.
The support is an inclusive range (which can be expressed with the ..= notation,
as in -100..=100). All LeakilyQuantizedDistributions generated by this
LeakyQuantizer are then guaranteed to assign a nonzero probability to all symbols
within the support, and a zero probability to all symbols outside of the
support. Having a known support is often a useful property of entropy models
because it ensures that all symbols within the support can indeed be encoded, even
if their probability under the underlying probability distribution is extremely
small.
This method takes support as a RangeInclusive because we want to support, e.g.,
probability distributions over the Symbol type u8 with full support 0..=255.
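For illustration, here is a sketch with all type parameters spelled out (assuming that the “default” preset uses Probability = u32 and PRECISION = 24):
use constriction::stream::model::LeakyQuantizer;
// The exclusive range `0..256` could not even be written down here because the
// upper bound `256` overflows `u8`; the inclusive range below can.
let quantizer = LeakyQuantizer::<f64, u8, u32, 24>::new(0..=255);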
Panics
Panics if either of the following conditions is met:
- support is empty; or
- support contains only a single value (we do not support degenerate probability distributions that put all probability mass on a single symbol); or
- support is larger than 1 << PRECISION (because in this case, assigning any representable nonzero probability to all elements of support would exceed our probability budget).
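For example (the commented-out lines would each trigger one of the panics above; this sketch assumes PRECISION = 24 for the “default” preset):
use constriction::stream::model::DefaultLeakyQuantizer;
let _valid = DefaultLeakyQuantizer::new(-5..=20); // OK: more than one symbol.
// let _empty = DefaultLeakyQuantizer::new(20..=-5); // would panic: empty `support`
// let _degenerate = DefaultLeakyQuantizer::new(5..=5); // would panic: single-valued `support`
// let _too_large = DefaultLeakyQuantizer::new(0..=(1 << 25)); // would panic: more than `1 << 24` symbols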
pub fn quantize<D: Distribution>(
    self,
    distribution: D,
) -> LeakilyQuantizedDistribution<F, Symbol, Probability, D, PRECISION>
Quantizes the given probability distribution and returns an EntropyModel.
See struct documentation for details and code examples.
Note that you can reuse the same quantizer to quantize arbitrarily many
distributions; although this method takes self by value, a LeakyQuantizer is
cheap to copy (see the first example above, which quantizes both a Gaussian
and a Laplace with a single quantizer).
pub fn support(&self) -> RangeInclusive<Symbol>
Returns the exact range of symbols that have nonzero probability.
The returned inclusive range is the same as the one that was passed in to the
constructor new. All entropy models created by the method
quantize will assign a nonzero probability to all elements in
the support, and they will assign a zero probability to all elements outside of
the support. The support contains at least two and at most 1 << PRECISION
elements.
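For example:
use constriction::stream::model::DefaultLeakyQuantizer;
let quantizer = DefaultLeakyQuantizer::new(-5..=20);
assert_eq!(quantizer.support(), -5..=20);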
Trait Implementations
impl<F: Clone, Symbol: Clone, Probability: Clone, const PRECISION: usize> Clone for LeakyQuantizer<F, Symbol, Probability, PRECISION>
    fn clone(&self) -> LeakyQuantizer<F, Symbol, Probability, PRECISION>
    fn clone_from(&mut self, source: &Self)