pub struct ContiguousCategoricalEntropyModel<Probability, Table, const PRECISION: usize> { /* private fields */ }
An entropy model for a categorical probability distribution over a contiguous range of integers starting at zero.
You will usually want to use this type through one of its type aliases, DefaultContiguousCategoricalEntropyModel or SmallContiguousCategoricalEntropyModel; see the discussion of presets.

This entropy model implements both EncoderModel and DecoderModel, which means that it can be used for both encoding and decoding with any of the stream coders provided by the constriction crate.
Example
use constriction::{
    stream::{stack::DefaultAnsCoder, model::DefaultContiguousCategoricalEntropyModel, Decode},
    UnwrapInfallible,
};

// Create a `ContiguousCategoricalEntropyModel` that approximates floating point probabilities.
let probabilities = [0.3, 0.0, 0.4, 0.1, 0.2]; // Note that `probabilities[1] == 0.0`.
let model = DefaultContiguousCategoricalEntropyModel::from_floating_point_probabilities(
    &probabilities
).unwrap();
assert_eq!(model.support_size(), 5); // `model` supports the symbols `0..5usize`.

// Use `model` for entropy coding.
let message = vec![2, 0, 3, 1, 2, 4, 3, 2, 0];
let mut ans_coder = DefaultAnsCoder::new();
// We could pass `model` by reference but passing `model.as_view()` is slightly more efficient.
ans_coder.encode_iid_symbols_reverse(message.iter().cloned(), model.as_view()).unwrap();
// Note that `message` contains the symbol `1`, and that `probabilities[1] == 0.0`. However, we
// can still encode the symbol because the `ContiguousCategoricalEntropyModel` is "leaky", i.e.,
// it assigns a nonzero probability to all symbols in the range `0..model.support_size()`.

// Decode the encoded message and verify correctness.
let decoded = ans_coder
    .decode_iid_symbols(9, model.as_view())
    .collect::<Result<Vec<_>, _>>()
    .unwrap_infallible();
assert_eq!(decoded, message);
assert!(ans_coder.is_empty());

// The `model` assigns zero probability to any symbols that are not in the support
// `0..model.support_size()`, so trying to encode a message that contains such a symbol fails.
assert!(ans_coder.encode_iid_symbols_reverse(&[2, 0, 5, 1], model.as_view()).is_err());
// ERROR: symbol `5` is not in the support of `model`.
When Should I Use This Type of Entropy Model?
Use a ContiguousCategoricalEntropyModel
for probabilistic models that can only be
represented as an explicit probability table, and not by some more compact analytic
expression. If you have a probability model that can be expressed by some analytical
expression (e.g., a Binomial
distribution),
then use LeakyQuantizer
instead (unless you want to encode lots of symbols with the
same entropy model, in which case the explicitly tabulated representation of a
categorical entropy model could improve runtime performance).
Further, a ContiguousCategoricalEntropyModel can only represent probability
distributions whose support (i.e., the set of symbols to which the model assigns a
nonzero probability) is a contiguous range of integers starting at zero. If the support
of your probability distribution has a more complicated structure (or if the Symbol
type is not an integer type), then you can use a
NonContiguousCategoricalEncoderModel
or a NonContiguousCategoricalDecoderModel
,
which are strictly more general than a ContiguousCategoricalEntropyModel
but which
have a larger memory footprint and slightly worse runtime performance.
If you want to decode lots of symbols with the same entropy model, and if reducing the
PRECISION
to a moderate value is acceptable to you, then you may want to consider
using a LookupDecoderModel
instead for even better runtime performance (at the cost
of a larger memory footprint and worse compression efficiency due to lower PRECISION
).
Computational Efficiency

For a probability distribution with a support of N symbols, a
ContiguousCategoricalEntropyModel has the following asymptotic costs:

- creation:
  - runtime cost: Θ(N) when creating from fixed point probabilities,
    Θ(N log(N)) when creating from floating point probabilities;
  - memory footprint: Θ(N);
  - both are cheaper by a constant factor than for a
    NonContiguousCategoricalEncoderModel or a NonContiguousCategoricalDecoderModel.
- encoding a symbol (calling EncoderModel::left_cumulative_and_probability):
  - runtime cost: Θ(1) (cheaper than for NonContiguousCategoricalEncoderModel
    since it compiles to a simple array lookup rather than a HashMap lookup);
  - memory footprint: no heap allocations, constant stack space.
- decoding a symbol (calling DecoderModel::quantile_function):
  - runtime cost: Θ(log(N)) (both expected and worst-case; probably slightly
    cheaper than for NonContiguousCategoricalDecoderModel due to better memory
    locality);
  - memory footprint: no heap allocations, constant stack space.
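These costs can be made concrete with a minimal sketch. The functions below operate on a hypothetical flat cumulative (CDF) array; constriction's actual internal table layout may differ, but the access patterns are the same: encoding reads two adjacent entries, and decoding binary-searches the table for the given quantile.

```rust
// Hypothetical minimal model: `cdf` has `support_size + 1` fixed-point entries
// with `cdf[0] == 0` and `cdf[support_size] == 1 << PRECISION`; `cdf[i]` is the
// left-sided cumulative probability of symbol `i`.

// Θ(1) encoding lookup: two adjacent array reads, no hashing.
fn left_cumulative_and_probability(cdf: &[u32], symbol: usize) -> Option<(u32, u32)> {
    let left = *cdf.get(symbol)?;
    let right = *cdf.get(symbol + 1)?;
    Some((left, right - left))
}

// Θ(log N) decoding: binary search for the last `i` with `cdf[i] <= quantile`.
// Assumes `quantile < 1 << PRECISION`.
fn quantile_function(cdf: &[u32], quantile: u32) -> (usize, u32, u32) {
    let symbol = cdf.partition_point(|&c| c <= quantile) - 1;
    (symbol, cdf[symbol], cdf[symbol + 1] - cdf[symbol])
}
```

Note that the encoding path needs no search at all, which is why it beats a HashMap-based non-contiguous model: the symbol itself is the array index.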
Implementations

impl<Probability: BitArray, const PRECISION: usize> ContiguousCategoricalEntropyModel<Probability, Vec<Probability>, PRECISION>

pub fn from_floating_point_probabilities<F>(
    probabilities: &[F]
) -> Result<Self, ()>
where
    F: FloatCore + Sum<F> + Into<f64>,
    Probability: Into<f64> + AsPrimitive<usize>,
    f64: AsPrimitive<Probability>,
    usize: AsPrimitive<Probability>,
Constructs a leaky distribution whose PMF approximates given probabilities.

The returned distribution will be defined for symbols of type usize from the range 0..probabilities.len().

The argument probabilities is a slice of floating point values (F is typically f64 or f32). All entries must be nonnegative and at least one entry has to be nonzero. The entries do not necessarily need to add up to one (the resulting distribution will automatically be normalized, and an overall scaling of all entries of probabilities does not affect the result, up to effects due to rounding errors).
The probability mass function of the returned distribution will approximate the provided probabilities as well as possible, subject to the following constraints:
- probabilities are represented in fixed point arithmetic, where the const generic parameter PRECISION controls the number of bits of precision; this typically introduces rounding errors;
- despite the possibility of rounding errors, the returned probability distribution will be exactly normalized; and
- each symbol in the support 0..probabilities.len() gets assigned a strictly nonzero probability, even if the provided probability for the symbol is zero or below the threshold that can be resolved in fixed point arithmetic with PRECISION bits. We refer to this property as the resulting distribution being "leaky". The leakiness guarantees that all symbols within the support can be encoded when this distribution is used as an entropy model.
More precisely, the resulting probability distribution minimizes the cross entropy from the provided (floating point) to the resulting (fixed point) probabilities subject to the above three constraints.
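A simplified sketch illustrates the three constraints. This is not constriction's actual algorithm (which performs true cross-entropy minimization); it merely scales and rounds, forces every entry to at least 1 (leakiness), and crudely repairs the rounding drift on the largest entry so the result is exactly normalized.

```rust
// Simplified, illustrative float-to-fixed-point conversion with leakiness.
// NOT constriction's algorithm; assumes `precision < 32`.
fn leaky_fixed_point(probabilities: &[f64], precision: u32) -> Result<Vec<u32>, ()> {
    let total: f64 = probabilities.iter().sum();
    if probabilities.len() < 2
        || probabilities.iter().any(|&p| p < 0.0)
        || !total.is_finite()
        || total <= 0.0
    {
        return Err(());
    }
    let budget = 1i64 << precision;
    // Round each normalized probability, but give every symbol at least 1 slot.
    let mut fixed: Vec<i64> = probabilities
        .iter()
        .map(|&p| (((p / total) * budget as f64).round() as i64).max(1))
        .collect();
    // Repair rounding drift so the entries sum to exactly `1 << precision` by
    // adjusting the largest entry (crude compared to cross-entropy minimization).
    let drift = budget - fixed.iter().sum::<i64>();
    let largest = (0..fixed.len()).max_by_key(|&i| fixed[i]).unwrap(); // len >= 2
    fixed[largest] += drift;
    if fixed[largest] < 1 {
        return Err(()); // cannot stay leaky and normalized
    }
    Ok(fixed.into_iter().map(|x| x as u32).collect())
}
```

For the probabilities `[0.3, 0.0, 0.4, 0.1, 0.2]` from the example above, the zero entry still receives the minimal probability `1`, and the result sums to exactly `1 << precision`.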
Error Handling
Returns an error if the provided probability distribution cannot be
normalized, either because probabilities
is of length zero, or because one
of its entries is negative with a nonzero magnitude, or because the sum of
its elements is zero, infinite, or NaN.
Also returns an error if the probability distribution is degenerate, i.e.,
if probabilities
has only a single element, because degenerate probability
distributions currently cannot be represented.
TODO: should also return an error if support is too large to support leaky distribution
pub fn from_nonzero_fixed_point_probabilities<I>(
    probabilities: I,
    infer_last_probability: bool
) -> Result<Self, ()>
Constructs a distribution with a PMF given in fixed point arithmetic.
This is a low level method that allows, e.g., reconstructing a probability distribution previously exported with symbol_table. The more common way to construct a LeakyCategorical distribution is via from_floating_point_probabilities.
The items of probabilities
have to be nonzero and smaller than 1 << PRECISION
,
where PRECISION
is a const generic parameter on the
ContiguousCategoricalEntropyModel
.
If infer_last_probability
is false
then the items yielded by probabilities
have to (logically) sum up to 1 << PRECISION
. If infer_last_probability
is
true
then they must sum up to a value strictly smaller than 1 << PRECISION
, and
the method will add an additional symbol at the end that takes the remaining
probability mass.
Examples
If infer_last_probability
is false
, the provided probabilities have to sum up to
1 << PRECISION
:
use constriction::stream::model::{
DefaultContiguousCategoricalEntropyModel, IterableEntropyModel
};
let probabilities = vec![1u32 << 21, 1 << 22, 1 << 22, 1 << 22, 1 << 21];
// `probabilities` sums up to `1 << PRECISION` as required:
assert_eq!(probabilities.iter().sum::<u32>(), 1 << 24);
let model = DefaultContiguousCategoricalEntropyModel
::from_nonzero_fixed_point_probabilities(&probabilities, false).unwrap();
let symbol_table = model.floating_point_symbol_table::<f64>().collect::<Vec<_>>();
assert_eq!(
symbol_table,
vec![
(0, 0.0, 0.125),
(1, 0.125, 0.25),
(2, 0.375, 0.25),
(3, 0.625, 0.25),
(4, 0.875, 0.125),
]
);
If PRECISION
is set to the maximum value supported by the type Probability
, then
the provided probabilities still have to logically sum up to 1 << PRECISION
(i.e., the summation has to wrap around exactly once):
use constriction::stream::model::{
ContiguousCategoricalEntropyModel, IterableEntropyModel
};
let probabilities = vec![1u32 << 29, 1 << 30, 1 << 30, 1 << 30, 1 << 29];
// `probabilities` sums up to `1 << 32` (logically), i.e., it wraps around once.
assert_eq!(probabilities.iter().fold(0u32, |accum, &x| accum.wrapping_add(x)), 0);
let model = ContiguousCategoricalEntropyModel::<u32, Vec<u32>, 32>
::from_nonzero_fixed_point_probabilities(&probabilities, false).unwrap();
let symbol_table = model.floating_point_symbol_table::<f64>().collect::<Vec<_>>();
assert_eq!(
symbol_table,
vec![
(0, 0.0, 0.125),
(1, 0.125, 0.25),
(2, 0.375, 0.25),
(3, 0.625, 0.25),
(4, 0.875, 0.125)
]
);
Wrapping around twice fails:
use constriction::stream::model::ContiguousCategoricalEntropyModel;
let probabilities = vec![1u32 << 30, 1 << 31, 1 << 31, 1 << 31, 1 << 30];
// `probabilities` sums up to `1 << 33` (logically), i.e., it would wrap around twice.
assert!(
ContiguousCategoricalEntropyModel::<u32, Vec<u32>, 32>
::from_nonzero_fixed_point_probabilities(&probabilities, false).is_err()
);
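The "logical sum" check for the full-width PRECISION case can be sketched as follows (an illustration, not the crate's implementation): accumulate in a wider integer, where "wraps around exactly once" in 32-bit arithmetic is equivalent to the wide total equaling `1 << 32` exactly.

```rust
// Returns `true` iff `probabilities` sums to exactly `1 << 32`, i.e., iff a
// `u32` accumulator would wrap around exactly once. Summing in `u64` makes
// "wrapped once" distinguishable from "wrapped twice" or "didn't wrap".
fn sums_to_one_wrapping_once(probabilities: &[u32]) -> bool {
    probabilities.iter().map(|&p| p as u64).sum::<u64>() == 1u64 << 32
}
```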
So does providing probabilities that just don’t sum up to 1 << PRECISION:
use constriction::stream::model::ContiguousCategoricalEntropyModel;
let probabilities = vec![1u32 << 21, 5 << 8, 1 << 22, 1 << 21];
// `probabilities` sums up to much less than `1 << 32`, even logically.
assert!(
ContiguousCategoricalEntropyModel::<u32, Vec<u32>, 32>
::from_nonzero_fixed_point_probabilities(&probabilities, false).is_err()
);
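The infer_last_probability = true path can be sketched with a hypothetical helper (not the crate's code; it assumes PRECISION < 32 so the budget fits in a u64 without wrapping): if the given entries sum to strictly less than 1 << PRECISION, the remainder becomes one extra trailing symbol.

```rust
// Appends one extra symbol carrying the remaining probability mass up to
// `1 << precision`. Fails if any entry is zero or if nothing remains.
fn with_inferred_last(probabilities: &[u32], precision: u32) -> Result<Vec<u32>, ()> {
    let budget = 1u64 << precision;
    let sum: u64 = probabilities.iter().map(|&p| p as u64).sum();
    if probabilities.iter().any(|&p| p == 0) || sum == 0 || sum >= budget {
        return Err(()); // entries must be nonzero and leave a positive remainder
    }
    let mut all = probabilities.to_vec();
    all.push((budget - sum) as u32);
    Ok(all)
}
```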
impl<Probability, Table, const PRECISION: usize> ContiguousCategoricalEntropyModel<Probability, Table, PRECISION>

pub fn support_size(&self) -> usize
Returns the number of symbols supported by the model.
The distribution is defined on the contiguous range of symbols from zero
(inclusively) to support_size()
(exclusively). All symbols within this range are
guaranteed to have a nonzero probability, while all symbols outside of this range
have a zero probability.
pub fn as_view(
    &self
) -> ContiguousCategoricalEntropyModel<Probability, &[Probability], PRECISION>
Makes a very cheap shallow copy of the model that can be used much like a shared reference.
The returned ContiguousCategoricalEntropyModel
implements Copy
, which is a
requirement for some methods, such as Encode::encode_iid_symbols
or
Decode::decode_iid_symbols
. These methods could also accept a shared reference
to a ContiguousCategoricalEntropyModel
(since all references to entropy models are
also entropy models, and all shared references implement Copy
), but passing a
view instead may be slightly more efficient because it avoids one level of
dereferencing.
pub fn to_lookup_decoder_model(
    &self
) -> LookupDecoderModel<Probability, Probability, ContiguousSymbolTable<Vec<Probability>>, Box<[Probability]>, PRECISION>
Creates a LookupDecoderModel for efficient decoding of i.i.d. data.
While a ContiguousCategoricalEntropyModel
can already be used for decoding (since
it implements DecoderModel
), you may prefer converting it to a
LookupDecoderModel
first for improved efficiency. Logically, the two will be
equivalent.
Warning
You should only call this method if both of the following conditions are satisfied:
- PRECISION is relatively small (typically PRECISION == 12, as in the “Small” preset) because the memory footprint of a LookupDecoderModel grows exponentially in PRECISION; and
- you’re about to decode a relatively large number of symbols with the resulting model; the conversion to a LookupDecoderModel bears a significant runtime and memory overhead, so if you’re going to use the resulting model only for a single or a handful of symbols then you’ll end up paying more than you gain.
Trait Implementations

impl<Probability: Clone, Table: Clone, const PRECISION: usize> Clone for ContiguousCategoricalEntropyModel<Probability, Table, PRECISION>

fn clone(&self) -> ContiguousCategoricalEntropyModel<Probability, Table, PRECISION>

fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.

impl<Probability: Debug, Table: Debug, const PRECISION: usize> Debug for ContiguousCategoricalEntropyModel<Probability, Table, PRECISION>
impl<Probability, Table, const PRECISION: usize> DecoderModel<PRECISION> for ContiguousCategoricalEntropyModel<Probability, Table, PRECISION>

fn quantile_function(
    &self,
    quantile: Self::Probability
) -> (usize, Probability, Probability::NonZero)
impl<Probability, Table, const PRECISION: usize> EncoderModel<PRECISION> for ContiguousCategoricalEntropyModel<Probability, Table, PRECISION>

EncoderModel is only implemented for contiguous generic categorical models. To encode symbols from a non-contiguous support, use a NonContiguousCategoricalEncoderModel.

fn left_cumulative_and_probability(
    &self,
    symbol: impl Borrow<usize>
) -> Option<(Probability, Probability::NonZero)>
fn floating_point_probability<F>(&self, symbol: Self::Symbol) -> F
impl<Probability, Table, const PRECISION: usize> EntropyModel<PRECISION> for ContiguousCategoricalEntropyModel<Probability, Table, PRECISION>
where
    Probability: BitArray,

impl<'m, Probability, Table, const PRECISION: usize> From<&'m ContiguousCategoricalEntropyModel<Probability, Table, PRECISION>> for LookupDecoderModel<Probability, Probability, ContiguousSymbolTable<Vec<Probability>>, Box<[Probability]>, PRECISION>
where
    Probability: BitArray + Into<usize>,
    usize: AsPrimitive<Probability>,
    Table: AsRef<[Probability]>,

fn from(
    model: &'m ContiguousCategoricalEntropyModel<Probability, Table, PRECISION>
) -> Self
impl<'m, Probability, Table, const PRECISION: usize> IterableEntropyModel<'m, PRECISION> for ContiguousCategoricalEntropyModel<Probability, Table, PRECISION>

type Iter = SymbolTableIter<usize, Probability, ContiguousSymbolTable<&'m [Probability]>>

fn symbol_table(&'m self) -> Self::Iter

fn floating_point_symbol_table<F>(
    &'m self
) -> FloatingPointSymbolTable<F, Self::Iter, PRECISION>
where
    F: From<Self::Probability>,
Similar to symbol_table, but yields both cumulatives and probabilities in floating point representation.