pub struct NonContiguousCategoricalEncoderModel<Symbol, Probability, const PRECISION: usize>{ /* private fields */ }
An entropy model for a categorical probability distribution over arbitrary symbols, for encoding only.
You will usually want to use this type through one of its type aliases, DefaultNonContiguousCategoricalEncoderModel or SmallNonContiguousCategoricalEncoderModel; see the discussion of presets.
This type implements the trait EncoderModel but not the trait DecoderModel. Thus, you can use a NonContiguousCategoricalEncoderModel for encoding with any of the stream encoders provided by the constriction crate, but not for decoding. If you want to decode data, use a NonContiguousCategoricalDecoderModel instead.
§Example
use constriction::{
stream::{stack::DefaultAnsCoder, Decode},
stream::model::DefaultNonContiguousCategoricalEncoderModel,
stream::model::DefaultNonContiguousCategoricalDecoderModel,
UnwrapInfallible,
};
// Create a `NonContiguousCategoricalEncoderModel` that approximates floating point probabilities.
let alphabet = ['M', 'i', 's', 'p', '!'];
let probabilities = [0.09, 0.36, 0.36, 0.18, 0.0];
let encoder_model = DefaultNonContiguousCategoricalEncoderModel
::from_symbols_and_floating_point_probabilities_fast(
alphabet.iter().cloned(),
&probabilities,
None
)
.unwrap();
assert_eq!(encoder_model.support_size(), 5); // `encoder_model` supports 5 symbols.
// Use `encoder_model` for entropy coding.
let message = "Mississippi!";
let mut ans_coder = DefaultAnsCoder::new();
ans_coder.encode_iid_symbols_reverse(message.chars(), &encoder_model).unwrap();
// Note that `message` contains the symbol '!', which has zero probability under our
// floating-point model. However, we can still encode the symbol because the
// `NonContiguousCategoricalEncoderModel` is "leaky", i.e., it assigns a nonzero
// probability to all symbols that we provided to the constructor.
// Create a matching `decoder_model`, decode the encoded message, and verify correctness.
let decoder_model = DefaultNonContiguousCategoricalDecoderModel
::from_symbols_and_floating_point_probabilities_fast(
&alphabet, &probabilities, None
)
.unwrap();
// We could pass `decoder_model` by reference (like we did for `encoder_model` above) but
// passing `decoder_model.as_view()` is slightly more efficient.
let decoded = ans_coder
.decode_iid_symbols(12, decoder_model.as_view())
.collect::<Result<String, _>>()
.unwrap_infallible();
assert_eq!(decoded, message);
assert!(ans_coder.is_empty());
// The `encoder_model` assigns zero probability to any symbols that were not provided to its
// constructor, so trying to encode a message that contains such a symbol will fail.
assert!(ans_coder.encode_iid_symbols_reverse("Mix".chars(), &encoder_model).is_err());
// ERROR: symbol 'x' is not in the support of `encoder_model`.
§When Should I Use This Type of Entropy Model?
Use a NonContiguousCategoricalEncoderModel for probabilistic models that can only be represented as an explicit probability table, and not by some more compact analytic expression (use a NonContiguousCategoricalDecoderModel for decoding with such models).
Consider the following alternatives:
- If you have a probability model that can be expressed by some analytical expression (e.g., a Binomial distribution), then use a LeakyQuantizer instead (unless you want to encode lots of symbols with the same entropy model, in which case the explicitly tabulated representation of a categorical entropy model could improve runtime performance).
- If the support of your probabilistic model (i.e., the set of symbols to which the model assigns a non-zero probability) is a contiguous range of integers starting at zero, then it is better to use a ContiguousCategoricalEntropyModel. It has better computational efficiency, and it is easier to use since it supports both encoding and decoding with a single type.
- If you want to encode only a few symbols with a given probability model, then use a LazyContiguousCategoricalEntropyModel, which will be faster (use a HashMap to first map from your noncontiguous support to indices in a contiguous range 0..N, where N is the size of your support). This use case occurs, e.g., in autoregressive models, where each individual model is often used for only exactly one symbol.
§Computational Efficiency
For a probability distribution with a support of N symbols, a NonContiguousCategoricalEncoderModel has the following asymptotic costs:
- creation:
  - runtime cost: Θ(N log(N)) (when creating with the ..._fast constructor);
  - memory footprint: Θ(N);
- encoding a symbol (calling EncoderModel::left_cumulative_and_probability):
  - expected runtime cost: Θ(1) (worst case can be more expensive; uses a HashMap under the hood);
  - memory footprint: no heap allocations, constant stack space;
- decoding a symbol: not supported; use a NonContiguousCategoricalDecoderModel.
Implementations§
impl<Symbol, Probability, const PRECISION: usize> NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
pub fn from_symbols_and_floating_point_probabilities_fast<F>(
symbols: impl IntoIterator<Item = Symbol>,
probabilities: &[F],
normalization: Option<F>,
) -> Result<Self, ()>where
F: FloatCore + Sum<F> + AsPrimitive<Probability>,
Probability: AsPrimitive<usize>,
usize: AsPrimitive<Probability> + AsPrimitive<F>,
Constructs a leaky distribution (for encoding) over the provided symbols whose PMF approximates the given probabilities.
Semantics are analogous to ContiguousCategoricalEntropyModel::from_floating_point_probabilities_fast, except that this constructor has an additional symbols argument to provide an iterator over the symbols in the alphabet (which has to yield exactly probabilities.len() symbols).
§See also
from_symbols_and_floating_point_probabilities_perfect, which can be considerably slower but typically approximates the provided probabilities very slightly better.
pub fn from_symbols_and_floating_point_probabilities_perfect<F>(
symbols: impl IntoIterator<Item = Symbol>,
probabilities: &[F],
) -> Result<Self, ()>where
F: FloatCore + Sum<F> + Into<f64>,
Probability: Into<f64> + AsPrimitive<usize>,
f64: AsPrimitive<Probability>,
usize: AsPrimitive<Probability>,
Slower variant of from_symbols_and_floating_point_probabilities_fast.
Similar to from_symbols_and_floating_point_probabilities_fast, but the resulting (fixed-point precision) model typically approximates the provided floating point probabilities very slightly better. Only recommended if compression performance is much more important to you than runtime, as this constructor can be significantly slower.
See ContiguousCategoricalEntropyModel::from_floating_point_probabilities_perfect for a detailed comparison between the ..._fast and ..._perfect constructors of categorical entropy models.
👎Deprecated since 0.4.0: Please use from_symbols_and_floating_point_probabilities_fast or from_symbols_and_floating_point_probabilities_perfect instead. See documentation for detailed upgrade instructions.
pub fn from_symbols_and_floating_point_probabilities<F>(
symbols: impl IntoIterator<Item = Symbol>,
probabilities: &[F],
) -> Result<Self, ()>where
F: FloatCore + Sum<F> + Into<f64>,
Probability: Into<f64> + AsPrimitive<usize>,
f64: AsPrimitive<Probability>,
usize: AsPrimitive<Probability>,
Deprecated constructor.
This constructor has been deprecated in constriction version 0.4.0, and it will be removed in constriction version 0.5.0.
§Upgrade Instructions
Most new use cases should call from_symbols_and_floating_point_probabilities_fast instead. Using that constructor (abbreviated as ..._fast in the following) may lead to very slightly larger bit rates, but it runs considerably faster.
However, note that the ..._fast constructor breaks binary compatibility with constriction version <= 0.3.5. If you need to be able to exchange binary compressed data with a program that uses a categorical entropy model from constriction version <= 0.3.5, then call from_symbols_and_floating_point_probabilities_perfect instead (..._perfect for short). Another reason for using the ..._perfect constructor could be if compression performance is much more important to you than runtime performance. See the documentation of from_symbols_and_floating_point_probabilities_perfect for more information.
§Compatibility Table
(In the following table, “encoding” refers to NonContiguousCategoricalEncoderModel and “decoding” refers to NonContiguousCategoricalDecoderModel.)

| ↓ constructor used for decoding \ constructor used for encoding → | legacy (this one) | ..._perfect | ..._fast |
|---|---|---|---|
| legacy | ✅ compatible | ✅ compatible | ❌ incompatible |
| ..._perfect | ✅ compatible | ✅ compatible | ❌ incompatible |
| ..._fast | ❌ incompatible | ❌ incompatible | ✅ compatible |
pub fn from_symbols_and_nonzero_fixed_point_probabilities<S, P>(
    symbols: S,
    probabilities: P,
    infer_last_probability: bool,
) -> Result<Self, ()>
Constructs a distribution with a PMF given in fixed point arithmetic.
This method operates logically identically to NonContiguousCategoricalDecoderModel::from_symbols_and_nonzero_fixed_point_probabilities, except that it constructs an EncoderModel rather than a DecoderModel.
pub fn from_iterable_entropy_model<'m, M>(model: &'m M) -> Self
where
    M: IterableEntropyModel<'m, PRECISION, Symbol = Symbol, Probability = Probability> + ?Sized,
Creates a NonContiguousCategoricalEncoderModel from any entropy model that implements IterableEntropyModel.
Calling NonContiguousCategoricalEncoderModel::from_iterable_entropy_model(&model) is equivalent to calling model.to_generic_encoder_model(), where the latter requires bringing IterableEntropyModel into scope.
pub fn support_size(&self) -> usize
Returns the number of symbols in the support of the model.
The support of the model is the set of all symbols that have nonzero probability.
pub fn entropy_base2<F>(&self) -> F
Returns the entropy in units of bits (i.e., base 2).
Similar to IterableEntropyModel::entropy_base2, except that:
- this type doesn’t implement IterableEntropyModel because it doesn’t store entries in a stable expected order;
- because the order in which entries are stored will generally be different on each program execution, rounding errors will be slightly different across multiple program executions.
Trait Implementations§
impl<Symbol, Probability, const PRECISION: usize> Clone for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
fn clone(&self) -> NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.