pub struct NonContiguousCategoricalEncoderModel<Symbol, Probability, const PRECISION: usize> { /* private fields */ }
An entropy model for a categorical probability distribution over arbitrary symbols, for encoding only.
You will usually want to use this type through one of its type aliases, DefaultNonContiguousCategoricalEncoderModel or SmallNonContiguousCategoricalEncoderModel; see the discussion of presets.
This type implements the trait EncoderModel but not the trait DecoderModel. Thus, you can use a NonContiguousCategoricalEncoderModel for encoding with any of the stream encoders provided by the constriction crate, but not for decoding. If you want to decode data, use a NonContiguousCategoricalDecoderModel instead.
Example
use constriction::{
stream::{stack::DefaultAnsCoder, Decode},
stream::model::DefaultNonContiguousCategoricalEncoderModel,
stream::model::DefaultNonContiguousCategoricalDecoderModel,
UnwrapInfallible,
};
// Create a `NonContiguousCategoricalEncoderModel` that approximates floating point probabilities.
let alphabet = ['M', 'i', 's', 'p', '!'];
let probabilities = [0.09, 0.36, 0.36, 0.18, 0.0];
let encoder_model = DefaultNonContiguousCategoricalEncoderModel
::from_symbols_and_floating_point_probabilities(alphabet.iter().cloned(), &probabilities)
.unwrap();
assert_eq!(encoder_model.support_size(), 5); // `encoder_model` supports 5 symbols.
// Use `encoder_model` for entropy coding.
let message = "Mississippi!";
let mut ans_coder = DefaultAnsCoder::new();
ans_coder.encode_iid_symbols_reverse(message.chars(), &encoder_model).unwrap();
// Note that `message` contains the symbol '!', which has zero probability under our
// floating-point model. However, we can still encode the symbol because the
// `NonContiguousCategoricalEncoderModel` is "leaky", i.e., it assigns a nonzero
// probability to all symbols that we provided to the constructor.
// Create a matching `decoder_model`, decode the encoded message, and verify correctness.
let decoder_model = DefaultNonContiguousCategoricalDecoderModel
::from_symbols_and_floating_point_probabilities(&alphabet, &probabilities)
.unwrap();
// We could pass `decoder_model` by reference (like we did for `encoder_model` above) but
// passing `decoder_model.as_view()` is slightly more efficient.
let decoded = ans_coder
.decode_iid_symbols(12, decoder_model.as_view())
.collect::<Result<String, _>>()
.unwrap_infallible();
assert_eq!(decoded, message);
assert!(ans_coder.is_empty());
// The `encoder_model` assigns zero probability to any symbols that were not provided to its
// constructor, so trying to encode a message that contains such a symbol will fail.
assert!(ans_coder.encode_iid_symbols_reverse("Mix".chars(), &encoder_model).is_err());
// ERROR: symbol 'x' is not in the support of `encoder_model`.
When Should I Use This Type of Entropy Model?
Use a NonContiguousCategoricalEncoderModel for probabilistic models that can only be represented as an explicit probability table, and not by some more compact analytic expression. If you have a probability model that can be expressed by some analytical expression (e.g., a Binomial distribution), then use LeakyQuantizer instead (unless you want to encode lots of symbols with the same entropy model, in which case the explicitly tabulated representation of a categorical entropy model could improve runtime performance).
Further, if the support of your probabilistic model (i.e., the set of symbols to which the model assigns a nonzero probability) is a contiguous range of integers starting at zero, then it is better to use a ContiguousCategoricalEntropyModel. It has better computational efficiency and it is easier to use since it supports both encoding and decoding with a single type; a brief sketch of this contiguous case follows below.
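For contrast, here is a minimal sketch of the contiguous case. It assumes the DefaultContiguousCategoricalEntropyModel type alias and a from_floating_point_probabilities constructor analogous to the one used in the example above; the symbols are implicitly the indices 0..N, and a single model value serves both encoding and decoding.

use constriction::{
    stream::{stack::DefaultAnsCoder, Decode},
    stream::model::DefaultContiguousCategoricalEntropyModel,
    UnwrapInfallible,
};

// Symbols are implicitly the indices 0..=3; no symbol table is needed.
let model = DefaultContiguousCategoricalEntropyModel
    ::from_floating_point_probabilities(&[0.1, 0.4, 0.3, 0.2])
    .unwrap();

// The same model value handles both encoding and decoding.
let message = [2usize, 0, 3, 1, 1];
let mut ans_coder = DefaultAnsCoder::new();
ans_coder.encode_iid_symbols_reverse(message.iter().cloned(), &model).unwrap();
let decoded = ans_coder
    .decode_iid_symbols(5, &model)
    .collect::<Result<Vec<_>, _>>()
    .unwrap_infallible();
assert_eq!(decoded, message);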
Computational Efficiency
For a probability distribution with a support of N symbols, a NonContiguousCategoricalEncoderModel has the following asymptotic costs:

- creation:
  - runtime cost: Θ(N) when creating from fixed point probabilities, Θ(N log(N)) when creating from floating point probabilities;
  - memory footprint: Θ(N);
  - both are more expensive by a constant factor than for a ContiguousCategoricalEntropyModel.
- encoding a symbol (calling EncoderModel::left_cumulative_and_probability):
  - expected runtime cost: Θ(1) (the worst case can be more expensive because the model uses a HashMap under the hood);
  - memory footprint: no heap allocations, constant stack space.
- decoding a symbol: not supported; use a NonContiguousCategoricalDecoderModel.
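The expected Θ(1) encoding cost comes from hashing: conceptually, the encoder side only needs a map from each symbol to its fixed-point interval (left-sided cumulative and probability). The following toy sketch is illustrative only (it is not constriction's actual internal representation); it shows why lookups are cheap and also why decoding is not supported: a hash map cannot efficiently be searched by cumulative probability.

use std::collections::HashMap;

// Toy illustration only: maps each symbol to `(left_cumulative, probability)`
// in fixed-point representation, like an encoder-side lookup table.
struct ToyEncoderModel {
    table: HashMap<char, (u32, u32)>,
}

impl ToyEncoderModel {
    // Expected O(1): a single hash-map lookup. Returns `None` for symbols
    // outside the support, which is why encoding unknown symbols fails.
    fn left_cumulative_and_probability(&self, symbol: char) -> Option<(u32, u32)> {
        self.table.get(&symbol).copied()
    }
}

let table = HashMap::from([('a', (0u32, 6u32)), ('x', (6, 10))]);
let model = ToyEncoderModel { table };
assert_eq!(model.left_cumulative_and_probability('x'), Some((6, 10)));
assert_eq!(model.left_cumulative_and_probability('?'), None);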
Implementations
impl<Symbol, Probability, const PRECISION: usize> NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
pub fn from_symbols_and_floating_point_probabilities<F>(
    symbols: impl IntoIterator<Item = Symbol>,
    probabilities: &[F]
) -> Result<Self, ()>
where
    F: FloatCore + Sum<F> + Into<f64>,
    Probability: Into<f64> + AsPrimitive<usize>,
    f64: AsPrimitive<Probability>,
    usize: AsPrimitive<Probability>,
Constructs a leaky distribution over the provided symbols whose PMF approximates the given probabilities.
This method operates logically identically to NonContiguousCategoricalDecoderModel::from_symbols_and_floating_point_probabilities, except that it constructs an EncoderModel rather than a DecoderModel.
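A quick sketch of the boundary behavior. This assumes, based on the documented behavior of the decoder-side counterpart, that mismatched lengths are rejected, while explicit zero probabilities are allowed because the constructed model is leaky:

use constriction::stream::model::DefaultNonContiguousCategoricalEncoderModel;

let symbols = ['a', 'b', 'c'];

// Mismatched lengths (3 symbols but 2 probabilities) are rejected.
assert!(DefaultNonContiguousCategoricalEncoderModel
    ::from_symbols_and_floating_point_probabilities(symbols.iter().cloned(), &[0.5, 0.5])
    .is_err());

// Matching lengths succeed, even with a zero entry (the model is leaky).
assert!(DefaultNonContiguousCategoricalEncoderModel
    ::from_symbols_and_floating_point_probabilities(symbols.iter().cloned(), &[0.4, 0.6, 0.0])
    .is_ok());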
pub fn from_symbols_and_nonzero_fixed_point_probabilities<S, P>(
    symbols: S,
    probabilities: P,
    infer_last_probability: bool
) -> Result<Self, ()>
Constructs a distribution with a PMF given in fixed point arithmetic.
This method operates logically identically to NonContiguousCategoricalDecoderModel::from_symbols_and_nonzero_fixed_point_probabilities, except that it constructs an EncoderModel rather than a DecoderModel.
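A minimal sketch with the default preset, assuming (as documented for the decoder-side counterpart) that with PRECISION = 24 and Probability = u32 the explicitly given weights must sum to 1 << 24:

use constriction::stream::model::DefaultNonContiguousCategoricalEncoderModel;

let symbols = ['a', 'b', 'x', 'y'];
// Fixed-point weights; with the default preset (PRECISION = 24), the
// weights sum to 1 << 24 (here: 2^21 + 2^23 + 2^22 + 2^21 = 2^24).
let probabilities = [1u32 << 21, 1 << 23, 1 << 22, 1 << 21];
let model = DefaultNonContiguousCategoricalEncoderModel
    ::from_symbols_and_nonzero_fixed_point_probabilities(
        symbols.iter().cloned(),
        probabilities.iter().cloned(),
        false, // all weights are given explicitly; don't infer the last one
    )
    .unwrap();
assert_eq!(model.support_size(), 4);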
pub fn from_iterable_entropy_model<'m, M>(model: &'m M) -> Self
where
    M: IterableEntropyModel<'m, PRECISION, Symbol = Symbol, Probability = Probability> + ?Sized,
Creates a NonContiguousCategoricalEncoderModel from any entropy model that implements IterableEntropyModel.

Calling NonContiguousCategoricalEncoderModel::from_iterable_entropy_model(&model) is equivalent to calling model.to_generic_encoder_model(), where the latter requires bringing IterableEntropyModel into scope.
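For illustration, a minimal sketch assuming the Default presets and the ContiguousCategoricalEntropyModel constructor used in the sketch further above:

use constriction::stream::model::{
    DefaultContiguousCategoricalEntropyModel, IterableEntropyModel,
};

let contiguous = DefaultContiguousCategoricalEntropyModel
    ::from_floating_point_probabilities(&[0.1, 0.4, 0.5])
    .unwrap();

// Requires `IterableEntropyModel` to be in scope; yields an encoder-only
// model over the same symbols (here: the `usize` indices 0, 1, and 2).
let encoder_model = contiguous.to_generic_encoder_model();
assert_eq!(encoder_model.support_size(), 3);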
pub fn support_size(&self) -> usize
Returns the number of symbols in the support of the model.
The support of the model is the set of all symbols that have nonzero probability.
pub fn entropy_base2<F>(&self) -> F
Returns the entropy in units of bits (i.e., base 2).
Similar to IterableEntropyModel::entropy_base2, except that:

- this type doesn't implement IterableEntropyModel because it doesn't store entries in a stable expected order;
- because the order in which entries are stored will generally be different on each program execution, rounding errors will be slightly different across multiple program executions.
Trait Implementations
impl<Symbol, Probability, const PRECISION: usize> Clone for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
fn clone(&self) -> NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
1.0.0 · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source. Read more