Struct NonContiguousCategoricalEncoderModel

pub struct NonContiguousCategoricalEncoderModel<Symbol, Probability, const PRECISION: usize>
where Symbol: Hash, Probability: BitArray,
{ /* private fields */ }

An entropy model for a categorical probability distribution over arbitrary symbols, for encoding only.

You will usually want to use this type through one of its type aliases, DefaultNonContiguousCategoricalEncoderModel or SmallNonContiguousCategoricalEncoderModel, see discussion of presets.

This type implements the trait EncoderModel but not the trait DecoderModel. Thus, you can use a NonContiguousCategoricalEncoderModel for encoding with any of the stream encoders provided by the constriction crate, but not for decoding. If you want to decode data, use a NonContiguousCategoricalDecoderModel instead.

§Example

use constriction::{
    stream::{stack::DefaultAnsCoder, Decode},
    stream::model::DefaultNonContiguousCategoricalEncoderModel,
    stream::model::DefaultNonContiguousCategoricalDecoderModel,
    UnwrapInfallible,
};

// Create a `NonContiguousCategoricalEncoderModel` that approximates floating point probabilities.
let alphabet = ['M', 'i', 's', 'p', '!'];
let probabilities = [0.09, 0.36, 0.36, 0.18, 0.0];
let encoder_model = DefaultNonContiguousCategoricalEncoderModel
    ::from_symbols_and_floating_point_probabilities_fast(
        alphabet.iter().cloned(),
        &probabilities,
        None
    )
    .unwrap();
assert_eq!(encoder_model.support_size(), 5); // `encoder_model` supports 5 symbols.

// Use `encoder_model` for entropy coding.
let message = "Mississippi!";
let mut ans_coder = DefaultAnsCoder::new();
ans_coder.encode_iid_symbols_reverse(message.chars(), &encoder_model).unwrap();
// Note that `message` contains the symbol '!', which has zero probability under our
// floating-point model. However, we can still encode the symbol because the
// `NonContiguousCategoricalEncoderModel` is "leaky", i.e., it assigns a nonzero
// probability to all symbols that we provided to the constructor.

// Create a matching `decoder_model`, decode the encoded message, and verify correctness.
let decoder_model = DefaultNonContiguousCategoricalDecoderModel
    ::from_symbols_and_floating_point_probabilities_fast(
        &alphabet, &probabilities, None
    )
    .unwrap();

// We could pass `decoder_model` by reference (like we did for `encoder_model` above) but
// passing `decoder_model.as_view()` is slightly more efficient.
let decoded = ans_coder
    .decode_iid_symbols(12, decoder_model.as_view())
    .collect::<Result<String, _>>()
    .unwrap_infallible();
assert_eq!(decoded, message);
assert!(ans_coder.is_empty());

// The `encoder_model` assigns zero probability to any symbols that were not provided to its
// constructor, so trying to encode a message that contains such a symbol will fail.
assert!(ans_coder.encode_iid_symbols_reverse("Mix".chars(), &encoder_model).is_err());
// ERROR: symbol 'x' is not in the support of `encoder_model`.

§When Should I Use This Type of Entropy Model?

Use a NonContiguousCategoricalEncoderModel (or, for decoding, a NonContiguousCategoricalDecoderModel) for probabilistic models that can only be represented as an explicit probability table, and not by some more compact analytic expression.

  • If you have a probability model that can be expressed by some analytical expression (e.g., a Binomial distribution), then use LeakyQuantizer instead (unless you want to encode lots of symbols with the same entropy model, in which case the explicitly tabulated representation of a categorical entropy model could improve runtime performance).
  • If the support of your probabilistic model (i.e., the set of symbols to which the model assigns a non-zero probability) is a contiguous range of integers starting at zero, then it is better to use a ContiguousCategoricalEntropyModel. It has better computational efficiency and it is easier to use since it supports both encoding and decoding with a single type.
  • If you want to encode only a few symbols with a given probability model, then use a LazyContiguousCategoricalEntropyModel, which will be faster (you can use a HashMap to first map from your noncontiguous support to indices in a contiguous range 0..N, where N is the size of your support). This use case occurs, e.g., in autoregressive models, where each individual model is often used for exactly one symbol.

§Computational Efficiency

For a probability distribution with a support of N symbols, a NonContiguousCategoricalEncoderModel has the following asymptotic costs:

  • creation: Θ(N) when constructed from fixed-point probabilities, and Θ(N log N) when constructed from floating-point probabilities;
  • encoding a symbol (i.e., looking up its left-sided cumulative and probability): expected Θ(1), since the lookup is a single hash-map access;
  • memory footprint: Θ(N).

Implementations§

impl<Symbol, Probability, const PRECISION: usize> NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
where Symbol: Hash + Eq, Probability: BitArray,

pub fn from_symbols_and_floating_point_probabilities_fast<F>( symbols: impl IntoIterator<Item = Symbol>, probabilities: &[F], normalization: Option<F>, ) -> Result<Self, ()>
where F: FloatCore + Sum<F> + AsPrimitive<Probability>, Probability: AsPrimitive<usize>, usize: AsPrimitive<Probability> + AsPrimitive<F>,

Constructs a leaky distribution (for encoding) over the provided symbols whose PMF approximates the given probabilities.

Semantics are analogous to ContiguousCategoricalEntropyModel::from_floating_point_probabilities_fast, except that this constructor has an additional symbols argument to provide an iterator over the symbols in the alphabet (which has to yield exactly probabilities.len() symbols).

pub fn from_symbols_and_floating_point_probabilities_perfect<F>( symbols: impl IntoIterator<Item = Symbol>, probabilities: &[F], ) -> Result<Self, ()>
where F: FloatCore + Sum<F> + Into<f64>, Probability: Into<f64> + AsPrimitive<usize>, f64: AsPrimitive<Probability>, usize: AsPrimitive<Probability>,

Slower variant of from_symbols_and_floating_point_probabilities_fast.

Similar to from_symbols_and_floating_point_probabilities_fast, but the resulting (fixed-point precision) model typically approximates the provided floating point probabilities very slightly better. Only recommended if compression performance is much more important to you than runtime performance, as this constructor can be significantly slower.

See ContiguousCategoricalEntropyModel::from_floating_point_probabilities_perfect for a detailed comparison between ..._fast and ..._perfect constructors of categorical entropy models.

pub fn from_symbols_and_floating_point_probabilities<F>( symbols: impl IntoIterator<Item = Symbol>, probabilities: &[F], ) -> Result<Self, ()>
where F: FloatCore + Sum<F> + Into<f64>, Probability: Into<f64> + AsPrimitive<usize>, f64: AsPrimitive<Probability>, usize: AsPrimitive<Probability>,

👎Deprecated since 0.4.0: Please use from_symbols_and_floating_point_probabilities_fast or from_symbols_and_floating_point_probabilities_perfect instead. See documentation for detailed upgrade instructions.

Deprecated constructor.

This constructor has been deprecated in constriction version 0.4.0, and it will be removed in constriction version 0.5.0.

§Upgrade Instructions

Most new use cases should call from_symbols_and_floating_point_probabilities_fast instead. Using that constructor (abbreviated as ..._fast in the following) may lead to very slightly larger bit rates, but it runs considerably faster.

However, note that the ..._fast constructor breaks binary compatibility with constriction version <= 0.3.5. If you need to be able to exchange binary compressed data with a program that uses a categorical entropy model from constriction version <= 0.3.5, then call from_symbols_and_floating_point_probabilities_perfect instead (..._perfect for short). Another reason for using the ..._perfect constructor could be if compression performance is much more important to you than runtime performance. See documentation of from_symbols_and_floating_point_probabilities_perfect for more information.

§Compatibility Table

(In the following table, “encoding” refers to NonContiguousCategoricalEncoderModel and “decoding” refers to NonContiguousCategoricalDecoderModel.)

| ↓ constructor used for decoding \ constructor used for encoding → | legacy (this one) | ..._perfect | ..._fast |
|---|---|---|---|
| legacy | ✅ compatible | ✅ compatible | ❌ incompatible |
| ..._perfect | ✅ compatible | ✅ compatible | ❌ incompatible |
| ..._fast | ❌ incompatible | ❌ incompatible | ✅ compatible |
pub fn from_symbols_and_nonzero_fixed_point_probabilities<S, P>( symbols: S, probabilities: P, infer_last_probability: bool, ) -> Result<Self, ()>
where S: IntoIterator<Item = Symbol>, P: IntoIterator, P::Item: Borrow<Probability>,

Constructs a distribution with a PMF given in fixed point arithmetic.

This method operates logically identically to NonContiguousCategoricalDecoderModel::from_symbols_and_nonzero_fixed_point_probabilities except that it constructs an EncoderModel rather than a DecoderModel.

pub fn from_iterable_entropy_model<'m, M>(model: &'m M) -> Self
where M: IterableEntropyModel<'m, PRECISION, Symbol = Symbol, Probability = Probability> + ?Sized,

Creates a NonContiguousCategoricalEncoderModel from any entropy model that implements IterableEntropyModel.

Calling NonContiguousCategoricalEncoderModel::from_iterable_entropy_model(&model) is equivalent to calling model.to_generic_encoder_model(), where the latter requires bringing IterableEntropyModel into scope.

pub fn support_size(&self) -> usize

Returns the number of symbols in the support of the model.

The support of the model is the set of all symbols that have nonzero probability.

pub fn entropy_base2<F>(&self) -> F
where F: Float + Sum, Probability: Into<F>,

Returns the entropy in units of bits (i.e., base 2).

Similar to IterableEntropyModel::entropy_base2, except that

  • this type doesn’t implement IterableEntropyModel because it doesn’t store entries in a stable expected order;
  • because the order in which entries are stored will generally be different on each program execution, rounding errors will be slightly different across multiple program executions.

Trait Implementations§

impl<Symbol, Probability, const PRECISION: usize> Clone for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
where Symbol: Hash + Clone, Probability: BitArray + Clone, Probability::NonZero: Clone,

fn clone( &self, ) -> NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<Symbol, Probability, const PRECISION: usize> Debug for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
where Symbol: Hash + Debug, Probability: BitArray + Debug, Probability::NonZero: Debug,

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<Symbol, Probability, const PRECISION: usize> EncoderModel<PRECISION> for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
where Symbol: Hash + Eq, Probability: BitArray,

fn left_cumulative_and_probability( &self, symbol: impl Borrow<Self::Symbol>, ) -> Option<(Self::Probability, Probability::NonZero)>

Looks up a symbol in the entropy model.

fn floating_point_probability<F>(&self, symbol: Self::Symbol) -> F
where F: FloatCore, Self::Probability: Into<F>,

Returns the probability of the given symbol in floating point representation.

impl<Symbol, Probability, const PRECISION: usize> EntropyModel<PRECISION> for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
where Symbol: Hash, Probability: BitArray,

type Probability = Probability

The type used to represent probabilities, cumulatives, and quantiles.

type Symbol = Symbol

The type of data over which the entropy model is defined.

impl<'m, Symbol, Probability, M, const PRECISION: usize> From<&'m M> for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
where Symbol: Hash + Eq, Probability: BitArray, M: IterableEntropyModel<'m, PRECISION, Symbol = Symbol, Probability = Probability> + ?Sized,

fn from(model: &'m M) -> Self

Converts to this type from the input type.

Auto Trait Implementations§

impl<Symbol, Probability, const PRECISION: usize> Freeze for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>

impl<Symbol, Probability, const PRECISION: usize> RefUnwindSafe for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
where Symbol: RefUnwindSafe, Probability: RefUnwindSafe, <Probability as BitArray>::NonZero: RefUnwindSafe,

impl<Symbol, Probability, const PRECISION: usize> Send for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
where Symbol: Send, Probability: Send, <Probability as BitArray>::NonZero: Send,

impl<Symbol, Probability, const PRECISION: usize> Sync for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
where Symbol: Sync, Probability: Sync, <Probability as BitArray>::NonZero: Sync,

impl<Symbol, Probability, const PRECISION: usize> Unpin for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
where Symbol: Unpin, Probability: Unpin, <Probability as BitArray>::NonZero: Unpin,

impl<Symbol, Probability, const PRECISION: usize> UnwindSafe for NonContiguousCategoricalEncoderModel<Symbol, Probability, PRECISION>
where Symbol: UnwindSafe, Probability: UnwindSafe, <Probability as BitArray>::NonZero: UnwindSafe,

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.
fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.