Module constriction::stream
Stream Codes (entropy codes that amortize over several symbols)
Module Overview
This module provides implementations of stream codes and utilities for defining entropy
models for these stream codes. This module is the heart of the constriction
crate.
Provided Implementations of Stream Codes
Currently, the following stream codes are provided (see below for a comparison):
- Asymmetric Numeral Systems (ANS): a highly efficient modern stream code with near-optimal compression effectiveness; ANS Coding operates as a stack ("last in first out") and is surjective, which enables advanced use cases like bits-back coding in hierarchical entropy models. See submodule stack.
- Range Coding: a modern variant of Arithmetic Coding; it has essentially the same compression effectiveness as ANS Coding but operates as a queue ("first in first out"), which makes it preferable for autoregressive entropy models but less useful for hierarchical models (see comparison). See submodule queue.
- Chain Coding: an experimental new entropy coder that combines the (net) effectiveness of stream codes with the locality of symbol codes; it is meant for experimental new compression approaches that perform joint inference, quantization, and bits-back coding in an end-to-end optimization. See submodule chain.
All of these stream codes are provided through types that implement the Encode and Decode traits defined in this module.
Provided Utilities for Entropy Models
To encode or decode a sequence of symbols with one of the above stream codes, you have to specify an EntropyModel for each symbol. The submodule model provides utilities for defining EntropyModels.
Examples
See submodules stack and queue.
Which Stream Code Should I Use?
The list below shows a detailed comparison of the two major provided stream codes, ANS Coding and Range Coding (most users won't want to use the Chain Coder).
TL;DR: Don’t overthink it. For many practical use cases, both ANS and Range Coding will probably do the job, and if you end up painting yourself into a corner then you can still switch between the methods later relatively easily since most operations are implemented in trait methods. If you want to use the bits-back trick or similar advanced techniques, then definitely use the ANS Coder because the provided Range Coder isn’t surjective (see below). If you have an autoregressive entropy model then you might prefer Range Coding. In any other case, a possibly biased recommendation from the author of this paragraph is to use ANS Coding by default for its simplicity and decoding speed.
Here's the more detailed comparison between ANS Coding and Range Coding:
- Read/write semantics: The main practical difference between ANS and Range Coding is that ANS Coding operates as a stack ("last in first out") whereas Range Coding operates as a queue ("first in first out"). Whether a stack or a queue is better for you depends on your entropy model:
  - Autoregressive entropy models: point for Range Coding. If you use an ANS coder to encode a sequence of symbols then it is advisable to use the method AnsCoder::encode_symbols_reverse. As the name suggests, this method encodes the symbols onto the stack in reverse order so that, when you decode them at a later time, you'll recover them in their original order. However, iterating over symbols and, in particular, over their associated entropy models in reverse order may be inconvenient for autoregressive models with long-term memory. Using an ANS coder with such a model means that you may end up having to first build up the full autoregressive model in memory front-to-back and only then iterate over it back-to-front for encoding (the decoder can still iterate as normal, front-to-back). Range Coding makes this easier since both encoding and decoding can iterate in the natural front-to-back direction.
  - Hierarchical entropy models: point for ANS Coding. Sometimes a stack (as in ANS Coding) is just what you need because having only a single "open end" at the top of the stack makes things easier. By contrast, a queue (as in Range Coding) has two separate open ends for reading and writing, respectively. This leads to a number of corner cases when the two ends come close to each other (which can happen either within a single word or on two neighboring words). To avoid the computational overhead of checking for these corner cases on every encoded or decoded symbol, the provided Range Coder strictly distinguishes between a RangeEncoder and a RangeDecoder type. Switching from a RangeEncoder to a RangeDecoder requires "sealing" the write end and "opening up" the read end, both of which are irreversible operations, i.e., you can't switch back from a RangeDecoder with partially consumed data to a RangeEncoder to continue writing to it. By contrast, the provided ANS Coder has just a single AnsCoder type that supports arbitrary sequences of (possibly interleaved) read and write operations. Being able to interleave reads and writes (i.e., decoding and encoding operations) is important for bits-back coding, a useful method for hierarchical entropy models.
- Surjectivity: point for ANS Coding. This is another property that is important for bits-back coding, and that only ANS Coding satisfies: encoding data with an ANS Coder and with a given sequence of entropy models can lead to any arbitrary sequence of words of compressed data. This means that decoding with an ANS Coder is invertible, even if the binary data that you decode is arbitrary. You can load any given binary data into an ANS Coder (with AnsCoder::from_binary) and then decode some part of the data into a sequence of symbols. Regardless of the source of the binary data, re-encoding the symbols will restore it. By contrast, Range Coding is not surjective, and decoding with the provided Range Coder is not always invertible. Due to rounding, Range Coding has some (small) pockets of unrealizable bit strings for any given sequence of entropy models. While the Range Decoder in constriction is currently lenient in this regard (it still outputs valid symbols when decoding an unrealizable bit string), re-encoding the symbols you obtain from decoding an unrealizable bit string will not restore the original (unrealizable) bit string. This means that the provided Range Coder cannot be used for bits-back coding in a reliable way.
- Compression effectiveness: virtually no difference unless misused. With the provided presets, the bitrates of ANS Coding and Range Coding are both very close to the theoretical bound for lossless compression. The lack of surjectivity in Range Coding leads to a small overhead, but it seems to be roughly countered by a slightly larger rounding overhead in ANS Coding. Keep in mind, however, that deviating from the presets can considerably degrade compression effectiveness if not done carefully. It is strongly recommended to always start with the DefaultXxx type aliases (for either ANS or Range Coding), which use well-tested presets, before experimenting with deviations from these defaults.
- Computational efficiency: point for ANS Coding. ANS Coding is a simpler algorithm than Range Coding, especially the decoding part. This manifests itself in fewer branches and a smaller internal coder state. Empirically, our decoding benchmarks in the file benches/lookup.rs run more than twice as fast with an AnsCoder than with a RangeDecoder. However, please note that (i) these benchmarks use the highly optimized lookup models; if you use other entropy models then these will likely be the computational bottleneck, not the coder; (ii) future versions of constriction may introduce further run-time optimizations; and (iii) while decoding is more than two times faster with ANS, encoding is somewhat (~10 %) faster with Range Coding (this might be because encoding with ANS, unlike decoding, involves an integer division, which is a surprisingly slow operation on most hardware).
- Random access decoding: minor point for ANS Coding. Both ANS and Range Coding support decoding with random access via the Seek trait, but it comes at different costs. In order to jump ("seek") to a given location in a slice of compressed data, you have to provide an index for a position in the slice and an internal coder state (you can get both during encoding via Pos::pos). While the type of the internal coder state can be controlled through type parameters, its minimal (and also default) size is only two words for ANS Coding but four words for Range Coding. Thus, if you're developing a container file format that contains a jump table for random access decoding, then choosing ANS Coding over Range Coding will allow you to reduce the memory footprint of the jump table. Whether or not this is significant depends on how fine-grained your jump table is.
- Serialization: minor point for ANS Coding. The state of an ANS Coder is entirely described by the compressed bit string. This means that you can interrupt encoding and/or decoding with an ANS Coder at any time, serialize the ANS Coder to a bit string, recover it at a later time by deserialization, and continue encoding and/or decoding. The serialized form of the ANS Coder is simply the string of compressed bits that the Coder has accumulated so far. By contrast, a Range Encoder has some additional internal state that is not needed for decoding and is therefore not part of the compressed bit string, but that you would need if you wanted to continue appending more symbols to a serialized and deserialized Range Encoder.
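The stack-vs-queue distinction above can be illustrated without constriction itself. The following std-only sketch (using Vec as a stack and VecDeque as a queue; it is an analogy, not the crate's API) shows why an ANS coder encodes in reverse while a Range coder can iterate front-to-back on both ends:

```rust
use std::collections::VecDeque;

fn main() {
    let symbols = ["a", "b", "c"];

    // Stack (like ANS): reads pop from the same end that writes push to,
    // so reading naively returns the symbols in reverse order...
    let mut stack: Vec<&str> = symbols.to_vec();
    let popped: Vec<_> = std::iter::from_fn(|| stack.pop()).collect();
    assert_eq!(popped, ["c", "b", "a"]);

    // ...which is why an ANS coder encodes in *reverse* (cf.
    // `AnsCoder::encode_symbols_reverse`), so that decoding later
    // yields the symbols in their original order:
    let mut stack: Vec<&str> = symbols.iter().rev().copied().collect();
    let decoded: Vec<_> = std::iter::from_fn(|| stack.pop()).collect();
    assert_eq!(decoded, symbols);

    // Queue (like Range Coding): writes push to the back and reads pop
    // from the front, so both directions iterate front-to-back.
    let mut queue: VecDeque<&str> = symbols.iter().copied().collect();
    let read: Vec<_> = std::iter::from_fn(|| queue.pop_front()).collect();
    assert_eq!(read, symbols);
}
```

The analogy also hints at the two "open ends" of a queue mentioned above: VecDeque maintains separate front and back positions, whereas Vec has only one.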
What’s a Stream Code?
A stream code is an entropy coding algorithm that encodes a sequence of symbols into a compressed bit string, one symbol at a time, in a way that amortizes compressed bits over several symbols. This amortization allows stream codes to reach near-optimal compression performance without giving up computational efficiency. In our experiments, both the provided ANS Coder and the Range Coder implementations showed about 0.1 % relative overhead over the theoretical bound on the bitrate (using “default” presets).
The near-optimal compression performance of stream codes is to be seen in contrast to
symbol codes (see module symbol
), such as the well-known Huffman
code. Symbol codes do not amortize over symbols.
Instead, they map each symbol to a fixed sequence of bits of integer length (a
“codeword”). This leads to a typical overhead of 0.5 bits per symbol in the best
case, and to an overhead of almost 1 bit per symbol for entropy models with very
low (≪ 1 bit of) entropy per symbol, which is common for deep learning based
entropy models. Stream codes do not suffer from this overhead.
The computational efficiency of stream codes is to be seen in contrast to block codes. Block codes are symbol codes that operate on blocks of several consecutive symbols at once, which reduces the relative impact of the constant overhead of ~0.5 bits per block compared to regular symbol codes. Block codes typically require some method of constructing the code books dynamically (e.g., Deflate) to avoid the computational cost from growing exponentially in the block size. By contrast, stream codes amortize over long sequences of symbols without ever constructing an explicit code book for the entire sequence of symbols. This keeps their computational cost linear in the number of symbols.
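The overhead figures above can be checked with a back-of-the-envelope calculation. The sketch below (illustrative numbers only, not a benchmark of the crate) computes the binary entropy of a low-entropy symbol and compares it to the one-bit-per-symbol floor of a symbol code and to a stream code with the ~0.1 % relative overhead quoted above:

```rust
// Per-symbol bitrates for a binary symbol with P(1) = 0.01, a
// low-entropy model as is common in deep-learning-based compression.
fn binary_entropy(p: f64) -> f64 {
    -(p * p.log2() + (1.0 - p) * (1.0 - p).log2())
}

fn main() {
    // Theoretical bound: ~0.0808 bits per symbol.
    let h = binary_entropy(0.01);
    assert!((h - 0.0808).abs() < 1e-3);

    // A symbol code must spend at least one whole bit per symbol,
    // i.e., more than 12x the theoretical bound here.
    let symbol_code_ratio = 1.0 / h;
    assert!(symbol_code_ratio > 12.0);

    // A stream code with ~0.1 % relative overhead stays near the bound.
    let stream_code_bits = h * 1.001;
    assert!(stream_code_bits < 0.081);

    println!("entropy = {h:.4} bits/symbol");
}
```

This is the "≪ 1 bit of entropy per symbol" regime mentioned below, where the fixed integer-length codewords of symbol codes hurt the most.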
Highly Customizable Implementations With Sane Presets
Users who need precise control over the trade-off between compression effectiveness, runtime performance, and memory consumption can fine tune the provided implementations of stream codes and entropy models at compile time through type parameters discussed below. For most users, however, we recommend using one of the following presets.
Presets
As finding optimal settings can be tricky, we provide the following type aliases for useful presets:
- "Default" presets are the recommended starting point for most users, and they are also used by constriction's Python API. The "default" presets provide very near-optimal compression effectiveness for most conceivable applications and high runtime performance on typical (64 bit) desktop computers. However, the "default" presets are not recommended for a LookupDecoderModel as their high numerical precision would lead to enormous lookup tables (~67 MB), which would take considerable time to build and would likely lead to extremely poor caching.
  - entropy coders with "default" presets: DefaultAnsCoder, DefaultRangeEncoder, DefaultRangeDecoder, and DefaultChainCoder;
  - entropy models with "default" presets: DefaultLeakyQuantizer, DefaultContiguousCategoricalEntropyModel, DefaultNonContiguousCategoricalEncoderModel, and DefaultNonContiguousCategoricalDecoderModel.
- "Small" presets may be considered when optimizing a final product for runtime efficiency and memory consumption. The "small" presets use a lower numerical precision and a smaller state and word size than the "default" presets. The lower numerical precision makes it possible to use the highly runtime-efficient LookupDecoderModel, the smaller state size reduces the memory overhead of jump tables for random access, and the smaller word size may be advantageous on some embedded devices.
  - entropy coders with "small" presets: SmallAnsCoder, SmallRangeEncoder, SmallRangeDecoder, and SmallChainCoder;
  - entropy models with "small" presets: SmallContiguousLookupDecoderModel, SmallNonContiguousLookupDecoderModel, SmallContiguousCategoricalEntropyModel, SmallNonContiguousCategoricalEncoderModel, SmallNonContiguousCategoricalDecoderModel, and SmallLeakyQuantizer.
You’ll usually want to use matching presets for entropy coders and entropy models. However, it is legal to use entropy models with the “small” preset for an entropy coder with the (larger) “default” preset (the opposite way is statically prohibited on purpose by the type system). Using “small” models with a “default” coder may make sense if you want to mix entropy models with “default” and “small” presets (e.g., if some of the models are lookup tables but others aren’t).
Customizations for Advanced Use Cases
Some advanced use cases may not be covered by the above presets. For such cases, the entropy coders and entropy models can be adjusted through the type parameters listed below.
Warning: Adjusting these parameters can severely impact compression effectiveness if not done with care. We strongly recommend always starting from one of the above presets, only deviating from them when necessary, and measuring the effect of any deviations from the presets (on compression effectiveness, computational performance, and memory consumption).
Type Parameters of Entropy Coders
- Word: a BitArray specifying the smallest unit of compressed data that the entropy coder emits or reads in at a time; an entropy coder's Word type has to be at least as large as the Probability type (see below) of any entropy model used with it.
  - The "default" preset sets Word = u32.
  - The "small" preset sets Word = u16.
- State: a BitArray that parameterizes the size of the internal coder state (the actual coder state Code::State may differ from State, e.g., it may contain several fields of type State, as in the case of a Range Coder). Typically, State has to be at least twice as large as Word. You'll want to use a small State if you're planning to perform random accesses via the Seek trait because your container format will then typically contain some kind of jump table composed of indices and Code::States (see PosSeek::Position).
  - The "default" preset sets State = u64.
  - The "small" preset sets State = u32.
- Backend: the source and/or sink of compressed data. See module backends. You'll rarely have to specify Backend explicitly; it usually defaults to Vec<Word> for encoding and to either Vec<Word> or Cursor for decoding, depending on which entropy coder you use and who owns the compressed data. The ChainCoder has two type parameters for backends because it keeps track of compressed data and remainders separately.
Type Parameters of Entropy Models
- Probability: a BitArray specifying the type used to represent probabilities (smaller than one) in fixed-point arithmetic; must hold at least PRECISION bits (see below) and must not be larger than the Word type of the entropy coder that uses the model.
  - The "default" preset sets Probability = u32 (same as Word above).
  - The "small" preset sets Probability = u16 (same as Word above).
- PRECISION: a usize const generic that defines the number of bits used when representing probabilities in fixed-point arithmetic. Must not be zero or larger than Probability::BITS. A small PRECISION will lead to compression overhead due to poor approximations of the true probability distribution. A large PRECISION will lead to a large memory overhead if you use a LookupDecoderModel, and it can make decoding with a LeakilyQuantizedDistribution slow.
  - The "default" preset sets PRECISION = 24.
  - The "small" preset sets PRECISION = 12.
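The trade-off that PRECISION controls can be illustrated with plain fixed-point arithmetic. The sketch below is not constriction's internal implementation; it only shows the idea of representing a probability as an integer numerator out of 2^PRECISION, with a "leaky" floor of one so that no symbol gets probability zero:

```rust
// Quantize a probability to a numerator out of 2^precision, but never
// to zero (cf. "leaky" quantization, which keeps all symbols encodable).
fn quantize(p: f64, precision: u32) -> u64 {
    ((p * (1u64 << precision) as f64).round() as u64).max(1)
}

fn main() {
    let p = 0.123_456_789;

    // "default" preset: PRECISION = 24, fits in a u32 `Probability`.
    let q24 = quantize(p, 24);
    let err24 = (q24 as f64 / (1u64 << 24) as f64 - p).abs();

    // "small" preset: PRECISION = 12, fits in a u16 `Probability`;
    // note a lookup decoder model needs a table with 2^PRECISION
    // entries, which is why large PRECISION makes such tables huge.
    let q12 = quantize(p, 12);
    let err12 = (q12 as f64 / (1u64 << 12) as f64 - p).abs();

    // Higher precision -> smaller rounding error (and thus less
    // compression overhead from model approximation).
    assert!(err24 < err12);
    assert!(err24 < 1e-7 && err12 < 1e-3);
    println!("err24 = {err24:.2e}, err12 = {err12:.2e}");
}
```

This also makes the lookup-table size concrete: 2^24 entries explain the ~67 MB tables mentioned under the "default" presets, while 2^12 entries stay small.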
Modules
- Experimental entropy coding algorithm for advanced variants of bits-back coding.
- Probability distributions that can be used as entropy models for stream codes.
- Near-optimal compression on a queue (“first in first out”)
- Fast and near-optimal compression on a stack (“last in first out”)
Structs
- The iterator returned by Decode::decode_iid_symbols.
- The iterator returned by Decode::decode_symbols.
- The iterator returned by Decode::try_decode_symbols.
Enums
- The error type for Encode::try_encode_symbols and Decode::try_decode_symbols.
Traits
- A trait for temporary conversion into a matching decoder type.
- Base trait for stream encoders and decoders
- A trait for stream decoders (i.e., decompressors)
- A trait for stream encoders (i.e., compressors)
- A trait for permanent conversion into a matching decoder type.