Module constriction::backends
Sources and sinks of compressed data
This module declares traits for reading and writing sequences of bits in small chunks
(“words”) of fixed size. When entropy coders provided by `constriction` read in or write
out a word of compressed data, they do this through a generic type, typically called
`Backend`, that implements one or more of the traits from this module. Declaring traits
for entropy coder backends allows us to make entropy coders generic over the specific
backends they use rather than forcing a single choice of backend implementation on
all users of a given entropy coder. This gives users with specialized needs fine-grained
control over the behaviour of entropy coders without sacrificing ergonomics or
computational efficiency in the common use cases.
§This module is meant for advanced use cases
Most users of `constriction` don’t have to worry much about backends since all entropy
coders default (at compile time) to reasonable backends based on what you use them for.
You will usually end up using a `Vec` backend for encoding, and you may use either a
`Vec` or a `Cursor` backend for decoding, depending on whether your entropy coder has
stack or queue semantics and whether you want to consume or retain the compressed data
after decoding. However, these automatically inferred default backends may not be
suitable for certain advanced use cases. For example, you may want to decode data
directly from a network socket, or you may be implementing a container format that
performs some additional operation like multiplexing directly after entropy coding. In
such a case, you may want to declare your own type for the source or sink of compressed
data and implement the relevant traits from this module for it so that you can use it as
a backend for an entropy coder.
§Module Overview
The main traits in this module are `ReadWords` and `WriteWords`. They express the
capability of a backend to be a source and/or sink of compressed data, respectively.
Both traits are generic over a type `Word`, which is usually a `BitArray`, and which
represents the smallest unit of compressed data that an entropy coder will read and/or
write at a time (all provided entropy coders in `constriction` are generic over the
`Word` and default to `Word = u32`). Types that implement one of the backend traits
often also implement `Pos` and/or `Seek` from the parent module. The remaining
traits in this module specify further properties of the backend (see
`BoundedReadWords` and `BoundedWriteWords`) and provide permanent or temporary
conversions into backends with different capabilities (see `IntoReadWords`,
`IntoSeekReadWords`, `AsReadWords`, and `AsSeekReadWords`).
The backend traits are implemented for the standard library type `Vec` where applicable
and for a few new types defined in this module. The most important type defined in this
module is a `Cursor`, which wraps an in-memory buffer of words together with an index
into the buffer (`constriction::backends::Cursor` is to the backend traits roughly what
`std::io::Cursor` is to the `std::io::{Read, Write}` traits). A `Cursor` can hold
either owned or borrowed data and can be used for reading and/or writing (if the buffer
is mutable) with both `Queue` and `Stack` read/write semantics.
§Read/Write Semantics
The `ReadWords` trait has a second type parameter with trait bound `Semantics`.
`Semantics` are (typically zero-sized) marker types that indicate how reads from a
backend behave in relation to writes to the backend. This issue is moot for backends
that support only one of reading or writing but not both. Therefore, any backend that
does not implement `WriteWords` may implement `ReadWords` for multiple `Semantics`.
However, backends that implement both `ReadWords` and `WriteWords` must make sure to
implement `ReadWords` only for the appropriate semantics. There are two predefined
`Semantics`: `Queue`, which indicates that reads and writes operate in the same linear
direction (“first in, first out”), and `Stack`, which is also a linear sequence of
words but one where reads and writes operate in opposite directions (“last in, first
out”). You may define your own `Semantics` if you want to implement a backend based on
a fancier abstract data type.
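The marker-type pattern behind `Semantics` can be illustrated with plain standard-library Rust. The sketch below does not use `constriction`'s actual trait definitions (the real `Semantics`, `Queue`, and `Stack` live in `constriction::backends` and differ in detail; `ToyCursor` is a made-up name); it only shows how zero-sized marker types plus `PhantomData` let a single buffer type expose different read directions:

```rust
use core::marker::PhantomData;

// Hypothetical stand-ins for the marker types described above.
trait Semantics {}
struct Queue;
struct Stack;
impl Semantics for Queue {}
impl Semantics for Stack {}

// A toy read-only backend over a word buffer, generic over semantics.
struct ToyCursor<S: Semantics> {
    buf: Vec<u32>,
    pos: usize,
    _semantics: PhantomData<S>,
}

impl<S: Semantics> ToyCursor<S> {
    fn new(buf: Vec<u32>, pos: usize) -> Self {
        ToyCursor { buf, pos, _semantics: PhantomData }
    }
}

impl ToyCursor<Queue> {
    fn read(&mut self) -> Option<u32> {
        // Queue semantics: read at the index, then advance ("first in, first out").
        let word = self.buf.get(self.pos).copied()?;
        self.pos += 1;
        Some(word)
    }
}

impl ToyCursor<Stack> {
    fn read(&mut self) -> Option<u32> {
        // Stack semantics: step the index back first, then read ("last in, first out").
        self.pos = self.pos.checked_sub(1)?;
        self.buf.get(self.pos).copied()
    }
}

fn main() {
    let buf = vec![10, 20, 30];
    let mut queue_reader = ToyCursor::<Queue>::new(buf.clone(), 0);
    let mut stack_reader = ToyCursor::<Stack>::new(buf.clone(), buf.len());
    assert_eq!(queue_reader.read(), Some(10)); // front of the buffer first
    assert_eq!(stack_reader.read(), Some(30)); // back of the buffer first
}
```

Because the semantics parameter is a type, a backend that would be ambiguous (readable in two directions) forces callers to pick a direction at compile time rather than at run time.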
For example, we implement `WriteWords<Word>` and `ReadWords<Word, Stack>` but not
`ReadWords<Word, Queue>` for the type `Vec<Word>` from the standard library because
`Vec::push` and `Vec::pop` have `Stack` semantics. By contrast, the type `Cursor`
declared in this module, which wraps a memory buffer and an index into the buffer,
implements `ReadWords` for both `Stack` and `Queue` semantics. The `Queue`
implementation increases the index after reading and the `Stack` implementation
decreases the index before reading (if you want the opposite interpretation then you can
wrap the `Cursor` in a `Reverse`). Thus, a `Cursor` can be used by entropy coders
with both stack and queue semantics, and both can use the `Cursor` in the way that is
correct for them. By contrast, while a stack-based entropy coder (like `AnsCoder`) can
use a `Vec<Word>` for both encoding and decoding, an entropy coder with queue semantics
(like a range coder) can use a `Vec` only for encoding; it has to wrap the `Vec` in a
`Cursor` for decoding, thus preventing accidental misuse.
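To make the difference concrete, here is a standard-library-only sketch (no `constriction` APIs; the helper names are made up) of why a bare `Vec` gives last-in-first-out reads via `Vec::pop`, while queue-order reads need a separate, increasing read index of the kind that a `Cursor` maintains:

```rust
// Read out a word buffer with stack semantics: last in, first out.
// (`Vec::pop` removes from the back, which is why `Vec` can only offer
// stack-style reads without extra state.)
fn read_as_stack(words: &[u32]) -> Vec<u32> {
    let mut buf = words.to_vec();
    std::iter::from_fn(|| buf.pop()).collect()
}

// Read out the same buffer with queue semantics: first in, first out.
// This needs a separate, increasing read index alongside the buffer,
// which is the state a cursor adds when it wraps a `Vec`.
fn read_as_queue(words: &[u32]) -> Vec<u32> {
    let mut pos = 0;
    std::iter::from_fn(|| {
        let word = words.get(pos).copied()?;
        pos += 1;
        Some(word)
    })
    .collect()
}

fn main() {
    // "Encode" by pushing words onto a Vec, then read it back both ways.
    let compressed = vec![1, 2, 3];
    assert_eq!(read_as_stack(&compressed), vec![3, 2, 1]); // order a stack coder reads
    assert_eq!(read_as_queue(&compressed), vec![1, 2, 3]); // order a queue coder reads
}
```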
§Example of Entropy Coding With a Non-Standard Backend
The following example encodes and decodes data to and from a file. It uses custom
backends that directly write each `Word` to, and read each `Word` from, the file. This
is not a very practical example: if you encode all data at once, then it’s simpler and
possibly even more efficient to use the default backend (which writes to an in-memory
buffer), call `.get_compressed()` when you’re done, and then flush the buffer to the
file in one go. But custom backends similar to the ones used in this example could also
be used to add additional processing to the compressed data, such as multiplexing or
demultiplexing for some container format.
use constriction::{
    backends::{FallibleCallbackWriteWords, FallibleIteratorReadWords},
    stream::{
        model::DefaultLeakyQuantizer,
        queue::{DefaultRangeDecoder, DefaultRangeEncoder},
        Decode, Encode,
    },
};
use byteorder::{LittleEndian, ReadBytesExt, WriteBytesExt};
use probability::distribution::Gaussian;
use std::{fs::File, io::{BufReader, BufWriter}};

fn encode_to_file_on_the_fly(amt: u32) {
    // Some simple entropy model, just for demonstration purposes.
    let quantizer = DefaultLeakyQuantizer::new(-256..=255);
    let model = quantizer.quantize(Gaussian::new(0.0, 100.0));

    // Some long-ish sequence of test symbols, made up in a reproducible way.
    let symbols = (0..amt).map(|i| {
        let cheap_hash = i.wrapping_mul(0x6979_E2F3).wrapping_add(0x0059_0E91);
        (cheap_hash >> (32 - 9)) as i32 - 256
    });

    // Open a file and build a backend that writes to this file one word at a time.
    // (Wrapping the `File` in a `BufWriter` isn't strictly necessary here,
    // it's just good practice when writing to a file.)
    let mut file = BufWriter::new(File::create("backend_queue_example.tmp").unwrap());
    let backend =
        FallibleCallbackWriteWords::new(move |word| file.write_u32::<LittleEndian>(word));

    // Wrap the backend in a `RangeEncoder` and encode (i.e., compress) the symbols.
    let mut encoder = DefaultRangeEncoder::with_backend(backend);
    encoder.encode_iid_symbols(symbols, &model).unwrap();

    // Dropping the encoder doesn't automatically seal the compressed bit string because that
    // could fail. We explicitly have to seal it by calling `.into_compressed()`, which returns
    // the backend since that's what logically "holds" the compressed data, and then drop that.
    std::mem::drop(encoder.into_compressed().unwrap());
}

fn decode_from_file_on_the_fly(amt: u32) {
    // Same toy entropy model that we used for encoding.
    let quantizer = DefaultLeakyQuantizer::new(-256..=255);
    let model = quantizer.quantize(Gaussian::new(0.0, 100.0));

    // Open the file and iterate over its contents in `u32` words (wrapping it in a `BufReader`
    // is again just good practice). We're deliberately being pedantic about the errors
    // here in order to show how backend errors can be reported to the decoder.
    let mut file = BufReader::new(File::open("backend_queue_example.tmp").unwrap());
    let word_iterator = std::iter::from_fn(move || match file.read_u32::<LittleEndian>() {
        Ok(word) => Some(Ok(word)),
        Err(err) => {
            if err.kind() == std::io::ErrorKind::UnexpectedEof {
                None // Reached end of file, end iteration.
            } else {
                Some(Err(err)) // Some other I/O error occurred. Propagate it up.
            }
        }
    });

    // Create a decoder that decodes on the fly from our iterator.
    let backend = FallibleIteratorReadWords::new(word_iterator);
    let mut decoder = DefaultRangeDecoder::with_backend(backend).unwrap();

    // Decode the symbols and verify their correctness.
    for (i, symbol) in decoder.decode_iid_symbols(amt as usize, &model).enumerate() {
        let cheap_hash = (i as u32).wrapping_mul(0x6979_E2F3).wrapping_add(0x0059_0E91);
        let expected = (cheap_hash >> (32 - 9)) as i32 - 256;
        assert_eq!(symbol.unwrap(), expected);
    }

    // Recover the original iterator over compressed words and verify that it's been exhausted.
    let mut word_iterator = decoder.into_raw_parts().0.into_iter();
    assert!(word_iterator.next().is_none());

    // `word_iterator` owns the file since we used a `move` closure above to construct it.
    // So dropping it calls `std::fs::File`'s destructor, which releases the file handle.
    std::mem::drop(word_iterator);
    std::fs::remove_file("backend_queue_example.tmp").unwrap();
}

encode_to_file_on_the_fly(1000);
decode_from_file_on_the_fly(1000);
Structs§
- Adapter that turns an in-memory buffer into an `impl ReadWords` and/or an `impl WriteWords`.
- Adapter that turns a fallible callback into a fallible data sink.
- Adapter that turns an iterator over `Result<Word, ReadError>` into a data source.
- Adapter that turns an infallible callback into an infallible data sink.
- Adapter that turns an iterator over `Word` into a data source.
- Wrapper that inverts the read/write directions of a data source and/or data sink.
Enums§
- Error type for data sinks with a finite capacity.
Traits§
- A trait for types that can be temporarily used by decoders as a source of compressed data.
- A trait for types that can be temporarily used by random-access decoders as a source of compressed data.
- A trait for data sources that know how much data is left.
- A trait for data sinks with a known finite capacity.
- A trait for types that can be turned into a source of compressed data (for decoders).
- A trait for types that can be turned into a randomly accessible source of compressed data (for decoders).
- A trait for sources of compressed data (mainly used by decoders).
- Unsafe marker trait indicating sane implementation of `AsRef` (and possibly `AsMut`).
- A trait for sinks of compressed data (mainly used by encoders).