Module constriction::backends

Sources and sinks of compressed data

This module declares traits for reading and writing sequences of bits in small chunks (“words”) of fixed size. When entropy coders provided by constriction read in or write out a word of compressed data, they do this through a generic type, typically called Backend, that implements one or more of the traits from this module. Declaring traits for entropy coder backends allows us to make entropy coders generic over the specific backends they use rather than forcing a single choice of backend implementation on all users of a given entropy coder. This gives users with specialized needs fine-grained control over the behavior of entropy coders without sacrificing ergonomics or computational efficiency in the common use cases.

This module is meant for advanced use cases

Most users of constriction don’t have to worry much about backends since all entropy coders default (at compile time) to reasonable backends based on what you use them for. You will usually end up using a Vec backend for encoding, and you may use either a Vec or a Cursor backend for decoding, depending on whether your entropy coder has stack or queue semantics and whether you want to consume or retain the compressed data after decoding. However, these automatically inferred default backends may not be suitable for certain advanced use cases. For example, you may want to decode data directly from a network socket, or you may be implementing a container format that performs some additional operation like multiplexing directly after entropy coding. In such a case, you may want to declare your own type for the source or sink of compressed data and implement the relevant traits from this module for it so that you can use it as a backend for an entropy coder, as in the sketch below.
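For instance, a minimal custom sink could look like the following sketch. The type CountingSink and its fields are made up for illustration; it forwards each word to an in-memory buffer while counting how many words pass through, which is the kind of hook a container format might use. The write method matches the documented shape of the WriteWords trait; this sketch assumes that write is the only required method and leaves provided methods at their defaults.

use constriction::backends::WriteWords;
use core::convert::Infallible;

/// A hypothetical custom sink that counts words as they are written.
struct CountingSink {
    inner: Vec<u32>,
    words_written: u64,
}

impl WriteWords<u32> for CountingSink {
    // Writing to an in-memory `Vec` cannot fail.
    type WriteError = Infallible;

    fn write(&mut self, word: u32) -> Result<(), Infallible> {
        self.words_written += 1;
        self.inner.push(word);
        Ok(())
    }
}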

Module Overview

The main traits in this module are ReadWords and WriteWords. They express the capability of a backend to be a source and/or sink of compressed data, respectively. Both traits are generic over a type Word, which is usually a BitArray, and which represents the smallest unit of compressed data that an entropy coder will read and/or write at a time (all provided entropy coders in constriction are generic over the Word and default to Word = u32). Types that implement one of the backend traits often also implement Pos and/or Seek from the parent module. The remaining traits in this module specify further properties of the backend (see BoundedReadWords and BoundedWriteWords) and provide permanent or temporary conversions into backends with different capabilities (see IntoReadWords, IntoSeekReadWords, AsReadWords, and AsSeekReadWords).
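For orientation, the two core traits have roughly the following shape. This is a simplified sketch, not the authoritative definition: provided methods are omitted and the exact bounds and paths may differ between versions of constriction, so consult the trait documentation itself when implementing a backend.

use core::fmt::Debug;

pub trait WriteWords<Word> {
    /// The error type for failed writes (for example, an I/O error;
    /// infallible backends like `Vec` use `core::convert::Infallible`).
    type WriteError: Debug;

    fn write(&mut self, word: Word) -> Result<(), Self::WriteError>;
}

pub trait ReadWords<Word, S: Semantics> {
    /// The error type for failed reads.
    type ReadError: Debug;

    /// Returns `Ok(None)` when the backend is exhausted.
    /// (`S` is a `Semantics` marker type, explained in the
    /// “Read/Write Semantics” section below.)
    fn read(&mut self) -> Result<Option<Word>, Self::ReadError>;
}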

The backend traits are implemented for the standard library type Vec where applicable and for a few new types defined in this module. The most important type defined in this module is Cursor, which wraps a generic in-memory buffer of words together with an index into the buffer (constriction::backends::Cursor is to the backend traits roughly what std::io::Cursor is to the std::io::{Read, Write} traits). A Cursor can hold either owned or borrowed data and can be used for reading and/or writing (if the buffer is mutable) with both Queue and Stack read/write semantics.
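As a small illustration of the two read directions, the following sketch reads from a Cursor over an owned buffer with either semantics. The constructor names new_at_write_beginning and new_at_write_end, as well as the crate-root paths of Queue and Stack, are assumptions based on recent versions of constriction and may differ in yours; the unwraps never fail because reading from an in-memory buffer is infallible.

use constriction::backends::{Cursor, ReadWords};
use constriction::{Queue, Stack};

let buf = vec![1u32, 2, 3];

// With `Queue` semantics, reads start at the front and move forward.
let mut front = Cursor::new_at_write_beginning(buf.clone());
assert_eq!(ReadWords::<u32, Queue>::read(&mut front).unwrap(), Some(1));

// With `Stack` semantics, reads pop from the back and move backward,
// so the cursor has to start at the end of the buffer.
let mut back = Cursor::new_at_write_end(buf);
assert_eq!(ReadWords::<u32, Stack>::read(&mut back).unwrap(), Some(3));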

Read/Write Semantics

The ReadWords trait has a second type parameter with trait bound Semantics. Semantics are (typically zero-sized) marker types that indicate how reads from a backend behave in relation to writes to the backend. This issue is moot for backends that support only one of reading and writing, so any backend that does not implement WriteWords may implement ReadWords for multiple Semantics. Backends that implement both ReadWords and WriteWords, however, must take care to implement ReadWords only for the Semantics that correctly describe how reads relate to writes. There are two predefined Semantics: Queue, which indicates that reads and writes operate in the same linear direction (“first in first out”), and Stack, which also describes a linear sequence of words but one where reads and writes operate in opposite directions (“last in first out”). You may define your own Semantics if you want to implement a backend based on a fancier abstract data type.

For example, we implement WriteWords<Word> and ReadWords<Word, Stack>, but not ReadWords<Word, Queue>, for the standard library type Vec<Word> because Vec::push and Vec::pop have Stack semantics. By contrast, the type Cursor declared in this module, which wraps a memory buffer and an index into the buffer, implements ReadWords for both Stack and Queue semantics: the Queue implementation increases the index after each read while the Stack implementation decreases the index before each read (if you want the opposite interpretation, you can wrap the Cursor in a Reverse). Thus, entropy coders with stack semantics and entropy coders with queue semantics can both use a Cursor, each in the way that is correct for it. A stack-based entropy coder (like AnsCoder) can therefore use a Vec<Word> for both encoding and decoding, whereas an entropy coder with queue semantics (like a Range Coder) can use a Vec only for encoding; for decoding, it has to wrap the Vec in a Cursor, which prevents accidental misuse.
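The following sketch makes the Stack behavior of Vec concrete. As above, the crate-root path of Stack is an assumption; the unwraps never fail because a Vec backend reports Infallible errors.

use constriction::backends::{ReadWords, WriteWords};
use constriction::Stack;

let mut backend = vec![11u32, 22];

// Writes push to the end of the `Vec` ...
WriteWords::<u32>::write(&mut backend, 33).unwrap();

// ... and `Stack` reads pop from the end, so the last word written
// is the first word read back.
assert_eq!(ReadWords::<u32, Stack>::read(&mut backend).unwrap(), Some(33));
assert_eq!(ReadWords::<u32, Stack>::read(&mut backend).unwrap(), Some(22));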

Example of Entropy Coding With a Non-Standard Backend

The following example encodes and decodes data to and from a file. It uses custom backends that directly write each Word to, and read each Word from, the file. This is not a very practical example: if you encode all data at once, it’s simpler and possibly even more efficient to use the default backend (which writes to an in-memory buffer), call .get_compressed() when you’re done, and then flush the buffer to the file in one go. But custom backends similar to the ones used in this example could also be used to add additional processing to the compressed data, such as multiplexing or demultiplexing for some container format.

use constriction::{
    backends::{FallibleCallbackWriteWords, FallibleIteratorReadWords},
    stream::{
        model::DefaultLeakyQuantizer,
        queue::{DefaultRangeDecoder, DefaultRangeEncoder},
        Decode, Encode,
    },
};
use byteorder::{LittleEndian, ReadBytesExt, WriteBytesExt};
use probability::distribution::Gaussian;
use std::{fs::File, io::{BufReader, BufWriter}};

fn encode_to_file_on_the_fly(amt: u32) {
    // Some simple entropy model, just for demonstration purposes.
    let quantizer = DefaultLeakyQuantizer::new(-256..=255);
    let model = quantizer.quantize(Gaussian::new(0.0, 100.0));

    // Some long-ish sequence of test symbols, made up in a reproducible way.
    let symbols = (0..amt).map(|i| {
        let cheap_hash = i.wrapping_mul(0x6979_E2F3).wrapping_add(0x0059_0E91);
        (cheap_hash >> (32 - 9)) as i32 - 256
    });

    // Open a file and build a backend that writes to this file one word at a time.
    // (Wrapping the `File` in a `BufWriter` isn't strictly necessary here,
    // it's just good practice when writing to a file.)
    let mut file = BufWriter::new(File::create("backend_queue_example.tmp").unwrap());
    let backend =
        FallibleCallbackWriteWords::new(move |word| file.write_u32::<LittleEndian>(word));

    // Wrap the backend in a `RangeEncoder` and encode (i.e., compress) the symbols.
    let mut encoder = DefaultRangeEncoder::with_backend(backend);
    encoder.encode_iid_symbols(symbols, &model).unwrap();

    // Dropping the encoder doesn't automatically seal the compressed bit string because that
    // could fail. We explicitly have to seal it by calling `.into_compressed()`, which returns
    // the backend since that's what logically "holds" the compressed data, and then drop that.
    std::mem::drop(encoder.into_compressed().unwrap());
}

fn decode_from_file_on_the_fly(amt: u32) {
    // Same toy entropy model that we used for encoding.
    let quantizer = DefaultLeakyQuantizer::new(-256..=255);
    let model = quantizer.quantize(Gaussian::new(0.0, 100.0));

    // Open the file and iterate over its contents in `u32` words (wrapping it in a `BufReader`
    // is again just for good practice). We're deliberately being pedantic about the errors
    // here in order to show how backend errors can be reported to the decoder.
    let mut file = BufReader::new(File::open("backend_queue_example.tmp").unwrap());
    let word_iterator = std::iter::from_fn(move || match file.read_u32::<LittleEndian>() {
        Ok(word) => Some(Ok(word)),
        Err(err) => {
            if err.kind() == std::io::ErrorKind::UnexpectedEof {
                None // Reached end of file, end iteration.
            } else {
                Some(Err(err)) // Some other I/O error occurred. Propagate it up.
            }
        }
    });

    // Create a decoder that decodes on the fly from our iterator.
    let backend = FallibleIteratorReadWords::new(word_iterator);
    let mut decoder = DefaultRangeDecoder::with_backend(backend).unwrap();

    // Decode the symbols and verify their correctness.
    for (i, symbol) in decoder.decode_iid_symbols(amt as usize, &model).enumerate() {
        let cheap_hash = (i as u32).wrapping_mul(0x6979_E2F3).wrapping_add(0x0059_0E91);
        let expected = (cheap_hash >> (32 - 9)) as i32 - 256;
        assert_eq!(symbol.unwrap(), expected);
    }

    // Recover the original iterator over compressed words and verify that it's been exhausted.
    let mut word_iterator = decoder.into_raw_parts().0.into_iter();
    assert!(word_iterator.next().is_none());

    // `word_iterator` owns the file since we used a `move` closure above to construct it.
    // So dropping it calls `std::fs::File`'s destructor, which releases the file handle.
    std::mem::drop(word_iterator);
    std::fs::remove_file("backend_queue_example.tmp").unwrap();
}

encode_to_file_on_the_fly(1000);
decode_from_file_on_the_fly(1000);

Structs

  • Cursor: wraps an in-memory buffer of words together with an index into the buffer; supports reading with both Queue and Stack semantics, and writing if the buffer is mutable.
  • FallibleCallbackWriteWords: a sink of words that passes each word to a fallible callback (used in the example above).
  • FallibleIteratorReadWords: a source of words that reads from a fallible iterator (used in the example above).
  • Reverse: wraps a backend and reverses the direction of its reads and writes.

Enums

Traits

  • AsReadWords: a trait for types that can be temporarily used by decoders as a source of compressed data.
  • AsSeekReadWords: a trait for types that can be temporarily used by random-access decoders as a source of compressed data.
  • BoundedReadWords: a trait for data sources that know how much data is left.
  • BoundedWriteWords: a trait for data sinks with a known finite capacity.
  • IntoReadWords: a trait for types that can be turned into a source of compressed data (for decoders).
  • IntoSeekReadWords: a trait for types that can be turned into a randomly accessible source of compressed data (for decoders).
  • ReadWords: a trait for sources of compressed data (mainly used by decoders).
  • SafeBuf: unsafe marker trait indicating a sane implementation of AsRef (and possibly AsMut).
  • WriteWords: a trait for sinks of compressed data (mainly used by encoders).