Module codec

Binary data encoder and decoder (“codec”).

This codec is meant to be:

  1. Accessible, so that it doesn’t require specialized knowledge (beyond foundational coding skills) to implement on any platform.
  2. Streamable, so that it can encode into and decode from binary data streams which only support sequential reads and writes (i.e., no “backtracking” or allocations).
  3. Upgradeable, so that the format of encoded data can evolve without breaking systems that rely on outdated decoders.

Aspects of this codec were inspired by Simple Binary Encoding and Cap’n Proto.

§Bits, Bytes, Endians, and Alignments

Despite the widespread use of the word byte, there isn’t a universal standard for what a byte is.

For the sake of clarity, in our documentation, a byte is 8 bits of data. Some people call this precise definition an octet; we prefer “byte” because “octet” is a bit niche.

§Endianness

When bytes are transmitted between systems, those systems might not process the bytes in the same exact order.

This difference is due to endianness, which is the order in which a computer reads the bytes representing a number. For example, consider the following three bytes:

01101000 01101001 00100001

A “big-endian” computer treats the first byte as the most significant, interpreting these bytes as the number 6,842,657.

A “little-endian” computer treats the first byte as the least significant, interpreting these bytes as the number 2,189,672.

This codec encodes numbers in little-endian format.
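As a rough illustration, the three example bytes can be interpreted in both byte orders with Rust's standard integer conversions. The function names below are hypothetical and are not part of this module; the bytes are padded with a zero byte to fill a `u32`.

```rust
/// The three example bytes: 01101000 01101001 00100001.
const BYTES: [u8; 3] = [0b0110_1000, 0b0110_1001, 0b0010_0001];

/// Interprets `bytes` with the first byte as the most significant.
/// (Hypothetical helper, for illustration only.)
fn read_big_endian(bytes: [u8; 3]) -> u32 {
    // Pad on the most significant side so the value is unchanged.
    u32::from_be_bytes([0, bytes[0], bytes[1], bytes[2]])
}

/// Interprets `bytes` with the first byte as the least significant.
/// (Hypothetical helper, for illustration only.)
fn read_little_endian(bytes: [u8; 3]) -> u32 {
    // Pad on the most significant side, which is the end in little-endian.
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], 0])
}

/// Encodes a `u16` the way this codec encodes numbers: little-endian.
fn encode_u16(value: u16) -> [u8; 2] {
    value.to_le_bytes()
}
```

With the example bytes, `read_big_endian(BYTES)` yields 6,842,657 and `read_little_endian(BYTES)` yields 2,189,672, while `encode_u16(0x1234)` produces the bytes `34 12`.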

§Alignment

When computers are asked to read a byte, they usually don’t read a single byte; they read a batch of bytes called a word. Words’ sizes vary between computers, but they’re usually 4 or 8 bytes long.

Because computers process bytes in words, we can improve the performance of our code by aligning our data so that its size is evenly divisible by the word size.

Note: The most common alignment strategy is to re-order the components of a data structure from largest to smallest, inserting padding after each component so that the data structure (up to the end of the padding) has a size that is evenly divisible by the word size.

Data encoded by this codec is unaligned, with no padding bytes within or around data. By not aligning data, the codec sacrifices some performance in exchange for a smaller encoded size and a vastly simpler codec.

However, this codec does accommodate aligned data: all encoded metadata is aligned to an 8-byte word boundary, meaning every encoded data is guaranteed to start on an 8-byte boundary so long as the blob section of every Format::Data is 8-byte aligned.
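The codec itself inserts no padding, so a caller who wants 8-byte-aligned data has to size blob sections accordingly. A minimal sketch of that arithmetic follows; `WORD`, `padding_for`, and `padded_len` are hypothetical names and are not part of this module.

```rust
/// Word size, in bytes, that encoded metadata is aligned to.
const WORD: usize = 8;

/// Returns how many padding bytes would have to follow `len` bytes
/// for the total to be a multiple of `WORD`.
fn padding_for(len: usize) -> usize {
    (WORD - len % WORD) % WORD
}

/// Returns `len` rounded up to the next multiple of `WORD`.
fn padded_len(len: usize) -> usize {
    len + padding_for(len)
}
```

For example, a 5-byte blob section would need 3 bytes of caller-supplied padding, while an 8-byte one would need none.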

§The Encoding

This codec encodes data as a structured sequence of bytes containing, in order:

  1. A DataHeader describing the format of the encoded data sequence, and the number of data encoded in the sequence.
  2. For each encoded data following the header:
    1. The data’s blob fields, encoded in some predetermined documented order.
    2. The data’s data fields, each preceded by their own DataHeader, and encoded in some predetermined documented order.

Each DataHeader contains:

| Type | Description |
|------|-------------|
| u16 | The number of data following the header; 0 for no data, 1 for one data, and so on. |
| u16 | The ordinal of the data’s type in its documentation, defaulting to 0 (“unspecified”). |
| u16 | The total size in bytes of the data’s Format::Blob fields, defaulting to 0 (none). |
| u16 | The total number of the data’s Format::Data fields, defaulting to 0 (none). |

Because each DataHeader contains a count of how many data follow the header, the encoding has the same shape for an empty sequence of data, a single data, and a list of data.
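The header layout can be sketched as four little-endian `u16` values packed into 8 bytes. The `Header` struct below is a hypothetical stand-in for `DataHeader`: its field names are assumed, and it assumes the fields are written in the order they are listed above.

```rust
/// Hypothetical stand-in for the codec's `DataHeader`.
/// Field names and byte order of the fields are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Header {
    /// Number of data following the header.
    count: u16,
    /// Ordinal of the data's type in its documentation.
    ordinal: u16,
    /// Total size in bytes of each data's blob fields.
    blob_size: u16,
    /// Number of data fields in each data.
    data_fields: u16,
}

impl Header {
    /// Encodes the header as four little-endian `u16` values.
    fn to_bytes(self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[0..2].copy_from_slice(&self.count.to_le_bytes());
        out[2..4].copy_from_slice(&self.ordinal.to_le_bytes());
        out[4..6].copy_from_slice(&self.blob_size.to_le_bytes());
        out[6..8].copy_from_slice(&self.data_fields.to_le_bytes());
        out
    }

    /// Decodes a header from exactly 8 bytes.
    fn from_bytes(bytes: [u8; 8]) -> Self {
        // Reads the little-endian u16 starting at byte index `i`.
        let field = |i: usize| u16::from_le_bytes([bytes[i], bytes[i + 1]]);
        Header {
            count: field(0),
            ordinal: field(2),
            blob_size: field(4),
            data_fields: field(6),
        }
    }
}
```

Four `u16` fields occupy exactly 8 bytes, which is consistent with metadata staying on an 8-byte boundary.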

Data is not encoded with any additional metadata (e.g., field or type names). The DataHeader provides enough information to traverse any data, but the data’s contents won’t be useful without having the data’s corresponding documentation.
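To illustrate how headers alone allow traversal, here is a sketch that skips over an encoded sequence without knowing what its data means. It assumes the header fields appear in the order listed above; `read_u16` and `skip_sequence` are hypothetical helpers and are not part of this module's API.

```rust
/// Reads a little-endian `u16` at `pos`, or `None` if out of bounds.
fn read_u16(bytes: &[u8], pos: usize) -> Option<u16> {
    let pair = bytes.get(pos..pos.checked_add(2)?)?;
    Some(u16::from_le_bytes([pair[0], pair[1]]))
}

/// Walks one header and everything it describes, starting at `pos`.
/// Returns the position just past the sequence, or `None` if the
/// input ends before the sequence does.
fn skip_sequence(bytes: &[u8], pos: usize) -> Option<usize> {
    let count = read_u16(bytes, pos)?;
    // Bytes pos+2..pos+4 hold the type ordinal, which traversal ignores.
    let blob_size = read_u16(bytes, pos + 4)? as usize;
    let data_fields = read_u16(bytes, pos + 6)?;

    let mut cursor = pos + 8;
    for _ in 0..count {
        // Skip this data's blob fields.
        cursor = cursor.checked_add(blob_size)?;
        if cursor > bytes.len() {
            return None;
        }
        // Each data field is itself a header followed by its data.
        for _ in 0..data_fields {
            cursor = skip_sequence(bytes, cursor)?;
        }
    }
    Some(cursor)
}
```

The sketch only measures how long a sequence is; interpreting the blob bytes it skips still requires the data's documentation.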

Structs§

DataFormat
Contents of a Format::Data.
DataHeader
Header preceding a sequence of zero or more data encoded with the same DataFormat.

Enums§

CodecError
Enumeration of errors that may occur while encoding or decoding data.
Format
The low-level encoding format of some data.

Constants§

TEMP_BUFFER_SIZE
Default size used for temporary, stack-allocated buffers.

Traits§

Decodable
A thing that decodes from codec-compliant data.
Encodable
A thing that encodes into codec-compliant data.
ReadsDecodable
A thing that Reads Decodable data.
WritesEncodable
A thing that Writes Encodable data.

Type Aliases§

FormatMetadata
Numeric type used for describing a Format.