Binary data encoder and decoder (“codec”).
This codec is meant to be:
- Accessible, so that it doesn’t require specialized knowledge (beyond foundational coding skills) to implement on any platform.
- Streamable, so that it can encode into and decode from binary data streams which only support sequential reads and writes (i.e., no “backtracking” or allocations).
- Upgradeable, so that the format of encoded data can evolve without breaking systems that rely on outdated decoders.
Aspects of this codec were inspired by Simple Binary Encoding and Cap’n Proto.
§Bits, Bytes, Endians, and Alignments
Despite the widespread use of the word byte, there isn’t a universal standard for what a byte is.
For the sake of clarity, in our documentation, a byte is 8 bits of data. Some people call this precise definition an octet; we prefer “byte” because “octet” is a bit niche.
§Endianness
When bytes are transmitted between systems, those systems might not process the bytes in the same exact order.
This difference is due to endianness, which is the order in which a computer stores and reads the bytes that represent a multi-byte number. For example, consider the following three bytes:
01101000 01101001 00100001
A “little-endian” computer treats the first (leftmost) byte as the least significant, interpreting these bytes as the number 2,189,672.
A “big-endian” computer treats the first (leftmost) byte as the most significant, interpreting these bytes as the number 6,842,657.
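Both interpretations of the three example bytes can be checked with Rust's standard u32::from_le_bytes and u32::from_be_bytes: little-endian yields 2,189,672 and big-endian yields 6,842,657. Because a u32 needs four bytes, this sketch pads the example with a zero byte on the most-significant side; the helper names are illustrative, not part of this codec.

```rust
/// The three example bytes, in the order they are transmitted.
const EXAMPLE: [u8; 3] = [0b0110_1000, 0b0110_1001, 0b0010_0001];

/// Illustrative helper: interprets three bytes with the first byte as least significant.
fn read_little_endian(bytes: [u8; 3]) -> u32 {
    // Little-endian: the zero pad goes last, which is the most-significant position.
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], 0])
}

/// Illustrative helper: interprets three bytes with the first byte as most significant.
fn read_big_endian(bytes: [u8; 3]) -> u32 {
    // Big-endian: the zero pad goes first, which is the most-significant position.
    u32::from_be_bytes([0, bytes[0], bytes[1], bytes[2]])
}
```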
This codec encodes numbers in little-endian format.
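A minimal sketch of what little-endian encoding means for a u16, the type used by every header field: to_le_bytes emits the least significant byte first, so 0x1234 is written as the bytes 0x34, 0x12. The helpers write_u16_le and read_u16_le are hypothetical names for illustration, not this module's API.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes `value` least-significant byte first.
fn write_u16_le<W: Write>(writer: &mut W, value: u16) -> io::Result<()> {
    writer.write_all(&value.to_le_bytes())
}

/// Hypothetical helper: reads two bytes and interprets them as a little-endian u16.
fn read_u16_le<R: io::Read>(reader: &mut R) -> io::Result<u16> {
    let mut buf = [0u8; 2];
    reader.read_exact(&mut buf)?;
    Ok(u16::from_le_bytes(buf))
}
```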
§Alignment
When computers are asked to read a byte, they usually don’t read a single byte; they read a batch of bytes called a word. Words’ sizes vary between computers, but they’re usually 4 or 8 bytes long.
Because computers process bytes in words, we can improve the performance of our code by aligning our data, so that each value starts at an offset (and each data structure has a size) that is evenly divisible by the word size.
Note: The most common alignment strategy is to re-order the components of a data structure from largest to smallest, inserting padding where needed so that each component starts on an aligned offset and the data structure (up to the end of the padding) has a size that is evenly divisible by a word.
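To illustrate the note, the two structs below hold the same fields in different orders. repr(C) keeps the declared field order (Rust's default representation is free to reorder fields itself). On common 64-bit targets, where u64 is 8-byte aligned, Unordered typically occupies 24 bytes and Ordered 16; exact sizes are platform-dependent, so this sketch relies only on Ordered being smaller.

```rust
use std::mem::{align_of, size_of};

// Largest field in the middle: padding is needed before `big` (so it starts on
// an aligned offset) and after `b` (so the size is a multiple of the alignment).
#[allow(dead_code)]
#[repr(C)]
struct Unordered {
    a: u8,
    big: u64,
    b: u8,
}

// Same fields ordered from largest to smallest: the two u8s share one run of
// trailing padding.
#[allow(dead_code)]
#[repr(C)]
struct Ordered {
    big: u64,
    a: u8,
    b: u8,
}
```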
Data encoded by this codec is unaligned, with no padding bytes within or around data. By not aligning data, the codec sacrifices some performance in exchange for a smaller encoded size and a vastly simpler codec.
However, this codec does accommodate aligned data: all encoded metadata is aligned to an 8-byte word boundary, meaning every encoded data is guaranteed to start on an 8-byte boundary (assuming the encoded sequence itself does) so long as the blob section of every Format::Data is 8-byte aligned, that is, its size in bytes is a multiple of 8.
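A minimal sketch of the alignment arithmetic, assuming the encoded stream starts on an 8-byte boundary and the data have only blob fields (no Format::Data fields), so every data following a header has the same size. A header is 8 bytes (four u16 fields, per the table under “The Encoding”), so the k-th data starts at offset 8 + k * blob_size, which is a multiple of 8 for every k only when blob_size is. data_offset and all_word_aligned are hypothetical helpers.

```rust
/// Encoded header size: four u16 fields.
const HEADER_SIZE: usize = 4 * 2;

/// Hypothetical helper: offset of the `index`-th data after a header at offset 0,
/// assuming each data has only blob fields totalling `blob_size` bytes.
fn data_offset(blob_size: u16, index: usize) -> usize {
    HEADER_SIZE + index * blob_size as usize
}

/// Hypothetical helper: true if each of `count` data starts on an 8-byte boundary.
fn all_word_aligned(blob_size: u16, count: usize) -> bool {
    (0..count).all(|i| data_offset(blob_size, i) % 8 == 0)
}
```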
§The Encoding
This codec encodes data as a structured sequence of bytes containing, in order:
- A DataHeader describing the format of the encoded data sequence, and the number of data encoded in the sequence.
- For each encoded data following the header:
  - The data’s blob fields, encoded in some predetermined documented order.
  - The data’s data fields, each preceded by their own DataHeader, and encoded in some predetermined documented order.
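A minimal sketch of this order for a hypothetical Point type with two u32 blob fields and no data fields. The header is written as four little-endian u16 values in the order of the DataHeader table that follows; Point, POINT_ORDINAL, POINT_BLOB_SIZE, and encode_points are illustrative names, not this module's API.

```rust
use std::io::{self, Write};

/// Hypothetical data type: two blob fields (x, y) and no data fields.
struct Point {
    x: u32,
    y: u32,
}

/// Hypothetical ordinal of Point in its documentation.
const POINT_ORDINAL: u16 = 1;
/// Total size in bytes of Point's blob fields: two u32s.
const POINT_BLOB_SIZE: u16 = 8;

/// Writes one header, then every point's blob fields (x, then y).
fn encode_points<W: Write>(writer: &mut W, points: &[Point]) -> io::Result<()> {
    if points.len() > usize::from(u16::MAX) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "too many points for a u16 count"));
    }
    // Header fields in table order: count, ordinal, blob size, data-field count.
    let header = [points.len() as u16, POINT_ORDINAL, POINT_BLOB_SIZE, 0];
    for field in header.iter() {
        writer.write_all(&field.to_le_bytes())?;
    }
    // Each data's blob fields, in a fixed, documented order.
    for point in points {
        writer.write_all(&point.x.to_le_bytes())?;
        writer.write_all(&point.y.to_le_bytes())?;
    }
    Ok(())
}
```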
Each DataHeader contains:
Type | Description |
---|---|
u16 | The number of data following the header; 0 for no data, 1 for one data, and so on. |
u16 | The ordinal of the data’s type in its documentation, defaulting to 0 (“unspecified”). |
u16 | The total size in bytes of the data’s Format::Blob fields, defaulting to 0 (none). |
u16 | The total number of the data’s Format::Data fields, defaulting to 0 (none). |
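For illustration, the table maps onto a plain Rust struct with one u16 per row, encoded in little-endian order into 8 bytes. HeaderSketch and its field names are hypothetical; this is not the module's DataHeader type, whose actual fields and methods may differ.

```rust
use std::io::{self, Read, Write};

/// Encoded size of the header: four u16 fields.
const HEADER_SIZE: usize = 8;

/// Hypothetical mirror of the header table; not this module's DataHeader.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct HeaderSketch {
    count: u16,          // number of data following the header
    format_ordinal: u16, // ordinal of the data's type (0 = unspecified)
    blob_size: u16,      // total size in bytes of the data's blob fields
    data_fields: u16,    // number of the data's data fields
}

impl HeaderSketch {
    fn encode<W: Write>(&self, writer: &mut W) -> io::Result<()> {
        let mut buf = [0u8; HEADER_SIZE];
        let fields = [self.count, self.format_ordinal, self.blob_size, self.data_fields];
        // Each field occupies two bytes, least significant byte first.
        for (chunk, field) in buf.chunks_exact_mut(2).zip(fields.iter()) {
            chunk.copy_from_slice(&field.to_le_bytes());
        }
        writer.write_all(&buf)
    }

    fn decode<R: Read>(reader: &mut R) -> io::Result<Self> {
        let mut buf = [0u8; HEADER_SIZE];
        reader.read_exact(&mut buf)?;
        let field = |i: usize| u16::from_le_bytes([buf[2 * i], buf[2 * i + 1]]);
        Ok(Self { count: field(0), format_ordinal: field(1), blob_size: field(2), data_fields: field(3) })
    }
}
```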
Because each DataHeader contains a count of how many data follow the header, the same encoding works for an empty sequence of data, a single data, and a list of data; only the count (and the number of data that follow) changes.
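A minimal decoding sketch, assuming a hypothetical list whose data each consist of a single u32 blob field: the decoder reads the count from the header and runs the same loop whether the count is 0, 1, or more. Values are handed to a caller-supplied closure rather than collected, so the sketch itself allocates nothing. for_each_u32 is an illustrative name.

```rust
use std::io::{self, Read};

/// Hypothetical decoder for a list of data whose only field is one u32 blob.
/// Calls `visit` once per value and returns the header's count.
fn for_each_u32<R: Read, F: FnMut(u32)>(reader: &mut R, mut visit: F) -> io::Result<u16> {
    // Header: count, ordinal, blob size, data-field count (little-endian u16s).
    let mut header = [0u8; 8];
    reader.read_exact(&mut header)?;
    let count = u16::from_le_bytes([header[0], header[1]]);
    let blob_size = u16::from_le_bytes([header[4], header[5]]);
    let data_fields = u16::from_le_bytes([header[6], header[7]]);
    if blob_size != 4 || data_fields != 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a list of u32 blobs"));
    }
    // The same loop handles zero, one, or many data; only `count` differs.
    for _ in 0..count {
        let mut blob = [0u8; 4];
        reader.read_exact(&mut blob)?;
        visit(u32::from_le_bytes(blob));
    }
    Ok(count)
}
```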
Data is not encoded with any additional metadata (e.g., field or type names). The DataHeader provides enough information to traverse any data, but the data’s contents won’t be useful without the data’s corresponding documentation.
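To show that headers alone are enough to traverse data, here is a sketch that skips one encoded sequence without knowing its type: for each data it discards blob_size bytes, then recursively skips each data field, since each one is preceded by its own header. It reads strictly sequentially, with no seeking. skip_sequence is a hypothetical helper; it does not bound recursion depth, which a decoder handling untrusted input would likely want to do.

```rust
use std::io::{self, Read};

/// Hypothetical helper: skips a header and everything it describes.
fn skip_sequence<R: Read>(reader: &mut R) -> io::Result<()> {
    let mut header = [0u8; 8];
    reader.read_exact(&mut header)?;
    let field = |i: usize| u16::from_le_bytes([header[2 * i], header[2 * i + 1]]);
    let (count, blob_size, data_fields) = (field(0), field(2), field(3));

    for _ in 0..count {
        // Discard exactly `blob_size` bytes of blob fields.
        let wanted = u64::from(blob_size);
        let skipped = io::copy(&mut reader.by_ref().take(wanted), &mut io::sink())?;
        if skipped != wanted {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        // Each data field is its own header plus the data it describes.
        for _ in 0..data_fields {
            skip_sequence(reader)?;
        }
    }
    Ok(())
}
```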
Structs§
- DataFormat - Contents of a Format::Data.
- DataHeader - Header preceding a sequence of zero or more data encoded with the same DataFormat.

Enums§
- CodecError - Enumeration of errors that may occur while encoding or decoding data.
- Format - The low-level encoding format of some data.

Constants§
- TEMP_BUFFER_SIZE - Default size used for temporary, stack-allocated buffers.

Traits§
- Decodable - A thing that decodes from codec-compliant data.
- Encodable - A thing that encodes into codec-compliant data.
- ReadsDecodable - A thing that reads Decodable data.
- WritesEncodable - A thing that writes Encodable data.

Type Aliases§
- FormatMetadata - Numeric type used for describing a Format.