// axsys_noun/serdes.rs

//! [`Noun`](crate::Noun) serialization and deserialization.
//!
//! # Serialization
//!
//! [Jam] is a bitwise encoding of a noun. There are three different entities that may appear in
//! the encoding: atoms, cells, and backreferences. Each entity is identified by a unique sequence
//! of one or two tag bits: an atom's tag is `0b0`, a cell's tag is `0b01`, and a backreference's
//! tag is `0b11`. Encoding begins with the noun, which must be either an atom or a cell.
//!
//! If the noun is an atom, the tag `0b0` is written to the bitstream, followed by the encoded
//! length of the atom. The length is encoded by writing `N` low bits to the bitstream, where `N`
//! is the number of bits required to represent the length, followed by a single high bit, followed
//! by the bits of the length (from least significant to most significant) *with the most
//! significant high bit of the length omitted*. Then, the bits of the atom itself (from least
//! significant to most significant) are written to the bitstream.
//!
//! The atom `19`, for example, serializes ("jams") to `2480`, or `0b100110110000`, which breaks
//! into the tag bits, length bits, and atom bits as follows:
//! ```text
//!  10011      011000      0
//! |_____|    |______|    |_|
//!  atom       length     tag
//! ```
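/*!
The atom rule above can be sketched in a few lines. This is only an illustration under
simplifying assumptions: atoms are limited to `u64`, the bitstream is a `Vec<bool>` ordered least
significant bit first, and `write_len_prefixed`, `jam_atom`, and `pack` are hypothetical helpers
that are not part of this crate.

```rust
/// Appends the length-prefixed encoding of `value`, least significant bit first.
fn write_len_prefixed(bits: &mut Vec<bool>, value: u64) {
    let len = 64 - value.leading_zeros(); // bits in `value`; 0 for the atom `0`
    let n = 32 - len.leading_zeros(); // bits in `len`; 0 when `len` is 0
    bits.extend(std::iter::repeat(false).take(n as usize)); // `n` low bits
    bits.push(true); // a single high bit
    // Bits of the length without its most significant bit, then the value.
    bits.extend((0..n.saturating_sub(1)).map(|i| (len >> i) & 1 == 1));
    bits.extend((0..len).map(|i| (value >> i) & 1 == 1));
}

/// Jams a single atom: the atom tag `0b0` followed by the length-prefixed value.
fn jam_atom(value: u64) -> Vec<bool> {
    let mut bits = vec![false];
    write_len_prefixed(&mut bits, value);
    bits
}

/// Packs a bitstream of at most 64 bits (least significant first) into an integer.
fn pack(bits: &[bool]) -> u64 {
    bits.iter().rev().fold(0u64, |acc, &bit| (acc << 1) | bit as u64)
}
```

With these helpers, `pack(&jam_atom(19))` yields `2480`, matching the diagram above.
*/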
//!
//! If the noun is a cell, the tag `0b01` is written to the bitstream, followed by the encoded
//! head and then the encoded tail.
//!
//! The cell `[0 19]`, for example, serializes ("jams") to `39_689`, or `0b1001101100001001`, which
//! breaks into the tag bits, head bits, and tail bits as follows:
//! ```text
//!  10011         011000         0         1         0         01
//! |_____|       |______|       |_|       |_|       |_|       |__|
//!  tail        tail length  tail tag   head len  head tag  cell tag
//! ```
//!
//! **Note**: The atom `0` serializes to `0b10`. Because it has length zero, its encoding is just
//! the atom tag `0` followed by the single high bit that ends the (empty) length prefix.
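/*!
The cell rule composes with the atom rule recursively. The sketch below ignores backreferences
and makes the same kind of simplifying assumptions as any toy model: the `Noun` enum, its `u64`
atoms, and the functions `write_len_prefixed`, `encode`, and `jam_no_backrefs` are all
hypothetical and do not reflect this crate's actual types.

```rust
/// A toy noun with `u64` atoms (the crate's own types are not used here).
enum Noun {
    Atom(u64),
    Cell(Box<Noun>, Box<Noun>),
}

/// Appends the length-prefixed encoding of `value`, least significant bit first.
fn write_len_prefixed(bits: &mut Vec<bool>, value: u64) {
    let len = 64 - value.leading_zeros(); // bits in `value`; 0 for the atom `0`
    let n = 32 - len.leading_zeros(); // bits in `len`; 0 when `len` is 0
    bits.extend(std::iter::repeat(false).take(n as usize)); // `n` low bits
    bits.push(true); // a single high bit
    // Bits of the length without its most significant bit, then the value.
    bits.extend((0..n.saturating_sub(1)).map(|i| (len >> i) & 1 == 1));
    bits.extend((0..len).map(|i| (value >> i) & 1 == 1));
}

/// Appends the encoding of `noun`, ignoring backreferences entirely.
fn encode(noun: &Noun, bits: &mut Vec<bool>) {
    match noun {
        Noun::Atom(value) => {
            bits.push(false); // atom tag `0b0`
            write_len_prefixed(bits, *value);
        }
        Noun::Cell(head, tail) => {
            bits.extend([true, false]); // cell tag `0b01`, low bit first
            encode(head, bits);
            encode(tail, bits);
        }
    }
}

/// Jams `noun` without backreferences; the result must fit in 64 bits.
fn jam_no_backrefs(noun: &Noun) -> u64 {
    let mut bits = Vec::new();
    encode(noun, &mut bits);
    bits.iter().rev().fold(0u64, |acc, &bit| (acc << 1) | bit as u64)
}
```

Applied to the cell `[0 19]`, `jam_no_backrefs` returns `39_689`, as in the example above.
*/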
//!
//! The description above ignores backreferences. A backreference is written to the bitstream when
//! a noun that has already been encoded appears again during encoding. A backreference is simply
//! the bit index at which the encoding of the first occurrence of that noun begins. A
//! backreference is encoded just like an atom, except that the atom tag `0b0` is replaced with the
//! backreference tag `0b11`. However, if the noun in question is an atom and the encoded atom
//! requires fewer bits than the corresponding backreference, then the atom is encoded into the
//! bitstream rather than the backreference.
//!
//! The cell `[1 1]`, for example, does not have any backreferences in its encoding because `1`
//! requires fewer bits to encode than the backreference that would replace the second occurrence
//! of `1` in the bitstream. The cell `[10_000 10_000]`, which does have a backreference in its
//! encoding, serializes ("jams") to `4_952_983_169`, or `0b100100111001110001000011010000001`,
//! which breaks down as follows (notice how `tail` is a backreference, which decodes into the
//! index `2`, which is the start of the encoding of the head):
//! ```text
//!   +---------------------------------------------------------------------> idx 2
//!   |                                                                         |
//!  10         0100         11         10011100010000         11010000         0         01
//! |__|       |____|       |__|       |______________|       |________|       |_|       |__|
//! tail      tail length  tail tag          head             head length    head tag   cell tag
//! ```
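/*!
The backreference rule can be sketched by remembering the bit index at which each noun was first
encoded. As before, this is a toy model rather than this crate's implementation: the `Noun` enum
with `u64` atoms, the `HashMap` used as the cache, and the functions `write_len_prefixed`,
`encode`, and `jam` are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// A toy noun with `u64` atoms (the crate's own types are not used here).
#[derive(PartialEq, Eq, Hash)]
enum Noun {
    Atom(u64),
    Cell(Box<Noun>, Box<Noun>),
}

/// Appends the length-prefixed encoding of `value`, least significant bit first.
fn write_len_prefixed(bits: &mut Vec<bool>, value: u64) {
    let len = 64 - value.leading_zeros(); // bits in `value`; 0 for the atom `0`
    let n = 32 - len.leading_zeros(); // bits in `len`; 0 when `len` is 0
    bits.extend(std::iter::repeat(false).take(n as usize)); // `n` low bits
    bits.push(true); // a single high bit
    // Bits of the length without its most significant bit, then the value.
    bits.extend((0..n.saturating_sub(1)).map(|i| (len >> i) & 1 == 1));
    bits.extend((0..len).map(|i| (value >> i) & 1 == 1));
}

/// Appends the encoding of `noun`; `seen` maps each noun to its first bit index.
fn encode<'a>(noun: &'a Noun, bits: &mut Vec<bool>, seen: &mut HashMap<&'a Noun, usize>) {
    if let Some(&index) = seen.get(noun) {
        // Candidate backreference: tag `0b11`, then the index encoded like an atom.
        let mut backref = vec![true, true];
        write_len_prefixed(&mut backref, index as u64);
        if let Noun::Atom(value) = noun {
            // A repeated atom is written in full when that takes fewer bits.
            let mut atom = vec![false];
            write_len_prefixed(&mut atom, *value);
            if atom.len() < backref.len() {
                bits.extend(atom);
                return;
            }
        }
        bits.extend(backref);
        return;
    }
    // First occurrence: remember where this noun's encoding starts.
    seen.insert(noun, bits.len());
    match noun {
        Noun::Atom(value) => {
            bits.push(false); // atom tag `0b0`
            write_len_prefixed(bits, *value);
        }
        Noun::Cell(head, tail) => {
            bits.extend([true, false]); // cell tag `0b01`, low bit first
            encode(head, bits, seen);
            encode(tail, bits, seen);
        }
    }
}

/// Jams `noun` with backreferences; the result must fit in 64 bits.
fn jam(noun: &Noun) -> u64 {
    let mut bits = Vec::new();
    encode(noun, &mut bits, &mut HashMap::new());
    bits.iter().rev().fold(0u64, |acc, &bit| (acc << 1) | bit as u64)
}
```

For `[10_000 10_000]` this sketch produces `4_952_983_169`, with the tail written as a
backreference to index `2`. For `[1 1]` the second `1` is written in full, giving `817`.
*/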
//!
//! # Deserialization
//!
//! [Cue] is a bitwise decoding of a jammed noun. It's simply the inverse of the jam encoding
//! described above.
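/*!
Decoding can be sketched by reading the tag bits and dispatching on them, while caching every
decoded noun under the bit index at which it started so that backreferences can be resolved.
This toy model is not this crate's implementation: jammed input is limited to a `u64`, failures
are reported as `None` instead of the error type defined in this module, and `Noun`, `Cursor`,
`cue_at`, and `cue` are hypothetical names.

```rust
use std::collections::HashMap;

/// A toy noun with `u64` atoms (the crate's own types are not used here).
#[derive(Clone, Debug, PartialEq)]
enum Noun {
    Atom(u64),
    Cell(Box<Noun>, Box<Noun>),
}

/// Reads a jammed noun (at most 64 bits in this sketch) one bit at a time.
struct Cursor {
    jammed: u64,
    pos: u32,
}

impl Cursor {
    /// Returns the next bit, or `None` once all 64 bits are consumed.
    fn bit(&mut self) -> Option<bool> {
        if self.pos >= 64 {
            return None;
        }
        let bit = (self.jammed >> self.pos) & 1 == 1;
        self.pos += 1;
        Some(bit)
    }

    /// Reads `count` bits (at most 64), least significant first.
    fn bits(&mut self, count: u32) -> Option<u64> {
        let mut value = 0;
        for i in 0..count {
            if self.bit()? {
                value |= 1u64 << i;
            }
        }
        Some(value)
    }

    /// Reads a length-prefixed value, as used by atoms and backreferences.
    fn len_prefixed(&mut self) -> Option<u64> {
        let mut n = 0;
        while !self.bit()? {
            n += 1; // count the low bits preceding the single high bit
        }
        if n == 0 {
            return Some(0); // zero length: the value is `0`
        }
        // Restore the omitted most significant bit of the length.
        let len = (1u64 << (n - 1)) | self.bits(n - 1)?;
        if len > 64 {
            return None; // would not fit in this sketch's `u64` atoms
        }
        self.bits(len as u32)
    }
}

/// Decodes one noun at the cursor; `seen` maps start indices to decoded nouns.
fn cue_at(cur: &mut Cursor, seen: &mut HashMap<u64, Noun>) -> Option<Noun> {
    let start = u64::from(cur.pos);
    let noun = if !cur.bit()? {
        Noun::Atom(cur.len_prefixed()?) // tag `0b0`: atom
    } else if !cur.bit()? {
        // Tag `0b01`: cell, encoded as head followed by tail.
        let head = cue_at(cur, seen)?;
        let tail = cue_at(cur, seen)?;
        Noun::Cell(Box::new(head), Box::new(tail))
    } else {
        // Tag `0b11`: backreference to a noun decoded earlier.
        let index = cur.len_prefixed()?;
        return seen.get(&index).cloned();
    };
    seen.insert(start, noun.clone());
    Some(noun)
}

/// Cues a jammed noun that fits in 64 bits.
fn cue(jammed: u64) -> Option<Noun> {
    cue_at(&mut Cursor { jammed, pos: 0 }, &mut HashMap::new())
}
```

Under these assumptions, `cue(2480)` gives the atom `19`, and `cue(4_952_983_169)` gives the
cell `[10_000 10_000]`, with the tail resolved through the backreference to index `2`.
*/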
//!
//! [Jam]: https://developers.urbit.org/reference/hoon/stdlib/2p#jam
//! [Cue]: https://developers.urbit.org/reference/hoon/stdlib/2p#cue

use crate::{atom::Atom, marker::Nounish};
use std::{
    fmt::{self, Display, Formatter},
    result,
};

/// Errors that occur when serializing/deserializing.
#[derive(Debug)]
pub enum Error {
    /// Building up an atom with [`atom::Builder`](crate::atom::Builder) failed.
    AtomBuilding,
    /// A key lookup in the cache failed.
    CacheMiss,
    /// A corrupt backreference was encountered.
    InvalidBackref,
    /// A corrupt length encoding was encountered.
    InvalidLen,
    /// A corrupt tag was encountered.
    InvalidTag,
}

impl Display for Error {
    fn fmt(&self, f: &mut Formatter<'_>) -> result::Result<(), fmt::Error> {
        match self {
            Self::AtomBuilding => write!(f, "building an atom a bit at a time failed"),
            Self::CacheMiss => write!(
                f,
                "a key that was expected to be in the cache was missing from the cache"
            ),
            Self::InvalidBackref => write!(f, "encountered an invalid backreference"),
            Self::InvalidLen => write!(f, "encountered an invalid length"),
            Self::InvalidTag => write!(f, "encountered an invalid tag"),
        }
    }
}

/// A specialized [`Result`] type for serialization/deserialization operations that return
/// [`serdes::Error`] on error.
///
/// [`serdes::Error`]: Error
pub type Result<T> = std::result::Result<T, Error>;

/// Serialize a noun type into a bitstream.
#[doc(alias("serialize", "serialization"))]
pub trait Jam: Nounish {
    /// Serializes ("jams") a noun, returning the resulting bitstream as an atom.
    #[doc(alias("serialize", "serialization"))]
    fn jam(self) -> Atom;
}

/// Deserialize a bitstream into a noun type.
#[doc(alias("deserialize", "deserialization"))]
pub trait Cue: Nounish + Sized {
    /// Deserializes ("cues") a jammed noun (a bitstream represented as an atom), returning the
    /// resulting noun type.
    #[doc(alias("deserialize", "deserialization"))]
    fn cue(jammed_noun: Atom) -> Result<Self>;
}