axsys_noun/serdes.rs
//! [`Noun`](crate::Noun) serialization and deserialization.
//!
//! # Serialization
//!
//! [Jam] is a bitwise encoding of a noun. Three kinds of entity may appear in the encoding:
//! atoms, cells, and backreferences. Each is identified by a unique sequence of one or two tag
//! bits: an atom's tag is `0b0`, a cell's tag is `0b01`, and a backreference's tag is `0b11`.
//! Tag bits, like all other bits, are written least significant bit first, so an atom always
//! starts with a low bit and a cell or backreference always starts with a high bit. Encoding
//! begins with the noun itself, which must be either an atom or a cell.
//!
//! If the noun is an atom, the tag `0b0` is written to the bitstream, followed by the encoded
//! length of the atom in bits. The length is encoded by writing `N` low bits to the bitstream,
//! where `N` is the number of bits required to represent the length, followed by a single high
//! bit, followed by the bits of the length (from least significant to most significant) *with
//! the most significant high bit of the length omitted*. Then, the bits of the atom itself (from
//! least significant to most significant) are written to the bitstream.
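//!
//! The atom `19`, for example, serializes ("jams") to `2480`, or `0b100110110000`, which breaks
//! into the tag bits, length bits, and atom bits as follows (read from right to left, since the
//! rightmost bit is the first one written):
//! ```text
//!  10011   011000   0
//! |_____| |______| |_|
//!  atom    length  tag
//! ```
//! The atom is 5 bits long and `5` is `0b101`, so `N` is 3: the length bits are three low bits,
//! one high bit, and then `01`, which is the length without its most significant bit.
//!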
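//! A minimal sketch of the atom case follows. It is not this crate's implementation: it assumes
//! the atom fits in a `u64`, collects bits in a `Vec<bool>` in the order they are written, and
//! uses helper names (`bit_len`, `push_bits`, `jam_atom`, `to_u64`) invented for illustration.
//! ```rust
//! /// Number of bits required to represent `value` (zero for `0`).
//! fn bit_len(value: u64) -> u32 {
//!     u64::BITS - value.leading_zeros()
//! }
//!
//! /// Appends the `width` low bits of `value`, least significant bit first.
//! fn push_bits(stream: &mut Vec<bool>, value: u64, width: u32) {
//!     for i in 0..width {
//!         stream.push((value >> i) & 1 == 1);
//!     }
//! }
//!
//! /// Jams a single atom: tag, encoded length, then the atom bits.
//! fn jam_atom(atom: u64) -> Vec<bool> {
//!     let len = bit_len(atom);
//!     let n = bit_len(u64::from(len));
//!     let mut stream = vec![false]; // atom tag `0b0`
//!     push_bits(&mut stream, 0, n); // `N` low bits
//!     stream.push(true); // a single high bit
//!     // The length without its most significant bit (nothing to write when `N` is 0).
//!     push_bits(&mut stream, u64::from(len), n.saturating_sub(1));
//!     push_bits(&mut stream, atom, len); // the atom itself
//!     stream
//! }
//!
//! /// Packs a bitstream of at most 64 bits into a `u64`, first-written bit lowest.
//! fn to_u64(stream: &[bool]) -> u64 {
//!     stream
//!         .iter()
//!         .rev()
//!         .fold(0u64, |acc, &bit| (acc << 1) | u64::from(bit))
//! }
//! ```
//! Under these assumptions, `to_u64(&jam_atom(19))` evaluates to `2480`, the value given above.
//!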
//! If the noun is a cell, the tag `0b01` is written to the bitstream, followed by the encoded
//! head and then the encoded tail.
//!
//! The cell `[0 19]`, for example, serializes ("jams") to `39_689`, or `0b1001101100001001`, which
//! breaks into the tag bits, head bits, and tail bits as follows:
//! ```text
//!  10011    011000       0         1         0         01
//! |_____|  |______|     |_|       |_|       |_|       |__|
//! tail     tail length  tail tag  head len  head tag  cell tag
//! ```
//! **Note**: The atom `0` has length zero, so its length encoding is the single high bit `1` and
//! it has no atom bits. Together with its tag, it serializes to `0b10`.
//!
//! The above description of how atoms and cells are encoded ignores backreferences. A
//! backreference is encoded into the bitstream when a noun that has already been encoded in the
//! bitstream appears again during encoding. A backreference is simply the index of the bit at
//! which the encoding of the first occurrence of the noun in question begins. A backreference is
//! encoded just like an atom, except the atom tag `0b0` is replaced with the backreference tag
//! `0b11`. However, if the noun in question is an atom and the encoded atom requires fewer bits
//! than the corresponding backreference, then the atom is encoded into the bitstream rather than
//! the backreference.
//!
//! The cell `[1 1]`, for example, does not have any backreferences in its encoding because `1`
//! requires fewer bits to encode than the backreference that would replace the second occurrence
//! of `1` in the bitstream. The cell `[10_000 10_000]`, which does have a backreference in its
//! encoding, serializes ("jams") to `4_952_983_169`, or `0b100100111001110001000011010000001`,
//! which breaks down as follows (notice how `tail` is a backreference, which decodes into the
//! index `2`, which is the start of the encoding of the head):
//! ```text
//!  +-----------------------------------------------------------> idx 2
//!  |                                                           |
//!  10    0100         11        10011100010000    11010000     0         01
//! |__|  |____|       |__|      |______________|  |________|   |_|       |__|
//! tail  tail length  tail tag  head              head length  head tag  cell tag
//! ```
//!
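//! The rules for atoms, cells, and backreferences can be put together in a short sketch. It is
//! illustrative only and not this crate's implementation: the `Noun` enum, the helper names,
//! and the restriction to `u64` atoms and to outputs of at most 64 bits are all assumptions.
//! "Requires fewer bits" is approximated by comparing the bit length of the atom with the bit
//! length of the index, preferring the atom on a tie because its tag is one bit shorter.
//! ```rust
//! use std::collections::HashMap;
//!
//! #[derive(Clone, PartialEq, Eq, Hash)]
//! enum Noun {
//!     Atom(u64),
//!     Cell(Box<Noun>, Box<Noun>),
//! }
//!
//! fn bit_len(value: u64) -> u32 {
//!     u64::BITS - value.leading_zeros()
//! }
//!
//! /// Appends the `width` low bits of `value`, least significant bit first.
//! fn push_bits(stream: &mut Vec<bool>, value: u64, width: u32) {
//!     for i in 0..width {
//!         stream.push((value >> i) & 1 == 1);
//!     }
//! }
//!
//! /// Appends the encoded length of `value` followed by the bits of `value` (no tag).
//! fn push_len_and_bits(stream: &mut Vec<bool>, value: u64) {
//!     let len = bit_len(value);
//!     let n = bit_len(u64::from(len));
//!     push_bits(stream, 0, n);
//!     stream.push(true);
//!     push_bits(stream, u64::from(len), n.saturating_sub(1));
//!     push_bits(stream, value, len);
//! }
//!
//! /// `seen` maps each noun to the bit index at which its first encoding starts.
//! fn jam_into(noun: &Noun, stream: &mut Vec<bool>, seen: &mut HashMap<Noun, u64>) {
//!     let index = stream.len() as u64;
//!     if let Some(&first) = seen.get(noun) {
//!         match noun {
//!             // A short atom is cheaper to repeat than to reference.
//!             Noun::Atom(atom) if bit_len(*atom) <= bit_len(first) => {
//!                 stream.push(false);
//!                 push_len_and_bits(stream, *atom);
//!             }
//!             _ => {
//!                 // Backreference tag `0b11`, then the index encoded like an atom.
//!                 stream.push(true);
//!                 stream.push(true);
//!                 push_len_and_bits(stream, first);
//!             }
//!         }
//!         return;
//!     }
//!     seen.insert(noun.clone(), index);
//!     match noun {
//!         Noun::Atom(atom) => {
//!             stream.push(false); // atom tag `0b0`
//!             push_len_and_bits(stream, *atom);
//!         }
//!         Noun::Cell(head, tail) => {
//!             // Cell tag `0b01`: the low bit is written first.
//!             stream.push(true);
//!             stream.push(false);
//!             jam_into(head, stream, seen);
//!             jam_into(tail, stream, seen);
//!         }
//!     }
//! }
//!
//! /// Jams `noun` and packs the bitstream into a `u64`, first-written bit lowest.
//! fn jam(noun: &Noun) -> u64 {
//!     let mut stream = Vec::new();
//!     jam_into(noun, &mut stream, &mut HashMap::new());
//!     stream
//!         .iter()
//!         .rev()
//!         .fold(0u64, |acc, &bit| (acc << 1) | u64::from(bit))
//! }
//! ```
//! Under these assumptions, jamming the cell `[10_000 10_000]` with this sketch yields
//! `4_952_983_169`, and jamming `[0 19]` yields `39_689`.
//!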
//! # Deserialization
//!
//! [Cue] is a bitwise decoding of a jammed noun. It's simply the inverse of the jam encoding
//! described above.
//!
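//! As a rough sketch of that inverse, the decoder below reads a jammed noun from a `u64`. It is
//! an illustration under stated assumptions rather than this crate's implementation: `Noun`,
//! `Reader`, and `cue` are hypothetical names, atoms are limited to 64 bits, and `None` stands
//! in for the error cases (invalid length, invalid backreference, truncated input).
//! ```rust
//! use std::collections::HashMap;
//!
//! #[derive(Clone, Debug, PartialEq)]
//! enum Noun {
//!     Atom(u64),
//!     Cell(Box<Noun>, Box<Noun>),
//! }
//!
//! /// Reads bits from a `u64`, least significant bit first.
//! struct Reader {
//!     bits: u64,
//!     pos: u32,
//! }
//!
//! impl Reader {
//!     fn bit(&mut self) -> Option<bool> {
//!         if self.pos >= u64::BITS {
//!             return None; // ran off the end of the input
//!         }
//!         let bit = (self.bits >> self.pos) & 1 == 1;
//!         self.pos += 1;
//!         Some(bit)
//!     }
//!
//!     fn bits(&mut self, width: u32) -> Option<u64> {
//!         let mut value = 0u64;
//!         for i in 0..width {
//!             if self.bit()? {
//!                 value |= 1u64 << i;
//!             }
//!         }
//!         Some(value)
//!     }
//!
//!     /// Decodes a length and then reads that many bits (used for atoms and backreferences).
//!     fn len_and_bits(&mut self) -> Option<u64> {
//!         let mut n = 0u32;
//!         while !self.bit()? {
//!             n += 1; // count the `N` low bits before the high bit
//!         }
//!         if n == 0 {
//!             return Some(0); // length zero: the value is `0`
//!         }
//!         // Restore the most significant bit that the encoder omitted.
//!         let len = self.bits(n - 1)? | (1u64 << (n - 1));
//!         if len > 64 {
//!             return None; // does not fit in this sketch's `u64` atoms
//!         }
//!         self.bits(len as u32)
//!     }
//! }
//!
//! /// `seen` maps the starting bit index of each decoded noun to that noun.
//! fn cue_at(reader: &mut Reader, seen: &mut HashMap<u64, Noun>) -> Option<Noun> {
//!     let start = u64::from(reader.pos);
//!     let noun = if !reader.bit()? {
//!         Noun::Atom(reader.len_and_bits()?) // tag `0b0`
//!     } else if !reader.bit()? {
//!         let head = cue_at(reader, seen)?; // tag `0b01`
//!         let tail = cue_at(reader, seen)?;
//!         Noun::Cell(Box::new(head), Box::new(tail))
//!     } else {
//!         let index = reader.len_and_bits()?; // tag `0b11`
//!         return seen.get(&index).cloned();
//!     };
//!     seen.insert(start, noun.clone());
//!     Some(noun)
//! }
//!
//! fn cue(jammed: u64) -> Option<Noun> {
//!     cue_at(&mut Reader { bits: jammed, pos: 0 }, &mut HashMap::new())
//! }
//! ```
//! Under these assumptions, `cue(2480)` returns `Some(Noun::Atom(19))`, and `cue(4_952_983_169)`
//! resolves its backreference to index `2` and returns the cell `[10_000 10_000]`.
//!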
//! [Jam]: https://developers.urbit.org/reference/hoon/stdlib/2p#jam
//! [Cue]: https://developers.urbit.org/reference/hoon/stdlib/2p#cue

use crate::{atom::Atom, marker::Nounish};
use std::{
    fmt::{self, Display, Formatter},
    result,
};

/// Errors that occur when serializing/deserializing.
#[derive(Debug)]
pub enum Error {
    /// Building up an atom with [`atom::Builder`](crate::atom::Builder) failed.
    AtomBuilding,
    /// A key lookup in the cache failed.
    CacheMiss,
    /// A corrupt backreference was encountered.
    InvalidBackref,
    /// A corrupt length encoding was encountered.
    InvalidLen,
    /// A corrupt tag was encountered.
    InvalidTag,
}

impl Display for Error {
    fn fmt(&self, f: &mut Formatter<'_>) -> result::Result<(), fmt::Error> {
        match self {
            Self::AtomBuilding => write!(f, "building an atom a bit at a time failed"),
            Self::CacheMiss => write!(
                f,
                "a key that was expected to be in the cache was missing from the cache"
            ),
            Self::InvalidBackref => write!(f, "encountered an invalid backreference"),
            Self::InvalidLen => write!(f, "encountered an invalid length"),
            Self::InvalidTag => write!(f, "encountered an invalid tag"),
        }
    }
}

/// A specialized [`Result`] type for serialization/deserialization operations that return
/// [`serdes::Error`] on error.
///
/// [`serdes::Error`]: Error
pub type Result<T> = std::result::Result<T, Error>;

/// Serialize a noun type into a bitstream.
#[doc(alias("serialize", "serialization"))]
pub trait Jam: Nounish {
    /// Serializes ("jams") a noun, returning the resulting bitstream as an atom.
    #[doc(alias("serialize", "serialization"))]
    fn jam(self) -> Atom;
}

/// Deserialize a bitstream into a noun type.
#[doc(alias("deserialize", "deserialization"))]
pub trait Cue: Nounish + Sized {
    /// Deserializes ("cues") a jammed noun (a bitstream represented as an atom), returning the
    /// resulting noun type.
    #[doc(alias("deserialize", "deserialization"))]
    fn cue(jammed_noun: Atom) -> Result<Self>;
}