Expand description
§Unambiguous encoding
The core of the crate is functionality to unambiguously encode any structured data into bytes. It’s then used to digest any data into a hash. Note that this module provides low-level implementation details which you normally don’t need to know unless you manually implement Digestable trait.
Any structured value
is encoded either as a bytestring or as a list (each element
within the list is either a bytestring or a list). The simplified grammar can be seen as
(the full grammar is presented and explained below):
value ::= leaf | list
leaf ::= bytestring
list ::= [value]
Encoding goal is to distinguish ["12", "3"]
from ["1", "23"]
, ["1", [], "2"]
from ["1", "2"]
and so on. Now, we only need to map any structured data onto
that grammar to have an unambiguous encoding. Below, we will show how Rust structures
can be mapped onto the lists, and then we describe how exactly encoding works.
§Mapping Rust types onto lists
§Structure
Structures can be encoded as a list of field names followed by field values. For instance, a structure
struct Person {
name: String,
job_title: String,
}
let alice = Person {
name: "Alice".into(),
job_title: "cryptographer".into(),
};
will be encoded as a list below:
["name", "Alice", "job_title", "cryptographer"]
EncodeStruct can be used to encode a struct.
§Enum
Enums are encoded as a list that contains a variant name, and then, same as for a structure, it contains fields names followed by field values associated with the variant (if any). For instance, an enum
enum Shape {
Circle { radius: u8 },
Square { width: u8 },
}
let circle = Shape::Circle { radius: 5 };
will be encoded as a list:
["variant", "Circle", "radius", 5_u32]
EncodeEnum can be used to encode an enum.
§Primitive types
Primitive values can be encoded as bytestrings as long as they can be unambiguously converted to bytes.
For instance, strings are trivially converted to bytes via str::as_bytes
.
Unsigned integers are encoded in big-endian representation truncated from leading zeroes. It allows
us to unambiguously encode usize
on different machines even if they have different size of machine
word.
Signed integers are encoded as concatenation of their sign and byte representation of their absolute value.
§Domain separation
When value is encoded into bytes, it loses its type. For instance, “abcd” bytestring may correspond to
Vec<u8>
, String
, u32
and so on. When it’s required to distinguish one type from another, domain
separation tag can be used.
Domain separation tag can be specified for any value using .with_tag()
method.
It’s recommended to specify the tag only for high-level structures.
§Encoding lists into bytes
Generally, when a value is encoded, firstly we write its byte representation and then append its metadata like length and type (leaf of list). Writing a length after the value makes it possible to encode byte strings and lists the length of which is not known in advance.
Any value
is encoded according to this grammar specification:
value ::= leaf | leaf_ctx | list | list_ctx
leaf ::= bytestring len(bytestring) LEAF
leaf_ctx ::= bytestring len(bytestring) tag len(tag) LEAF_CTX
list ::= [value] len([value]) LIST
list_ctx ::= [value] len([value]) ctx len(ctx) LIST_CTX
len(n) ::=
if n.len() <= u32::MAX {
(n.len() as u32) LEN_32
} else {
let len_n = n.len().to_be_bytes().strip();
assert!(len_n.len() < 256);
len_n (len_n.len() as u8) BIGLEN
}
LIST ::= 1
LIST_CTX ::= 2
LEAF ::= 3
LEAF_CTX ::= 4
LEN_32 ::= 5
BIGLEN ::= 6
§Example
A structured data below
struct Person {
name: String,
skills: Vec<String>,
job_title: String,
}
let alice = Person {
name: "Alice".into(),
skills: vec!["math".into(), "crypto".into()],
job_title: "cryptographer".into(),
};
will be encoded as a list:
["name", "Alice", "skills", ["math", "crypto"], "job_title", "cryptographer"]
which will be translated into the bytes:
// Writes "name", "Alice" onto the stack
"name" 4_u32 LEN_32 LEAF
"Alice" 5_u32 LEN_32 LEAF
// Writes her skills onto the stack
"skills" 6_u32 LEN_32 LEAF
"math" 4_u32 LEN_32 LEAF
"crypto" 6_u32 LEN_32 LEAF
// Merges the last 2 elements from the stack into a list
2_u32 LEN_32 LIST
// Writes her job title
"job_title" 9_u32 LEN_32 LEAF
"cryptographer" 13_u32 LEN_32 LEAF
// Merges the last 6 elements from the stack into a list
6_u32 LEN_32 LIST
where LEAF
, LIST
, and LEN_32
are constants defined above.
Structs§
- Buffer
Digest digest
- Wraps
digest::Digest
and implementsBuffer
- Buffer
Update digest
- Wraps
digest::Update
and implementsBuffer
- Encode
Enum - Encodes an enum
- Encode
Leaf - Encodes a leaf (bytestring)
- Encode
List - Encodes a list of values
- Encode
Struct - Encodes a structure
- Encode
Value - Encodes a value
Constants§
- BIGLEN
- Control symbol
- LEAF
- Control symbol
- LEAF_
CTX - Control symbol
- LEN_32
- Control symbol
- LIST
- Control symbol
- LIST_
CTX - Control symbol
Traits§
- Buffer
- A buffer that exposes append-only access
Functions§
- encode_
len - Encodes length of list or leaf