Module encoding

Module encoding 

Source
Expand description

§Unambiguous encoding

The core of the crate is functionality to unambiguously encode any structured data into bytes. It’s then used to digest any data into a hash. Note that this module provides low-level implementation details which you normally don’t need to know unless you manually implement Digestable trait.

Any structured value is encoded either as a bytestring or as a list (each element within the list is either a bytestring or a list). The simplified grammar can be seen as (the full grammar is presented and explained below):

value ::= leaf | list
leaf  ::= bytestring
list  ::= [value]

Encoding goal is to distinguish ["12", "3"] from ["1", "23"], ["1", [], "2"] from ["1", "2"] and so on. Now, we only need to map any structured data onto that grammar to have an unambiguous encoding. Below, we will show how Rust structures can be mapped onto the lists, and then we describe how exactly encoding works.

§Mapping Rust types onto lists

§Structure

Structures can be encoded as a list of field names followed by field values. For instance, a structure

struct Person {
    name: String,
    job_title: String,
}
let alice = Person {
    name: "Alice".into(),
    job_title: "cryptographer".into(),
};

will be encoded as a list below:

["name", "Alice", "job_title", "cryptographer"]

EncodeStruct can be used to encode a struct.

§Enum

Enums are encoded as a list that contains a variant name, and then, same as for a structure, it contains fields names followed by field values associated with the variant (if any). For instance, an enum

enum Shape {
    Circle { radius: u8 },
    Square { width: u8 },
}
let circle = Shape::Circle { radius: 5 };

will be encoded as a list:

["variant", "Circle", "radius", 5_u32]

EncodeEnum can be used to encode an enum.

§Primitive types

Primitive values can be encoded as bytestrings as long as they can be unambiguously converted to bytes. For instance, strings are trivially converted to bytes via str::as_bytes.

Unsigned integers are encoded in big-endian representation truncated from leading zeroes. It allows us to unambiguously encode usize on different machines even if they have different size of machine word.

Signed integers are encoded as concatenation of their sign and byte representation of their absolute value.

§Domain separation

When value is encoded into bytes, it loses its type. For instance, “abcd” bytestring may correspond to Vec<u8>, String, u32 and so on. When it’s required to distinguish one type from another, domain separation tag can be used.

Domain separation tag can be specified for any value using .with_tag() method.

It’s recommended to specify the tag only for high-level structures.

§Encoding lists into bytes

Generally, when a value is encoded, firstly we write its byte representation and then append its metadata like length and type (leaf of list). Writing a length after the value makes it possible to encode byte strings and lists the length of which is not known in advance.

Any value is encoded according to this grammar specification:

value    ::= leaf | leaf_ctx | list | list_ctx

leaf     ::= bytestring len(bytestring) LEAF
leaf_ctx ::= bytestring len(bytestring) tag len(tag) LEAF_CTX

list     ::= [value] len([value]) LIST
list_ctx ::= [value] len([value]) ctx len(ctx) LIST_CTX

len(n) ::=
  if n.len() <= u32::MAX {
    (n.len() as u32) LEN_32
  } else {
    let len_n = n.len().to_be_bytes().strip();
    assert!(len_n.len() < 256);
    len_n (len_n.len() as u8) BIGLEN
  }

LIST     ::= 1
LIST_CTX ::= 2
LEAF     ::= 3
LEAF_CTX ::= 4
LEN_32   ::= 5
BIGLEN   ::= 6

§Example

A structured data below

struct Person {
    name: String,
    skills: Vec<String>,
    job_title: String,
}
let alice = Person {
    name: "Alice".into(),
    skills: vec!["math".into(), "crypto".into()],
    job_title: "cryptographer".into(),
};

will be encoded as a list:

["name", "Alice", "skills", ["math", "crypto"], "job_title", "cryptographer"]

which will be translated into the bytes:

// Writes "name", "Alice" onto the stack
"name"  4_u32 LEN_32 LEAF
"Alice" 5_u32 LEN_32 LEAF
// Writes her skills onto the stack
"skills" 6_u32 LEN_32 LEAF
"math"   4_u32 LEN_32 LEAF
"crypto" 6_u32 LEN_32 LEAF
// Merges the last 2 elements from the stack into a list
2_u32 LEN_32 LIST
// Writes her job title
"job_title"     9_u32  LEN_32 LEAF
"cryptographer" 13_u32 LEN_32 LEAF
// Merges the last 6 elements from the stack into a list
6_u32 LEN_32 LIST

where LEAF, LIST, and LEN_32 are constants defined above.

Structs§

BufferDigestdigest
Wraps digest::Digest and implements Buffer
BufferUpdatedigest
Wraps digest::Update and implements Buffer
EncodeEnum
Encodes an enum
EncodeLeaf
Encodes a leaf (bytestring)
EncodeList
Encodes a list of values
EncodeStruct
Encodes a structure
EncodeValue
Encodes a value

Constants§

BIGLEN
Control symbol
LEAF
Control symbol
LEAF_CTX
Control symbol
LEN_32
Control symbol
LIST
Control symbol
LIST_CTX
Control symbol

Traits§

Buffer
A buffer that exposes append-only access

Functions§

encode_len
Encodes length of list or leaf