Module rlp


This module enables RLP encoding of high-level objects.

RLP (recursive length prefix) is a common algorithm for encoding variable-length binary data. Data is RLP-encoded before it is stored on disk or transmitted over the network.

§Theory

Encoding


RLP itself can only deal with the “item” type, which is defined as:

  • Byte string (Bytes) or
  • Sequence of items (Vec, fixed array or slice).

Some examples are:

  • b'\x00\xff'
  • empty list vec![]
  • list of bytes vec![vec![0u8], vec![1u8, 3u8]]
  • list of combinations vec![vec![], vec![0u8], vec![vec![0]]]

The encoded result is always a byte string (sequence of u8).
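
The item definition can be modelled as a recursive enum. The sketch below is only an illustration: the `Item` type and the `example_items` helper are hypothetical names, not part of this library, and the inner `vec![]` of the last example is read as an empty list.

```rust
/// A hypothetical model of an RLP item: a byte string or a sequence of items.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Item {
    Bytes(Vec<u8>),
    List(Vec<Item>),
}

/// Builds the four example items listed above, in the same order.
pub fn example_items() -> Vec<Item> {
    use Item::{Bytes, List};
    vec![
        // b'\x00\xff'
        Bytes(vec![0x00, 0xff]),
        // empty list
        List(vec![]),
        // list of bytes
        List(vec![Bytes(vec![0]), Bytes(vec![1, 3])]),
        // list of combinations (inner `vec![]` taken as an empty list)
        List(vec![List(vec![]), Bytes(vec![0]), List(vec![Bytes(vec![0])])]),
    ]
}
```

Because `List` holds further `Item` values, nesting of any depth is representable, which is what makes the encoding recursive.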

Encoding algorithm


Given an item x as input, we define rlp_encode by the following algorithm:

Let concat be a function that joins the given bytes into a single byte sequence.

  1. If x is a single byte and 0x00 <= x <= 0x7F, rlp_encode(x) = x.
  2. Otherwise, if x is a byte string, let len(x) be the length of x in bytes and define the encoding as follows:
    • If 0 <= len(x) < 0x38 (note that the empty byte string fulfills this requirement), then
      rlp_encode(x) = concat(0x80 + len(x), x)
      In this case the first byte is in range [0x80; 0xB7].
    • If 0x38 <= len(x) <= 0xFFFFFFFFFFFFFFFF, then
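      rlp_encode(x) = concat(0xB7 + len(len(x)), len(x), x)
      Here len(x) is written as a big-endian integer without leading zeros, and len(len(x)) is the number of bytes in that integer. In this case the first byte is in range [0xB8; 0xBF].
    • For longer strings the encoding is undefined.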
  3. Otherwise, if x is a list, let s = concat(map(rlp_encode, x)) be the concatenation of the RLP encodings of all its items.
    • If 0 <= len(s) < 0x38 (note that the empty list matches), then
      rlp_encode(x) = concat(0xC0 + len(s), s)
      In this case the first byte is in range [0xC0; 0xF7].
    • If 0x38 <= len(s) <= 0xFFFFFFFFFFFFFFFF, then
      rlp_encode(x) = concat(0xF7 + len(len(s)), len(s), s)
      In this case the first byte is in range [0xF8; 0xFF].
    • For longer lists the encoding is undefined.

See more in the Ethereum wiki.
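
The three rules above can be sketched in a few lines of plain Rust. This is a minimal illustration using only the standard library, not the code this crate or fastrlp actually runs; `Item`, `length_bytes` and `with_header` are hypothetical names introduced for the sketch.

```rust
/// Hypothetical item type for this sketch: a byte string or a list of items.
pub enum Item {
    Bytes(Vec<u8>),
    List(Vec<Item>),
}

/// Big-endian bytes of `len` without leading zeros (its size is `len(len(x))`).
fn length_bytes(len: usize) -> Vec<u8> {
    let be = len.to_be_bytes();
    let first = be.iter().position(|&b| b != 0).unwrap_or(be.len());
    be[first..].to_vec()
}

/// Prepends the header to a payload; `offset` is 0x80 for strings, 0xC0 for lists.
fn with_header(offset: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 9);
    if payload.len() < 0x38 {
        // Short form: one prefix byte carries the payload length.
        out.push(offset + payload.len() as u8);
    } else {
        // Long form: the prefix byte carries the size of the length field.
        let len = length_bytes(payload.len());
        out.push(offset + 0x37 + len.len() as u8);
        out.extend_from_slice(&len);
    }
    out.extend_from_slice(payload);
    out
}

pub fn rlp_encode(item: &Item) -> Vec<u8> {
    match item {
        // Rule 1: a single byte below 0x80 is its own encoding.
        Item::Bytes(b) if b.len() == 1 && b[0] < 0x80 => b.clone(),
        // Rule 2: any other byte string gets a 0x80-based header.
        Item::Bytes(b) => with_header(0x80, b),
        // Rule 3: a list is a 0xC0-based header plus its encoded items.
        Item::List(items) => {
            let payload: Vec<u8> = items.iter().flat_map(rlp_encode).collect();
            with_header(0xC0, &payload)
        }
    }
}
```

For instance, `rlp_encode(&Item::Bytes(b"foo".to_vec()))` yields `0x83 0x66 0x6F 0x6F`, and a 56-byte string starts with `0xB8 0x38`, the shortest input that needs the long form.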

Encoding examples


x            rlp_encode(x)
b''          0x80
b'\x00'      0x00
b'\x0F'      0x0F
b'\x79'      0x79
b'\x80'      0x81 0x80
b'\xFF'      0x81 0xFF
b'foo'       0x83 0x66 0x6F 0x6F
[]           0xC0
[b'\x0F']    0xC1 0x0F
[b'\xEF']    0xC2 0x81 0xEF
[[], [[]]]   0xC3 0xC0 0xC1 0xC0
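
The first-byte ranges stated in the algorithm also work in the other direction: the first byte of an encoding is enough to tell which rule produced it. The sketch below illustrates that lookup; `Kind` and `classify` are hypothetical names, not part of this module.

```rust
/// Kinds of encoding that can be told apart by the first byte alone.
#[derive(Debug, PartialEq, Eq)]
pub enum Kind {
    SingleByte,
    ShortString,
    LongString,
    ShortList,
    LongList,
}

/// Classifies an encoding by its first byte, using the ranges from the algorithm.
pub fn classify(first: u8) -> Kind {
    match first {
        0x00..=0x7F => Kind::SingleByte,
        0x80..=0xB7 => Kind::ShortString,
        0xB8..=0xBF => Kind::LongString,
        0xC0..=0xF7 => Kind::ShortList,
        0xF8..=0xFF => Kind::LongList,
    }
}
```

Applied to the examples, `0x0F` is a single byte, `0x83` (the start of `b'foo'`) is a short string, and `0xC3` is a short list.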

Serialization


In the real world, however, the inputs are neither pure bytes nor lists. We need a way to encode numbers (like u64), custom structs, enums and other, more complex types that exist in the surrounding code.
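
As an illustration of how a number maps onto the item model, the sketch below encodes a `u64` following the usual Ethereum convention: the integer becomes a big-endian byte string with leading zero bytes removed, so zero becomes the empty string. The function name `rlp_encode_u64` is hypothetical, and it is an assumption here that the wrapped crate follows the same convention.

```rust
/// Encodes `n` as an RLP byte string: big-endian, leading zero bytes removed.
pub fn rlp_encode_u64(n: u64) -> Vec<u8> {
    let be = n.to_be_bytes();
    let first = be.iter().position(|&b| b != 0).unwrap_or(be.len());
    let bytes = &be[first..];
    match bytes {
        // A single byte below 0x80 is its own encoding.
        [b] if *b < 0x80 => vec![*b],
        // At most 8 bytes remain, so the short form (0x80 + length) always applies.
        _ => {
            let mut out = vec![0x80 + bytes.len() as u8];
            out.extend_from_slice(bytes);
            out
        }
    }
}
```

Under this convention `0` encodes as `0x80`, `15` as `0x0F` and `1024` as `0x82 0x04 0x00`.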

This library wraps the fastrlp crate, so everything its documentation says about the Encodable and Decodable traits still applies. You can implement them for any type to make it RLP-serializable.

However, following this approach directly results in cluttered code: your structs now have to use field types that match the serialization format, which may be very inconvenient.

To avoid this pitfall, this RLP implementation allows an “extended” struct definition via a macro. Let’s have a look at the Transaction definition:

use thor_devkit::rlp::{AsBytes, AsVec, Maybe, Bytes};
use thor_devkit::{rlp_encodable, U256};
use thor_devkit::transactions::{Clause, Reserved};

rlp_encodable! {
    /// Represents a single VeChain transaction.
    #[derive(Clone, Debug, Eq, PartialEq)]
    pub struct Transaction {
        /// Chain tag
        pub chain_tag: u8,
        pub block_ref: u64,
        pub expiration: u32,
        pub clauses: Vec<Clause>,
        pub gas_price_coef: u8,
        pub gas: u64,
        pub depends_on: Option<U256> => AsBytes<U256>,
        pub nonce: u64,
        pub reserved: Option<Reserved> => AsVec<Reserved>,
        pub signature: Option<Bytes> => Maybe<Bytes>,
    }
}

What’s going on here? First, some fields are encoded “as usual”: unsigned integers are encoded just fine, and you likely won’t need any different encoding. However, some fields work differently. depends_on is a number that may be present or absent, and it should be encoded as a byte string. U256 is already encoded this way, but None is not (Option is not RLP-serializable by itself). So we wrap it in a special wrapper, AsBytes: AsBytes<T> serializes Some(T) as T and None as an empty byte string.

reserved is a truly special struct that has a custom encoding implemented for it. That implementation serializes Reserved into a Vec<Bytes>, and then serializes this Vec<Bytes> to the output stream. If the field is None, an empty list is written instead. This is achieved via the AsVec annotation.

Maybe is the third special wrapper. A field annotated with Maybe may only be placed last (otherwise the encoding is ambiguous). With Maybe<T>, Some(T) is serialized as T and None is serialized as nothing (zero bytes are added).
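
The difference between the three wrappers comes down to what is written for None. The sketch below models only that behaviour in plain Rust; `NoneAs` and `write_optional` are hypothetical names for illustration and do not reflect how the wrappers are implemented in this crate.

```rust
/// How a missing value is written, mirroring the three wrappers described above.
#[derive(Clone, Copy)]
pub enum NoneAs {
    /// Like `AsBytes`: an empty byte string (0x80).
    EmptyBytes,
    /// Like `AsVec`: an empty list (0xC0).
    EmptyList,
    /// Like `Maybe`: nothing is written.
    Nothing,
}

/// Appends an optional, already RLP-encoded field to `out`.
pub fn write_optional(out: &mut Vec<u8>, encoded: Option<&[u8]>, mode: NoneAs) {
    match (encoded, mode) {
        // A present value is written as-is, whatever the wrapper.
        (Some(bytes), _) => out.extend_from_slice(bytes),
        (None, NoneAs::EmptyBytes) => out.push(0x80),
        (None, NoneAs::EmptyList) => out.push(0xC0),
        (None, NoneAs::Nothing) => {}
    }
}
```

Since the `Nothing` mode leaves no trace in the output, a decoder can only tell that the field is absent when no further fields follow it, which is why such a field has to come last.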

Field comments are omitted here for brevity; the macro preserves them as well.

This macro adds both decoding and encoding capabilities. See the examples folder for more examples of usage, including custom types and machinery.

This syntax is not restricted to these three wrappers. You can use any types with the proper From implementations:

use thor_devkit::rlp_encodable;

#[derive(Clone)]
struct MySeries {
    left: [u8; 2],
    right: [u8; 2],
}

impl From<MySeries> for u32 {
    fn from(value: MySeries) -> Self {
        // Reassemble the four bytes in big-endian order.
        let [a, b] = value.left;
        let [c, d] = value.right;
        Self::from_be_bytes([a, b, c, d])
    }
}
impl From<u32> for MySeries {
    fn from(value: u32) -> Self {
        let [a, b, c, d] = value.to_be_bytes();
        Self { left: [a, b], right: [c, d] }
    }
}

rlp_encodable! {
    pub struct Foo {
        pub foo: MySeries => u32,
    }
}
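
Since the field is converted to the wire type in one direction and back in the other, the two From implementations should be inverses of each other. The std-only sketch below checks that property for a copy of MySeries without involving the macro; `round_trip` is a hypothetical helper that stands in for an encode/decode cycle.

```rust
#[derive(Clone, Debug, PartialEq)]
struct MySeries {
    left: [u8; 2],
    right: [u8; 2],
}

impl From<MySeries> for u32 {
    fn from(value: MySeries) -> Self {
        let [a, b] = value.left;
        let [c, d] = value.right;
        Self::from_be_bytes([a, b, c, d])
    }
}

impl From<u32> for MySeries {
    fn from(value: u32) -> Self {
        let [a, b, c, d] = value.to_be_bytes();
        Self { left: [a, b], right: [c, d] }
    }
}

/// Converts to the wire type and back, standing in for an encode/decode cycle.
fn round_trip(series: MySeries) -> MySeries {
    let wire: u32 = series.into();
    MySeries::from(wire)
}
```

If `round_trip` ever returned a different value than it was given, a decoded struct would silently differ from the one that was encoded.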

Structs§

  • Bytes: A cheaply cloneable and sliceable chunk of contiguous memory.
  • BytesMut: A unique reference to a contiguous slice of memory.
  • Header

Enums§

  • AsBytes: Serialization wrapper for Option to serialize None as empty Bytes.
  • AsVec: Serialization wrapper for Option to serialize None as empty Vec.
  • Maybe: Serialization wrapper for Option to serialize None as nothing (do not modify output stream). This must be the last field in the struct.
  • RLPError

Traits§

  • Buf: Read bytes from a buffer.
  • BufMut: A trait for values that provide sequential write access to bytes.
  • Decodable
  • Encodable

Type Aliases§

  • RLPResult: Convenience alias for a result of fallible RLP decoding.