Crate fourleaf

fourleaf is a simple, efficient, and reasonably compact format and library for serialising Rust values.

Introduction

Features

  • Non-allocating serialisation and deserialisation.

  • Byte slices can be borrowed from the input instead of copied.

  • Explicit tagging makes both backward- and forward-compatibility easy to maintain as desired.

  • Support for in-band padding, errors, or other signalling.

Why use fourleaf?

  • You want a binary data format, so JSON/TOML/etc is out.

  • You need to serialise large blobs, so CBOR is out.

  • You want to avoid copying large blobs, which requires library support (e.g., not available in serde).

  • You want to serialise/deserialise mundane Rust data types, so protobufs / flatbuffers are out.

  • You want fine control over compatibility.

  • You want to make a stream protocol without needing an extra framing mechanism.

Why not to use fourleaf

  • You want a self-describing data format. fourleaf requires the reader to already have a good idea of what it is deserialising. If pre-agreed schemas are not available, fourleaf likely isn't the right choice.

  • You want support in something other than Rust. There are no fourleaf decoders available for other languages nor any plans to write one any time soon (though doing so likely wouldn't be too difficult).

  • You already have serde working on your data and want to keep it that way. fourleaf does not (and cannot) integrate with serde.

Getting Started

First, we need some data structures we want to [de]serialise. For the example here, we'll stick with a couple of relatively basic things.

#[derive(Debug, PartialEq)]
struct Widget {
  name: String,
  manufacturer: Option<String>,
  count: u64,
}

#[derive(Debug, PartialEq)]
enum Order {
  Purchase(Vec<Widget>),
  Notice(String),
}

The Serialize and Deserialize traits are used to control fourleaf serialisation and deserialisation, respectively. Implementing these by hand is extremely tedious, but you can easily use the fourleaf_retrofit! macro to generate them from a concise definition. Note that this is a declaration separate from the data types themselves, and so is unfortunately a bit redundant. A more concise macro may be added in a future version of fourleaf.

When defining the format for a struct or enum variant, you need to choose a "tag" for each field. A tag is an integer between 1 and 63, inclusive, which is how the field is identified in the binary format. For an enum, each variant must also be assigned a numeric discriminant, which may be an arbitrary u64. Obviously, each field in the same structure must have a unique tag, and each variant in an enum must have a unique discriminant.
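The tag rules above can be pictured as a small check. This is only an illustrative sketch: `validate_tags` is a hypothetical helper, not part of fourleaf, and nothing here says whether or how `fourleaf_retrofit!` itself enforces these rules.

```rust
/// Hypothetical helper: checks that every tag lies in 1..=63 and that no
/// tag is used twice within one struct or enum variant.
fn validate_tags(tags: &[u8]) -> Result<(), String> {
    // Bit n of `seen` is set once tag n has been encountered.
    let mut seen = 0u64;
    for &tag in tags {
        if tag < 1 || tag > 63 {
            return Err(format!("tag {} is outside 1..=63", tag));
        }
        let bit = 1u64 << tag;
        if seen & bit != 0 {
            return Err(format!("tag {} is used more than once", tag));
        }
        seen |= bit;
    }
    Ok(())
}
```

For the `Widget` definition, the tags would be 1, 2 and 3, which this check accepts.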

See the fourleaf_retrofit! macro documentation for full information on how that macro works.

Here's how the definition for our data above might look:

#[macro_use] extern crate fourleaf;

fourleaf_retrofit!(struct Widget : {} {} {
  |_context, this|
  [1] name: String = &this.name,
  [2] manufacturer: Option<String> = &this.manufacturer,
  [3] count: u64 = this.count,
  { Ok(Widget { name: name, manufacturer: manufacturer, count: count }) }
});

fourleaf_retrofit!(enum Order : {} {} {
  |_context|
  [1] Order::Purchase(ref widgets) => {
    [1] widgets: Vec<Widget> = widgets,
    { Ok(Order::Purchase(widgets)) }
  },
  [2] Order::Notice(ref text) => {
    [1] text: String = text,
    { Ok(Order::Notice(text)) }
  },
});

And that's it! These structures can now be subject to fourleaf serialisation.

#[macro_use] extern crate fourleaf;
let defunct_widget = Widget { name: "Defunct".to_owned(),
                              manufacturer: None,
                              count: 42 };
let serialised = fourleaf::to_vec(&defunct_widget).unwrap();
assert_eq!(b"\x81\x07Defunct\x43\x2A\x00", &serialised[..]);
// Type annotation not required in this case, but included for clarity.
let deserialised = fourleaf::from_slice_copy::<Widget>(
  &serialised, &fourleaf::DeConfig::default()).unwrap();
assert_eq!(defunct_widget, deserialised);

let modern_widget = Widget { name: "Modern".to_owned(),
                             manufacturer: Some("Widgedyne".to_owned()),
                             count: 5 };
let serialised = fourleaf::to_vec(&modern_widget).unwrap();
assert_eq!(b"\x81\x06Modern\x82\x09Widgedyne\x43\x05\x00",
           &serialised[..]);
let deserialised = fourleaf::from_slice_copy::<Widget>(
  &serialised, &fourleaf::DeConfig::default()).unwrap();
assert_eq!(modern_widget, deserialised);

let order = Order::Notice("nothing today".to_owned());
let serialised = fourleaf::to_vec(&order).unwrap();
assert_eq!(b"\x01\x02\x81\x0Dnothing today\x00\x00",
           &serialised[..]);
let deserialised = fourleaf::from_slice_copy::<Order>(
  &serialised, &fourleaf::DeConfig::default()).unwrap();
assert_eq!(order, deserialised);

Notice the configuration that is passed in to the deserialisation functions. By default, fourleaf uses fairly conservative limits on struct recursion and allocations made for things like Vecs and Strings. If you have deeply nested structures, large collections, or large strings, you may need to adjust the configuration as desired.

The fourleaf format

The fourleaf format is built around exactly four types:

  • Arbitrary-width integers. (But note that the current implementation is limited to 64 bits.)

  • Blobs (i.e., arbitrary byte arrays).

  • Structs, or sequences of tag/value pairs terminated with an EndOfStruct marker.

  • Enums, essentially structs prefixed with an integer discriminant.

Notably absent from this list are "null" and collections of any kind. This is because fourleaf essentially models every struct field as being a collection in and of itself by repeating the field as many times as needed; e.g., a plain u32 simply restricts that collection to be exactly one element, whereas Option<u32> allows it to be zero or one, and Vec<u32> allows arbitrary repetition.
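One way to picture this model is as a permitted range of repetition counts per field. The sketch below is purely illustrative; the `Cardinality` type and its `accepts` method are hypothetical names and do not appear in fourleaf's API.

```rust
/// Hypothetical model of how many times a field may repeat on the wire.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cardinality {
    /// A plain value such as `u32`: exactly one occurrence.
    ExactlyOne,
    /// `Option<T>`: zero or one occurrence.
    ZeroOrOne,
    /// `Vec<T>` and similar collections: any number of occurrences.
    Any,
}

impl Cardinality {
    /// Returns whether `count` occurrences of the field are acceptable.
    fn accepts(self, count: usize) -> bool {
        match self {
            Cardinality::ExactlyOne => count == 1,
            Cardinality::ZeroOrOne => count <= 1,
            Cardinality::Any => true,
        }
    }
}
```

Seen this way, a field that is absent from the input is simply a field with zero occurrences, which `ZeroOrOne` and `Any` accept and `ExactlyOne` rejects.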

Because of this, the exact way a type is serialised is somewhat context-sensitive. There are three general contexts:

  • "Struct body", which is also the top level. Things which are serialised as structs are written without any kind of header; other things get wrapped in a single-field struct.

  • "Struct field", where a value is directly contained within a struct. Here, collections are flattened as described above.

  • "Collection element", where a value must be represented as exactly one tag/value pair. In general, types which always serialise to exactly one tag/value pair behave the same as in the "struct field" context, but collections and so forth wrap themselves in a struct which contains all their values.

To illustrate, let's start with a simple structure.

struct S(u32, Option<u32>, Vec<u32>);

fourleaf_retrofit!(struct S : {} {} {
  |_context, this|
  [1] a: u32 = this.0,
  [2] b: Option<u32> = this.1,
  [3] c: Vec<u32> = &this.2,
  { Ok(S(a, b, c)) }
});

If we serialise the value S(42, None, vec![]), we get the following:

41 2a       ; Field tag=1 type=integer value=42
00          ; End of struct

Notice that there is no "start of struct" at the top level. Note also that fields b and c are totally unrepresented in the serialised form. Since Option and Vec are both treated as collections, and a collection with n elements is represented as n repetitions of the field, there are thus 0 repetitions of the field. If we instead populate everything, for example with S(42, Some(1), vec![2, 3]), we get

41 2a       ; Field tag=1 type=integer value=42
42 01       ; Field tag=2 type=integer value=1
43 02       ; Field tag=3 type=integer value=2
43 03       ; Field tag=3 type=integer value=3
00          ; End of struct

We can see here that field c was simply handled by writing two instances of the field without any wrapping. That is because the Vec<u32> is in "struct field" context.
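The two dumps above can be reproduced by hand. The following is a minimal standalone sketch that does not use fourleaf at all; the constants and the functions `put_integer`, `put_integer_field` and `encode_s` are hypothetical names chosen for illustration, and the byte values are taken from the dumps and from the "Physical Format" section.

```rust
/// Type bits for an integer-valued field (upper two bits of the header byte).
const TYPE_INTEGER: u8 = 0x40;
/// Special element that terminates a struct.
const END_OF_STRUCT: u8 = 0x00;

/// Appends `value` as 7-bit little-endian groups; every byte except the
/// last has its high bit set.
fn put_integer(out: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Appends one integer field with the given tag (1 to 63).
fn put_integer_field(out: &mut Vec<u8>, tag: u8, value: u64) {
    assert!(tag >= 1 && tag <= 63, "tag out of range");
    out.push(TYPE_INTEGER | tag);
    put_integer(out, value);
}

/// Encodes a value shaped like `S(a, b, c)` in "struct body" context.
fn encode_s(a: u32, b: Option<u32>, c: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    put_integer_field(&mut out, 1, a as u64);
    // `Option` and `Vec` are flattened: one field per element present.
    if let Some(v) = b {
        put_integer_field(&mut out, 2, v as u64);
    }
    for &v in c {
        put_integer_field(&mut out, 3, v as u64);
    }
    out.push(END_OF_STRUCT);
    out
}
```

Calling `encode_s(42, None, &[])` yields the bytes of the first dump, and `encode_s(42, Some(1), &[2, 3])` yields those of the second.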

This flat representation obviously can't work when collections are nested, since there would be no way to recreate the nesting. This is why "collection element" context exists. We see it, for example, if we serialise vec![Some(42u32), None]:

c1          ; field tag=1 type=struct (element of vec)
  41 2a     ; field tag=1 type=integer value=42
  00        ; end of struct (element of vec)
c1          ; field tag=1 type=struct (element of vec)
  00        ; end of struct (element of vec)
00          ; end of struct (top-level)

There are two interesting things here. First, since Vec finds itself at top-level, it is in "struct body" context, and so serialises as if it were a field of tag 1 in a struct containing just that field. Second, because Option is inside a collection, it instead nests itself inside a struct in a similar way. In the case of None, this inner struct ends up being completely empty.
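As a rough standalone sketch of the nesting just described (again without using fourleaf itself), the function below writes each `Option<u32>` element as its own struct under tag 1. The names `encode_top_level` and `put_integer` are hypothetical, and the sketch only covers this one shape of data.

```rust
const TYPE_INTEGER: u8 = 0x40;
const TYPE_STRUCT: u8 = 0xC0;
const END_OF_STRUCT: u8 = 0x00;

/// Appends `value` as 7-bit little-endian groups with continuation bits.
fn put_integer(out: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Encodes a top-level `Vec<Option<u32>>` the way the dump above shows.
fn encode_top_level(items: &[Option<u32>]) -> Vec<u8> {
    let mut out = Vec::new();
    for item in items {
        // "Collection element" context: each `Option` wraps itself in a
        // struct, written as a struct-typed field with tag 1.
        out.push(TYPE_STRUCT | 1);
        if let Some(v) = *item {
            // Inside that struct the value is an ordinary field with tag 1.
            out.push(TYPE_INTEGER | 1);
            put_integer(&mut out, v as u64);
        }
        // `None` leaves the inner struct empty.
        out.push(END_OF_STRUCT);
    }
    // End of the implicit top-level struct.
    out.push(END_OF_STRUCT);
    out
}
```

With `&[Some(42), None]` as input this produces `c1 41 2a 00 c1 00 00`, matching the dump.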

Built-in types

fourleaf ships with built-in support for a large portion of std. Particularly, it aims to support everything that serde does out-of-the-box. The floating-point types are a notable exception right now; they do not currently have a defined fourleaf representation.

All integer types serialise to integers. Signed integers are ZigZagged rather than sign-extended. bool is treated as an integer which is either 0 or 1.
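For reference, the usual ZigZag mapping (as used by protobufs) looks like the sketch below. It is written for 64-bit values and is assumed, not verified, to match what fourleaf's own `zigzag` function does; the function names here are illustrative.

```rust
/// Maps a signed integer to an unsigned one so that values of small
/// magnitude, positive or negative, become small unsigned values.
fn zigzag(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag`.
fn unzigzag(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}
```

Under this mapping 0, -1, 1, -2 become 0, 1, 2, 3 respectively, so a small negative number still fits in a single byte on the wire.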

PhantomData serialises to integer 0.

Slices, Vec, VecDeque, LinkedList, BinaryHeap, BTreeSet, and HashSet serialise the way collections were described above, except that &[u8] and Vec<u8> serialise to blobs instead of collections of integers. (Other collections have no special behaviour for u8.)

Option<T> serialises as a collection of T.

Arrays of size 0 to 32, as well as all powers of 2 up to 2^24, serialise the same way as slices of the same type, including the special behaviour of u8; but note that deserialising arrays larger than 32 elements requires the elements to be both Copy and Default.

HashMap<K,V> and BTreeMap<K,V> serialise the same way as [(K,V)].

Tuples with 0 to 15 elements, inclusive, serialise as structs with sequential tags for each field starting from 1.

String and &str serialise to blobs.

&T, &mut T, Box<T>, Rc<T>, and Arc<T> serialise the same way as T.

A number of other std types are supported; see retrofit.rs in the repository for their exact definitions.

Zero-Copy Support

The types &[u8] and &str must, and Cow of the same things can, be used in "zero-copy" mode. In zero-copy mode, the deserialised values will reference the input buffer itself instead of being copied, which is obviously faster and requires less memory, but does make management more difficult and requires buffering the whole input first. In the case of Cow, this behaviour is selectable via the _copy vs _borrow functions, or the STYLE generic parameter to Deserialize when using the trait directly.

#[macro_use] extern crate fourleaf;
use std::borrow::Cow;
use std::io::Read;

#[derive(Debug, PartialEq)]
struct ZeroCopyOnly<'a> {
  s: &'a str,
}

#[derive(Debug, PartialEq)]
struct EitherMode<'a> {
  s: Cow<'a, str>,
}

fourleaf_retrofit!(struct ZeroCopyOnly<'a> : {
  impl<'a> fourleaf::Serialize for ZeroCopyOnly<'a>
} {
  impl<'a, R : Read, STYLE> fourleaf::Deserialize<R, STYLE>
  for ZeroCopyOnly<'a> where &'a str: fourleaf::Deserialize<R, STYLE>
} {
  |_context, this|
  [1] s: &'a str = this.s,
  { Ok(ZeroCopyOnly { s: s }) }
});
fourleaf_retrofit!(struct EitherMode<'a> : {
  impl<'a> fourleaf::Serialize for EitherMode<'a>
} {
  impl<'a, R : Read, STYLE> fourleaf::Deserialize<R, STYLE>
  for EitherMode<'a> where Cow<'a,str>: fourleaf::Deserialize<R, STYLE>
} {
  |_context, this|
  [1] s: Cow<'a,str> = this.s,
  { Ok(EitherMode { s: s }) }
});

// Some data we want to deserialise. It needs to be in a contiguous buffer.
// Here we copy it into a local buffer and borrow that to demonstrate that this
// works without a `'static` buffer.
let data = b"\x81\x0Bhello world\x00".to_owned();
let data = &data[..];
// Do a zero-copy parse of data.
let value = fourleaf::from_slice_borrow::<ZeroCopyOnly>(
  data, &fourleaf::DeConfig::default()).unwrap();
assert_eq!(ZeroCopyOnly { s: "hello world" }, value);
// Not only is it the expected value, but the string is also pointing into
// `data`.
assert_eq!(&data[2..13] as *const [u8], value.s.as_bytes() as *const [u8]);

// This line would not compile, because `&str` does not support `Copying`
// mode since it has no place to copy to.
// let value = fourleaf::from_slice_copy::<ZeroCopyOnly>( // Compile error
//   data, &fourleaf::DeConfig::default()).unwrap();

// With `Cow`, we also can do zero-copy.
let value = fourleaf::from_slice_borrow::<EitherMode>(
  data, &fourleaf::DeConfig::default()).unwrap();
assert_eq!(EitherMode { s: Cow::Borrowed("hello world") }, value);
assert_eq!(&data[2..13] as *const [u8], value.s.as_bytes() as *const [u8]);

// And `Cow` supports copying mode as well. This also lets us use a
// `'static` lifetime since the life of the result does not depend on the
// life of the input.
let value = fourleaf::from_slice_copy::<EitherMode<'static>>(
  data, &fourleaf::DeConfig::default()).unwrap();
match value.s {
  Cow::Owned(ref s) => assert_eq!("hello world", s),
  _ => panic!("Didn't copy"),
}

Maintaining Compatibility

A large focus of fourleaf — both the format and the implementation — was the ability to maintain compatibility between older and newer software. Compatibility comes down to three aspects:

  • Backward-compatibility; whether a newer software version can understand values written by an older version.

  • Forward-compatibility; whether an older software version can, to a reasonable extent, handle values written by a newer version.

  • Edit-compatibility; whether a program can perform read-modify-write operations on the subset of serialised data it understands without destroying serialised data it does not understand.

Backwards-compatibility

The set of possible changes to a type which are backwards-compatible mostly flow naturally from the serialised format.

  • Adding a field is backwards-compatible as long as the type accepts a cardinality of 0 (e.g., Option, collections).

  • Widening an integer type is backwards-compatible.

  • Narrowing an integer type is backwards-compatible with the subset of values which still fall within the new range.

  • Changing the signedness of an integer type is not backwards-compatible.

  • Changing a non-collection type field to a collection of the original type is backwards-compatible (but the same change at top-level or within a collection is not).

  • Widening the set of acceptable cardinalities for a collection is backwards-compatible.

  • Deleting a field is backwards-compatible as long as ignore_unknown_fields is left enabled or the container has an unknown field handler.

  • Adding an enum variant is backwards-compatible.
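The integer rules in this list follow from how integers sit on the wire. The standalone sketch below illustrates two of them; `read_narrowed` and `read_as_signed` are hypothetical helpers, and how fourleaf actually reports an out-of-range value is not shown here.

```rust
use std::convert::TryFrom;

/// A field narrowed from u64 to u32 can still be read, but only when the
/// value on the wire fits into the new range.
fn read_narrowed(wire_value: u64) -> Option<u32> {
    u32::try_from(wire_value).ok()
}

/// A field changed from unsigned to signed would be un-ZigZagged on read,
/// so an old unsigned value is silently reinterpreted.
fn read_as_signed(wire_value: u64) -> i64 {
    ((wire_value >> 1) as i64) ^ -((wire_value & 1) as i64)
}
```

An old unsigned 42 read through `read_narrowed` is still 42, whereas the same bytes read through `read_as_signed` come back as 21, which is why a signedness change is listed as incompatible.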

In many cases, it is possible to "paper over" compatibility concerns entirely in the code in fourleaf_retrofit!. For example:

#[macro_use] extern crate fourleaf;

mod v1 {
  pub struct Message {
    pub target: u64
  }
  fourleaf_retrofit!(struct Message : {} {} {
    |_context, this|
    [1] target: u64 = this.target,
    { Ok(Message { target: target }) }
  });
}

mod v2 {
  pub struct Message {
    pub target: u64,
    // New in version 2: mandatory flag
    pub frobnicate: bool,
  }
  fourleaf_retrofit!(struct Message : {} {} {
    |_context, this|
    [1] target: u64 = this.target,
    // Version 1 did not include this field
    [2] frobnicate: Option<bool> = Some(this.frobnicate),
    { Ok(Message { target: target,
                   frobnicate: frobnicate.unwrap_or(false) }) }
  });
}

// Write a message with the V1 schema...
let old_message = fourleaf::to_vec(v1::Message { target: 42 }).unwrap();
// .. and then decode it with the V2 schema.
let message = fourleaf::from_slice_copy::<v2::Message>(
  &old_message, &fourleaf::DeConfig::default()).unwrap();

assert_eq!(42, message.target);
// Code outside of deserialisation doesn't need to care about the
// compatibility issue.
assert!(!message.frobnicate);

Forwards-compatibility

Forwards-compatibility is largely the reverse of backwards-compatibility; i.e., the change from version 1 to version 2 is forwards-compatible if a change from version 2 to version 1 would be backwards-compatible.

Forwards-compatibility can be more difficult, though, since compatibility workarounds must be done in the serialisation side of the new version.

Edit-compatibility

In some cases, it is desirable to allow older versions to manipulate data written by newer versions while preserving things they don't understand. This can be accomplished via a catch-all "unknown fields" field on structs, and an "unknown variant" variant on enums. Beware that unlike other compatibility concerns, this cannot be confined to [de]serialisation logic; handling of unknowns becomes somewhat pervasive since it must be reflected in the underlying types.

Here is an example demonstrating both features:

#[macro_use] extern crate fourleaf;

mod v1 {
  use fourleaf;
  use fourleaf::adapt::Copied;

  pub enum Operation {
    Create,
    Delete,
    // For future expansion, if a new enum variant is added, its
    // discriminant and inner fields are stored here instead of raising an
    // error.
    Unknown(u64, fourleaf::UnknownFields<'static>),
  }

  pub struct Message {
    pub id: u32,
    pub operation: Operation,
    // Unknown fields will be saved here.
    pub unknown: fourleaf::UnknownFields<'static>,
  }

  fourleaf_retrofit!(enum Operation : {} {} {
    |_context|
    [1] Operation::Create => {
      { Ok(Operation::Create) }
    },
    [2] Operation::Delete => {
      { Ok(Operation::Delete) }
    },
    (?) Operation::Unknown(discriminant, ref fields) => {
      (=) discriminant: u64 = discriminant,
      (?) fields: Copied<fourleaf::UnknownFields<'static>> = fields,
      { Ok(Operation::Unknown(discriminant, fields.0)) }
    }
  });

  fourleaf_retrofit!(struct Message : {} {} {
    |_context, this|
    [1] id: u32 = this.id,
    [2] operation: Operation = &this.operation,
    (?) unknown: Copied<fourleaf::UnknownFields<'static>> = &this.unknown,
    { Ok(Message { id: id, operation: operation, unknown: unknown.0 }) }
  });
}


mod v2 {
  use fourleaf;
  use fourleaf::adapt::Copied;

  pub enum Operation {
    Create,
    Delete,
    // New in v2. We could also have `UnknownFields` in here, but that has
    // been elided here for clarity.
    RenameTo(u32),
    // For future expansion, if a new enum variant is added, its
    // discriminant and inner fields are stored here instead of raising an
    // error.
    Unknown(u64, fourleaf::UnknownFields<'static>),
  }

  pub struct Message {
    pub id: u32,
    pub operation: Operation,
    // New in v2
    pub frobnicate: bool,
    // Unknown fields will be saved here.
    pub unknown: fourleaf::UnknownFields<'static>,
  }

  fourleaf_retrofit!(enum Operation : {} {} {
    |_context|
    [1] Operation::Create => {
      { Ok(Operation::Create) }
    },
    [2] Operation::Delete => {
      { Ok(Operation::Delete) }
    },
    [3] Operation::RenameTo(id) => {
      [1] id: u32 = id,
      { Ok(Operation::RenameTo(id)) }
    },
    (?) Operation::Unknown(discriminant, ref fields) => {
      (=) discriminant: u64 = discriminant,
      (?) fields: Copied<fourleaf::UnknownFields<'static>> = fields,
      { Ok(Operation::Unknown(discriminant, fields.0)) }
    }
  });

  fourleaf_retrofit!(struct Message : {} {} {
    |_context, this|
    [1] id: u32 = this.id,
    [2] operation: Operation = &this.operation,
    [3] frobnicate: Option<bool> = this.frobnicate,
    (?) unknown: Copied<fourleaf::UnknownFields<'static>> = &this.unknown,
    { Ok(Message { id: id, operation: operation,
                   frobnicate: frobnicate.unwrap_or(false),
                   unknown: unknown.0 }) }
  });
}

let mut config = fourleaf::DeConfig::default();
// Fail deserialisation if we would destroy anything.
config.ignore_unknown_fields = false;

// A v2 program creates a `Message` and serialises it.
let data = fourleaf::to_vec(v2::Message {
  id: 42,
  frobnicate: true,
  operation: v2::Operation::RenameTo(56),
  unknown: Default::default(),
}).unwrap();

// Now a v1 program reads it in and edits a field and reserialises it.
let mut val = fourleaf::from_slice_copy::<v1::Message>(&data, &config)
    .unwrap();
assert_eq!(42, val.id);
val.id = 99;
let data = fourleaf::to_vec(val).unwrap();

// Finally, another v2 program reads that in. The non-v1 things are still
// preserved.
let val = fourleaf::from_slice_copy::<v2::Message>(&data, &config)
    .unwrap();
assert_eq!(99, val.id);
assert!(val.frobnicate);
match val.operation {
  v2::Operation::RenameTo(56) => (),
  _ => panic!("wrong operation"),
}

Limitations

This Implementation

Integers wider than 64 bits are not supported.

Inputs longer than 16 EB are not supported. Some operations are not supported on streams longer than 8 EB. Structs/enums cannot be nested more than 16 quintillion levels deep.

The high-level deserialisation mechanism will construct each declared type on the stack before moving it to its final location. I.e., a large Vec<u64> is fine, but using a [u64;16777216] is probably unwise.

Artificial

fourleaf places artificial limits on the data that can be deserialised via the high-level API in order to help harden programs against malicious inputs. If you run afoul of these limits, you can change them by modifying the Config object.

When copying byte slices into a buffer (e.g., Vec<u8> or String), the configuration field max_blob places a cap on the largest total size of blobs to be deserialised in this way. Larger blobs, or large numbers of smaller blobs, will result in an error. max_blob defaults to 64 kB.

When populating a collection that has an unbounded cardinality (e.g., Vec<u64>, HashMap<String, String>), an error will occur if the total length of all such collections exceeds max_collect. max_collect defaults to 256. This limit even applies to UnknownFields.

The recursion_limit configuration sets the maximum nesting depth that will be deserialised.
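Note that max_blob and max_collect are described as caps on running totals rather than on any single value. A rough way to picture that accounting is the sketch below; the `Budget` type and its methods are hypothetical and say nothing about how fourleaf implements the limits internally.

```rust
/// Hypothetical stand-in for the cumulative limits described above.
struct Budget {
    blob_remaining: usize,
    collect_remaining: usize,
}

impl Budget {
    fn new(max_blob: usize, max_collect: usize) -> Self {
        Budget { blob_remaining: max_blob, collect_remaining: max_collect }
    }

    /// Charges `len` copied bytes against the shared blob budget.
    fn charge_blob(&mut self, len: usize) -> Result<(), &'static str> {
        match self.blob_remaining.checked_sub(len) {
            Some(rest) => {
                self.blob_remaining = rest;
                Ok(())
            }
            None => Err("max_blob exceeded"),
        }
    }

    /// Charges one element added to any unbounded collection.
    fn charge_element(&mut self) -> Result<(), &'static str> {
        match self.collect_remaining.checked_sub(1) {
            Some(rest) => {
                self.collect_remaining = rest;
                Ok(())
            }
            None => Err("max_collect exceeded"),
        }
    }
}
```

In this picture many small strings can exhaust the budget just as one large string can, which matches the "large numbers of smaller blobs" wording above.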

Physical Format

Knowing how fourleaf elements translate to bytes is not required to use fourleaf, but may help when debugging issues or writing alternate implementations.

A fourleaf stream is a sequence of elements. Each element begins with a single byte. The upper two bits of this byte are the "type", and the lower 6 bits are the "tag".

If the tag is zero, this is a special element, and the types map as follows:

  • 00 — End of struct.
  • 40 — End of document.
  • 80 — Exception. Followed by a blob.
  • C0 — Padding. Readers are usually expected to ignore padding.

If the tag is non-zero, this is a struct field. The tag identifies the field being described, and the type is one of:

  • 00 — Enum. Followed by an integer indicating the discriminant, then elements specifying fields for the enum body.

  • 40 — Integer. Followed by an integer indicating the value.

  • 80 — Blob. Followed by a blob indicating the value.

  • C0 — Struct. Followed by elements specifying fields for the struct body.
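The two tables above amount to a simple split of the header byte. Here is a minimal sketch of that decoding step; the `Element` and `FieldKind` enums and the `decode_header` function are hypothetical names for illustration and are not taken from fourleaf's wire module.

```rust
#[derive(Debug, PartialEq)]
enum FieldKind {
    Enum,
    Integer,
    Blob,
    Struct,
}

#[derive(Debug, PartialEq)]
enum Element {
    EndOfStruct,
    EndOfDocument,
    Exception,
    Padding,
    Field { tag: u8, kind: FieldKind },
}

/// Classifies the first byte of an element.
fn decode_header(byte: u8) -> Element {
    let type_bits = byte & 0xC0; // upper two bits
    let tag = byte & 0x3F; // lower six bits
    if tag == 0 {
        // Tag zero marks a special element.
        match type_bits {
            0x00 => Element::EndOfStruct,
            0x40 => Element::EndOfDocument,
            0x80 => Element::Exception,
            _ => Element::Padding,
        }
    } else {
        let kind = match type_bits {
            0x00 => FieldKind::Enum,
            0x40 => FieldKind::Integer,
            0x80 => FieldKind::Blob,
            _ => FieldKind::Struct,
        };
        Element::Field { tag, kind }
    }
}
```

For instance, the byte 0x81 that opens the serialised `Widget` decodes as a blob field with tag 1, and 0x43 as an integer field with tag 3.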

Integers are written as in protobufs. That is, an integer is written as a sequence of little-endian 7-bit fields, with the high bit of each byte set if another byte follows. Signed integers are first ZigZagged (see zigzag in wire.rs) before being written.

Readers MUST accept integers in denormalised form.
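A minimal sketch of this integer encoding, limited to 64 bits like the current implementation, might look as follows. The function names are illustrative rather than fourleaf's own, and "denormalised" is taken here to mean an encoding padded with redundant zero groups.

```rust
/// Writes `value` as 7-bit little-endian groups; the high bit of each byte
/// is set when another byte follows.
fn encode_integer(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Reads one integer, returning the value and the number of bytes used.
/// Returns `None` if the input is truncated or the value exceeds 64 bits.
fn decode_integer(input: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in input.iter().enumerate() {
        let bits = (byte & 0x7f) as u64;
        let shift = 7 * i as u32;
        if shift >= 64 {
            // Past 64 bits, only zero padding can still be accepted.
            if bits != 0 {
                return None;
            }
        } else {
            // Reject groups whose bits would be shifted out of a u64.
            if (bits << shift) >> shift != bits {
                return None;
            }
            value |= bits << shift;
        }
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // ran out of input before the final byte
}
```

Here 42 encodes as the single byte `2a`, but the two-byte form `aa 00` decodes to 42 as well, which is the kind of input a conforming reader has to tolerate.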

A blob is simply an integer indicating the blob length in bytes, followed by exactly that number of bytes.
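Because a blob is just a length followed by raw bytes, a reader holding the whole input in memory can hand out a sub-slice instead of copying. The sketch below illustrates that idea in isolation; `read_blob` is a hypothetical function, and for brevity it only accepts length prefixes of up to eight bytes.

```rust
use std::convert::TryFrom;

/// Splits a blob off the front of `input`, returning the blob contents and
/// the remaining input. Both slices borrow from `input`; nothing is copied.
fn read_blob(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let mut len: u64 = 0;
    let mut used = 0;
    loop {
        // Simplification: give up on length prefixes longer than 8 bytes.
        if used >= 8 {
            return None;
        }
        let byte = *input.get(used)?;
        len |= ((byte & 0x7f) as u64) << (7 * used as u32);
        used += 1;
        if byte & 0x80 == 0 {
            break;
        }
    }
    let len = usize::try_from(len).ok()?;
    let rest = &input[used..];
    if rest.len() < len {
        return None; // declared length runs past the end of the input
    }
    Some(rest.split_at(len))
}
```

Applied to the bytes after the field header in `\x81\x0Bhello world\x00`, this returns a slice covering `hello world` inside the original buffer, followed by the trailing end-of-struct byte.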

Reexports

pub use self::de::Config as DeConfig;
pub use self::de::Deserialize;
pub use self::de::from_reader;
pub use self::de::from_slice_borrow;
pub use self::de::from_slice_copy;
pub use self::de::from_stream_borrow;
pub use self::de::from_stream_copy;
pub use self::ser::Serialize;
pub use self::ser::to_stream;
pub use self::ser::to_vec;
pub use self::ser::to_writer;
pub use self::stream::Stream;
pub use self::unknown::UnknownFields;

Modules

adapt

Various adaptors to control serialisation and deserialisation.

de

Defines traits and utilities for high-level deserialisation.

io

Specialised adapters for doing IO.

ser

Defines traits and utilities for high-level serialisation.

stream

Functionality for encoding and decoding a fourleaf stream in terms of tag/value pairs.

unknown

Structures for working with fourleaf streams or portions of streams with unknown contents.

wire

Low-level definitions for working with the wire format.

Macros

fourleaf_retrofit

Retrofits fourleaf support onto arbitrary types.