1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
// Copyright (c) The Diem Core Contributors
// SPDX-License-Identifier: Apache-2.0

#![forbid(unsafe_code)]

//! # Binary Canonical Serialization (BCS)
//!
//! BCS (formerly "Libra Canonical Serialization" or LCS) is a serialization format developed
//! in the context of the [Diem](https://diem.com) blockchain.
//!
//! BCS was designed with the following main goals in mind:
//! * provide good performance and concise (binary) representations;
//! * support a rich set of data types commonly used in Rust;
//! * enforce canonical serialization, meaning that every value of a given type should have
//! a single valid representation.
//!
//! BCS also aims to mitigate the consequence of malicious inputs by enforcing well-defined limits
//! on large or nested containers during (de)serialization.
//!
//! ## Rust Implementation
//!
//! This crate provides a Rust implementation of BCS as an encoding format for the [Serde library](https://serde.rs).
//! As such, this implementation covers most data types supported by Serde -- including user-defined structs,
//! tagged variants (Rust enums), tuples, and maps -- excluding floats, single unicode characters (char), and sets.
//!
//! BCS is also available in other programming languages, thanks to the separate project [serde-reflection](https://github.com/novifinancial/serde-reflection).
//!
//! ## Application to Cryptography
//!
//! The BCS format guarantees canonical serialization, meaning that for any given data type, there
//! is a one-to-one correspondance between in-memory values and valid byte representations.
//!
//! In the context of a cryptographic application, canonical serialization has several benefits:
//! * It provides a natural and reliable way to associate in-memory values to cryptographic hashes.
//! * It allows the signature of a message to be defined equivalently as the signature of the serialized bytes or as the signature of the in-memory value.
//!
//! Note that BCS ensures canonical serialization for each data type separately. The data type of a serialized value
//! must be enforced by the application itself. This requirement is typically fulfilled
//! using unique hash seeds for each data type. (See [Diem's cryptographic library](https://github.com/diem/diem/blob/master/crypto/crypto/src/hash.rs) for an example.)
//!
//! ## Backwards Compatibility
//!
//! By design, BCS does not provide implicit versioning or backwards/forwards compatibility, therefore
//! applications must carefully plan in advance for adhoc extension points:
//! * Enums may be used for explicit versioning and backward compatibility (e.g. extensible query interfaces).
//! * In some cases, data fields of type `Vec<u8>` may also be added to allow (future) unknown payloads
//! in serialized form.
//!
//! ## Detailed Specifications
//!
//! BCS supports the following data types:
//!
//! * Booleans
//! * Signed 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit integers
//! * Unsigned 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit integers
//! * Option
//! * Unit (an empty value)
//! * Fixed and variable length sequences
//! * UTF-8 Encoded Strings
//! * Tuples
//! * Structures (aka "structs")
//! * Externally tagged enumerations (aka "enums")
//! * Maps
//!
//! BCS is not a self-describing format. As such, in order to deserialize a message, one must
//! know the message type and layout ahead of time.
//!
//! Unless specified, all numbers are stored in little endian, two's complement format.
//!
//! ### Recursion and Depth of BCS Data
//!
//! Recursive data-structures (e.g. trees) are allowed. However, because of the possibility of stack
//! overflow during (de)serialization, the *container depth* of any valid BCS data cannot exceed the constant
//! `MAX_CONTAINER_DEPTH`. Formally, we define *container depth* as the number of structs and enums traversed
//! during (de)serialization.
//!
//! This definition aims to minimize the number of operations while ensuring that
//! (de)serialization of a known BCS format cannot cause arbitrarily large stack allocations.
//!
//! As an example, if `v1` and `v2` are values of depth `n1` and `n2`,
//! * a struct value `Foo { v1, v2 }` has depth `1 + max(n1, n2)`;
//! * an enum value `E::Foo { v1, v2 }` has depth `1 + max(n1, n2)`;
//! * a pair `(v1, v2)` has depth `max(n1, n2)`;
//! * the value `Some(v1)` has depth `n1`.
//!
//! All string and integer values have depths `0`.
//!
//! ### Booleans and Integers
//!
//! |Type                       |Original data          |Hex representation |Serialized bytes        |
//! |---                        |---                    |---                |---                     |
//! |Boolean                    |True / False           |0x01 / 0x00        |01 / 00                 |
//! |8-bit signed integer       |-1                     |0xFF               |FF                      |
//! |8-bit unsigned integer     |1                      |0x01               |01                      |
//! |16-bit signed integer      |-4660                  |0xEDCC             |CC ED                   |
//! |16-bit unsigned integer    |4660                   |0x1234             |34 12                   |
//! |32-bit signed integer      |-305419896             |0xEDCBA988         |88 A9 CB ED             |
//! |32-bit unsigned integer    |305419896              |0x12345678         |78 56 34 12             |
//! |64-bit signed integer      |-1311768467750121216   |0xEDCBA98754321100 |00 11 32 54 87 A9 CB ED |
//! |64-bit unsigned integer    |1311768467750121216    |0x12345678ABCDEF00 |00 EF CD AB 78 56 34 12 |
//!
//! ### ULEB128-Encoded Integers
//!
//! The BCS format also uses the [ULEB128 encoding](https://en.wikipedia.org/wiki/LEB128) internally
//! to represent unsigned 32-bit integers in two cases where small values are usually expected:
//! (1) lengths of variable-length sequences and (2) tags of enum values (see the corresponding
//! sections below).
//!
//! |Type                       |Original data          |Hex representation |Serialized bytes   |
//! |---                        |---                    |---                |---                |
//! |ULEB128-encoded u32-integer|2^0 = 1                |0x00000001         |01                 |
//! |                           |2^7 = 128              |0x00000080         |80 01              |
//! |                           |2^14 = 16384           |0x00004000         |80 80 01           |
//! |                           |2^21 = 2097152         |0x00200000         |80 80 80 01        |
//! |                           |2^28 = 268435456       |0x10000000         |80 80 80 80 01     |
//! |                           |9487                   |0x0000250f         |8f 4a              |
//!
//! In general, a ULEB128 encoding consists of a little-endian sequence of base-128 (7-bit)
//! digits. Each digit is completed into a byte by setting the highest bit to 1, except for the
//! last (highest-significance) digit whose highest bit is set to 0.
//!
//! In BCS, the result of decoding ULEB128 bytes is required to fit into a 32-bit unsigned
//! integer and be in canonical form. For instance, the following values are rejected:
//! * 80 80 80 80 80 01 (2^36) is too large.
//! * 80 80 80 80 10 (2^33) is too large.
//! * 80 00 is not a minimal encoding of 0.
//!
//! ### Optional Data
//!
//! Optional or nullable data either exists in its full representation or does not. BCS represents
//! this as a single byte representing the presence `0x01` or absence `0x00` of data. If the data
//! is present then the serialized form of that data follows. For example:
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # fn main() -> Result<()> {
//! let some_data: Option<u8> = Some(8);
//! assert_eq!(to_bytes(&some_data)?, vec![1, 8]);
//!
//! let no_data: Option<u8> = None;
//! assert_eq!(to_bytes(&no_data)?, vec![0]);
//! # Ok(())}
//! ```
//!
//! ### Fixed and Variable Length Sequences
//!
//! Sequences can be made of up of any BCS supported types (even complex structures) but all
//! elements in the sequence must be of the same type. If the length of a sequence is fixed and
//! well known then BCS represents this as just the concatenation of the serialized form of each
//! individual element in the sequence. If the length of the sequence can be variable, then the
//! serialized sequence is length prefixed with a ULEB128-encoded unsigned integer indicating
//! the number of elements in the sequence. All variable length sequences must be
//! `MAX_SEQUENCE_LENGTH` elements long or less.
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # fn main() -> Result<()> {
//! let fixed: [u16; 3] = [1, 2, 3];
//! assert_eq!(to_bytes(&fixed)?, vec![1, 0, 2, 0, 3, 0]);
//!
//! let variable: Vec<u16> = vec![1, 2];
//! assert_eq!(to_bytes(&variable)?, vec![2, 1, 0, 2, 0]);
//!
//! let large_variable_length: Vec<()> = vec![(); 9_487];
//! assert_eq!(to_bytes(&large_variable_length)?, vec![0x8f, 0x4a]);
//! # Ok(())}
//! ```
//!
//! ### Strings
//!
//! Only valid UTF-8 Strings are supported. BCS serializes such strings as a variable length byte
//! sequence, i.e. length prefixed with a ULEB128-encoded unsigned integer followed by the byte
//! representation of the string.
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # fn main() -> Result<()> {
//! // Note that this string has 10 characters but has a byte length of 24
//! let utf8_str = "çå∞≠¢õß∂ƒ∫";
//! let expecting = vec![
//!     24, 0xc3, 0xa7, 0xc3, 0xa5, 0xe2, 0x88, 0x9e, 0xe2, 0x89, 0xa0, 0xc2,
//!     0xa2, 0xc3, 0xb5, 0xc3, 0x9f, 0xe2, 0x88, 0x82, 0xc6, 0x92, 0xe2, 0x88, 0xab,
//! ];
//! assert_eq!(to_bytes(&utf8_str)?, expecting);
//! # Ok(())}
//! ```
//!
//! ### Tuples
//!
//! Tuples are typed composition of objects: `(Type0, Type1)`
//!
//! Tuples are considered a fixed length sequence where each element in the sequence can be a
//! different type supported by BCS. Each element of a tuple is serialized in the order it is
//! defined within the tuple, i.e. [tuple.0, tuple.2].
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # fn main() -> Result<()> {
//! let tuple = (-1i8, "diem");
//! let expecting = vec![0xFF, 4, b'd', b'i', b'e', b'm'];
//! assert_eq!(to_bytes(&tuple)?, expecting);
//! # Ok(())}
//! ```
//!
//!
//! ### Structures
//!
//! Structures are fixed length sequences consisting of fields with potentially different types.
//! Each field within a struct is serialized in the order specified by the canonical structure
//! definition. Structs can exist within other structs and as such, BCS recurses into each struct
//! and serializes them in order. There are no labels in the serialized format, the struct ordering
//! defines the organization within the serialization stream.
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # use serde::Serialize;
//! # fn main() -> Result<()> {
//! #[derive(Serialize)]
//! struct MyStruct {
//!     boolean: bool,
//!     bytes: Vec<u8>,
//!     label: String,
//! }
//!
//! #[derive(Serialize)]
//! struct Wrapper {
//!     inner: MyStruct,
//!     name: String,
//! }
//!
//! let s = MyStruct {
//!     boolean: true,
//!     bytes: vec![0xC0, 0xDE],
//!     label: "a".to_owned(),
//! };
//! let s_bytes = to_bytes(&s)?;
//! let mut expecting = vec![1, 2, 0xC0, 0xDE, 1, b'a'];
//! assert_eq!(s_bytes, expecting);
//!
//! let w = Wrapper {
//!     inner: s,
//!     name: "b".to_owned(),
//! };
//! let w_bytes = to_bytes(&w)?;
//! assert!(w_bytes.starts_with(&s_bytes));
//!
//! expecting.append(&mut vec![1, b'b']);
//! assert_eq!(w_bytes, expecting);
//! # Ok(())}
//! ```
//!
//! ### Externally Tagged Enumerations
//!
//! An enumeration is typically represented as a type that can take one of potentially many
//! different variants. In BCS, each variant is mapped to a variant index, a ULEB128-encoded 32-bit unsigned
//! integer, followed by serialized data if the type has an associated value. An
//! associated type can be any BCS supported type. The variant index is determined based on the
//! ordering of the variants in the canonical enum definition, where the first variant has an index
//! of `0`, the second an index of `1`, etc.
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # use serde::Serialize;
//! # fn main() -> Result<()> {
//! #[derive(Serialize)]
//! enum E {
//!     Variant0(u16),
//!     Variant1(u8),
//!     Variant2(String),
//! }
//!
//! let v0 = E::Variant0(8000);
//! let v1 = E::Variant1(255);
//! let v2 = E::Variant2("e".to_owned());
//!
//! assert_eq!(to_bytes(&v0)?, vec![0, 0x40, 0x1F]);
//! assert_eq!(to_bytes(&v1)?, vec![1, 0xFF]);
//! assert_eq!(to_bytes(&v2)?, vec![2, 1, b'e']);
//! # Ok(())}
//! ```
//!
//! If you need to serialize a C-style enum, you should use a primitive integer type.
//!
//! ### Maps (Key / Value Stores)
//!
//! Maps are represented as a variable-length, sorted sequence of (Key, Value) tuples. Keys must be
//! unique and the tuples sorted by increasing lexicographical order on the BCS bytes of each key.
//! The representation is otherwise similar to that of a variable-length sequence. In particular,
//! it is preceded by the number of tuples, encoded in ULEB128.
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # use std::collections::HashMap;
//! # fn main() -> Result<()> {
//! let mut map = HashMap::new();
//! map.insert(b'e', b'f');
//! map.insert(b'a', b'b');
//! map.insert(b'c', b'd');
//!
//! let expecting = vec![(b'a', b'b'), (b'c', b'd'), (b'e', b'f')];
//!
//! assert_eq!(to_bytes(&map)?, to_bytes(&expecting)?);
//! # Ok(())}
//! ```

mod de;
mod error;
mod ser;
pub mod test_helpers;

/// Variable length sequences in BCS are limited to max length of 2^31 - 1.
pub const MAX_SEQUENCE_LENGTH: usize = (1 << 31) - 1;

/// Maximal allowed depth of BCS data, counting only structs and enums.
pub const MAX_CONTAINER_DEPTH: usize = 500;

pub use de::{from_bytes, from_bytes_seed};
pub use error::{Error, Result};
pub use ser::{is_human_readable, serialize_into, serialized_size, to_bytes};