// Copyright (c) Facebook, Inc. and its affiliates
// SPDX-License-Identifier: MIT OR Apache-2.0

#![forbid(unsafe_code)]

//! This crate provides a way to extract format descriptions for Rust containers that
//! implement the `Serialize` and/or `Deserialize` trait(s) of Serde.
//!
//! Format descriptions are useful in several ways:
//! * Stored under version control, formats can be tested to prevent unintended modifications
//! of binary serialization formats (e.g. by changing variant order).
//! * Formats can be passed to [`serde-generate`](https://docs.rs/serde-generate)
//! in order to generate class definitions and provide Serde-compatible binary
//! serialization in other languages (C++, Python, Java, etc.).
//!
//! # Quick Start
//!
//! Very often, Serde traits are simply implemented using Serde derive macros. In this case,
//! you may obtain format descriptions as follows:
//! * call `trace_simple_type` on the desired top-level container definition(s), then
//! * add a call to `trace_simple_type` for each `enum` type. (This will fix any `MissingVariants` error.)
//!
//! ```rust
//! # use serde::Deserialize;
//! # use serde_reflection::{Error, Samples, Tracer, TracerConfig};
//! #[derive(Deserialize)]
//! struct Foo {
//!   bar: Bar,
//!   choice: Choice,
//! }
//!
//! #[derive(Deserialize)]
//! struct Bar(u64);
//!
//! #[derive(Deserialize)]
//! enum Choice { A, B, C }
//!
//! # fn main() -> Result<(), Error> {
//! // Start the tracing session.
//! let mut tracer = Tracer::new(TracerConfig::default());
//!
//! // Trace the desired top-level type(s).
//! tracer.trace_simple_type::<Foo>()?;
//!
//! // Also trace each enum type separately to fix any `MissingVariants` error.
//! tracer.trace_simple_type::<Choice>()?;
//!
//! // Obtain the registry of Serde formats and serialize it in YAML (for instance).
//! let registry = tracer.registry()?;
//! let data = serde_yaml::to_string(&registry).unwrap();
//! assert_eq!(&data, r#"---
//! Bar:
//!   NEWTYPESTRUCT: U64
//! Choice:
//!   ENUM:
//!     0:
//!       A: UNIT
//!     1:
//!       B: UNIT
//!     2:
//!       C: UNIT
//! Foo:
//!   STRUCT:
//!     - bar:
//!         TYPENAME: Bar
//!     - choice:
//!         TYPENAME: Choice
//! "#);
//! # Ok(())
//! # }
//! ```
//!
//! # Features and Limitations
//!
//! `serde_reflection` is meant to extract formats for Rust containers (i.e. structs and
//! enums) with "reasonable" implementations of the Serde traits `Serialize` and
//! `Deserialize`.
//!
//! ## Supported features
//!
//! * Plain derived implementations obtained with `#[derive(Serialize, Deserialize)]` for
//! Rust containers in the Serde [data model](https://serde.rs/data-model.html).
//!
//! * Customized derived implementations using Serde attributes that are compatible with
//! binary serialization formats, such as `#[serde(rename = "Name")]`.
//!
//! * Hand-written implementations of `Deserialize` that are more restrictive than the
//! derived ones, provided that `trace_value` is used during tracing to provide sample
//! values for all such constrained types (see the detailed example below).
//!
//! * Mutually recursive types, provided that the first variant of each enum is
//! recursion-free. (For instance, `enum List { None, Some(Box<List>)}`.) Note that each
//! enum must be traced separately with `trace_type` to discover all the variants.
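//!
//! As a minimal sketch of the last point, the doc-test below traces a recursive list whose
//! first variant is recursion-free. The names `List`, `Nil`, `Cons`, and
//! `count_list_variants` are illustrative assumptions and not part of this crate.
//!
//! ```rust
//! use serde::Deserialize;
//! use serde_reflection::{ContainerFormat, Error, Tracer, TracerConfig};
//!
//! // The first variant `Nil` is the recursion-free base case.
//! #[derive(Deserialize)]
//! enum List {
//!     Nil,
//!     Cons(Box<List>),
//! }
//!
//! // Hypothetical helper: trace `List` and count the variants that were recorded.
//! fn count_list_variants() -> Result<usize, Error> {
//!     let mut tracer = Tracer::new(TracerConfig::default());
//!     tracer.trace_simple_type::<List>()?;
//!     let registry = tracer.registry()?;
//!     match registry.get("List") {
//!         Some(ContainerFormat::Enum(variants)) => Ok(variants.len()),
//!         _ => panic!("expected an enum format for `List`"),
//!     }
//! }
//!
//! fn main() -> Result<(), Error> {
//!     assert_eq!(count_list_variants()?, 2);
//!     Ok(())
//! }
//! ```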
//!
//! ## Unsupported idioms
//!
//! * Containers sharing the same base name (e.g. `Foo`) but from different modules.
//! (Workaround: use `#[serde(rename = ..)]`.)
//!
//! * Generic types instantiated multiple times in the same tracing session. (Workaround:
//! use the crate [`serde-name`](https://crates.io/crates/serde-name) and its adapters `SerializeNameAdapter` and `DeserializeNameAdapter`.)
//!
//! * Attributes that are not compatible with binary formats (e.g. `#[serde(flatten)]`, `#[serde(tag = ..)]`).
//!
//! * Tracing type aliases. (E.g. `type Pair = (u32, u64)` will not create an entry "Pair".)
//!
//! * Mutually recursive types for which picking the first variant of each enum does not
//! terminate. (Workaround: re-order the variants. For instance `enum List {
//! Some(Box<List>), None}` must be rewritten `enum List { None, Some(Box<List>)}`.)
//!
//! * Certain standard types such as `std::num::NonZeroU8` may not be tracked as a
//! container and appear simply as their underlying primitive type (e.g. `u8`) in the
//! formats. This loss of information makes it difficult to use `trace_value` to work
//! around deserialization invariants (see example below). As a workaround, you may
//! override the default for the primitive type using `TracerConfig` (e.g. `let config =
//! TracerConfig::default().default_u8_value(1);`).
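//!
//! The `TracerConfig` workaround from the last point can be sketched as follows. The names
//! `Counter` and `count_counter_fields` are illustrative assumptions; the value `1` is
//! simply a default that satisfies the non-zero check of `NonZeroU8`.
//!
//! ```rust
//! use serde::Deserialize;
//! use serde_reflection::{ContainerFormat, Error, Tracer, TracerConfig};
//! use std::num::NonZeroU8;
//!
//! #[derive(Deserialize)]
//! struct Counter {
//!     ticks: NonZeroU8,
//! }
//!
//! // Hypothetical helper: trace `Counter` and count the fields that were recorded.
//! fn count_counter_fields() -> Result<usize, Error> {
//!     // Use 1 whenever a `u8` value is needed while tracing deserialization.
//!     let config = TracerConfig::default().default_u8_value(1);
//!     let mut tracer = Tracer::new(config);
//!     tracer.trace_simple_type::<Counter>()?;
//!     let registry = tracer.registry()?;
//!     match registry.get("Counter") {
//!         Some(ContainerFormat::Struct(fields)) => Ok(fields.len()),
//!         _ => panic!("expected a struct format for `Counter`"),
//!     }
//! }
//!
//! fn main() -> Result<(), Error> {
//!     assert_eq!(count_counter_fields()?, 1);
//!     Ok(())
//! }
//! ```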
//!
//! ## Security CAVEAT
//!
//! At this time, `HashSet<T>` and `BTreeSet<T>` are treated as sequences (i.e. vectors)
//! by Serde.
//!
//! Cryptographic applications using [BCS](https://github.com/diem/bcs) **must** use
//! `HashMap<T, ()>` and `BTreeMap<T, ()>` instead. Using `HashSet<T>` or `BTreeSet<T>`
//! will compile but BCS-deserialization will not enforce canonicity (meaning unique,
//! well-ordered serialized elements in this case). In the case of `HashSet<T>`,
//! serialization will additionally be non-deterministic.
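//!
//! The difference is visible in traced formats. In the sketch below, the set field is
//! expected to be recorded as a sequence and the map field as a map. The names `Voters`
//! and `trace_voter_formats` are illustrative assumptions, and the example only inspects
//! formats: it does not exercise BCS itself.
//!
//! ```rust
//! use serde::Deserialize;
//! use serde_reflection::{ContainerFormat, Error, Format, Tracer, TracerConfig};
//! use std::collections::{BTreeMap, BTreeSet};
//!
//! #[derive(Deserialize)]
//! struct Voters {
//!     as_set: BTreeSet<u64>,
//!     as_map: BTreeMap<u64, ()>,
//! }
//!
//! // Hypothetical helper: return the formats of the fields of `Voters`, in order.
//! fn trace_voter_formats() -> Result<Vec<Format>, Error> {
//!     let mut tracer = Tracer::new(TracerConfig::default());
//!     tracer.trace_simple_type::<Voters>()?;
//!     let registry = tracer.registry()?;
//!     match registry.get("Voters") {
//!         Some(ContainerFormat::Struct(fields)) => {
//!             Ok(fields.iter().map(|field| field.value.clone()).collect())
//!         }
//!         _ => panic!("expected a struct format for `Voters`"),
//!     }
//! }
//!
//! fn main() -> Result<(), Error> {
//!     let formats = trace_voter_formats()?;
//!     // The set is seen as a plain sequence, so uniqueness and ordering are not encoded.
//!     assert!(matches!(formats[0], Format::Seq(_)));
//!     // The map keeps its own format.
//!     assert!(matches!(formats[1], Format::Map { .. }));
//!     Ok(())
//! }
//! ```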
//!
//! # Troubleshooting
//!
//! The error type used in this crate provides a method `error.explanation()` to help with
//! troubleshooting during format tracing.
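//!
//! As a minimal sketch, the doc-test below provokes a tracing error on purpose and prints
//! its explanation. The names `Profile` and `explain_failed_trace` are illustrative
//! assumptions; the exact wording of the explanation is not shown here.
//!
//! ```rust
//! use serde::Serialize;
//! use serde_reflection::{Samples, Tracer, TracerConfig};
//!
//! #[derive(Serialize)]
//! struct Profile {
//!     nickname: Option<String>,
//! }
//!
//! // Hypothetical helper: return the explanation of the error raised by `registry`, if any.
//! fn explain_failed_trace() -> Option<String> {
//!     let mut tracer = Tracer::new(TracerConfig::default());
//!     let mut samples = Samples::new();
//!     // Tracing a `None` value leaves the format of `nickname` partially unknown.
//!     tracer
//!         .trace_value(&mut samples, &Profile { nickname: None })
//!         .ok()?;
//!     // `registry` rejects unknown formats; `explanation` turns the error into a hint.
//!     tracer.registry().err().map(|error| error.explanation())
//! }
//!
//! fn main() {
//!     let explanation = explain_failed_trace().expect("the registry should be rejected");
//!     println!("{}", explanation);
//! }
//! ```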
//!
//! # Detailed Example
//!
//! In the following, more complete example, we extract the Serde formats of two containers
//! `Name` and `Person` and demonstrate how to handle a custom implementation of `serde::Deserialize`
//! for `Name`.
//!
//! ```rust
//! # use serde::{Deserialize, Serialize};
//! use serde_reflection::{ContainerFormat, Error, Format, Samples, Tracer, TracerConfig};
//!
//! #[derive(Serialize, PartialEq, Eq, Debug, Clone)]
//! struct Name(String);
//! // impl<'de> Deserialize<'de> for Name { ... }
//! # impl<'de> Deserialize<'de> for Name {
//! #     fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
//! #     where
//! #         D: ::serde::Deserializer<'de>,
//! #     {
//! #         // Make sure to wrap our value in a container with the same name
//! #         // as the original type.
//! #         #[derive(Deserialize)]
//! #         #[serde(rename = "Name")]
//! #         struct InternalValue(String);
//! #         let value = InternalValue::deserialize(deserializer)?.0;
//! #         // Enforce some custom invariant
//! #         if value.len() >= 2 && value.chars().all(char::is_alphabetic) {
//! #             Ok(Name(value))
//! #         } else {
//! #             Err(<D::Error as ::serde::de::Error>::custom(format!(
//! #                 "Invalid name {}",
//! #                 value
//! #             )))
//! #         }
//! #     }
//! # }
//!
//! #[derive(Serialize, Deserialize, PartialEq, Eq, Debug, Clone)]
//! enum Person {
//!     NickName(Name),
//!     FullName { first: Name, last: Name },
//! }
//!
//! # fn main() -> Result<(), Error> {
//! // Start a session to trace formats.
//! let mut tracer = Tracer::new(TracerConfig::default());
//! // Create a store to hold samples of Rust values.
//! let mut samples = Samples::new();
//!
//! // For every type (here `Name`), if a user-defined implementation of `Deserialize` exists and
//! // is known to perform custom validation checks, use `trace_value` first so that `samples`
//! // contains a valid Rust value of this type.
//! let bob = Name("Bob".into());
//! tracer.trace_value(&mut samples, &bob)?;
//! assert!(samples.value("Name").is_some());
//!
//! // Now, let's trace deserialization for the top-level type `Person`.
//! // We pass a reference to `samples` so that sampled values are used for custom types.
//! let (format, values) = tracer.trace_type::<Person>(&samples)?;
//! assert_eq!(format, Format::TypeName("Person".into()));
//!
//! // As a byproduct, we have also obtained sample values of type `Person`.
//! // We can see that the user-provided value `bob` was used consistently to pass
//! // validation checks for `Name`.
//! assert_eq!(values[0], Person::NickName(bob.clone()));
//! assert_eq!(values[1], Person::FullName { first: bob.clone(), last: bob.clone() });
//!
//! // We have no more top-level types to trace, so let's stop the tracing session and obtain
//! // a final registry of containers.
//! let registry = tracer.registry()?;
//!
//! // We have successfully extracted a format description of all Serde containers under `Person`.
//! assert_eq!(
//!     registry.get("Name").unwrap(),
//!     &ContainerFormat::NewTypeStruct(Box::new(Format::Str)),
//! );
//! match registry.get("Person").unwrap() {
//!     ContainerFormat::Enum(variants) => assert_eq!(variants.len(), 2),
//!     _ => panic!(),
//! };
//!
//! // Export the registry in YAML.
//! let data = serde_yaml::to_string(&registry).unwrap();
//! assert_eq!(&data, r#"---
//! Name:
//!   NEWTYPESTRUCT: STR
//! Person:
//!   ENUM:
//!     0:
//!       NickName:
//!         NEWTYPE:
//!           TYPENAME: Name
//!     1:
//!       FullName:
//!         STRUCT:
//!           - first:
//!               TYPENAME: Name
//!           - last:
//!               TYPENAME: Name
//! "#);
//! # Ok(())
//! # }
//! ```
//!
//! # Tracing Serialization with `trace_value`
//!
//! Tracing the serialization of a Rust value `v` consists of visiting the structural
//! components of `v` in depth and recording Serde formats for all the visited types.
//!
//! ```rust
//! # use serde_reflection::*;
//! # use serde::Serialize;
//! #[derive(Serialize)]
//! struct FullName<'a> {
//!   first: &'a str,
//!   middle: Option<&'a str>,
//!   last: &'a str,
//! }
//!
//! # fn main() -> Result<(), Error> {
//! let mut tracer = Tracer::new(TracerConfig::default());
//! let mut samples = Samples::new();
//! tracer.trace_value(&mut samples, &FullName { first: "", middle: Some(""), last: "" })?;
//! let registry = tracer.registry()?;
//! match registry.get("FullName").unwrap() {
//!     ContainerFormat::Struct(fields) => assert_eq!(fields.len(), 3),
//!     _ => panic!(),
//! };
//! # Ok(())
//! # }
//! ```
//!
//! This approach works well, but it can only recover the formats of datatypes for which
//! nontrivial samples have been provided:
//!
//! * In enums, only the variants explicitly covered by user samples will be recorded.
//!
//! * Providing a `None` value or an empty vector `[]` within a sample may result in
//! formats that are partially unknown.
//!
//! ```rust
//! # use serde_reflection::*;
//! # use serde::Serialize;
//! # #[derive(Serialize)]
//! # struct FullName<'a> {
//! #   first: &'a str,
//! #   middle: Option<&'a str>,
//! #   last: &'a str,
//! # }
//! # fn main() -> Result<(), Error> {
//! let mut tracer = Tracer::new(TracerConfig::default());
//! let mut samples = Samples::new();
//! tracer.trace_value(&mut samples, &FullName { first: "", middle: None, last: "" })?;
//! assert_eq!(tracer.registry().unwrap_err(), Error::UnknownFormatInContainer("FullName".to_string()));
//! # Ok(())
//! # }
//! ```
//!
//! For this reason, we introduce a complementary set of APIs to trace deserialization of types.
//!
//! # Tracing Deserialization with `trace_type<T>`
//!
//! Deserialization-tracing APIs take a type `T`, the current tracing state, and a
//! reference to previously recorded samples as input.
//!
//! ## Core Algorithm and High-Level API
//!
//! The core algorithm `trace_type_once<T>`
//! attempts to reconstruct a witness value of type `T` by exploring the graph of all the types
//! occurring in the definition of `T`. At the same time, the algorithm records the
//! formats of all the visited structs and enum variants.
//!
//! For the exploration to terminate, the core algorithm `trace_type_once<T>` explores
//! each possible recursion point only once (see "Design Considerations" below).
//! In particular, if `T` is an enum, `trace_type_once<T>` discovers only one variant of `T` at a time.
//!
//! For this reason, the high-level API `trace_type<T>`
//! repeats calls to `trace_type_once<T>` until all the variants of `T` are known.
//! Variant cases of `T` are explored in sequential order, starting with index `0`.
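//!
//! The difference between the two entry points can be sketched as follows. The names
//! `Choice`, `first_pass`, and `all_passes` are illustrative assumptions and not part of
//! this crate.
//!
//! ```rust
//! use serde::Deserialize;
//! use serde_reflection::{Error, Samples, Tracer, TracerConfig};
//!
//! #[derive(Deserialize, Debug, PartialEq)]
//! enum Choice {
//!     A,
//!     B,
//!     C,
//! }
//!
//! // Hypothetical helper: run a single pass and report whether the registry is rejected.
//! fn first_pass() -> Result<(Choice, bool), Error> {
//!     let mut tracer = Tracer::new(TracerConfig::default());
//!     let samples = Samples::new();
//!     // A single pass discovers one variant only, starting with index 0.
//!     let (_format, value) = tracer.trace_type_once::<Choice>(&samples)?;
//!     // `B` and `C` are still unknown, so building the registry is expected to fail.
//!     let rejected = tracer.registry().is_err();
//!     Ok((value, rejected))
//! }
//!
//! // Hypothetical helper: let `trace_type` repeat passes until every variant is known.
//! fn all_passes() -> Result<Vec<Choice>, Error> {
//!     let mut tracer = Tracer::new(TracerConfig::default());
//!     let samples = Samples::new();
//!     let (_format, values) = tracer.trace_type::<Choice>(&samples)?;
//!     Ok(values)
//! }
//!
//! fn main() -> Result<(), Error> {
//!     assert_eq!(first_pass()?, (Choice::A, true));
//!     assert_eq!(all_passes()?, vec![Choice::A, Choice::B, Choice::C]);
//!     Ok(())
//! }
//! ```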
//!
//! ## Coverage Guarantees
//!
//! Under the assumptions listed below, a single call to `trace_type<T>` is guaranteed to
//! record formats for all the types that `T` depends on. Besides, if `T` is an enum, it
//! will record all the variants of `T`.
//!
//! (0) Container names must not collide. If this happens, consider using `#[serde(rename = "name")]`,
//! or implementing Serde traits manually.
//!
//! (1) The first variants of mutually recursive enums must be a "base case". That is,
//! defaulting to the first variant for every enum type (along with `None` for option values
//! and `[]` for sequences) must guarantee termination of depth-first traversals of the graph of type
//! declarations.
//!
//! (2) If a type runs custom validation checks during deserialization, sample values must have been provided
//! previously by calling `trace_value`. Besides, the corresponding registered formats
//! must not contain unknown parts.
//!
//! ## Design Considerations
//!
//! Whenever we traverse the graph of type declarations using deserialization callbacks, the type
//! system requires us to return valid Rust values of type `V::Value`, where `V` is the type of
//! a given `visitor`. This constraint limits the way we can stop graph traversal to only a few cases.
//!
//! The first four cases are what we have called *possible recursion points* above:
//!
//! * while visiting an `Option<T>` for the second time, we choose to return the value `None` to stop;
//! * while visiting a `Seq<T>` for the second time, we choose to return the empty sequence `[]`;
//! * while visiting a `Map<K, V>` for the second time, we choose to return the empty map `{}`;
//! * while visiting an `enum T` for the second time, we choose to return the first variant, i.e.
//! a "base case" by assumption (1) above.
//!
//! In addition to the cases above,
//!
//! * while visiting a container, if the container's name is mapped to a recorded value,
//! we MAY decide to use it.
//!
//! The default configuration `TracerConfig::default()` always picks the recorded value for a
//! `NewTypeStruct` and never does in the other cases.
//!
//! For efficiency reasons, the current algorithm does not attempt to scan the variants of enums
//! other than the parameter `T` of the main call `trace_type<T>`. As a consequence, each enum type must be
//! traced separately.

mod de;
mod error;
mod format;
mod ser;
mod trace;
mod value;

pub use error::{Error, Result};
pub use format::{ContainerFormat, Format, FormatHolder, Named, Variable, VariantFormat};
pub use trace::{Registry, Samples, Tracer, TracerConfig};
pub use value::Value;