// Copyright (c) Facebook, Inc. and its affiliates
// SPDX-License-Identifier: MIT OR Apache-2.0

#![forbid(unsafe_code)]

//! This crate provides a way to extract format descriptions for Rust containers that
//! implement the Serialize and/or Deserialize trait(s) of Serde.
//!
//! Format descriptions are useful in several ways:
//! * Stored under version control, formats can be tested to prevent unintended modifications
//!   of binary serialization formats (e.g. by changing variant order).
//!
//! * Formats can be passed to [`serde-generate`](https://docs.rs/serde-generate)
//!   in order to generate class definitions and provide Serde-compatible binary
//!   serialization in other languages (C++, Python, Java, etc.).
//!
//! * Together with the module `json_converter`, formats allow dynamic translation of
//!   binary-serialized values to JSON and from JSON.
//!
//! # Quick Start
//!
//! Very often, Serde traits are simply implemented using Serde derive macros. In this case,
//! you may obtain format descriptions as follows:
//! * call `trace_simple_type` on the desired top-level container definition(s), then
//! * add a call to `trace_simple_type` for each `enum` type. (This will fix any `MissingVariants` error.)
//!
//! ```rust
//! # use serde::Deserialize;
//! # use serde_reflection::{Error, Samples, Tracer, TracerConfig};
//! #[derive(Deserialize)]
//! struct Foo {
//!   bar: Bar,
//!   choice: Choice,
//! }
//!
//! #[derive(Deserialize)]
//! struct Bar(u64);
//!
//! #[derive(Deserialize)]
//! enum Choice { A, B, C }
//!
//! # fn main() -> Result<(), Error> {
//! // Start the tracing session.
//! let mut tracer = Tracer::new(TracerConfig::default());
//!
//! // Trace the desired top-level type(s).
//! tracer.trace_simple_type::<Foo>()?;
//!
//! // Also trace each enum type separately to fix any `MissingVariants` error.
//! tracer.trace_simple_type::<Choice>()?;
//!
//! // Obtain the registry of Serde formats and serialize it in YAML (for instance).
//! let registry = tracer.registry()?;
//! let data = serde_yaml::to_string(&registry).unwrap();
//! assert_eq!(&data, r#"---
//! Bar:
//!   NEWTYPESTRUCT: U64
//! Choice:
//!   ENUM:
//!     0:
//!       A: UNIT
//!     1:
//!       B: UNIT
//!     2:
//!       C: UNIT
//! Foo:
//!   STRUCT:
//!     - bar:
//!         TYPENAME: Bar
//!     - choice:
//!         TYPENAME: Choice
//! "#);
//! # Ok(())
//! # }
//! ```
//!
//! # Features and Limitations
//!
//! `serde_reflection` is meant to extract formats for Rust containers (i.e. structs and
//! enums) with "reasonable" implementations of the Serde traits `Serialize` and
//! `Deserialize`.
//!
//! ## Supported features
//!
//! * Plain derived implementations obtained with `#[derive(Serialize, Deserialize)]` for
//!   Rust containers in the Serde [data model](https://serde.rs/data-model.html)
//!
//! * Customized derived implementations using Serde attributes that are compatible with
//!   binary serialization formats, such as `#[serde(rename = "Name")]`.
//!
//! * Hand-written implementations of `Deserialize` that are more restrictive than the
//!   derived ones, provided that `trace_value` is used during tracing to provide sample
//!   values for all such constrained types (see the detailed example below).
//!
//! * Mutually recursive types provided that the first variant of each enum is
//!   recursion-free. (For instance, `enum List { None, Some(Box<List>)}`.) Note that each
//!   enum must be traced separately with `trace_type` to discover all the variants.
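//!
//! As a minimal sketch of the last point (assuming the default `TracerConfig`; the type
//! `List` below is hypothetical), tracing a recursive enum whose first variant is
//! recursion-free should discover both variants:
//!
//! ```rust
//! use serde::Deserialize;
//! use serde_reflection::{ContainerFormat, Error, Tracer, TracerConfig};
//!
//! // Hypothetical recursive type: the first variant `None` is the base case.
//! #[derive(Deserialize)]
//! enum List {
//!     None,
//!     Some(Box<List>),
//! }
//!
//! fn main() -> Result<(), Error> {
//!     let mut tracer = Tracer::new(TracerConfig::default());
//!     // Trace the enum itself so that every variant gets visited.
//!     tracer.trace_simple_type::<List>()?;
//!     let registry = tracer.registry()?;
//!     match registry.get("List").unwrap() {
//!         ContainerFormat::Enum(variants) => assert_eq!(variants.len(), 2),
//!         _ => panic!(),
//!     };
//!     Ok(())
//! }
//! ```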
//!
//! ## Unsupported idioms
//!
//! * Containers sharing the same base name (e.g. `Foo`) but from different modules.
//!   (Workaround: use `#[serde(rename = ..)]`)
//!
//! * Generic types instantiated multiple times in the same tracing session. (Workaround:
//!   use the crate [`serde-name`](https://crates.io/crates/serde-name) and its adapters `SerializeNameAdapter` and `DeserializeNameAdapter`.)
//!
//! * Attributes that are not compatible with binary formats (e.g. `#[serde(flatten)]`, `#[serde(tag = ..)]`)
//!
//! * Tracing type aliases. (E.g. `type Pair = (u32, u64)` will not create an entry "Pair".)
//!
//! * Mutually recursive types for which picking the first variant of each enum does not
//!   terminate. (Workaround: re-order the variants. For instance `enum List {
//!   Some(Box<List>), None}` must be rewritten `enum List { None, Some(Box<List>)}`.)
//!
//! * Certain standard types such as `std::num::NonZeroU8` may not be tracked as a
//!   container and appear simply as their underlying primitive type (e.g. `u8`) in the
//!   formats. This loss of information makes it difficult to use `trace_value` to work
//!   around deserialization invariants (see example below). As a workaround, you may
//!   override the default for the primitive type using `TracerConfig` (e.g. `let config =
//!   TracerConfig::default().default_u8_value(1);`).
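//!
//! The last workaround might look as follows. This is only a sketch: the container
//! `Level` is hypothetical, and we assume that the field is reported as the primitive
//! format `U8`, as described above.
//!
//! ```rust
//! use serde::Deserialize;
//! use serde_reflection::{ContainerFormat, Error, Format, Tracer, TracerConfig};
//! use std::num::NonZeroU8;
//!
//! // Hypothetical container holding a value that must be non-zero.
//! #[derive(Deserialize)]
//! struct Level {
//!     value: NonZeroU8,
//! }
//!
//! fn main() -> Result<(), Error> {
//!     // Use `1` as the sample value for `u8` so that deserialization of
//!     // `NonZeroU8` can succeed during tracing.
//!     let config = TracerConfig::default().default_u8_value(1);
//!     let mut tracer = Tracer::new(config);
//!     tracer.trace_simple_type::<Level>()?;
//!     let registry = tracer.registry()?;
//!     match registry.get("Level").unwrap() {
//!         // The field appears as the underlying primitive type.
//!         ContainerFormat::Struct(fields) => assert_eq!(fields[0].value, Format::U8),
//!         _ => panic!(),
//!     };
//!     Ok(())
//! }
//! ```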
//!
//! ## Security CAVEAT
//!
//! At this time, `HashSet<T>` and `BTreeSet<T>` are treated as sequences (i.e. vectors)
//! by Serde.
//!
//! Cryptographic applications using [BCS](https://github.com/diem/bcs) **must** use
//! `HashMap<T, ()>` and `BTreeMap<T, ()>` instead. Using `HashSet<T>` or `BTreeSet<T>`
//! will compile but BCS-deserialization will not enforce canonicity (meaning unique,
//! well-ordered serialized elements in this case). In the case of `HashSet<T>`,
//! serialization will additionally be non-deterministic.
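//!
//! A minimal sketch of the recommended replacement, using only the standard library
//! (the helper `to_set` is hypothetical and for illustration only): a `BTreeMap<T, ()>`
//! behaves like a set whose keys are unique and iterated in sorted order.
//!
//! ```rust
//! use std::collections::BTreeMap;
//!
//! // Hypothetical helper: collect items into a set-like map with unit values.
//! fn to_set(items: &[u64]) -> BTreeMap<u64, ()> {
//!     items.iter().map(|item| (*item, ())).collect()
//! }
//!
//! fn main() {
//!     let set = to_set(&[3, 1, 3, 2]);
//!     // Duplicates are merged and keys are visited in increasing order.
//!     let keys: Vec<u64> = set.keys().copied().collect();
//!     assert_eq!(keys, vec![1, 2, 3]);
//! }
//! ```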
//!
//! # Troubleshooting
//!
//! The error type used in this crate provides a method `error.explanation()` to help with
//! troubleshooting during format tracing.
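//!
//! For instance, here is a sketch of how the explanation might be printed when an enum
//! has not been traced separately. The types `Foo` and `Choice` are hypothetical, and the
//! exact wording of the explanation is not shown here.
//!
//! ```rust
//! use serde::Deserialize;
//! use serde_reflection::{Error, Tracer, TracerConfig};
//!
//! #[derive(Deserialize)]
//! struct Foo {
//!     choice: Choice,
//! }
//!
//! #[derive(Deserialize)]
//! enum Choice { A, B, C }
//!
//! fn main() -> Result<(), Error> {
//!     let mut tracer = Tracer::new(TracerConfig::default());
//!     tracer.trace_simple_type::<Foo>()?;
//!     // `Choice` was not traced separately, so the registry is incomplete.
//!     let error = tracer.registry().unwrap_err();
//!     assert!(matches!(error, Error::MissingVariants(_)));
//!     // Print a human-readable hint about the problem.
//!     eprintln!("{}", error.explanation());
//!     Ok(())
//! }
//! ```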
//!
//! # Detailed Example
//!
//! In the following, more complete example, we extract the Serde formats of two containers
//! `Name` and `Person` and demonstrate how to handle a custom implementation of `serde::Deserialize`
//! for `Name`.
//!
//! ```rust
//! # use serde::{Deserialize, Serialize};
//! use serde_reflection::{ContainerFormat, Error, Format, Samples, Tracer, TracerConfig};
//!
//! #[derive(Serialize, PartialEq, Eq, Debug, Clone)]
//! struct Name(String);
//! // impl<'de> Deserialize<'de> for Name { ... }
//! # impl<'de> Deserialize<'de> for Name {
//! #     fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
//! #     where
//! #         D: ::serde::Deserializer<'de>,
//! #     {
//! #         // Make sure to wrap our value in a container with the same name
//! #         // as the original type.
//! #         #[derive(Deserialize)]
//! #         #[serde(rename = "Name")]
//! #         struct InternalValue(String);
//! #         let value = InternalValue::deserialize(deserializer)?.0;
//! #         // Enforce some custom invariant
//! #         if value.len() >= 2 && value.chars().all(char::is_alphabetic) {
//! #             Ok(Name(value))
//! #         } else {
//! #             Err(<D::Error as ::serde::de::Error>::custom(format!(
//! #                 "Invalid name {}",
//! #                 value
//! #             )))
//! #         }
//! #     }
//! # }
//!
//! #[derive(Serialize, Deserialize, PartialEq, Eq, Debug, Clone)]
//! enum Person {
//!     NickName(Name),
//!     FullName { first: Name, last: Name },
//! }
//!
//! # fn main() -> Result<(), Error> {
//! // Start a session to trace formats.
//! let mut tracer = Tracer::new(TracerConfig::default());
//! // Create a store to hold samples of Rust values.
//! let mut samples = Samples::new();
//!
//! // For every type (here `Name`), if a user-defined implementation of `Deserialize` exists and
//! // is known to perform custom validation checks, use `trace_value` first so that `samples`
//! // contains a valid Rust value of this type.
//! let bob = Name("Bob".into());
//! tracer.trace_value(&mut samples, &bob)?;
//! assert!(samples.value("Name").is_some());
//!
//! // Now, let's trace deserialization for the top-level type `Person`.
//! // We pass a reference to `samples` so that sampled values are used for custom types.
//! let (format, values) = tracer.trace_type::<Person>(&samples)?;
//! assert_eq!(format, Format::TypeName("Person".into()));
//!
//! // As a byproduct, we have also obtained sample values of type `Person`.
//! // We can see that the user-provided value `bob` was used consistently to pass
//! // validation checks for `Name`.
//! assert_eq!(values[0], Person::NickName(bob.clone()));
//! assert_eq!(values[1], Person::FullName { first: bob.clone(), last: bob.clone() });
//!
//! // We have no more top-level types to trace, so let's stop the tracing session and obtain
//! // a final registry of containers.
//! let registry = tracer.registry()?;
//!
//! // We have successfully extracted a format description of all Serde containers under `Person`.
//! assert_eq!(
//!     registry.get("Name").unwrap(),
//!     &ContainerFormat::NewTypeStruct(Box::new(Format::Str)),
//! );
//! match registry.get("Person").unwrap() {
//!     ContainerFormat::Enum(variants) => assert_eq!(variants.len(), 2),
//!     _ => panic!(),
//! };
//!
//! // Export the registry in YAML.
//! let data = serde_yaml::to_string(&registry).unwrap();
//! assert_eq!(&data, r#"---
//! Name:
//!   NEWTYPESTRUCT: STR
//! Person:
//!   ENUM:
//!     0:
//!       NickName:
//!         NEWTYPE:
//!           TYPENAME: Name
//!     1:
//!       FullName:
//!         STRUCT:
//!           - first:
//!               TYPENAME: Name
//!           - last:
//!               TYPENAME: Name
//! "#);
//! # Ok(())
//! # }
//! ```
//!
//! # Tracing Serialization with `trace_value`
//!
//! Tracing the serialization of a Rust value `v` consists of visiting the structural
//! components of `v` in depth and recording Serde formats for all the visited types.
//!
//! ```rust
//! # use serde_reflection::*;
//! # use serde::Serialize;
//! #[derive(Serialize)]
//! struct FullName<'a> {
//!   first: &'a str,
//!   middle: Option<&'a str>,
//!   last: &'a str,
//! }
//!
//! # fn main() -> Result<(), Error> {
//! let mut tracer = Tracer::new(TracerConfig::default());
//! let mut samples = Samples::new();
//! tracer.trace_value(&mut samples, &FullName { first: "", middle: Some(""), last: "" })?;
//! let registry = tracer.registry()?;
//! match registry.get("FullName").unwrap() {
//!     ContainerFormat::Struct(fields) => assert_eq!(fields.len(), 3),
//!     _ => panic!(),
//! };
//! # Ok(())
//! # }
//! ```
//!
//! This approach works well but it can only recover the formats of datatypes for which
//! nontrivial samples have been provided:
//!
//! * In enums, only the variants explicitly covered by user samples will be recorded.
//!
//! * Providing a `None` value or an empty vector `[]` within a sample may result in
//!   formats that are partially unknown.
//!
//! ```rust
//! # use serde_reflection::*;
//! # use serde::Serialize;
//! # #[derive(Serialize)]
//! # struct FullName<'a> {
//! #   first: &'a str,
//! #   middle: Option<&'a str>,
//! #   last: &'a str,
//! # }
//! # fn main() -> Result<(), Error> {
//! let mut tracer = Tracer::new(TracerConfig::default());
//! let mut samples = Samples::new();
//! tracer.trace_value(&mut samples, &FullName { first: "", middle: None, last: "" })?;
//! assert_eq!(tracer.registry().unwrap_err(), Error::UnknownFormatInContainer("FullName".to_string()));
//! # Ok(())
//! # }
//! ```
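//!
//! The limitation on enums can be sketched in the same way. The enum `Choice` is
//! hypothetical; we assume here that the resulting registry is accepted as is and that
//! it only contains the sampled variant, keyed by its index.
//!
//! ```rust
//! use serde::Serialize;
//! use serde_reflection::{ContainerFormat, Error, Samples, Tracer, TracerConfig};
//!
//! #[derive(Serialize)]
//! enum Choice { A, B, C }
//!
//! fn main() -> Result<(), Error> {
//!     let mut tracer = Tracer::new(TracerConfig::default());
//!     let mut samples = Samples::new();
//!     // Only the variant `B` (index 1) is covered by a sample.
//!     tracer.trace_value(&mut samples, &Choice::B)?;
//!     let registry = tracer.registry()?;
//!     match registry.get("Choice").unwrap() {
//!         ContainerFormat::Enum(variants) => {
//!             // The variants `A` and `C` were never observed.
//!             assert_eq!(variants.len(), 1);
//!             assert!(variants.contains_key(&1));
//!         }
//!         _ => panic!(),
//!     };
//!     Ok(())
//! }
//! ```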
//!
//! For this reason, we introduce a complementary set of APIs to trace deserialization of types.
//!
//! # Tracing Deserialization with `trace_type<T>`
//!
//! Deserialization-tracing APIs take a type `T`, the current tracing state, and a
//! reference to previously recorded samples as input.
//!
//! ## Core Algorithm and High-Level API
//!
//! The core algorithm `trace_type_once<T>`
//! attempts to reconstruct a witness value of type `T` by exploring the graph of all the types
//! occurring in the definition of `T`. At the same time, the algorithm records the
//! formats of all the visited structs and enum variants.
//!
//! For the exploration to be able to terminate, the core algorithm `trace_type_once<T>` explores
//! each possible recursion point only once (see paragraph below).
//! In particular, if `T` is an enum, `trace_type_once<T>` discovers only one variant of `T` at a time.
//!
//! For this reason, the high-level API `trace_type<T>`
//! will repeat calls to `trace_type_once<T>` until all the variants of `T` are known.
//! Variant cases of `T` are explored in sequential order, starting with index `0`.
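//!
//! The difference between the two entry points can be sketched as follows, using a
//! hypothetical enum `Choice` and a fresh tracer for each call:
//!
//! ```rust
//! use serde::Deserialize;
//! use serde_reflection::{Error, Samples, Tracer, TracerConfig};
//!
//! #[derive(Deserialize, Debug, PartialEq)]
//! enum Choice { A, B, C }
//!
//! fn main() -> Result<(), Error> {
//!     let samples = Samples::new();
//!
//!     // A single pass of the core algorithm yields one witness value.
//!     let mut tracer = Tracer::new(TracerConfig::default());
//!     let (_format, value) = tracer.trace_type_once::<Choice>(&samples)?;
//!     assert_eq!(value, Choice::A);
//!
//!     // The high-level API repeats the passes until every variant is known.
//!     let mut tracer = Tracer::new(TracerConfig::default());
//!     let (_format, values) = tracer.trace_type::<Choice>(&samples)?;
//!     assert_eq!(values, vec![Choice::A, Choice::B, Choice::C]);
//!     Ok(())
//! }
//! ```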
//!
//! ## Coverage Guarantees
//!
//! Under the assumptions listed below, a single call to `trace_type<T>` is guaranteed to
//! record formats for all the types that `T` depends on. Besides, if `T` is an enum, it
//! will record all the variants of `T`.
//!
//! (0) Container names must not collide. If this happens, consider using `#[serde(rename = "name")]`,
//! or implementing serde traits manually.
//!
//! (1) The first variants of mutually recursive enums must be a "base case". That is,
//! defaulting to the first variant for every enum type (along with `None` for option values
//! and `[]` for sequences) must guarantee termination of depth-first traversals of the graph of type
//! declarations.
//!
//! (2) If a type runs custom validation checks during deserialization, sample values must have been provided
//! previously by calling `trace_value`. Besides, the corresponding registered formats
//! must not contain unknown parts.
//!
//! ## Design Considerations
//!
//! Whenever we traverse the graph of type declarations using deserialization callbacks, the type
//! system requires us to return valid Rust values of type `V::Value`, where `V` is the type of
//! a given `visitor`. This constraint limits the way we can stop graph traversal to only a few cases.
//!
//! The first 4 cases are what we have called *possible recursion points* above:
//!
//! * while visiting an `Option<T>` for the second time, we choose to return the value `None` to stop;
//! * while visiting a `Seq<T>` for the second time, we choose to return the empty sequence `[]`;
//! * while visiting a `Map<K, V>` for the second time, we choose to return the empty map `{}`;
//! * while visiting an `enum T` for the second time, we choose to return the first variant, i.e.
//!   a "base case" by assumption (1) above.
//!
//! In addition to the cases above,
//!
//! * while visiting a container, if the container's name is mapped to a recorded value,
//!   we MAY decide to use it.
//!
//! The default configuration `TracerConfig::default()` always picks the recorded value for a
//! `NewTypeStruct` and never does in the other cases.
//!
//! For efficiency reasons, the current algorithm does not attempt to scan the variants of enums
//! other than the parameter `T` of the main call `trace_type<T>`. As a consequence, each enum type must be
//! traced separately.

mod de;
mod error;
mod format;
mod ser;
mod trace;
mod value;

#[cfg(feature = "json")]
pub mod json_converter;

pub use error::{Error, Result};
pub use format::{ContainerFormat, Format, FormatHolder, Named, Variable, VariantFormat};
pub use trace::{Registry, Samples, Tracer, TracerConfig};
pub use value::Value;