serde-saphyr
serde-saphyr is a strongly typed YAML deserializer built on
saphyr-parser. It aims to be panic-free on malformed input and to avoid unsafe code in library code. The crate deserializes YAML directly into your Rust types without constructing an intermediate tree of “abstract values.”
Why this approach?
- Light on resources: Having almost no intermediate data structures should result in more efficient parsing, especially if anchors are used only lightly.
- Also simpler: No code to support intermediate Values of all kinds.
- Type-driven parsing: YAML that doesn’t match the expected Rust types is rejected early.
- Safer by construction: No dynamic “any” objects; common YAML-based code-execution exploits do not apply.
Project relationship
serde-saphyr is not a fork of the older serde-yaml crate and shares no code with it (apart from some reused tests). It is also not part of the saphyr project. The crate simply builds a Serde-based YAML deserialization layer around Saphyr’s public parser and is maintained independently. The name was historically chosen to reflect the use of Saphyr’s parser at a time when the Saphyr project did not provide its own Serde integration.
Benchmarking
In our benchmarking project, we tested the following crates:
| Crate | Version | Merge Keys | Nested Enums | Duplicate key rejection | garde | Notes |
|---|---|---|---|---|---|---|
| serde-saphyr | 0.0.4 | ✅ Native | ✅ | ✅ Configurable | ✅ | No unsafe, no unsafe-libyaml |
| serde-yaml-bw | 2.4.1 | ✅ Native | ✅ | ✅ Configurable | ❌ | Slower, because a Saphyr-based budget pre-check runs before libyaml parses |
| serde-yaml-ng | 0.10.0 | ⚠️ partial | ❌ | ❌ | ❌ | |
| serde-yaml | 0.9.34+deprecated | ⚠️ partial | ❌ | ❌ | ❌ | Original, deprecated, repo archived |
| serde-norway | 0.9 | ⚠️ partial | ❌ | ❌ | ❌ | |
| serde-yml | 0.0.12 | ⚠️ partial | ❌ | ❌ | ❌ | Repo archived |
Benchmarking was done with Criterion, giving the following results:
As the results show, serde-saphyr outperforms the other crates, even with the budget check enabled.
Testing
The test suite currently includes 783 passing tests, most of them originating from the fully converted yaml-test-suite, with additional cases taken from the original serde-yaml tests. The remaining 6 failing corner cases (marked as ignored) have been reviewed, and their causes are well understood. To the best of our assessment, these failures stem from the saphyr parser. They represent extremely rare edge cases that are unlikely to appear in real-world use.
Notable features
- Configurable budgets: Enforce input limits to mitigate resource exhaustion (e.g., deeply nested structures or very large arrays); see Budget.
- Serializer supports emitting anchors (Rc, Arc, Weak) if they are properly wrapped (see below).
- First-class garde integration: Declarative validation of parsed YAML documents, reporting the location with a snippet taken directly from the YAML document.
- serde_json::Value is supported when parsing without a target structure defined.
- Robotics extensions to support the YAML dialect common in robotics (see below).
Usage
Parse YAML into a Rust structure with proper error handling. The crate name on crates.io is serde-saphyr, and the import path is serde_saphyr.
use serde::Deserialize;
Garde integration
This crate optionally integrates with garde to run declarative validation. serde-saphyr errors print a snippet with location information. If the invalid value comes from a YAML anchor, serde-saphyr also reports where that anchor was defined.
use garde::Validate;
use serde::Deserialize;
// Rust in snake_case, YAML in camelCase.
A typical output looks like:
error: line 3 column 23: invalid here, validation error: length is lower than 2 for `secondString`
--> the value is used here:3:23
|
1 |
2 | firstString: &A "x"
3 | secondString: *A
| ^ invalid here, validation error: length is lower than 2 for `secondString`
4 |
|
| This value comes indirectly from the anchor at line 2 column 25:
|
1 |
2 | firstString: &A "x"
| ^ defined here
3 | secondString: *A
4 |
Common Serde renames made to follow naming conventions (case changes, snake_case, kebab-case, r# stripping) are supported, as long as they do not introduce ambiguity. Arbitrary renames are not: parsing and validation still work, but error messages for arbitrarily renamed fields only report the Rust path. The garde integration is gated by the Cargo feature garde (disabled by default; use serde_saphyr = { version = "0.0.12", features = ["garde"] } in Cargo.toml to enable it).
If you prefer to validate without garde and want to ensure that location information is always available, use the heavier approach based on the Spanned<T> wrapper instead.
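To make the ambiguity rule concrete, the following is a hypothetical, stdlib-only sketch of how a Rust field name could be mapped back to the YAML key it came from. It is not serde-saphyr's code, and the helper names are invented for illustration. It generates the snake_case, camelCase and kebab-case spellings of a field (after stripping r#) and gives up when the document contains more than one of them.

```rust
/// Hypothetical helper: snake_case -> camelCase (`second_string` -> `secondString`).
fn to_camel_case(field: &str) -> String {
    let mut out = String::new();
    let mut upper_next = false;
    for ch in field.chars() {
        if ch == '_' {
            upper_next = true;
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}

/// Spellings a snake_case field may have in YAML after a conventional rename.
fn candidate_keys(field: &str) -> Vec<String> {
    let field = field.strip_prefix("r#").unwrap_or(field);
    let mut keys = vec![field.to_string(), to_camel_case(field), field.replace('_', "-")];
    keys.sort();
    keys.dedup();
    keys
}

/// Finds the YAML key for a Rust field; `None` if no key, or more than one key, matches.
fn resolve_yaml_key<'a>(field: &str, yaml_keys: &[&'a str]) -> Option<&'a str> {
    let candidates = candidate_keys(field);
    let mut hits = yaml_keys
        .iter()
        .copied()
        .filter(|k| candidates.iter().any(|c| c.as_str() == *k));
    let first = hits.next()?;
    if hits.next().is_some() { None } else { Some(first) }
}
```

For the garde example, the Rust field second_string resolves to the YAML key secondString. An arbitrary rename would not appear among the candidates, which is why only the Rust path could be reported in that case.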
Duplicate keys
Duplicate key handling is configurable. By default it’s an error; “first wins” and “last wins” strategies are available via Options. Duplicate key policy applies not just to strings but also to other types (when deserializing into map).
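The three strategies can be pictured with a small stdlib-only sketch. The enum and function names are hypothetical and are not the names used in serde-saphyr's Options. Keys are inserted in document order, and a repeated key either fails, is skipped, or overwrites the earlier value. The sketch is generic over the key type, mirroring the point that the policy is not limited to string keys.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::hash::Hash;

/// Hypothetical policy type mirroring the three strategies described above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DuplicateKeyPolicy {
    Error,
    FirstWins,
    LastWins,
}

/// Inserts key-value pairs in document order, applying `policy` to repeated keys.
/// Works for any hashable key type, not only strings.
fn build_map<K: Hash + Eq + Debug, V>(
    pairs: Vec<(K, V)>,
    policy: DuplicateKeyPolicy,
) -> Result<HashMap<K, V>, String> {
    let mut map = HashMap::new();
    for (key, value) in pairs {
        if map.contains_key(&key) {
            match policy {
                DuplicateKeyPolicy::Error => return Err(format!("duplicate key: {:?}", key)),
                DuplicateKeyPolicy::FirstWins => continue, // keep the earlier value
                DuplicateKeyPolicy::LastWins => {}         // fall through and overwrite
            }
        }
        map.insert(key, value);
    }
    Ok(map)
}
```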
Multiple documents
YAML streams can contain several documents separated by ---/... markers. When deserializing with serde_saphyr::from_multiple, you still need to supply the vector element type up front (Vec<T>). That does not lock you into a single shape: make the element an enum, and each document will deserialize into the matching variant. This lets you mix different payloads in one stream while retaining strong typing on the Rust side.
use serde::Deserialize;
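Conceptually, each document in the stream becomes one element of the resulting vector. The stdlib-only sketch below shows only how the --- and ... marker lines delimit those documents. It is a naive line-based splitter written for illustration, not how serde-saphyr or saphyr-parser work. A real parser also handles directives, markers followed by content, and marker-like text inside scalars.

```rust
/// Naive, illustrative splitter: breaks a YAML stream into document bodies at
/// `---` (document start) and `...` (document end) marker lines.
/// It does not handle directives, `--- value` on one line, or block scalars.
fn split_documents(stream: &str) -> Vec<String> {
    let mut docs = Vec::new();
    let mut current = String::new();
    for line in stream.lines() {
        let marker = line.trim_end();
        if marker == "---" || marker == "..." {
            // Close the current document if it contains anything but whitespace.
            if !current.trim().is_empty() {
                docs.push(std::mem::take(&mut current));
            }
            current.clear();
        } else {
            current.push_str(line);
            current.push('\n');
        }
    }
    if !current.trim().is_empty() {
        docs.push(current);
    }
    docs
}
```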
Nested enums
Externally tagged enums nest naturally in YAML as maps keyed by the variant name. This enables strict, expressive models (enums with associated data) instead of generic maps.
use serde::Deserialize;
There are two variants of the deserialization functions: from_* and from_*_with_options. The latter accepts an Options object that allows you to configure the budget and other aspects of parsing. For larger projects that require consistent parsing behavior, we recommend defining a wrapper function so that all option and budget settings are managed in one place (see examples/wrapper_function.rs).
Tagged enums written as !!EnumName VARIANT are also supported, but only for single-level scalar variants. YAML itself cannot nest such tagged enums, so use mapping-based representations (EnumName: RED) if you need to embed enums within other enums.
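The mapping-based form follows Serde's default (externally tagged) enum representation. The hypothetical, stdlib-only sketch below writes a nested enum value in that shape by hand, using YAML flow style for compactness, to show what matching YAML looks like. The types are invented for illustration. With serde-saphyr you would derive Deserialize on such types instead of writing this function.

```rust
/// Illustrative types: an enum with associated data that nests other enums.
#[derive(Debug)]
enum Color {
    Red,
    Green,
}

#[derive(Debug)]
enum Shape {
    Circle { radius: u32, color: Color },
    Group(Vec<Shape>),
}

/// Writes a Shape in externally tagged form (flow style for brevity):
/// every data-carrying variant becomes a one-entry map keyed by its variant
/// name, and unit variants such as `Red` become plain scalars.
fn to_flow_yaml(shape: &Shape) -> String {
    match shape {
        Shape::Circle { radius, color } => {
            format!("{{Circle: {{radius: {radius}, color: {color:?}}}}}")
        }
        Shape::Group(items) => {
            let inner: Vec<String> = items.iter().map(to_flow_yaml).collect();
            format!("{{Group: [{}]}}", inner.join(", "))
        }
    }
}
```

In block style, the same value is written with indentation, for example Group: followed by a sequence entry - Circle: with radius and color nested below it.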
Composite keys
YAML supports complex (non-string) mapping keys. Rust maps can mirror this, allowing you to parse such structures directly.
use ;
use std::collections::HashMap;
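In YAML, a key such as [0, 0] is a flow sequence, and it maps naturally onto a Rust tuple key. The hypothetical, stdlib-only sketch below hand-parses one restricted form (one [x, y]: value entry per line) into HashMap<(i32, i32), String>, just to show the target shape. It is a toy, not a YAML parser. With serde-saphyr you would declare the map type and let the deserializer build it.

```rust
use std::collections::HashMap;

/// Toy parser for lines such as `[2, 3]: goal` (two integer key elements).
fn parse_grid(yaml: &str) -> Result<HashMap<(i32, i32), String>, String> {
    let mut map = HashMap::new();
    for line in yaml.lines().filter(|l| !l.trim().is_empty()) {
        let (key, value) = line
            .split_once("]:")
            .ok_or_else(|| format!("not a composite-key entry: {line}"))?;
        let key = key.trim().strip_prefix('[').ok_or("key must start with '['")?;
        let (x, y) = key.split_once(',').ok_or("key needs two elements")?;
        let x: i32 = x.trim().parse().map_err(|e| format!("{e}"))?;
        let y: i32 = y.trim().parse().map_err(|e| format!("{e}"))?;
        map.insert((x, y), value.trim().to_string());
    }
    Ok(map)
}
```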
Booleans
By default, if the target field is boolean, serde-saphyr will attempt to interpret standard YAML 1.1 values as booleans (not just 'false' but also 'no', etc.).
If you do not want this (or you are parsing into a JSON Value where it is wrongly inferred), enclose the value in quotes or set strict_booleans to true in Options.
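The effect of the strict setting can be sketched as a lookup table. The following stdlib-only function is hypothetical and not serde-saphyr's code. It accepts the YAML 1.1 boolean spellings (y/n, yes/no, on/off and true/false in their lower, title and upper case forms), and with strict set it keeps only the true/false forms. Quoted scalars would never reach such a function when the target is a string.

```rust
/// Sketch of schema-driven boolean parsing. YAML 1.1 accepts the shorthands
/// y/n, yes/no and on/off in addition to true/false; `strict` keeps only
/// true/false, roughly what a strict-booleans setting asks for.
fn parse_bool(scalar: &str, strict: bool) -> Option<bool> {
    match scalar {
        "true" | "True" | "TRUE" => Some(true),
        "false" | "False" | "FALSE" => Some(false),
        _ if strict => None,
        "y" | "Y" | "yes" | "Yes" | "YES" | "on" | "On" | "ON" => Some(true),
        "n" | "N" | "no" | "No" | "NO" | "off" | "Off" | "OFF" => Some(false),
        _ => None,
    }
}
```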
Deserializing into abstract JSON Value
If you must work with abstract types, you can also deserialize YAML into serde_json::Value. Serde drives the process through deserialize_any because Value does not fix a Rust primitive type ahead of time. You lose the strict type checking that Rust struct types provide. Also, unlike YAML, JSON does not allow composite keys: keys must be strings. Field order is preserved.
Binary scalars
!!binary-tagged YAML values are base64-decoded when deserializing into Vec<u8> or String. For String, an error is reported if the decoded bytes are not valid UTF-8.
use serde::Deserialize;
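The two decoding paths (raw bytes versus UTF-8 text) can be illustrated with a minimal, stdlib-only base64 decoder for the standard RFC 4648 alphabet. This is an illustrative sketch, not the decoder serde-saphyr uses. It skips whitespace so that multi-line !!binary scalars decode, and the String variant adds the UTF-8 check described above.

```rust
/// Minimal standard-alphabet base64 decoder (RFC 4648). Whitespace is skipped
/// so that multi-line `!!binary` scalars decode. Illustrative only.
fn decode_base64(text: &str) -> Result<Vec<u8>, String> {
    fn value(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a' + 26) as u32),
            b'0'..=b'9' => Some((c - b'0' + 52) as u32),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let clean: Vec<u8> = text.bytes().filter(|b| !b.is_ascii_whitespace()).collect();
    if clean.len() % 4 != 0 {
        return Err("length is not a multiple of 4".into());
    }
    let groups = clean.len() / 4;
    let mut out = Vec::with_capacity(groups * 3);
    for (i, chunk) in clean.chunks(4).enumerate() {
        // '=' padding is only allowed (at most twice) at the end of the last group.
        let pad = chunk.iter().rev().take_while(|&&c| c == b'=').count();
        if pad > 2 || (pad > 0 && i + 1 != groups) {
            return Err("misplaced padding".into());
        }
        let mut acc: u32 = 0;
        for &c in &chunk[..4 - pad] {
            let v = value(c).ok_or_else(|| format!("invalid character {:?}", c as char))?;
            acc = (acc << 6) | v;
        }
        acc <<= 6 * pad as u32; // align the 24-bit group when padded
        let bytes = [(acc >> 16) as u8, (acc >> 8) as u8, acc as u8];
        out.extend_from_slice(&bytes[..3 - pad]);
    }
    Ok(out)
}

/// Decoding into a String additionally requires valid UTF-8.
fn decode_base64_string(text: &str) -> Result<String, String> {
    let bytes = decode_base64(text)?;
    String::from_utf8(bytes).map_err(|e| e.to_string())
}
```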
Merge keys
serde-saphyr supports merge keys, which reduce redundancy and verbosity by specifying shared key-value pairs once and then reusing them across multiple mappings. Here is an example with merge keys (inherited properties):
use serde::Deserialize;
/// Configuration to parse into. Does not include "defaults"
Merge keys are standard in YAML 1.1. Although YAML 1.2 no longer includes merge keys in its specification, it doesn't explicitly disallow them either, and many parsers implement this feature.
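The precedence rules of << are easy to get wrong, so here is a stdlib-only sketch of them on plain string maps. It is illustrative only and is not serde-saphyr's implementation. Keys written directly in the mapping always win. When several mappings are merged (<<: [*a, *b]), the YAML 1.1 merge-key definition gives earlier ones precedence over later ones.

```rust
use std::collections::BTreeMap;

/// Sketch of `<<` merge semantics: keys written directly in the mapping win;
/// among several merged mappings (`<<: [*a, *b]`), earlier ones take precedence.
fn apply_merge(
    own: &BTreeMap<String, String>,
    merged: &[&BTreeMap<String, String>],
) -> BTreeMap<String, String> {
    let mut result = own.clone();
    for source in merged {
        for (k, v) in source.iter() {
            // Only fill keys that neither the mapping nor an earlier source set.
            result.entry(k.clone()).or_insert_with(|| v.clone());
        }
    }
    result
}
```

For example, merging defaults {host: localhost, port: 80} into a mapping that sets port: 443 yields host: localhost and port: 443.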
Rust types as schema
To address the “Norway problem,” the target Rust types serve as an explicit schema. Because the parser knows whether a field expects a string or a boolean, it can correctly accept 1.2 either as a number or as the string "1.2", and interpret the common YAML boolean shorthands (y, on, n, off) as actual booleans when appropriate (can be disabled). Likewise, 0x2A is parsed as a hexadecimal integer when the target field is numeric, and as a string when the target is String. As with StrictYAML, serde-saphyr avoids inferring types from values — one of the most heavily criticized aspects of YAML. The Rust type system already provides all the necessary schema information.
Schema-based parsing can be disabled by setting no_schema to true in Options. In this mode, unquoted values that are deserialized into strings but could be read as another type (for example, a number or a boolean) are rejected. This can be used to enforce compatibility with another YAML parser that reads the same content and requires such values to be quoted. The default is false.
Legacy octal notation such as 0052 can be enabled via Options, but it is disabled by default.
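The idea that the target type decides how a scalar is read can be sketched for integers. The following stdlib-only function is hypothetical. It would only be consulted when the target field is numeric. A String target would keep the text 0x2A as-is. The legacy_octal flag imitates the opt-in handling of 0052-style literals. Without the flag, this sketch reads 0052 as decimal, as the YAML 1.2 core schema does; serde-saphyr's exact behavior in that case may differ.

```rust
/// Sketch of type-directed integer parsing, used only when the target is numeric.
/// `legacy_octal` imitates the opt-in handling of `0052`-style literals.
fn parse_int(scalar: &str, legacy_octal: bool) -> Option<i64> {
    let (negative, digits) = match scalar.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, scalar.strip_prefix('+').unwrap_or(scalar)),
    };
    // Reject empty input and stray signs or punctuation (e.g. "0x-2A", "1.2").
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_alphanumeric()) {
        return None;
    }
    // i128 leaves room to negate before range-checking against i64.
    let magnitude: i128 = if let Some(hex) = digits.strip_prefix("0x") {
        i128::from_str_radix(hex, 16).ok()?
    } else if legacy_octal && digits.len() > 1 && digits.starts_with('0') {
        i128::from_str_radix(&digits[1..], 8).ok()?
    } else {
        digits.parse::<i128>().ok()?
    };
    i64::try_from(if negative { -magnitude } else { magnitude }).ok()
}
```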
Pathological inputs & budgets
Fuzzing shows that certain adversarial inputs can make YAML parsers consume excessive time or memory, enabling denial-of-service scenarios. To counter this, serde-saphyr offers a fast, configurable pre-check via a Budget, available through Options. Defaults are conservative; tighten them when you know your input shape, or disable the budget if you only parse YAML you generate yourself.
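A typical adversarial input is a "bracket bomb" such as thousands of nested [ characters. The stdlib-only sketch below shows the general shape of a cheap pre-check that rejects such input before any real deserialization work. ToyBudget and precheck are hypothetical names, not serde-saphyr's Budget type. The scan only looks at flow-style brackets and ignores quoting and block indentation, so it illustrates the idea rather than providing real protection.

```rust
/// Hypothetical limits, loosely modeled on the idea of a budget.
struct ToyBudget {
    max_depth: usize,
    max_collections: usize,
}

/// Fast pre-scan over flow-style brackets. It ignores quoting and block
/// indentation, so it only illustrates rejecting pathological input early.
fn precheck(input: &str, budget: &ToyBudget) -> Result<(), String> {
    let mut depth = 0usize;
    let mut collections = 0usize;
    for ch in input.chars() {
        match ch {
            '[' | '{' => {
                depth += 1;
                collections += 1;
                if depth > budget.max_depth {
                    return Err(format!("nesting depth exceeds {}", budget.max_depth));
                }
                if collections > budget.max_collections {
                    return Err(format!("more than {} collections", budget.max_collections));
                }
            }
            ']' | '}' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(())
}
```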
During reader-based deserialization, serde-saphyr does not buffer the entire payload; it parses incrementally, counting bytes and enforcing configured budgets. This design blocks denial-of-service attempts via excessively large inputs. When streaming from the reader through the iterator, other budget limits apply on a per-document basis, since such a reader may be expected to stream indefinitely. The total size of input is not limited in this case.
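Byte counting during reader-based parsing can be pictured as a wrapper around std::io::Read that fails as soon as the input exceeds its limit. This is a hypothetical sketch; LimitedReader is not a serde-saphyr type. Unlike std's Read::take, which silently truncates at the limit, it returns an error, so an oversized input cannot be mistaken for a shorter valid document.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper: counts bytes as they are read and fails once more
/// than `limit` bytes have been consumed, instead of buffering everything first.
struct LimitedReader<R> {
    inner: R,
    remaining: u64,
}

impl<R: Read> LimitedReader<R> {
    fn new(inner: R, limit: u64) -> Self {
        LimitedReader { inner, remaining: limit }
    }
}

impl<R: Read> Read for LimitedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Request at most one byte more than the budget allows, so that
        // overrunning it is detected without reading far past the limit.
        let allowed = usize::try_from(self.remaining.saturating_add(1)).unwrap_or(usize::MAX);
        let max = buf.len().min(allowed);
        let n = self.inner.read(&mut buf[..max])?;
        if n as u64 > self.remaining {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "input exceeds the byte budget"));
        }
        self.remaining -= n as u64;
        Ok(n)
    }
}
```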
To find the typical budget requirements for your file, run the main() executable of this library, passing a YAML file path as the program argument. You can also fetch the budget programmatically by registering a handle in Options.
Serialization
use serde::Serialize;
let yaml = to_string.unwrap;
assert!;
Anchors (Rc/Arc/Weak)
Serde-saphyr can conceptually connect YAML anchors with Rust shared references (Rc, Weak and Arc). You need to use wrappers to activate this feature:
- RcAnchor<T> and ArcAnchor<T> emit anchors like &a1 on the first occurrence and may emit aliases *a1 later.
- RcWeakAnchor<T> and ArcWeakAnchor<T> serialize a weak reference: if the strong pointer is gone, it becomes null.
let the_a = from;
let data = Bigger ;
let serialized = to_string?;
assert_eq!;
let deserialized: Bigger = from_str?;
assert_eq!;
assert_eq!;
assert!;
Ok
}
When anchors are highly repetitive and also large, packing them into references can make YAML more human-readable.
To support round trips, the library can also deserialize into these anchor structures, and this is identity-preserving: a field or structure that is defined once and subsequently referenced exists as a single instance in memory, with all anchor fields pointing to it. This is crucial when the topology of references itself carries important information.
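Identity preservation hinges on comparing pointers rather than values. The stdlib-only sketch below illustrates this on the serialization side. It is hypothetical and is not how RcAnchor is implemented. The first time a given Rc allocation appears it is written with &aN, and later occurrences of the same allocation become *aN. An equal but separately allocated value gets its own anchor. Values are written as plain scalars without any escaping.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Identity-based anchor assignment for a YAML sequence: the first occurrence
/// of an `Rc` allocation is emitted as `&aN value`, later ones as `*aN`.
fn emit_sequence(items: &[Rc<String>]) -> String {
    let mut anchors: HashMap<*const String, usize> = HashMap::new();
    let mut out = String::new();
    for item in items {
        let ptr = Rc::as_ptr(item); // identity, not equality
        match anchors.get(&ptr) {
            Some(id) => out.push_str(&format!("- *a{id}\n")),
            None => {
                let id = anchors.len() + 1;
                anchors.insert(ptr, id);
                out.push_str(&format!("- &a{id} {item}\n"));
            }
        }
    }
    out
}
```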
Controlling text serialization
- Empty maps are serialized as {} and empty lists as [] by default.
- Strings containing newlines, and very long strings, are serialized as appropriate block scalars, except where they would need escaping (like ending with :).
- Indentation is changeable.
- The Commented wrapper emits a comment next to a scalar or reference (handy when a reference is far from its definition and needs explaining).
These readability improvements can be adjusted or disabled in SerializerOptions.
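The block-scalar rule can be illustrated with a deliberately conservative, stdlib-only heuristic. It is hypothetical and not serde-saphyr's emitter. Multi-line strings become literal block scalars (| keeps the single trailing newline, |- strips it), and anything the sketch cannot represent safely falls back to a double-quoted scalar with escapes. A real emitter applies many more rules, such as plain scalars, leading spaces and blank lines. The key is assumed to be safe as a plain scalar.

```rust
/// Double-quoted YAML scalar with the escapes this sketch needs.
fn double_quoted(value: &str) -> String {
    let mut out = String::from("\"");
    for ch in value.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if c.is_control() => out.push_str(&format!("\\u{:04X}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Emits `key: value`, using a literal block scalar for "simple" multi-line
/// strings (no blank lines, no leading whitespace, no carriage returns).
fn emit_string(key: &str, value: &str, indent: usize) -> String {
    let block_ok = value.contains('\n')
        && !value.contains('\r')
        && value
            .lines()
            .all(|l| !l.is_empty() && !l.starts_with(|c: char| c == ' ' || c == '\t'));
    if !block_ok {
        return format!("{key}: {}\n", double_quoted(value));
    }
    // `|` (clip) keeps exactly one trailing newline; `|-` (strip) keeps none.
    let (body, chomp) = match value.strip_suffix('\n') {
        Some(b) => (b, "|"),
        None => (value, "|-"),
    };
    let pad = " ".repeat(indent.max(1)); // block content must be indented
    let mut out = format!("{key}: {chomp}\n");
    for line in body.lines() {
        out.push_str(&pad);
        out.push_str(line);
        out.push('\n');
    }
    out
}
```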
Robotics
The feature-gated "robotics" capability enables parsing of YAML extensions commonly used in robotics (ROS). These extensions support conversion functions (deg, rad) and simple mathematical expressions such as deg(180), rad(pi), 1 + 2*(3 - 4/5), or rad(pi/2). The capability is gated behind the robotics Cargo feature, which is not enabled by default. Additionally, angle_conversions must be set to true in Options; enabling the robotics feature alone is not sufficient to activate this mode of parsing. The parser is just a simple expression calculator implemented directly in Rust, not a hook into a language interpreter.
rad_tag: 0.15 # value in radians, stays in radians
deg_tag: 180 # value in degrees, converts to radians
expr_complex: 1 + 2*(3 - 4/5) # simple expressions supported
func_deg: deg(180) # value in degrees, converts to radians
func_rad: rad(pi) # value in radians (stays in radians)
hh_mm_secs: -0:30:30.5 # Time
longitude: 8:32:53.2 # Nautical, ETH Zürich Main Building (8°32′53.2″ E)
let options = Options ;
let v: RoboFloats = from_str_with_options.expect;
Safety hardening with this feature enabled includes a maximal expression depth, a maximal number of digits, strict underscore placement, and fraction parsing limited to precision-relevant digits.
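To make the "simple expression calculator" concrete, here is a stdlib-only recursive-descent sketch of the same idea. It is not serde-saphyr's parser. The grammar, the Calc type and the depth limit of 32 are assumptions made for illustration. Following the comments in the YAML example, deg(x) converts degrees to radians and rad(x) returns x unchanged. The nesting check mirrors the kind of depth hardening described above.

```rust
use std::f64::consts::PI;

/// Hypothetical nesting limit for this sketch (not the crate's actual value).
const MAX_DEPTH: usize = 32;

/// Toy recursive-descent calculator for `1 + 2*(3 - 4/5)`, `deg(180)`, `rad(pi/2)`.
struct Calc<'a> {
    src: &'a [u8],
    pos: usize,
    depth: usize,
}

impl<'a> Calc<'a> {
    fn eval(text: &'a str) -> Result<f64, String> {
        let mut calc = Calc { src: text.as_bytes(), pos: 0, depth: 0 };
        let value = calc.expr()?;
        match calc.peek() {
            None => Ok(value),
            Some(_) => Err(format!("unexpected input at byte {}", calc.pos)),
        }
    }

    /// Skips spaces and returns the next byte without consuming it.
    fn peek(&mut self) -> Option<u8> {
        while self.src.get(self.pos) == Some(&b' ') {
            self.pos += 1;
        }
        self.src.get(self.pos).copied()
    }

    fn expect(&mut self, byte: u8) -> Result<(), String> {
        if self.peek() != Some(byte) {
            return Err(format!("expected '{}' at byte {}", byte as char, self.pos));
        }
        self.pos += 1;
        Ok(())
    }

    /// Consumes a run of ASCII bytes matching `pred` and returns it as text.
    fn take_while(&mut self, pred: fn(u8) -> bool) -> &'a str {
        let src = self.src;
        let start = self.pos;
        while self.pos < src.len() && pred(src[self.pos]) {
            self.pos += 1;
        }
        std::str::from_utf8(&src[start..self.pos]).unwrap_or("")
    }

    // expr := term (('+' | '-') term)*
    fn expr(&mut self) -> Result<f64, String> {
        let mut value = self.term()?;
        while let Some(op @ (b'+' | b'-')) = self.peek() {
            self.pos += 1;
            let rhs = self.term()?;
            value = if op == b'+' { value + rhs } else { value - rhs };
        }
        Ok(value)
    }

    // term := factor (('*' | '/') factor)*
    fn term(&mut self) -> Result<f64, String> {
        let mut value = self.factor()?;
        while let Some(op @ (b'*' | b'/')) = self.peek() {
            self.pos += 1;
            let rhs = self.factor()?;
            value = if op == b'*' { value * rhs } else { value / rhs };
        }
        Ok(value)
    }

    // factor := '-' factor | '(' expr ')' | number | 'pi' | ('deg' | 'rad') '(' expr ')'
    fn factor(&mut self) -> Result<f64, String> {
        self.depth += 1; // counts parentheses, calls and unary minus alike
        if self.depth > MAX_DEPTH {
            return Err("expression nested too deeply".into());
        }
        let value = match self.peek() {
            Some(b'-') => {
                self.pos += 1;
                -self.factor()?
            }
            Some(b'(') => {
                self.pos += 1;
                let v = self.expr()?;
                self.expect(b')')?;
                v
            }
            Some(c) if c.is_ascii_digit() || c == b'.' => {
                let text = self.take_while(|b| b.is_ascii_digit() || b == b'.');
                text.parse::<f64>().map_err(|e| format!("bad number {text:?}: {e}"))?
            }
            Some(c) if c.is_ascii_alphabetic() => match self.take_while(|b| b.is_ascii_alphabetic()) {
                "pi" => PI,
                name @ ("deg" | "rad") => {
                    self.expect(b'(')?;
                    let arg = self.expr()?;
                    self.expect(b')')?;
                    if name == "deg" { arg.to_radians() } else { arg }
                }
                other => return Err(format!("unknown identifier {other:?}")),
            },
            _ => return Err(format!("unexpected input at byte {}", self.pos)),
        };
        self.depth -= 1;
        Ok(value)
    }
}
```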
Unsupported features
- Tabs in indentation: YAML disallows tabs for indentation, including tab-indented block scalars.
- Invalid indentation of the closing bracket. Code such as
key:
is not valid YAML (the closing bracket is not indented enough). Some parsers accept this; saphyr-parser does not, and hence neither does serde-saphyr.
For those who want to retain compatibility with serde-yaml, even where it deviates from the standard, serde-yaml-bw may be a better choice. That crate uses saphyr-parser only for the budget pre-check; unsafe-libyaml then does the final parsing.