serde-saphyr
serde-saphyr is a strongly typed YAML deserializer built on
saphyr-parser. It aims to be panic-free on malformed input and to avoid unsafe code in library code. The crate deserializes YAML directly into your Rust types without constructing an intermediate tree of “abstract values.” It is not a fork of the older serde-yaml and does not share any code with it (some tests are reused). It provides both a serializer and a deserializer.
Why this approach?
- Light on resources: Having almost no intermediate data structures should result in more efficient parsing, especially if anchors are used only lightly.
- Also simpler: No code to support intermediate Values of all kinds.
- Type-driven parsing: YAML that doesn’t match the expected Rust types is rejected early.
- Safer by construction: No dynamic “any” objects; common YAML-based code-execution exploits do not apply.
Benchmarking
In our benchmarking project, we tested the following crates:
| Crate | Version | Merge Keys | Nested Enums | Duplicate key rejection | Notes |
|---|---|---|---|---|---|
| serde-saphyr | 0.0.4 | ✅ Native | ✅ | ✅ Configurable | No unsafe, no unsafe-libyaml |
| serde-yaml-bw | 2.4.1 | ✅ Native | ✅ | ✅ Configurable | Slow because saphyr-parser runs a budget check before libyaml does the actual parsing |
| serde-yaml-ng | 0.10.0 | ⚠️ partial | ❌ | ❌ | |
| serde-yaml | 0.9.34+deprecated | ⚠️ partial | ❌ | ❌ | Original, deprecated, repo archived |
| serde-norway | 0.9 | ⚠️ partial | ❌ | ❌ | |
| serde-yml | 0.0.12 | ⚠️ partial | ❌ | ❌ | Repo archived |
Benchmarking was done with Criterion, giving the following results:
As the results show, serde-saphyr outperforms the other crates, even with the budget check enabled.
Testing
The test suite currently includes 656 passing tests, most of them originating from the fully converted yaml-test-suite, with additional cases taken from the original serde-yaml tests. The remaining 18 failing corner cases (marked as ignored) have been reviewed, and their causes are well understood. To the best of our assessment, these failures stem from the saphyr parser. They represent extremely rare edge cases that are unlikely to appear in real-world use.
Other features
- Configurable budgets: Enforce input limits to mitigate resource exhaustion (e.g., deeply nested structures or very large arrays); see Budget.
- The serializer supports emitting anchors (Rc, Arc, Weak) if they are properly wrapped (see below).
- serde_json::Value is supported when parsing without a defined target structure.
- Robotics extensions to support the YAML dialect common in robotics (see below).
Deserialization
Duplicate keys
Duplicate key handling is configurable. By default a duplicate key is an error; “first wins” and “last wins” strategies are available via Options. The duplicate key policy applies not just to string keys but also to keys of other types (when deserializing into a map).
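The three strategies can be pictured with a small standard-library sketch. It is not the crate's implementation: `DuplicatePolicy` and `collect_pairs` are hypothetical names invented for this illustration, and the real setting lives in Options.

```rust
use std::collections::BTreeMap;

/// Hypothetical policy names for this sketch; the crate's own type may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DuplicatePolicy {
    Error,
    FirstWins,
    LastWins,
}

/// Builds a map from key/value pairs in document order, applying the policy
/// whenever a key repeats.
pub fn collect_pairs(
    pairs: &[(&str, i64)],
    policy: DuplicatePolicy,
) -> Result<BTreeMap<String, i64>, String> {
    let mut map: BTreeMap<String, i64> = BTreeMap::new();
    for &(key, value) in pairs {
        if map.contains_key(key) {
            match policy {
                DuplicatePolicy::Error => return Err(format!("duplicate key: {key}")),
                DuplicatePolicy::FirstWins => continue, // keep the earlier value
                DuplicatePolicy::LastWins => {}         // fall through and overwrite
            }
        }
        map.insert(key.to_string(), value);
    }
    Ok(map)
}
```

For the pairs `a: 1, b: 2, a: 3`, the `Error` policy rejects the input, `FirstWins` keeps `a = 1`, and `LastWins` keeps `a = 3`.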
Unsupported features
- Tagged enums (!!EnumName RED) are not supported. Use mapping-based enums (EnumName: RED) instead. This also allows you to define nested enums if needed; with tagged enums this is not possible under the YAML standard.
- Tabs in block scalars. While standard YAML disallows tabs only for indentation, saphyr-parser also rejects them in the body of an unquoted scalar. This seems reasonable (it avoids invisible characters) and appears to be a deliberate choice.
- Invalid indentation of the closing bracket. Code like
key:
is not valid YAML; the closing bracket must be moved further to the right. Some parsers allow this deviation from the rules; serde-saphyr does not.
For those who want to retain very strict compatibility with serde-yaml, serde-yaml-bw may be a better choice. That crate uses saphyr-parser only for the budget pre-check; unsafe-libyaml then does the final parsing.
Usage
Parse YAML into a Rust structure with proper error handling. The crate name on crates.io is
serde-saphyr, and the import path is serde_saphyr.
use serde::Deserialize;
Multiple documents
YAML streams can contain several documents separated by ---/... markers. When deserializing with serde_saphyr::from_multiple, you still need to supply the vector element type up front (Vec<T>). That does not lock you into a single shape: make the element an enum and each document will deserialize into the matching variant. This lets you mix different payloads in one stream while retaining strong typing on the Rust side.
use serde::Deserialize;
Nested enums
Externally tagged enums nest naturally in YAML as maps keyed by the variant name. This enables strict, expressive models (enums with associated data) instead of generic maps.
use serde::Deserialize;
There are two variants of the deserialization functions: from_* and from_*_with_options. The latter accepts an Options object that allows you to configure budget and other aspects of parsing. For larger projects that require consistent parsing behavior, we recommend defining a wrapper function so that all option and budget settings are managed in one place (see examples/wrapper_function.rs).
Composite keys
YAML supports complex (non-string) mapping keys. Rust maps can mirror this, allowing you to parse such structures directly.
use ;
use std::collections::HashMap;
Booleans
By default, if the target field is boolean, serde-saphyr will attempt to interpret standard YAML 1.1 values as boolean (not just 'false' but also 'no', etc).
If you do not want this (or you are parsing into a JSON Value where it is wrongly inferred), enclose the value in quotes or set strict_booleans to true in Options.
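As a rough illustration of the difference between the default and the strict mode, here is a standard-library sketch. `parse_bool` is a hypothetical helper, not the crate's function, and lowercasing the input is a simplification: it accepts a few mixed-case spellings that YAML 1.1 does not list.

```rust
/// Interprets a scalar as a boolean. With `strict == false`, the YAML 1.1
/// spellings (yes/no, on/off, y/n) are accepted in addition to true/false.
/// With `strict == true`, only true/false are accepted.
pub fn parse_bool(scalar: &str, strict: bool) -> Option<bool> {
    let lower = scalar.to_ascii_lowercase();
    match lower.as_str() {
        "true" => Some(true),
        "false" => Some(false),
        "yes" | "y" | "on" if !strict => Some(true),
        "no" | "n" | "off" if !strict => Some(false),
        _ => None, // not a boolean under the chosen mode
    }
}
```

With this sketch, `parse_bool("no", false)` yields `Some(false)`, while `parse_bool("no", true)` yields `None`, so a strict caller would report a type error instead of silently accepting the shorthand.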
Deserializing into abstract JSON Value
If you must work with abstract types, you can also deserialize YAML into serde_json::Value. Serde will drive the process through deserialize_any because Value does not fix a Rust primitive type ahead of time. In this mode you lose the strict type control that Rust struct types provide.
Binary scalars
!!binary-tagged YAML values are base64-decoded when deserializing into Vec<u8> or String (an error is reported if the decoded bytes are not valid UTF-8).
use serde::Deserialize;
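The two target types behave differently only in the last step, which the following standard-library sketch tries to show. The decoder is a minimal stand-in written for this illustration (standard base64 alphabet, whitespace and padding skipped); `decode_base64` and `binary_to_string` are hypothetical names, and the crate's own decoding code is not shown here.

```rust
/// Decodes standard base64, ignoring ASCII whitespace and `=` padding.
/// Returns `None` on any character outside the alphabet.
pub fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut acc: u32 = 0; // bit accumulator
    let mut bits = 0;     // number of not-yet-emitted bits in `acc`
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' | b' ' | b'\n' | b'\r' | b'\t' => continue,
            _ => return None,
        };
        acc = (acc << 6) | v as u32;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8); // emit the oldest complete byte
        }
    }
    Some(out)
}

/// Target `String`: decode first, then insist on valid UTF-8.
pub fn binary_to_string(input: &str) -> Result<String, String> {
    let bytes = decode_base64(input).ok_or("invalid base64")?;
    String::from_utf8(bytes).map_err(|e| e.to_string())
}
```

A `Vec<u8>` target would stop after `decode_base64`; a `String` target additionally fails when the bytes are not UTF-8, as with the payload `/w==` (the single byte 0xFF).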
Merge keys
serde-saphyr supports merge keys, which reduce redundancy and verbosity by specifying shared key-value pairs once and then reusing them across multiple mappings. Here is an example with merge keys (inherited properties):
use serde::Deserialize;
/// Configuration to parse into. Does not include "defaults"
Merge keys are standard in YAML 1.1. Although YAML 1.2 no longer includes merge keys in its specification, it doesn't explicitly disallow them either, and many parsers implement this feature.
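The effect of a merge key on a single mapping can be sketched with plain maps. This is only a model of the semantics (keys written explicitly in a mapping take precedence over merged ones), not how serde-saphyr processes events internally; `apply_merge` is a hypothetical helper.

```rust
use std::collections::BTreeMap;

/// Applies a merge key: start from the merged (`<<`) mapping, then let the
/// keys written explicitly in the mapping override the inherited ones.
pub fn apply_merge(
    inherited: &BTreeMap<String, String>,
    explicit: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut merged = inherited.clone();
    for (key, value) in explicit {
        merged.insert(key.clone(), value.clone());
    }
    merged
}
```

If the defaults carry `adapter: postgres` and `host: localhost`, and a mapping merges them while setting `host: dev.local` itself, the result has `adapter: postgres` and `host: dev.local`.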
Rust types as schema
To address the “Norway problem,” the target Rust types serve as an explicit schema. Because the parser knows whether a field expects a string or a boolean, it can correctly accept 1.2 either as a number or as the string "1.2", and interpret the common YAML boolean shorthands (y, on, n, off) as actual booleans when appropriate (can be disabled). Likewise, 0x2A is parsed as a hexadecimal integer when the target field is numeric, and as a string when the target is String. As with StrictYAML, serde-saphyr avoids inferring types from values — one of the most heavily criticized aspects of YAML. The Rust type system already provides all the necessary schema information.
Schema-based parsing can be disabled by setting no_schema to true in Options. In this case all unquoted values that are parsed into strings, but could be understood as something else, are rejected. This can be used to enforce compatibility with another YAML parser that reads the same content and requires this quoting. The default setting is false.
Legacy octal notation such as 0052 can be enabled via Options, but it is disabled by default.
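The idea that the target type, not the value, decides the interpretation can be sketched as follows. Both functions are hypothetical helpers written for this illustration; in particular, reading `0052` as plain decimal when legacy octal is off is a choice made for the sketch, not a statement about the crate's behavior.

```rust
/// Numeric target: accepts decimal and `0x` hexadecimal. Legacy octal
/// (`0052`) is honoured only when `legacy_octal` is set; otherwise the
/// digits are read as plain decimal in this sketch.
pub fn parse_int(scalar: &str, legacy_octal: bool) -> Option<i64> {
    if let Some(hex) = scalar.strip_prefix("0x") {
        return i64::from_str_radix(hex, 16).ok();
    }
    if legacy_octal && scalar.len() > 1 && scalar.starts_with('0') {
        return i64::from_str_radix(&scalar[1..], 8).ok();
    }
    scalar.parse().ok()
}

/// `String` target: the scalar text is taken verbatim, nothing is inferred.
pub fn parse_string(scalar: &str) -> String {
    scalar.to_string()
}
```

The same scalar `0x2A` becomes the integer 42 through `parse_int` and the text `0x2A` through `parse_string`, which mirrors the type-driven behavior described above.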
Pathological inputs & budgets
Fuzzing shows that certain adversarial inputs can make YAML parsers consume excessive time or memory, enabling denial-of-service scenarios. To counter this, serde-saphyr offers a fast, configurable pre-check via a Budget, available through Options. Defaults are conservative; tighten them when you know your input shape, or disable the budget if you only parse YAML you generate yourself.
During reader-based deserialization, serde-saphyr does not buffer the entire payload; it parses incrementally, counting bytes and enforcing configured budgets. This design blocks denial-of-service attempts via excessively large inputs. When streaming from the reader through the iterator, other budget limits apply on a per-document basis, since such a reader may be expected to stream indefinitely. The total size of input is not limited in this case.
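Counting bytes while reading, rather than buffering the whole payload first, can be sketched with a thin wrapper around any `std::io::Read`. `BudgetedReader` is a hypothetical type for this illustration; the crate's Budget covers more limits than input size and is configured through Options.

```rust
use std::io::{self, Read};

/// Wraps any reader and fails once more than `max_bytes` have been read.
/// `BudgetedReader` is a hypothetical name used only in this sketch.
pub struct BudgetedReader<R> {
    inner: R,
    remaining: usize,
}

impl<R: Read> BudgetedReader<R> {
    pub fn new(inner: R, max_bytes: usize) -> Self {
        Self { inner, remaining: max_bytes }
    }
}

impl<R: Read> Read for BudgetedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        if n > self.remaining {
            // The budget is enforced as data arrives, before it is parsed.
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "input size budget exceeded",
            ));
        }
        self.remaining -= n;
        Ok(n)
    }
}
```

Because the check happens per `read` call, an oversized input is rejected as soon as the limit is crossed, and memory use stays bounded by the caller's buffer rather than by the size of the input.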
Serialization
use serde::Serialize;
let yaml = to_string.unwrap;
assert!;
Anchors (Rc/Arc/Weak)
Serde-saphyr can conceptually connect YAML anchors with Rust shared references (Rc, Weak and Arc). You need to use wrappers to activate this feature:
- RcAnchor<T> and ArcAnchor<T> emit anchors like &a1 on first occurrence and may emit aliases *a1 later.
- RcWeakAnchor<T> and ArcWeakAnchor<T> serialize a weak ref: if the strong pointer is gone, it becomes null.
let the_a = from;
let data = Bigger ;
let serialized = to_string?;
assert_eq!;
let deserialized: Bigger = from_str?;
assert_eq!;
assert_eq!;
assert!;
Ok
}
When anchors are highly repetitive and also large, packing them into references can make YAML more human-readable.
Starting from 0.0.7, this library can also deserialize YAML into these anchor structures, and the deserialization is identity-preserving. A field or structure that is defined once and subsequently referenced will exist as a single instance in memory, with all anchor fields pointing to it. This is crucial when the topology of references itself constitutes important information to be transferred.
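What "identity-preserving" means can be shown with plain `Rc` values. The sketch below is not the crate's deserializer and does not use its wrapper types: `resolve_aliases` is a hypothetical helper that hands out clones of one shared pointer for every alias of the same anchor.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Resolves alias names against already-parsed anchors. Each alias yields a
/// clone of the same `Rc`, not a copy of the value. An unknown alias makes
/// the whole resolution fail with `None`.
pub fn resolve_aliases(
    anchors: &HashMap<String, Rc<String>>,
    aliases: &[&str],
) -> Option<Vec<Rc<String>>> {
    aliases
        .iter()
        .map(|name| anchors.get(*name).cloned())
        .collect()
}
```

Two aliases of the same anchor resolved this way satisfy `Rc::ptr_eq`, so the reference topology of the document survives in memory instead of being flattened into independent copies.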
Robotics
The feature-gated "robotics" capability enables parsing of YAML extensions commonly used in robotics (ROS). These extensions support conversion functions (deg, rad) and simple mathematical expressions such as deg(180), rad(pi), 1 + 2*(3 - 4/5), or rad(pi/2). This capability is gated behind the [robotics] feature and is not enabled by default. Additionally, angle_conversions must be set to true in the Options; adding the robotics feature alone is not sufficient to activate this mode of parsing. The parser is a simple expression calculator implemented directly in Rust, not a hook into a language interpreter.
rad_tag: 0.15 # value in radians, stays in radians
deg_tag: 180 # value in degrees, converts to radians
expr_complex: 1 + 2*(3 - 4/5) # simple expressions supported
func_deg: deg(180) # value in degrees, converts to radians
func_rad: rad(pi) # value in radians (stays in radians)
hh_mm_secs: -0:30:30.5 # Time
longitude: 8:32:53.2 # Nautical, ETH Zürich Main Building (8°32′53.2″ E)
let options = Options ;
let v: RoboFloats = from_str_with_options.expect;
Safety hardening for this feature includes a maximal expression depth, a maximal number of digits, strict underscore placement, and fraction parsing limited to precision-relevant digits.
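To give a feel for the conversion functions, here is a deliberately tiny standard-library sketch that handles only `deg(x)` and `rad(x)` with a decimal number or `pi` as the argument. `eval_angle` is a hypothetical helper; it does not evaluate nested arithmetic such as `rad(pi/2)` and implements none of the hardening limits listed above.

```rust
use std::f64::consts::PI;

/// Evaluates `deg(x)` or `rad(x)`, where `x` is a decimal number or `pi`,
/// and returns the angle in radians. Anything else yields `None`.
pub fn eval_angle(expr: &str) -> Option<f64> {
    let expr = expr.trim();
    let (is_degrees, inner) = if let Some(rest) = expr.strip_prefix("deg(") {
        (true, rest.strip_suffix(')')?)
    } else if let Some(rest) = expr.strip_prefix("rad(") {
        (false, rest.strip_suffix(')')?)
    } else {
        return None; // only the two conversion functions are recognised
    };
    let value = match inner.trim() {
        "pi" => PI,
        number => number.parse::<f64>().ok()?,
    };
    // Degrees are converted; radians pass through unchanged.
    Some(if is_degrees { value.to_radians() } else { value })
}
```

In this sketch `deg(180)` and `rad(pi)` both evaluate to approximately 3.14159 radians, matching the comments in the YAML sample above.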