serde-saphyr
serde-saphyr is a strongly typed YAML deserializer built on top of a slightly modified
saphyr-parser, published as saphyr-parser-bw. It aims to be panic-free on malformed input and to exclude unsafe code from library code. The crate deserializes YAML directly into your Rust types without constructing an intermediate tree of “abstract values.” You can try it online as a WebAssembly application.
See release history on GitHub.
Why this approach?
- Light on resources: Having almost no intermediate data structures should result in more efficient parsing, especially if anchors are used only lightly.
- Also simpler: No code to support intermediate Values of all kinds.
- Type-driven parsing: YAML that doesn’t match the expected Rust types is rejected early.
- Safer by construction: No dynamic “any” objects; common YAML-based code-execution exploits do not apply.
Project relationship
serde-saphyr is not a fork of the older serde-yaml crate and shares no code with it (apart from some reused tests). It is also not part of the saphyr project. The crate simply builds a Serde-based YAML deserialization layer around Saphyr’s public parser and is maintained independently. The name was historically chosen to reflect the use of Saphyr’s parser at a time when the Saphyr project did not provide its own Serde integration.
Benchmarking
In our benchmarking project, we tested the following crates:
| Crate | Version | Merge Keys | Nested Enums | Duplicate key rejection | Validation | Error snippet | Borrowed deserialization | Notes |
|---|---|---|---|---|---|---|---|---|
| serde-saphyr | 0.0.17 | ✅ | ✅ | ✅ Configurable | ✅ garde / validator | ✅ | ✅ | No unsafe, no unsafe-libyaml |
| serde-yaml-bw | 2.4.1 | ✅ | ✅ | ✅ Configurable | ❌ | ❌ | ❌ | Slow because Saphyr runs the budget check up front, before libyaml parses the input |
| serde-yaml-ng | 0.10.0 | ⚠️ | ❌ | ❌ | ❌ | ❌ | ✅ | |
| serde-yaml | 0.9.34+deprecated | ⚠️ | ❌ | ❌ | ❌ | ❌ | ✅ | Original, deprecated, repo archived |
| serde-norway | 0.9.42 | ⚠️ | ❌ | ❌ | ❌ | ❌ | ✅ | |
| serde-yml | 0.0.12 | ⚠️ | ❌ | ❌ | ❌ | ❌ | ✅ | Repo archived |
| yaml-spanned | 0.0.3 | ⚠️ | ❌ | ✅ | ❌ | ❌ | ❌ | Uses libyaml-safer |
⚠️ - partial support. Serde-yaml forks do not support merge keys natively but instead provide apply_merge function that must be called manually. Crates marked ✅ offer native and transparent support.
Benchmarking was done with Criterion, giving the following results (lower is better):
As the results show, serde-saphyr outperforms the other crates, even with the budget check enabled.
Testing
The test suite currently includes over 1000 passing tests, including the fully converted yaml-test-suite, with ALL tests from there passing with no exceptions. To pass the last few remaining cases, we needed to fork the saphyr-parser crate (saphyr-parser-bw). Some additional cases are taken from the original serde-yaml tests.
Notable features
- Configurable budgets: Enforce input limits to mitigate resource exhaustion (e.g., deeply nested structures or very large arrays); see Budget.
- Serializer supports emitting anchors (Rc, Arc, Weak) if they are properly wrapped (see below).
- Declarative validation with optional validator (example) or garde (example).
- Optional miette (example) integration for more advanced error reporting.
- serde_json::Value is supported when parsing without a target structure defined.
- Serializer and Deserializer are now public (due to how it's implemented, Deserializer is available in the closure only).
- Serialized floats are official YAML floats, both 1.1 and 1.2, for example 3.0e+18 and not 3e+18 or 3e18. Some parsers (such as PyYAML, go-yaml, and Psych) do not see 3e18 as a number.
- Precise error reporting with snippet rendering.
- Robotics extensions to support the YAML dialect common in robotics (see below).
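The float formatting rule in the list above can be illustrated with a small standard-library sketch. It is not the crate's serializer: the function name `yaml_exponent_float` is hypothetical, and the sketch only shows how Rust's default exponent output (such as `3e18`) can be turned into the explicit form with a fraction and a signed exponent.

```rust
/// Formats a finite float in exponent form with an explicit fraction and exponent sign.
/// Returns `None` for NaN and infinities, which have no exponent form.
/// Hypothetical helper for illustration; not part of serde-saphyr.
pub fn yaml_exponent_float(value: f64) -> Option<String> {
    if !value.is_finite() {
        return None;
    }
    // Rust's `{:e}` yields text such as "3e18" or "1.5e-7".
    let text = format!("{:e}", value);
    let (mantissa, exponent) = text.split_once('e')?;
    // Ensure the mantissa always carries a fractional part.
    let mantissa = if mantissa.contains('.') {
        mantissa.to_string()
    } else {
        format!("{}.0", mantissa)
    };
    // Negative exponents already carry their sign; positive ones get an explicit `+`.
    let sign = if exponent.starts_with('-') { "" } else { "+" };
    Some(format!("{}e{}{}", mantissa, sign, exponent))
}
```

The explicit `.0` and `+` are what let stricter parsers recognize the value as a number rather than a string.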
WebAssembly
serde-saphyr is compatible with WebAssembly. CI flow includes builds for both wasm32-unknown-unknown (browser / JS) and wasm32-wasip1 (WASI runtimes) with full test suite running and passing. We also wrote yva in dioxus to deploy serde-saphyr on the web.
Usage
Parse YAML into a Rust structure with proper error handling. The crate name on crates.io is
serde-saphyr, and the import path is serde_saphyr.
use serde::Deserialize;
Snippets
To make debugging easier, serde-saphyr renders snippets of the YAML that caused an error (similar to how many compilers report errors). These snippets include the line where the error occurred along with some surrounding context. Any terminal control sequences that might be present in the YAML are stripped out. If not desired, snippets can be removed for a specific error using without_snippet, or disabled entirely via the Options configuration.
Garde and Validator integration
This crate optionally integrates with validator or garde to run declarative validation. The serde-saphyr error prints the snippet, providing location information. If the invalid value comes from a YAML anchor, serde-saphyr also reports where that anchor was defined.
Garde
use garde::Validate;
use serde::Deserialize;
// Rust in snake_case, YAML in camelCase.
Validator
use serde::Deserialize;
use validator::Validate;
// Rust in snake_case, YAML in camelCase.
A typical output with serde-saphyr native snippet rendering looks like:
error: line 3 column 23: invalid here, validation error: length is lower than 2 for `secondString`
--> the value is used here:3:23
|
1 |
2 | firstString: &A "x"
3 | secondString: *A
| ^ invalid here, validation error: length is lower than 2 for `secondString`
4 |
|
| This value comes indirectly from the anchor at line 2 column 25:
|
1 |
2 | firstString: &A "x"
| ^ defined here
3 | secondString: *A
4 |
The garde and validator integrations are feature-gated and disabled by default. To enable one, use serde-saphyr = { version = "0.0.17", features = ["garde"] } (or features = ["validator"]) in Cargo.toml.
If you prefer to validate without validation crates and want to ensure that location information is always available, use the heavier approach with the Spanned<T> wrapper instead.
Duplicate keys
Duplicate key handling is configurable. By default a duplicate key is an error; “first wins” and “last wins” strategies are available via Options. The duplicate key policy applies not just to strings but also to other types (if they are used as keys when deserializing into a map).
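The three strategies can be sketched with the standard library alone. This is only an illustration of the semantics, not the crate's code: `DuplicatePolicy` and `collect_pairs` are hypothetical names, and the real setting is configured through Options.

```rust
use std::collections::BTreeMap;

/// Hypothetical policy names used for this sketch only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DuplicatePolicy {
    Error,
    FirstWins,
    LastWins,
}

/// Builds a map from key/value pairs, applying the chosen policy to repeated keys.
pub fn collect_pairs(
    pairs: &[(&str, i64)],
    policy: DuplicatePolicy,
) -> Result<BTreeMap<String, i64>, String> {
    let mut map = BTreeMap::new();
    for &(key, value) in pairs {
        if map.contains_key(key) {
            match policy {
                // Reject the whole mapping on the first repeated key.
                DuplicatePolicy::Error => return Err(format!("duplicate key `{}`", key)),
                // Keep the value that was seen first.
                DuplicatePolicy::FirstWins => continue,
                // Fall through and overwrite with the later value.
                DuplicatePolicy::LastWins => {}
            }
        }
        map.insert(key.to_string(), value);
    }
    Ok(map)
}
```

For the pairs `a: 1, b: 2, a: 3`, the error policy rejects the mapping, "first wins" keeps `a = 1`, and "last wins" keeps `a = 3`.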
Multiple documents
YAML streams can contain several documents separated by ---/... markers. When deserializing with serde_saphyr::from_multiple, you still need to supply the vector element type up front (Vec<T>). That does not lock you into a single shape: make the element an enum and each document will deserialize into the matching variant. This lets you mix different payloads in one stream while retaining strong typing on the Rust side.
use serde::Deserialize;
Nested enums
Externally tagged enums nest naturally in YAML as maps keyed by the variant name. This enables strict, expressive models (enums with associated data) instead of generic maps.
use serde::Deserialize;
There are two variants of the deserialization functions: from_* and from_*_with_options. The latter accepts an Options object that allows you to configure budget and other aspects of parsing. For larger projects that require consistent parsing behavior, we recommend defining a wrapper function so that all option and budget settings are managed in one place (see examples/wrapper_function.rs).
Tagged enums written as !!EnumName VARIANT are also supported, but only for single-level scalar variants. YAML itself cannot nest such tagged enums, so use mapping-based representations (EnumName: RED) if you need to embed enums within other enums.
Composite keys
YAML supports complex (non-string) mapping keys. Rust maps can mirror this, allowing you to parse such structures directly.
use ;
use std::collections::HashMap;
Options
Serde-saphyr provides control over serialization and deserialization behavior. We generally welcome feature requests, but we also recognize that not every user wants every feature enabled by default.
To support different use cases, most behavior can be enabled, disabled, or tuned via Options (deserializers) and SerializerOptions (serializers).
Adding fields to the public API is a breaking change. To allow new options without breaking compatibility, Serde-saphyr uses a macro-driven approach based on the options!, budget!, and ser_options! macros.
Booleans
By default, if the target field is boolean, serde-saphyr will attempt to interpret standard YAML 1.1 values as boolean (not just 'false' but also 'no', etc).
If you do not want this (or you are parsing into a JSON Value where it is wrongly inferred), enclose the value in quotes or set strict_booleans to true in Options.
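As a rough illustration of the difference between the default and the strict mode, here is a standard-library sketch. The function name `scalar_to_bool` is hypothetical, and the exact set of spellings that the crate accepts in strict mode is an assumption (this sketch keeps only the `true`/`false` spellings).

```rust
/// Interprets a plain scalar for a field whose Rust type is `bool`.
/// Hypothetical helper; the strict-mode spelling set is an assumption of this sketch.
pub fn scalar_to_bool(scalar: &str, strict: bool) -> Option<bool> {
    match scalar {
        "true" | "True" | "TRUE" => Some(true),
        "false" | "False" | "FALSE" => Some(false),
        // YAML 1.1 shorthands are honored only in the permissive mode.
        "y" | "Y" | "yes" | "Yes" | "YES" | "on" | "On" | "ON" if !strict => Some(true),
        "n" | "N" | "no" | "No" | "NO" | "off" | "Off" | "OFF" if !strict => Some(false),
        // Anything else is not a boolean.
        _ => None,
    }
}
```

With `strict` set, `no` is no longer a boolean, which is the behavior you want when the same text may be a country code or a plain word.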
Deserializing into abstract JSON Value
If you must work with abstract types, you can also deserialize YAML into serde_json::Value. Serde will drive the process through deserialize_any because Value does not fix a Rust primitive type ahead of time. You lose the strict type control that Rust struct types provide. Also, unlike YAML, JSON does not allow composite keys: keys must be strings. Field order will be preserved.
Binary scalars
!!binary-tagged YAML values are base64-decoded when deserializing into Vec<u8> or String (for String, an error is reported if the decoded bytes are not valid UTF-8).
use serde::Deserialize;
Important: some projects add the !!binary tag while actually expecting a verbatim string value (for example, the literal string "aGVsbG8="). This works with parsers that simply ignore the tag. However, serde-saphyr decodes !!binary values by default, attempting to interpret them as UTF-8 bytes.
If you use !!binary only as a documentation or annotation tag, enable ignore_binary_tag_for_string = true in Options.
use serde::Deserialize;
!!binary for other types like Vec<u8> will stay supported.
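To make the decoding step concrete, the following standard-library sketch decodes a base64 payload and then applies the UTF-8 check that a String target needs. It is not the crate's decoder; `decode_base64` and `binary_as_string` are hypothetical helpers, and the sketch does not validate padding strictly.

```rust
/// Decodes standard base64 (RFC 4648 alphabet); returns `None` on an invalid character.
/// Hypothetical helper: decoding stops at the first `=` and padding is not validated.
pub fn decode_base64(text: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buffer = 0u32; // pending bits not yet emitted
    let mut bits = 0u32; // number of pending bits in `buffer`
    for byte in text.bytes().filter(|b| !b.is_ascii_whitespace()) {
        let value = match byte {
            b'A'..=b'Z' => byte - b'A',
            b'a'..=b'z' => byte - b'a' + 26,
            b'0'..=b'9' => byte - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break,
            _ => return None,
        };
        buffer = (buffer << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1 << bits) - 1; // keep only the bits still pending
        }
    }
    Some(out)
}

/// Decodes a `!!binary` payload for a `String` target: the bytes must be valid UTF-8.
pub fn binary_as_string(text: &str) -> Result<String, String> {
    let bytes = decode_base64(text).ok_or("invalid base64")?;
    String::from_utf8(bytes).map_err(|e| e.to_string())
}
```

With decoding active, the payload `aGVsbG8=` becomes the string `hello`. A project that expects the literal text `aGVsbG8=` needs the tag to be ignored instead.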
Merge keys
serde-saphyr supports merge keys, which reduce redundancy and verbosity by specifying shared key-value pairs once and then reusing them across multiple mappings. Here is an example with merge keys (inherited properties):
use serde::Deserialize;
/// Configuration to parse into. Does not include "defaults"
Merge keys are standard in YAML 1.1. Although YAML 1.2 no longer includes merge keys in its specification, it doesn't explicitly disallow them either, and many parsers implement this feature.
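The effect of a merge key on a single mapping can be sketched with plain maps. This illustrates YAML merge semantics only, not how the crate implements them; `merge_defaults` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Returns the mapping that results from merging `defaults` into `explicit`.
/// Keys written directly in the mapping take precedence over merged ones.
/// Hypothetical helper for illustration.
pub fn merge_defaults(
    defaults: &BTreeMap<String, String>,
    explicit: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    // Start from the shared key-value pairs...
    let mut merged = defaults.clone();
    // ...then let the mapping's own keys overwrite them.
    merged.extend(explicit.iter().map(|(k, v)| (k.clone(), v.clone())));
    merged
}
```

A mapping that merges `{adapter: postgres, host: localhost}` and sets its own `host` ends up with the shared `adapter` and its own `host`.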
Rust types as schema
To address the “Norway problem,” the target Rust types serve as an explicit schema. Because the parser knows whether a field expects a string or a boolean, it can correctly accept 1.2 either as a number or as the string "1.2", and interpret the common YAML boolean shorthands (y, on, n, off) as actual booleans when appropriate (can be disabled). Likewise, 0x2A is parsed as a hexadecimal integer when the target field is numeric, and as a string when the target is String. As with StrictYAML, serde-saphyr avoids inferring types from values — one of the most heavily criticized aspects of YAML. The Rust type system already provides all the necessary schema information.
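A minimal sketch of the idea, using only the standard library: the same scalar text is interpreted differently depending on the type the caller asks for. The helper names are hypothetical and the parsing rules are simplified compared with the crate.

```rust
/// Interprets a plain scalar for an integer target: decimal or `0x` hexadecimal.
/// Hypothetical helper; the crate's real rules are richer.
pub fn scalar_as_i64(scalar: &str) -> Option<i64> {
    match scalar.strip_prefix("0x") {
        Some(hex) => i64::from_str_radix(hex, 16).ok(),
        None => scalar.parse().ok(),
    }
}

/// Interprets a plain scalar for a floating-point target.
pub fn scalar_as_f64(scalar: &str) -> Option<f64> {
    scalar.parse().ok()
}

/// Interprets the same scalar for a `String` target: the text is kept verbatim.
pub fn scalar_as_string(scalar: &str) -> String {
    scalar.to_string()
}
```

No guess is made from the value itself: `0x2A` is the number 42 only because an integer was requested, and it stays the text `0x2A` when a String was requested.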
Schema-based parsing can be disabled by setting no_schema to true in Options. In this case all unquoted values that are parsed into strings, but can be understood as something else, are rejected. This can be used for enforcing compatibility with another YAML parser that reads the same content and requires this quoting. Default setting is false.
Legacy octal notation such as 0052 can be enabled via Options, but it is disabled by default.
The concept that “Rust code is the schema” naturally extends to the implemented support for validator and garde, as these crates allow annotations to be added directly to Rust types, providing even stricter control over permissible values.
Pathological inputs & budgets
Fuzzing shows that certain adversarial inputs can make YAML parsers consume excessive time or memory, enabling denial-of-service scenarios. To counter this, serde-saphyr offers a fast, configurable pre-check via a Budget, available through Options. Defaults are conservative; tighten them when you know your input shape, or disable the budget if you only parse YAML you generate yourself.
During reader-based deserialization, serde-saphyr does not buffer the entire payload; it parses incrementally, counting bytes and enforcing configured budgets. This design blocks denial-of-service attempts via excessively large inputs. When streaming from the reader through the iterator, other budget limits apply on a per-document basis, since such a reader may be expected to stream indefinitely. The total size of input is not limited in this case.
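The general shape of such a pre-check can be sketched as a single cheap pass that stops as soon as a limit is exceeded. This is a simplified illustration, not the crate's Budget: the `Limits` struct and its fields are hypothetical, and the scan only looks at flow-style brackets without handling quoting or block indentation.

```rust
/// Hypothetical limits for this sketch; the crate's `Budget` has its own fields.
pub struct Limits {
    pub max_bytes: usize,
    pub max_depth: usize,
}

/// Scans flow-style input once and rejects it as soon as a limit is exceeded.
/// Simplification: brackets inside quoted strings are counted too.
pub fn precheck(input: &str, limits: &Limits) -> Result<(), String> {
    if input.len() > limits.max_bytes {
        return Err(format!("input exceeds {} bytes", limits.max_bytes));
    }
    let mut depth = 0usize;
    for byte in input.bytes() {
        match byte {
            b'[' | b'{' => {
                depth += 1;
                if depth > limits.max_depth {
                    return Err(format!("nesting exceeds depth {}", limits.max_depth));
                }
            }
            b']' | b'}' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(())
}
```

Because the check allocates nothing and never builds values, a hostile input is rejected before any expensive work starts.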
To find the typical budget requirements for your file, use our web demo or run the main() executable of this library, providing a YAML file path as the program parameter. You can also fetch the budget programmatically by registering a closure with Options::with_budget_report.
Serialization
use serde::Serialize;
let yaml = to_string.unwrap;
assert!;
Anchors (Rc/Arc/Weak)
Serde-saphyr can conceptually connect YAML anchors with Rust shared references (Rc, Weak and Arc). You need to use wrappers to activate this feature:
- RcAnchor and ArcAnchor emit anchors like &a1 on first occurrence and may emit aliases *a1 later.
- RcWeakAnchor and ArcWeakAnchor serialize a weak ref: if the strong pointer is gone, it becomes null.
let the_a = from;
let data = Bigger ;
let serialized = to_string?;
assert_eq!;
let deserialized: Bigger = from_str?;
assert_eq!;
assert_eq!;
assert!;
Ok
}
When anchors are highly repetitive and also large, packing them into references can make YAML more human-readable.
To support round trips, the library can also deserialize into these anchor structures, and the round trip is identity-preserving. A field or structure that is defined once and subsequently referenced will exist as a single instance in memory, with all anchor fields pointing to it. This is crucial when the topology of references itself constitutes important information to be transferred.
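What "identity-preserving" means on the Rust side can be shown with plain Rc and Weak, without the crate's wrappers. The helper names below are hypothetical and stand in for what the anchor wrappers hold internally.

```rust
use std::rc::{Rc, Weak};

/// Two fields sharing one allocation, as two references to one anchor would.
/// Hypothetical helper for illustration.
pub fn shared_pair(value: String) -> (Rc<String>, Rc<String>) {
    let first = Rc::new(value);
    let second = Rc::clone(&first);
    (first, second)
}

/// A weak reference yields a value only while a strong owner is alive;
/// `None` here corresponds to the `null` emitted for a dangling weak anchor.
pub fn weak_value(weak: &Weak<String>) -> Option<String> {
    weak.upgrade().map(|rc| (*rc).clone())
}
```

`Rc::ptr_eq` on the two halves of the pair returns true, which is the property a round trip has to keep: one instance in memory, not two equal copies.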
Recursive YAML
While recursive YAML is unusual, it is not forbidden by the specification. Real-world examples and requests to support it exist.
Serde-saphyr supports recursive structures, but Rust requires you to be very explicit about this. A structure that may hold recursive references to itself must be wrapped in RcRecursive, and any reference that points to it must be RcRecursion. Arc variants also exist. See also examples/recursive_yaml.rs.
Controlling serialization
- Empty maps are serialized as {} and empty lists as [] by default.
- Strings containing newlines and very long strings are serialized as appropriate block scalars, except in cases where they would need escaping (like ending with :).
- Indentation is changeable.
- The wrapper Commented allows emitting a comment next to a scalar or reference (handy when the reference is far from its definition and needs to be explained).
- The wrapper SpaceAfter adds an empty line after the wrapped value, useful for visually separating sections in the output YAML.
- It is possible to request that all strings be quoted — using single quotes when no escape sequences are present, and double quotes otherwise. This is very explicit and unambiguous, but such YAML may be less readable for humans. Line wrapping is disabled in this mode.
- YAML 1.1 booleans (y, yes, on, etc.) are normally quoted as both keys and values. If this is undesired (for example, y is a coordinate), set yaml_12 to true.
These settings are changeable in SerializerOptions.
Borrowed string deserialization
serde-saphyr supports zero-copy deserialization for string fields when using from_str or from_slice. This allows deserializing into &str fields that borrow directly from the input, avoiding allocation overhead.
use serde::Deserialize;
let yaml = "name: hello\nvalue: 42\n";
let data: Data = from_str.unwrap;
assert_eq!;
Limitations:
- Borrowing works for any scalar whose parsed value exists verbatim in the input. This includes plain scalars and simple quoted strings without escape sequences (e.g., "hello world" can be borrowed, but "hello\nworld" cannot because \n is transformed to a newline).
- If a scalar requires transformation (escape processing, line folding, block scalar normalization, or '' escape in single-quoted strings), deserialization into &str fails with a helpful error suggesting String or Cow<str>.
- Reader-based entry points (from_reader) require DeserializeOwned and cannot return borrowed values.
For maximum flexibility, use Cow<'a, str> which borrows when possible and owns when transformation is required.
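The borrow-or-own decision can be sketched with std::borrow::Cow. This is an illustration of the principle rather than the crate's scalar handling: `unescape` is a hypothetical helper that understands only a few backslash escapes.

```rust
use std::borrow::Cow;

/// Returns the value of a double-quoted scalar body: borrowed when the text can be
/// used verbatim, owned when escapes must be processed.
/// Hypothetical helper; only `\n`, `\t` and literal escapes are handled.
pub fn unescape(body: &str) -> Cow<'_, str> {
    // Fast path: nothing to transform, so the input slice itself is the value.
    if !body.contains('\\') {
        return Cow::Borrowed(body);
    }
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some(other) => out.push(other),
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}
```

A &str field can only ever hold the borrowed case, which is why the transformed case has to fail for &str and succeed for String or Cow.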
Custom messages
The default error messages are developer-oriented. They may mention serde-saphyr APIs and
options and include “action items” intended to help fix the problem.
If error messages are shown to end users, switch to the built-in user-facing formatter, or provide your own formatter (for example, to translate messages into another language).
See:
- MessageFormatter: controls the main message text for each Error.
- Localizer: controls message pieces that are composed outside MessageFormatter::format_message (location suffixes, validation/snippet labels, etc.).
Use the built-in user-facing formatter
use UserMessageFormatter;
# let err = .unwrap_err;
println!;
Use a custom formatter with miette
If you want fancy diagnostics via miette, you can convert a serde-saphyr error to a
miette::Report while still controlling the message text via a custom formatter:
use ;
This requires enabling the crate’s miette feature.
For a complete custom formatter/localizer example, see examples/pirate_formatter.rs. For an
end-to-end miette example, see examples/miette.rs.
Robotics
The feature-gated "robotics" capability enables parsing of YAML extensions commonly used in robotics (ROS). These extensions support conversion functions (deg, rad) and simple mathematical expressions such as deg(180), rad(pi), 1 + 2*(3 - 4/5), or rad(pi/2). The capability is gated behind the robotics feature and is not enabled by default. Additionally, angle_conversions must be set to true in Options; adding the robotics feature alone is not sufficient to activate this mode of parsing. The parser is a simple expression calculator implemented directly in Rust, not a hook into a language interpreter.
rad_tag: 0.15 # value in radians, stays in radians
deg_tag: 180 # value in degrees, converts to radians
expr_complex: 1 + 2*(3 - 4/5) # simple expressions supported
func_deg: deg(180) # value in degrees, converts to radians
func_rad: rad(pi) # value in radians (stays in radians)
hh_mm_secs: -0:30:30.5 # Time
longitude: 8:32:53.2 # Nautical, ETH Zürich Main Building (8°32′53.2″ E)
let options = Options ;
let v: RoboFloats = from_str_with_options.expect;
Safety hardening with this feature enabled includes a maximal expression depth, a maximal number of digits, strict underscore placement, and fraction parsing limited to precision-relevant digits.
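The deg and rad conversion functions can be approximated by a very small standard-library sketch. It is far simpler than the crate's expression calculator: `eval_angle` and its helpers are hypothetical, and only a single call around a number or `pi` is recognized (no arithmetic, tags, or sexagesimal values).

```rust
use std::f64::consts::PI;

/// Evaluates `deg(x)`, `rad(x)` or a bare operand and returns radians.
/// Hypothetical helper; arithmetic expressions are out of scope for this sketch.
pub fn eval_angle(scalar: &str) -> Option<f64> {
    let scalar = scalar.trim();
    if let Some(arg) = call_argument(scalar, "deg") {
        // Degrees are converted to radians.
        return operand(arg).map(f64::to_radians);
    }
    if let Some(arg) = call_argument(scalar, "rad") {
        // Radians stay in radians.
        return operand(arg);
    }
    operand(scalar)
}

/// Returns the text between `name(` and the closing `)`, if the scalar has that shape.
fn call_argument<'a>(scalar: &'a str, name: &str) -> Option<&'a str> {
    scalar.strip_prefix(name)?.strip_prefix('(')?.strip_suffix(')')
}

/// Parses a number or the constant `pi`.
fn operand(text: &str) -> Option<f64> {
    match text.trim() {
        "pi" => Some(PI),
        other => other.parse().ok(),
    }
}
```

Since the evaluator is ordinary Rust code over a closed grammar, unknown input such as `deg(x)` simply fails to parse rather than being executed.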
Unsupported features
- Common Serde renames made to follow naming conventions (case changes, snake_case, kebab-case, r# stripping) are supported in snippets, as long as they do not introduce ambiguity. Arbitrary renames, flattening, aliases and other complex manipulations possible with Serde are not. Parsing and validation will still work, but error messages for arbitrarily renamed fields report only the Rust path.
- Spanned<T> cannot be used within variants of untagged or internally tagged enums due to a fundamental limitation in Serde. Instead, wrap the entire enum in Spanned, or use externally tagged enums (the default).
Executable
serde-saphyr comes with a simple executable (CLI) that can check the budget of a given YAML file and can also serve as a YAML validator, printing the line and column of each YAML error together with an excerpt.
To run it (no Rust knowledge required):
# binary name is the package name by default
To enable fancy error reporting (graphical diagnostics) via the optional miette integration, install/build the CLI with the miette feature enabled:
# install with miette enabled
# or run from a git checkout
If you want to keep the previous plain-text error output even when built with miette, pass --plain: