serde-saphyr
serde-saphyr is a strongly typed YAML deserializer built on top of a slightly modified saphyr-parser, published as saphyr-parser-bw. It aims to be panic-free on malformed input and to exclude unsafe code from the library. The crate deserializes YAML directly into your Rust types without constructing an intermediate tree of “abstract values.” You can try it online as a WebAssembly application.
See release history on GitHub.
Why this approach?
- Light on resources: because there are almost no intermediate data structures, parsing should be more efficient, especially if anchors are used only lightly.
- Simpler: there is no code to support intermediate values of all kinds.
- Type-driven parsing: YAML that doesn’t match the expected Rust types is rejected early.
- Safer by construction: serde-saphyr avoids the typical YAML remote code execution vulnerability because it does not support or implement tag-driven object instantiation. Instead, it deserializes into fixed Rust types via Serde, removing the object-instantiation mechanism that such exploits depend on.
Project relationship
serde-saphyr is not a fork of the older serde-yaml crate and shares no code with it (apart from some reused tests). It is also not part of the saphyr project. The crate simply builds a Serde-based YAML deserialization layer around Saphyr’s public parser and is maintained independently. The name was historically chosen to reflect the use of Saphyr’s parser at a time when the Saphyr project did not provide its own Serde integration.
Benchmarking
In our benchmarking project, we tested the following crates:
| Crate | Version | Merge Keys | Nested Enums | Duplicate key rejection | Validation | Error snippet | Borrowed deserialization | Notes |
|---|---|---|---|---|---|---|---|---|
| serde-saphyr | 0.0.17 | ✅ | ✅ | ✅ Configurable | ✅ garde / validator | ✅ | ✅ | No unsafe, no unsafe-libyaml |
| serde-yaml-bw | 2.4.1 | ✅ | ✅ | ✅ Configurable | ❌ | ❌ | ❌ | Slower because Saphyr performs a budget check upfront, before libyaml parses the input |
| serde-yaml-ng | 0.10.0 | ⚠️ | ❌ | ❌ | ❌ | ❌ | ✅ | |
| serde-yaml | 0.9.34 + deprecated | ⚠️ | ❌ | ❌ | ❌ | ❌ | ✅ | Original, deprecated, repo archived |
| serde-norway | 0.9.42 | ⚠️ | ❌ | ❌ | ❌ | ❌ | ✅ | |
| serde-yml | 0.0.12 | ⚠️ | ❌ | ❌ | ❌ | ❌ | ✅ | Repo archived |
| yaml-spanned | 0.0.3 | ⚠️ | ❌ | ✅ | ❌ | ❌ | ❌ | Uses libyaml-safer |
⚠️ = partial support. The serde-yaml forks do not support merge keys natively; instead, they provide an apply_merge function that must be called manually. Crates marked ✅ offer native, transparent support.
Benchmarking was done with Criterion (lower is better). In these benchmarks, serde-saphyr outperforms the other crates, even with the budget check enabled.
Testing
The test suite currently includes over 1000 passing tests, including the fully converted yaml-test-suite, all of whose tests pass without exception. To pass the last few remaining cases, we needed to fork the saphyr-parser crate (saphyr-parser-bw). Some additional cases are taken from the original serde-yaml tests.
Notable features
- Configurable budgets: enforce input limits to mitigate resource exhaustion (e.g., deeply nested structures or very large arrays); see Budget.
- Precise error reporting with snippet rendering.
- Optional !include support with a custom or default resolver (inclusion of either a complete document or the node referenced by a specified anchor).
- Property interpolation (${NAME}), which helps keep secrets out of YAML files.
- Serializer supports emitting anchors (Rc, Arc, Weak) if they are properly wrapped (see below).
- Declarative validation with optional validator or garde integration.
- Optional miette integration for more advanced error reporting.
- serde_json::Value is supported when parsing without a defined target structure.
- Serializer and Deserializer are public (due to how it's implemented, Deserializer is available in the closure only).
- Serialized floats are official YAML floats.
- Correct handling of JSON-style Unicode surrogate pairs.
- Robotics extensions supporting the YAML dialect common in robotics (see below).
WebAssembly
serde-saphyr is compatible with WebAssembly. The CI workflow builds for both wasm32-unknown-unknown (browser / JS) and wasm32-wasip1 (WASI runtimes) and runs the full test suite, which passes on both. We also wrote yva in Dioxus to deploy serde-saphyr on the web.
Usage
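Derive serde::Deserialize for your types and pass the YAML text to serde_saphyr::from_str (or one of the other from_* functions).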
Using serializer or deserializer specifically
To speed up compilation, starting from version 0.0.23 you can link only the deserializer or only the serializer (along with their respective dependencies). For easier initial integration, both serialize and deserialize features are enabled by default.
If you only need one side, you can disable default features and enable only the API surface you use:
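```toml
serde-saphyr = { version = "0.0.23", default-features = false, features = ["deserialize"] }
```

or

```toml
serde-saphyr = { version = "0.0.23", default-features = false, features = ["serialize"] }
```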
Disabling both produces an "Invalid feature configuration" error (such a configuration makes no sense).
Snippets
To make debugging easier, serde-saphyr renders snippets of the YAML that caused an error (similar to how many compilers report errors). These snippets include the line where the error occurred along with some surrounding context. Any terminal control sequences that might be present in the YAML are stripped out. If not desired, snippets can be removed for a specific error using without_snippet, or disabled entirely via the Options configuration.
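The following minimal sketch illustrates the idea of snippet rendering; it is not serde-saphyr's renderer, and the render_snippet function and its layout are hypothetical. It prints the offending line with one line of context on each side, places a caret under the reported column, and drops control characters (a simplification: a real renderer would remove whole escape sequences, not only the ESC byte).

```rust
/// Hypothetical helper: renders `line` and `column` (both 1-based) of `source`
/// with one line of context above and below, plus a caret marker.
fn render_snippet(source: &str, line: usize, column: usize, msg: &str) -> String {
    let lines: Vec<&str> = source.lines().collect();
    let first = line.saturating_sub(1).max(1);
    let last = (line + 1).min(lines.len());
    let mut out = String::new();
    for n in first..=last {
        // Drop control characters (e.g. ESC) so YAML content cannot
        // inject escape sequences into the terminal.
        let clean: String = lines[n - 1].chars().filter(|c| !c.is_control()).collect();
        out.push_str(&format!("{n:>3} | {clean}\n"));
        if n == line {
            // "    | " has the same width as "{n:>3} | ", so the caret lines up.
            let pad = " ".repeat(column.saturating_sub(1));
            out.push_str(&format!("    | {pad}^ {msg}\n"));
        }
    }
    out
}
```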
Garde and Validator integration
This crate optionally integrates with validator or garde to run declarative validation. The serde-saphyr error prints the snippet, providing location information. If the invalid value comes from a YAML anchor, serde-saphyr also reports where that anchor was defined.
Garde
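Derive serde::Deserialize together with garde::Validate. Rust fields stay in snake_case while the YAML keys use camelCase (for example, via #[serde(rename_all = "camelCase")]).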
Validator
Derive serde::Deserialize together with validator::Validate. Rust fields stay in snake_case while the YAML keys use camelCase.
A typical output with serde-saphyr's native snippet rendering looks like this:

```text
error: line 3 column 23: invalid here, validation error: length is lower than 2 for `secondString`
--> the value is used here:3:23
|
1 |
2 | firstString: &A "x"
3 | secondString: *A
| ^ invalid here, validation error: length is lower than 2 for `secondString`
4 |
|
| This value comes indirectly from the anchor at line 2 column 25:
|
1 |
2 | firstString: &A "x"
| ^ defined here
3 | secondString: *A
4 |
```
Both integrations are feature-gated and disabled by default. Use serde-saphyr = { version = "0.0.17", features = ["garde"] } (or features = ["validator"]) in Cargo.toml to enable one.
If you prefer to validate without validation crates and want to ensure that location information is always available, use the heavier approach with the Spanned<T> wrapper instead.
Duplicate keys
Duplicate key handling is configurable. By default a duplicate key is an error; “first wins” and “last wins” strategies are available via Options. The duplicate key policy applies not just to strings but also to other types (if they are used as keys when deserializing into a map).
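As an illustration of the three policies, here is a sketch over a plain list of key/value pairs. The DuplicateKeys enum and insert_all function are hypothetical stand-ins for the policy selected in Options; the generic key type mirrors the point that the policy is not limited to string keys.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::hash::Hash;

/// Hypothetical stand-in for the duplicate-key policy configured in Options.
#[derive(Clone, Copy)]
enum DuplicateKeys {
    Error,
    FirstWins,
    LastWins,
}

/// Builds a map from (key, value) pairs in document order, applying `policy`
/// whenever a key repeats. Works for any hashable key type.
fn insert_all<K: Hash + Eq + Debug, V>(
    pairs: Vec<(K, V)>,
    policy: DuplicateKeys,
) -> Result<HashMap<K, V>, String> {
    let mut map = HashMap::new();
    for (k, v) in pairs {
        if map.contains_key(&k) {
            match policy {
                DuplicateKeys::Error => return Err(format!("duplicate key {k:?}")),
                DuplicateKeys::FirstWins => continue, // keep the earlier value
                DuplicateKeys::LastWins => {}         // fall through and overwrite
            }
        }
        map.insert(k, v);
    }
    Ok(map)
}
```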
Multiple documents
YAML streams can contain several documents separated by ---/... markers. When deserializing with serde_saphyr::from_multiple, you still need to supply the vector element type up front (Vec<T>). That does not lock you into a single shape: make the element an enum and each document will deserialize into the matching variant. This lets you mix different payloads in one stream while retaining strong typing on the Rust side.
Nested enums
Externally tagged enums nest naturally in YAML as maps keyed by the variant name. This enables strict, expressive models (enums with associated data) instead of generic maps.
There are two variants of the deserialization functions: from_* and from_*_with_options. The latter accepts an Options object that allows you to configure budget and other aspects of parsing. For larger projects that require consistent parsing behavior, we recommend defining a wrapper function so that all option and budget settings are managed in one place (see examples/wrapper_function.rs).
Tagged enums written as !!EnumName VARIANT are also supported, but only for single-level scalar variants. YAML itself cannot nest such tagged enums, so use mapping-based representations (EnumName: RED) if you need to embed enums within other enums.
Composite keys
YAML supports complex (non-string) mapping keys. Rust maps can mirror this, allowing you to parse such structures directly.
Options
Serde-saphyr provides control over serialization and deserialization behavior. We generally welcome feature requests, but we also recognize that not every user wants every feature enabled by default.
To support different use cases, most behavior can be enabled, disabled, or tuned via Options (deserializers) and SerializerOptions (serializers). Adding fields to the public API is a breaking change. To allow new options without breaking compatibility, Serde-saphyr uses a macro-driven approach based on the options!, budget!, and ser_options! macros.
API note: serde-saphyr is moving away from struct literals for its configuration structs (Options, SerializerOptions, Budget). Struct literals and direct field access will be deprecated soon. In the first 1.x release, these types will become #[non_exhaustive] to prevent direct instantiation. During the migration period, semver checks temporarily allow adding fields to these structures, and the badge does not treat new fields as a breaking change (which is correct when using the macros).
Indentation checking
Adding or removing a single space in YAML indentation may result in a document that is still syntactically correct but semantically wrong. To mitigate such issues, serde-saphyr can enforce indentation rules during deserialization via RequireIndent.
You can require the number of indentation columns to be consistent throughout the document, ensure it is even, or enforce that it is divisible by a specific number (for example, 4 or 6). Configure the desired policy using Options.
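A minimal sketch of such checks follows, assuming indentation is the number of leading spaces on each non-blank line. The Indent enum and check_indent function are hypothetical and do not reproduce the RequireIndent API; they only show why a single stray space is detectable.

```rust
/// Hypothetical indentation policies, loosely modeled on the text above.
enum Indent {
    /// Every indentation must be a multiple of the first non-zero one seen.
    Consistent,
    Even,
    DivisibleBy(usize),
}

/// Returns Err(line_number) (1-based) for the first line violating `policy`.
fn check_indent(yaml: &str, policy: &Indent) -> Result<(), usize> {
    let mut step: Option<usize> = None; // first non-zero indentation seen
    for (i, line) in yaml.lines().enumerate() {
        if line.trim().is_empty() {
            continue; // blank lines carry no indentation information
        }
        let indent = line.len() - line.trim_start_matches(' ').len();
        if indent == 0 {
            continue;
        }
        let ok = match policy {
            Indent::Even => indent % 2 == 0,
            Indent::DivisibleBy(n) => *n != 0 && indent % *n == 0,
            Indent::Consistent => indent % *step.get_or_insert(indent) == 0,
        };
        if !ok {
            return Err(i + 1);
        }
    }
    Ok(())
}
```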
Booleans
By default, if the target field is boolean, serde-saphyr will attempt to interpret standard YAML 1.1 values as boolean (not just false but also no, etc.).
If you do not want this (or you are parsing into a JSON Value where it is wrongly inferred), enclose the value in quotes or set strict_booleans to true in Options.
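The interpretation rule can be sketched as below. parse_bool and its parameters are hypothetical; quoted stands for whether the YAML scalar was quoted, and strict mirrors strict_booleans under the assumption that strict mode accepts only the YAML 1.2 spellings of true and false.

```rust
/// Interprets a scalar destined for a `bool` field (hypothetical sketch).
/// Returns None when the scalar is not a boolean under the given rules.
fn parse_bool(scalar: &str, quoted: bool, strict: bool) -> Option<bool> {
    if quoted {
        return None; // quoted scalars are strings, never booleans
    }
    match scalar {
        "true" | "True" | "TRUE" => Some(true),
        "false" | "False" | "FALSE" => Some(false),
        // Assumption: strict mode stops here and rejects YAML 1.1 shorthands.
        _ if strict => None,
        "y" | "Y" | "yes" | "Yes" | "YES" | "on" | "On" | "ON" => Some(true),
        "n" | "N" | "no" | "No" | "NO" | "off" | "Off" | "OFF" => Some(false),
        _ => None,
    }
}
```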
Deserializing into abstract JSON Value
If you must work with abstract types, you can also deserialize YAML into serde_json::Value. Serde drives the process through deserialize_any because Value does not fix a Rust primitive type ahead of time, so you lose the strict type control that Rust struct types provide. Also, unlike YAML, JSON does not allow composite keys: keys must be strings. Field order is preserved.
Binary scalars
!!binary-tagged YAML values are base64-decoded when deserializing into Vec<u8> or String (for String, an error is reported if the decoded bytes are not valid UTF-8).
Important: some projects add the !!binary tag while actually expecting a verbatim string value (for example, the literal string "aGVsbG8="). This works with parsers that simply ignore the tag. However, serde-saphyr decodes !!binary values by default, attempting to interpret them as UTF-8 bytes.
If you use !!binary only as a documentation or annotation tag, set ignore_binary_tag_for_string to true in Options.
!!binary decoding for other targets, such as Vec<u8>, remains supported.
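To show what decoding a !!binary scalar involves, here is a small standard (RFC 4648) base64 decoder written for illustration; it is not serde-saphyr's decoder, and decode_base64 and decode_binary_string are hypothetical names. decode_binary_string mimics the String case: it turns aGVsbG8= into hello and fails if the decoded bytes are not valid UTF-8.

```rust
/// Decodes standard base64, skipping whitespace (YAML allows line breaks
/// inside multi-line !!binary scalars).
fn decode_base64(input: &str) -> Result<Vec<u8>, String> {
    fn value(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a' + 26) as u32),
            b'0'..=b'9' => Some((c - b'0' + 52) as u32),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let clean: Vec<u8> = input.bytes().filter(|b| !b.is_ascii_whitespace()).collect();
    if clean.len() % 4 != 0 {
        return Err("length is not a multiple of 4".into());
    }
    let mut out = Vec::with_capacity(clean.len() / 4 * 3);
    for chunk in clean.chunks(4) {
        let pad = chunk.iter().rev().take_while(|&&b| b == b'=').count();
        if pad > 2 {
            return Err("invalid padding".into());
        }
        // Accumulate 6 bits per character into a 24-bit group.
        let mut acc: u32 = 0;
        for &b in &chunk[..4 - pad] {
            let v = value(b).ok_or_else(|| format!("invalid character {:?}", b as char))?;
            acc = (acc << 6) | v;
        }
        acc <<= 6 * pad as u32;
        let bytes = acc.to_be_bytes(); // the 24-bit group sits in bytes[1..4]
        out.extend_from_slice(&bytes[1..4 - pad]);
    }
    Ok(out)
}

/// The String target: decode, then require valid UTF-8.
fn decode_binary_string(input: &str) -> Result<String, String> {
    let bytes = decode_base64(input)?;
    String::from_utf8(bytes).map_err(|e| e.to_string())
}
```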
Merge keys
serde-saphyr supports merge keys, which reduce redundancy and verbosity by specifying shared key-value pairs once and then reusing them across multiple mappings. Here is an example with merge keys (inherited properties):
Merge keys are standard in YAML 1.1. Although YAML 1.2 no longer includes merge keys in its specification, it doesn't explicitly disallow them either, and many parsers implement this feature.
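The merge semantics can be modeled on flat string maps: keys written explicitly in a mapping always win, and when several mappings are merged, earlier ones take precedence over later ones (as in the YAML 1.1 merge key description). merge_into and mapping are hypothetical helpers for this illustration, unrelated to the apply_merge function of the serde-yaml forks.

```rust
use std::collections::BTreeMap;

type Map = BTreeMap<String, String>;

/// Applies merge-key semantics: start from the explicit keys, then add keys
/// from each merged mapping in order, never overwriting an existing key.
fn merge_into(explicit: &Map, merged: &[&Map]) -> Map {
    let mut result = explicit.clone();
    for source in merged {
        for (k, v) in source.iter() {
            // or_insert_with keeps any value that is already present
            result.entry(k.clone()).or_insert_with(|| v.clone());
        }
    }
    result
}

/// Convenience constructor for the example.
fn mapping(pairs: &[(&str, &str)]) -> Map {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}
```

In YAML terms, the first assertion below corresponds to development: {<<: *defaults, database: dev, host: dev-host}.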
Properties
Many configuration formats contain secret values that should not be part of checked-in repository content and should not leak into error snippets or other messages. The optional properties feature adds docker-compose-style ${NAME} interpolation for that use case, allowing values to be provided through Options. The feature can also be used for generated values, values that change between releases or deployments, or values that are otherwise more convenient to specify separately from the main YAML document.
Interpolation is intentionally narrow in scope:
- it only applies to plain scalars,
- quoted scalars and block scalars stay literal,
- $${NAME} escapes to a literal ${NAME},
- and if no property map is configured, ${NAME} remains unchanged instead of being treated specially.
That means you can opt in where it is useful without changing the meaning of YAML constructs that are typically expected to remain exact text.
properties is gated behind the properties feature flag. Once it is enabled, pass a property map through Options::with_properties(...).
If interpolation is enabled but a referenced property is missing, or the ${...} name is invalid, deserialization fails with a dedicated error that points to the YAML source location. This is useful both for correctness and for security: configuration mistakes should fail closed instead of silently producing partial or surprising values.
The security aspect matters most when the property values are secrets. Interpolation resolves the final value before Serde finishes deserializing the surrounding type, so downstream custom deserializers, validation code, or other error paths could otherwise end up echoing the resolved secret. serde-saphyr tracks interpolated values during deserialization and redacts them back to their original ${NAME} form in later error messages, reducing the risk of leaking secrets into logs or diagnostics. You should still treat the property map itself as sensitive input and avoid formatting or logging it directly in your application.
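The interpolation rules listed above can be sketched as a function over a single plain scalar. interpolate is hypothetical, not the crate's API; it assumes property names must match [A-Za-z_][A-Za-z0-9_]*, it leaves redaction out entirely, and the caller is expected to apply it only to plain scalars.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of `${NAME}` interpolation for one plain scalar.
fn interpolate(scalar: &str, props: Option<&HashMap<String, String>>) -> Result<String, String> {
    let Some(props) = props else {
        return Ok(scalar.to_string()); // no property map: text stays unchanged
    };
    let mut out = String::new();
    let mut rest = scalar;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let tail = &rest[pos..];
        if let Some(after) = tail.strip_prefix("$${") {
            out.push_str("${"); // `$${NAME}` escapes to a literal `${NAME}`
            rest = after;
        } else if let Some(after) = tail.strip_prefix("${") {
            let end = after.find('}').ok_or("unterminated ${")?;
            let name = &after[..end];
            let valid = name.chars().next().map_or(false, |c| c.is_ascii_alphabetic() || c == '_')
                && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
            if !valid {
                return Err(format!("invalid property name {name:?}"));
            }
            // Fail closed: a missing property is an error, not an empty string.
            let value = props.get(name).ok_or_else(|| format!("missing property {name}"))?;
            out.push_str(value);
            rest = &after[end + 1..];
        } else {
            out.push('$'); // a lone `$` is kept as-is
            rest = &tail[1..];
        }
    }
    out.push_str(rest);
    Ok(out)
}
```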
Includes
The demand for YAML includes (not part of the official specification) is evident from the popularity of the command-line yaml-include crate, which is very feature-complete. However, when the YAML parser and validator are separate from the preprocessor, they usually report line numbers and snippets from the processed document only. For large documents with many deep includes, this becomes hard to interpret. YAML indentation, together with security requirements such as path confinement or anchor isolation, makes bolting on include support non-trivial.
serde-saphyr allows resolving !include tags via a custom resolver configured in Options. When using a single !include directly as a value, it works naturally for replacing a scalar, sequence, or an entire mapping:
```yaml
# Replacing the entire mapping value
my_mapping: !include my_mapping.yaml
# Supplying a list/sequence value
my_list: !include my_list.yaml
```
However, if you want to include a mapping and merge its keys into a parent mapping alongside other keys, you must use the merge key (<<). Attempting to list !include inside a mapping without a merge key is invalid YAML syntax:
```yaml
# INVALID: `!include` is treated as a key missing a value (`:`)
a: 1
!include my_mapping.yaml
b: 2
```
Instead, use the merge key to correctly inject the included mapping:
```yaml
# VALID: merges the contents of my_mapping.yaml
a: 1
<<: !include my_mapping.yaml
b: 2
```
!include is gated behind the include feature flag. If the feature is not enabled, or no resolver is set, this tag receives no special treatment. The include feature alone allows resolvers that do not access the filesystem; for the most common case, where files are included from the filesystem, include_fs must be enabled as well.
Alternatively, you can use SafeFileResolver to configure more options, or provide your own IncludeResolver callback that resolves a name into YAML text, which is useful for custom storage backends or for generated YAML that does not live on the filesystem. The safety features of SafeFileResolver are summarized in its API documentation.
Instead of including the whole document, you can also include only the value of a specific anchor defined in the included YAML document:
```yaml
!include my_mapping.yaml#anchor_name
```
SafeFileResolver has built-in support for anchor extraction. Custom IncludeResolver implementations must handle this themselves: split the anchor from the reference and return InputSource::AnchoredText.
Apart from this explicit anchor extraction, an anchor's scope is restricted to the document in which it is defined. Letting deeply included content override a parent anchor's value would be hard to debug and could even become a security issue.
Whole-document includes only support sources that contain a single YAML document. Fragment includes also require the included source to contain a single YAML document; multi-document sources are rejected instead of being scanned across document boundaries. Recursive inclusion is not permitted, and recursion is tracked per file, not per fragment.
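The reference splitting and recursion rules can be illustrated with a small include stack. Everything here (split_reference, is_confined, IncludeStack) is hypothetical; in particular, is_confined only rejects absolute paths and .. components and ignores symlinks, so it is not a substitute for SafeFileResolver's checks.

```rust
use std::path::{Component, Path};

/// Splits `my_mapping.yaml#anchor_name` into the file and optional anchor.
fn split_reference(reference: &str) -> (&str, Option<&str>) {
    match reference.split_once('#') {
        Some((file, anchor)) => (file, Some(anchor)),
        None => (reference, None),
    }
}

/// Hypothetical confinement check: only relative paths without `..`.
fn is_confined(file: &str) -> bool {
    Path::new(file)
        .components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}

/// Chain of files currently being included. The file, not the fragment,
/// is the identity: `a.yaml#x` and `a.yaml#y` count as the same include.
struct IncludeStack(Vec<String>);

impl IncludeStack {
    fn enter(&mut self, reference: &str) -> Result<(), String> {
        let (file, _anchor) = split_reference(reference);
        if !is_confined(file) {
            return Err(format!("{file} escapes the include root"));
        }
        if self.0.iter().any(|f| f.as_str() == file) {
            return Err(format!("recursive include of {file}"));
        }
        self.0.push(file.to_string());
        Ok(())
    }

    /// Called when the included document has been fully processed.
    fn leave(&mut self) {
        self.0.pop();
    }
}
```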
Tuple enum variants
Tuple enum variants can also be deserialized. With a suitable Context type, serde_saphyr::from_str::<Context>(yaml) accepts value: !Expression 1 + 1 or value: !Pair [a, 12]. Both YAML lists and Rust tuples allow their elements to have different types.
Rust types as schema
To address the “Norway problem,” the target Rust types serve as an explicit schema. Because the parser knows whether a field expects a string or a boolean, it can correctly accept 1.2 either as a number or as the string "1.2", and interpret the common YAML boolean shorthands (y, on, n, off) as actual booleans when appropriate (can be disabled). Likewise, 0x2A is parsed as a hexadecimal integer when the target field is numeric, and as a string when the target is String. As with StrictYAML, serde-saphyr avoids inferring types from values — one of the most heavily criticized aspects of YAML. The Rust type system already provides all the necessary schema information.
Schema-based parsing can be disabled by setting no_schema to true in Options. In this mode, unquoted values that are deserialized into strings but could be read as something else are rejected. This can enforce compatibility with another YAML parser that reads the same content and requires such quoting. The default is false.
Legacy octal notation such as 0052 can be enabled via Options, but it is disabled by default.
The concept that “Rust code is the schema” naturally extends to the supported validator and garde integrations, as these crates allow annotations to be added directly to Rust types, providing even stricter control over permissible values.
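The idea that the target type decides how a plain scalar is read can be sketched with a hypothetical FromPlainScalar trait. In serde-saphyr this is driven by the target's Serde Deserialize implementation rather than by such a trait, and the integer rules below (decimal, 0x hex, 0o octal) are simplified assumptions.

```rust
/// Hypothetical illustration of "the Rust type is the schema": the same
/// plain scalar text is interpreted according to the target type.
trait FromPlainScalar: Sized {
    fn from_plain(text: &str) -> Result<Self, String>;
}

impl FromPlainScalar for String {
    // A String target takes the text verbatim: `0x2A`, `1.2` and `no` stay strings.
    fn from_plain(text: &str) -> Result<Self, String> {
        Ok(text.to_string())
    }
}

impl FromPlainScalar for i64 {
    fn from_plain(text: &str) -> Result<Self, String> {
        let (neg, digits) = match text.strip_prefix('-') {
            Some(rest) => (true, rest),
            None => (false, text.strip_prefix('+').unwrap_or(text)),
        };
        let parsed = if let Some(hex) = digits.strip_prefix("0x") {
            i64::from_str_radix(hex, 16)
        } else if let Some(oct) = digits.strip_prefix("0o") {
            i64::from_str_radix(oct, 8)
        } else {
            digits.parse::<i64>()
        };
        let value = parsed.map_err(|e| format!("{text:?} is not an integer: {e}"))?;
        Ok(if neg { -value } else { value })
    }
}

impl FromPlainScalar for f64 {
    fn from_plain(text: &str) -> Result<Self, String> {
        text.parse::<f64>().map_err(|e| format!("{text:?} is not a float: {e}"))
    }
}
```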
Pathological inputs & budgets
Fuzzing shows that certain adversarial inputs can make YAML parsers consume excessive time or memory, enabling denial-of-service scenarios. To counter this, serde-saphyr offers a fast, configurable pre-check via a Budget, available through Options. Defaults are conservative; tighten them when you know your input shape, or disable the budget if you only parse YAML you generate yourself.
During reader-based deserialization, serde-saphyr does not buffer the entire payload; it parses incrementally, counting bytes and enforcing configured budgets. This design blocks denial-of-service attempts via excessively large inputs. When streaming from the reader through the iterator, other budget limits apply on a per-document basis, since such a reader may be expected to stream indefinitely. The total size of input is not limited in this case.
To find the typical budget requirements for your file, use our web demo or run the main() executable of this library, providing a YAML file path as the program parameter. You can also fetch the budget programmatically by registering a closure with Options::with_budget_report.
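The byte-counting part of a budget can be sketched as a reader wrapper that fails as soon as more input arrives than allowed, without buffering the payload first. BudgetedReader is hypothetical; the real Budget also covers limits such as nesting depth, which this sketch does not model.

```rust
use std::io::{self, Read};

/// Hypothetical byte budget on a reader: input is consumed incrementally and
/// an error is raised as soon as the limit is exceeded.
struct BudgetedReader<R> {
    inner: R,
    remaining: u64,
}

impl<R: Read> Read for BudgetedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.remaining == 0 {
            // Probe one byte to tell "exactly at the limit" apart from "over it".
            let mut probe = [0u8; 1];
            return match self.inner.read(&mut probe)? {
                0 => Ok(0),
                _ => Err(io::Error::new(io::ErrorKind::Other, "input exceeds byte budget")),
            };
        }
        let allowed = usize::try_from(self.remaining).unwrap_or(usize::MAX);
        let max = buf.len().min(allowed);
        let n = self.inner.read(&mut buf[..max])?;
        self.remaining -= n as u64;
        Ok(n)
    }
}
```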
Serialization
Any type that implements serde::Serialize can be written out with serde_saphyr::to_string, which returns the YAML text or an error.
Anchors (Rc/Arc/Weak)
Serde-saphyr can conceptually connect YAML anchors with Rust shared references (Rc, Weak and Arc). You need to use wrappers to activate this feature:
- RcAnchor and ArcAnchor emit anchors like &a1 on first occurrence and may emit aliases like *a1 later.
- RcWeakAnchor and ArcWeakAnchor serialize a weak reference: if the strong pointer is gone, it becomes null.
When anchors are highly repetitive and also large, packing them into references can make YAML more human-readable.
To support round-tripping, the library can also deserialize into these anchor structures; this deserialization is identity-preserving. A field or structure that is defined once and subsequently referenced will exist as a single instance in memory, with all anchor fields pointing to it. This is crucial when the topology of references itself constitutes important information to be transferred.
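Conceptually, anchor emission depends on pointer identity rather than value equality. The hypothetical AnchorNamer below keys on Rc::as_ptr: the first time an allocation is seen it is written with an anchor, later occurrences become aliases, and an equal value in a different allocation gets its own anchor. This is an illustration only, not the RcAnchor implementation.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical sketch: assigns anchor ids by allocation address.
/// The caller must keep the values alive while serializing, otherwise a freed
/// address could be reused and mistaken for an alias.
struct AnchorNamer {
    seen: HashMap<*const String, usize>,
}

impl AnchorNamer {
    fn emit(&mut self, value: &Rc<String>) -> String {
        let ptr = Rc::as_ptr(value);
        if let Some(&id) = self.seen.get(&ptr) {
            return format!("*a{id}"); // already emitted: write an alias
        }
        let id = self.seen.len() + 1;
        self.seen.insert(ptr, id);
        format!("&a{id} {value}") // first occurrence: anchor plus value
    }
}
```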
Recursive YAML
While recursive YAML is unusual, it is not forbidden by the specification. Real-world examples and requests to implement it exist.
Serde-saphyr supports recursive structures, but Rust requires being very explicit about this. A structure that may hold recursive references to itself must be wrapped in a RcRecursive<T>, and any reference that points to it must be RcRecursion<T>. Arc variants exist as well. See also examples/recursive_yaml.rs.
Controlling serialization
- Empty maps are serialized as {} and empty lists as [] by default.
- Strings containing newlines, and very long strings, are serialized as appropriate block scalars, except where they would need escaping (like ending with :).
- Indentation is configurable.
- The wrapper Commented allows emitting a comment next to a scalar or reference (handy when the reference is far from its definition and needs explanation).
- The wrapper SpaceAfter adds an empty line after the wrapped value, useful for visually separating sections in the output YAML.
- It is possible to request that all strings be quoted — using single quotes when no escape sequences are present, and double quotes otherwise. This is very explicit and unambiguous, but such YAML may be less readable for humans. Line wrapping is disabled in this mode.
- YAML 1.1 booleans (y, yes, on, etc.) are normally quoted, both as keys and as values. If this is undesired (for example, when y is a coordinate), set yaml_12 to true.
These settings are changeable in SerializerOptions.
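The "quote all strings" mode can be sketched as follows: single quotes when the text needs no escape sequences (doubling any embedded '), double quotes with backslash escapes otherwise. quote_all is a hypothetical function that handles only a few common escapes. Note that yes comes out as 'yes', so a YAML 1.1 reader no longer sees a boolean.

```rust
/// Hypothetical sketch of quoting a string for YAML output.
fn quote_all(s: &str) -> String {
    // Control characters (newline, tab, ESC, ...) can only be written with
    // backslash escapes, which single-quoted YAML does not support.
    if !s.chars().any(char::is_control) {
        // In single quotes the only special character is `'`, written as `''`.
        return format!("'{}'", s.replace('\'', "''"));
    }
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            c if c.is_control() => out.push_str(&format!("\\u{:04X}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```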
Borrowed string deserialization
serde-saphyr supports zero-copy deserialization for string fields when using from_str or from_slice. This allows deserializing into &str fields that borrow directly from the input, avoiding allocation overhead.
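```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct Data<'a> { name: &'a str, value: i32 }

let yaml = "name: hello\nvalue: 42\n";
let data: Data = serde_saphyr::from_str(yaml).unwrap();
assert_eq!(data.name, "hello"); // `name` borrows from `yaml`, no allocation
```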
Limitations:
- Borrowing works for any scalar whose parsed value exists verbatim in the input. This includes plain scalars and simple quoted strings without escape sequences (e.g., "hello world" can be borrowed, but "hello\nworld" cannot, because \n is transformed into a newline).
- If a scalar requires transformation (escape processing, line folding, block scalar normalization, or the '' escape in single-quoted strings), deserialization into &str fails with a helpful error suggesting String or Cow<str>.
- Reader-based entry points (from_reader) require DeserializeOwned and cannot return borrowed values.
For maximum flexibility, use Cow<'a, str> which borrows when possible and owns when transformation is required.
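The Cow<str> behavior can be sketched with a hypothetical unescape function for the body of a double-quoted scalar: it borrows the input when there is no backslash and allocates only when an escape must be processed. It supports just a handful of escapes and no line folding, unlike a real YAML parser.

```rust
use std::borrow::Cow;

/// Hypothetical sketch: `body` is a double-quoted scalar without its quotes.
fn unescape(body: &str) -> Result<Cow<'_, str>, String> {
    if !body.contains('\\') {
        return Ok(Cow::Borrowed(body)); // verbatim in the input: zero-copy
    }
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // Only a small subset of YAML escapes is handled here.
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            other => return Err(format!("unsupported escape {other:?}")),
        }
    }
    Ok(Cow::Owned(out))
}
```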
Custom messages
The default error messages are developer-oriented. They may mention serde-saphyr APIs and
options and include “action items” intended to help fix the problem.
If error messages are shown to end users, switch to the built-in user-facing formatter or provide your own formatter (for example, to translate messages into another language).
See:
- MessageFormatter: controls the main message text for each Error.
- Localizer: controls message pieces that are composed outside MessageFormatter::format_message (location suffixes, validation/snippet labels, etc.).
Use the built-in user-facing formatter
The built-in user-facing formatter is UserMessageFormatter; use it instead of the default developer-oriented formatter when errors are shown to end users.
Use a custom formatter with miette
If you want fancy diagnostics via miette, you can convert a serde-saphyr error to a
miette::Report while still controlling the message text via a custom formatter:
This requires enabling the crate’s miette feature.
For a complete custom formatter/localizer example, see examples/pirate_formatter.rs. For an
end-to-end miette example, see examples/miette.rs.
Robotics
The feature-gated robotics capability enables parsing of YAML extensions commonly used in robotics (ROS). These extensions support conversion functions (deg, rad) and simple mathematical expressions such as deg(180), rad(pi), 1 + 2*(3 - 4/5), or rad(pi/2). The robotics feature is not enabled by default, and enabling it is not enough on its own: angle_conversions must also be set to true in Options to activate this mode of parsing. The parser is a simple expression calculator implemented directly in Rust, not a hook into a language interpreter.
```yaml
rad_tag: 0.15 # value in radians, stays in radians
deg_tag: 180 # value in degrees, converts to radians
expr_complex: 1 + 2*(3 - 4/5) # simple expressions supported
func_deg: deg(180) # value in degrees, converts to radians
func_rad: rad(pi) # value in radians (stays in radians)
hh_mm_secs: -0:30:30.5 # Time
longitude: 8:32:53.2 # Nautical, ETH Zürich Main Building (8°32′53.2″ E)
```
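```rust
// RoboFloats: your own #[derive(Deserialize)] struct with fields matching the keys above.
let options = serde_saphyr::Options { angle_conversions: true, ..Default::default() };
let v: RoboFloats = serde_saphyr::from_str_with_options(yaml, options).expect("valid robotics YAML");
```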
Safety hardening for this feature includes a maximum expression depth, a maximum number of digits, strict underscore placement, and limiting fraction parsing to precision-relevant digits.
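For illustration only, here is a small recursive-descent evaluator for the kind of expressions shown above (+, -, *, /, parentheses, pi, deg(), rad()). Parser and MAX_DEPTH are hypothetical; the sketch does not parse sexagesimal values such as 8:32:53.2, and its only hardening is a nesting-depth cap.

```rust
use std::f64::consts::PI;

/// Nesting cap, so inputs like "((((...))))" cannot exhaust the stack.
const MAX_DEPTH: usize = 32;

struct Parser<'a> {
    s: &'a [u8],
    pos: usize,
    depth: usize,
}

impl<'a> Parser<'a> {
    fn eval(input: &'a str) -> Result<f64, String> {
        let mut p = Parser { s: input.as_bytes(), pos: 0, depth: 0 };
        let v = p.expr()?;
        p.skip_ws();
        if p.pos != p.s.len() {
            return Err(format!("unexpected input at byte {}", p.pos));
        }
        Ok(v)
    }
    fn skip_ws(&mut self) {
        while self.s.get(self.pos) == Some(&b' ') {
            self.pos += 1;
        }
    }
    fn eat(&mut self, c: u8) -> bool {
        self.skip_ws();
        let hit = self.s.get(self.pos) == Some(&c);
        if hit {
            self.pos += 1;
        }
        hit
    }
    // expr := term (('+' | '-') term)*
    fn expr(&mut self) -> Result<f64, String> {
        let mut v = self.term()?;
        loop {
            if self.eat(b'+') { v += self.term()?; }
            else if self.eat(b'-') { v -= self.term()?; }
            else { return Ok(v); }
        }
    }
    // term := factor (('*' | '/') factor)*
    fn term(&mut self) -> Result<f64, String> {
        let mut v = self.factor()?;
        loop {
            if self.eat(b'*') { v *= self.factor()?; }
            else if self.eat(b'/') { v /= self.factor()?; }
            else { return Ok(v); }
        }
    }
    // Every level of nesting passes through factor, so the depth is counted here.
    fn factor(&mut self) -> Result<f64, String> {
        self.depth += 1;
        if self.depth > MAX_DEPTH {
            return Err("expression nested too deeply".into());
        }
        let result = self.atom();
        self.depth -= 1;
        result
    }
    // atom := '-' factor | '(' expr ')' | 'pi' | number | ('deg' | 'rad') '(' expr ')'
    fn atom(&mut self) -> Result<f64, String> {
        if self.eat(b'-') {
            return Ok(-self.factor()?);
        }
        if self.eat(b'(') {
            let v = self.expr()?;
            return self.close(v);
        }
        self.skip_ws();
        let s: &'a [u8] = self.s; // copy the slice so `word` does not borrow `self`
        let start = self.pos;
        while s.get(self.pos).map_or(false, |b| b.is_ascii_alphanumeric() || *b == b'.') {
            self.pos += 1;
        }
        let word = std::str::from_utf8(&s[start..self.pos]).map_err(|e| e.to_string())?;
        match word {
            "pi" => Ok(PI),
            "deg" | "rad" => {
                if !self.eat(b'(') {
                    return Err(format!("expected '(' after {word}"));
                }
                let v = self.expr()?;
                let v = self.close(v)?;
                // deg() converts degrees to radians; rad() keeps the value.
                Ok(if word == "deg" { v.to_radians() } else { v })
            }
            _ if word.starts_with(|c: char| c.is_ascii_digit() || c == '.') => {
                word.parse::<f64>().map_err(|_| format!("invalid number {word:?}"))
            }
            _ => Err(format!("invalid token {word:?}")),
        }
    }
    fn close(&mut self, v: f64) -> Result<f64, String> {
        if self.eat(b')') { Ok(v) } else { Err("expected ')'".into()) }
    }
}
```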
Unsupported features
- Common Serde renames made to follow naming conventions (case changes, snake_case, kebab-case, r# stripping) are supported in snippets, as long as they do not introduce ambiguity. Arbitrary renames, flattening, aliases and other complex manipulations possible with serde are not. Parsing and validation still work, but error messages for arbitrarily renamed fields only show the Rust path.
- Spanned<T> cannot be used within variants of untagged or internally tagged enums due to a fundamental limitation in Serde. Instead, wrap the entire enum in Spanned, or use externally tagged enums (the default).
Executable
serde-saphyr comes with a simple executable (CLI) that reports the budget requirements of a given YAML file and also works as a YAML validator, printing the error line, column number and an excerpt.
To run it (no Rust knowledge required):
The binary name is the package name by default.
To enable fancy error reporting (graphical diagnostics) via the optional miette integration, install/build the CLI with the miette feature enabled:
If you want to keep the previous plain-text error output even when built with miette, pass --plain.
If you want to allow file inclusion (!include tags) during parsing, configure the filesystem root path using --include.