rlsp-yaml-parser
A spec-faithful YAML 1.2 parser for the rlsp language server project.
Overview
rlsp-yaml-parser implements the full YAML 1.2 grammar by transliterating each of the 211 formal productions from the specification into a parser combinator function. Comments and spans are first-class data, making it suitable for editor tooling, linters, and formatters where precise source locations matter.
Features
- Spec-faithful -- every production from the YAML 1.2 specification is implemented directly
- 100% conformance -- passes 351/351 cases in the YAML Test Suite
- First-class comments -- comments are preserved in the event stream and AST, not discarded
- Lossless spans -- every token, event, and AST node carries a
Spanwith byte offsets back to the source - Alias preservation -- lossless mode keeps alias references as
Node::Aliasnodes instead of expanding them - Security controls -- alias expansion limits, nesting depth caps, anchor count limits, and cycle detection protect against untrusted input
Quick Start
Parse events
Stream through low-level parse events without building an AST:
use parse_events;
use Event;
for result in parse_events
Load documents
Parse into an AST:
use load;
let docs = load.unwrap;
assert_eq!;
Use LoaderBuilder for fine-grained control:
use LoaderBuilder;
let docs = new
.resolved // expand aliases inline
.max_nesting_depth // tighten nesting limit
.build
.load
.unwrap;
Emit YAML
Convert an AST back to YAML text:
use load;
use ;
let docs = load.unwrap;
let output = emit;
println!;
API Overview
| Module | Entry point | Purpose |
|---|---|---|
stream |
tokenize(input) |
Tokenize YAML into a flat token list |
event |
parse_events(input) |
Stream parse events with spans |
loader |
load(input) / LoaderBuilder |
Build an AST (Vec<Document<Span>>) |
emitter |
emit(docs, config) |
Emit AST back to YAML text |
schema |
CoreSchema / JsonSchema / FailsafeSchema |
Resolve scalars to typed values |
node |
Document, Node |
AST types with anchor, tag, and span data |
Schemas
Three built-in schemas resolve untagged scalars to typed values:
| Schema | Behaviour |
|---|---|
CoreSchema |
YAML 1.2 Core -- null, bool, int (decimal/octal/hex), float |
JsonSchema |
Strict JSON-compatible type inference |
FailsafeSchema |
All scalars are strings |
The Schema trait is object-safe for custom implementations.
Security Limits
The loader enforces configurable limits to protect against malicious input:
| Limit | Default | Purpose |
|---|---|---|
max_nesting_depth |
512 | Prevents stack exhaustion from deeply nested structures |
max_anchors |
10,000 | Bounds anchor-map memory |
max_expanded_nodes |
1,000,000 | Guards against alias bombs (resolved mode only) |
Circular alias references are detected and reported as errors in both modes.
Conformance
351/351 test cases pass from the YAML Test Suite (valid and invalid inputs).
Performance
Criterion benchmarks compare rlsp-yaml-parser against libfyaml (a C reference parser). The table below shows representative throughput on synthetic fixtures (higher is better):
| Fixture | rlsp-yaml-parser (load) |
libfyaml (parse_events) |
|---|---|---|
| 100 B (tiny) | ~0.7 MB/s | ~33 MB/s |
| 10 KB (medium) | ~0.6 MB/s | ~100 MB/s |
| 100 KB (large) | ~0.5 MB/s | ~115 MB/s |
libfyaml is a highly optimized C library. rlsp-yaml-parser prioritizes correctness and spec fidelity over raw speed -- it tokenizes eagerly to provide full span coverage. Performance is sufficient for interactive editor use (the LSP use case) where documents are typically small.
Three benchmark suites are included:
- Throughput (
throughput) -- MB/s across document sizes and YAML styles - Latency (
latency) -- time-to-first-event for streaming scenarios - Memory (
memory) -- allocation count and bytes during parse
Building
License
MIT -- Christoph Dalski