rlsp-yaml-parser

A spec-faithful YAML 1.2 parser for the rlsp language server project.

Overview

rlsp-yaml-parser implements the full YAML 1.2 grammar by transliterating each of the 211 formal productions from the specification into a parser combinator function. Comments and spans are first-class data, making it suitable for editor tooling, linters, and formatters where precise source locations matter.

Features

Spec-faithful -- every production from the YAML 1.2 specification is implemented directly
100% conformance -- passes 351/351 cases in the YAML Test Suite
First-class comments -- comments are preserved in the event stream and AST, not discarded
Lossless spans -- every token, event, and AST node carries a Span with byte offsets back to the source
Alias preservation -- lossless mode keeps alias references as Node::Alias nodes instead of expanding them
Security controls -- alias expansion limits, nesting depth caps, anchor count limits, and cycle detection protect against untrusted input

Quick Start

Parse events

Stream through low-level parse events without building an AST:

use rlsp_yaml_parser::parse_events;
use rlsp_yaml_parser::event::Event;

for result in parse_events("hello: world\n") {
    let (event, span) = result.unwrap();
    println!("{event:?} at {span:?}");
}

Load documents

Parse into an AST:

use rlsp_yaml_parser::load;

let docs = load("hello: world\n").unwrap();
assert_eq!(docs.len(), 1);

Use LoaderBuilder for fine-grained control:

use rlsp_yaml_parser::loader::LoaderBuilder;

let docs = LoaderBuilder::new()
    .resolved()              // expand aliases inline
    .max_nesting_depth(128)  // tighten nesting limit
    .build()
    .load("items:\n  - one\n  - two\n")
    .unwrap();

Emit YAML

Convert an AST back to YAML text:

use rlsp_yaml_parser::load;
use rlsp_yaml_parser::emitter::{emit, EmitConfig};

let docs = load("hello: world\n").unwrap();
let output = emit(&docs, &EmitConfig::default());
println!("{output}");

API Overview

Module	Entry point	Purpose
`stream`	`tokenize(input)`	Tokenize YAML into a flat token list
`event`	`parse_events(input)`	Stream parse events with spans
`loader`	`load(input)` / `LoaderBuilder`	Build an AST (`Vec<Document<Span>>`)
`emitter`	`emit(docs, config)`	Emit AST back to YAML text
`schema`	`CoreSchema` / `JsonSchema` / `FailsafeSchema`	Resolve scalars to typed values
`node`	`Document`, `Node`	AST types with anchor, tag, and span data

Schemas

Three built-in schemas resolve untagged scalars to typed values:

Schema	Behaviour
`CoreSchema`	YAML 1.2 Core -- null, bool, int (decimal/octal/hex), float
`JsonSchema`	Strict JSON-compatible type inference
`FailsafeSchema`	All scalars are strings

The Schema trait is object-safe for custom implementations.

Security Limits

The loader enforces configurable limits to protect against malicious input:

Limit	Default	Purpose
`max_nesting_depth`	512	Prevents stack exhaustion from deeply nested structures
`max_anchors`	10,000	Bounds anchor-map memory
`max_expanded_nodes`	1,000,000	Guards against alias bombs (resolved mode only)

Circular alias references are detected and reported as errors in both modes.

Conformance

351/351 test cases pass from the YAML Test Suite (valid and invalid inputs).

cargo test -p rlsp-yaml-parser --test conformance

Performance

Criterion benchmarks compare rlsp-yaml-parser against libfyaml (a C reference parser). The table below shows representative throughput on synthetic fixtures (higher is better):

Fixture	rlsp-yaml-parser (`load`)	libfyaml (`parse_events`)
100 B (tiny)	~0.7 MB/s	~33 MB/s
10 KB (medium)	~0.6 MB/s	~100 MB/s
100 KB (large)	~0.5 MB/s	~115 MB/s

libfyaml is a highly optimized C library. rlsp-yaml-parser prioritizes correctness and spec fidelity over raw speed -- it tokenizes eagerly to provide full span coverage. Performance is sufficient for interactive editor use (the LSP use case) where documents are typically small.

Three benchmark suites are included:

Throughput (throughput) -- MB/s across document sizes and YAML styles
Latency (latency) -- time-to-first-event for streaming scenarios
Memory (memory) -- allocation count and bytes during parse

cargo bench -p rlsp-yaml-parser

Building

cargo build -p rlsp-yaml-parser
cargo test -p rlsp-yaml-parser
cargo clippy -p rlsp-yaml-parser  # pedantic + nursery, zero warnings
cargo bench -p rlsp-yaml-parser   # Criterion benchmarks

License

MIT -- Christoph Dalski

rlsp-yaml-parser 0.2.0