Module parallel

Available on crate feature parallel only.

Parallel multi-document YAML parsing via Rayon: the “MapReduce” path.

For massive multi-document streams (telemetry logs, audit exports, Kubernetes-resource snapshots, anything emitting documents separated by --- at scale), even the fastest single-threaded parser is bounded by one CPU core. This module pre-scans the input on the main thread, splits it into per-document slices, then dispatches each document to a Rayon worker.

§Linear scaling

The pre-scan runs in O(input_len) with no allocation; the parse-per-document work is the dominant cost and parallelises naturally across cores. Expect near-linear speedup with the number of cores up to the point where document size starts to dominate (very large single documents see less benefit because one document still parses on one thread).
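The fan-out described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not this module's implementation: it swaps Rayon's work-stealing pool for static chunks on std::thread::scope, and parallel_map and parse_id are hypothetical names standing in for the real dispatch and the real YAML parser.

```rust
use std::thread;

/// Hypothetical stand-in for a real YAML parse: pulls the integer out of
/// the first `id: <n>` line of one document.
fn parse_id(doc: &str) -> Option<u32> {
    doc.lines()
        .find_map(|line| line.strip_prefix("id:"))
        .and_then(|rest| rest.trim().parse().ok())
}

/// Hypothetical helper: apply `f` to every document, spreading the work
/// over at most `workers` scoped threads. Output order matches input order.
fn parallel_map<T, F>(docs: &[&str], workers: usize, f: F) -> Vec<T>
where
    T: Send,
    F: Fn(&str) -> T + Sync,
{
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((docs.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // One thread per chunk of documents.
        let handles: Vec<_> = docs
            .chunks(chunk_len)
            .map(|chunk| {
                let f = &f;
                scope.spawn(move || chunk.iter().map(|doc| f(*doc)).collect::<Vec<T>>())
            })
            .collect();

        // Join in spawn order so results line up with the input.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

Because each document is handled by exactly one thread, a single very large document occupies one worker for its whole parse, which is why the speedup flattens when one document dominates the input.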

§Document-boundary contract

The pre-scanner recognises --- document-start markers that begin at column 0 and are followed by \n, \r, a space, \t, or end-of-input. This matches the YAML 1.2.2 §9.1.2 grammar for c-directives-end. The scanner does not recognise:

  • --- inside a literal (|) or folded (>) block scalar that is column-0-aligned (extremely rare in practice; the YAML spec does not actually permit such a literal because block scalars must indent past the parent).
  • ... document-end markers — they are advisory in YAML 1.2, and the document-start scan picks up the next document anyway.

Inputs that violate the column-0 rule fall back to the conservative single-document slice (everything before the next valid --- is treated as one document).
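The boundary rule above can be sketched as a small standalone scanner. This is a minimal sketch written from the contract as stated, not the crate's split function: is_doc_start and split_documents are hypothetical names, the treatment of text before the first marker (kept as its own document when it is not blank) is an assumption, and unlike the real pre-scan this version allocates a Vec of offsets for simplicity.

```rust
/// Hypothetical helper: true when the bytes at offset `i` form a `---`
/// document-start marker under the column-0 rule.
fn is_doc_start(bytes: &[u8], i: usize) -> bool {
    // Column 0 means start of input or immediately after a line break.
    let at_col0 = i == 0 || matches!(bytes[i - 1], b'\n' | b'\r');
    if !at_col0 || !bytes[i..].starts_with(b"---") {
        return false;
    }
    // The marker must be followed by \n, \r, space, tab, or end-of-input.
    matches!(
        bytes.get(i + 3).copied(),
        None | Some(b'\n' | b'\r' | b' ' | b'\t')
    )
}

/// Hypothetical helper: split `input` into per-document slices. Each slice
/// starts at its `---` marker and runs up to the next marker.
fn split_documents(input: &str) -> Vec<&str> {
    let bytes = input.as_bytes();

    // Single pass over the input collecting marker offsets. Every offset
    // points at an ASCII `-`, so it is always a valid `str` boundary.
    let starts: Vec<usize> = (0..bytes.len())
        .filter(|&i| is_doc_start(bytes, i))
        .collect();

    let mut docs = Vec::with_capacity(starts.len() + 1);

    // Assumption: non-blank text before the first marker is one document.
    let first = starts.first().copied().unwrap_or(bytes.len());
    if !input[..first].trim().is_empty() {
        docs.push(&input[..first]);
    }

    for (k, &start) in starts.iter().enumerate() {
        let end = starts.get(k + 1).copied().unwrap_or(bytes.len());
        docs.push(&input[start..end]);
    }
    docs
}
```

In this sketch an indented --- inside a block scalar is ignored because it is not at column 0, and a four-dash line such as ---- is not a marker because the byte after the third dash is another dash.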

§Examples

let yaml = "---\nid: 1\n---\nid: 2\n---\nid: 3\n";
#[derive(serde::Deserialize, Debug)]
struct Record { id: u32 }
let records: Vec<Record> = noyalib::parallel::parse(yaml).unwrap();
assert_eq!(records.len(), 3);

§API shape

  • parse — typed deserialise into Vec<T>.
  • values — dynamic-tree variant returning Vec<Value>.
  • split — standalone document-boundary pre-scanner for callers driving their own concurrency primitives.

Names are kept short on purpose — the parallel namespace already encodes the concurrency contract, so the function verb stays single-word: parallel::parse reads as one sentence.

Functions§

parse
Deserialise every YAML document in input into T, parsing in parallel via Rayon’s global thread pool.
split
Split input into per-document byte slices on YAML 1.2 --- markers. Single-pass O(input.len()). Public so callers that drive their own concurrency primitives (async tasks, custom thread pools) can reuse the same boundary scan.
values
Dynamic-tree variant of parse: returns a Vec<crate::Value>. Use when the caller wants to route documents to different typed handlers post-parse.