# hedl-stream
Streaming parser for HEDL documents. When you have a multi-gigabyte file and cannot load it into memory, this crate processes it incrementally with constant memory overhead.
hedl-core parses entire documents into memory before giving you access. That works for configuration files and small datasets, but fails for database exports, log archives, and data pipelines where files reach tens of gigabytes. This crate solves that by emitting events as it parses: you receive each node immediately after it's read from the input stream, and memory usage stays proportional to nesting depth rather than file size.
## Getting Started

```toml
[dependencies]
hedl-stream = "1"
```
For async support with Tokio:

```toml
[dependencies]
hedl-stream = { version = "1", features = ["async"] }
tokio = { version = "1", features = ["io-util"] }
```
## Basic Usage

Open a file, iterate through events, and do something with each node:

```rust
use hedl_stream::{Event, StreamingParser};
use std::fs::File;

let file = File::open("data.hedl")?;
let parser = StreamingParser::new(file)?;

let mut node_count = 0;
for event in parser {
    if let Event::Node(_) = event? {
        node_count += 1;
    }
}
println!("parsed {node_count} nodes");
```
## Working with Untrusted Input

When parsing files you don't control, timeouts prevent someone from feeding you a file designed to hang your server:

```rust
use hedl_stream::{StreamingParser, StreamingParserConfig};
use std::time::Duration;
use std::fs::File;

let config = StreamingParserConfig {
    timeout: Some(Duration::from_secs(30)),
    ..Default::default()
};
let file = File::open("untrusted.hedl")?;
let parser = StreamingParser::with_config(file, config)?;
```
## Async Parsing

Same API, but for when you're juggling thousands of concurrent streams:

```rust
use hedl_stream::AsyncStreamingParser;
use tokio::fs::File;

async fn process(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(path).await?;
    let mut parser = AsyncStreamingParser::new(file).await?;
    while let Some(event) = parser.next().await {
        let event = event?;
        // handle each event as it arrives
    }
    Ok(())
}
```
## Events You'll See

The parser emits events as it encounters them:

- `Header` comes first, containing the document's version, schemas, aliases, and nesting rules.
- `ListStart` marks the beginning of an entity list (like `users: @User[id, name, age]`), telling you the field name, type, and column schema.
- `Node` is an individual row from a list. Each node carries its type, ID, field values, nesting depth, and parent information if nested.
- `ListEnd` fires when a list finishes, including a count of how many nodes it contained.
- `Scalar` appears for simple key-value pairs outside of lists.
- `ObjectStart` and `ObjectEnd` bracket nested objects that aren't entity lists.
- `EndOfDocument` means you've reached the end successfully.
## HEDL Features the Parser Handles

Matrix rows with CSV-like syntax:

```
users: @User[id, name, age]
| alice, Alice Smith, 30
| bob, "Jones, Bob", 25
| carol, "Said \"hello\"", 35
```
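Quoted fields follow CSV-style rules: commas inside double quotes don't split fields, and `\"` escapes a quote. A minimal sketch of that splitting logic (illustrative only, not the crate's actual lexer):

```rust
/// Split one matrix row into fields: commas separate fields only
/// outside double quotes, and `\"` is an escaped quote inside them.
/// Illustrative sketch; hedl-stream's row lexer also handles typing.
fn split_row(row: &str) -> Vec<String> {
    let body = row.trim_start().trim_start_matches('|').trim_start();
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' if in_quotes => {
                if let Some(esc) = chars.next() {
                    current.push(esc); // keep escaped char, drop backslash
                }
            }
            '"' => in_quotes = !in_quotes,
            ',' if !in_quotes => {
                fields.push(current.trim().to_string());
                current.clear();
            }
            _ => current.push(c),
        }
    }
    fields.push(current.trim().to_string());
    fields
}
```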
Entity references are detected automatically:

```
customer: @User:alice   # Qualified reference to User alice
parent: @previous_item  # Local reference
```
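The distinction is purely syntactic: a reference is qualified when a `:` separates a type from an ID, and local otherwise. A rough sketch (the names here are hypothetical, not hedl-stream's API):

```rust
/// Hypothetical helper illustrating the two documented reference
/// forms; hedl-stream performs this classification internally.
#[derive(Debug, PartialEq)]
enum RefKind {
    /// `@Type:id`, e.g. `@User:alice`
    Qualified { ty: String, id: String },
    /// `@name`, e.g. `@previous_item`
    Local(String),
}

fn classify_reference(raw: &str) -> Option<RefKind> {
    let body = raw.strip_prefix('@')?; // references always start with `@`
    match body.split_once(':') {
        Some((ty, id)) => Some(RefKind::Qualified {
            ty: ty.to_string(),
            id: id.to_string(),
        }),
        None => Some(RefKind::Local(body.to_string())),
    }
}
```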
Alias substitution:

```
%A:api_url: https://api.example.com
---
config:
  endpoint: $api_url  # Becomes https://api.example.com
```
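Expansion itself is plain substitution. A hedged sketch, assuming alias names are alphanumeric/underscore and alias values contain no further aliases:

```rust
use std::collections::HashMap;

/// Sketch of alias expansion: replace each `$name` token with its
/// header definition, leaving unknown names untouched. Illustrative;
/// the real parser resolves aliases while lexing.
fn expand_aliases(value: &str, aliases: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = value;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // alias name runs until the first non-identifier character
        let end = after
            .find(|c: char| !c.is_alphanumeric() && c != '_')
            .unwrap_or(after.len());
        let name = &after[..end];
        match aliases.get(name) {
            Some(v) => out.push_str(v),
            None => {
                out.push('$');
                out.push_str(name);
            }
        }
        rest = &after[end..];
    }
    out.push_str(rest);
    out
}
```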
Comments anywhere:

```
# Full line comment
users: @User[id, name]
| alice, Alice # Inline comment
| bob, "Bob # This is not a comment (inside quotes)"
```
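The quote-awareness above can be sketched as a small scanner that treats `#` as a comment start only outside double quotes (illustrative; the crate's lexer does this, with SIMD on x86_64):

```rust
/// Strip an inline comment from a line: `#` starts a comment only
/// when outside double quotes, and `\"` inside quotes does not close
/// them. Sketch only, not hedl-stream's actual scanner.
fn strip_inline_comment(line: &str) -> &str {
    let bytes = line.as_bytes();
    let mut in_quotes = false;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\\' if in_quotes => i += 1, // skip the escaped character
            b'"' => in_quotes = !in_quotes,
            b'#' if !in_quotes => return line[..i].trim_end(),
            _ => {}
        }
        i += 1;
    }
    line.trim_end()
}
```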
## Error Handling

Errors tell you what went wrong and where:

```rust
for event in parser {
    match event {
        Ok(ev) => { /* process ev */ }
        Err(e) => {
            eprintln!("parse error: {e}");
            break;
        }
    }
}
```
## Performance Notes
Memory usage depends on nesting depth, not file size. A 50 GB file with 10 levels of nesting uses the same memory as a 50 KB file with 10 levels of nesting.
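That claim follows from what a streaming parser must retain: only the stack of containers that are currently open, since closed nodes are emitted and dropped immediately. A minimal sketch of the idea (not the crate's internals):

```rust
/// Toy model of streaming-parser state: live memory is a stack of
/// currently open containers, so it grows with depth, not file size.
struct DepthTracker {
    open: Vec<String>, // names of currently open containers
    max_depth: usize,
}

impl DepthTracker {
    fn new() -> Self {
        DepthTracker { open: Vec::new(), max_depth: 0 }
    }
    fn enter(&mut self, name: &str) {
        self.open.push(name.to_string());
        self.max_depth = self.max_depth.max(self.open.len());
    }
    fn leave(&mut self) {
        self.open.pop(); // state for the closed container is freed here
    }
}
```

However many sibling rows a file contains, only the nesting depth ever shows up in `max_depth`.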
The default 64 KB I/O buffer works well for most cases. Bump it to 128-256 KB for maximum throughput on large files.
On x86_64 with AVX2, comment detection uses SIMD for faster scanning of comment-heavy files.
Timeout checks happen every 100 operations, adding about 0.1% overhead.
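Amortized timeout checking is a standard trick: pay one integer increment on the hot path and read the clock only every Nth operation. A sketch under that assumption (the struct and names here are illustrative, not the crate's API):

```rust
use std::time::{Duration, Instant};

/// Amortized deadline check: the common path is a counter increment;
/// the clock is consulted only every `check_every` operations.
struct TimeoutGuard {
    deadline: Instant,
    ops: u32,
    check_every: u32,
}

impl TimeoutGuard {
    fn new(budget: Duration) -> Self {
        TimeoutGuard {
            deadline: Instant::now() + budget,
            ops: 0,
            check_every: 100, // matches the documented cadence
        }
    }

    /// Returns true once the budget has been exhausted.
    fn tick(&mut self) -> bool {
        self.ops += 1;
        if self.ops % self.check_every != 0 {
            return false; // fast path: no clock read
        }
        Instant::now() >= self.deadline
    }
}
```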
## Dependencies

- `hedl-core` for shared types and lexer utilities
- `thiserror` for error definitions
- `tokio` (optional, with the "async" feature) for async I/O
## License
Apache-2.0