Streaming HEDL Parser
This crate provides a streaming, memory-efficient parser for HEDL documents. Instead of loading the entire document into memory, it yields events or nodes one at a time, making it suitable for processing multi-GB files.
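The pull-based model can be illustrated with a minimal std-only sketch. This is not this crate's API: the `LineEvents` iterator, the `Event` type and the line-oriented input are hypothetical stand-ins. The iterator wraps any `BufRead`, reuses a single line buffer and yields one event per non-empty line, so memory use is bounded by the longest line rather than by the size of the input.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical event type for this sketch; the real crate defines its own events.
#[derive(Debug, PartialEq)]
enum Event {
    Line { number: usize, text: String },
}

/// Pull-based reader: each call to `next` reads lines only until it can yield one event.
struct LineEvents<R: BufRead> {
    reader: R,
    buf: String,
    number: usize,
}

impl<R: BufRead> LineEvents<R> {
    fn new(reader: R) -> Self {
        LineEvents { reader, buf: String::new(), number: 0 }
    }
}

impl<R: BufRead> Iterator for LineEvents<R> {
    type Item = io::Result<Event>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Reuse the same buffer so only one line is held in memory at a time.
            self.buf.clear();
            match self.reader.read_line(&mut self.buf) {
                Ok(0) => return None, // end of input
                Ok(_) => {
                    self.number += 1;
                    let text = self.buf.trim_end();
                    if text.is_empty() {
                        continue; // skip blank lines but keep counting them
                    }
                    return Some(Ok(Event::Line { number: self.number, text: text.to_string() }));
                }
                Err(e) => return Some(Err(e)),
            }
        }
    }
}

fn main() {
    let input = Cursor::new("alpha\n\nbeta\n");
    for event in LineEvents::new(input) {
        println!("{:?}", event.unwrap());
    }
}
```

Because the caller drives iteration, it can stop early (for example with `break` or `take`), and the remaining input is never read.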
Features
- Memory Efficient: Process files larger than available RAM
- Iterator-based: Standard Rust iterator interface (sync)
- Async Support: Non-blocking I/O with tokio (optional)
- Event-driven: Optional SAX-like event callbacks
- Timeout Protection: Abort parsing that exceeds a configured time limit, guarding against runaway processing of malicious or untrusted input
- Compatible: Works with hedl-parquet and hedl-neo4j for streaming export
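The event-driven (SAX-like) style inverts control. The caller does not pull events. It passes a callback that the parser invokes for each event. The following is a minimal std-only sketch of that pattern. The `Event` enum, the `#`-prefixed header convention and `parse_with_callback` are hypothetical and are not this crate's callback API.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical events for this sketch only.
#[derive(Debug, PartialEq)]
enum Event {
    Start,
    Header(String),
    Row(String),
    End,
}

/// Invokes `on_event` once per event, SAX-style, reading the input one line at a time.
fn parse_with_callback<R: BufRead, F: FnMut(Event)>(mut reader: R, mut on_event: F) -> io::Result<()> {
    on_event(Event::Start);
    let mut line = String::new();
    while reader.read_line(&mut line)? != 0 {
        let text = line.trim_end();
        if let Some(name) = text.strip_prefix('#') {
            on_event(Event::Header(name.trim().to_string()));
        } else if !text.is_empty() {
            on_event(Event::Row(text.to_string()));
        }
        line.clear(); // reuse the buffer for the next line
    }
    on_event(Event::End);
    Ok(())
}

fn main() -> io::Result<()> {
    let mut rows = 0;
    parse_with_callback(Cursor::new("# users\nalice\nbob\n"), |event| {
        if matches!(event, Event::Row(_)) {
            rows += 1;
        }
        println!("{:?}", event);
    })?;
    println!("rows seen: {}", rows);
    Ok(())
}
```

Callbacks suit fire-and-forget processing such as counting or forwarding rows. The iterator form is usually easier when the consumer needs to stop early or interleave parsing with other work.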
Sync vs Async
Synchronous API (default)
Use the synchronous API for:
- Processing local files
- Single-threaded batch processing
- Simpler code without async complexity
- CPU-bound workloads with minimal I/O wait
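```rust
use hedl_stream::StreamingParser; // crate path and type name assumed; adjust to this crate's exports
use std::fs::File;
use std::io::BufReader;

let file = File::open("data.hedl").unwrap(); // placeholder path
let reader = BufReader::new(file);
let parser = StreamingParser::new(reader).unwrap();
for event in parser {
    // Each item is a Result: process one event at a time and stop on the first error.
    match event {
        Ok(_event) => { /* handle the event */ }
        Err(e) => { eprintln!("parse error: {:?}", e); break; }
    }
}
```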
Asynchronous API (feature = "async")
Use the asynchronous API for:
- Processing network streams or pipes
- High-concurrency scenarios (many parallel streams)
- Integration with async web servers or frameworks
- Non-blocking I/O in async runtime contexts
Timeout Protection for Untrusted Input
When parsing untrusted input, configure a timeout to prevent infinite loops:
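```rust
use hedl_stream::{StreamingParser, StreamingParserConfig}; // crate path assumed
use std::io::Cursor;
use std::time::Duration;

// Config field names and `with_config` argument order are assumed; the limit is 10 seconds.
let config = StreamingParserConfig { timeout: Some(Duration::from_secs(10)), ..Default::default() };
let untrusted_input = "..."; // Input from untrusted source
let parser = StreamingParser::with_config(Cursor::new(untrusted_input), config).unwrap();
// Parser will return StreamError::Timeout if parsing exceeds 10 seconds
for event in parser {
    match event {
        Ok(_event) => { /* handle the event */ }
        Err(e) => { eprintln!("stopping: {:?}", e); break; }
    }
}
```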
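One general way to implement such a limit is a deadline checked between events. The following std-only sketch shows that technique. It is an assumption about how a timeout can work and not this crate's implementation. The hypothetical `Deadline` adapter wraps any iterator, measures elapsed time with `Instant`, yields a hypothetical `TimeoutError` once the `Duration` budget is exceeded, and then stops. In this crate, the corresponding error is `StreamError::Timeout`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error for this sketch; the crate reports `StreamError::Timeout` instead.
#[derive(Debug, PartialEq)]
struct TimeoutError {
    limit: Duration,
}

/// Wraps an iterator and enforces a wall-clock budget, checked before each item.
struct Deadline<I> {
    inner: I,
    start: Instant,
    limit: Duration,
    expired: bool,
}

impl<I> Deadline<I> {
    fn new(inner: I, limit: Duration) -> Self {
        Deadline { inner, start: Instant::now(), limit, expired: false }
    }
}

impl<I: Iterator> Iterator for Deadline<I> {
    type Item = Result<I::Item, TimeoutError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.expired {
            return None; // stop after reporting the timeout once
        }
        if self.start.elapsed() > self.limit {
            self.expired = true;
            return Some(Err(TimeoutError { limit: self.limit }));
        }
        self.inner.next().map(Ok)
    }
}

fn main() {
    // An endless source stands in for input that never terminates.
    let endless = std::iter::repeat(1u8);
    let mut seen = 0u64;
    for item in Deadline::new(endless, Duration::from_millis(50)) {
        match item {
            Ok(_) => seen += 1,
            Err(e) => {
                println!("stopped after {} items: {:?}", seen, e);
                break;
            }
        }
    }
}
```

A check of this kind only runs when the next item is requested. A real parser therefore also needs to check the deadline inside any long-running inner loop, not only between events.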