hedl-stream 1.0.0

Streaming parser for HEDL - memory-efficient processing of large files
Documentation

Streaming HEDL Parser

This crate provides a streaming, memory-efficient parser for HEDL documents. Instead of loading the entire document into memory, it yields events or nodes one at a time, making it suitable for processing multi-GB files.

Features

  • Memory Efficient: Process files larger than available RAM
  • Iterator-based: Standard Rust iterator interface (sync)
  • Async Support: Non-blocking I/O with tokio (optional)
  • Event-driven: Optional SAX-like event callbacks
  • Timeout Protection: Prevent infinite loops from malicious/untrusted input
  • Compatible: Works with hedl-parquet and hedl-neo4j for streaming export

Sync vs Async

Synchronous API (default)

Use the synchronous API for:

  • Processing local files
  • Single-threaded batch processing
  • Simpler code without async complexity
  • CPU-bound workloads with minimal I/O wait
use hedl_stream::{StreamingParser, NodeEvent};
use std::io::BufReader;
use std::fs::File;

let file = File::open("large-dataset.hedl").unwrap();
let reader = BufReader::new(file);

let parser = StreamingParser::new(reader).unwrap();

for event in parser {
    match event {
        Ok(NodeEvent::Node(node)) => {
            println!("{}:{}", node.type_name, node.id);
        }
        Ok(NodeEvent::ListStart { type_name, .. }) => {
            println!("List started: {}", type_name);
        }
        Err(e) => {
            eprintln!("Error: {}", e);
            break;
        }
        _ => {}
    }
}

Asynchronous API (feature = "async")

Use the asynchronous API for:

  • Processing network streams or pipes
  • High-concurrency scenarios (many parallel streams)
  • Integration with async web servers or frameworks
  • Non-blocking I/O in async runtime contexts
# #[cfg(feature = "async")]
# async fn example() -> Result<(), Box<dyn std::error::Error>> {
use hedl_stream::{AsyncStreamingParser, NodeEvent};
use tokio::fs::File;

let file = File::open("large-dataset.hedl").await?;
let parser = AsyncStreamingParser::new(file).await?;

while let Some(event) = parser.next_event().await? {
    match event {
        NodeEvent::Node(node) => {
            println!("{}:{}", node.type_name, node.id);
        }
        NodeEvent::ListStart { type_name, .. } => {
            println!("List started: {}", type_name);
        }
        _ => {}
    }
}
# Ok(())
# }

Timeout Protection for Untrusted Input

When parsing untrusted input, configure a timeout to prevent infinite loops:

use hedl_stream::{StreamingParser, StreamingParserConfig};
use std::time::Duration;
use std::io::Cursor;

let config = StreamingParserConfig {
    timeout: Some(Duration::from_secs(10)),
    ..Default::default()
};

let untrusted_input = "..."; // Input from untrusted source
let parser = StreamingParser::with_config(
    Cursor::new(untrusted_input),
    config
).unwrap();

// Parser will return StreamError::Timeout if parsing exceeds 10 seconds
for event in parser {
    // Process events...
    # break;
}