Skip to main content

Crate hedl_stream

Crate hedl_stream 

Source
Expand description

Streaming HEDL Parser

This crate provides a streaming, memory-efficient parser for HEDL documents. Instead of loading the entire document into memory, it yields events or nodes one at a time, making it suitable for processing multi-GB files.

§Features

  • Memory Efficient: Process files larger than available RAM
  • Iterator-based: Standard Rust iterator interface (sync)
  • Async Support: Non-blocking I/O with tokio (optional)
  • Event-driven: Optional SAX-like event callbacks
  • Timeout Protection: Prevent infinite loops from malicious/untrusted input
  • Compatible: Works with hedl-parquet and hedl-neo4j for streaming export

§Sync vs Async

§Synchronous API (default)

Use the synchronous API for:

  • Processing local files
  • Single-threaded batch processing
  • Simpler code without async complexity
  • CPU-bound workloads with minimal I/O wait
use hedl_stream::{StreamingParser, NodeEvent};
use std::io::BufReader;
use std::fs::File;

let file = File::open("large-dataset.hedl").unwrap();
let reader = BufReader::new(file);

let parser = StreamingParser::new(reader).unwrap();

for event in parser {
    match event {
        Ok(NodeEvent::Node(node)) => {
            println!("{}:{}", node.type_name, node.id);
        }
        Ok(NodeEvent::ListStart { type_name, .. }) => {
            println!("List started: {}", type_name);
        }
        Err(e) => {
            eprintln!("Error: {}", e);
            break;
        }
        _ => {}
    }
}

§Asynchronous API (feature = “async”)

Use the asynchronous API for:

  • Processing network streams or pipes
  • High-concurrency scenarios (many parallel streams)
  • Integration with async web servers or frameworks
  • Non-blocking I/O in async runtime contexts
use hedl_stream::{AsyncStreamingParser, NodeEvent};
use tokio::fs::File;

let file = File::open("large-dataset.hedl").await?;
let mut parser = AsyncStreamingParser::new(file).await?;

while let Some(event) = parser.next_event().await? {
    match event {
        NodeEvent::Node(node) => {
            println!("{}:{}", node.type_name, node.id);
        }
        NodeEvent::ListStart { type_name, .. } => {
            println!("List started: {}", type_name);
        }
        _ => {}
    }
}

§Timeout Protection for Untrusted Input

When parsing untrusted input, configure a timeout to prevent infinite loops:

use hedl_stream::{StreamingParser, StreamingParserConfig};
use std::time::Duration;
use std::io::Cursor;

let config = StreamingParserConfig {
    timeout: Some(Duration::from_secs(10)),
    ..Default::default()
};

let untrusted_input = "..."; // Input from untrusted source
let parser = StreamingParser::with_config(
    Cursor::new(untrusted_input),
    config
).unwrap();

// Parser will return StreamError::Timeout if parsing exceeds 10 seconds
for event in parser {
    // Process events...
}

Re-exports§

pub use compression::CompressionWriter;
pub use compression::CompressionFormat;
pub use compression::CompressionReader;

Modules§

compression
Compression support for streaming HEDL. Transparent compression support for streaming HEDL parsing.

Structs§

AsyncLineReader
Buffered async line reader with line number tracking.
AsyncStreamingParser
Async streaming HEDL parser.
BufferPool
Buffer pool for String and Vec<Value> reuse.
HeaderInfo
Header information parsed from the HEDL document.
LineReader
Buffered line reader with line number tracking.
MemoryLimits
Memory limits for buffer management.
NodeInfo
Information about a parsed node (entity/row).
Reference
Re-export core types for convenience. A reference to another node.
StreamingParser
Streaming HEDL parser.
StreamingParserConfig
Configuration options for the streaming parser.

Enums§

BufferSizeHint
Buffer size hints for different workload profiles.
NodeEvent
Event emitted by the streaming parser.
StreamError
Errors that can occur during streaming parsing.
Value
Re-export core types for convenience. A scalar value in HEDL.

Type Aliases§

StreamResult
Result type for streaming operations.