Expand description
Streaming HEDL Parser
This crate provides a streaming, memory-efficient parser for HEDL documents. Instead of loading the entire document into memory, it yields events or nodes one at a time, making it suitable for processing multi-GB files.
§Features
- Memory Efficient: Process files larger than available RAM
- Iterator-based: Standard Rust iterator interface (sync)
- Async Support: Non-blocking I/O with tokio (optional)
- Event-driven: Optional SAX-like event callbacks
- Timeout Protection: Prevent infinite loops from malicious/untrusted input
- Compatible: Works with
hedl-parquetandhedl-neo4jfor streaming export
§Sync vs Async
§Synchronous API (default)
Use the synchronous API for:
- Processing local files
- Single-threaded batch processing
- Simpler code without async complexity
- CPU-bound workloads with minimal I/O wait
use hedl_stream::{StreamingParser, NodeEvent};
use std::io::BufReader;
use std::fs::File;
let file = File::open("large-dataset.hedl").unwrap();
let reader = BufReader::new(file);
let parser = StreamingParser::new(reader).unwrap();
for event in parser {
match event {
Ok(NodeEvent::Node(node)) => {
println!("{}:{}", node.type_name, node.id);
}
Ok(NodeEvent::ListStart { type_name, .. }) => {
println!("List started: {}", type_name);
}
Err(e) => {
eprintln!("Error: {}", e);
break;
}
_ => {}
}
}§Asynchronous API (feature = “async”)
Use the asynchronous API for:
- Processing network streams or pipes
- High-concurrency scenarios (many parallel streams)
- Integration with async web servers or frameworks
- Non-blocking I/O in async runtime contexts
use hedl_stream::{AsyncStreamingParser, NodeEvent};
use tokio::fs::File;
let file = File::open("large-dataset.hedl").await?;
let mut parser = AsyncStreamingParser::new(file).await?;
while let Some(event) = parser.next_event().await? {
match event {
NodeEvent::Node(node) => {
println!("{}:{}", node.type_name, node.id);
}
NodeEvent::ListStart { type_name, .. } => {
println!("List started: {}", type_name);
}
_ => {}
}
}§Timeout Protection for Untrusted Input
When parsing untrusted input, configure a timeout to prevent infinite loops:
use hedl_stream::{StreamingParser, StreamingParserConfig};
use std::time::Duration;
use std::io::Cursor;
let config = StreamingParserConfig {
timeout: Some(Duration::from_secs(10)),
..Default::default()
};
let untrusted_input = "..."; // Input from untrusted source
let parser = StreamingParser::with_config(
Cursor::new(untrusted_input),
config
).unwrap();
// Parser will return StreamError::Timeout if parsing exceeds 10 seconds
for event in parser {
// Process events...
}Re-exports§
pub use compression::CompressionWriter;pub use compression::CompressionFormat;pub use compression::CompressionReader;
Modules§
- compression
- Compression support for streaming HEDL. Transparent compression support for streaming HEDL parsing.
Structs§
- Async
Line Reader - Buffered async line reader with line number tracking.
- Async
Streaming Parser - Async streaming HEDL parser.
- Buffer
Pool - Buffer pool for String and
Vec<Value>reuse. - Header
Info - Header information parsed from the HEDL document.
- Line
Reader - Buffered line reader with line number tracking.
- Memory
Limits - Memory limits for buffer management.
- Node
Info - Information about a parsed node (entity/row).
- Reference
- Re-export core types for convenience. A reference to another node.
- Streaming
Parser - Streaming HEDL parser.
- Streaming
Parser Config - Configuration options for the streaming parser.
Enums§
- Buffer
Size Hint - Buffer size hints for different workload profiles.
- Node
Event - Event emitted by the streaming parser.
- Stream
Error - Errors that can occur during streaming parsing.
- Value
- Re-export core types for convenience. A scalar value in HEDL.
Type Aliases§
- Stream
Result - Result type for streaming operations.