hedl-xml
HEDL's XML ecosystem integration—bidirectional conversion, XSD schema validation, streaming, and async I/O.
XML powers enterprise systems: SOAP APIs, configuration files, data interchange across legacy platforms, regulatory compliance documents. Your infrastructure depends on it. Your vendors require it. But XML's verbosity and lack of type safety create friction.
hedl-xml bridges HEDL's structured data model with XML's ubiquity. Convert between formats with configurable fidelity. Validate against XSD schemas with detailed error messages. Stream multi-gigabyte files without loading everything into memory. Use async I/O for concurrent processing with Tokio.
Part of the HEDL format family alongside hedl-json, hedl-yaml, hedl-csv, and hedl-parquet—bringing HEDL's efficiency and structure to every ecosystem you work in.
What's Implemented
Based on 6,068 lines of Rust across 7 modules:
- Bidirectional Conversion: HEDL ↔ XML with configurable formatting
- XSD Schema Validation: Full XSD 1.0 validation with comprehensive error messages
- Schema Caching: Thread-safe LRU cache for high-performance repeated validation
- Streaming Parser: Process multi-gigabyte XML files with memory bounded by the current element rather than by file size
- Async I/O: Tokio-based async operations for concurrent processing (feature-gated)
- Security: XXE prevention with entity policies, configurable recursion depth limits, and batch size controls
Installation
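[dependencies]
hedl-xml = "1.2"
# For async I/O support, use this instead of the line above:
hedl-xml = { version = "1.2", features = ["async"] }
tokio = { version = "1", features = ["full"] }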
Bidirectional Conversion
HEDL → XML: Export for Legacy Systems
Convert HEDL documents to XML when you need compatibility with existing enterprise systems:
use ;
let doc = parse?;
// Configure XML output
let config = ToXmlConfig ;
let xml = to_xml?;
Generated XML (3-5x larger than HEDL):
alice
Alice Smith
alice@example.com
bob
Bob Jones
bob@example.com
Size Overhead: XML is typically 3-5x larger than HEDL due to verbose tag syntax. Use XML only at system boundaries where compatibility is required.
XML → HEDL: Import from Enterprise Systems
Parse XML from SOAP APIs, configuration files, or data exports:
use ;
let xml = r#"<?xml version="1.0"?>
<system>
<database>
<host>localhost</host>
<port>5432</port>
<credentials>
<username>admin</username>
<password>secret</password>
</credentials>
</database>
<replicas>3</replicas>
</system>"#;
let config = FromXmlConfig ;
let hedl_doc = from_xml?;
// Now use HEDL's structured API for querying, validation, transformation
List Inference: When infer_lists: true, repeated XML elements like <user>...<user>... automatically become HEDL matrix lists.
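To make the infer_lists rule concrete, here is a minimal sketch that assumes each child element has already been reduced to a (name, text) pair. Field and infer_fields are hypothetical names for illustration, not hedl-xml's API, and keeping only the last value when list inference is off is an assumption of the sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical result of grouping child elements by name.
#[derive(Debug, PartialEq)]
enum Field {
    Single(String),
    List(Vec<String>),
}

/// Groups children by element name. With `infer_lists` set, any name that
/// appears more than once becomes a list (values stay in document order).
fn infer_fields(children: &[(&str, &str)], infer_lists: bool) -> BTreeMap<String, Field> {
    let mut grouped: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (name, text) in children {
        grouped.entry(name.to_string()).or_default().push(text.to_string());
    }
    grouped
        .into_iter()
        .map(|(name, mut values)| {
            let field = if infer_lists && values.len() > 1 {
                Field::List(values)
            } else {
                // Assumption of this sketch: without list inference, the last value wins.
                Field::Single(values.pop().expect("at least one value per name"))
            };
            (name, field)
        })
        .collect()
}
```

With the users example, two user children collapse into one list field while a single replicas child stays a scalar.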
XSD Schema Validation
Validate XML documents against XSD schemas with detailed, actionable error messages:
use SchemaValidator;
let schema_xsd = r#"<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="email" type="xs:string"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>"#;
let validator = from_xsd?;
// Validate XML document
let xml = r#"<?xml version="1.0"?>
<person id="p1">
<name>Alice</name>
<age>30</age>
<email>alice@example.com</email>
</person>"#;
validator.validate?; // Returns Ok(()) if valid
Schema Validation Features
Comprehensive Validation:
- Element structure validation (sequence, choice, all)
- Type validation (xs:string, xs:integer, xs:decimal, xs:boolean, custom types)
- Attribute validation (required, optional, fixed, default)
- Cardinality validation (minOccurs, maxOccurs, including unbounded)
- Namespace support (multiple namespaces, imports)
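The cardinality rule in the list above reduces to comparing an occurrence count against minOccurs and maxOccurs, where "unbounded" means no upper limit. The sketch below is a hypothetical illustration, not hedl-xml's validator; Occurs and check_cardinality are made-up names and the error wording is invented. The defaults of 1 for both bounds follow the XSD specification.

```rust
/// Hypothetical occurrence constraint; `max: None` stands for maxOccurs="unbounded".
struct Occurs {
    min: u32,
    max: Option<u32>,
}

impl Default for Occurs {
    /// XSD defaults: minOccurs="1" and maxOccurs="1".
    fn default() -> Self {
        Occurs { min: 1, max: Some(1) }
    }
}

/// Checks how many times `element` occurred against its bounds.
fn check_cardinality(element: &str, count: u32, occurs: &Occurs) -> Result<(), String> {
    if count < occurs.min {
        return Err(format!("'{}' occurs {} time(s), minOccurs is {}", element, count, occurs.min));
    }
    if let Some(max) = occurs.max {
        if count > max {
            return Err(format!("'{}' occurs {} time(s), maxOccurs is {}", element, count, max));
        }
    }
    Ok(())
}
```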
Detailed Error Messages with line numbers:
// Invalid XML - age is not an integer
let xml = r#"<?xml version="1.0"?>
<person id="p1">
<name>Alice</name>
<age>thirty</age>
<email>alice@example.com</email>
</person>"#;
let result = validator.validate;
// Error: "Type validation failed for 'age': expected xs:integer, found 'thirty' at line 4"
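The age failure above comes down to the lexical rule for xs:integer: after whitespace collapsing, an optional + or - sign followed by one or more decimal digits. The sketch below only illustrates that rule and reproduces the message format shown above. It is not how hedl-xml implements validation, and is_xs_integer and validate_integer are hypothetical helpers.

```rust
/// True if `s` is in the lexical space of xs:integer.
/// xs:integer collapses whitespace, so surrounding XML whitespace is ignored.
fn is_xs_integer(s: &str) -> bool {
    let s = s.trim_matches(|c| matches!(c, ' ' | '\t' | '\n' | '\r'));
    let digits = s.strip_prefix(|c: char| c == '+' || c == '-').unwrap_or(s);
    !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit())
}

/// Formats a failure the same way as the example error message above.
fn validate_integer(field: &str, value: &str, line: usize) -> Result<(), String> {
    if is_xs_integer(value) {
        Ok(())
    } else {
        Err(format!(
            "Type validation failed for '{}': expected xs:integer, found '{}' at line {}",
            field, value, line
        ))
    }
}
```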
Schema Caching: High-Performance Validation
For repeated validation operations, use the thread-safe LRU schema cache:
use SchemaCache;
use Path;
// Create cache with capacity for 100 schemas
let cache = new;
// First load: parses and caches schema
let validator = cache.get_or_load?;
validator.validate?;
// Subsequent loads: uses cached validator (no re-parsing)
let validator2 = cache.get_or_load?;
validator2.validate?;
// Monitor cache performance
println!;
Performance: Schema caching eliminates parsing overhead for repeated validations. Use in high-throughput services processing thousands of XML documents.
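The caching behaviour described here can be sketched as a small thread-safe LRU map. This is a hypothetical, dependency-free illustration: it reuses the get_or_load name from the example above but is not hedl-xml's SchemaCache, and it uses std::sync::Mutex where the crate lists parking_lot's RwLock. The loader closure stands in for parsing an XSD file.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Hypothetical thread-safe LRU cache keyed by schema path.
pub struct LruCache<V> {
    capacity: usize,
    inner: Mutex<Inner<V>>,
}

struct Inner<V> {
    map: HashMap<String, Arc<V>>,
    order: VecDeque<String>, // front = least recently used
    hits: u64,
    misses: u64,
}

impl<V> LruCache<V> {
    pub fn new(capacity: usize) -> Self {
        LruCache {
            capacity: capacity.max(1),
            inner: Mutex::new(Inner { map: HashMap::new(), order: VecDeque::new(), hits: 0, misses: 0 }),
        }
    }

    /// Returns the cached value for `key`, or runs `load` once and caches it.
    /// The lock is held while loading: simple, but it serializes concurrent loads.
    pub fn get_or_load<E>(&self, key: &str, load: impl FnOnce() -> Result<V, E>) -> Result<Arc<V>, E> {
        let mut inner = self.inner.lock().unwrap();
        let cached = inner.map.get(key).cloned();
        if let Some(value) = cached {
            inner.hits += 1;
            // Mark as most recently used.
            inner.order.retain(|k| k.as_str() != key);
            inner.order.push_back(key.to_string());
            return Ok(value);
        }
        inner.misses += 1;
        let value = Arc::new(load()?);
        if inner.map.len() >= self.capacity {
            // Evict the least recently used entry.
            if let Some(oldest) = inner.order.pop_front() {
                inner.map.remove(&oldest);
            }
        }
        inner.map.insert(key.to_string(), Arc::clone(&value));
        inner.order.push_back(key.to_string());
        Ok(value)
    }

    /// (hits, misses) so far.
    pub fn stats(&self) -> (u64, u64) {
        let inner = self.inner.lock().unwrap();
        (inner.hits, inner.misses)
    }
}
```

Handing out Arc values lets many threads share one parsed validator without copying it, and eviction only drops the cache's reference.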
Streaming: Process Multi-Gigabyte Files
For large XML files (hundreds of MB to several GB), use the streaming parser to process elements incrementally without loading the entire document into memory:
use ;
use File;
// Open large XML file (e.g., 5 GB database export)
let file = open?;
let config = StreamConfig ;
let mut count = 0;
for result in from_xml_stream?
println!;
Memory Usage: Memory is bounded by the current element plus the read buffer, not by file size, so a 5 GB XML file needs roughly the same memory as a 5 MB file with similar element sizes.
Streaming vs Buffered: Use streaming for files >100 MB. For smaller files, use from_xml() for simpler code.
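Why streaming keeps memory flat can be shown without any XML library. This toy sketch (count_start_tags is a hypothetical name, not part of hedl-xml, which uses quick-xml) reads one '>'-terminated chunk at a time into a single reused buffer, so memory tracks the largest chunk rather than the input size. It deliberately ignores comments, CDATA, and '>' inside attribute values.

```rust
use std::io::{self, BufRead};

/// Counts start (or self-closing) tags named `name`, one chunk at a time.
fn count_start_tags<R: BufRead>(mut reader: R, name: &str) -> io::Result<usize> {
    let mut buf = Vec::new();
    let mut count = 0;
    loop {
        buf.clear(); // reuse one buffer: memory tracks the largest chunk, not the file
        if reader.read_until(b'>', &mut buf)? == 0 {
            break; // end of input
        }
        let end = if buf.last() == Some(&b'>') { buf.len() - 1 } else { buf.len() };
        // The tag (if any) sits between the last '<' and the closing '>'.
        if let Some(lt) = buf[..end].iter().rposition(|&b| b == b'<') {
            let tag = &buf[lt + 1..end];
            let next = tag.get(name.len()).copied();
            // Match "user", "user id=..." and "user/", but not "users" or "/user".
            if tag.starts_with(name.as_bytes())
                && matches!(next, None | Some(b' ' | b'/' | b'\t' | b'\n' | b'\r'))
            {
                count += 1;
            }
        }
    }
    Ok(count)
}
```

For a file on disk you would pass std::io::BufReader::new(std::fs::File::open(path)?) as the reader.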
Async I/O with Tokio
Enable async support for non-blocking I/O and concurrent processing (requires async feature):
use ;
use ;
async
Concurrent Batch Processing
Process multiple XML files concurrently with automatic concurrency limiting:
use from_xml_files_concurrent;
use FromXmlConfig;
async
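The concurrency limiting mentioned above follows a common pattern: never have more than N units of work in flight. As a dependency-free sketch, this uses std::thread::scope in place of Tokio; with Tokio the same bound is often enforced with tokio::sync::Semaphore. process_all is a hypothetical name, and the work closure stands in for converting one file.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Runs `work` over every input using at most `max_concurrency` worker threads.
/// Workers pull the next index from a shared counter, so no more than
/// `max_concurrency` items are in flight; results keep input order.
fn process_all<T, R, F>(inputs: &[T], max_concurrency: usize, work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<R>>> = Mutex::new((0..inputs.len()).map(|_| None).collect());
    let workers = max_concurrency.clamp(1, inputs.len().max(1));
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= inputs.len() {
                    break;
                }
                let result = work(&inputs[i]);
                results.lock().unwrap()[i] = Some(result);
            });
        }
    });
    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|r| r.expect("every index is processed exactly once"))
        .collect()
}
```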
Async Streaming for Large Files
Combine streaming with async I/O for maximum throughput:
use from_xml_stream_async;
use StreamConfig;
use File;
async
Security Limits: DoS Protection
hedl-xml enforces resource limits to prevent denial-of-service attacks from malicious XML files:
Recursion Depth Limit
Default: 100 levels
Configurable: Yes, via StreamConfig::max_recursion_depth (streaming API). The standard from_xml() uses a fixed limit.
Protection: Prevents stack overflow from deeply nested XML structures
<!-- Malicious XML with 1000+ nested levels -->
... (1000 levels deep) ...
Error: XML recursion depth exceeded (max: 100, found: 101)
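A depth limit is essentially a counter checked on every start tag. The toy sketch below scans a string rather than using quick-xml events, and check_depth is a hypothetical helper; it reproduces the error message above. It does not handle '>' inside comments, CDATA, or attribute values.

```rust
/// Tracks nesting with a counter (no recursion) and fails as soon as
/// depth exceeds `max_depth`. Returns the deepest level seen otherwise.
fn check_depth(xml: &str, max_depth: usize) -> Result<usize, String> {
    let mut depth = 0usize;
    let mut deepest = 0usize;
    let mut rest = xml;
    while let Some(lt) = rest.find('<') {
        let after = &rest[lt + 1..];
        let gt = after.find('>').ok_or_else(|| "unterminated tag".to_string())?;
        let tag = &after[..gt];
        if tag.starts_with('/') {
            // End tag closes one level.
            depth = depth.saturating_sub(1);
        } else if !(tag.starts_with('?') || tag.starts_with('!') || tag.ends_with('/')) {
            // Start tag opens one level; declarations, comments and
            // self-closing tags leave the depth unchanged.
            depth += 1;
            if depth > max_depth {
                return Err(format!(
                    "XML recursion depth exceeded (max: {}, found: {})",
                    max_depth, depth
                ));
            }
            deepest = deepest.max(depth);
        }
        rest = &after[gt + 1..];
    }
    Ok(deepest)
}
```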
Batch Size Limit (Streaming)
Default: 1,000 elements per batch
Configurable: Yes, via StreamConfig::max_batch_size
Protection: Controls memory usage when processing repeated elements in streams
For the standard (non-streaming) from_xml() and to_xml() APIs, limits are hardcoded and cannot be adjusted. Use the streaming API if you need custom batch size limits.
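The effect of a batch size limit can be sketched as an iterator adapter that yields at most max_batch_size items at a time, so only one batch is held in memory. Batches and batches are hypothetical names; this is not StreamConfig's implementation.

```rust
/// Hypothetical adapter: groups items into Vecs of at most `max_batch_size`.
struct Batches<I: Iterator> {
    inner: I,
    max_batch_size: usize,
}

impl<I: Iterator> Iterator for Batches<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull up to max_batch_size items from the underlying iterator.
        let batch: Vec<I::Item> = self.inner.by_ref().take(self.max_batch_size).collect();
        if batch.is_empty() { None } else { Some(batch) }
    }
}

fn batches<I: IntoIterator>(items: I, max_batch_size: usize) -> Batches<I::IntoIter> {
    assert!(max_batch_size > 0, "max_batch_size must be at least 1");
    Batches { inner: items.into_iter(), max_batch_size }
}
```

With the default of 1,000, a stream of 2,500 elements would arrive as batches of 1,000, 1,000 and 500.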
Example with custom recursion limit:
use StreamConfig;
let config = StreamConfig ;
Note on String and List Size Limits: The error types can report string-length and list-size violations, but those limits are enforced by the underlying quick-xml parser rather than by hedl-xml, and they are not currently user-configurable.
Format Mapping
HEDL → XML
| HEDL Type | XML Output | Notes |
|---|---|---|
| Scalars (null, bool, number, string) | Element with text content | <val>42</val> |
| Objects | Nested elements | <config><name>test</name></config> |
| Arrays (tensors) | <item> elements | <tensor><item>1</item><item>2</item></tensor> |
| References (@User:alice) | Element with __hedl_type__="ref" attribute | Distinguishes from strings starting with @ |
| Expressions ($(x + 1)) | Element with $() wrapped text | <expr>$(x + 1)</expr> |
| Matrix lists | Repeated elements | <user>...<user>... (singularized type name) |
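The scalar, object, and array rows of the table can be sketched as a short recursive writer. Value is a simplified stand-in for a HEDL value, not hedl-core's type, and write_element is a hypothetical name; references, expressions, and matrix lists are left out.

```rust
/// Simplified stand-in for a HEDL value (hypothetical).
enum Value {
    Scalar(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Escapes characters that are not allowed verbatim in XML text content.
fn escape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

/// Scalars become text content, objects become nested elements,
/// and arrays become repeated <item> children.
fn write_element(name: &str, value: &Value, out: &mut String) {
    out.push('<');
    out.push_str(name);
    out.push('>');
    match value {
        Value::Scalar(text) => out.push_str(&escape(text)),
        Value::Array(items) => {
            for item in items {
                write_element("item", item, out);
            }
        }
        Value::Object(fields) => {
            for (key, field) in fields {
                write_element(key, field, out);
            }
        }
    }
    out.push_str("</");
    out.push_str(name);
    out.push('>');
}
```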
XML → HEDL
| XML Pattern | HEDL Result | Notes |
|---|---|---|
| Elements with text | HEDL scalars | Type inference: "true" → Bool, "42" → Int, "3.14" → Float |
| Nested elements | HEDL objects | Hierarchical structure preserved |
| Repeated elements | HEDL matrix lists | When infer_lists: true |
| Element with __hedl_type__="ref" | HEDL reference | @Type:id format |
| Text matching $(...) pattern | HEDL expression | Parsed as computed value |
| Attributes | Object fields | <item id="1"/> → {"id": 1} |
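The type inference in the first row can be sketched as "try the narrowest type first". Scalar and infer_scalar are hypothetical names, and the exact order and edge cases (such as keeping "NaN" as a string) are assumptions of the sketch, not documented hedl-xml behaviour.

```rust
#[derive(Debug, PartialEq)]
enum Scalar {
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

/// Tries bool, then integer, then float, and falls back to string.
fn infer_scalar(text: &str) -> Scalar {
    match text {
        "true" => return Scalar::Bool(true),
        "false" => return Scalar::Bool(false),
        _ => {}
    }
    if let Ok(i) = text.parse::<i64>() {
        return Scalar::Int(i);
    }
    // f64 parsing also accepts "NaN" and "inf"; keep those as strings.
    match text.parse::<f64>() {
        Ok(f) if f.is_finite() => Scalar::Float(f),
        _ => Scalar::Str(text.to_string()),
    }
}
```

Note that this rule turns a value like "007" into the integer 7; a converter may need a way to keep such values as strings.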
Key Conversion: XML element names are converted to snake_case for HEDL compatibility: UserPost → user_post, XMLData → xmldata.
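One rule that reproduces both documented examples is: insert an underscore before an uppercase letter only when it follows a lowercase letter or digit, then lowercase everything, so acronym runs like "XML" stay together. This is a hypothetical reconstruction, not necessarily hedl-xml's exact algorithm; it does not handle hyphens or other characters allowed in XML names.

```rust
/// Hypothetical snake_case conversion matching UserPost -> user_post
/// and XMLData -> xmldata.
fn to_snake_case(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    let mut prev: Option<char> = None;
    for c in name.chars() {
        if c.is_uppercase() {
            // Word boundary only after a lowercase letter or digit.
            if matches!(prev, Some(p) if p.is_lowercase() || p.is_ascii_digit()) {
                out.push('_');
            }
            out.extend(c.to_lowercase());
        } else {
            out.push(c);
        }
        prev = Some(c);
    }
    out
}
```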
Use Cases
SOAP API Integration: Parse SOAP XML responses into HEDL for structured querying. Generate SOAP XML requests from HEDL templates with validation.
Configuration Migration: Convert XML config files (Spring, Tomcat, etc.) to HEDL for LSP-assisted editing with validation. Export back to XML for runtime.
Data Export/Import: Stream large XML database exports into HEDL for transformation. Export HEDL to XML for compatibility with legacy ETL tools.
Schema-First Development: Define data contracts as XSD schemas. Validate XML payloads in real-time with detailed error reporting. Convert to HEDL for processing.
Regulatory Compliance: Parse XML from compliance systems (banking, healthcare, government). Validate against regulatory XSD schemas. Transform with HEDL's structured API.
Multi-Format Pipelines: Read XML from SOAP APIs, convert to HEDL, combine with JSON from REST APIs (hedl-json), export to CSV for reporting (hedl-csv)—all through HEDL's unified data model.
What This Crate Doesn't Do
Schema Preservation: XML doesn't preserve HEDL's %STRUCT, %NEST, %ALIAS declarations (they're HEDL-specific). If you need schemas after round-tripping through XML, use XSD for validation or redefine HEDL schemas.
Validation: Conversion itself does not validate data. For HEDL schema validation, use hedl-lint. For XML schema validation, use SchemaValidator with XSD.
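Optimization: Converts faithfully, not optimally. Verbose XML becomes verbose HEDL: the XML's structure carries over unchanged, so the output does not recover the 3-5x size advantage that hand-written HEDL has over XML. XML is inherently verbose, and HEDL's efficiency comes from avoiding XML in the first place.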
XML Comments: XML comments are discarded during parsing (standard XML processing behavior). Use HEDL comments in source .hedl files for preserved documentation.
Dependencies
- quick-xml 0.31 - High-performance XML parsing and serialization
- roxmltree 0.20 - XSD schema parsing and validation
- hedl-core 1.0 - HEDL parsing and data model
- parking_lot 0.12 - High-performance RwLock for schema cache
- tokio 1.0 (optional) - Async I/O runtime (requires the async feature)
- thiserror 1.0 - Error type definitions
Performance Characteristics
Conversion Speed: HEDL → XML is serialization-bound (~50-100 MB/s). XML → HEDL is parsing-bound (~100-200 MB/s depending on complexity).
Schema Validation: XSD validation adds ~10-20% overhead vs parse-only. Schema caching eliminates re-parsing overhead for repeated validations.
Streaming: Memory stays bounded by the current element regardless of file size. Process 10 GB files with 100 MB RAM. Throughput: ~50-100 MB/s depending on element complexity.
Async I/O: Concurrent file processing scales linearly up to CPU core count. Use for I/O-bound workloads (network file systems, slow disks).
Detailed performance benchmarks are available in the HEDL repository benchmark suite.
License
Apache-2.0