hedl-core

The parsing engine that makes HEDL work.

Every format needs a foundation. JSON has parsers that handle {} and []. YAML wrestles with indentation. XML navigates angle brackets. HEDL's foundation is hedl-core: a parser designed from scratch for AI-era data, with typed matrices, entity references, and tensor literals built into the format itself.

Why hedl-core Exists

When you're processing 100GB datasets for ML training, or serving thousands of API requests per second, parser performance isn't academic—it's operational. hedl-core was built for this: deterministic parsing with fail-fast error handling, zero-copy preprocessing for minimal allocations, and a data model that maps directly to how AI systems think about data.

What Makes It Different

Schema-Aware Matrices: Most formats see tabular data as "arrays of objects." HEDL sees them as typed matrices with column definitions—the difference between [{}, {}, {}] and @User[id,name,email]. The parser understands this natively.

Entity References: When you write @User:alice, the parser knows it's a reference, not a string. Graph relationships are first-class citizens, not workarounds.
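To make the reference-versus-string distinction concrete, here is a minimal sketch of how a cell could be classified. The `Cell` enum and `classify` function are hypothetical names for illustration only; they are not hedl-core's API, and the real parser may apply stricter rules.

```rust
/// Hypothetical stand-in for a parsed cell; not hedl-core's actual `Value` type.
#[derive(Debug, PartialEq)]
enum Cell<'a> {
    Reference { type_name: &'a str, id: &'a str },
    Text(&'a str),
}

/// Classify one cell: `@Type:id` becomes a reference, anything else stays text.
/// Both slices borrow from the input, so nothing is copied.
fn classify(cell: &str) -> Cell<'_> {
    let trimmed = cell.trim();
    if let Some(rest) = trimmed.strip_prefix('@') {
        if let Some((type_name, id)) = rest.split_once(':') {
            if !type_name.is_empty() && !id.is_empty() {
                return Cell::Reference { type_name, id };
            }
        }
    }
    Cell::Text(trimmed)
}
```

With this sketch, `@User:alice` is classified as a reference to `alice` of type `User`, while `alice@example.com` stays plain text because it does not start with `@`.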

Zero-Copy Where It Counts: Line offset tables and careful memory layout mean the parser allocates only what it needs. Less allocation pressure means more predictable performance.

Fail-Fast Philosophy: Invalid syntax? You get an error at parse time with the exact line number. No silent failures, no "undefined" cascading through your system.
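As an illustration of the fail-fast idea, the sketch below checks matrix rows against a schema and stops at the first mismatch, reporting its line number. `ShapeError` and `parse_rows` are hypothetical names, and the comma splitting is deliberately naive (it does not handle quoted cells or bracketed tensor literals); hedl-core's own row parsing is not shown here.

```rust
/// Hypothetical error type for this sketch; hedl-core's real error is `HedlError`.
#[derive(Debug, PartialEq)]
struct ShapeError {
    line: usize,
    message: String,
}

/// Split matrix rows (`| a, b, c`) and stop at the first row whose cell count
/// does not match the schema. `first_line` is the 1-based line of `rows[0]`.
fn parse_rows<'a>(
    schema: &[&str],
    rows: &[&'a str],
    first_line: usize,
) -> Result<Vec<Vec<&'a str>>, ShapeError> {
    let mut out = Vec::with_capacity(rows.len());
    for (i, &row) in rows.iter().enumerate() {
        let line = first_line + i;
        let body = row.trim().strip_prefix('|').ok_or_else(|| ShapeError {
            line,
            message: "expected row to start with '|'".to_string(),
        })?;
        // Naive split: assumes cells contain no commas.
        let cells: Vec<&str> = body.split(',').map(str::trim).collect();
        if cells.len() != schema.len() {
            return Err(ShapeError {
                line,
                message: format!("expected {} cells, found {}", schema.len(), cells.len()),
            });
        }
        out.push(cells);
    }
    Ok(out)
}
```

Returning on the first bad row means the caller never sees a partially valid matrix: either every row matches the schema or the error names the offending line.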

Installation

[dependencies]
hedl-core = "1.2"

Usage

Parse HEDL documents into a structured data model:

use hedl_core::{parse, HedlError};

fn main() -> Result<(), HedlError> {
    let hedl = r#"
%VERSION: 1.0
%STRUCT: User: [id, name, email, created]
---
users: @User
  | alice, Alice Smith, alice@example.com, 2024-01-15
  | bob, Bob Jones, bob@example.com, 2024-02-20
"#;

    // parse returns Result<Document, HedlError>, so `?` needs a matching return type
    let doc = parse(hedl.as_bytes())?;

    // The data model preserves structure
    if let Some(users) = doc.get("users") {
        if let Some(matrix) = users.as_list() {
            println!("Schema: {:?}", matrix.schema);
            println!("Rows: {}", matrix.rows.len());
        }
    }
    Ok(())
}

Work with references and relationships:

let hedl = r#"
%VERSION: 1.0
%STRUCT: Post: [id, author, title]
%STRUCT: Comment: [id, post, author, text]
---
posts: @Post
  | p1, @User:alice, First Post
  | p2, @User:bob, Second Post

comments: @Comment
  | c1, @Post:p1, @User:bob, Great post!
  | c2, @Post:p1, @User:alice, Thanks!
"#;

let doc = parse(hedl.as_bytes())?;

// References are parsed as Reference values, not strings
// Traverse the graph structure naturally

Handle tensors for ML workflows:

let hedl = r#"
%VERSION: 1.0
---
metrics: @Metric[id, values, percentiles]
  | m1, 1250, [850, 1100, 1400, 2100, 3500]
  | m2, 890, [620, 780, 920, 1200, 1800]
"#;

let doc = parse(hedl.as_bytes())?;

// Tensor literals are native Value::Tensor types
// No string parsing needed
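For intuition about what a tensor literal carries, here is a small sketch that turns a flat, one-dimensional literal into numbers. `parse_flat_tensor` is a hypothetical helper, not part of hedl-core; it assumes no nesting, and the element type `f64` is an assumption as well.

```rust
/// Parse a flat literal such as `[850, 1100, 1400]` into numbers.
/// Returns `None` if the brackets are missing or any element is not a number.
/// Assumption: one dimension only; nested literals are out of scope here.
fn parse_flat_tensor(text: &str) -> Option<Vec<f64>> {
    let inner = text.trim().strip_prefix('[')?.strip_suffix(']')?;
    if inner.trim().is_empty() {
        return Some(Vec::new());
    }
    // Collecting into Option<Vec<_>> yields None as soon as one element fails.
    inner
        .split(',')
        .map(|item| item.trim().parse::<f64>().ok())
        .collect()
}
```

When the parser does this work once, downstream code receives numeric data directly instead of re-parsing strings at every use site.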

Data Model

The parsed document exposes a clean, typed API:

Document - Root container with version header and content map
Item - Body entries: Scalar (Value), Object (nested map), or List (MatrixList)
Node - A row in a matrix list with typed fields and optional children
Value - Tagged enum for scalar values: Null, Bool, Int, Float, String, Tensor, Reference, Expression
MatrixList - Schema-defined tabular data with typed columns
Reference - Typed entity pointers (@Type:id syntax)
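
The types above can be pictured roughly as follows. This is a hypothetical mirror for illustration: the type and variant names follow the list, but every field name and layout (for example `Tensor(Vec<f64>)` or the `version` tuple) is an assumption, not hedl-core's real definition. The `count_references` helper is likewise invented to show how pattern matching walks the structure.

```rust
use std::collections::BTreeMap;

// Hypothetical field layouts; only the names come from the list above.
#[derive(Debug, Clone, PartialEq)]
struct Reference {
    type_name: String,
    id: String,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(String),
    Tensor(Vec<f64>),
    Reference(Reference),
    Expression(String),
}

struct Node {
    fields: Vec<Value>,
    children: Vec<Node>,
}

#[allow(dead_code)]
struct MatrixList {
    type_name: String,
    schema: Vec<String>,
    rows: Vec<Node>,
}

#[allow(dead_code)]
enum Item {
    Scalar(Value),
    Object(BTreeMap<String, Item>),
    List(MatrixList),
}

#[allow(dead_code)]
struct Document {
    version: (u32, u32),
    root: BTreeMap<String, Item>,
}

/// Count references in a row, including rows nested beneath it.
fn count_refs_in_node(node: &Node) -> usize {
    let own = node.fields.iter().filter(|v| matches!(v, Value::Reference(_))).count();
    own + node.children.iter().map(count_refs_in_node).sum::<usize>()
}

/// Count references in any body entry.
fn count_refs(item: &Item) -> usize {
    match item {
        Item::Scalar(Value::Reference(_)) => 1,
        Item::Scalar(_) => 0,
        Item::Object(map) => map.values().map(count_refs).sum(),
        Item::List(list) => list.rows.iter().map(count_refs_in_node).sum(),
    }
}

/// Count every reference value in a document.
fn count_references(doc: &Document) -> usize {
    doc.root.values().map(count_refs).sum()
}
```

Because references are their own variant, the traversal never has to inspect string contents to decide what is a pointer.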

Features

Deterministic Parsing: Same input always produces the same AST. No locale-dependent behavior, no timestamp drift, no platform quirks.

Zero-Copy Preprocessing: Line offset tables let the parser jump to any line without rescanning the file. Once the table is built, the parser can read the schema header and skip straight to the data section, with no wasted work.
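
A line offset table can be sketched in a few lines. The functions below are hypothetical and only illustrate the general technique; hedl-core's internal representation may differ. The table is built in one pass, after which any line is a borrowed slice of the original input.

```rust
/// Byte offset of the start of every line in `src` (built in a single pass).
fn line_starts(src: &str) -> Vec<usize> {
    let mut starts = vec![0];
    for (i, byte) in src.bytes().enumerate() {
        if byte == b'\n' {
            starts.push(i + 1);
        }
    }
    starts
}

/// Borrow line `n` (0-based) without copying; `None` if out of range.
fn line_at<'a>(src: &'a str, starts: &[usize], n: usize) -> Option<&'a str> {
    let start = *starts.get(n)?;
    // The line ends just before the newline that precedes the next start.
    let end = starts.get(n + 1).map_or(src.len(), |&next| next - 1);
    Some(src[start..end].trim_end_matches('\r'))
}
```

The returned `&str` points into `src`, so looking up a line costs two index reads and no allocation.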

Type Safety: The data model uses Rust enums, not stringly-typed polymorphism. Value::Reference is distinct from Value::String. Pattern matching gives you compile-time guarantees.

Error Messages That Help: Parse errors include line numbers, column positions, and context. No cryptic "unexpected token" messages.

Extensible Design: The core parser handles syntax. Higher-level crates add semantic validation, reference resolution, and format conversion.

What hedl-core Doesn't Do

It's a parser, not a Swiss Army knife:

No validation: Schema compliance checking is in hedl-lint
No resolution during parsing: Cross-reference verification lives in hedl-core's separate validation layer, not in the parse step
No conversion: Format translation is in hedl-json, hedl-yaml, etc.
No I/O: Streaming and network protocols are in hedl-stream

This separation means you can parse a 10GB file without loading validation rules, or validate a document without network-dependent resolution.

Performance Characteristics

Memory: Proportional to document structure, not text size. A 1MB file with 1,000 rows allocates for 1,000 matrix entries, not 1,000,000 characters.
Speed: Parsing is I/O-bound for most documents. The bottleneck is reading bytes, not processing them.
Allocation: Zero-copy line tables and string interning reduce heap pressure.
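
To show why interning reduces heap pressure, here is a safe, simplified interner. It is a sketch of the general technique only: `Interner` is a hypothetical type, and it is not the arena-based implementation hedl-core uses. For simplicity it stores each distinct string twice (once as a map key, once in the lookup vector), which a production interner would avoid.

```rust
use std::collections::HashMap;

/// Safe interner sketch: each distinct string gets a small integer id,
/// so repeated names (type names, column names) are not stored per occurrence.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    /// Return the id for `s`, allocating only the first time `s` is seen.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }

    /// Look up the text for an id.
    fn resolve(&self, id: u32) -> Option<&str> {
        self.strings.get(id as usize).map(String::as_str)
    }

    /// Number of distinct strings stored.
    fn len(&self) -> usize {
        self.strings.len()
    }
}
```

In a document with thousands of `@User:...` references, the type name `User` would be stored once and every occurrence would carry only its id.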

When to Use hedl-core Directly

Most developers use higher-level crates (hedl facade crate, hedl-cli tool, hedl-lsp editor support). You'd use hedl-core directly when:

Building custom tooling: Format converters, linters, analysis tools
Embedding HEDL parsing: In a larger application that needs low-level control
Performance-critical paths: When you need zero-abstraction parsing

If you're just converting formats or validating files, use hedl-cli. If you need autocomplete in your editor, use hedl-lsp. If you're building the next HEDL tool, start here.

Safety and Security

Memory Safety

hedl-core confines unsafe code to two performance-critical paths: string interning and arena allocation. The parser provides:

Type safety through Rust's enum system (no stringly-typed polymorphism)
Bounds checking on all array/vector access
Guaranteed memory safety for the public API
No panics in library code (returns Result instead)

Unsafe Code

Unsafe blocks appear in only two places, both extensively audited:

String interning arena (lex/arena/interner.rs): Uses unsafe to create interned string references with manual lifetime management
Arena vector (lex/arena/vec.rs): Uses unsafe to create slice views over arena-allocated memory

These are local to the arena module, not exposed in the public API. The safe public API wraps all arena operations.

Error Handling

Parse errors never panic. Invalid input returns Result<Document, HedlError> with detailed diagnostic information:

use hedl_core::{parse, HedlErrorKind};

// `data` is the raw HEDL input as bytes
fn report(data: &[u8]) {
    match parse(data) {
        Ok(_doc) => { /* process the document */ }
        Err(e) => {
            // Diagnostics go to stderr so they do not mix with normal output
            eprintln!("Parse error at line {}: {}", e.line, e.message);
            // Error kinds: Syntax, Version, Schema, Alias, Shape, Semantic,
            // OrphanRow, Collision, Reference, Security, Conversion, IO
            match e.kind {
                HedlErrorKind::Reference => { /* handle unresolved reference */ }
                HedlErrorKind::Schema => { /* handle schema mismatch */ }
                _ => { /* other errors */ }
            }
        }
    }
}

Reporting Security Issues

If you discover a security vulnerability, please email [security contact] with:

Description of the issue
Steps to reproduce
Potential impact
Suggested fix (if any)

We take security seriously and will respond within 48 hours.

License

Apache-2.0 — Use it, fork it, build on it.

hedl-core 1.2.0