hedl-json 2.0.0

HEDL to/from JSON conversion
# hedl-json

**HEDL's integration with the JSON ecosystem: bidirectional conversion, JSONPath queries, schema generation, and streaming.**

JSON is the universal data interchange format. Your APIs speak it, your databases accept it, your monitoring tools consume it, your LLM providers require it. Every token in a JSON payload costs money. Every extra byte adds latency. Every API call compounds the inefficiency.

`hedl-json` bridges HEDL's efficiency with JSON's ubiquity. Use HEDL's compact matrix notation internally to save 46.7% on tokens and 57.7% on payload size. When you need JSON compatibility, `hedl-json` handles the conversion seamlessly. Query HEDL documents with JSONPath. Generate JSON Schema for validation. Stream large JSON files without loading everything into memory.

Part of the **HEDL format family** alongside `hedl-yaml`, `hedl-xml`, `hedl-csv`, and `hedl-parquet`, bringing HEDL's efficiency to every ecosystem you work in.

## What's Implemented

Based on 6,333 lines of Rust across 7 modules:

1. **Bidirectional Conversion**: HEDL ↔ JSON with configurable fidelity
2. **JSONPath Queries**: Query HEDL documents using standard JSONPath syntax
3. **JSON Schema Generation**: Generate JSON Schema Draft 7 from HEDL documents
4. **Streaming Parsers**: Process large JSON/JSONL files incrementally without full memory load
5. **Schema Caching**: LRU cache for repeated structure inference (30-50% speedup)
6. **Security Limits**: DoS protection with configurable resource limits

## Installation

```toml
[dependencies]
hedl-json = "2.0"
```

## Bidirectional Conversion

### HEDL → JSON: Export for APIs and LLMs

Convert HEDL's compact representation to JSON when you need API compatibility:

```rust
use hedl_json::{to_json, to_json_value, ToJsonConfig};

let doc = hedl_core::parse(br#"
%S:User:[id, name, email]
---
users: @User
 | alice, Alice Smith, alice@example.com
 | bob, Bob Jones, bob@example.com
"#)?;

// Configure JSON output
let config = ToJsonConfig {
    include_metadata: false,  // Don't add __type__, __schema__ fields
    flatten_lists: false,      // Keep matrix structure as object arrays
    include_children: true,    // Include nested entities
    ascii_safe: false,         // UTF-8 output (set true for ASCII-only)
};

// Convert to JSON string (for API responses)
let json_str = to_json(&doc, &config)?;
// {"users": [{"id": "alice", "name": "Alice Smith", "email": "alice@example.com"}, ...]}

// Or get serde_json::Value directly (for further processing)
let json_val = to_json_value(&doc, &config)?;
```

**Token Efficiency**: HEDL's matrix notation saves 46.7% tokens compared to verbose JSON arrays. Use HEDL internally, export to JSON only at system boundaries.

### JSON → HEDL: Import from APIs and Files

Parse JSON from external APIs into HEDL's structured data model:

```rust
use hedl_json::{from_json, from_json_value, from_json_value_owned, FromJsonConfig};

// From JSON string (e.g., API response)
let json = r#"{"name": "Alice", "age": 30, "active": true}"#;
let config = FromJsonConfig::default();
let doc = from_json(json, &config)?;

// From serde_json::Value (existing parsed JSON)
let value: serde_json::Value = serde_json::from_str(json)?;

// Borrows the value (value remains usable after conversion)
let doc = from_json_value(&value, &config)?;

// Or take ownership (consumes `value`) for zero-copy efficiency
let doc = from_json_value_owned(value, &config)?;
```

## Security Limits: DoS Protection

`FromJsonConfig` enforces resource limits to prevent denial-of-service attacks from malicious JSON. Defaults are intentionally **high** for legitimate ML and data processing workloads:

```rust
use hedl_json::{from_json, FromJsonConfig};

// Default configuration (for trusted internal data)
let default = FromJsonConfig::default();
// max_depth: Some(10,000) levels (deep hierarchies, nested JSON)
// max_array_size: Some(10,000,000) elements (large datasets, batch processing)
// max_string_length: Some(100 MB) (embeddings, base64-encoded data)
// max_object_size: Some(100,000) keys (rich metadata, complex objects)

let json = r#"{"name": "Alice", "age": 30}"#;
let doc = from_json(json, &default)?;
```

For untrusted input (user uploads, external APIs, public endpoints), use stricter limits:

```rust
use hedl_json::{from_json, FromJsonConfig};

// Strict configuration (for untrusted external sources)
let strict = FromJsonConfig::builder()
    .max_depth(100)                        // 100 levels
    .max_array_size(10_000)                // 10K elements
    .max_string_length(1_000_000)          // 1 MB
    .max_object_size(1_000)                // 1K keys
    .build();

let json = r#"{"name": "Bob", "age": 25}"#;
let doc = from_json(json, &strict)?;
```

Exceeding limits returns `JsonConversionError` variants: `MaxDepthExceeded`, `MaxArraySizeExceeded`, `MaxStringLengthExceeded`, `MaxObjectSizeExceeded`.
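
To make the idea of a depth limit concrete, here is a minimal, std-only sketch of how nesting depth can be checked on raw JSON text before any allocation-heavy parsing. It is not the crate's implementation: `check_depth` and `DepthExceeded` are hypothetical names used only for illustration, and the real `MaxDepthExceeded` variant may carry different fields.

```rust
/// Hypothetical error type for this sketch; not the crate's `JsonConversionError`.
#[derive(Debug, PartialEq)]
pub struct DepthExceeded {
    pub limit: usize,
    pub offset: usize, // byte offset of the bracket that crossed the limit
}

/// Scan raw JSON text and fail as soon as nesting goes deeper than `limit`.
/// Brackets inside string literals are ignored. Returns the deepest level seen.
pub fn check_depth(json: &str, limit: usize) -> Result<usize, DepthExceeded> {
    let mut depth = 0usize;
    let mut max_seen = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    for (offset, byte) in json.bytes().enumerate() {
        if in_string {
            if escaped {
                escaped = false; // the escaped character is consumed
            } else if byte == b'\\' {
                escaped = true;
            } else if byte == b'"' {
                in_string = false;
            }
            continue;
        }
        match byte {
            b'"' => in_string = true,
            b'{' | b'[' => {
                depth += 1;
                if depth > limit {
                    return Err(DepthExceeded { limit, offset });
                }
                max_seen = max_seen.max(depth);
            }
            b'}' | b']' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(max_seen)
}
```

Failing at the first offending bracket means a hostile payload is rejected after reading only as many bytes as it takes to exceed the limit, which is the property that makes such limits useful against deeply nested input.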

## Schema Caching: 30-50% Speedup

When converting JSON arrays with repeated structure (common in API responses), `hedl-json` caches inferred schemas automatically:

```rust
use hedl_json::schema_cache::{SchemaCache, SchemaCacheKey};

let cache = SchemaCache::new(100);  // Capacity: 100 schemas

// Cache is used automatically during from_json() for uniform arrays
// Manual cache usage (for advanced control):
let key = SchemaCacheKey::new(vec!["id".to_string(), "name".to_string()]);
cache.insert(key.clone(), vec!["id".to_string(), "name".to_string()]);

if let Some(schema) = cache.get(&key) {
    // Hit: 30-50% faster than re-inferring schema
}

// Monitor cache performance
let stats = cache.statistics();
println!("Hit rate: {:.2}%", stats.hit_rate() * 100.0);
println!("Hits: {}, Misses: {}, Evictions: {}",
    stats.hits, stats.misses, stats.evictions);
```

For 1000-row JSON arrays with repeated structure, schema caching provides 30-50% speedup over naive inference.
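
The LRU behaviour and the hit-rate statistic can be illustrated with a small, std-only sketch. This is not how `SchemaCache` is implemented internally (that is not shown in this README); `LruSchemaCache` is a hypothetical type that assumes the cache key is the list of field names and that the least recently used entry is evicted when capacity is reached.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a schema cache, keyed by the field-name list.
pub struct LruSchemaCache {
    capacity: usize,
    entries: HashMap<Vec<String>, Vec<String>>,
    order: VecDeque<Vec<String>>, // front = least recently used
    pub hits: u64,
    pub misses: u64,
    pub evictions: u64,
}

impl LruSchemaCache {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            entries: HashMap::new(),
            order: VecDeque::new(),
            hits: 0,
            misses: 0,
            evictions: 0,
        }
    }

    /// Move `key` to the most-recently-used position (O(n), fine for a sketch).
    fn touch(&mut self, key: &[String]) {
        if let Some(pos) = self.order.iter().position(|k| k.as_slice() == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
    }

    /// Look up a schema, recording a hit or a miss.
    pub fn get(&mut self, key: &[String]) -> Option<Vec<String>> {
        match self.entries.get(key).cloned() {
            Some(schema) => {
                self.hits += 1;
                self.touch(key);
                Some(schema)
            }
            None => {
                self.misses += 1;
                None
            }
        }
    }

    /// Insert a schema, evicting the least recently used entry when full.
    pub fn insert(&mut self, key: Vec<String>, schema: Vec<String>) {
        if self.capacity == 0 {
            return;
        }
        if self.entries.contains_key(&key) {
            self.touch(&key);
        } else {
            if self.entries.len() == self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.entries.remove(&oldest);
                    self.evictions += 1;
                }
            }
            self.order.push_back(key.clone());
        }
        self.entries.insert(key, schema);
    }

    /// Fraction of lookups that were hits, in the range 0.0..=1.0.
    pub fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 { 0.0 } else { self.hits as f64 / total as f64 }
    }
}
```

A hit rate close to 1.0 is what one would expect for API responses whose rows all share the same keys, since only the first row triggers inference.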

## JSONPath Queries

Query HEDL documents using standard JSONPath syntax (powered by `serde_json_path`):

```rust
use hedl_json::jsonpath::{query, query_first, query_single, query_exists, query_count, QueryConfig};

let doc = hedl_core::parse(br#"
users: @User[id, name, age]
 | alice, Alice Smith, 30
 | bob, Bob Jones, 25
 | carol, Carol White, 35
"#)?;

let config = QueryConfig::default();

// Get all matches
let results = query(&doc, "$.users[?(@.age > 30)].name", &config)?;
// Returns: [serde_json::Value("Carol White")]

// Get first match (returns Option)
let first = query_first(&doc, "$.users[0].name", &config)?;
// Returns: Some(serde_json::Value("Alice Smith"))

// Get exactly one match (errors if 0 or multiple matches)
let single = query_single(&doc, "$.users[?(@.id == 'alice')].name", &config)?;
// Returns: serde_json::Value("Alice Smith")

// Check if any matches exist
let exists = query_exists(&doc, "$.users[?(@.age > 40)]", &config)?;
// Returns: false

// Count matches
let count = query_count(&doc, "$.users[*]", &config)?;
// Returns: 3
```

### QueryConfig Options

```rust
use hedl_json::jsonpath::{QueryConfig, QueryConfigBuilder};

let config = QueryConfig {
    include_metadata: false,   // Don't add __type__ fields in results
    flatten_lists: false,       // Keep matrix structure
    include_children: true,     // Include nested data
    max_results: 100,           // Limit results (0 = unlimited)
};

// Or use builder
let config = QueryConfigBuilder::new()
    .include_metadata(false)
    .max_results(50)
    .build();
```

## JSON Schema Generation

Generate JSON Schema Draft 7 from HEDL documents for validation and documentation:

```rust
use hedl_json::schema_gen::{generate_schema, generate_schema_value, SchemaConfig};

let doc = hedl_core::parse(br#"
%S:User:[id, name, email, age]
---
users: @User
 | u1, Alice, alice@example.com, 30
"#)?;

let config = SchemaConfig::builder()
    .title("User API Schema")
    .description("Schema for user data endpoint")
    .schema_id("https://api.example.com/schemas/user.json")
    .strict(true)              // disallow additionalProperties
    .include_examples(true)    // add example values from data
    .include_metadata(true)    // include title/description/$id
    .build();

// Generate as JSON string (for documentation)
let schema_json = generate_schema(&doc, &config)?;

// Or as serde_json::Value (for programmatic use)
let schema_value = generate_schema_value(&doc, &config)?;
```

### Smart Type Inference

The schema generator infers JSON Schema formats from actual data:

**Value-Based Inference** (analyzed during schema generation):

```text
// Field values → JSON Schema format annotation
"alice@example.com"              → {"type": "string", "format": "email"}
"https://example.com"            → {"type": "string", "format": "uri"}
"2024-01-15T10:30:00Z"          → {"type": "string", "format": "date-time"}
"550e8400-e29b-41d4-a716-..."   → {"type": "string", "format": "uuid"}
```

**Name-Based Inference** (fallback when values are ambiguous):

```text
// Field names → format hints
"email" field      → format: "email"
"url" field        → format: "uri"
"created_at" field → format: "date-time"
"uuid" field       → format: "uuid"
```
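
The name-based fallback amounts to a lookup from field name to format hint. A minimal sketch, covering only the four names listed above: `format_hint` is a hypothetical helper, and the case-insensitive exact match is an assumption of this sketch rather than documented crate behaviour.

```rust
/// Map a field name to a JSON Schema `format` hint.
/// Only the four names from the list above are covered; matching is exact
/// and case-insensitive, which is an assumption of this sketch.
pub fn format_hint(field_name: &str) -> Option<&'static str> {
    match field_name.to_ascii_lowercase().as_str() {
        "email" => Some("email"),
        "url" => Some("uri"),
        "created_at" => Some("date-time"),
        "uuid" => Some("uuid"),
        _ => None, // no hint: the field stays a plain string
    }
}
```

Returning `Option` keeps "no hint" distinct from any real format, so a caller can emit `"format"` only when a hint exists.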

### %NEST Relationships in Schemas

HEDL's `%NEST` declarations become nested object arrays in JSON Schema:

```rust
let doc = hedl_core::parse(br#"
%S:Team:[id, name]
%S:Member:[id, name, role]
%N:Team>Member
---
teams: @Team
 | t1, Engineering
"#)?;

let schema = generate_schema_value(&doc, &SchemaConfig::default())?;
// Team schema includes:
// {
//   "type": "object",
//   "properties": {
//     "id": {"type": "string"},
//     "name": {"type": "string"},
//     "members": {
//       "type": "array",
//       "items": {"$ref": "#/definitions/Member"}
//     }
//   }
// }
```

## Streaming: Process Large JSON Without Full Memory Load

### JSON Array Streaming

Stream elements from large JSON arrays incrementally:

```rust
use hedl_json::streaming::{JsonArrayStreamer, StreamConfig};
use std::fs::File;

// Open large JSON file: [{...}, {...}, {...}, ...]
let file = File::open("large_dataset.json")?;
let config = StreamConfig::default();
let streamer = JsonArrayStreamer::new(file, config)?;

let mut count = 0;
for result in streamer {
    let doc = result?;  // Each array element as HEDL document
    count += 1;
    // Process document: validate, transform, aggregate
}
println!("Processed {} documents", count);
```

**Performance**: Streaming is 1.2-2.1x faster than loading the full array and parsing.

### JSONL (JSON Lines) Streaming

Stream JSONL files line-by-line with robust error handling:

```rust
use hedl_json::streaming::{JsonLinesStreamer, StreamConfig};
use std::fs::File;

let file = File::open("logs.jsonl")?;  // One JSON object per line
let config = StreamConfig::default();
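let mut streamer = JsonLinesStreamer::new(file, config);

// `while let` borrows the streamer only for each `next()` call, so
// `line_number()` stays callable inside the loop (a `for` loop would
// move the streamer and make that call a compile error).
while let Some(result) = streamer.next() {
    match result {
        Ok(doc) => {
            // Process valid log entry
        }
        Err(e) => {
            // Malformed line - log error and continue
            eprintln!("Skipping malformed line {}: {}",
                streamer.line_number(), e);
        }
    }
}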
```

**JSONL Features**:
- Blank lines: automatically skipped
- Comments: lines starting with `#` are ignored
- Robust: continues processing on invalid lines (errors returned per line)
- Line tracking: `line_number()` method for debugging
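
The line-handling rules in this list can be sketched without the crate. The function below is a hypothetical, std-only illustration (`json_lines` is not part of `hedl-json`): it skips blank lines and `#` comments and keeps 1-based line numbers that count skipped lines too. Trimming whitespace before testing for `#` is an assumption of this sketch.

```rust
use std::io::{BufRead, BufReader, Read};

/// Yield `(line_number, text)` for every line that should be parsed as JSON.
/// Blank lines and lines starting with `#` are skipped; I/O errors are
/// passed through per line instead of ending the iteration.
pub fn json_lines<R: Read>(
    reader: R,
) -> impl Iterator<Item = std::io::Result<(usize, String)>> {
    BufReader::new(reader)
        .lines()
        .enumerate()
        .filter_map(|(index, line)| match line {
            Ok(text) => {
                let trimmed = text.trim();
                if trimmed.is_empty() || trimmed.starts_with('#') {
                    None // skipped, but still counted in the line number
                } else {
                    Some(Ok((index + 1, trimmed.to_string())))
                }
            }
            Err(e) => Some(Err(e)),
        })
}
```

Because only one line is buffered at a time, memory use is bounded by the longest line rather than by the file size.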

### JSONL Writing

Write HEDL documents as JSONL for streaming output:

```rust
use hedl_json::streaming::JsonLinesWriter;
use std::fs::File;

let file = File::create("output.jsonl")?;
let mut writer = JsonLinesWriter::new(file);

for doc in documents {
    writer.write_document(&doc)?;  // One document per line
}

writer.flush()?;  // Ensure all data written
```

### StreamConfig Options

```rust
use hedl_json::streaming::StreamConfig;
use hedl_json::FromJsonConfig;

let config = StreamConfig {
    buffer_size: 64 * 1024,                 // 64 KB buffer (default)
    max_object_bytes: Some(10 * 1024 * 1024), // 10 MB per object (default)
    from_json: FromJsonConfig::default(),   // Security limits per object
    use_size_estimation: true,              // Efficient size checks (default)
    true_streaming: true,                   // Constant memory for arrays (default)
};

// Or use builder
let config = StreamConfig::builder()
    .buffer_size(128 * 1024)                    // 128 KB buffer
    .max_object_bytes(50 * 1024 * 1024)         // 50 MB per object
    .unlimited_object_size()                    // Disable limit (use with caution)
    .from_json_config(FromJsonConfig::builder()
        .max_depth(100)
        .build())
    .use_size_estimation(true)                  // Efficient size checks
    .true_streaming(true)                       // Constant memory mode
    .build();
```

## Format Mapping

### HEDL → JSON

| HEDL Type | JSON Output | Example |
|-----------|-------------|---------|
| Scalars (null, bool, number, string) | Direct mapping | `null`, `true`, `42`, `"text"` |
| Objects | JSON objects | `{"key": "value"}` |
| Arrays (tensors) | JSON arrays | `[1, 2, 3]` |
| `@User:alice` (reference) | `{"@ref": "@User:alice"}` | Special object format |
| `$(x + 1)` (expression) | `"$(x + 1)"` | String with `$()` wrapper |
| Matrix lists | Arrays of objects | `[{"id": "a", "name": "Alice"}, ...]` |

Example matrix list conversion:

```hedl
users: @User[id, name]
 | alice, Alice
 | bob, Bob
```

Becomes:

```json
{
  "users": [
    {"id": "alice", "name": "Alice"},
    {"id": "bob", "name": "Bob"}
  ]
}
```
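
As a rough illustration of where the size difference comes from, the sketch below compares byte lengths of the two representations of this small sample. It measures bytes, not tokens, and the result for a two-row sample is not the benchmark figure quoted elsewhere in this README; `byte_saving_percent` and the two constants are hypothetical names for this example.

```rust
/// Percentage of bytes saved by `compact` relative to `verbose`.
pub fn byte_saving_percent(compact: &str, verbose: &str) -> f64 {
    if verbose.is_empty() {
        return 0.0;
    }
    (1.0 - compact.len() as f64 / verbose.len() as f64) * 100.0
}

/// The matrix list from the example above.
pub const HEDL_SAMPLE: &str = "users: @User[id, name]\n | alice, Alice\n | bob, Bob\n";

/// The same data as compact (whitespace-free) JSON.
pub const JSON_SAMPLE: &str =
    r#"{"users":[{"id":"alice","name":"Alice"},{"id":"bob","name":"Bob"}]}"#;
```

Calling `byte_saving_percent(HEDL_SAMPLE, JSON_SAMPLE)` gives the saving for this sample. The gap grows with row count because JSON repeats every key per row while the matrix list states the column names once.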

### JSON → HEDL

| JSON Type | HEDL Result | Notes |
|-----------|-------------|-------|
| Objects | HEDL objects | Nested structures preserved |
| Arrays | HEDL arrays | Uniform objects become matrix lists |
| `{"@ref": "..."}` | HEDL reference | Special format recognized |
| `"$(...)"` strings | HEDL expression | Pattern triggers expression parsing |
| Primitives | Direct mapping | Null, bool, number, string |
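
The two special patterns in this table can be recognised with plain string operations. The sketch below is illustrative only: `classify_string` and `parse_reference` are hypothetical helpers, and the `@Type:id` shape is inferred from the `@User:alice` example rather than from a formal grammar.

```rust
/// How a JSON string value would be treated on import, per the table above.
#[derive(Debug, PartialEq)]
pub enum StringKind<'a> {
    /// Inner text of a `$(...)` wrapper.
    Expression(&'a str),
    /// Any other string, kept as-is.
    Plain(&'a str),
}

/// Detect the `$(...)` expression pattern.
pub fn classify_string(s: &str) -> StringKind<'_> {
    match s.strip_prefix("$(").and_then(|rest| rest.strip_suffix(')')) {
        Some(inner) => StringKind::Expression(inner),
        None => StringKind::Plain(s),
    }
}

/// Split the value of an `@ref` key, assumed to look like `@Type:id`,
/// into `(type_name, id)`. Returns `None` for anything else.
pub fn parse_reference(value: &str) -> Option<(&str, &str)> {
    let (type_name, id) = value.strip_prefix('@')?.split_once(':')?;
    if type_name.is_empty() || id.is_empty() {
        None
    } else {
        Some((type_name, id))
    }
}
```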

**Schema Inference**: Uniform object arrays are automatically converted to matrix lists with inferred schemas. Fields are sorted alphabetically with `id` first if present.
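
The column ordering rule (alphabetical, with `id` first if present) is small enough to sketch directly. `order_fields` is a hypothetical helper, not a function exported by the crate, and it assumes plain byte-wise string ordering.

```rust
/// Order column names alphabetically, with `id` moved to the front if present.
/// Uses byte-wise ordering, so uppercase names sort before lowercase ones.
pub fn order_fields(mut fields: Vec<String>) -> Vec<String> {
    fields.sort();
    fields.dedup(); // duplicates are adjacent after sorting
    if let Some(pos) = fields.iter().position(|f| f == "id") {
        let id = fields.remove(pos);
        fields.insert(0, id);
    }
    fields
}
```

A deterministic order matters because JSON object keys are unordered: two arrays with the same keys in different order should infer the same matrix schema.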

## Use Cases

**API Integration**: Receive JSON from external APIs, convert to HEDL for structured processing, export back to JSON for responses. Save 46.7% on token costs for LLM API calls.

**Data Pipelines**: Read JSON logs/events, process with HEDL's structured model, export to CSV (`hedl-csv`) or Parquet (`hedl-parquet`) for analytics.

**Configuration Management**: Store configs in HEDL with schema validation (`hedl-lint`), export to JSON for runtime consumption by existing tools.

**LLM Context Optimization**: Convert verbose JSON prompts to HEDL (46.7% token savings) and send the compact HEDL text to the LLM provider, converting to JSON only at the API boundary.

**Schema Documentation**: Generate JSON Schema from HEDL documents for API documentation, OpenAPI specs, and validation tools.

**Log Processing**: Stream large JSONL log files, filter/transform with HEDL's query API, aggregate statistics without full memory load.

## What This Crate Doesn't Do

**Schema Preservation**: JSON has no schema concept. HEDL's `%STRUCT`, `%NEST`, `%ALIAS` declarations are lost in JSON conversion. If you need validation after round-tripping through JSON, redefine schemas explicitly in HEDL.

**Validation**: The crate converts formats faithfully but does not validate data against schemas. For schema validation, use `hedl-lint`.

**Optimization**: Converts structures as-is, not optimally. Verbose JSON becomes verbose HEDL. To leverage HEDL's matrix efficiency, restructure data into uniform arrays intentionally.

**True Array Streaming**: `JsonArrayStreamer` loads the entire JSON array into memory (limitation of `serde_json`). For true incremental processing, use `JsonLinesStreamer` with JSONL format.

## Dependencies

- `serde_json` 1.0 - JSON parsing and serialization
- `serde_json_path` 0.7 - JSONPath query engine
- `hedl-core` 2.0 - HEDL parsing and data model
- `thiserror` 1.0 - Error type definitions

## Performance Characteristics

**Conversion**: HEDL → JSON is serialization-bound. JSON → HEDL is parsing-bound.

**Caching**: Schema inference with caching provides 30-50% speedup for repeated structures in JSON arrays.

**Streaming**:
- JSONL processing is O(1) memory per object
- JSON array streaming loads full array (use JSONL for large files)
- Streaming is 1.2-2.1x faster than full parse for large datasets

**JSONPath**: Query performance depends on `serde_json_path` implementation. Queries execute on JSON representation (HEDL → JSON conversion happens first).

Detailed performance benchmarks are available in the HEDL repository benchmark suite.

## License

Apache-2.0