# hedl-cli

**Complete HEDL toolkit—validation, formatting, linting, inspection, conversion, and batch processing with parallel execution.**

You need to validate HEDL files, convert between formats, analyze structure, or process hundreds of files in parallel. `hedl-cli` provides 21 commands covering the entire HEDL workflow: core operations (validate, format, lint, inspect, stats), bidirectional conversion with 6 formats (JSON, YAML, XML, CSV, Parquet, TOON), batch processing with automatic parallelization, and shell completion generation.

This is the official command-line interface for the HEDL ecosystem. Whether you're validating configuration files, converting database exports, analyzing token efficiency, or processing directories of HEDL documents—`hedl-cli` provides the tools you need.

## What's Implemented

Complete command-line toolkit with 21 commands across 4 categories:

1. **Core Commands (5)**: Validate, format, lint, inspect, stats
2. **Format Conversion (12)**: Bidirectional conversion for JSON, YAML, XML, CSV, Parquet, TOON
3. **Batch Processing (3)**: Parallel validation, formatting, linting with progress tracking
4. **Utilities (1)**: Shell completion generation for 5 shells

Cross-cutting features:

- **High-Performance Architecture**: Parallel processing, colored output, structured errors
- **Security Features**: File size limits (1 GB default), input validation, safe error propagation
- **Flexible Output**: File or stdout, JSON or text, pretty or compact formatting

## Installation

```bash
# From crates.io (compiled locally by cargo)
cargo install hedl-cli

# Or build locally
cd crates/hedl-cli
cargo build --release
```

Binary location after a local build: `target/release/hedl`

## Core Commands

### validate - Syntax and Structure Validation

Validate HEDL files with optional strict reference checking:

```bash
# Basic validation
hedl validate config.hedl

# Strict mode (all references must resolve)
hedl validate --strict api_schema.hedl
```

**Output**:
```
✓ config.hedl
  Version: 1.0
  Structs: 5
  Aliases: 2
  Nests: 3
```

**Options**:
- `--strict` - Require every entity reference to resolve to a defined entity

**Exit Codes**: 0 (valid), 1 (parse errors or validation failures)
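
Because `validate` reports its result through the exit code, a script can gate on it without parsing any output. The following is a minimal sketch, assuming `hedl` is on `PATH`; the helper name `validate_each` is hypothetical and not part of the CLI.

```bash
# Hypothetical helper (not part of hedl-cli): strictly validate each file,
# report failures on stderr, and print the failure count on stdout.
# Relies only on the documented exit codes: 0 = valid, 1 = failure.
validate_each() (
  failed=0
  for f in "$@"; do
    if ! hedl validate --strict "$f" >/dev/null; then
      echo "invalid: $f" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed"
  # Return success only when every file validated.
  [ "$failed" -eq 0 ]
)
```

A CI step could then run `validate_each configs/*.hedl >/dev/null || exit 1`. For large file sets, `batch-validate` covers the same ground with parallel execution.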

### format - Canonical Formatting

Normalize HEDL files to canonical form with optional optimizations:

```bash
# Format to stdout
hedl format data.hedl

# Format to file
hedl format data.hedl -o formatted.hedl

# Check if already canonical (no changes)
hedl format --check config.hedl

# Disable ditto optimization (keep repeated values explicit)
hedl format --ditto=false data.hedl

# Add count hints to all matrix lists
hedl format --with-counts users.hedl
```

**Options**:
- `-o, --output <FILE>` - Write to file instead of stdout
- `--check` - Only check if canonical, don't write output
- `--ditto` - Enable ditto operator optimization for repeated values (default: enabled)
- `--with-counts` - Recursively add count hints to matrix lists

**Exit Codes**: 0 (success or already canonical), 1 (parse error or check failed)
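
The `--check` and `-o` options can be combined into a cautious in-place rewrite. This is a sketch under stated assumptions: `hedl` is on `PATH`, the helper name `format_in_place` is hypothetical, and the temporary-file naming is an illustration only. Note that `--check` returns 1 for both a parse error and a non-canonical file, so the second `hedl format` call is what distinguishes the two.

```bash
# Hypothetical helper (not part of hedl-cli): rewrite a file in canonical
# form only when `hedl format --check` reports that it is needed.
format_in_place() (
  f=$1
  if hedl format --check "$f" >/dev/null 2>&1; then
    echo "unchanged: $f"
    exit 0
  fi
  tmp="$f.tmp.$$"
  # Write to a temporary file first so a parse error never truncates the original.
  if hedl format "$f" -o "$tmp"; then
    mv "$tmp" "$f" && echo "formatted: $f"
  else
    rm -f "$tmp"
    echo "failed: $f" >&2
    exit 1
  fi
)
```

For whole directories, `batch-format` is the documented route; this helper is only meant to show how the single-file exit codes compose.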

### lint - Best Practices Checking

Check HEDL files against best practices with configurable severity:

```bash
# Text output with colors
hedl lint schema.hedl

# JSON output for programmatic processing
hedl lint --format json config.hedl

# Treat warnings as errors
hedl lint --warn-error critical.hedl
```

**Output** (text format with colors):
```
Warning [unused-alias]: Alias 'old_api' is defined but never used
  at line 15

Suggestion [add-count-hints]: Matrix list 'users' is missing count hint
  at line 42

Found 1 warning, 1 suggestion
```

**Output** (JSON format):
```json
{
  "issues": [
    {
      "severity": "warning",
      "rule": "unused-alias",
      "message": "Alias 'old_api' is defined but never used",
      "line": 15
    }
  ],
  "summary": {
    "errors": 0,
    "warnings": 1,
    "suggestions": 1
  }
}
```

**Options**:
- `-f, --format <text|json>` - Output format (default: text with colors)
- `-W, --warn-error` - Treat warnings as errors

**Exit Codes**: 0 (no issues), 1 (errors found, or warnings found when `--warn-error` is set)
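
The JSON output is intended for programmatic use. As a small sketch, the summary counts can be pulled out with `jq`; this assumes `jq` is installed and that the output has the shape shown in the JSON example above. The helper name `lint_summary` is hypothetical.

```bash
# Hypothetical helper (not part of hedl-cli): read `hedl lint --format json`
# output on stdin and print "errors warnings suggestions" on one line.
# Assumes jq is available and the JSON has a top-level "summary" object.
lint_summary() {
  jq -r '.summary | "\(.errors) \(.warnings) \(.suggestions)"'
}
```

Typical usage would be `hedl lint --format json config.hedl | lint_summary`, which for the sample report above prints `0 1 1`.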

### inspect - Structure Visualization

Display HEDL file structure as a tree:

```bash
# Basic tree view
hedl inspect data.hedl

# Verbose mode (show field values and row data)
hedl inspect -v schema.hedl
```

**Output** (tree format):
```
Document (1.0)
├─ Schemas (3)
│  ├─ User [id, name, email, created_at]
│  ├─ Post [id, author, title, content]
│  └─ Comment [id, post, author, text]
├─ Aliases (2)
│  ├─ $api_url = "https://api.example.com"
│  └─ $version = "2.1.0"
├─ Nests (1)
│  └─ Post > Comment
└─ Data
   ├─ users: @User (125 entities)
   ├─ posts: @Post (48 entities)
   └─ comments: @Comment (312 entities)
```

**Verbose Output** (shows actual data):
```
└─ users: @User (3 entities)
   ├─ alice [Alice Smith, alice@example.com, 2024-01-15]
   ├─ bob [Bob Jones, bob@example.com, 2024-02-20]
   └─ carol [Carol White, carol@example.com, 2024-03-10]
```

**Options**:
- `-v, --verbose` - Show detailed field values and row data

### stats - Format Comparison Analysis

Compare HEDL file size and token counts against JSON, YAML, and XML:

```bash
# Byte counts only
hedl stats data.hedl

# Include LLM token estimates
hedl stats --tokens config.hedl
```

**Output**:
```
Format Comparison for 'data.hedl':

File Sizes:
  HEDL:         2,458 bytes
  JSON (compact): 3,841 bytes  (+56.3%)
  JSON (pretty):  5,219 bytes  (+112.3%)
  YAML:         4,105 bytes  (+67.0%)
  XML:          6,732 bytes  (+173.9%)

Token Estimates (LLM ~4 chars/token):
  HEDL:         615 tokens
  JSON (compact): 960 tokens   (+56.1%)
  JSON (pretty):  1,305 tokens (+112.2%)
  YAML:         1,026 tokens  (+66.8%)
  XML:          1,683 tokens  (+173.7%)

Conclusion: HEDL saves 345 tokens (36%) vs JSON compact, 690 tokens (53%) vs JSON pretty
```

**Options**:
- `--tokens` - Include LLM token count estimates (~4 chars/token heuristic)
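
The token figures follow from simple arithmetic on byte counts. The sketch below applies the ~4 chars/token heuristic with round-to-nearest; the rounding mode is an assumption (the tool's exact rounding is not documented here), chosen because it reproduces the sample output above. Both function names are hypothetical.

```bash
# Estimate tokens from a byte count using the ~4 chars/token heuristic.
# Adding 2 before the integer division rounds to the nearest whole token.
estimate_tokens() {
  echo $(( ($1 + 2) / 4 ))
}

# Percentage of tokens saved by HEDL ($1) relative to another format ($2),
# rounded to the nearest whole percent.
percent_saved() {
  echo $(( (($2 - $1) * 100 + $2 / 2) / $2 ))
}
```

With the sample sizes, `estimate_tokens 2458` prints `615` and `estimate_tokens 3841` prints `960`; `percent_saved 615 960` prints `36`, matching the conclusion line in the sample output.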

**Performance**: All format conversions run in parallel using Rayon for maximum throughput.

## Format Conversion Commands

Bidirectional conversion between HEDL and 6 popular formats.

### JSON Conversion

```bash
# HEDL → JSON (compact)
hedl to-json data.hedl -o output.json

# HEDL → JSON (pretty-printed)
hedl to-json --pretty data.hedl

# HEDL → JSON (with metadata)
hedl to-json --metadata schema.hedl

# JSON → HEDL
hedl from-json input.json -o output.hedl
```

**to-json Options**:
- `-o, --output <FILE>` - Write to file
- `--pretty` - Pretty-print with indentation
- `--metadata` - Include HEDL version and schema information

### YAML Conversion

```bash
# HEDL → YAML
hedl to-yaml config.hedl -o config.yml

# YAML → HEDL
hedl from-yaml config.yml -o config.hedl
```

### XML Conversion

```bash
# HEDL → XML (compact)
hedl to-xml data.hedl -o output.xml

# HEDL → XML (pretty-printed)
hedl to-xml --pretty data.hedl

# XML → HEDL
hedl from-xml input.xml -o output.hedl
```

**to-xml Options**:
- `--pretty` - Pretty-print with indentation

### CSV Conversion

```bash
# HEDL → CSV (includes headers by default)
hedl to-csv users.hedl -o users.csv

# CSV → HEDL (specify entity type name)
hedl from-csv --type-name User input.csv -o users.hedl
```

**to-csv Options**:
- `--headers` - Include column headers (default: true)

**from-csv Options**:
- `--type-name <NAME>` - Entity type name (default: "Row")

### Parquet Conversion

```bash
# HEDL → Parquet (columnar format)
hedl to-parquet data.hedl --output output.parquet

# Parquet → HEDL
hedl from-parquet input.parquet -o data.hedl
```

Note: unlike the other conversion commands, `to-parquet` requires `--output`, because Parquet is a binary columnar format.

### TOON Conversion

```bash
# HEDL → TOON
hedl to-toon data.hedl -o output.toon

# TOON → HEDL
hedl from-toon input.toon -o data.hedl
```

TOON (Token-Oriented Object Notation) is optimized for LLM efficiency, but accuracy testing shows that HEDL achieves higher comprehension (+3.4 points on average) with 10% fewer tokens.

## Batch Processing Commands

Process multiple files in parallel with automatic parallelization and progress tracking.

### batch-validate - Parallel Validation

```bash
# Validate all .hedl files in directory
hedl batch-validate data/*.hedl

# Strict mode for all files
hedl batch-validate --strict schemas/*.hedl

# Verbose progress tracking
hedl batch-validate -v configs/*.hedl

# Force parallel processing
hedl batch-validate -p data/*.hedl

# Use streaming mode for large files (constant memory)
hedl batch-validate --streaming large-files/*.hedl

# Automatically use streaming for files > 100MB
hedl batch-validate --auto-streaming mixed-files/*.hedl

# Limit processing to 5000 files
hedl batch-validate --max-files 5000 huge-directory/*.hedl
```

**Options**:
- `--strict` - Enforce reference resolution for all files
- `-v, --verbose` - Detailed progress output
- `-p, --parallel` - Force parallel processing (default: auto-detect based on file count)
- `--streaming` - Use streaming mode for memory-efficient processing (constant memory, ideal for files >100MB)
- `--auto-streaming` - Automatically use streaming for large files (>100MB) and standard mode for smaller files
- `--max-files <N>` - Maximum number of files to process (default: 10,000, set to 0 for unlimited)

**Output**:
```
Validating 127 files...
Progress: [========================================] 127/127 (100%)
Completed in 2.3s (55 files/sec)

Results:
  Valid: 125 files
  Failed: 2 files
    - data/broken.hedl: Parse error at line 42: unexpected token
    - schemas/old.hedl: Unresolved reference @User:nonexistent
```

**Performance**: Automatic parallelization when file count ≥ 10 (configurable), ~3-5x speedup on multi-core systems.

### batch-format - Parallel Formatting

```bash
# Format all files in-place
hedl batch-format data/*.hedl

# Format to output directory
hedl batch-format configs/*.hedl --output-dir formatted/

# Format with ditto optimization
hedl batch-format --ditto data/*.hedl

# Add count hints to all files
hedl batch-format --with-counts schemas/*.hedl

# Limit processing to 5000 files
hedl batch-format --max-files 5000 huge-directory/*.hedl
```

**Options**:
- `--output-dir <DIR>` - Write formatted files to directory (preserves relative paths)
- `--ditto` - Enable ditto optimization (default: true)
- `--with-counts` - Add count hints to matrix lists
- `-v, --verbose` - Detailed progress output
- `-p, --parallel` - Force parallel processing
- `--max-files <N>` - Maximum number of files to process (default: 10,000, set to 0 for unlimited)

**Output**:
```
Formatting 89 files...
Progress: [========================================] 89/89 (100%)
Completed in 1.8s (49 files/sec)

Results:
  Formatted: 88 files
  Unchanged: 0 files (already canonical)
  Failed: 1 file
    - data/corrupt.hedl: Parse error at line 15
```

### batch-lint - Parallel Linting

```bash
# Lint all files with aggregated results
hedl batch-lint data/*.hedl

# Treat warnings as errors
hedl batch-lint --warn-error schemas/*.hedl

# Verbose per-file results
hedl batch-lint -v configs/*.hedl

# Limit processing to 5000 files
hedl batch-lint --max-files 5000 huge-directory/*.hedl
```

**Options**:
- `--warn-error` - Treat warnings as errors
- `-v, --verbose` - Show issues for each file
- `-p, --parallel` - Force parallel processing
- `--max-files <N>` - Maximum number of files to process (default: 10,000, set to 0 for unlimited)

**Output**:
```
Linting 64 files...
Progress: [========================================] 64/64 (100%)
Completed in 1.1s (58 files/sec)

Aggregated Results:
  Errors: 3 across 2 files
  Warnings: 12 across 8 files
  Suggestions: 25 across 19 files

Top Issues:
  - unused-alias (8 occurrences)
  - add-count-hints (7 occurrences)
  - unresolved-reference (3 occurrences)

Failed Files:
  - schemas/old.hedl: 2 errors, 3 warnings
  - configs/broken.hedl: 1 error
```

## Shell Completion

Generate shell completion scripts for interactive usage:

```bash
# Generate a completion script for your shell
hedl completion bash > ~/.hedl-completion.bash
hedl completion zsh > ~/.hedl-completion.zsh
hedl completion fish > ~/.config/fish/completions/hedl.fish

# Supported shells
hedl completion bash      # Bash
hedl completion zsh       # Zsh
hedl completion fish      # Fish
hedl completion powershell # PowerShell
hedl completion elvish    # Elvish
```

**Installation** (bash example):
```bash
# Add to ~/.bashrc
source ~/.hedl-completion.bash
```

After installation, tab completion works for all commands, subcommands, and options:
```bash
hedl <TAB>           # Shows all commands
hedl batch-<TAB>     # Shows batch-validate, batch-format, batch-lint
hedl validate --<TAB> # Shows --strict option
```

## Security Features

### File Size Limits

Prevents memory exhaustion from malicious or unexpectedly large files:

```bash
# Default: 1 GB limit
hedl validate huge_file.hedl
# Error: File size (1.2 GB) exceeds limit (1 GB)

# Configure via environment variable
export HEDL_MAX_FILE_SIZE=2147483648  # 2 GB
hedl validate huge_file.hedl
```

**Default Limit**: 1,073,741,824 bytes (1 GB)
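
A wrapper script can apply the same limit before it ever invokes `hedl`. This is a sketch, assuming `HEDL_MAX_FILE_SIZE` holds a plain byte count as in the example above; the helper name `within_size_limit` is hypothetical, and whether the CLI itself treats a file exactly at the limit as acceptable is an assumption.

```bash
# Hypothetical pre-check (not part of hedl-cli): succeed when the file size
# is within HEDL_MAX_FILE_SIZE, falling back to the documented 1 GB default.
within_size_limit() (
  limit=${HEDL_MAX_FILE_SIZE:-1073741824}
  # Exit status 2 signals "not a regular file" rather than "too large".
  [ -f "$1" ] || exit 2
  size=$(wc -c < "$1")
  # Arithmetic expansion strips any padding some wc implementations emit.
  [ "$((size))" -le "$limit" ]
)
```

For example, `within_size_limit data.hedl && hedl validate data.hedl` skips oversized inputs without starting the parser.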

### Input Validation

- **Type Names** (CSV conversion): Alphanumeric characters and underscores only
- **Path Safety**: All file operations validated before processing
- **Error Boundaries**: Continues batch processing on individual file errors
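
The type-name rule is easy to mirror in a calling script so that bad names are caught before `from-csv` runs. A minimal sketch, assuming only the rule stated above (alphanumeric characters and underscores); rejecting the empty string is an additional assumption, and the function name is hypothetical.

```bash
# Hypothetical check (not part of hedl-cli): accept names made up only of
# ASCII letters, digits, and underscores; reject everything else.
is_valid_type_name() {
  case "$1" in
    ''|*[!A-Za-z0-9_]*) return 1 ;;  # empty, or contains a disallowed character
    *) return 0 ;;
  esac
}
```

Usage might look like `is_valid_type_name "$name" && hedl from-csv --type-name "$name" input.csv -o out.hedl`.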

### Error Context

All errors include file paths and detailed context:

```
Error: Failed to parse 'data/broken.hedl'
  Parse error at line 42, column 19:
    unexpected token ']', expected field name

  Context:
    40 | users: @User[id, name]
    41 |   | alice, Alice Smith
    42 |   | bob, Bob Jones]
       |                   ^ here
```

## Architecture Features

### Parallel Processing

**BatchProcessor System**:
- Configurable parallelization threshold (default: 10 files)
- Automatic thread pool sizing
- Progress tracking with atomic counters (lock-free)
- Error resilience (collects all failures, continues processing)

**Performance**: ~3-5x speedup on multi-core systems for batch operations.

### Count Hints System

Recursively adds count hints to matrix lists and nested children:

```hedl
# Before formatting with --with-counts
users: @User[id, name]
  | alice, Alice
  | bob, Bob

# After formatting
users[2]: @User[id, name]
  | alice, Alice
  | bob, Bob
```

**Behavior**: Overwrites existing hints with the actual counts from the parsed document.

### Output Handling

- **Colored Console**: Uses `colored` crate for syntax highlighting and progress
- **Flexible Destinations**: File path or stdout (respects --output/-o)
- **Format Options**: JSON, text, compact, pretty-printed
- **Error Separation**: Errors always printed to stderr, output to stdout
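
Because data goes to stdout and diagnostics go to stderr, the two streams can be captured separately. The sketch below assumes `hedl` is on `PATH` and that `to-json` exits non-zero on failure (exit codes for the conversion commands are not listed in this document); the helper name `to_json_with_log` is hypothetical.

```bash
# Hypothetical helper (not part of hedl-cli): convert to JSON while keeping
# the converted data (stdout) and any diagnostics (stderr) in separate files.
to_json_with_log() (
  src=$1
  dst=$2
  log=$3
  if hedl to-json "$src" > "$dst" 2> "$log"; then
    # Success: keep the log only if something was actually written to it.
    [ -s "$log" ] || rm -f "$log"
  else
    # Failure: do not leave a partial JSON file behind.
    rm -f "$dst"
    echo "conversion failed, see $log" >&2
    exit 1
  fi
)
```

A call such as `to_json_with_log data.hedl data.json data.err` leaves either a JSON file or an error log, never a mix of the two in one file.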

### Error Types

Comprehensive error handling with 19 error variants:

- **Io** - File I/O errors with path context
- **FileTooLarge** - Size limit exceeded (configurable)
- **IoTimeout** - I/O operation timeout
- **Parse** - HEDL syntax errors with line/column
- **Canonicalization** - Canonicalization failures
- **JsonConversion** - JSON conversion errors
- **JsonFormat** - JSON serialization/deserialization errors
- **YamlConversion** - YAML conversion errors
- **XmlConversion** - XML conversion errors
- **CsvConversion** - CSV conversion errors
- **ParquetConversion** - Parquet conversion errors
- **LintErrors** - Linting errors found
- **NotCanonical** - File is not in canonical form
- **InvalidInput** - Input validation failures (type names, paths)
- **ThreadPoolError** - Parallel processing thread pool creation failure
- **GlobPattern** - Invalid glob pattern syntax
- **NoFilesMatched** - No files matched the provided patterns
- **DirectoryTraversal** - Directory traversal failures
- **ResourceExhaustion** - System resource exhaustion (file handles, memory)

All errors implement `std::error::Error`, `Display`, and `Clone` for detailed messages and parallel error handling.

## Use Cases

**Configuration Management**: Validate and lint HEDL configuration files in CI/CD pipelines, format for canonical diffs, convert to JSON/YAML for runtime.

**Data Pipeline Integration**: Convert CSV exports to HEDL for structured processing, validate schemas, transform data, export to Parquet for analytics.

**Schema Development**: Write HEDL schemas with instant validation feedback, lint for best practices, inspect structure, compare token efficiency vs JSON.

**Batch Processing**: Process directories of HEDL files in parallel (validation, formatting, linting), aggregate results, identify issues across large codebases.

**LLM Context Optimization**: Analyze token counts with `stats` command, convert JSON to HEDL for 40-60% token savings, validate compressed output.

**Database Export/Import**: Export databases to CSV, convert to HEDL with type inference, validate structure, transform with matrix operations, import to Neo4j via Cypher.
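
The Data Pipeline Integration flow above (CSV in, validated HEDL, Parquet out) can be sketched with the documented commands. This assumes `hedl` is on `PATH` and that each command exits non-zero on failure; the helper name `csv_to_parquet` and the choice to keep the intermediate `.hedl` file next to the output are illustrative only.

```bash
# Hypothetical pipeline (not part of hedl-cli): CSV -> HEDL -> validate -> Parquet.
# Stops at the first failing step.
csv_to_parquet() (
  csv=$1
  type_name=$2
  out=$3
  hedl_file="${out%.parquet}.hedl"   # intermediate file next to the output
  hedl from-csv --type-name "$type_name" "$csv" -o "$hedl_file" || exit 1
  hedl validate --strict "$hedl_file" >/dev/null || exit 1
  # to-parquet requires --output because Parquet is a binary format.
  hedl to-parquet "$hedl_file" --output "$out"
)
```

For example, `csv_to_parquet users.csv User users.parquet` would also leave `users.hedl` behind for inspection with `hedl inspect`.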

## What This Crate Doesn't Do

**Interactive Editing**: Not a REPL or interactive editor—use `hedl-lsp` with your favorite editor (VS Code, Neovim, Emacs) for interactive development.

**Language Server**: LSP functionality is in `hedl-lsp` crate—this CLI focuses on batch operations and one-off conversions.

**MCP Server**: Model Context Protocol server is in `hedl-mcp` crate—this CLI is for human-driven workflows and automation scripts.

**Data Transformation**: Provides format conversion and validation, not arbitrary data transformations—use HEDL's matrix query capabilities or convert to SQL/Cypher for complex transformations.

## Performance Characteristics

**Command Performance**:
- **validate**: O(n) parsing, ~100-200 MB/s throughput
- **format**: O(n) parse + canonicalization, ~50-100 MB/s
- **lint**: O(n) parse + validation rules, ~80-150 MB/s
- **stats**: Parallel format conversions, ~50-100 MB/s per format

**Batch Processing**: ~3-5x speedup with parallel execution on multi-core systems.

**Memory**: O(document_size) per file in standard mode, because the entire document is loaded for parsing. For large files (>100 MB), use `batch-validate --streaming` (or `--auto-streaming`), or use the `hedl-stream` crate directly.

Detailed performance benchmarks are available in the HEDL repository benchmark suite.

## Dependencies

- `hedl-core` 1.2 - HEDL parsing and data model
- `hedl-c14n` 1.2 - Canonicalization
- `hedl-lint` 1.2 - Best practices linting
- `hedl-json`, `hedl-yaml`, `hedl-xml`, `hedl-csv`, `hedl-parquet`, `hedl-toon` 1.2 - Format conversion
- `clap` 4.4 - CLI argument parsing
- `clap_complete` - Shell completion generation
- `colored` - Terminal coloring
- `rayon` - Parallel processing
- `serde_json` - JSON output formatting
- `thiserror` - Error type definitions

## License

Apache-2.0