ddex-parser 0.4.1

High-performance DDEX XML parser with SIMD optimization (40+ MB/s)
Documentation

DDEX Parser

License: MIT GitHub

High-performance DDEX XML parser built in Rust with comprehensive security protections and command-line interface. Parse ERN 3.8.2, 4.2, and 4.3 files with built-in validation, security hardening against XML attacks, and deterministic JSON output.

Part of the DDEX Suite - a comprehensive toolkit for working with DDEX metadata in the music industry.

v0.4.1 Released - Node.js bindings now fully functional with complete data access!

Version 0.4.0 - Streaming Parser Release with critical vulnerability fixes and enhanced error handling.

๐Ÿ›ก๏ธ Security-First Design

Fixed Critical Vulnerabilities:

  • โœ… XML Bomb Protection - Guards against billion laughs and entity expansion attacks
  • โœ… Deep Nesting Protection - Prevents stack overflow from malicious XML
  • โœ… Input Validation - Rejects malformed XML with clear error messages
  • โœ… Memory Bounds - Configurable limits for large file processing

๐Ÿš€ Current Implementation Status

โœ… Fully Working

  • Command Line Interface - Complete CLI with parse, validate, batch operations
  • Rust API - Full programmatic access via DDEXParser struct
  • ERN Support - ERN 3.8.2, 4.2, and 4.3 parsing and validation
  • Security Hardened - Protection against XML bombs, deep nesting, malformed input
  • JSON Output - Clean, deterministic JSON serialization

โœ… Language Bindings (Production Ready)

  • Node.js/TypeScript Bindings - Complete DDEX data structure access
  • Python Bindings - PyO3-based Python integration with pandas support
  • WebAssembly - Browser-compatible WASM module
  • Full Node.js bindings with TypeScript support

Quick Start

Command Line Interface (Ready Now)

# Install from source
git clone https://github.com/daddykev/ddex-suite
cd ddex-suite/packages/ddex-parser
cargo build --release

# Parse DDEX file to JSON
./target/release/ddex-parser parse release.xml --output release.json

# Validate DDEX file
./target/release/ddex-parser validate release.xml

# Batch process multiple files
./target/release/ddex-parser batch "*.xml" --output-dir results/

Rust Library (Ready Now)

use ddex_parser::DDEXParser;
use std::fs::File;
use std::io::BufReader;

// Create parser with secure defaults
let parser = DDEXParser::new();

// Parse DDEX file
let file = File::open("release.xml")?;
let reader = BufReader::new(file);
let parsed = parser.parse(reader)?;

// Access flattened data
println!("Release: {}", parsed.releases[0].release_title[0].text);
println!("Tracks: {}", parsed.releases[0].track_count);

Core Features

๐Ÿš€ Performance (v0.4.0 Validated)

SIMD-Optimized FastStreamingParser

The v0.4.0 release delivers exceptional performance across different workload types:

Workload Type Throughput Use Case
Production DDEX 25-30 MB/s Real-world files with complex structures
Batch Processing 500-700 MB/s Uniform XML with repetitive patterns
Peak Performance 1,265 MB/s Optimal conditions, memory efficiency tests

Validated Production Metrics

Real performance measurements from comprehensive test suite:

  • 11.57MB Production File: 26.61 MB/s (10K releases + 5K resources)
  • 14.75MB Memory Test: 1,265.26 MB/s (optimal conditions)
  • 1K Release Batch: 504.80 MB/s (stress test)
  • 5K Release Batch: 686.89 MB/s (stress test)
  • 10K Release Batch: 634.74 MB/s (stress test)

Memory Efficiency & Architecture

  • O(1) Memory Usage: <50MB peak regardless of file size
  • SIMD Acceleration: memchr-based pattern matching
  • Multi-pass Scanning: Separate optimized passes per element type
  • Pre-allocated Buffers: 50MB initial capacity prevents reallocation
  • Element Processing: ~100,000 elements/second sustained

Why Performance Varies

The SIMD-optimized parser achieves different throughput based on XML structure:

  • Complex DDEX Files: 25-30 MB/s (varied content, deep nesting)
  • Uniform Patterns: 500+ MB/s (repetitive structures, optimal for SIMD)
  • Memory-bound Operations: 1,200+ MB/s (cached data, minimal allocation)

Build Mode Critical

Performance is dramatically different between build modes:

  • Debug Mode: ~0.5 MB/s (unoptimized, development only)
  • Release Mode: 25-1,200+ MB/s (SIMD optimizations enabled)

โš ๏ธ Critical: Always build and test in release mode for production:

cargo build --release     # 50-100x faster than debug
cargo test --release      # Accurate performance measurement
cargo bench --release     # Benchmarking

Streaming & Security

  • Large File Support: >100MB files with constant memory usage
  • Security Preserved: All XXE and entity expansion protections maintained
  • Configuration: Enable via SecurityConfig::relaxed()

๐Ÿ”’ Security First

  • Built-in XXE (XML External Entity) protection
  • Entity expansion limits (billion laughs protection)
  • Deep nesting protection
  • Memory-bounded parsing with timeout controls

๐ŸŽญ Dual Model Architecture

  • Graph Model: Faithful DDEX structure with references (perfect for compliance)
  • Flattened Model: Developer-friendly denormalized data (easy to consume)
  • Full round-trip data integrity between both representations

๐Ÿงน Parser + Builder Workflow

DDEX Parser extracts data faithfully, while ddex-builder provides smart normalization:

  • Parser role: Preserves exact input structure and semantics
  • Builder role: Transforms data into clean, compliant DDEX 4.3
  • Combined workflow: Parse messy vendor DDEX โ†’ Modify data โ†’ Generate clean output
  • Data integrity: All business data (ISRCs, titles, deals) preserved through round-trip
// Parser preserves input exactly as received
const messyVendorDdex = await parser.parse(vendorFile);
// Builder normalizes output to clean DDEX 4.3
const cleanDdex = await builder.build(messyVendorDdex, { normalize: true });

๐ŸŒ Cross-Platform Compatibility

  • Node.js 16+ with native addon performance and complete data access
  • Browser support via optimized WASM (<500KB)
  • Python 3.8+ with comprehensive type hints
  • TypeScript-first with complete type definitions
  • Complete DDEX data structure access across all language bindings

๐ŸŽต Music Industry Ready

  • Support for all DDEX ERN versions (3.8.2, 4.2, 4.3+)
  • Complete metadata extraction (releases, tracks, artists, rights)
  • Territory and deal information parsing
  • Image and audio resource handling
  • Genre, mood, and classification support

Performance Benchmarks

DDEX Parser v0.4.0 performance measurements:

Streaming Parser Performance (Release Mode)

File Size Parse Time Throughput Elements/sec Memory
10KB ~2ms ~5 MB/s ~50K/sec <1MB
100KB ~8ms ~12 MB/s ~70K/sec <5MB
1MB ~30ms ~35 MB/s ~90K/sec <20MB
3.6MB ~80ms ~45 MB/s ~100K/sec <50MB

Build Mode Comparison

Mode Performance Use Case Memory
Debug ~0.5 MB/s Development/Tests Higher
Release 40+ MB/s Production Optimal

Technology Stack Performance

Component Optimization Benefit
SIMD Pattern memchr library 10x faster searching
Pre-allocation 50MB buffers Zero reallocation
Multiple passes Element-specific SIMD efficiency
Security bounds Configurable Memory protection

Security

v0.4.0 includes comprehensive security enhancements:

  • XXE (XML External Entity) protection
  • Entity expansion limits (billion laughs protection)
  • Deep nesting protection
  • Memory-bounded streaming
  • Supply chain security with cargo-deny and SBOM
  • Zero vulnerabilities, forbids unsafe code

Getting Started

Installation Guides

Node.js/JavaScript Example (v0.4.1+)

const { DdexParser } = require('ddex-parser');
const parser = new DdexParser();

const result = parser.parseSync(xmlContent);

// Full access to parsed data
console.log('Message ID:', result.messageId);
console.log('Releases:', result.releases);
console.log('Resources:', result.resources);
console.log('Deals:', result.deals);

// Access individual release data
result.releases.forEach(release => {
  console.log('Release:', release.title);
  console.log('Artist:', release.displayArtist);
  console.log('Tracks:', release.tracks.length);
});

Round-Trip Compatibility

Seamless integration with ddex-builder for complete workflows with smart normalization:

import { DDEXParser } from 'ddex-parser';
import { DDEXBuilder } from 'ddex-builder';

// Parse existing DDEX file
const parser = new DDEXParser();
const original = await parser.parseFile('input.xml');

// Modify data
const modified = { ...original.flattened };
modified.tracks[0].title = "New Title";

// Build new DDEX file with smart normalization
const builder = new DDEXBuilder();
const newXML = await builder.buildFromFlattened(modified);

// Verify round-trip integrity (with beneficial normalization)
const reparsed = await parser.parseString(newXML);
assert.deepEqual(reparsed.tracks[0].title, "New Title"); // โœ… Data integrity preserved

License

This project is licensed under the MIT License - see the LICENSE file for details.

Related Projects


Built with โค๏ธ for the music industry. Powered by Rust for maximum performance and safety.