datasynth-runtime
Runtime orchestration, parallel execution, and memory management.
Overview
datasynth-runtime provides the execution layer for DataSynth:
- EnhancedOrchestrator: The primary orchestrator, with ~30 phases, full enterprise feature integration, multi-threaded, streaming-capable.
- StreamingOrchestrator: Real-time per-phase streaming variant.
- Parallel execution via Rayon, memory / CPU / disk guards, progress reporting, pause/resume.
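v4.0 note: the legacy GenerationOrchestrator (basic 2-phase CoA + JE) was removed in v4.0.0 after a v3.x deprecation window. All production call paths already routed through EnhancedOrchestrator. Migration:
// Argument lists are assumed; check the crate docs for exact signatures.
use datasynth_runtime::{EnhancedOrchestrator, PhaseConfig};
let phases = PhaseConfig::from_config(&config);
let mut orch = EnhancedOrchestrator::new(config, phases)?;
let result = orch.generate()?;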
Key components
| Component | Description |
|---|---|
| EnhancedOrchestrator | Full workflow coordinator; generate() returns EnhancedGenerationResult |
| StreamingOrchestrator | Phase-by-phase streaming variant |
| PhaseConfig | Per-phase enable flags, auto-derived from GeneratorConfig |
| EnhancedGenerationResult | Complete snapshot of generated data + statistics + reports |
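As a rough illustration of what "auto-derived" phase flags can look like, here is a minimal sketch. DemoConfig, DemoPhases, and every field name are hypothetical stand-ins invented for this example; they are not the real GeneratorConfig or PhaseConfig definitions.

```rust
/// Hypothetical slice of a generator config (not the real GeneratorConfig).
#[derive(Debug, Clone, Default)]
pub struct DemoConfig {
    pub intercompany_enabled: bool,
    pub banking_enabled: bool,
    pub fraud_rate: f64,
}

/// Hypothetical per-phase enable flags (not the real PhaseConfig fields).
#[derive(Debug, Clone, PartialEq)]
pub struct DemoPhases {
    pub chart_of_accounts: bool,
    pub intercompany: bool,
    pub banking: bool,
    pub fraud_bias_sweep: bool,
}

impl DemoPhases {
    /// Derive every flag from the config so callers never toggle phases by hand.
    pub fn from_config(config: &DemoConfig) -> Self {
        Self {
            // Assumed always-on here: later phases need accounts to post to.
            chart_of_accounts: true,
            intercompany: config.intercompany_enabled,
            banking: config.banking_enabled,
            // Assumption: the sweep only matters when some fraud is generated.
            fraud_bias_sweep: config.fraud_rate > 0.0,
        }
    }
}
```

Deriving the flags in one place keeps the config as the single source of truth; a phase cannot be enabled while the feature it depends on is switched off.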
Generation pipeline (v4.0.1)
Roughly 30 phases run in sequence. Key ones in execution order:
- Chart of accounts: industry-specific structure
- Master data: vendors, customers, materials, fixed assets, employees; LLM-enrichable via phase_llm_enrichment
- Document flows: P2P and O2C chains, business-day snapped (v3.4.1+)
- Intercompany: IC transactions, matching, eliminations
- OCPM events: OCEL 2.0 event log
- Journal entries: from documents + standalone; applies the distribution cascade (fraud → advanced mixture → Pareto → copula → conditional → drift → seasonality)
- Anomaly injection: entity-aware, risk-adjusted
- Fraud-bias sweep: applies weekend / round-dollar / off-hours / post-close bias to every is_fraud=true entry
- Balance validation: debits = credits, Assets = Liabilities + Equity
- Accruals + period close: reversal dates snapped to business days (v3.4.3+)
- Financial reporting: BS/IS/CF, segments, notes
- HR: payroll, time entries (holiday-aware v3.4.2+), expense reports
- Manufacturing: production orders with business-day-snapped dates
- Treasury / tax / ESG / project accounting
- Audit data: engagements, workpapers, evidence, findings, SOX
- Banking + AML: KYC, transactions, typology labels
- Analytics metadata (v3.3.0+): prior-year comparatives, benchmarks, drift events
- Statistical validation (v3.5.1+): Benford / chi² / KS; attaches StatisticalValidationReport
- Graph + hypergraph export: PyG, Neo4j, DGL, hypergraph
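The balance-validation step in the list above comes down to two equalities. The sketch below shows one way to check them; Line, is_balanced, and accounting_equation_holds are hypothetical names for this example, and amounts are held as integer cents by assumption so the comparison is exact.

```rust
/// One journal-entry line; amounts are integer cents to avoid float rounding.
/// (Hypothetical type for illustration, not a DataSynth model.)
#[derive(Debug, Clone, Copy)]
pub struct Line {
    pub debit_cents: i64,
    pub credit_cents: i64,
}

/// True when total debits equal total credits across the entry's lines.
pub fn is_balanced(lines: &[Line]) -> bool {
    let debits: i64 = lines.iter().map(|l| l.debit_cents).sum();
    let credits: i64 = lines.iter().map(|l| l.credit_cents).sum();
    debits == credits
}

/// True when Assets = Liabilities + Equity for the given totals (in cents).
pub fn accounting_equation_holds(assets: i64, liabilities: i64, equity: i64) -> bool {
    assets == liabilities + equity
}
```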
Determinism: seed-in → byte-identical-out on default configs.
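To make the seed-in, byte-identical-out property concrete, here is a small self-contained sketch. It uses SplitMix64, a well-known tiny PRNG, purely as a stand-in; which generator DataSynth uses internally is not stated here, and generate_bytes is a hypothetical helper.

```rust
/// SplitMix64: a small deterministic PRNG, used here only for illustration.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the state and return the next 64-bit value.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Generate `n` pseudo-amounts (in cents) and serialize them to bytes.
/// The same seed always yields the same byte sequence.
pub fn generate_bytes(seed: u64, n: usize) -> Vec<u8> {
    let mut rng = SplitMix64::new(seed);
    let mut out = Vec::with_capacity(n * 8);
    for _ in 0..n {
        let cents = rng.next_u64() % 1_000_000;
        out.extend_from_slice(&cents.to_le_bytes());
    }
    out
}
```

Comparing the output of two runs with the same seed byte for byte is a cheap regression test for this kind of guarantee.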
TemporalContext (v3.4.1+)
Shared Arc<TemporalContext> bundle built once per pipeline from
config.temporal_patterns, threaded into:
- Document-flow generators (P2P, O2C)
- HR (time entries, expense reports)
- Manufacturing (production orders)
- Period-close (accrual reversals)
Methods: is_business_day, adjust_to_business_day,
sample_business_day_in_range. 15 region calendars pre-loaded.
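The business-day snapping idea can be sketched with the standard library alone. DemoCalendar is a hypothetical stand-in, not the real TemporalContext: dates are represented as days since 1970-01-01, holidays are a plain set, and non-business days roll forward, which is an assumed convention. Only the method names is_business_day and adjust_to_business_day come from the text above; their real signatures may differ.

```rust
use std::collections::BTreeSet;
use std::sync::Arc;

/// Hypothetical calendar context. Dates are days since 1970-01-01.
#[derive(Debug, Default)]
pub struct DemoCalendar {
    holidays: BTreeSet<i64>,
}

impl DemoCalendar {
    pub fn new(holidays: impl IntoIterator<Item = i64>) -> Self {
        Self { holidays: holidays.into_iter().collect() }
    }

    /// Weekday index with Monday = 0 and Sunday = 6 (1970-01-01 was a Thursday).
    fn weekday(day: i64) -> i64 {
        (day + 3).rem_euclid(7)
    }

    /// A business day is a weekday that is not a listed holiday.
    pub fn is_business_day(&self, day: i64) -> bool {
        Self::weekday(day) < 5 && !self.holidays.contains(&day)
    }

    /// Roll forward to the next business day (assumed "following" convention).
    pub fn adjust_to_business_day(&self, mut day: i64) -> i64 {
        while !self.is_business_day(day) {
            day += 1;
        }
        day
    }
}

/// Build the calendar once, then hand cheap Arc clones to each generator.
pub fn shared_calendar(holidays: Vec<i64>) -> Arc<DemoCalendar> {
    Arc::new(DemoCalendar::new(holidays))
}
```

Wrapping the calendar in an Arc means every generator reads the same holiday data without copying it, which matches the "built once, threaded into" description.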
Statistical validation phase (v3.5.1+)
New phase_statistical_validation runs after all JE-adding phases
and emits a StatisticalValidationReport on
EnhancedGenerationResult.statistical_validation. Supported tests
today: Benford first-digit (MAD), chi-squared on log-uniform,
KS-on-log-uniform. CorrelationCheck + AndersonDarling scheduled for
v4.1.0 (see v4.1 plan).
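For readers unfamiliar with the Benford first-digit MAD statistic, the sketch below computes it from scratch: the mean absolute deviation between observed leading-digit frequencies and the Benford probabilities log10(1 + 1/d). The function names are hypothetical, and this is not the crate's implementation; pass/fail thresholds are deliberately left out.

```rust
/// Leading non-zero decimal digit of |x|, or None for zero / non-finite input.
pub fn first_digit(x: f64) -> Option<usize> {
    if !x.is_finite() || x == 0.0 {
        return None;
    }
    // Scientific notation ("1.2345e3") always starts with the leading digit.
    let sci = format!("{:e}", x.abs());
    sci.chars().next()?.to_digit(10).map(|d| d as usize)
}

/// Benford's expected probability for leading digit `d` in 1..=9.
pub fn benford_expected(d: usize) -> f64 {
    (1.0 + 1.0 / d as f64).log10()
}

/// Mean absolute deviation between observed first-digit frequencies and
/// Benford's law. Returns None when no amount has a usable leading digit.
pub fn benford_mad(amounts: &[f64]) -> Option<f64> {
    let mut counts = [0usize; 9];
    let mut n = 0usize;
    for d in amounts.iter().filter_map(|&a| first_digit(a)) {
        counts[d - 1] += 1;
        n += 1;
    }
    if n == 0 {
        return None;
    }
    let total_dev: f64 = (1..=9)
        .map(|d| (counts[d - 1] as f64 / n as f64 - benford_expected(d)).abs())
        .sum();
    Some(total_dev / 9.0)
}
```

A MAD near zero means the leading digits track Benford closely; a sample whose amounts all start with the same digit scores far higher.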
Usage
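// Import paths and argument lists are assumed; check the crate docs for exact signatures.
use datasynth_runtime::{EnhancedOrchestrator, PhaseConfig};
use datasynth_config::GeneratorConfig;
let config: GeneratorConfig = /* load from YAML or build in code */;
let phases = PhaseConfig::from_config(&config);
let mut orch = EnhancedOrchestrator::new(config, phases)?;
let result = orch.generate()?;
println!(/* summary fields of result */);
if let Some(report) = &result.statistical_validation { /* inspect the StatisticalValidationReport */ }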
Pause/Resume
On Unix systems, send SIGUSR1 to the running process to toggle pause, for example with kill -USR1 <pid>.
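The toggle semantics can be sketched with a shared atomic flag. PauseFlag and run_until_paused are hypothetical names for this example. Wiring toggle() to an actual SIGUSR1 handler needs OS bindings beyond the standard library and is omitted, and a real orchestrator would presumably wait while paused rather than return early as this sketch does.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Shared pause flag; clones refer to the same underlying state.
#[derive(Clone, Default)]
pub struct PauseFlag(Arc<AtomicBool>);

impl PauseFlag {
    pub fn new() -> Self {
        Self::default()
    }

    /// Flip paused <-> running and return the new state.
    /// This is the call a signal handler would make on each SIGUSR1.
    pub fn toggle(&self) -> bool {
        // fetch_xor returns the previous value, so the new value is its negation.
        !self.0.fetch_xor(true, Ordering::SeqCst)
    }

    pub fn is_paused(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Run up to `total` work items, checking the flag before each one.
/// Returns how many items completed before a pause was observed.
pub fn run_until_paused<F: FnMut(usize)>(flag: &PauseFlag, total: usize, mut work: F) -> usize {
    for i in 0..total {
        if flag.is_paused() {
            return i;
        }
        work(i);
    }
    total
}
```

Checking the flag between work items, not inside them, means a pause never interrupts an item halfway through.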
License
Apache-2.0 — see LICENSE for details.