datasynth-runtime 4.0.1

Runtime orchestration, parallel execution, and memory management

Overview

datasynth-runtime provides the execution layer for DataSynth:

  • EnhancedOrchestrator: The primary orchestrator — ~30 phases, full enterprise feature integration, multi-threaded, streaming-capable.
  • StreamingOrchestrator: Real-time per-phase streaming variant.
  • Parallel execution via Rayon, memory / CPU / disk guards, progress reporting, pause/resume.

v4.0 note: the legacy GenerationOrchestrator (basic 2-phase CoA + JE) was removed in v4.0.0 after a v3.x deprecation window. All production call paths were already routed through EnhancedOrchestrator. To migrate:

```rust
use datasynth_runtime::{EnhancedOrchestrator, PhaseConfig};
let phases = PhaseConfig::from_config(&config);
let mut orch = EnhancedOrchestrator::new(config, phases)?;
let result = orch.generate()?;
```

Key components

| Component | Description |
|---|---|
| EnhancedOrchestrator | Full workflow coordinator; generate() returns EnhancedGenerationResult |
| StreamingOrchestrator | Phase-by-phase streaming variant |
| PhaseConfig | Per-phase enable flags, auto-derived from GeneratorConfig |
| EnhancedGenerationResult | Complete snapshot of generated data + statistics + reports |
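The idea behind PhaseConfig being "auto-derived" from GeneratorConfig can be sketched as a plain function from config to flags. The types below (MiniGeneratorConfig, MiniPhaseConfig) and their fields are hypothetical stand-ins, not the crate's actual definitions, and the rule that core phases are always enabled is an assumption made for illustration.

```rust
// Hypothetical, trimmed-down stand-ins for GeneratorConfig and PhaseConfig.
// The real types have many more fields; these names are illustrative only.

#[derive(Debug, Default)]
pub struct MiniGeneratorConfig {
    pub intercompany_enabled: bool,
    pub banking_enabled: bool,
    pub audit_enabled: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub struct MiniPhaseConfig {
    pub generate_coa: bool,
    pub generate_journal_entries: bool,
    pub generate_intercompany: bool,
    pub generate_banking: bool,
    pub generate_audit: bool,
}

impl MiniPhaseConfig {
    /// Derive per-phase enable flags from the generator config.
    /// Assumption: core phases are always on; optional phases follow
    /// the enable switch of their config section.
    pub fn from_config(config: &MiniGeneratorConfig) -> Self {
        Self {
            generate_coa: true,
            generate_journal_entries: true,
            generate_intercompany: config.intercompany_enabled,
            generate_banking: config.banking_enabled,
            generate_audit: config.audit_enabled,
        }
    }
}
```

Deriving the flags in one place, rather than setting them by hand, keeps the phase plan consistent with whichever sections the config enables.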

Generation pipeline (v4.0.1)

Roughly 30 phases run in sequence. The key ones, in execution order:

  1. Chart of accounts — industry-specific structure
  2. Master data — vendors, customers, materials, fixed assets, employees; LLM-enrichable via phase_llm_enrichment
  3. Document flows — P2P and O2C chains, business-day snapped (v3.4.1+)
  4. Intercompany — IC transactions, matching, eliminations
  5. OCPM events — OCEL 2.0 event log
  6. Journal entries — from documents + standalone; applies the distribution cascade (fraud → advanced mixture → Pareto → copula → conditional → drift → seasonality)
  7. Anomaly injection — entity-aware, risk-adjusted
  8. Fraud-bias sweep — applies weekend / round-dollar / off-hours / post-close bias to every is_fraud=true entry
  9. Balance validation — debits = credits, Assets = Liabilities + Equity
  10. Accruals + period close — reversal dates snapped to business days (v3.4.3+)
  11. Financial reporting — BS/IS/CF, segments, notes
  12. HR — payroll, time entries (holiday-aware v3.4.2+), expense reports
  13. Manufacturing — production orders with business-day-snapped dates
  14. Treasury / tax / ESG / project accounting
  15. Audit data — engagements, workpapers, evidence, findings, SOX
  16. Banking + AML — KYC, transactions, typology labels
  17. Analytics metadata (v3.3.0+) — prior-year comparatives, benchmarks, drift events
  18. Statistical validation (v3.5.1+) — Benford / chi² / KS; attaches StatisticalValidationReport
  19. Graph + hypergraph export — PyG, Neo4j, DGL, hypergraph
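Balance validation (phase 9) checks two invariants: every entry's debits equal its credits, and Assets = Liabilities + Equity across all postings. A minimal sketch, assuming hypothetical Line and AccountType types with amounts held as integer cents so equality is exact (the crate's own representation is not described here):

```rust
// Hypothetical minimal types; integer cents keep the equality checks exact.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccountType { Asset, Liability, Equity, Revenue, Expense }

#[derive(Debug, Clone, Copy)]
pub struct Line {
    pub account_type: AccountType,
    pub debit_cents: i64,
    pub credit_cents: i64,
}

/// A journal entry balances when total debits equal total credits.
pub fn entry_is_balanced(lines: &[Line]) -> bool {
    let debits: i64 = lines.iter().map(|l| l.debit_cents).sum();
    let credits: i64 = lines.iter().map(|l| l.credit_cents).sum();
    debits == credits
}

/// Checks Assets = Liabilities + Equity over all posted lines.
/// Revenue and expense are folded into equity as current-period earnings.
pub fn accounting_equation_holds(entries: &[Vec<Line>]) -> bool {
    let (mut assets, mut liabilities, mut equity) = (0i64, 0i64, 0i64);
    for line in entries.iter().flatten() {
        // Net debit movement of this line.
        let net_debit = line.debit_cents - line.credit_cents;
        match line.account_type {
            // Assets carry a normal debit balance.
            AccountType::Asset => assets += net_debit,
            // Liabilities and equity carry a normal credit balance.
            AccountType::Liability => liabilities -= net_debit,
            AccountType::Equity | AccountType::Revenue | AccountType::Expense => {
                equity -= net_debit
            }
        }
    }
    assets == liabilities + equity
}
```

Because each line's net debit lands on exactly one side, the equation follows automatically once every entry balances; in this sketch a failure of the second check therefore points back to an unbalanced entry.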

Determinism: the same seed yields byte-identical output on default configs.
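One common way to keep output byte-identical for a given seed, even with parallel (Rayon) execution, is to derive each work unit's RNG seed from the master seed plus stable indices, never from thread identity or completion order. The sketch below uses SplitMix64 and hypothetical helpers (sub_seed, generate_chunk); the crate's actual RNG and seeding scheme are not documented in this README.

```rust
/// SplitMix64: a small, well-known 64-bit PRNG, used here only to
/// illustrate seed derivation.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Derive a sub-seed for (phase, chunk) from the master seed. It depends
/// only on these indices, not on which thread runs the chunk.
pub fn sub_seed(master: u64, phase: u64, chunk: u64) -> u64 {
    let base = SplitMix64::new(master ^ phase.wrapping_mul(0x9E37_79B9_7F4A_7C15)).next_u64();
    SplitMix64::new(base ^ chunk).next_u64()
}

/// Generate `n` amounts (in cents) for one chunk: same inputs, same outputs.
pub fn generate_chunk(master: u64, phase: u64, chunk: u64, n: usize) -> Vec<u64> {
    let mut rng = SplitMix64::new(sub_seed(master, phase, chunk));
    (0..n).map(|_| rng.next_u64() % 1_000_000).collect()
}
```

Chunks can then run on any thread in any order; reassembling results by chunk index reproduces the sequential output exactly.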

TemporalContext (v3.4.1+)

A shared Arc<TemporalContext> is built once per pipeline from config.temporal_patterns and threaded into:

  • Document-flow generators (P2P, O2C)
  • HR (time entries, expense reports)
  • Manufacturing (production orders)
  • Period-close (accrual reversals)

It exposes is_business_day, adjust_to_business_day, and sample_business_day_in_range, with 15 regional calendars pre-loaded.
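A std-only sketch of the three TemporalContext methods, assuming dates are plain day counts since 1970-01-01 (a Thursday), a single holiday set instead of the 15 regional calendars, and a roll-forward ("following") adjustment rule. MiniTemporalContext is hypothetical: the real type's date representation and signatures may differ, and sample_business_day_in_range here takes a caller-supplied random value rather than an RNG.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for TemporalContext. Dates are days since
/// 1970-01-01 (a Thursday) so the sketch needs no date library.
pub struct MiniTemporalContext {
    holidays: HashSet<i64>,
}

impl MiniTemporalContext {
    pub fn new(holidays: impl IntoIterator<Item = i64>) -> Self {
        Self { holidays: holidays.into_iter().collect() }
    }

    /// Weekday with Monday = 0 .. Sunday = 6 (day 0 is a Thursday = 3).
    fn weekday(day: i64) -> i64 {
        (day + 3).rem_euclid(7)
    }

    /// Monday to Friday, excluding holidays.
    pub fn is_business_day(&self, day: i64) -> bool {
        Self::weekday(day) < 5 && !self.holidays.contains(&day)
    }

    /// Roll forward to the next business day (assumed "following" rule).
    pub fn adjust_to_business_day(&self, mut day: i64) -> i64 {
        while !self.is_business_day(day) {
            day += 1;
        }
        day
    }

    /// Pick a business day in [start, end] using a caller-supplied random
    /// value; None if the range has no business days.
    pub fn sample_business_day_in_range(&self, start: i64, end: i64, random: u64) -> Option<i64> {
        let days: Vec<i64> = (start..=end).filter(|&d| self.is_business_day(d)).collect();
        if days.is_empty() {
            None
        } else {
            Some(days[(random % days.len() as u64) as usize])
        }
    }
}
```

Building the context once and sharing it behind an Arc, as the README describes, means every generator applies the same calendar and rolling rule.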

Statistical validation phase (v3.5.1+)

The phase_statistical_validation phase runs after all phases that add journal entries and attaches a StatisticalValidationReport to EnhancedGenerationResult.statistical_validation. Supported tests today: Benford first-digit (MAD), chi-squared on log-uniform, and KS on log-uniform. CorrelationCheck and AndersonDarling are scheduled for v4.1.0 (see the v4.1 plan).
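The Benford first-digit test compares the share of amounts whose leading digit is d against the expected proportion P(d) = log10(1 + 1/d), d = 1..9, and scores the gap as a mean absolute deviation (MAD). A minimal sketch; the function names are hypothetical, and pass/fail thresholds and the StatisticalValidationReport structure are not reproduced.

```rust
/// Leading (first significant) digit of an amount, or None for zero/non-finite.
fn first_digit(x: f64) -> Option<usize> {
    let mut x = x.abs();
    if x == 0.0 || !x.is_finite() {
        return None;
    }
    // Scale into [1, 10) so truncation yields the leading digit.
    while x >= 10.0 {
        x /= 10.0;
    }
    while x < 1.0 {
        x *= 10.0;
    }
    Some(x as usize)
}

/// MAD between observed first-digit proportions and Benford's expected
/// proportions. Returns None if no amount has a leading digit.
pub fn benford_mad(amounts: &[f64]) -> Option<f64> {
    let mut counts = [0usize; 10];
    let mut total = 0usize;
    for &a in amounts {
        if let Some(d) = first_digit(a) {
            counts[d] += 1;
            total += 1;
        }
    }
    if total == 0 {
        return None;
    }
    let sum: f64 = (1..=9)
        .map(|d| {
            let expected = (1.0 + 1.0 / d as f64).log10();
            let observed = counts[d] as f64 / total as f64;
            (observed - expected).abs()
        })
        .sum();
    Some(sum / 9.0)
}
```

Lower MAD means closer conformity: log-uniform amounts score near zero, while amounts whose leading digits are uniformly distributed score around 0.06.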

Usage

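```rust
use datasynth_config::schema::GeneratorConfig;
use datasynth_runtime::{EnhancedOrchestrator, PhaseConfig};

// Assumes the crate's error type converts into Box<dyn Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder: load the config from YAML or build it in code.
    let config: GeneratorConfig = todo!("load from YAML or build in code");

    // Derive per-phase enable flags, then run the full pipeline.
    let phases = PhaseConfig::from_config(&config);
    let mut orch = EnhancedOrchestrator::new(config, phases)?;
    let result = orch.generate()?;

    println!("Generated {} journal entries", result.journal_entries.len());

    // Populated by the statistical validation phase (v3.5.1+).
    if let Some(report) = &result.statistical_validation {
        println!(
            "Validation: {} tests, all passed = {}",
            report.results.len(),
            report.all_passed()
        );
    }
    Ok(())
}
```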

Pause/Resume

On Unix systems, send SIGUSR1 to toggle pause:

```sh
kill -USR1 $(pgrep datasynth-data)
```
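A sketch of how a SIGUSR1-driven toggle can work: the handler only flips an atomic flag, and workers check it between units of work. PauseSwitch is hypothetical, and registering the handler requires a crate such as signal-hook or libc, which is not shown here.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical pause switch shared between the signal handler and workers.
#[derive(Clone, Default)]
pub struct PauseSwitch {
    paused: Arc<AtomicBool>,
}

impl PauseSwitch {
    /// Flip paused <-> running and return the new paused state. This is a
    /// single lock-free atomic operation, the kind of work a signal handler
    /// may safely perform.
    pub fn toggle(&self) -> bool {
        !self.paused.fetch_xor(true, Ordering::SeqCst)
    }

    pub fn is_paused(&self) -> bool {
        self.paused.load(Ordering::SeqCst)
    }

    /// Called by workers between units of work: blocks by polling while paused.
    pub fn wait_while_paused(&self) {
        while self.is_paused() {
            thread::sleep(Duration::from_millis(50));
        }
    }
}
```

Workers poll the flag instead of waiting on a Condvar because notifying a Condvar is not guaranteed to be safe inside a signal handler, whereas an atomic update is.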

License

Apache-2.0 — see LICENSE for details.