datasynth-config 4.2.1

Configuration schema, validation, and presets for synthetic data generation
Documentation

datasynth-config

Configuration schema, validation, and industry presets for synthetic data generation.

Overview

datasynth-config provides the configuration layer for SyntheticData:

  • Schema Definition: Complete YAML configuration schema
  • Validation: Bounds checking, constraint validation, distribution sum verification
  • Industry Presets: Pre-configured settings for common industries
  • Complexity Levels: Small, medium, and large organization profiles

Configuration Sections

Section Description
global Industry, dates, seed, performance settings
companies Company codes, currencies, volume weights
chart_of_accounts COA complexity and structure
transactions Line items, amounts, sources, temporal patterns
master_data Vendors, customers, materials, assets, employees
document_flows P2P, O2C configuration
intercompany IC transaction types and transfer pricing
balance Opening balances, trial balance generation
subledger AR, AP, FA, inventory settings
fx Currency and exchange rate settings
period_close Close tasks and schedules
fraud Fraud injection rates and types
internal_controls SOX controls and SoD rules
anomaly_injection Anomaly rates and labeling
data_quality Missing values, typos, duplicates
graph_export ML graph export formats
output Output format and compression
templates User-supplied template packs (path, merge_strategy)
llm LLM provider + enrichment flags (gated by llm feature)

Advanced distributions (v3.4–v4.0)

Every knob in distributions now drives runtime behavior:

Sub-section Controls
distributions.amounts Log-normal / Gaussian mixture components; industry_profile fallback
distributions.pareto Heavy-tailed amount override (enabled, alpha, x_min, max_value)
distributions.regime_changes Acquisition / price-increase / … events + economic cycles + parameter drifts
distributions.conditional Breakpoint-based distribution switching; input_field ∈ {month, quarter, constant}
distributions.correlations Gaussian copula for amount↔line_count (others v4.1); full symmetric or upper-triangular matrix
distributions.validation Benford / chi² / KS tests run post-generation; report on EnhancedGenerationResult.statistical_validation

Temporal patterns (v3.4.1+)

Sub-section Controls
temporal_patterns.business_days Weekend / half-day / month-end conventions; T+N settlement rules
temporal_patterns.calendars 15 regions (US, DE, GB, FR, IT, ES, CA, CN, JP, IN, BR, MX, AU, SG, KR)
temporal_patterns.period_end Flat / exponential / extended-crunch / daily-profile decay curves
temporal_patterns.processing_lags Event-to-posting lag distributions
temporal_patterns.fiscal_calendar Calendar / custom / 4-4-5 patterns
temporal_patterns.timezones Multi-region timezone handling
temporal_patterns.intraday Per-segment multipliers (morning spike / lunch dip / eod rush)

Enterprise Process Chain Sections (v0.6.0)

Section Description
source_to_pay SourceToPayConfig -- sourcing projects, RFx events, supplier bids, procurement contracts, catalogs
financial_reporting FinancialReportingConfig -- financial statements, ManagementKpisConfig, BudgetConfig
hr HrConfig -- PayrollConfig, TimeAttendanceConfig, ExpenseConfig
manufacturing_process ManufacturingProcessConfig -- ProductionOrderConfig, ManufacturingCostingConfig, RoutingConfig
sales_quotes SalesQuoteConfig -- quotation generation and quote-to-order conversion

All new sections default to enabled: false for full backward compatibility with existing configurations.

Industry Presets

Industry Description
manufacturing Heavy P2P, inventory, fixed assets
retail High O2C volume, seasonal patterns
financial_services Complex intercompany, high controls
healthcare Regulatory focus, seasonal insurance
technology SaaS revenue patterns, R&D capitalization

Usage

use datasynth_config::{Config, ConfigValidator};

// Load and validate configuration
let config = Config::from_yaml_file("config.yaml")?;
let validator = ConfigValidator::new();
validator.validate(&config)?;

// Use industry preset
let config = Config::preset_manufacturing(Complexity::Medium);

Validation Rules

  • period_months: 1-120 (max 10 years)
  • compression_level: 1-9 when enabled
  • All rate/percentage fields: 0.0-1.0
  • Approval thresholds: strictly ascending order
  • Distribution sums: must equal 1.0 (±0.01 tolerance)

License

Apache-2.0 - See LICENSE for details.