datasynth-cli 5.34.0

Command-line interface for synthetic enterprise data generation

datasynth-cli

Command-line interface for synthetic accounting data generation.

Overview

datasynth-cli provides the datasynth-data binary for command-line usage:

  • generate: Generate synthetic data from configuration
  • init: Create configuration files from industry presets (when built with the llm feature, also supports a natural-language --from-description mode)
  • validate: Validate configuration files
  • info: Display available presets and options
  • scenario: List / validate / export / generate counterfactual scenarios
  • fingerprint: Extract / validate / synthesize via the privacy-preserving fingerprint format
  • templates: Export starter template packs, validate them, or (v3.5.0+) enrich name pools via a mock or live LLM backend (OpenAI-compatible HTTP; defaults target OpenRouter)
  • audit: Audit FSM operations (blueprint diff, validate)
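For wrapper tooling (for example, a test harness or build script that shells out to the binary), the subcommand set above can be modeled as a small Rust enum. This is a minimal sketch: `Subcommand`, `ALL`, and `as_arg` are hypothetical names, not datasynth-cli's own API; only the eight subcommand tokens come from the list.

```rust
use std::str::FromStr;

/// Top-level subcommands of `datasynth-data`, mirroring the overview list.
/// Hypothetical wrapper type; not datasynth-cli's internal definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Subcommand {
    Generate,
    Init,
    Validate,
    Info,
    Scenario,
    Fingerprint,
    Templates,
    Audit,
}

impl Subcommand {
    /// Every subcommand, in the order the overview lists them.
    pub const ALL: [Subcommand; 8] = [
        Subcommand::Generate,
        Subcommand::Init,
        Subcommand::Validate,
        Subcommand::Info,
        Subcommand::Scenario,
        Subcommand::Fingerprint,
        Subcommand::Templates,
        Subcommand::Audit,
    ];

    /// The literal token passed on the command line, e.g. `datasynth-data generate`.
    pub fn as_arg(self) -> &'static str {
        match self {
            Subcommand::Generate => "generate",
            Subcommand::Init => "init",
            Subcommand::Validate => "validate",
            Subcommand::Info => "info",
            Subcommand::Scenario => "scenario",
            Subcommand::Fingerprint => "fingerprint",
            Subcommand::Templates => "templates",
            Subcommand::Audit => "audit",
        }
    }
}

impl FromStr for Subcommand {
    type Err = String;

    /// Parses a token by searching `ALL` for the variant whose `as_arg` matches.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Subcommand::ALL
            .iter()
            .copied()
            .find(|c| c.as_arg() == s)
            .ok_or_else(|| format!("unknown datasynth-data subcommand: {s}"))
    }
}
```

Deriving the parser from `as_arg` keeps a single source of truth for the token spellings.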

Installation

cargo build --release
# Binary at: target/release/datasynth-data
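Scripts that run the freshly built binary directly, rather than from PATH, need its location. A minimal sketch, assuming the default Cargo target directory at the workspace root (no CARGO_TARGET_DIR or --target override); `release_binary` is a hypothetical helper. `EXE_SUFFIX` from std supplies `.exe` on Windows and nothing elsewhere.

```rust
use std::env::consts::EXE_SUFFIX;
use std::path::{Path, PathBuf};

/// Location of the binary produced by `cargo build --release`, given the
/// workspace root. Assumes the default `target/` directory is used.
pub fn release_binary(workspace_root: &Path) -> PathBuf {
    workspace_root
        .join("target")
        .join("release")
        // Platform executable suffix: ".exe" on Windows, "" on Unix.
        .join(format!("datasynth-data{}", EXE_SUFFIX))
}
```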

Commands

Generate Data

# From configuration file
datasynth-data generate --config config.yaml --output ./output

# Demo mode with defaults
datasynth-data generate --demo --output ./demo-output

# With verbose logging
datasynth-data generate --config config.yaml --output ./output -v
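When generation is driven from another Rust program, the invocations above can be assembled with `std::process::Command`. A minimal sketch that only builds the command: `Source` and `generate_command` are hypothetical helpers, the binary is assumed to be on PATH, and only the flags shown above (`--config`, `--demo`, `--output`, `-v`) are used.

```rust
use std::path::Path;
use std::process::Command;

/// Where `generate` takes its settings from. Hypothetical helper type.
pub enum Source<'a> {
    /// A YAML configuration file, passed as `--config <path>`.
    Config(&'a Path),
    /// Built-in demo defaults, passed as `--demo`.
    Demo,
}

/// Builds, but does not run, a `datasynth-data generate` invocation.
/// Assumes `datasynth-data` is on PATH.
pub fn generate_command(source: Source<'_>, output: &Path, verbose: bool) -> Command {
    let mut cmd = Command::new("datasynth-data");
    cmd.arg("generate");
    match source {
        Source::Config(path) => {
            cmd.arg("--config").arg(path);
        }
        Source::Demo => {
            cmd.arg("--demo");
        }
    }
    cmd.arg("--output").arg(output);
    if verbose {
        cmd.arg("-v"); // verbose logging
    }
    cmd
}
```

Calling `.status()` on the returned `Command` runs it; the resulting code can be checked against the table under Exit Codes.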

Create Configuration

# Industry preset with complexity level
datasynth-data init --industry manufacturing --complexity medium -o config.yaml

# Available industries:
#   manufacturing, retail, financial_services, healthcare,
#   technology, energy, telecom, transportation, hospitality
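A wrapper can reject a misspelled industry before calling the CLI. A minimal sketch, assuming the nine names above are the complete preset list; `INDUSTRIES` and `init_args` are hypothetical names. Complexity is passed through unchecked because this README only shows `medium`.

```rust
/// Industry presets listed for `init --industry`.
pub const INDUSTRIES: [&str; 9] = [
    "manufacturing", "retail", "financial_services", "healthcare",
    "technology", "energy", "telecom", "transportation", "hospitality",
];

/// Builds the argument list for `datasynth-data init`, failing early on an
/// unknown industry. `complexity` is not validated (only `medium` is documented).
pub fn init_args(industry: &str, complexity: &str, out: &str) -> Result<Vec<String>, String> {
    if !INDUSTRIES.iter().any(|&known| known == industry) {
        return Err(format!(
            "unknown industry '{industry}'; expected one of: {}",
            INDUSTRIES.join(", ")
        ));
    }
    Ok(["init", "--industry", industry, "--complexity", complexity, "-o", out]
        .iter()
        .map(|s| s.to_string())
        .collect())
}
```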

Validate Configuration

datasynth-data validate --config config.yaml

Show Options

datasynth-data info

Template Enrichment (v3.5.0+, llm feature)

Enriches vendor, customer, and material name pools. The mock backend is offline and deterministic (seed-driven); the http backend calls a live LLM. Enrichment runs outside the generate pipeline: the enriched YAML is consumed at generate time via --templates <path>.

# Mock backend: deterministic, seed-driven, no network.
datasynth-data templates enrich \
  --input ./in.yaml --output ./enriched.yaml \
  --category vendor_name --industry retail --region DE \
  --sub-category office_supplies --count 50 \
  --backend mock --seed 42

# Live HTTP backend. Defaults target OpenRouter, so one API key works with models from any vendor.
OPENROUTER_API_KEY=sk-or-... \
datasynth-data templates enrich \
  --input ./in.yaml --output ./enriched.yaml \
  --category customer_name --industry retail --region DE \
  --sub-category enterprise --count 50 \
  --backend http \
  --model anthropic/claude-sonnet-4.5 \
  --api-key-env OPENROUTER_API_KEY \
  --base-url https://openrouter.ai/api

# Then generate with the enriched templates:
datasynth-data generate --config config.yaml --templates ./enriched.yaml --output ./output

--category accepts vendor_name, customer_name, or material_desc. The http backend (--backend http) requires a build with --features llm.
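The enrich flags above map naturally onto a typed request, which lets a caller catch an unsupported `--category` or a missing API key before any network call. A minimal sketch: `EnrichRequest`, `Backend`, and `enrich_args` are hypothetical names, and the environment-variable check is this sketch's own fail-fast policy, not documented CLI behavior.

```rust
/// Values accepted by `templates enrich --category`.
const CATEGORIES: [&str; 3] = ["vendor_name", "customer_name", "material_desc"];

/// Backend choice for `templates enrich`. Hypothetical wrapper type.
pub enum Backend<'a> {
    /// Offline and deterministic: the same seed is expected to give the same names.
    Mock { seed: u64 },
    /// Live OpenAI-compatible HTTP endpoint; the CLI must be built with `--features llm`.
    Http { model: &'a str, api_key_env: &'a str, base_url: &'a str },
}

pub struct EnrichRequest<'a> {
    pub input: &'a str,
    pub output: &'a str,
    pub category: &'a str,
    pub industry: &'a str,
    pub region: &'a str,
    pub sub_category: &'a str,
    pub count: u32,
    pub backend: Backend<'a>,
}

/// Appends `name value` to the argument list.
fn push_flag(args: &mut Vec<String>, name: &str, value: impl ToString) {
    args.push(name.to_string());
    args.push(value.to_string());
}

/// Builds the `templates enrich` argument list, validating inputs first.
pub fn enrich_args(req: &EnrichRequest<'_>) -> Result<Vec<String>, String> {
    if !CATEGORIES.iter().any(|&c| c == req.category) {
        return Err(format!("unsupported --category '{}'", req.category));
    }
    let mut args = vec!["templates".to_string(), "enrich".to_string()];
    push_flag(&mut args, "--input", req.input);
    push_flag(&mut args, "--output", req.output);
    push_flag(&mut args, "--category", req.category);
    push_flag(&mut args, "--industry", req.industry);
    push_flag(&mut args, "--region", req.region);
    push_flag(&mut args, "--sub-category", req.sub_category);
    push_flag(&mut args, "--count", req.count);
    match req.backend {
        Backend::Mock { seed } => {
            push_flag(&mut args, "--backend", "mock");
            push_flag(&mut args, "--seed", seed);
        }
        Backend::Http { model, api_key_env, base_url } => {
            // Sketch policy: fail before spawning if the named key variable is unset.
            if std::env::var_os(api_key_env).is_none() {
                return Err(format!("environment variable {api_key_env} is not set"));
            }
            push_flag(&mut args, "--backend", "http");
            push_flag(&mut args, "--model", model);
            push_flag(&mut args, "--api-key-env", api_key_env);
            push_flag(&mut args, "--base-url", base_url);
        }
    }
    Ok(args)
}
```

The CLI receives only the variable name via --api-key-env, so the key itself never appears in the argument list.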

Scenarios

# Counterfactual scenario generation (baseline + intervention pairs)
datasynth-data scenario list --config config.yaml
datasynth-data scenario generate --config config.yaml --output ./output
datasynth-data scenario export --config config.yaml --scenario supply_chain_disruption -o scenario.dss
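To export several scenarios in one pass, a caller can build one `scenario export` command per name. A minimal sketch that builds but does not run the commands; `export_commands` is a hypothetical helper, the binary is assumed to be on PATH, and writing each scenario to `<out_dir>/<name>.dss` is this sketch's own naming convention.

```rust
use std::path::Path;
use std::process::Command;

/// Builds one `datasynth-data scenario export` command per scenario name,
/// each writing to `<out_dir>/<name>.dss`. Commands are built, not run.
pub fn export_commands(config: &Path, scenarios: &[&str], out_dir: &Path) -> Vec<Command> {
    scenarios
        .iter()
        .map(|name| {
            let mut cmd = Command::new("datasynth-data");
            cmd.args(["scenario", "export", "--config"])
                .arg(config)
                .args(["--scenario", *name, "-o"])
                .arg(out_dir.join(format!("{name}.dss")));
            cmd
        })
        .collect()
}
```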

Signal Handling (Unix)

Send SIGUSR1 to toggle pause during generation (one signal pauses, the next resumes):

kill -USR1 $(pgrep datasynth-data)
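Rust's std has no portable signal API, so a supervisor process can send the pause toggle by shelling out to the Unix `kill` utility. A minimal sketch: `toggle_pause_command` is a hypothetical helper. The PID would typically come from `Child::id()` on a `datasynth-data` process the supervisor spawned itself, which avoids `pgrep` matching other running instances.

```rust
use std::process::Command;

/// Builds `kill -USR1 <pid>` to toggle pause on a running generation.
/// Unix only: relies on the external `kill` utility being available.
pub fn toggle_pause_command(pid: u32) -> Command {
    let mut cmd = Command::new("kill");
    cmd.arg("-USR1").arg(pid.to_string());
    cmd
}
```

Typical use: `toggle_pause_command(child.id()).status()` where `child` is the spawned generator.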

Exit Codes

Code  Description
0     Success
1     Configuration error
2     Generation error
3     I/O error
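A wrapper can map these codes to named outcomes. A minimal sketch: `Outcome` and `classify` are hypothetical names, and the input is meant to be `ExitStatus::code()`, which returns `None` on Unix when the process was terminated by a signal.

```rust
/// Exit-code categories documented for `datasynth-data`.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Success,
    ConfigError,
    GenerationError,
    IoError,
    /// An undocumented code, or `None` if the process was killed by a signal.
    Other(Option<i32>),
}

/// Classifies the value returned by `std::process::ExitStatus::code()`.
pub fn classify(code: Option<i32>) -> Outcome {
    match code {
        Some(0) => Outcome::Success,
        Some(1) => Outcome::ConfigError,
        Some(2) => Outcome::GenerationError,
        Some(3) => Outcome::IoError,
        other => Outcome::Other(other),
    }
}
```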

License

Apache-2.0. See LICENSE for details.