finance-datagen 0.3.0

Standard financial data generation

Overview

finance-datagen produces synthetic financial time series for testing, demos, and benchmarking the rest of the finance-* stack without relying on real market data. The numerical core is implemented in Rust and emits Apache Arrow RecordBatch values; the Python layer wraps each generator so the public API returns polars.DataFrame objects.

All public generator classes inherit from DataGenerator, a pydantic base model that validates typed parameters on construction. Use .generate() for the table output, or next(generator) for one-shot iterator-style use. Convenience functions such as generate_prices(...), generate_gbm(...), and generate_signal(...) instantiate the matching model and return .generate().
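The pattern described above (validated parameters, a .generate() method, iterator-style access, and a thin convenience function) can be sketched with the standard library alone. This is an illustration, not the library's code: ToyGenerator and generate_toy are hypothetical names, dataclasses stands in for pydantic, a dict of lists stands in for polars.DataFrame, and the assumption that each next(...) call returns a full table is a guess at what "one-shot" means here.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for a DataGenerator subclass (not the library's class)."""

    n_steps: int = 5
    start: float = 100.0
    seed: Optional[int] = None

    def __post_init__(self) -> None:
        # Validate parameters at construction time, as a pydantic model would.
        if self.n_steps <= 0:
            raise ValueError("n_steps must be positive")
        if self.start <= 0:
            raise ValueError("start must be positive")

    def generate(self) -> dict:
        """Return a column-oriented table as a plain dict of lists."""
        rng = random.Random(self.seed)
        prices = [self.start]
        for _ in range(self.n_steps - 1):
            prices.append(prices[-1] * (1.0 + rng.gauss(0.0, 0.01)))
        return {"step": list(range(self.n_steps)), "price": prices}

    def __iter__(self):
        return self

    def __next__(self) -> dict:
        # Assumption: iterator-style use hands back one full table per call.
        return self.generate()


def generate_toy(**params) -> dict:
    """Convenience wrapper: build the model, then return .generate()."""
    return ToyGenerator(**params).generate()
```

With this shape, generate_toy(n_steps=4, seed=1), ToyGenerator(n_steps=4, seed=1).generate(), and next(ToyGenerator(n_steps=4, seed=1)) all return the same table, and ToyGenerator(n_steps=0) fails at construction rather than at generation time.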

Generators

Price models (Rust core)

| Symbol | Model | Output columns |
| --- | --- | --- |
| GBMGenerator | Geometric Brownian Motion (log-Euler) | timestamp, symbol, price |
| HestonGenerator | Heston (1993) stochastic volatility (full-truncation Euler) | timestamp, symbol, price, variance |
| GARCHGenerator | GARCH(1,1) returns | timestamp, symbol, price, return, sigma |
| ohlc_from_close | OHLCV synthesis from any close series | timestamp, symbol, open, high, low, close, volume |
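To make the log-Euler GBM step and the idea of synthesizing bars from a close series concrete, here is a pure-Python sketch. It is not the Rust implementation: gbm_path and ohlc_sketch are hypothetical helpers, the default parameters (mu, sigma, dt, wick) are illustrative, the wick rule is an assumption, and volume is omitted.

```python
import math
import random


def gbm_path(n_steps, s0=100.0, mu=0.05, sigma=0.2, dt=1 / 252, seed=None):
    """Log-Euler GBM: S[t+1] = S[t] * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma**2) * dt
    vol = sigma * math.sqrt(dt)
    prices = [s0]
    for _ in range(n_steps - 1):
        # Exponentiating the log increment keeps every price strictly positive.
        prices.append(prices[-1] * math.exp(drift + vol * rng.gauss(0.0, 1.0)))
    return prices


def ohlc_sketch(closes, wick=0.005, seed=None):
    """Build bars from closes: open = previous close, high/low widened by a random wick."""
    rng = random.Random(seed)
    bars = []
    prev = closes[0]
    for close in closes:
        open_ = prev
        # Wicks extend beyond the open/close range, so low <= open, close <= high.
        high = max(open_, close) * (1.0 + abs(rng.gauss(0.0, wick)))
        low = min(open_, close) * (1.0 - abs(rng.gauss(0.0, wick)))
        bars.append({"open": open_, "high": high, "low": low, "close": close})
        prev = close
    return bars
```

Calling ohlc_sketch(gbm_path(50, seed=7), seed=7) returns 50 bars in which each bar's high and low bracket its open and close.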

Price-path convenience wrappers are also exported as generate_prices, generate_gbm, generate_heston, and generate_garch. generate_prices is a plain alias for generate_gbm, intended for examples and tests that want a model-neutral name.

Python generators

| Symbol | Output |
| --- | --- |
| SignalGenerator | Long-form [date, symbol, signal, fwd_returns] with target Pearson IC |
| FactorLoadingsGenerator | Wide [symbol, market, value, momentum, size, quality] Barra-style loadings |
| BenchmarkGenerator | [date, benchmark] Gaussian benchmark return series |
| PositionsGenerator | Long-form position panel [date, symbol, price, quantity, market_value, weight] |
| TransactionsGenerator | Transaction log with enum-backed side/position-effect labels and explicit costs |
| OrdersGenerator | Enum-backed order fixtures with side, order type, status, and time-in-force |
| ExecutionsGenerator | Enum-backed execution fixtures for simulated fills |
| MultiAssetGBMGenerator | Correlated multi-asset GBM panel [timestamp, symbol, price, return] |
| RegimeSwitchingGenerator | Markov regime-switching price path [timestamp, symbol, price, return, regime] |
| MarketImpactCurveGenerator | Participation-rate impact curves with temporary, permanent, and total impact in bps |
| StatisticalRiskModelGenerator | PCA-style factor loadings, factor returns, and specific variance |
| FundamentalRiskModelGenerator | Barra-style enum-backed sector/style loadings plus specific variance |
| FactorCovarianceGenerator | Symmetric positive semidefinite factor covariance matrix |
| SpecificVarianceGenerator | Positive idiosyncratic variance vector |
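One common way to produce a signal whose Pearson correlation with forward returns (the information coefficient, IC) sits near a target is to mix the signal with independent Gaussian noise. The sketch below shows that construction only. Whether SignalGenerator uses it is an assumption, and signal_with_target_ic and pearson are hypothetical helpers.

```python
import math
import random


def signal_with_target_ic(n, ic, seed=None):
    """Draw (signal, fwd_returns) whose population correlation equals `ic`."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # fwd = ic * s + sqrt(1 - ic^2) * e has unit variance and corr(s, fwd) = ic.
    scale = math.sqrt(1.0 - ic**2)
    fwd = [ic * s + scale * e for s, e in zip(signal, noise)]
    return signal, fwd


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)
```

The sample IC only approaches the target as n grows. On a small cross-section (say 50 assets) the realized correlation can differ noticeably from the requested value.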

Every Python generator has a matching generate_* convenience wrapper, including the legacy generate_signal, generate_factor_loadings, and generate_benchmark functions.

All Rust generators accept an optional seed: int for bit-reproducible output across platforms (ChaCha8 RNG); the Python generators accept a seed for numpy.random.default_rng.
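The seeding behavior on the Python side can be illustrated with numpy.random.default_rng directly. This is a minimal sketch of the NumPy call only. The draws helper is hypothetical, and how each generator threads its seed through is not shown here.

```python
import numpy as np


def draws(seed, n=3):
    # A fresh Generator per call: the same seed reproduces the same stream.
    return np.random.default_rng(seed).standard_normal(n)


a = draws(0)
b = draws(0)  # identical to `a`
c = draws(1)  # a different seed gives a different stream
```

Passing seed=None to default_rng pulls fresh entropy from the operating system, so runs are not repeatable unless a seed is given.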

Portfolio, transaction, order, execution, and market-model generators also support enum-backed metadata columns where applicable, including currency, exchange, region, instrument_type, market_type, and venue_type. Portfolio and transaction generators can use finance-dates.Calendar exchange calendars so generated dates and timestamps align with actual business days and session hours.
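"Enum-backed" here means that label columns draw their values from a fixed enumeration instead of free-form strings. The sketch below illustrates the idea with the standard library. Side, OrderType, toy_orders, and the member values are hypothetical and are not the library's actual enums or schema.

```python
import random
from enum import Enum


class Side(str, Enum):
    BUY = "buy"
    SELL = "sell"


class OrderType(str, Enum):
    MARKET = "market"
    LIMIT = "limit"


def toy_orders(n, currency="USD", seed=None):
    """Return n order rows whose label columns come only from the enums above."""
    rng = random.Random(seed)
    sides = list(Side)
    order_types = list(OrderType)
    return [
        {
            "order_id": i,
            "side": rng.choice(sides).value,
            "order_type": rng.choice(order_types).value,
            "currency": currency,
        }
        for i in range(n)
    ]
```

Because every label comes from an enum, a consumer can validate a column by round-tripping it, for example Side("buy") is Side.BUY, and an unexpected label raises ValueError.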

Quick start

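from finance_datagen import OrdersGenerator, generate_prices, generate_signal, ohlc_from_close

# GBM close prices for one symbol (generate_prices is an alias for generate_gbm).
closes = generate_prices(symbol="ACME", seed=0)
# OHLCV bars synthesized from the close series.
bars = ohlc_from_close(closes["price"], symbol="ACME", seed=0)
# Long-form signal panel with forward returns.
signal = generate_signal(n_dates=20, n_assets=50, seed=0)
# Order fixtures tagged with an exchange and a currency.
orders = OrdersGenerator(n_dates=3, n_assets=5, orders_per_day=10, exchange="XNYS", currency="USD", seed=0).generate()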

See the Data page for model math, parameter ranges, and output schemas, and the API page for a complete function-level reference.

Architecture

The Rust core (rust/src/) is polars-free: every generator builds an arrow_array::RecordBatch and returns it through the Arrow C Data Interface PyCapsule via pyo3-arrow. The Python wrappers call polars.from_arrow(batch) on the receiving end. This keeps the polars-rs and polars-py codebases on opposite sides of a stable ABI boundary, avoiding the binary-incompatibility issues that come with linking polars from both Rust and CPython.