finance datagen
Standard financial data generation
Overview
finance-datagen produces synthetic financial time series for
testing, demos, and benchmarking the rest of the finance-* stack
without relying on real market data. The numerical core is implemented
in Rust and emits Apache Arrow RecordBatch values; the Python layer
wraps each generator so the public API returns polars.DataFrame
objects.
All public generator classes inherit from DataGenerator, a pydantic
base model that validates typed parameters on construction. Use
.generate() for the table output, or next(generator) for one-shot
iterator-style use. Convenience functions such as generate_gbm(...)
and generate_signal(...) instantiate the matching model and return
.generate().
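The pattern above can be sketched without the library. This is a minimal stand-in, not the actual DataGenerator: it uses a standard-library dataclass in place of pydantic and a list of dicts in place of a polars.DataFrame. The names ToyGenerator, ToyRandomWalk, and generate_toy_walk are hypothetical, and the sketch assumes that next(generator) returns one generated table per call.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for a validated generator base class."""

    n_steps: int = 5
    seed: Optional[int] = None

    def __post_init__(self) -> None:
        # Validate parameters on construction, as a pydantic model would.
        if self.n_steps <= 0:
            raise ValueError("n_steps must be positive")

    def generate(self) -> list:
        raise NotImplementedError

    def __iter__(self):
        return self

    def __next__(self) -> list:
        # Assumption: next(generator) yields one full table per call.
        return self.generate()


@dataclass
class ToyRandomWalk(ToyGenerator):
    """Toy Gaussian random walk; not one of the library's models."""

    start: float = 100.0

    def generate(self) -> list:
        rng = random.Random(self.seed)
        rows, price = [], self.start
        for t in range(self.n_steps):
            rows.append({"timestamp": t, "symbol": "TOY", "price": price})
            price += rng.gauss(0.0, 1.0)
        return rows


def generate_toy_walk(**params) -> list:
    # Convenience wrapper: instantiate the model, then return .generate().
    return ToyRandomWalk(**params).generate()
```

With a fixed seed, generate_toy_walk(n_steps=3, seed=1) and next(ToyRandomWalk(n_steps=3, seed=1)) return the same rows, and ToyRandomWalk(n_steps=0) raises at construction time rather than at generation time.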
Generators
Price models (Rust core)
| Symbol | Model | Output columns |
|---|---|---|
| GBMGenerator | Geometric Brownian Motion (log-Euler) | timestamp, symbol, price |
| HestonGenerator | Heston (1993) stochastic volatility (full-truncation Euler) | timestamp, symbol, price, variance |
| GARCHGenerator | GARCH(1,1) returns | timestamp, symbol, price, return, sigma |
| ohlc_from_close | OHLCV synthesis from any close series | timestamp, symbol, open, high, low, close, volume |
Price-path convenience wrappers are also exported as generate_gbm,
generate_heston, and generate_garch.
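To illustrate the full-truncation Euler scheme named in the table, here is a pure-Python sketch of a single Heston path. It is not the Rust implementation: the function name heston_path, its parameter names and defaults, and the omission of the symbol column are all assumptions made for the example. The scheme replaces the variance v with max(v, 0) wherever it enters the drift or diffusion, while the raw v is still carried forward between steps.

```python
import math
import random


def heston_path(n_steps, s0=100.0, v0=0.04, mu=0.05, kappa=2.0,
                theta=0.04, xi=0.3, rho=-0.7, dt=1.0 / 252, seed=None):
    """Simulate one Heston path with the full-truncation Euler scheme.

    Hypothetical signature; returns a list of row dicts.
    """
    rng = random.Random(seed)
    log_s, v = math.log(s0), v0
    rows = []
    for t in range(n_steps + 1):
        v_pos = max(v, 0.0)  # full truncation: use v+ in drift and diffusion
        rows.append({"timestamp": t, "price": math.exp(log_s), "variance": v_pos})
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        z_s = z1
        # Mix the two normals so that corr(z_s, z_v) = rho.
        z_v = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        # Log-price step with the Ito correction -0.5 * v+.
        log_s += (mu - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z_s
        # Variance step: mean reversion toward theta, vol-of-vol xi.
        v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z_v
    return rows
```

Setting xi=0 with v0 equal to theta keeps the variance constant, so the same loop reduces to a log-Euler GBM step with volatility sqrt(theta).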
Python generators
| Symbol | Output |
|---|---|
| SignalGenerator | Long-form [date, symbol, signal, fwd_returns] with target Pearson IC |
| FactorLoadingsGenerator | Wide [symbol, market, value, momentum, size, quality] Barra-style loadings |
| BenchmarkGenerator | [date, benchmark] Gaussian benchmark return series |
| PositionsGenerator | Long-form position panel [date, symbol, price, quantity, market_value, weight] |
| TransactionsGenerator | Transaction log with enum-backed side/position-effect labels and explicit costs |
| OrdersGenerator | Enum-backed order fixtures with side, order type, status, and time-in-force |
| ExecutionsGenerator | Enum-backed execution fixtures for simulated fills |
| MultiAssetGBMGenerator | Correlated multi-asset GBM panel [timestamp, symbol, price, return] |
| RegimeSwitchingGenerator | Markov regime-switching price path [timestamp, symbol, price, return, regime] |
| MarketImpactCurveGenerator | Participation-rate impact curves with temporary, permanent, and total impact in bps |
| StatisticalRiskModelGenerator | PCA-style factor loadings, factor returns, and specific variance |
| FundamentalRiskModelGenerator | Barra-style enum-backed sector/style loadings plus specific variance |
| FactorCovarianceGenerator | Symmetric positive semidefinite factor covariance matrix |
| SpecificVarianceGenerator | Positive idiosyncratic variance vector |
Every Python generator has a matching generate_* convenience wrapper,
including the legacy generate_signal, generate_factor_loadings, and
generate_benchmark functions.
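As an illustration of what "target Pearson IC" means for the signal output, the sketch below draws a signal and forward returns whose population correlation equals a chosen IC by mixing the signal with independent noise. This is one standard construction and an assumption here; SignalGenerator may build its panel differently (for example per date, or with scaled returns). The helper names pearson and signal_with_target_ic are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def signal_with_target_ic(n, ic, seed=None):
    """Draw (signal, fwd_returns) with population correlation ic."""
    if not -1.0 <= ic <= 1.0:
        raise ValueError("ic must lie in [-1, 1]")
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # fwd = ic * signal + sqrt(1 - ic^2) * noise has unit variance and
    # correlation ic with signal, because signal and noise are independent.
    w = math.sqrt(1.0 - ic * ic)
    fwd = [ic * s + w * e for s, e in zip(signal, noise)]
    return signal, fwd
```

The realized sample IC only approaches the target as n grows; for a small panel it can differ noticeably from the requested value.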
All Rust generators accept an optional seed: int for bit-reproducible
output across platforms (ChaCha8 RNG); the Python generators accept a
seed that is passed to numpy.random.default_rng.
Portfolio, transaction, order, execution, and market-model generators
also support enum-backed metadata columns where applicable, including
currency, exchange, region, instrument_type, market_type, and
venue_type. Portfolio and transaction generators can use
finance-dates.Calendar exchange calendars so generated dates and
timestamps align with actual business days and session hours.
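The idea of enum-backed columns on calendar-aligned dates can be sketched with the standard library alone. Everything here is hypothetical: the Side enum, the weekdays helper, and toy_transactions are illustrative names, and the date logic skips weekends only, whereas a real exchange calendar such as finance-dates.Calendar would also account for holidays and session hours.

```python
from datetime import date, timedelta
from enum import Enum


class Side(str, Enum):
    """Hypothetical enum backing a `side` column."""

    BUY = "BUY"
    SELL = "SELL"


def weekdays(start: date, n: int) -> list:
    """Return the first n Monday-Friday dates on or after start.

    Assumption: weekends only; holidays are not handled.
    """
    out, d = [], start
    while len(out) < n:
        if d.weekday() < 5:  # 0-4 are Monday through Friday
            out.append(d)
        d += timedelta(days=1)
    return out


def toy_transactions(start: date, n: int) -> list:
    """Build n rows with an enum-backed side and a fixed currency label."""
    sides = list(Side)
    return [
        {"date": d, "side": sides[i % len(sides)].value, "currency": "USD"}
        for i, d in enumerate(weekdays(start, n))
    ]
```

Backing a column with an enum means every value in it comes from a closed set, so downstream code can validate or pivot on side without guarding against free-form strings.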
Quick start
See the Data page for model math, parameter ranges, and output schemas, and the API page for a complete function-level reference.
Architecture
The Rust core (rust/src/) is polars-free: every generator builds
an arrow_array::RecordBatch and returns it through the Arrow C Data
Interface PyCapsule via pyo3-arrow. The Python wrappers call
polars.from_arrow(batch) on the receiving end. This keeps the
polars-rs and polars-py codebases on opposite sides of a stable ABI
boundary, avoiding the binary-incompatibility issues that come with
linking polars from both Rust and CPython.
[!NOTE] This library was generated using copier from the Base Python Project Template repository.