# finance datagen
Standard financial data generation
[Build Status](https://github.com/prettygoodcapital/finance-datagen/actions/workflows/build.yaml)
[Coverage](https://codecov.io/gh/prettygoodcapital/finance-datagen)
[GitHub](https://github.com/prettygoodcapital/finance-datagen)
[PyPI](https://pypi.python.org/pypi/finance-datagen)
## Overview
`finance-datagen` produces **synthetic** financial time series for
testing, demos, and benchmarking the rest of the `finance-*` stack
without relying on real market data. The numerical core is implemented
in Rust and emits Apache Arrow `RecordBatch` values; the Python layer
wraps each generator so the public API returns `polars.DataFrame`
objects.
All public generator classes inherit from `DataGenerator`, a pydantic
base model that validates typed parameters on construction. Use
`.generate()` for the table output, or `next(generator)` for one-shot
iterator-style use. Convenience functions such as `generate_prices(...)`,
`generate_gbm(...)`, and `generate_signal(...)` instantiate the matching model
and return `.generate()`.
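The construct, validate, generate pattern described above can be sketched with the standard library alone. Everything in this block is hypothetical: `ToyGenerator`, its fields, and `generate_toy` are illustrative names, a frozen dataclass stands in for the pydantic base model, a list of dicts stands in for the `polars.DataFrame`, and `__next__` is assumed to return one full table.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyGenerator:
    """Hypothetical stand-in for a DataGenerator subclass (not the library's class)."""

    n_steps: int = 5
    sigma: float = 0.01
    seed: Optional[int] = None

    def __post_init__(self) -> None:
        # Validate parameters on construction, as a pydantic model would.
        if self.n_steps <= 0:
            raise ValueError("n_steps must be positive")
        if self.sigma < 0:
            raise ValueError("sigma must be non-negative")

    def generate(self) -> list:
        # Table output: one row (dict) per step.
        rng = random.Random(self.seed)
        return [{"step": i, "value": rng.gauss(0.0, self.sigma)} for i in range(self.n_steps)]

    def __iter__(self):
        return self

    def __next__(self) -> list:
        # Assumption: next(generator) yields one full table.
        return self.generate()


def generate_toy(**params) -> list:
    # Convenience wrapper: instantiate the model and return .generate().
    return ToyGenerator(**params).generate()
```

With a fixed seed, `generate_toy(n_steps=3, seed=1)` and `next(ToyGenerator(n_steps=3, seed=1))` return the same rows, and invalid parameters fail at construction time rather than at generation time.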
### Generators
#### Price models (Rust core)
| Generator | Model | Output columns |
| --- | --- | --- |
| `GBMGenerator` | Geometric Brownian Motion (log-Euler) | `timestamp, symbol, price` |
| `HestonGenerator` | Heston (1993) stochastic volatility (full-truncation Euler) | `timestamp, symbol, price, variance` |
| `GARCHGenerator` | GARCH(1,1) returns | `timestamp, symbol, price, return, sigma` |
| `ohlc_from_close` | OHLCV synthesis from any close series | `timestamp, symbol, open, high, low, close, volume` |
Price-path convenience wrappers are also exported as `generate_prices`,
`generate_gbm`, `generate_heston`, and `generate_garch`. `generate_prices` is a
plain alias for `generate_gbm`, intended for examples and tests that want a
model-neutral name.
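For orientation, a log-Euler GBM step updates the log-price with a drift of `(mu - sigma**2 / 2) * dt` plus a Gaussian shock scaled by `sigma * sqrt(dt)`. The pure-Python sketch below illustrates that recursion only; it is not the Rust implementation, and the function name, parameter names, and defaults (`s0`, `mu`, `sigma`, `dt`, `n_steps`) are assumptions made for the example.

```python
import math
import random


def gbm_log_euler(s0=100.0, mu=0.05, sigma=0.2, dt=1.0 / 252, n_steps=252, seed=None):
    """Simulate one GBM price path by stepping the log-price (illustrative only)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma**2) * dt  # Ito-corrected drift per step
    vol = sigma * math.sqrt(dt)         # shock scale per step
    log_s = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        log_s += drift + vol * rng.gauss(0.0, 1.0)
        path.append(math.exp(log_s))
    return path
```

Because the update is applied to the log-price, every simulated price stays strictly positive.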
#### Python generators
| Generator | Output |
| --- | --- |
| `SignalGenerator` | Long-form `[date, symbol, signal, fwd_returns]` with target Pearson IC |
| `FactorLoadingsGenerator` | Wide `[symbol, market, value, momentum, size, quality]` Barra-style loadings |
| `BenchmarkGenerator` | `[date, benchmark]` Gaussian benchmark return series |
| `PositionsGenerator` | Long-form position panel `[date, symbol, price, quantity, market_value, weight]` |
| `TransactionsGenerator` | Transaction log with enum-backed side/position-effect labels and explicit costs |
| `OrdersGenerator` | Enum-backed order fixtures with side, order type, status, and time-in-force |
| `ExecutionsGenerator` | Enum-backed execution fixtures for simulated fills |
| `MultiAssetGBMGenerator` | Correlated multi-asset GBM panel `[timestamp, symbol, price, return]` |
| `RegimeSwitchingGenerator` | Markov regime-switching price path `[timestamp, symbol, price, return, regime]` |
| `MarketImpactCurveGenerator` | Participation-rate impact curves with temporary, permanent, and total impact in bps |
| `StatisticalRiskModelGenerator` | PCA-style factor loadings, factor returns, and specific variance |
| `FundamentalRiskModelGenerator` | Barra-style enum-backed sector/style loadings plus specific variance |
| `FactorCovarianceGenerator` | Symmetric positive semidefinite factor covariance matrix |
| `SpecificVarianceGenerator` | Positive idiosyncratic variance vector |
Every Python generator has a matching `generate_*` convenience wrapper,
including the legacy `generate_signal`, `generate_factor_loadings`, and
`generate_benchmark` functions.
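One common way to hit a target Pearson information coefficient is to mix the signal with independent noise: with independent standard normal draws `z1` and `z2`, setting `signal = z1` and `fwd_returns = ic * z1 + sqrt(1 - ic**2) * z2` gives a population correlation of exactly `ic`. The sketch below shows that construction with the standard library; this README does not say whether `SignalGenerator` uses this exact method, and the function and parameter names are hypothetical.

```python
import math
import random


def correlated_signal(n=20_000, ic=0.3, seed=0):
    """Return (signal, fwd_returns) whose population Pearson correlation is `ic`."""
    rng = random.Random(seed)
    signal, fwd = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        signal.append(z1)
        fwd.append(ic * z1 + math.sqrt(1.0 - ic**2) * z2)
    return signal, fwd


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

The realized sample IC, `pearson(*correlated_signal())`, fluctuates around the target with an error on the order of `1 / sqrt(n)`, so small panels can land noticeably away from the requested value.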
All Rust generators accept an optional `seed: int` for bit-reproducible
output across platforms (ChaCha8 RNG); the Python generators accept a
`seed` for `numpy.random.default_rng`.
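The effect of a fixed seed on the Python side can be illustrated directly with `numpy.random.default_rng`. The helper `draw_returns` below is a hypothetical stand-in for a generator body, not part of the package.

```python
import numpy as np


def draw_returns(n, seed=None):
    """Draw n Gaussian returns from a freshly seeded NumPy Generator."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=0.01, size=n)


a = draw_returns(5, seed=42)
b = draw_returns(5, seed=42)  # same seed: same draws as `a`
c = draw_returns(5, seed=43)  # different seed: a different stream
```

Passing `seed=None` asks NumPy for fresh entropy, so repeated calls then produce different draws.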
Portfolio, transaction, order, execution, and market-model generators
also support enum-backed metadata columns where applicable, including
`currency`, `exchange`, `region`, `instrument_type`, `market_type`, and
`venue_type`. Portfolio and transaction generators can use
`finance-dates.Calendar` exchange calendars so generated dates and
timestamps align with actual business days and session hours.
### Quick start
```python
from finance_datagen import OrdersGenerator, generate_prices, generate_signal, ohlc_from_close

# Close-price path, then OHLCV bars synthesized from it
closes = generate_prices(symbol="ACME", seed=0)
bars = ohlc_from_close(closes["price"], symbol="ACME", seed=0)
# Long-form signal panel with forward returns
signal = generate_signal(n_dates=20, n_assets=50, seed=0)
# Enum-backed order fixtures on a specific exchange and currency
orders = OrdersGenerator(n_dates=3, n_assets=5, orders_per_day=10, exchange="XNYS", currency="USD", seed=0).generate()
```
See the [Data](docs/src/DATA.md) page for model math, parameter ranges,
and output schemas, and the [API](docs/src/API.md) page for a complete
function-level reference.
### Architecture
The Rust core (`rust/src/`) is **polars-free**: every generator builds
an `arrow_array::RecordBatch` and returns it through the
[Arrow C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html)
PyCapsule via `pyo3-arrow`. The Python wrappers call
`polars.from_arrow(batch)` on the receiving end. This keeps the
polars-rs and polars-py codebases on opposite sides of a stable ABI
boundary, avoiding the binary-incompatibility issues that come with
linking polars from both Rust and CPython.