stochastic-rs
A high-performance Rust library for stochastic process simulation,
quantitative finance, statistics, copulas, distributions, and
neural-network volatility surrogates. Generic over f32 / f64, with
SIMD acceleration on CPU and CUDA / Metal / Accelerate / cubecl backends
where they pay off, and first-class Python bindings via PyO3.
Documentation
📖 stochastic.rust-dd.com — full docs site (Fumadocs + Next.js, deployed on Vercel).
Local preview from source under website/:
Highlights:
- 120+ stochastic processes — diffusion, jump, fractional / rough, short-rate, HJM, LMM, fBM, Hawkes, Lévy. Generic-precision ProcessExt<T> impl, SIMD on CPU, optional CUDA / Metal for FGN / fBM.
- Pricing & calibration — closed-form (BSM, Bachelier, Black76, Bjerksund-Stensland, …), Fourier (Heston / Bates / Merton-jump / Kou / VG / CGMY / HKDE / double-Heston), Monte Carlo (basket, rainbow, cliquet, autocallable, spread), finite difference, Bermudan LSM, Heston SLV. Heston / SABR / SVJ / Lévy / rough Bergomi / double-Heston / Hull-White swaption-grid calibrators.
- Statistics & risk — Hurst (Fukasawa), MLE for 1-D diffusions with 6 transition densities, ADF / KPSS / Phillips-Perron, realised variance with BNHLS bandwidth, HMM, changepoint, particle filter, UKF. VaR / CVaR / drawdown, Sharpe / Sortino / IR / Calmar.
- Fixed income & credit — yield-curve bootstrapping, Nelson-Siegel / Svensson, multi-curve, IRS / inflation swaps, Vasicek / CIR / Hull-White / G2++ short-rate engines, Merton structural model, reduced-form survival curves, CDS pricing, JLT migration matrices.
- Microstructure — Almgren-Chriss, Kyle (1985), Bouchaud propagator, full price-time priority order book.
- Distributions & copulas — 19 SIMD distributions with closed-form pdf / cdf / cf / moments. Clayton / Frank / Gumbel / Independence bivariate; Gaussian / vine multivariate.
- Python bindings — 210 entries (198 PyO3 classes + 12 functions) spanning every sub-crate except AI surrogates. Numpy-in / numpy-out.
Installation
Rust
[dependencies]
stochastic-rs = "2.0.0"
use *;
use Gbm;
use HestonPricer;
For per-sub-crate (lean) builds, OpenBLAS / CUDA / Metal / cubecl / Accelerate feature flags, native CPU optimisation, and SIMD details, see the installation guide on the docs site.
Python
Source build (requires the Rust toolchain):
Linux (x86_64 / aarch64) and macOS (arm64 / x86_64) wheels ship with
the openblas feature on. The Windows wheel omits the 15
BLAS-backed classes; everything else (≈195 of the 210 entries: the
remaining ≈183 classes plus all 12 functions) works identically.
See the Python bindings page for the parity table and the
source-build path with vcpkg.
Quickstart
use *;
use Ou;
use HestonPricer;
use OptionType;
# Mean-reverting OU path
=
= # numpy.ndarray, shape (1000,)
# Heston European call
=
=
More end-to-end recipes (Heston calibration, fBM Hurst estimation, vol-surface from quotes, Python interop) live in the tutorials section.
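As a library-independent illustration of the mean-reverting OU path mentioned above, here is a minimal sketch in plain Rust using only the standard library. It does not use the stochastic-rs API: the `ou_path` function, the `XorShift64` generator, and all parameter names are hypothetical and chosen for this example only. The path is advanced with the exact Gaussian transition of the OU process rather than an Euler step.

```rust
/// Minimal xorshift64* generator. Illustrative only, not the crate's RNG.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Uniform draw in (0, 1], built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) + 1) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal draw via the Box-Muller transform.
    fn next_normal(&mut self) -> f64 {
        let u1 = self.next_f64();
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Simulates dX = theta * (mu - X) dt + sigma dW on [0, t_end] with n points.
/// Each step samples from the exact conditional distribution of X(t + dt).
fn ou_path(
    theta: f64,
    mu: f64,
    sigma: f64,
    x0: f64,
    t_end: f64,
    n: usize,
    seed: u64,
) -> Vec<f64> {
    assert!(n >= 2 && theta > 0.0 && seed != 0);
    let dt = t_end / (n - 1) as f64;
    let decay = (-theta * dt).exp();
    // Standard deviation of one exact OU transition over dt.
    let step_std = sigma * ((1.0 - decay * decay) / (2.0 * theta)).sqrt();
    let mut rng = XorShift64(seed);
    let mut path = Vec::with_capacity(n);
    path.push(x0);
    for i in 1..n {
        let prev = path[i - 1];
        path.push(mu + (prev - mu) * decay + step_std * rng.next_normal());
    }
    path
}

fn main() {
    // 1000 points, matching the shape noted in the quickstart.
    let path = ou_path(2.0, 0.0, 0.3, 1.0, 1.0, 1000, 42);
    println!("len = {}, last = {:.4}", path.len(), path[path.len() - 1]);
}
```

With `sigma = 0.0` the path reduces to deterministic exponential decay towards `mu`, which is a quick sanity check for the transition formula.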
Benchmarks
FGN — CPU vs CUDA native (f32, H = 0.7)
Single path:
| n | CPU sample | CUDA sample_cuda_native(1) | Speedup |
|---|---|---|---|
| 1,024 | 8.1 µs | 46 µs | 0.18× |
| 4,096 | 35 µs | 84 µs | 0.42× |
| 16,384 | 147 µs | 110 µs | 1.3× |
| 65,536 | 850 µs | 227 µs | 3.7× |
Batch:
| n, m | CPU sample_par | CUDA sample_cuda_native | Speedup |
|---|---|---|---|
| 4,096, 32 | 147 µs | 117 µs | 1.3× |
| 4,096, 512 | 1.78 ms | 2.37 ms | 0.75× |
| 65,536, 128 | 12.6 ms | 10.5 ms | 1.2× |
| 65,536, 1 k | 102 ms | 93 ms | 1.1× |
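CUDA wins for large n (≥ 16 k), by up to 3.7× for a single path at
n = 65,536. For smaller n the CPU is usually faster because GPU
launch / transfer overhead dominates; the 4,096 × 32 batch (1.3× in
favour of CUDA) is the one exception in these tables.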
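The Speedup column appears to be CPU time divided by CUDA time, so values above 1 favour CUDA; that reading is inferred from the numbers rather than stated in the tables. The sketch below recomputes the single-path column and finds the smallest tabulated n at which CUDA is faster. The `Row` type and the helper functions are hypothetical names for this example, not part of the library.

```rust
/// One benchmark row: path length and timings in microseconds.
struct Row {
    n: u32,
    cpu_us: f64,
    cuda_us: f64,
}

/// CPU time divided by CUDA time; values above 1.0 mean CUDA is faster.
fn speedup(row: &Row) -> f64 {
    row.cpu_us / row.cuda_us
}

/// Smallest n among the rows at which CUDA beats the CPU, if any.
fn crossover(rows: &[Row]) -> Option<u32> {
    rows.iter().filter(|r| speedup(r) > 1.0).map(|r| r.n).min()
}

/// Single-path timings copied from the table above.
fn single_path_rows() -> Vec<Row> {
    vec![
        Row { n: 1_024, cpu_us: 8.1, cuda_us: 46.0 },
        Row { n: 4_096, cpu_us: 35.0, cuda_us: 84.0 },
        Row { n: 16_384, cpu_us: 147.0, cuda_us: 110.0 },
        Row { n: 65_536, cpu_us: 850.0, cuda_us: 227.0 },
    ]
}

fn main() {
    let rows = single_path_rows();
    for r in &rows {
        println!("n = {:>6}: {:.2}x", r.n, speedup(r));
    }
    println!("crossover: {:?}", crossover(&rows));
}
```

A dispatch rule built on such a threshold would need to be re-measured per machine, since the crossover depends on the GPU, the CPU core count, and the batch size.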
Distribution sampling — multicore (cargo bench --bench dist_multicore)
sample_matrix, 1-thread vs 14-thread rayon. f64 continuous, integer
discrete. Most distributions: 1024 × 1024; heavy discrete: 512 × 512.
| Distribution | 1T (ms) | MT (ms) | Speedup |
|---|---|---|---|
| Normal | 1.78 | 0.34 | 5.28× |
| Cauchy | 6.23 | 0.90 | 6.96× |
| LogNormal | 5.07 | 0.81 | 6.25× |
| Gamma | 5.20 | 0.72 | 7.19× |
| StudentT | 7.89 | 1.89 | 4.18× |
| Beta | 11.85 | 1.68 | 7.04× |
| Weibull | 13.17 | 1.73 | 7.59× |
| AlphaStable | 42.52 | 5.36 | 7.94× |
| Poisson | 2.28 | 0.42 | 5.40× |
| Hypergeo (512²) | 20.99 | 2.76 | 7.60× |
(Full table — 18 distributions — on the benchmarks page.)
Normal single-thread fill_slice vs the upstream rand_distr baseline:
- vs rand_distr + SimdRng: ≈ 1.21× to 1.35×
- vs rand_distr + rand::rng(): ≈ 4.09× to 4.61×
Contributing
Contributions are welcome — bug reports, feature suggestions, or PRs.
Open an issue or start a discussion on GitHub. Per-feature recipes
(add-diffusion-process, adding-distribution, calibration-pattern,
docs-writing, …) live under .claude/skills/.
License
MIT — see LICENSE.