# numeris
Pure-Rust numerical algorithms library, no-std compatible and similar in scope to SciPy. It suits embedded targets (no heap allocation, no FPU assumptions) while remaining fast on desktop and server hardware via SIMD intrinsics.
**Alpha software** — APIs are unstable and may change without notice.
## Features
- **Fixed-size matrices** — stack-allocated, const-generic `Matrix<T, M, N>`
- **Dynamic matrices** — heap-allocated `DynMatrix<T>` with runtime dimensions (optional `alloc` feature)
- **Linear algebra** — LU, Cholesky, QR, SVD decompositions; symmetric eigendecomposition; real Schur decomposition
- **ODE integration** — fixed-step RK4; seven adaptive Runge-Kutta solvers, most with dense output; RODAS4 stiff solver
- **Optimization** — root finding (Brent, Newton), BFGS minimization, Gauss-Newton and Levenberg-Marquardt least squares
- **Complex number support** — all decompositions work with `Complex<f32>` / `Complex<f64>` (optional feature)
- **Quaternions** — unit quaternion rotations, SLERP, Euler angles, rotation matrices
- **Norms** — vector L1/L2 norms; matrix Frobenius, infinity, and one norms
- **SIMD acceleration** — NEON (aarch64) and SSE2 (x86_64) intrinsics for matmul, dot products, and element-wise ops; zero-cost scalar fallback for integers and unsupported architectures
- **No-std / embedded** — runs without `std` or heap; float math falls back to software `libm`
## Quick start
```toml
[dependencies]
numeris = "0.1"
```
```rust
use numeris::{Matrix, Vector};
// Solve a linear system Ax = b
let a = Matrix::new([
[2.0_f64, 1.0, -1.0],
[-3.0, -1.0, 2.0],
[-2.0, 1.0, 2.0],
]);
let b = Vector::from_array([8.0, -11.0, -3.0]);
let x = a.solve(&b).unwrap(); // x = [2, 3, -1]
// Cholesky decomposition of a symmetric positive-definite matrix
let spd = Matrix::new([[4.0, 2.0], [2.0, 3.0]]);
let chol = spd.cholesky().unwrap();
let inv = chol.inverse();
// QR decomposition and least-squares
let a = Matrix::new([
[1.0_f64, 0.0],
[1.0, 1.0],
[1.0, 2.0],
]);
let b = Vector::from_array([1.0, 2.0, 4.0]);
let x = a.qr().unwrap().solve(&b); // least-squares fit
// Symmetric eigendecomposition
let sym = Matrix::new([[4.0_f64, 1.0], [1.0, 3.0]]);
let eig = sym.eig_symmetric().unwrap();
let eigenvalues = eig.eigenvalues(); // sorted ascending
let eigenvectors = eig.eigenvectors(); // columns = eigenvectors
// General eigenvalues via Schur decomposition
let a = Matrix::new([[0.0_f64, -1.0], [1.0, 0.0]]); // 90° rotation
let (re, im) = a.eigenvalues().unwrap(); // eigenvalues ±i
// Quaternion rotation
use numeris::Quaternion;
let q = Quaternion::from_axis_angle(
&Vector::from_array([0.0, 0.0, 1.0]),
std::f64::consts::FRAC_PI_2,
);
let v = Vector::from_array([1.0, 0.0, 0.0]);
let rotated = q * v; // [0, 1, 0]
```
## Dynamic matrices
When dimensions aren't known at compile time, use `DynMatrix` (requires `alloc`, included with default `std` feature):
```rust
use numeris::{DynMatrix, DynVector};
// Runtime-sized matrix
let a = DynMatrix::from_rows(3, 3, &[
2.0_f64, 1.0, -1.0,
-3.0, -1.0, 2.0,
-2.0, 1.0, 2.0,
]);
let b = DynVector::from_slice(&[8.0, -11.0, -3.0]);
let x = a.solve(&b).unwrap();
// Convert between fixed and dynamic
use numeris::Matrix;
let fixed = Matrix::new([[1.0, 2.0], [3.0, 4.0]]);
let dynamic: DynMatrix<f64> = fixed.into();
let back: Matrix<f64, 2, 2> = (&dynamic).try_into().unwrap();
// Mixed arithmetic
let result = fixed * &dynamic; // Matrix * DynMatrix → DynMatrix
```
Type aliases: `DynMatrixf64`, `DynMatrixf32`, `DynVectorf64`, `DynVectorf32`, `DynMatrixz64` (complex, requires `complex` feature), etc.
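The aliases keep signatures short. A minimal sketch with the `f64` aliases (constructors as above):
```rust
use numeris::{DynMatrix, DynMatrixf64, DynVector, DynVectorf64};
let a: DynMatrixf64 = DynMatrix::from_rows(2, 2, &[3.0, 1.0, 1.0, 2.0]);
let b: DynVectorf64 = DynVector::from_slice(&[9.0, 8.0]);
let x = a.solve(&b).unwrap(); // x = [2, 3]
```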
## ODE integration
Fixed-step RK4 and seven adaptive Runge-Kutta solvers with embedded error estimation; most provide dense output:
```rust
use numeris::ode::{RKAdaptive, RKTS54, AdaptiveSettings};
use numeris::Vector;
// Harmonic oscillator: y'' = -y
let y0 = Vector::from_array([1.0_f64, 0.0]);
let tau = 2.0 * std::f64::consts::PI;
let sol = RKTS54::integrate(
0.0, tau, &y0,
|_t, y| Vector::from_array([y[1], -y[0]]),
&AdaptiveSettings::default(),
).unwrap();
assert!((sol.y[0] - 1.0).abs() < 1e-6); // cos(2π) ≈ 1
```
### Explicit solvers
| Solver | Stages | Order | FSAL | Interpolant |
|---|---|---|---|---|
| `RKF45` | 6 | 5(4) | no | — |
| `RKTS54` | 7 | 5(4) | yes | 4th degree |
| `RKV65` | 10 | 6(5) | no | 6th degree |
| `RKV87` | 17 | 8(7) | no | 7th degree |
| `RKV98` | 21 | 9(8) | no | 8th degree |
| `RKV98NoInterp` | 16 | 9(8) | no | — |
| `RKV98Efficient` | 26 | 9(8) | no | 9th degree |
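Every adaptive solver implements the same `RKAdaptive::integrate` interface, so switching methods is a one-line change. A sketch reusing the oscillator above with the 8(7) pair:
```rust
use numeris::ode::{RKAdaptive, RKV87, AdaptiveSettings};
use numeris::Vector;
// Same harmonic oscillator, higher-order method
let y0 = Vector::from_array([1.0_f64, 0.0]);
let tau = 2.0 * std::f64::consts::PI;
let sol = RKV87::integrate(
    0.0, tau, &y0,
    |_t, y| Vector::from_array([y[1], -y[0]]),
    &AdaptiveSettings::default(),
).unwrap();
assert!((sol.y[0] - 1.0).abs() < 1e-6);
```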
### Stiff solvers (Rosenbrock)
For stiff ODEs (chemical kinetics, circuit simulation, orbital mechanics with drag), use `RODAS4` — a linearly-implicit Rosenbrock method that replaces the nonlinear Newton iterations of a fully implicit solver with linear solves against the Jacobian:
```rust
use numeris::ode::{Rosenbrock, RODAS4, AdaptiveSettings};
use numeris::{Vector, Matrix};
// Stiff decay: y' = -1000*y, y(0) = 1
let y0 = Vector::from_array([1.0_f64]);
let sol = RODAS4::integrate(
0.0, 0.01, &y0,
|_t, y| Vector::from_array([-1000.0 * y[0]]),
|_t, _y| Matrix::new([[-1000.0]]), // Jacobian ∂f/∂y
&AdaptiveSettings::default(),
).unwrap();
// Or use integrate_auto for automatic finite-difference Jacobian
```
| Solver | Stages | Order | L-stable |
|---|---|---|---|
| `RODAS4` | 6 | 4(3) | yes |
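A sketch of the `integrate_auto` variant mentioned above, assuming it mirrors `integrate` minus the Jacobian closure:
```rust
use numeris::ode::{Rosenbrock, RODAS4, AdaptiveSettings};
use numeris::Vector;
// Same stiff decay; the Jacobian is approximated internally by
// finite differences (assumed signature — check the docs)
let y0 = Vector::from_array([1.0_f64]);
let sol = RODAS4::integrate_auto(
    0.0, 0.01, &y0,
    |_t, y| Vector::from_array([-1000.0 * y[0]]),
    &AdaptiveSettings::default(),
).unwrap();
assert!(sol.y[0] < 1e-3); // y(0.01) = exp(-10) ≈ 4.5e-5
```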
## Optimization
Root finding, unconstrained minimization, and nonlinear least squares (requires `optim` feature):
```toml
[dependencies]
numeris = { version = "0.1", features = ["optim"] }
```
```rust
use numeris::optim::{brent, minimize_bfgs, least_squares_lm, RootSettings, BfgsSettings, LmSettings};
use numeris::{Matrix, Vector};
// Root finding: solve x² - 2 = 0
let root = brent(|x| x * x - 2.0, 0.0, 2.0, &RootSettings::default()).unwrap();
assert!((root.x - std::f64::consts::SQRT_2).abs() < 1e-12);
// BFGS minimization: minimize (x-1)² + (y-2)²
let min = minimize_bfgs(
|x: &Vector<f64, 2>| (x[0] - 1.0).powi(2) + (x[1] - 2.0).powi(2),
|x: &Vector<f64, 2>| Vector::from_array([2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0)]),
&Vector::from_array([0.0, 0.0]),
&BfgsSettings::default(),
).unwrap();
assert!((min.x[0] - 1.0).abs() < 1e-6);
// Levenberg-Marquardt: fit y = a * exp(b * x)
let t = [0.0_f64, 1.0, 2.0, 3.0, 4.0];
let y = [2.0, 2.7, 3.65, 4.95, 6.7];
let fit = least_squares_lm(
|x: &Vector<f64, 2>| {
let mut r = Vector::<f64, 5>::zeros();
for i in 0..5 { r[i] = x[0] * (x[1] * t[i]).exp() - y[i]; }
r
},
|x: &Vector<f64, 2>| {
let mut j = Matrix::<f64, 5, 2>::zeros();
for i in 0..5 {
let e = (x[1] * t[i]).exp();
j[(i, 0)] = e;
j[(i, 1)] = x[0] * t[i] * e;
}
j
},
&Vector::from_array([1.0, 0.1]),
&LmSettings::default(),
).unwrap();
assert!(fit.cost < 0.1);
```
| Algorithm | Function | Use case |
|---|---|---|
| Brent's method | `brent` | Bracketed scalar root finding |
| Newton's method | `newton_1d` | Scalar root finding with derivative |
| BFGS | `minimize_bfgs` | Unconstrained minimization |
| Gauss-Newton | `least_squares_gn` | Nonlinear least squares (QR-based) |
| Levenberg-Marquardt | `least_squares_lm` | Nonlinear least squares (damped) |
Finite-difference utilities: `finite_difference_gradient` and `finite_difference_jacobian` for when analytical derivatives aren't available.
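`newton_1d` isn't shown above; here is a sketch solving the same x² − 2 = 0, assuming it takes the function, its derivative, an initial guess, and the shared `RootSettings` (the argument order is an assumption):
```rust
use numeris::optim::{newton_1d, RootSettings};
// Newton's method from x0 = 1.0 (assumed signature: f, f', x0, settings)
let root = newton_1d(
    |x| x * x - 2.0,
    |x| 2.0 * x,
    1.0,
    &RootSettings::default(),
).unwrap();
assert!((root.x - std::f64::consts::SQRT_2).abs() < 1e-12);
```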
## Digital IIR filters
Biquad cascade IIR filters with Butterworth and Chebyshev Type I design (requires `control` feature):
```toml
[dependencies]
numeris = { version = "0.1", features = ["control"] }
```
```rust
use numeris::control::{butterworth_lowpass, chebyshev1_lowpass, BiquadCascade};
// 4th-order Butterworth lowpass at 1 kHz, 8 kHz sample rate (N=2 biquad sections)
let mut lpf: BiquadCascade<f64, 2> = butterworth_lowpass(4, 1000.0, 8000.0).unwrap();
let y = lpf.tick(1.0); // filter one sample
// Bulk processing
let input = [1.0, 0.5, -0.3, 0.8];
let mut output = [0.0; 4];
lpf.reset();
lpf.process(&input, &mut output);
// Chebyshev Type I: steeper rolloff with 1 dB passband ripple
let mut cheb: BiquadCascade<f64, 2> = chebyshev1_lowpass(4, 1.0, 1000.0, 8000.0).unwrap();
```
| Design | Lowpass | Highpass |
|---|---|---|
| Butterworth | `butterworth_lowpass` | `butterworth_highpass` |
| Chebyshev Type I | `chebyshev1_lowpass` | `chebyshev1_highpass` |
All filters are no-std compatible, use no complex number arithmetic, and work with both `f32` and `f64`.
### PID controller
Discrete-time PID controller with trapezoidal integration, derivative-on-measurement (no derivative kick), optional derivative low-pass filter, and anti-windup via back-calculation:
```rust
use numeris::control::Pid;
// PID controller at 1 kHz with output clamping
let mut pid = Pid::new(2.0_f64, 0.5, 0.1, 0.001)
.with_output_limits(-10.0, 10.0)
.with_derivative_filter(0.01); // first-order LPF on derivative
// Control loop
let setpoint = 1.0;
let mut measurement = 0.0;
for _ in 0..1000 {
let u = pid.tick(setpoint, measurement);
measurement += 0.001 * (-measurement + u); // simple plant
}
assert!((measurement - setpoint).abs() < 0.01);
```
## Complex matrices
Enable the `complex` feature to use decompositions with complex elements:
```toml
[dependencies]
numeris = { version = "0.1", features = ["complex"] }
```
```rust
use numeris::{Complex, Matrix, Vector};
type C = Complex<f64>;
let a = Matrix::new([
[C::new(2.0, 1.0), C::new(1.0, -1.0)],
[C::new(1.0, 0.0), C::new(3.0, 2.0)],
]);
let b = Vector::from_array([C::new(5.0, 3.0), C::new(7.0, 4.0)]);
let x = a.solve(&b).unwrap();
// Hermitian positive-definite Cholesky (A = L * L^H)
let hpd = Matrix::new([
[C::new(4.0, 0.0), C::new(2.0, 1.0)],
[C::new(2.0, -1.0), C::new(5.0, 0.0)],
]);
let chol = hpd.cholesky().unwrap();
```
Complex support adds zero overhead to real-valued code paths. The `LinalgScalar` trait methods (`modulus`, `conj`, etc.) are `#[inline]` identity functions for `f32`/`f64`, fully erased by the compiler.
## Cargo features
| Feature | Default | Description |
|---|---|---|
| `std` | yes | Implies `alloc`. Uses hardware FPU via system libm. |
| `alloc` | via `std` | Enables `DynMatrix` / `DynVector` (heap-allocated, runtime-sized). |
| `ode` | yes | ODE integration (RK4, adaptive solvers). |
| `optim` | no | Optimization (root finding, BFGS, Gauss-Newton, LM). |
| `control` | no | Digital IIR filters (Butterworth, Chebyshev Type I). |
| `libm` | baseline | Pure-Rust software float math. Always available as fallback. |
| `complex` | no | Adds `Complex<f32>` / `Complex<f64>` support via `num-complex`. |
| `all` | no | All features: `std` + `ode` + `optim` + `control` + `complex`. |
```bash
# Default (std + ode)
cargo build
# All features
cargo build --features all
# No-std for embedded
cargo build --no-default-features --features libm
# No-std with dynamic matrices
cargo build --no-default-features --features "libm,alloc"
# With optimization and complex support
cargo build --features "optim,complex"
```
## Module overview
### `matrix` — Fixed-size matrix
`Matrix<T, M, N>` with `[[T; N]; M]` row-major storage.
- Arithmetic: `+`, `-`, `*` (matrix and scalar), negation, element-wise multiply/divide
- Indexing: `m[(i, j)]`, row/column access, block extraction and insertion
- Square: `trace`, `det`, `diag`, `from_diag`, `pow`, `is_symmetric`, `eye`
- Norms: `frobenius_norm`, `norm_inf`, `norm_one`
- Utilities: `transpose`, `from_fn`, `map`, `swap_rows`, `swap_cols`, `sum`, `abs`
- Iteration: `iter()`, `iter_mut()`, `as_slice()`, `IntoIterator`
Vectors are type aliases: `Vector<T, N>` = `Matrix<T, 1, N>` (row vector), `ColumnVector<T, N>` = `Matrix<T, N, 1>`.
Vector-specific: `dot`, `cross`, `outer`, `norm`, `norm_l1`, `normalize`.
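A quick sketch of the vector-specific operations (receiver/argument forms assumed to match `solve(&b)` above):
```rust
use numeris::Vector;
let x = Vector::from_array([1.0_f64, 0.0, 0.0]);
let y = Vector::from_array([0.0, 1.0, 0.0]);
assert_eq!(x.dot(&y), 0.0); // orthogonal
let z = x.cross(&y);        // [0, 0, 1]
let v = Vector::from_array([3.0_f64, 4.0]);
assert_eq!(v.norm(), 5.0);  // Euclidean length
let u = v.normalize();      // [0.6, 0.8]
```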
#### Size aliases
Convenience aliases are provided for common sizes (all re-exported from the crate root):
| Square | Rectangular (examples) | Vectors |
|---|---|---|
| `Matrix1` .. `Matrix6` | `Matrix2x3`, `Matrix3x4`, `Matrix4x6`, ... | `Vector1` .. `Vector6` |
| | All M×N combinations for M, N ∈ 1..=6, M ≠ N | `ColumnVector1` .. `ColumnVector6` |
```rust
use numeris::{Matrix3, Matrix4x3, Vector3};
let rotation: Matrix3<f64> = Matrix3::eye();
let points: Matrix4x3<f64> = Matrix4x3::zeros(); // 4 rows, 3 cols
let v: Vector3<f64> = Vector3::from_array([1.0, 2.0, 3.0]);
```
### `dynmatrix` — Dynamic matrix (requires `alloc`)
`DynMatrix<T>` with `Vec<T>` row-major storage and runtime dimensions.
- Same arithmetic, norms, block ops, and utilities as fixed `Matrix`
- Mixed ops: `Matrix * DynMatrix`, `DynMatrix + Matrix`, etc. → `DynMatrix`
- `DynVector<T>` newtype with single-index access and `dot`
- Conversions: `From<Matrix>` → `DynMatrix`, `TryFrom<&DynMatrix>` → `Matrix`
- Full linalg: `DynLu`, `DynCholesky`, `DynQr`, `DynSvd`, `DynSymmetricEigen`, `DynSchur` wrappers + convenience methods
Type aliases: `DynMatrixf64`, `DynMatrixf32`, `DynVectorf64`, `DynVectorf32`, `DynMatrixi32`, `DynMatrixi64`, `DynMatrixu32`, `DynMatrixu64`, `DynMatrixz64`, `DynMatrixz32` (complex).
### `linalg` — Decompositions
All decompositions provide both free functions (operating on `&mut impl MatrixMut<T>`) and wrapper structs with `solve()`, `inverse()`, `det()`.
| Decomposition | Free function | Fixed struct | Dynamic struct | Notes |
|---|---|---|---|---|
| LU | `lu_in_place` | `LuDecomposition` | `DynLu` | Partial pivoting |
| Cholesky | `cholesky_in_place` | `CholeskyDecomposition` | `DynCholesky` | A = LL^H (Hermitian) |
| QR | `qr_in_place` | `QrDecomposition` | `DynQr` | Householder reflections, least-squares |
| SVD | `bidiagonalize` + `bidiagonal_qr` | `SvdDecomposition` | `DynSvd` | Rectangular M×N, singular values, rank |
| Symmetric Eigen | `tridiagonalize` + `tridiagonal_qr_*` | `SymmetricEigen` | `DynSymmetricEigen` | Real eigenvalues, orthogonal eigenvectors |
| Schur | `hessenberg` + `francis_qr` | `SchurDecomposition` | `DynSchur` | Quasi-upper-triangular, general eigenvalues |
Convenience methods on `Matrix`: `a.lu()`, `a.cholesky()`, `a.qr()`, `a.svd()`, `a.solve(&b)`, `a.inverse()`, `a.det()`, `a.eig_symmetric()`, `a.eigenvalues_symmetric()`, `a.schur()`, `a.eigenvalues()`.
The same convenience methods are available on `DynMatrix`.
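Factor once, reuse many times. A sketch with the LU wrapper's `solve` / `det` / `inverse` (error handling follows the quick-start examples):
```rust
use numeris::{Matrix, Vector};
let a = Matrix::new([[4.0_f64, 3.0], [6.0, 3.0]]);
let lu = a.lu().unwrap(); // factor once (partial pivoting)
let x1 = lu.solve(&Vector::from_array([10.0, 12.0])); // [1, 2]
let x2 = lu.solve(&Vector::from_array([7.0, 9.0]));   // [1, 1]
let d = lu.det();         // 4*3 - 3*6 = -6
let inv = lu.inverse();   // reuses the same factorization
```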
### `ode` — ODE integration
Fixed-step `rk4` / `rk4_step` and 7 adaptive Runge-Kutta solvers via the `RKAdaptive` trait. PI step-size controller (Söderlind & Wang 2006). Dense output / interpolation available for most solvers (gated behind `std`). For stiff systems, `RODAS4` provides an L-stable Rosenbrock method via the `Rosenbrock` trait — accepts user-supplied or automatic finite-difference Jacobians.
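The fixed-step `rk4` entry point isn't demonstrated elsewhere in this README. The following is only a sketch; the argument order (span, initial state, derivative, step count) is an assumption, so check the `rk4` docs:
```rust
use numeris::ode::rk4;
use numeris::Vector;
// Harmonic oscillator, 1000 fixed steps over one period
// (hypothetical argument order)
let y0 = Vector::from_array([1.0_f64, 0.0]);
let tau = 2.0 * std::f64::consts::PI;
let y = rk4(
    0.0, tau, &y0,
    |_t, y| Vector::from_array([y[1], -y[0]]),
    1000,
);
```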
### `optim` — Optimization (requires `optim` feature)
- **Root finding**: `brent` (bracketed, superlinear convergence), `newton_1d` (with derivative)
- **Minimization**: `minimize_bfgs` (BFGS quasi-Newton with Armijo line search)
- **Least squares**: `least_squares_gn` (Gauss-Newton via QR), `least_squares_lm` (Levenberg-Marquardt via damped normal equations)
- **Finite differences**: `finite_difference_gradient`, `finite_difference_jacobian` for numerical derivatives
- All algorithms use `FloatScalar` bound (real-valued), configurable via settings structs with `Default` impls for `f32` and `f64`
### `control` — Digital filters and controllers (requires `control` feature)
Biquad cascade filters designed via the bilinear transform, and a discrete-time PID controller. No `complex` feature dependency.
- `Biquad<T>` — single second-order section, Direct Form II Transposed
- `BiquadCascade<T, N>` — cascade of `N` biquad sections (filter order ≤ 2N)
- Design functions: `butterworth_lowpass`, `butterworth_highpass`, `chebyshev1_lowpass`, `chebyshev1_highpass`
- Supports even and odd filter orders; odd orders use a degenerate first-order final section
- `tick`, `process`, `process_inplace` for sample-by-sample or bulk filtering
- `Pid<T>` — PID controller with derivative filtering, output clamping, anti-windup back-calculation
### `quaternion` — Unit quaternion rotations
`Quaternion<T>` with scalar-first convention `[w, x, y, z]`.
- Construction: `new`, `identity`, `from_axis_angle`, `from_euler`, `from_rotation_matrix`, `rotx`, `roty`, `rotz`
- Operations: `*` (Hamilton product), `* Vector3` (rotation), `conjugate`, `inverse`, `normalize`, `slerp`
- Conversion: `to_rotation_matrix`, `to_axis_angle`, `to_euler`
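A short sketch combining the constructors with SLERP (the `slerp` signature is assumed by analogy with the reference-taking methods above):
```rust
use numeris::Quaternion;
let q0 = Quaternion::<f64>::identity();
let q1 = Quaternion::rotz(std::f64::consts::FRAC_PI_2); // 90° about z
let half = q0.slerp(&q1, 0.5); // 45° about z (assumed signature)
let m = half.to_rotation_matrix();
```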
### `traits` — Element traits
| Trait | Bounds | Used by |
|---|---|---|
| `Scalar` | `Copy + PartialEq + Debug + Zero + One + Num` | All matrix ops |
| `FloatScalar` | `Scalar + Float + LinalgScalar<Real=Self>` | Quaternions, ordered comparisons |
| `LinalgScalar` | `Scalar` + `modulus`, `conj`, `re`, `lsqrt`, `lln`, `from_real` | Decompositions, norms |
| `MatrixRef<T>` | Read-only `get(row, col)` | Generic algorithms |
| `MatrixMut<T>` | Adds `get_mut(row, col)` | In-place decompositions |
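The `MatrixRef` bound is what lets one algorithm serve both storage types. A sketch of a generic diagonal sum; the dimension is passed explicitly so only the documented `get(row, col)` is used, and `get` is assumed to return `T` by value:
```rust
use numeris::traits::{MatrixRef, Scalar};
/// Sum of the first n diagonal entries, generic over storage.
fn diag_sum<T: Scalar, M: MatrixRef<T>>(m: &M, n: usize) -> T {
    let mut acc = T::zero();
    for i in 0..n {
        acc = acc + m.get(i, i); // documented read-only accessor
    }
    acc
}
```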
## Module plan
Checked items are implemented; unchecked are potential future work.
- [x] **matrix** — Fixed-size matrix (stack-allocated, const-generic dimensions), size aliases up to 6×6
- [x] **linalg** — LU, Cholesky, QR, SVD decompositions; symmetric eigendecomposition; real Schur decomposition; solvers, inverse, determinant; complex support
- [x] **quaternion** — Unit quaternion for rotations (SLERP, Euler, axis-angle, rotation matrices)
- [x] **ode** — ODE integration (RK4, 7 adaptive solvers, dense output, RODAS4 stiff solver)
- [x] **dynmatrix** — Heap-allocated runtime-sized matrix/vector (`alloc` feature)
- [ ] **interp** — Interpolation (linear, cubic spline, Hermite)
- [x] **optim** — Optimization (Brent, Newton, BFGS, Gauss-Newton, Levenberg-Marquardt)
- [ ] **quad** — Numerical quadrature / integration
- [ ] **fft** — Fast Fourier Transform
- [ ] **special** — Special functions (Bessel, gamma, erf, etc.)
- [ ] **stats** — Statistics and distributions
- [ ] **poly** — Polynomial operations and root-finding
- [x] **control** — Digital IIR filters (Butterworth, Chebyshev), PID controllers, state-space systems, discrete-time control (ZOH, Tustin bilinear transform)
## Performance
numeris is designed for two use cases: no-std embedded systems and high-performance desktop/server computing.
**SIMD acceleration** is always-on for f32/f64 — no feature flag needed. On aarch64 (NEON) and x86_64 (SSE2/AVX/AVX-512), matrix multiply, dot products, and element-wise operations use hardware SIMD intrinsics via `core::arch`. Matrix multiply uses register-blocked micro-kernels (inspired by [nano-gemm](https://github.com/sarah-quinones/nano-gemm) by Sarah Quinones) that accumulate MR×NR tiles in SIMD registers across the full k-loop and write each C tile exactly once, reducing output memory traffic by a factor of O(n) versus a naive kernel that updates C on every k-iteration. Integer and complex types fall back to scalar loops at zero cost (dead-code eliminated at monomorphization via `TypeId` dispatch).
SSE2 and NEON are always-on baselines. AVX and AVX-512 are compile-time opt-in via `-C target-cpu=native`; dispatch selects the widest available ISA.
| Architecture | ISA | f64 tile (MR×NR) | f32 tile (MR×NR) |
|---|---|---|---|
| aarch64 | NEON (128-bit) | 4×4 | 8×4 |
| x86_64 | SSE2 (128-bit) | 4×4 | 8×4 |
| x86_64 | AVX (256-bit) | 8×4 | 16×4 |
| x86_64 | AVX-512 (512-bit) | 16×4 | 32×4 |
| other | scalar fallback | 4×4 | 4×4 |
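To compile in the AVX/AVX-512 kernels on x86_64, opt in at build time:
```bash
# Select the widest ISA the host CPU supports
RUSTFLAGS="-C target-cpu=native" cargo build --release
```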
## Design decisions
- **Stack-allocated**: `[[T; N]; M]` storage, no heap. Dimensions are const generics.
- **Heap-allocated**: `DynMatrix` uses `Vec<T>` for runtime dimensions, behind `alloc` feature.
- **`num-traits`**: Generic numeric bounds with `default-features = false`.
- **In-place algorithms**: Decompositions operate on `&mut impl MatrixMut<T>`, avoiding allocator/storage trait complexity. Both `Matrix` and `DynMatrix` implement `MatrixMut`, so the same free functions work for both (see the sketch after this list).
- **Integer matrices**: Work with `Scalar` (all basic ops). Float-only operations (`det`, norms, decompositions) require `LinalgScalar` or `FloatScalar`.
- **Complex support**: Additive, behind a feature flag. Zero cost for real-only usage.
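The in-place design is easiest to see with one free function driven across both storage types; a sketch with `cholesky_in_place`, assuming it returns a `Result` for the positive-definiteness check:
```rust
use numeris::linalg::cholesky_in_place;
use numeris::{DynMatrix, Matrix};
// Same free function, fixed and dynamic storage (both implement MatrixMut)
let mut fixed = Matrix::new([[4.0_f64, 2.0], [2.0, 3.0]]);
cholesky_in_place(&mut fixed).unwrap();
let mut dynamic = DynMatrix::from_rows(2, 2, &[4.0_f64, 2.0, 2.0, 3.0]);
cholesky_in_place(&mut dynamic).unwrap();
```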
## Acknowledgments
The register-blocked SIMD matrix multiply micro-kernels are inspired by [nano-gemm](https://github.com/sarah-quinones/nano-gemm) and [faer](https://github.com/sarah-quinones/faer-rs) by Sarah Quinones.
## License
MIT