numeris

Pure-Rust numerical algorithms library, no-std compatible. Similar in scope to SciPy, suitable for embedded targets (no heap allocation, no FPU assumptions) while being highly performant on desktop/server hardware via SIMD intrinsics.

Alpha software — APIs are unstable and may change without notice.

Features

Fixed-size matrices — stack-allocated, const-generic Matrix<T, M, N>
Dynamic matrices — heap-allocated DynMatrix<T> with runtime dimensions (optional alloc feature)
Linear algebra — LU, Cholesky, QR, SVD decompositions; symmetric eigendecomposition; real Schur decomposition
ODE integration — fixed-step RK4, 7 adaptive Runge-Kutta solvers with dense output, RODAS4 stiff solver
Optimization — root finding (Brent, Newton), BFGS minimization, Gauss-Newton and Levenberg-Marquardt least squares
Complex number support — all decompositions work with Complex<f32> / Complex<f64> (optional feature)
Quaternions — unit quaternion rotations, SLERP, Euler angles, rotation matrices
Norms — L1, L2, Frobenius, infinity, one norms
SIMD acceleration — NEON (aarch64) and SSE2 (x86_64) intrinsics for matmul, dot products, and element-wise ops; zero-cost scalar fallback for integers and unsupported architectures
No-std / embedded — runs without std or heap; float math falls back to software libm

Quick start

[dependencies]
numeris = "0.1"

use numeris::{Matrix, Vector};

// Solve a linear system Ax = b
let a = Matrix::new([
    [2.0_f64, 1.0, -1.0],
    [-3.0, -1.0, 2.0],
    [-2.0, 1.0, 2.0],
]);
let b = Vector::from_array([8.0, -11.0, -3.0]);
let x = a.solve(&b).unwrap(); // x = [2, 3, -1]

// Cholesky decomposition of a symmetric positive-definite matrix
let spd = Matrix::new([[4.0, 2.0], [2.0, 3.0]]);
let chol = spd.cholesky().unwrap();
let inv = chol.inverse();

// QR decomposition and least-squares
let a = Matrix::new([
    [1.0_f64, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
]);
let b = Vector::from_array([1.0, 2.0, 4.0]);
let x = a.qr().unwrap().solve(&b); // least-squares fit

// Symmetric eigendecomposition
let sym = Matrix::new([[4.0_f64, 1.0], [1.0, 3.0]]);
let eig = sym.eig_symmetric().unwrap();
let eigenvalues = eig.eigenvalues();   // sorted ascending
let eigenvectors = eig.eigenvectors(); // columns = eigenvectors

// General eigenvalues via Schur decomposition
let a = Matrix::new([[0.0_f64, -1.0], [1.0, 0.0]]); // 90° rotation
let (re, im) = a.eigenvalues().unwrap(); // eigenvalues ±i

// Quaternion rotation
use numeris::Quaternion;
let q = Quaternion::from_axis_angle(
    &Vector::from_array([0.0, 0.0, 1.0]),
    std::f64::consts::FRAC_PI_2,
);
let v = Vector::from_array([1.0, 0.0, 0.0]);
let rotated = q * v; // [0, 1, 0]

Dynamic matrices

When dimensions aren't known at compile time, use DynMatrix (requires alloc, included with default std feature):

use numeris::{DynMatrix, DynVector};

// Runtime-sized matrix
let a = DynMatrix::from_rows(3, 3, &[
    2.0_f64, 1.0, -1.0,
    -3.0, -1.0, 2.0,
    -2.0, 1.0, 2.0,
]);
let b = DynVector::from_slice(&[8.0, -11.0, -3.0]);
let x = a.solve(&b).unwrap();

// Convert between fixed and dynamic
use numeris::Matrix;
let fixed = Matrix::new([[1.0, 2.0], [3.0, 4.0]]);
let dynamic: DynMatrix<f64> = fixed.into();
let back: Matrix<f64, 2, 2> = (&dynamic).try_into().unwrap();

// Mixed arithmetic
let result = fixed * &dynamic; // Matrix * DynMatrix → DynMatrix

Type aliases: DynMatrixf64, DynMatrixf32, DynVectorf64, DynVectorf32, DynMatrixz64 (complex, requires complex feature), etc.

ODE integration

Fixed-step RK4 and 7 adaptive Runge-Kutta solvers with embedded error estimation and dense output:

use numeris::ode::{RKAdaptive, RKTS54, AdaptiveSettings};
use numeris::Vector;

// Harmonic oscillator: y'' = -y
let y0 = Vector::from_array([1.0_f64, 0.0]);
let tau = 2.0 * std::f64::consts::PI;
let sol = RKTS54::integrate(
    0.0, tau, &y0,
    |_t, y| Vector::from_array([y[1], -y[0]]),
    &AdaptiveSettings::default(),
).unwrap();
assert!((sol.y[0] - 1.0).abs() < 1e-6); // cos(2π) ≈ 1

Explicit solvers

Solver	Stages	Order	FSAL	Interpolant
`RKF45`	6	5(4)	no	—
`RKTS54`	7	5(4)	yes	4th degree
`RKV65`	10	6(5)	no	6th degree
`RKV87`	17	8(7)	no	7th degree
`RKV98`	21	9(8)	no	8th degree
`RKV98NoInterp`	16	9(8)	no	—
`RKV98Efficient`	26	9(8)	no	9th degree

Stiff solvers (Rosenbrock)

For stiff ODEs (chemical kinetics, circuit simulation, orbital mechanics with drag), use RODAS4 — a linearly-implicit Rosenbrock method that solves linear systems involving the Jacobian instead of nonlinear Newton iterations:

use numeris::ode::{Rosenbrock, RODAS4, AdaptiveSettings};
use numeris::{Vector, Matrix};

// Stiff decay: y' = -1000*y, y(0) = 1
let y0 = Vector::from_array([1.0_f64]);
let sol = RODAS4::integrate(
    0.0, 0.01, &y0,
    |_t, y| Vector::from_array([-1000.0 * y[0]]),
    |_t, _y| Matrix::new([[-1000.0]]),  // Jacobian ∂f/∂y
    &AdaptiveSettings::default(),
).unwrap();
// Or use integrate_auto for automatic finite-difference Jacobian

Solver	Stages	Order	L-stable
`RODAS4`	6	4(3)	yes

Optimization

Root finding, unconstrained minimization, and nonlinear least squares (requires optim feature):

[dependencies]
numeris = { version = "0.1", features = ["optim"] }

use numeris::optim::{brent, minimize_bfgs, least_squares_lm, RootSettings, BfgsSettings, LmSettings};
use numeris::{Matrix, Vector};

// Root finding: solve x² - 2 = 0
let root = brent(|x| x * x - 2.0, 0.0, 2.0, &RootSettings::default()).unwrap();
assert!((root.x - std::f64::consts::SQRT_2).abs() < 1e-12);

// BFGS minimization: minimize (x-1)² + (y-2)²
let min = minimize_bfgs(
    |x: &Vector<f64, 2>| (x[0] - 1.0).powi(2) + (x[1] - 2.0).powi(2),
    |x: &Vector<f64, 2>| Vector::from_array([2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0)]),
    &Vector::from_array([0.0, 0.0]),
    &BfgsSettings::default(),
).unwrap();
assert!((min.x[0] - 1.0).abs() < 1e-6);

// Levenberg-Marquardt: fit y = a * exp(b * x)
let t = [0.0_f64, 1.0, 2.0, 3.0, 4.0];
let y = [2.0, 2.7, 3.65, 4.95, 6.7];
let fit = least_squares_lm(
    |x: &Vector<f64, 2>| {
        let mut r = Vector::<f64, 5>::zeros();
        for i in 0..5 { r[i] = x[0] * (x[1] * t[i]).exp() - y[i]; }
        r
    },
    |x: &Vector<f64, 2>| {
        let mut j = Matrix::<f64, 5, 2>::zeros();
        for i in 0..5 {
            let e = (x[1] * t[i]).exp();
            j[(i, 0)] = e;
            j[(i, 1)] = x[0] * t[i] * e;
        }
        j
    },
    &Vector::from_array([1.0, 0.1]),
    &LmSettings::default(),
).unwrap();
assert!(fit.cost < 0.1);

Algorithm	Function	Use case
Brent's method	`brent`	Bracketed scalar root finding
Newton's method	`newton_1d`	Scalar root finding with derivative
BFGS	`minimize_bfgs`	Unconstrained minimization
Gauss-Newton	`least_squares_gn`	Nonlinear least squares (QR-based)
Levenberg-Marquardt	`least_squares_lm`	Nonlinear least squares (damped)

Finite-difference utilities: finite_difference_gradient and finite_difference_jacobian for when analytical derivatives aren't available.

Digital IIR filters

Biquad cascade IIR filters with Butterworth and Chebyshev Type I design (requires control feature):

[dependencies]
numeris = { version = "0.1", features = ["control"] }

use numeris::control::{butterworth_lowpass, chebyshev1_lowpass, BiquadCascade};

// 4th-order Butterworth lowpass at 1 kHz, 8 kHz sample rate (N=2 biquad sections)
let mut lpf: BiquadCascade<f64, 2> = butterworth_lowpass(4, 1000.0, 8000.0).unwrap();
let y = lpf.tick(1.0); // filter one sample

// Bulk processing
let input = [1.0, 0.5, -0.3, 0.8];
let mut output = [0.0; 4];
lpf.reset();
lpf.process(&input, &mut output);

// Chebyshev Type I: steeper rolloff with 1 dB passband ripple
let mut cheb: BiquadCascade<f64, 2> = chebyshev1_lowpass(4, 1.0, 1000.0, 8000.0).unwrap();

Design	Lowpass	Highpass
Butterworth	`butterworth_lowpass`	`butterworth_highpass`
Chebyshev Type I	`chebyshev1_lowpass`	`chebyshev1_highpass`

All filters are no-std compatible, use no complex number arithmetic, and work with both f32 and f64.

PID controller

Discrete-time PID controller with trapezoidal integration, derivative-on-measurement (no derivative kick), optional derivative low-pass filter, and anti-windup via back-calculation:

use numeris::control::Pid;

// PID controller at 1 kHz with output clamping
let mut pid = Pid::new(2.0_f64, 0.5, 0.1, 0.001)
    .with_output_limits(-10.0, 10.0)
    .with_derivative_filter(0.01); // first-order LPF on derivative

// Control loop
let setpoint = 1.0;
let mut measurement = 0.0;
for _ in 0..1000 {
    let u = pid.tick(setpoint, measurement);
    measurement += 0.001 * (-measurement + u); // simple plant
}
assert!((measurement - setpoint).abs() < 0.01);

Complex matrices

Enable the complex feature to use decompositions with complex elements:

[dependencies]
numeris = { version = "0.1", features = ["complex"] }

use numeris::{Complex, Matrix, Vector};

type C = Complex<f64>;

let a = Matrix::new([
    [C::new(2.0, 1.0), C::new(1.0, -1.0)],
    [C::new(1.0, 0.0), C::new(3.0, 2.0)],
]);
let b = Vector::from_array([C::new(5.0, 3.0), C::new(7.0, 4.0)]);
let x = a.solve(&b).unwrap();

// Hermitian positive-definite Cholesky (A = L * L^H)
let hpd = Matrix::new([
    [C::new(4.0, 0.0), C::new(2.0, 1.0)],
    [C::new(2.0, -1.0), C::new(5.0, 0.0)],
]);
let chol = hpd.cholesky().unwrap();

Complex support adds zero overhead to real-valued code paths. The LinalgScalar trait methods (modulus, conj, etc.) are #[inline] identity functions for f32/f64, fully erased by the compiler.

Cargo features

Feature	Default	Description
`std`	yes	Implies `alloc`. Uses hardware FPU via system libm.
`alloc`	via `std`	Enables `DynMatrix` / `DynVector` (heap-allocated, runtime-sized).
`ode`	yes	ODE integration (RK4, adaptive solvers).
`optim`	no	Optimization (root finding, BFGS, Gauss-Newton, LM).
`control`	no	Digital IIR filters (Butterworth, Chebyshev Type I).
`libm`	baseline	Pure-Rust software float math. Always available as fallback.
`complex`	no	Adds `Complex<f32>` / `Complex<f64>` support via `num-complex`.
`all`	no	All features: `std` + `ode` + `optim` + `control` + `complex`.

# Default (std + ode)
cargo build

# All features
cargo build --features all

# No-std for embedded
cargo build --no-default-features --features libm

# No-std with dynamic matrices
cargo build --no-default-features --features "libm,alloc"

# With optimization and complex support
cargo build --features "optim,complex"

Module overview

`matrix` — Fixed-size matrix

Matrix<T, M, N> with [[T; N]; M] row-major storage.

Arithmetic: +, -, * (matrix and scalar), negation, element-wise multiply/divide
Indexing: m[(i, j)], row/column access, block extraction and insertion
Square: trace, det, diag, from_diag, pow, is_symmetric, eye
Norms: frobenius_norm, norm_inf, norm_one
Utilities: transpose, from_fn, map, swap_rows, swap_cols, sum, abs
Iteration: iter(), iter_mut(), as_slice(), IntoIterator

Vectors are type aliases: Vector<T, N> = Matrix<T, 1, N> (row vector), ColumnVector<T, N> = Matrix<T, N, 1>.

Vector-specific: dot, cross, outer, norm, norm_l1, normalize.

Size aliases

Convenience aliases are provided for common sizes (all re-exported from the crate root):

Square	Rectangular (examples)	Vectors
`Matrix1` .. `Matrix6`	`Matrix2x3`, `Matrix3x4`, `Matrix4x6`, ...	`Vector1` .. `Vector6`
	All M×N combinations for M,N ∈ 1..6, M≠N	`ColumnVector1` .. `ColumnVector6`

use numeris::{Matrix3, Matrix4x3, Vector3};

let rotation: Matrix3<f64> = Matrix3::eye();
let points: Matrix4x3<f64> = Matrix4x3::zeros(); // 4 rows, 3 cols
let v: Vector3<f64> = Vector3::from_array([1.0, 2.0, 3.0]);

`dynmatrix` — Dynamic matrix (requires `alloc`)

DynMatrix<T> with Vec<T> row-major storage and runtime dimensions.

Same arithmetic, norms, block ops, and utilities as fixed Matrix
Mixed ops: Matrix * DynMatrix, DynMatrix + Matrix, etc. → DynMatrix
DynVector<T> newtype with single-index access and dot
Conversions: From<Matrix> → DynMatrix, TryFrom<&DynMatrix> → Matrix
Full linalg: DynLu, DynCholesky, DynQr, DynSvd, DynSymmetricEigen, DynSchur wrappers + convenience methods

Type aliases: DynMatrixf64, DynMatrixf32, DynVectorf64, DynVectorf32, DynMatrixi32, DynMatrixi64, DynMatrixu32, DynMatrixu64, DynMatrixz64, DynMatrixz32 (complex).

`linalg` — Decompositions

All decompositions provide both free functions (operating on &mut impl MatrixMut<T>) and wrapper structs with solve(), inverse(), det().

Decomposition	Free function	Fixed struct	Dynamic struct	Notes
LU	`lu_in_place`	`LuDecomposition`	`DynLu`	Partial pivoting
Cholesky	`cholesky_in_place`	`CholeskyDecomposition`	`DynCholesky`	A = LL^H (Hermitian)
QR	`qr_in_place`	`QrDecomposition`	`DynQr`	Householder reflections, least-squares
SVD	`bidiagonalize` + `bidiagonal_qr`	`SvdDecomposition`	`DynSvd`	Rectangular M×N, singular values, rank
Symmetric Eigen	`tridiagonalize` + `tridiagonal_qr_*`	`SymmetricEigen`	`DynSymmetricEigen`	Real eigenvalues, orthogonal eigenvectors
Schur	`hessenberg` + `francis_qr`	`SchurDecomposition`	`DynSchur`	Quasi-upper-triangular, general eigenvalues

Convenience methods on Matrix: a.lu(), a.cholesky(), a.qr(), a.svd(), a.solve(&b), a.inverse(), a.det(), a.eig_symmetric(), a.eigenvalues_symmetric(), a.schur(), a.eigenvalues().

Same convenience methods on DynMatrix: a.lu(), a.cholesky(), a.qr(), a.svd(), a.solve(&b), a.inverse(), a.det(), a.eig_symmetric(), a.eigenvalues_symmetric(), a.schur(), a.eigenvalues().

`ode` — ODE integration

Fixed-step rk4 / rk4_step and 7 adaptive Runge-Kutta solvers via the RKAdaptive trait. PI step-size controller (Söderlind & Wang 2006). Dense output / interpolation available for most solvers (gated behind std). For stiff systems, RODAS4 provides an L-stable Rosenbrock method via the Rosenbrock trait — accepts user-supplied or automatic finite-difference Jacobians.

`optim` — Optimization (requires `optim` feature)

Root finding: brent (bracketed, superlinear convergence), newton_1d (with derivative)
Minimization: minimize_bfgs (BFGS quasi-Newton with Armijo line search)
Least squares: least_squares_gn (Gauss-Newton via QR), least_squares_lm (Levenberg-Marquardt via damped normal equations)
Finite differences: finite_difference_gradient, finite_difference_jacobian for numerical derivatives
All algorithms use FloatScalar bound (real-valued), configurable via settings structs with Default impls for f32 and f64

`control` — Digital filters and controllers (requires `control` feature)

Biquad cascade filters designed via the bilinear transform, and a discrete-time PID controller. No complex feature dependency.

Biquad<T> — single second-order section, Direct Form II Transposed
BiquadCascade<T, N> — cascade of N biquad sections (filter order ≤ 2N)
Design functions: butterworth_lowpass, butterworth_highpass, chebyshev1_lowpass, chebyshev1_highpass
Supports arbitrary even/odd filter orders; odd-order uses degenerate first-order last section
tick, process, process_inplace for sample-by-sample or bulk filtering
Pid<T> — PID controller with derivative filtering, output clamping, anti-windup back-calculation

`quaternion` — Unit quaternion rotations

Quaternion<T> with scalar-first convention [w, x, y, z].

Construction: new, identity, from_axis_angle, from_euler, from_rotation_matrix, rotx, roty, rotz
Operations: * (Hamilton product), * Vector3 (rotation), conjugate, inverse, normalize, slerp
Conversion: to_rotation_matrix, to_axis_angle, to_euler

`traits` — Element traits

Trait	Bounds	Used by
`Scalar`	`Copy + PartialEq + Debug + Zero + One + Num`	All matrix ops
`FloatScalar`	`Scalar + Float + LinalgScalar<Real=Self>`	Quaternions, ordered comparisons
`LinalgScalar`	`Scalar` + `modulus`, `conj`, `re`, `lsqrt`, `lln`, `from_real`	Decompositions, norms
`MatrixRef<T>`	Read-only `get(row, col)`	Generic algorithms
`MatrixMut<T>`	Adds `get_mut(row, col)`	In-place decompositions

Module plan

Checked items are implemented; unchecked are potential future work.

matrix — Fixed-size matrix (stack-allocated, const-generic dimensions), size aliases up to 6×6
linalg — LU, Cholesky, QR, SVD decompositions; symmetric eigendecomposition; real Schur decomposition; solvers, inverse, determinant; complex support
quaternion — Unit quaternion for rotations (SLERP, Euler, axis-angle, rotation matrices)
ode — ODE integration (RK4, 7 adaptive solvers, dense output, RODAS4 stiff solver)
dynmatrix — Heap-allocated runtime-sized matrix/vector (alloc feature)
interp — Interpolation (linear, cubic spline, Hermite)
optim — Optimization (Brent, Newton, BFGS, Gauss-Newton, Levenberg-Marquardt)
quad — Numerical quadrature / integration
fft — Fast Fourier Transform
special — Special functions (Bessel, gamma, erf, etc.)
stats — Statistics and distributions
poly — Polynomial operations and root-finding
control — Digital IIR filters (Butterworth, Chebyshev), PID controllers, state-space systems, discrete-time control (ZOH, Tustin bilinear transform)

Performance

numeris is designed for two use cases: no-std embedded systems and high-performance desktop/server computing.

SIMD acceleration is always-on for f32/f64 — no feature flag needed. On aarch64 (NEON) and x86_64 (SSE2/AVX/AVX-512), matrix multiply, dot products, and element-wise operations use hardware SIMD intrinsics via core::arch. Matrix multiply uses register-blocked micro-kernels (inspired by nano-gemm by Sarah Quinones) that accumulate MR×NR tiles in SIMD registers across the full k-loop, writing C only once — reducing memory traffic by up to O(n) vs. naive implementations. Integer and complex types fall back to scalar loops at zero cost (dead-code eliminated at monomorphization via TypeId dispatch).

SSE2 and NEON are always-on baselines. AVX and AVX-512 are compile-time opt-in via -C target-cpu=native; dispatch selects the widest available ISA.

Architecture	ISA	f64 tile (MR×NR)	f32 tile (MR×NR)
aarch64	NEON (128-bit)	4×4	8×4
x86_64	SSE2 (128-bit)	4×4	8×4
x86_64	AVX (256-bit)	8×4	16×4
x86_64	AVX-512 (512-bit)	16×4	32×4
other	scalar fallback	4×4	4×4

Design decisions

Stack-allocated: [[T; N]; M] storage, no heap. Dimensions are const generics.
Heap-allocated: DynMatrix uses Vec<T> for runtime dimensions, behind alloc feature.
num-traits: Generic numeric bounds with default-features = false.
In-place algorithms: Decompositions operate on &mut impl MatrixMut<T>, avoiding allocator/storage trait complexity. Both Matrix and DynMatrix implement MatrixMut, so the same free functions work for both.
Integer matrices: Work with Scalar (all basic ops). Float-only operations (det, norms, decompositions) require LinalgScalar or FloatScalar.
Complex support: Additive, behind a feature flag. Zero cost for real-only usage.

Acknowledgments

The register-blocked SIMD matrix multiply micro-kernels are inspired by nano-gemm and faer by Sarah Quinones.

License

MIT

numeris 0.2.2