numeris

Pure-Rust numerical algorithms library, no-std compatible. Similar in scope to SciPy, suitable for embedded targets (no heap allocation, no FPU assumptions) while being highly performant on desktop/server hardware via SIMD intrinsics.

Documentation & examples

[!NOTE] Alpha software — APIs are unstable and may change without notice.

Features

Fixed-size matrices — stack-allocated, const-generic Matrix<T, M, N>
Dynamic matrices — heap-allocated DynMatrix<T> with runtime dimensions (optional alloc feature)
nalgebra interop — zero-cost conversions between numeris and nalgebra types (optional nalgebra feature)
Linear algebra — LU, Cholesky, QR, SVD decompositions; symmetric eigendecomposition; real Schur decomposition
ODE integration — fixed-step RK4, 7 adaptive Runge-Kutta solvers with dense output, RODAS4 stiff solver; supports both vector and matrix state
Optimization — root finding (Brent, Newton), BFGS minimization, Gauss-Newton and Levenberg-Marquardt least squares
Complex number support — all decompositions work with Complex<f32> / Complex<f64> (optional feature)
State estimation — EKF, UKF, Square-Root UKF, Cubature Kalman Filter, RTS smoother, batch least-squares
Interpolation — linear, Hermite, barycentric Lagrange, natural cubic spline, bilinear (2D)
Image processing — 2D convolution, Gaussian/box blur, Sobel/Scharr gradients, median/rank filters, morphology (Van Herk), integral image, Otsu/adaptive thresholding, Canny, Harris/Shi-Tomasi corners, Gaussian pyramid
Special functions — gamma, lgamma, digamma, beta, incomplete gamma/beta, erf/erfc
Statistical distributions — Normal, Uniform, Exponential, Gamma, Beta, Chi-squared, Student's t, Bernoulli, Binomial, Poisson
Quaternions — unit quaternion rotations, SLERP, Euler angles, rotation matrices
Norms — L1, L2, Frobenius, infinity, one norms
SIMD acceleration — NEON (aarch64), SSE2/AVX/AVX-512 (x86_64) intrinsics for matmul, dot products, and element-wise ops; zero-cost scalar fallback for integers and unsupported architectures
No-std / embedded — runs without std or heap; float math falls back to software libm

Quick start

[dependencies]
numeris = "0.5"

use numeris::{Matrix, Vector};

// Solve a linear system Ax = b
let a = Matrix::new([
    [2.0_f64, 1.0, -1.0],
    [-3.0, -1.0, 2.0],
    [-2.0, 1.0, 2.0],
]);
let b = Vector::from_array([8.0, -11.0, -3.0]);
let x = a.solve(&b).unwrap(); // x = [2, 3, -1]

// Cholesky decomposition of a symmetric positive-definite matrix
let spd = Matrix::new([[4.0, 2.0], [2.0, 3.0]]);
let chol = spd.cholesky().unwrap();
let inv = chol.inverse();

// QR decomposition and least-squares
let a = Matrix::new([
    [1.0_f64, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
]);
let b = Vector::from_array([1.0, 2.0, 4.0]);
let x = a.qr().unwrap().solve(&b); // least-squares fit

// Symmetric eigendecomposition
let sym = Matrix::new([[4.0_f64, 1.0], [1.0, 3.0]]);
let eig = sym.eig_symmetric().unwrap();
let eigenvalues = eig.eigenvalues();   // sorted ascending
let eigenvectors = eig.eigenvectors(); // columns = eigenvectors

// General eigenvalues via Schur decomposition
let a = Matrix::new([[0.0_f64, -1.0], [1.0, 0.0]]); // 90° rotation
let (re, im) = a.eigenvalues().unwrap(); // eigenvalues ±i

// Quaternion rotation
use numeris::Quaternion;
let q = Quaternion::from_axis_angle(
    &Vector::from_array([0.0, 0.0, 1.0]),
    std::f64::consts::FRAC_PI_2,
);
let v = Vector::from_array([1.0, 0.0, 0.0]);
let rotated = q * v; // [0, 1, 0]

No-std / embedded

numeris works on bare-metal targets with no allocator:

[dependencies]
numeris = { version = "0.5", default-features = false, features = ["libm"] }

#![no_std]
use numeris::{Matrix, Vector};

// All fixed-size operations work without std or alloc
let a = Matrix::new([[2.0_f32, 1.0], [5.0, 3.0]]);
let b = Vector::from_array([4.0_f32, 7.0]);
let x = a.solve(&b).unwrap();

// Eigendecomposition on a microcontroller
let sym = Matrix::new([[3.0_f32, 1.0], [1.0, 2.0]]);
let eig = sym.eig_symmetric().unwrap();

Add alloc for DynMatrix on targets with a heap but no OS:

numeris = { version = "0.5", default-features = false, features = ["libm", "alloc"] }

Dynamic matrices

When dimensions aren't known at compile time, use DynMatrix (requires alloc, included with default std feature):

use numeris::{DynMatrix, DynVector};

// Runtime-sized matrix
let a = DynMatrix::from_rows(3, 3, &[
    2.0_f64, 1.0, -1.0,
    -3.0, -1.0, 2.0,
    -2.0, 1.0, 2.0,
]);
let b = DynVector::from_slice(&[8.0, -11.0, -3.0]);
let x = a.solve(&b).unwrap();

// Convert between fixed and dynamic
use numeris::Matrix;
let fixed = Matrix::new([[1.0, 2.0], [3.0, 4.0]]);
let dynamic: DynMatrix<f64> = fixed.into();
let back: Matrix<f64, 2, 2> = (&dynamic).try_into().unwrap();

// Mixed arithmetic
let result = fixed * &dynamic; // Matrix * DynMatrix → DynMatrix

Type aliases: DynMatrixf64, DynMatrixf32, DynVectorf64, DynVectorf32, DynMatrixz64 (complex, requires complex feature), etc.

ODE integration

Fixed-step RK4 and 7 adaptive Runge-Kutta solvers with embedded error estimation and dense output:

use numeris::ode::{RKAdaptive, RKTS54, AdaptiveSettings};
use numeris::Vector;

// Harmonic oscillator: y'' = -y
let y0 = Vector::from_array([1.0_f64, 0.0]);
let tau = 2.0 * std::f64::consts::PI;
let sol = RKTS54::integrate(
    0.0, tau, &y0,
    |_t, y| Vector::from_array([y[1], -y[0]]),
    &AdaptiveSettings::default(),
).unwrap();
assert!((sol.y[0] - 1.0).abs() < 1e-6); // cos(2π) ≈ 1

Explicit solvers

Solver	Stages	Order	FSAL	Interpolant
`RKF45`	6	5(4)	no	—
`RKTS54`	7	5(4)	yes	4th degree
`RKV65`	10	6(5)	no	6th degree
`RKV87`	13+4	8(7)	no	7th degree
`RKV98`	16+5	9(8)	no	8th degree
`RKV98NoInterp`	16	9(8)	no	—
`RKV98Efficient`	16+10	9(8)	no	9th degree

Stiff solvers (Rosenbrock)

For stiff ODEs (chemical kinetics, circuit simulation, orbital mechanics with drag), use RODAS4 — a linearly-implicit Rosenbrock method that solves linear systems involving the Jacobian instead of nonlinear Newton iterations:

use numeris::ode::{Rosenbrock, RODAS4, AdaptiveSettings};
use numeris::{Vector, Matrix};

// Stiff decay: y' = -1000*y, y(0) = 1
let y0 = Vector::from_array([1.0_f64]);
let sol = RODAS4::integrate(
    0.0, 0.01, &y0,
    |_t, y| Vector::from_array([-1000.0 * y[0]]),
    |_t, _y| Matrix::new([[-1000.0]]),  // Jacobian ∂f/∂y
    &AdaptiveSettings::default(),
).unwrap();
// Or use integrate_auto for automatic finite-difference Jacobian

Solver	Stages	Order	L-stable
`RODAS4`	6	4(3)	yes

Optimization

Root finding, unconstrained minimization, and nonlinear least squares (requires optim feature):

[dependencies]
numeris = { version = "0.5", features = ["optim"] }

use numeris::optim::{brent, minimize_bfgs, least_squares_lm, RootSettings, BfgsSettings, LmSettings};
use numeris::{Matrix, Vector};

// Root finding: solve x² - 2 = 0
let root = brent(|x| x * x - 2.0, 0.0, 2.0, &RootSettings::default()).unwrap();
assert!((root.x - std::f64::consts::SQRT_2).abs() < 1e-12);

// BFGS minimization: minimize (x-1)² + (y-2)²
let min = minimize_bfgs(
    |x: &Vector<f64, 2>| (x[0] - 1.0).powi(2) + (x[1] - 2.0).powi(2),
    |x: &Vector<f64, 2>| Vector::from_array([2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0)]),
    &Vector::from_array([0.0, 0.0]),
    &BfgsSettings::default(),
).unwrap();
assert!((min.x[0] - 1.0).abs() < 1e-6);

// Levenberg-Marquardt: fit y = a * exp(b * x)
let t = [0.0_f64, 1.0, 2.0, 3.0, 4.0];
let y = [2.0, 2.7, 3.65, 4.95, 6.7];
let fit = least_squares_lm(
    |x: &Vector<f64, 2>| {
        let mut r = Vector::<f64, 5>::zeros();
        for i in 0..5 { r[i] = x[0] * (x[1] * t[i]).exp() - y[i]; }
        r
    },
    |x: &Vector<f64, 2>| {
        let mut j = Matrix::<f64, 5, 2>::zeros();
        for i in 0..5 {
            let e = (x[1] * t[i]).exp();
            j[(i, 0)] = e;
            j[(i, 1)] = x[0] * t[i] * e;
        }
        j
    },
    &Vector::from_array([1.0, 0.1]),
    &LmSettings::default(),
).unwrap();
assert!(fit.cost < 0.1);

Algorithm	Function	Use case
Brent's method	`brent`	Bracketed scalar root finding
Newton's method	`newton_1d`	Scalar root finding with derivative
BFGS	`minimize_bfgs`	Unconstrained minimization
Gauss-Newton	`least_squares_gn`	Nonlinear least squares (QR-based)
Levenberg-Marquardt	`least_squares_lm`	Nonlinear least squares (damped)

Finite-difference utilities: finite_difference_gradient and finite_difference_jacobian for when analytical derivatives aren't available.

Digital IIR filters

Biquad cascade IIR filters with Butterworth and Chebyshev Type I design (requires control feature):

[dependencies]
numeris = { version = "0.5", features = ["control"] }

use numeris::control::{butterworth_lowpass, chebyshev1_lowpass, BiquadCascade};

// 4th-order Butterworth lowpass at 1 kHz, 8 kHz sample rate (N=2 biquad sections)
let mut lpf: BiquadCascade<f64, 2> = butterworth_lowpass(4, 1000.0, 8000.0).unwrap();
let y = lpf.tick(1.0); // filter one sample

// Bulk processing
let input = [1.0, 0.5, -0.3, 0.8];
let mut output = [0.0; 4];
lpf.reset();
lpf.process(&input, &mut output);

// Chebyshev Type I: steeper rolloff with 1 dB passband ripple
let mut cheb: BiquadCascade<f64, 2> = chebyshev1_lowpass(4, 1.0, 1000.0, 8000.0).unwrap();

Design	Lowpass	Highpass
Butterworth	`butterworth_lowpass`	`butterworth_highpass`
Chebyshev Type I	`chebyshev1_lowpass`	`chebyshev1_highpass`

All filters are no-std compatible, use no complex number arithmetic, and work with both f32 and f64.

PID controller

Discrete-time PID controller with trapezoidal integration, derivative-on-measurement (no derivative kick), optional derivative low-pass filter, and anti-windup via back-calculation:

use numeris::control::Pid;

// PID controller at 1 kHz with output clamping
let mut pid = Pid::new(2.0_f64, 0.5, 0.1, 0.001)
    .with_output_limits(-10.0, 10.0)
    .with_derivative_filter(0.01); // first-order LPF on derivative

// Control loop
let setpoint = 1.0;
let mut measurement = 0.0;
for _ in 0..1000 {
    let u = pid.tick(setpoint, measurement);
    measurement += 0.001 * (-measurement + u); // simple plant
}
assert!((measurement - setpoint).abs() < 0.01);

State estimation

Six estimators for nonlinear state estimation and offline batch processing (requires estimate feature):

[dependencies]
numeris = { version = "0.5", features = ["estimate"] }

use numeris::estimate::{Ekf, Ukf, SrUkf, Ckf, BatchLsq};
use numeris::{Vector, Matrix};

// EKF: 2-state constant-velocity, 1 measurement (position)
let x0 = Vector::from_array([0.0_f64, 1.0]); // [pos, vel]
let p0 = Matrix::new([[1.0, 0.0], [0.0, 1.0]]);
let mut ekf = Ekf::<f64, 2, 1>::new(x0, p0);

let dt = 0.1;
let q = Matrix::new([[0.01, 0.0], [0.0, 0.01]]);
let r = Matrix::new([[0.5]]);

// Predict with dynamics model + Jacobian
ekf.predict(
    |x| Vector::from_array([x[0] + dt * x[1], x[1]]),
    |_x| Matrix::new([[1.0, dt], [0.0, 1.0]]),
    Some(&q),
);

// Update with position measurement
ekf.update(
    &Vector::from_array([0.12]),
    |x| Vector::from_array([x[0]]),
    |_x| Matrix::new([[1.0, 0.0]]),
    &r,
).unwrap();

// UKF: no Jacobians needed — uses sigma points
let mut ukf = Ukf::<f64, 2, 1>::new(x0, p0);
ukf.predict(
    |x| Vector::from_array([x[0] + dt * x[1], x[1]]),
    Some(&q),
).unwrap();
ukf.update(
    &Vector::from_array([0.12]),
    |x| Vector::from_array([x[0]]),
    &r,
).unwrap();

// CKF: tuning-free sigma-point filter (2N points, equal weights)
let mut ckf = Ckf::<f64, 2, 1>::new(x0, p0);
ckf.predict(
    |x| Vector::from_array([x[0] + dt * x[1], x[1]]),
    Some(&q),
).unwrap();

// Batch least-squares: accumulate linear observations
let mut lsq = BatchLsq::<f64, 1>::new();
let h = Matrix::new([[1.0_f64]]);
let r_lsq = Matrix::new([[0.1]]);
lsq.add_observation(&Vector::from_array([1.05]), &h, &r_lsq).unwrap();
lsq.add_observation(&Vector::from_array([0.95]), &h, &r_lsq).unwrap();
let (x_est, _p_est) = lsq.solve().unwrap();

Filter	Struct	Jacobians	No-std
Extended Kalman	`Ekf<T, N, M>`	User-supplied or finite-difference	Yes
Unscented Kalman	`Ukf<T, N, M>`	Not needed (sigma points)	No (`alloc`)
Square-Root UKF	`SrUkf<T, N, M>`	Not needed (propagates Cholesky factor)	No (`alloc`)
Cubature Kalman	`Ckf<T, N, M>`	Not needed (2N cubature points)	No (`alloc`)
RTS Smoother	`rts_smooth()`	Stored from EKF forward pass	No (`alloc`)
Batch Least-Squares	`BatchLsq<T, N>`	Linear H matrix	Yes

All support f32 and f64. Process noise Q is optional (None for zero noise).

Complex matrices

Enable the complex feature to use decompositions with complex elements:

[dependencies]
numeris = { version = "0.5", features = ["complex"] }

use numeris::{Complex, Matrix, Vector};

type C = Complex<f64>;

let a = Matrix::new([
    [C::new(2.0, 1.0), C::new(1.0, -1.0)],
    [C::new(1.0, 0.0), C::new(3.0, 2.0)],
]);
let b = Vector::from_array([C::new(5.0, 3.0), C::new(7.0, 4.0)]);
let x = a.solve(&b).unwrap();

// Hermitian positive-definite Cholesky (A = L * L^H)
let hpd = Matrix::new([
    [C::new(4.0, 0.0), C::new(2.0, 1.0)],
    [C::new(2.0, -1.0), C::new(5.0, 0.0)],
]);
let chol = hpd.cholesky().unwrap();

Complex support adds zero overhead to real-valued code paths. The LinalgScalar trait methods (modulus, conj, etc.) are #[inline] identity functions for f32/f64, fully erased by the compiler.

Cargo features

Feature	Default	Description
`std`	yes	Implies `alloc`. Uses hardware FPU via system libm.
`alloc`	via `std`	Enables `DynMatrix` / `DynVector` (heap-allocated, runtime-sized).
`ode`	no	ODE integration (RK4, adaptive solvers).
`optim`	no	Optimization (root finding, BFGS, Gauss-Newton, LM).
`control`	no	Digital IIR filters, PID controller, lead/lag compensators, PID tuning.
`estimate`	no	State estimation (EKF, UKF, SR-UKF, CKF, RTS, batch LSQ). Implies `alloc`.
`interp`	no	Interpolation (linear, Hermite, barycentric Lagrange, cubic spline, bilinear).
`imageproc`	no	2D image processing on `DynMatrix` (filters, morphology, integral image, Canny, corners, geometric). Implies `alloc`.
`special`	no	Special functions (gamma, beta, erf, incomplete gamma/beta, digamma).
`stats`	no	Statistical distributions (Normal, Gamma, Beta, etc.). Implies `special`.
`libm`	baseline	Pure-Rust software float math. Always available as fallback.
`complex`	no	Adds `Complex<f32>` / `Complex<f64>` support via `num-complex`.
`nalgebra`	no	Conversions between numeris and nalgebra types (`From`/`Into`, `MatrixRef`/`MatrixMut`).
`serde`	no	Serialize/deserialize all types via serde. Row-major JSON format.
`all`	no	All features.

# Default (std only — matrix, linalg, quaternion)
cargo build

# All features
cargo build --features all

# No-std for embedded
cargo build --no-default-features --features libm

# No-std with dynamic matrices
cargo build --no-default-features --features "libm,alloc"

# With optimization and complex support
cargo build --features "optim,complex"

Benchmarks

Measured on Apple Silicon (aarch64, NEON) with Criterion. Compared against nalgebra and faer. Run with cd bench && cargo bench.

Benchmark	numeris	nalgebra	faer
matmul 4x4	6.7 ns	4.9 ns	56 ns
matmul 6x6	20.5 ns	19.8 ns	86 ns
matmul 50x50 (dyn)	5.69 us	6.64 us	6.10 us
matmul 200x200 (dyn)	306 us	293 us	179 us
dot 100 (dyn)	11.8 ns	12.3 ns	—
LU 4x4	32.7 ns	28.2 ns	206 ns
LU 6x6	83.6 ns	82.0 ns	303 ns
QR 4x4	46.0 ns	90.6 ns	299 ns
QR 6x6	83.4 ns	208 ns	439 ns
SVD 4x4	449 ns	459 ns	1.26 us
SVD 6x6	607 ns	915 ns	1.85 us
Cholesky 4x4	25.1 ns	11.5 ns	130 ns
Eigen sym 4x4	178 ns	202 ns	607 ns
Eigen sym 6x6	303 ns	535 ns	1.08 us

numeris is competitive with nalgebra at small fixed sizes and significantly faster for QR (2x), SVD (1.5x), and eigendecomposition (1.8x). faer excels at large dynamic matrices (200x200+) due to cache-aware A+B panel packing.

Module overview

Matrix<T, M, N> with [[T; M]; N] column-major storage. Matrix::new() accepts row-major input and transposes internally.

Arithmetic: +, -, * (matrix and scalar), negation, element-wise multiply/divide
Indexing: m[(i, j)], row/column access, block extraction and insertion
Square: trace, det, diag, from_diag, pow, is_symmetric, eye
Norms: frobenius_norm, norm_inf, norm_one
Utilities: transpose, from_fn, map, swap_rows, swap_cols, sum, abs
Iteration: iter(), iter_mut(), as_slice(), IntoIterator

Vector<T, N> = Matrix<T, N, 1> — a column vector, matching nalgebra's convention. Single-index access: v[i].

Vector-specific: dot, cross, outer, norm, norm_l1, normalize.

Size aliases

Convenience aliases are provided for common sizes (all re-exported from the crate root):

Square	Rectangular (examples)	Vectors
`Matrix1` .. `Matrix6`	`Matrix2x3`, `Matrix3x4`, `Matrix4x6`, ...	`Vector1` .. `Vector6`
	All M×N combinations for M,N ∈ 1..6, M≠N

use numeris::{Matrix3, Matrix4x3, Vector3};

let rotation: Matrix3<f64> = Matrix3::eye();
let points: Matrix4x3<f64> = Matrix4x3::zeros(); // 4 rows, 3 cols
let v: Vector3<f64> = Vector3::from_array([1.0, 2.0, 3.0]);

DynMatrix<T> with Vec<T> column-major storage and runtime dimensions. from_rows() accepts row-major data (transposes internally); from_slice() accepts column-major data directly.

Same arithmetic, norms, block ops, and utilities as fixed Matrix
Mixed ops: Matrix * DynMatrix, DynMatrix + Matrix, etc. → DynMatrix
DynVector<T> newtype with single-index access and dot
Conversions: From<Matrix> → DynMatrix, TryFrom<&DynMatrix> → Matrix
Full linalg: DynLu, DynCholesky, DynQr, DynSvd, DynSymmetricEigen, DynSchur wrappers + convenience methods

Type aliases: DynMatrixf64, DynMatrixf32, DynVectorf64, DynVectorf32, DynMatrixi32, DynMatrixi64, DynMatrixu32, DynMatrixu64, DynMatrixz64, DynMatrixz32 (complex).

All decompositions provide both free functions (operating on &mut impl MatrixMut<T>) and wrapper structs with solve(), inverse(), det().

Decomposition	Free function	Fixed struct	Dynamic struct	Notes
LU	`lu_in_place`	`LuDecomposition`	`DynLu`	Partial pivoting
Cholesky	`cholesky_in_place`	`CholeskyDecomposition`	`DynCholesky`	A = LL^H (Hermitian)
QR	`qr_in_place`	`QrDecomposition`	`DynQr`	Householder reflections, least-squares
SVD	`bidiagonalize` + `bidiagonal_qr`	`SvdDecomposition`	`DynSvd`	Rectangular M×N, singular values, rank
Symmetric Eigen	`tridiagonalize` + `tridiagonal_qr_*`	`SymmetricEigen`	`DynSymmetricEigen`	Real eigenvalues, orthogonal eigenvectors
Schur	`hessenberg` + `francis_qr`	`SchurDecomposition`	`DynSchur`	Quasi-upper-triangular, general eigenvalues

Convenience methods on Matrix: a.lu(), a.cholesky(), a.qr(), a.svd(), a.solve(&b), a.inverse(), a.det(), a.eig_symmetric(), a.eigenvalues_symmetric(), a.schur(), a.eigenvalues().

Same convenience methods on DynMatrix: a.lu(), a.cholesky(), a.qr(), a.svd(), a.solve(&b), a.inverse(), a.det(), a.eig_symmetric(), a.eigenvalues_symmetric(), a.schur(), a.eigenvalues().

Fixed-step rk4 / rk4_step and 7 adaptive Runge-Kutta solvers via the RKAdaptive trait. PI step-size controller (Söderlind & Wang 2006). Dense output / interpolation available for most solvers (gated behind std). State can be a vector (Vector<T, N>) or a matrix (Matrix<T, M, N>) — enabling matrix ODE integration (e.g., state transition matrices, matrix Riccati equations). For stiff systems, RODAS4 provides an L-stable Rosenbrock method via the Rosenbrock trait — accepts user-supplied or automatic finite-difference Jacobians (vector state only).

Root finding: brent (bracketed, superlinear convergence), newton_1d (with derivative)
Minimization: minimize_bfgs (BFGS quasi-Newton with Armijo line search)
Least squares: least_squares_gn (Gauss-Newton via QR), least_squares_lm (Levenberg-Marquardt via damped normal equations)
Finite differences: finite_difference_gradient, finite_difference_jacobian for numerical derivatives
All algorithms use FloatScalar bound (real-valued), configurable via settings structs with Default impls for f32 and f64

Six estimators with const-generic state (N) and measurement (M) dimensions. Closure-based dynamics and measurement models.

Ekf — Extended Kalman Filter with predict/update (explicit Jacobian) and predict_fd/update_fd (finite-difference). Joseph-form covariance update. Fully no-std.
Ukf — Unscented Kalman Filter with Merwe-scaled sigma points (alpha, beta, kappa). Requires alloc.
SrUkf — Square-Root UKF. Propagates Cholesky factor S (where P = SSᵀ) for guaranteed positive-definiteness. Requires alloc.
Ckf — Cubature Kalman Filter. Third-degree spherical-radial cubature rule with 2N equally-weighted points. No tuning parameters. Requires alloc.
rts_smooth — Rauch–Tung–Striebel fixed-interval smoother. Takes a forward EKF pass (EkfStep records) and returns smoothed estimates. Requires alloc.
BatchLsq — Linear batch least-squares via information accumulation (Λ = Σ HᵀR⁻¹H). Supports mixed measurement dimensions via method-level const generic. Fully no-std.
Process noise Q is optional in predict step (Some(&q) or None)
FloatScalar bound (real-valued), works with f32 and f64

Biquad cascade filters designed via the bilinear transform, PID controller, lead/lag compensator design, and PID tuning rules. No complex feature dependency.

Biquad<T> — single second-order section, Direct Form II Transposed
BiquadCascade<T, N> — cascade of N biquad sections (filter order ≤ 2N)
Design functions: butterworth_lowpass, butterworth_highpass, chebyshev1_lowpass, chebyshev1_highpass
Supports arbitrary even/odd filter orders; odd-order uses degenerate first-order last section
tick, process, process_inplace for sample-by-sample or bulk filtering
Pid<T> — PID controller with derivative filtering, output clamping, anti-windup back-calculation
lead_compensator / lag_compensator — continuous-to-discrete compensator design via bilinear transform
FopdtModel — FOPDT process model with ziegler_nichols, cohen_coon, simc tuning methods
ziegler_nichols_ultimate — closed-loop ultimate gain PID tuning

Fixed-size (const N, stack-allocated, no-std) and dynamic variants (Dyn*, requires alloc). Out-of-bounds evaluations extrapolate.

Method	Fixed	Dynamic	Notes
Linear	`LinearInterp<T, N>`	`DynLinearInterp<T>`	Piecewise linear
Hermite	`HermiteInterp<T, N>`	`DynHermiteInterp<T>`	User-supplied derivatives
Lagrange	`LagrangeInterp<T, N>`	`DynLagrangeInterp<T>`	Barycentric, O(N) eval
Cubic spline	`CubicSpline<T, N>`	`DynCubicSpline<T>`	Natural BCs, Thomas algorithm
Bilinear	`BilinearInterp<T, NX, NY>`	`DynBilinearInterp<T>`	2D rectangular grid

Operates on DynMatrix<T> buffers (column-major, BorderMode-aware). Convolution inner loops dispatch through the SIMD dot/AXPY kernels on contiguous column slices. Implies alloc.

Filtering & convolution: convolve2d (any MatrixRef kernel), convolve2d_separable, gaussian_blur, box_blur, unsharp_mask, laplacian, laplacian_of_gaussian, sobel_gradients, scharr_gradients, gradient_magnitude
Order-statistic filters: rank_filter, percentile_filter, median_filter (quickselect; 3×3/5×5 stack fast paths), median_filter_u16 (Huang sliding histogram), median_pool, median_pool_upsampled
Morphology (Van Herk - Gil-Werman, ~3 compares/pixel): max_filter, min_filter, dilate, erode, opening, closing, morphology_gradient, top_hat, black_hat
Local statistics via integral image (O(1) per pixel): integral_image, integral_rect_sum, local_mean, local_variance, local_stddev
Multi-scale: difference_of_gaussians, gaussian_pyramid
Thresholding: threshold, threshold_otsu, adaptive_threshold
Edges & corners: canny (Gaussian → Sobel → NMS → hysteresis), harris_corners, shi_tomasi_corners
Geometric: flip_horizontal, flip_vertical, rotate_90 / 180 / 270, pad, crop, resize_nearest, resize_bilinear
BorderMode<T>: Zero / Constant / Replicate / Reflect, works with integer images (e.g. u16)

Fully no-std, generic over FloatScalar (f32/f64).

Function	Description
`gamma(x)`, `lgamma(x)`	Gamma function and log-gamma (Lanczos approximation)
`digamma(x)`	Digamma / psi function (recurrence + asymptotic series)
`beta(a,b)`, `lbeta(a,b)`	Beta function and log-beta
`gamma_inc(a,x)`	Regularized lower incomplete gamma P(a,x)
`gamma_inc_upper(a,x)`	Regularized upper incomplete gamma Q(a,x)
`betainc(a,b,x)`	Regularized incomplete beta I_x(a,b)
`erf(x)`, `erfc(x)`	Error function and complementary error function

Continuous distributions implement ContinuousDistribution (pdf, cdf, quantile, mean, variance). Discrete distributions implement DiscreteDistribution (pmf, cdf, quantile, mean, variance). Implies special feature.

Distribution	Struct	Parameters
Normal	`Normal<T>`	mean μ, std dev σ
Uniform	`Uniform<T>`	bounds [a, b]
Exponential	`Exponential<T>`	rate λ
Gamma	`Gamma<T>`	shape α, rate β
Beta	`Beta<T>`	shape α, shape β
Chi-squared	`ChiSquared<T>`	degrees of freedom k
Student's t	`StudentT<T>`	degrees of freedom ν
Bernoulli	`Bernoulli<T>`	probability p
Binomial	`Binomial<T>`	trials n, probability p
Poisson	`Poisson<T>`	rate λ

Quaternion<T> with scalar-first convention [w, x, y, z].

Construction: new, identity, from_axis_angle, from_euler, from_rotation_matrix, rotx, roty, rotz
Operations: * (Hamilton product), * Vector3 (rotation), conjugate, inverse, normalize, slerp
Conversion: to_rotation_matrix, to_axis_angle, to_euler

Trait	Bounds	Used by
`Scalar`	`Copy + PartialEq + Debug + Zero + One + Num`	All matrix ops
`FloatScalar`	`Scalar + Float + LinalgScalar<Real=Self>`	Quaternions, ordered comparisons
`LinalgScalar`	`Scalar` + `modulus`, `conj`, `re`, `lsqrt`, `lln`, `from_real`	Decompositions, norms
`MatrixRef<T>`	Read-only `get(row, col)`	Generic algorithms
`MatrixMut<T>`	Adds `get_mut(row, col)`	In-place decompositions

Module plan

Checked items are implemented; unchecked are potential future work.

matrix — Fixed-size matrix (stack-allocated, const-generic dimensions), size aliases up to 6×6
linalg — LU, Cholesky, QR, SVD decompositions; symmetric eigendecomposition; real Schur decomposition; solvers, inverse, determinant; complex support
quaternion — Unit quaternion for rotations (SLERP, Euler, axis-angle, rotation matrices)
ode — ODE integration (RK4, 7 adaptive solvers, dense output, RODAS4 stiff solver)
dynmatrix — Heap-allocated runtime-sized matrix/vector (alloc feature)
interp — Interpolation (linear, Hermite, barycentric Lagrange, natural cubic spline)
imageproc — 2D image processing (convolution, filters, morphology, integral image, thresholding, Canny, corners, geometric)
optim — Optimization (Brent, Newton, BFGS, Gauss-Newton, Levenberg-Marquardt)
estimate — State estimation: EKF, UKF, SR-UKF, CKF, RTS smoother, batch least-squares
quad — Numerical quadrature / integration
fft — Fast Fourier Transform
special — Special functions (gamma, lgamma, digamma, beta, lbeta, incomplete gamma/beta, erf, erfc)
stats — Statistical distributions (Normal, Uniform, Exponential, Gamma, Beta, Chi-squared, Student's t, Bernoulli, Binomial, Poisson)
poly — Polynomial operations and root-finding
control — Digital IIR filters (Butterworth, Chebyshev), PID controller, lead/lag compensators, PID tuning (Ziegler-Nichols, Cohen-Coon, SIMC)

Performance

numeris is designed for two use cases: no-std embedded systems and high-performance desktop/server computing.

SIMD acceleration is always-on for f32/f64 — no feature flag needed. On aarch64 (NEON) and x86_64 (SSE2/AVX/AVX-512), matrix multiply, dot products, and element-wise operations use hardware SIMD intrinsics via core::arch. Matrix multiply uses register-blocked micro-kernels (inspired by nano-gemm by Sarah Quinones) that accumulate MR×NR tiles in SIMD registers with k-blocking (KC=256) for cache locality, writing C only once per tile — reducing memory traffic by up to O(n) vs. naive implementations. Small matrices (4x4 and below) use direct formulas for inverse and determinant, bypassing LU decomposition entirely. Integer and complex types fall back to scalar loops at zero cost (dead-code eliminated at monomorphization via TypeId dispatch).

SSE2 and NEON are always-on baselines. AVX and AVX-512 are compile-time opt-in via -C target-cpu=native; dispatch selects the widest available ISA.

Architecture	ISA	f64 tile (MR×NR)	f32 tile (MR×NR)
aarch64	NEON (128-bit)	8×4	8×4
x86_64	SSE2 (128-bit)	4×4	8×4
x86_64	AVX (256-bit)	8×4	16×4
x86_64	AVX-512 (512-bit)	16×4	32×4
other	scalar fallback	4×4	4×4

Design decisions

Column-major storage: [[T; M]; N] (N columns of M rows), matching LAPACK conventions. Stack-allocated, const generics.
Heap-allocated: DynMatrix uses Vec<T> for runtime dimensions, behind alloc feature.
num-traits: Generic numeric bounds with default-features = false.
In-place algorithms: Decompositions operate on &mut impl MatrixMut<T>, avoiding allocator/storage trait complexity. Both Matrix and DynMatrix implement MatrixMut, so the same free functions work for both.
Integer matrices: Work with Scalar (all basic ops). Float-only operations (det, norms, decompositions) require LinalgScalar or FloatScalar.
Complex support: Additive, behind a feature flag. Zero cost for real-only usage.

Acknowledgments

The register-blocked SIMD matrix multiply micro-kernels are inspired by nano-gemm and faer by Sarah Quinones.

License

MIT

numeris 0.5.9