Numina - Backend-Agnostic Array Library for Rust

A safe, efficient array library with ndarray-compatible operations, designed as the foundation for high-performance computing backends in Rust.

Features

Safe & Ergonomic: Memory-safe array operations with Rust's guarantees
Type Safe: Runtime shape and data type validation
Backend Agnostic: NdArray trait enables multiple backends (CPU, GPU, remote)
Extensible Types: Support for custom data types (BFloat16, quantized types)
Zero Dependencies: Pure Rust implementation

Quick Start

use numina::{Array, Shape, add, matmul, sum, F32};

// Create arrays
let a = Array::from_slice(&[1.0f32, 2.0, 3.0, 4.0], Shape::from([2, 2]))?;
let b = Array::from_slice(&[5.0f32, 6.0, 7.0, 8.0], Shape::from([2, 2]))?;

// Operations work on any NdArray backend
let c = add(&a, &b)?;           // Element-wise addition
let d = matmul(&a, &b)?;        // Matrix multiplication
let total = sum(&a, None)?;     // Sum all elements
let row_sums = sum(&a, Some(1))?; // Sum along axis

Core Types

Array<T>: Typed N-dimensional arrays for CPU operations
CpuBytesArray: Byte-based N-dimensional arrays for CPU operations
NdArray: Backend-agnostic trait for all array operations
Shape: Multi-dimensional array dimensions
DType: Data types (f32, f64, i8-i64, u8-u64, bool, custom types)

Design Philosophy: Numina provides the low-level backend infrastructure. High-level tensor APIs (like Tensor types) are provided by dependent crates (for example, laminax-types) which build upon Numina's NdArray trait.

DType ID Table

Stable dtype IDs are serialized in Lamina IR and Laminax runtime. IDs are explicit and frozen.

FP8 formats follow the E4M3FN (finite-only, max 448) and E5M2 (Inf/NaN, max 57344) conventions with round-to-nearest-even (RTNE) rounding.

Name	DType	ID	Bytes	Storage Bits	Align
float16	`F16`	1	2	16	2
float32	`F32`	2	4	32	4
float64	`F64`	3	8	64	8
bfloat16	`BF16`	4	2	16	2
bfloat8	`BF8`	5	1	8	1
float8_e4m3fn	`F8E4M3FN`	6	1	8	1
float8_e5m2	`F8E5M2`	7	1	8	1
complex32	`Complex32`	50	4	32	2
complex64	`Complex64`	51	8	64	4
complex128	`Complex128`	52	16	128	8
int8	`I8`	10	1	8	1
int16	`I16`	11	2	16	2
int32	`I32`	12	4	32	4
int64	`I64`	13	8	64	8
uint8	`U8`	20	1	8	1
uint16	`U16`	21	2	16	2
uint32	`U32`	22	4	32	4
uint64	`U64`	23	8	64	8
bool	`Bool`	30	1	8	1
quantized_i4	`QI4`	40	1	4	1
quantized_u8	`QU8`	41	1	8	1

Custom Data Types

use numina::{BFloat16, QuantizedU8, QuantizedI4};

// Brain Float 16
let bf16 = BFloat16::from_f32(3.14159);
assert_eq!(bf16.size_bytes(), 2);

// 8-bit quantized
let scale = 0.01;
let q8 = QuantizedU8::quantize(2.5, scale);
assert!((q8.dequantize(scale) - 2.5).abs() < 0.1);

// 4-bit quantized (2 values per byte)
let q4 = QuantizedI4::pack(3, -2);
assert_eq!(q4.size_bytes(), 1); // 87.5% memory savings!

Multiple Backends

use numina::{Array, CpuBytesArray, Shape, add, F32};

// Different backend implementations
let typed_array = Array::from_slice(&[1.0f32, 2.0], Shape::from([2]))?;
let bytes = [1.0f32, 2.0].iter().flat_map(|&x| x.to_le_bytes()).collect();
let byte_array = CpuBytesArray::new(bytes, Shape::from([2]), F32);

// Same operations work on all backends
let sum1 = add(&typed_array, &byte_array)?;
let sum2 = add(&byte_array, &typed_array)?;

// Cross-backend operations are fully supported
assert_eq!(sum1.shape(), sum2.shape());

Architecture

src/
├── array/           # NdArray trait and CPU implementations
├── dtype/           # Data type system and custom types
├── ops.rs           # Mathematical operations
├── reductions.rs    # Reduction operations
├── sorting.rs       # Sorting and searching
└── lib.rs           # Library interface

Status

Implemented:

Array operations (add, mul, matmul, reductions)
Multiple backends via NdArray trait (Array, CpuBytesArray)
Custom data types (BFloat16, QuantizedU8, QuantizedI4)
Shape manipulation (reshape, transpose)
Sorting and searching operations
49 tests passing

Planned:

Broadcasting, advanced indexing, linear algebra
File I/O, statistics
Memory-mapped arrays
More custom data types (FP8, FP4, NF4)

Integration

Numina serves as one of the core libraries for Laminax, enabling high-performance GPU/CPU computing.

numina 0.0.2