numina 0.0.2 - Docs.rs

# Numina - Backend-Agnostic Array Library for Rust

A safe, efficient array library with ndarray-compatible operations, designed as the foundation for high-performance computing backends in Rust.

## Features

- **Safe & Ergonomic**: Memory-safe array operations with Rust's guarantees
- **Type Safe**: Runtime shape and data type validation
- **Backend Agnostic**: `NdArray` trait enables multiple backends (CPU, GPU, remote)
- **Extensible Types**: Support for custom data types (BFloat16, quantized types)
- **Zero Dependencies**: Pure Rust implementation

## Quick Start

```rust
use numina::{Array, Shape, add, matmul, sum, F32};

// Create arrays
let a = Array::from_slice(&[1.0f32, 2.0, 3.0, 4.0], Shape::from([2, 2]))?;
let b = Array::from_slice(&[5.0f32, 6.0, 7.0, 8.0], Shape::from([2, 2]))?;

// Operations work on any NdArray backend
let c = add(&a, &b)?;           // Element-wise addition
let d = matmul(&a, &b)?;        // Matrix multiplication
let total = sum(&a, None)?;     // Sum all elements
let row_sums = sum(&a, Some(1))?; // Sum along axis
```

## Core Types

- **`Array<T>`**: Typed N-dimensional arrays for CPU operations
- **`CpuBytesArray`**: Byte-based N-dimensional arrays for CPU operations
- **`NdArray`**: Backend-agnostic trait for all array operations
- **`Shape`**: Multi-dimensional array dimensions
- **`DType`**: Data types (f32, f64, i8-i64, u8-u64, bool, custom types)

**Design Philosophy**: Numina provides the low-level backend infrastructure. High-level tensor APIs (like `Tensor` types) are provided by dependent crates (for example, `laminax-types`) which build upon Numina's `NdArray` trait.

## DType ID Table

Stable dtype IDs are serialized in Lamina IR and Laminax runtime. IDs are explicit and frozen.

FP8 formats follow the E4M3FN (finite-only, max 448) and E5M2 (Inf/NaN, max 57344) conventions with round-to-nearest-even (RTNE) rounding.

| Name | DType | ID | Bytes | Storage Bits | Align |
| --- | --- | --- | --- | --- | --- |
| float16 | `F16` | 1 | 2 | 16 | 2 |
| float32 | `F32` | 2 | 4 | 32 | 4 |
| float64 | `F64` | 3 | 8 | 64 | 8 |
| bfloat16 | `BF16` | 4 | 2 | 16 | 2 |
| bfloat8 | `BF8` | 5 | 1 | 8 | 1 |
| float8_e4m3fn | `F8E4M3FN` | 6 | 1 | 8 | 1 |
| float8_e5m2 | `F8E5M2` | 7 | 1 | 8 | 1 |
| complex32 | `Complex32` | 50 | 4 | 32 | 2 |
| complex64 | `Complex64` | 51 | 8 | 64 | 4 |
| complex128 | `Complex128` | 52 | 16 | 128 | 8 |
| int8 | `I8` | 10 | 1 | 8 | 1 |
| int16 | `I16` | 11 | 2 | 16 | 2 |
| int32 | `I32` | 12 | 4 | 32 | 4 |
| int64 | `I64` | 13 | 8 | 64 | 8 |
| uint8 | `U8` | 20 | 1 | 8 | 1 |
| uint16 | `U16` | 21 | 2 | 16 | 2 |
| uint32 | `U32` | 22 | 4 | 32 | 4 |
| uint64 | `U64` | 23 | 8 | 64 | 8 |
| bool | `Bool` | 30 | 1 | 8 | 1 |
| quantized_i4 | `QI4` | 40 | 1 | 4 | 1 |
| quantized_u8 | `QU8` | 41 | 1 | 8 | 1 |

## Custom Data Types

```rust
use numina::{BFloat16, QuantizedU8, QuantizedI4};

// Brain Float 16
let bf16 = BFloat16::from_f32(3.14159);
assert_eq!(bf16.size_bytes(), 2);

// 8-bit quantized
let scale = 0.01;
let q8 = QuantizedU8::quantize(2.5, scale);
assert!((q8.dequantize(scale) - 2.5).abs() < 0.1);

// 4-bit quantized (2 values per byte)
let q4 = QuantizedI4::pack(3, -2);
assert_eq!(q4.size_bytes(), 1); // 87.5% memory savings!
```

## Multiple Backends

```rust
use numina::{Array, CpuBytesArray, Shape, add, F32};

// Different backend implementations
let typed_array = Array::from_slice(&[1.0f32, 2.0], Shape::from([2]))?;
let bytes = [1.0f32, 2.0].iter().flat_map(|&x| x.to_le_bytes()).collect();
let byte_array = CpuBytesArray::new(bytes, Shape::from([2]), F32);

// Same operations work on all backends
let sum1 = add(&typed_array, &byte_array)?;
let sum2 = add(&byte_array, &typed_array)?;

// Cross-backend operations are fully supported
assert_eq!(sum1.shape(), sum2.shape());
```

## Architecture

```
src/
├── array/           # NdArray trait and CPU implementations
├── dtype/           # Data type system and custom types
├── ops.rs           # Mathematical operations
├── reductions.rs    # Reduction operations
├── sorting.rs       # Sorting and searching
└── lib.rs           # Library interface
```

## Status

**Implemented:**
- Array operations (add, mul, matmul, reductions)
- Multiple backends via NdArray trait (Array<T>, CpuBytesArray)
- Custom data types (BFloat16, QuantizedU8, QuantizedI4)
- Shape manipulation (reshape, transpose)
- Sorting and searching operations
- 49 tests passing

**Planned:**
- Broadcasting, advanced indexing, linear algebra
- File I/O, statistics
- Memory-mapped arrays
- More custom data types (FP8, FP4, NF4)

## Integration

Numina serves as one of the core libraries for [Laminax](https://github.com/SkuldNorniern/laminax), enabling high-performance GPU/CPU computing.