# nbml
A minimal machine learning library built on ndarray for low-level ML algorithm development in Rust.
Unlike high-level frameworks, nbml provides bare primitives and a lightweight optimizer API for building custom neural networks from scratch. If you want comfortable abstractions, see Burn. If you want to understand what's happening under the hood and have full control, nbml gives you the building blocks.
## Features
- Core primitives: Attention, LSTM, RNN, Conv2D, feedforward layers, etc.
- Activation functions: ReLU, Sigmoid, Tanh, Softmax, etc.
- Optimizers: AdamW, SGD
- Utilities: variable-sequence batching, gradient clipping, Gumbel-Softmax, plots, etc.
- Minimal abstractions: Direct ndarray integration for custom algorithms
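Several of these utilities are small enough to sketch standalone. For example, gradient clipping by global L2 norm comes down to one rescale; the function below illustrates the technique itself (plain slices, not nbml's ndarray-based API):

```rust
/// Scale all gradients in place so their global L2 norm is at most `max_norm`.
fn clip_grad_norm(grads: &mut [f64], max_norm: f64) {
    let norm = grads.iter().map(|g| g * g).sum::<f64>().sqrt();
    if norm > max_norm {
        let scale = max_norm / norm;
        for g in grads.iter_mut() {
            *g *= scale;
        }
    }
}

fn main() {
    let mut grads = vec![3.0, 4.0]; // global norm = 5.0
    clip_grad_norm(&mut grads, 1.0);
    // rescaled to norm 1.0: [0.6, 0.8]
    assert!((grads[0] - 0.6).abs() < 1e-12);
    assert!((grads[1] - 0.8).abs() < 1e-12);
    println!("{:?}", grads);
}
```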
## Quick Start
```rust
use nbml::nn::{Activation, FFN};
use nbml::optim::{AdamW, ToParams};

// Build a simple feedforward network (layer sizes are illustrative)
let mut model = FFN::new(&[784, 128, 10], Activation::ReLU);

// Create optimizer and register the model's parameters
let mut optimizer = AdamW::default().with(&mut model);

// Training loop (simplified)
for batch in training_data {
    // forward pass, hand-written backward pass, optimizer step
}
```
## Architecture
### NN Layers (nbml::nn)
- Layer: Single nonlinear projection layer
- FFN: Feedforward network with configurable layers
- LSTM: Long Short-Term Memory network
- RNN: Vanilla recurrent neural network
- ESN: Echo-state network, fixed recurrence + readout
- LayerNorm: Layer normalization
- Pooling: Sequence mean-pooling
- Conv2D: Explicit im2col Conv2D layer (CPU efficient, memory hungry)
- PatchwiseConv2D: Patchwise Conv2D layer (CPU hungry, memory efficient)
- LinearSSM: Discrete linear SSM
- Attention: Core attention primitive
- SelfAttention: Multi-head self-attention
- CrossAttention: Multi-head cross-attention
- Transformer: Transformer encoder/decoder block
- GatedLinearAttention: Multi-head gated linear attention with matrix-valued state and outer-product gating (Yang et al., 2024)
- AttentionHead: Multi-head self-attention mechanism (deprecated; use SelfAttention)
- TransformerEncoder: Pre-norm transformer encoder (deprecated; use Transformer::new_encoder())
- TransformerDecoder: Pre-norm transformer decoder (deprecated; use Transformer::new_decoder())
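The attention layers above are all built around the same primitive: scaled dot-product attention, softmax(QK^T / sqrt(d))V. A minimal single-head, unmasked sketch of that computation in plain `Vec`s (for intuition only; nbml's layers operate on ndarray arrays):

```rust
/// softmax(Q K^T / sqrt(d)) V for a single head, no masking.
/// q, k, v are (seq_len, d) matrices as nested Vecs.
fn attention(q: &[Vec<f64>], k: &[Vec<f64>], v: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let d = q[0].len() as f64;
    q.iter()
        .map(|qi| {
            // attention logits of this query against every key, scaled by sqrt(d)
            let logits: Vec<f64> = k
                .iter()
                .map(|kj| qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f64>() / d.sqrt())
                .collect();
            // numerically stable softmax over keys
            let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
            let exps: Vec<f64> = logits.iter().map(|l| (l - max).exp()).collect();
            let z: f64 = exps.iter().sum();
            // output row = softmax-weighted sum of value rows
            let mut out = vec![0.0; v[0].len()];
            for (w, vj) in exps.iter().zip(v) {
                for (o, x) in out.iter_mut().zip(vj) {
                    *o += w / z * x;
                }
            }
            out
        })
        .collect()
}

fn main() {
    let q = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let k = q.clone();
    let v = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let out = attention(&q, &k, &v);
    // each output row is a convex combination of the value rows
    assert!(out[0][0] > 1.0 && out[0][0] < 3.0);
    println!("{:?}", out);
}
```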
### Optimizers (nbml::optim)
Implement the `ToParams` trait for gradient-based optimization (the `Param` type and method signature below are simplified for illustration):

```rust
impl ToParams for Affine {
    fn to_params(&mut self) -> Vec<Param> {
        // hand the optimizer mutable views of each weight and its gradient
        vec![
            Param::new(&mut self.w, &mut self.d_w),
            Param::new(&mut self.b, &mut self.d_b),
        ]
    }
}
```

You can bubble params up from nested layers:

```rust
impl ToParams for AffineAffine {
    fn to_params(&mut self) -> Vec<Param> {
        // concatenate the params of both child layers
        let mut params = self.first.to_params();
        params.extend(self.second.to_params());
        params
    }
}
```

`ToParams` will also let you zero gradients:

```rust
let mut aa = AffineAffine::new();
aa.forward(&x);       // <- implement this yourself
aa.backward(&d_out);  // <- implement this yourself
aa.zero_grads();
```
Available optimizers:
- AdamW: Adaptive moment estimation with bias correction
- SGD: Stochastic gradient descent with optional momentum
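For intuition, the momentum variant of SGD keeps one velocity buffer per parameter and folds each new gradient into it. A standalone sketch of the update rule itself (not nbml's optimizer internals; the `mu`/`lr` names are illustrative):

```rust
/// One SGD-with-momentum step: v <- mu * v + g; w <- w - lr * v.
fn sgd_momentum_step(w: &mut [f64], v: &mut [f64], g: &[f64], lr: f64, mu: f64) {
    for i in 0..w.len() {
        v[i] = mu * v[i] + g[i];
        w[i] -= lr * v[i];
    }
}

fn main() {
    let mut w = vec![1.0];
    let mut v = vec![0.0];
    // two steps with the same gradient: the velocity accumulates
    sgd_momentum_step(&mut w, &mut v, &[1.0], 0.1, 0.9);
    assert!((w[0] - 0.9).abs() < 1e-12); // 1.0 - 0.1 * 1.0
    sgd_momentum_step(&mut w, &mut v, &[1.0], 0.1, 0.9);
    assert!((w[0] - 0.71).abs() < 1e-12); // v = 1.9, so 0.9 - 0.19
    println!("{:?}", w);
}
```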
Use `.with(&mut impl ToParams)` to prepare a stateful optimizer (like AdamW) for your network:

```rust
let mut model = Model::new(/* ... */);
// AdamW allocates momentum and variance buffers for every parameter in model
let mut optim = AdamW::default().with(&mut model);
```
### Activation Functions (nbml::f)
```rust
use nbml::f;
use ndarray::Array1;

// call signatures simplified for illustration
let x = Array1::from_vec(vec![-1.0, 0.0, 2.0]);
let activated = f::relu(&x);
let probs = f::softmax(&x);
```
Includes derivatives for backpropagation: `d_relu`, `d_tanh`, `d_sigmoid`, etc.
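These pairs are just the textbook function/derivative definitions. For example, sigmoid and its derivative s(x)(1 - s(x)), sketched standalone on scalars (nbml's versions operate element-wise on ndarray arrays), with a finite-difference check:

```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Derivative of sigmoid: s(x) * (1 - s(x)).
fn d_sigmoid(x: f64) -> f64 {
    let s = sigmoid(x);
    s * (1.0 - s)
}

fn main() {
    // check the analytic derivative against a central finite difference
    let x = 0.3;
    let h = 1e-6;
    let fd = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h);
    assert!((d_sigmoid(x) - fd).abs() < 1e-8);
    println!("d_sigmoid({x}) = {}", d_sigmoid(x));
}
```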
## Design Philosophy
nbml is designed for:
- Experimentation / Research: Prototyping novel architectures with full control over the forward and backward passes
- Transparency: No hidden magic, every operation is explicit
- Compute-Constrained Deployment: Lightweight with no C dependencies; very fast for small models
nbml is not designed for:
- Large-scale production deployment (use PyTorch, TensorFlow, or Burn)
- Automatic differentiation (you write the backward pass)
- GPU acceleration (CPU-only via ndarray)
- Plug-and-play models (you build everything yourself)
## Examples
### Custom LSTM Training
```rust
use nbml::nn::LSTM;
use nbml::optim::{AdamW, ToParams};

// d_model = 128 (hyperparameters are illustrative)
let mut lstm = LSTM::new(128);
let mut optimizer = AdamW::default().with(&mut lstm);

// where batch.dim() is (batch_size, seq_len, features)
// and features == lstm.d_model (128 in this case)
for batch in data {
    // forward, hand-written backward, optimizer step
}
```
### Multi-Head Attention
```rust
use nbml::nn::SelfAttention;

// d_in = 512, head count illustrative
let mut attention = SelfAttention::new(512, 8);

// where input.dim() is (batch_size, seq_len, features),
// features == d_in (512 in this case),
// and mask.dim() is (batch_size, seq_len, seq_len),
// with each element 1. or 0. depending on whether or not the token
// is padding
let output = attention.forward(&input, &mask);
```
### Transformer Decoder
```rust
use nbml::nn::{Activation, Transformer};
use ndarray::Array3;

// hyperparameters are illustrative
let mut transformer = Transformer::new_decoder(/* ... */);

let y_pred = transformer.forward(&x);

// dummy upstream gradient, just to demonstrate the API
let d_y_pred = Array3::<f64>::ones(y_pred.dim());
transformer.backward(&d_y_pred);
transformer.zero_grads();
```