rlx
A small ML compiler and runtime for transformer inference and training.
JAX-shaped IR + autodiff + transforms (jvp, hvp, vmap) on top of
backend-specific kernels for CPU, Apple Silicon (Metal / MLX), NVIDIA
(CUDA), AMD (ROCm), Google TPU, and cross-platform GPU (wgpu).
This is the prelude crate: it pulls in rlx-ir, rlx-opt, and
rlx-runtime and re-exports the common types. Most code needs only
a single use rlx::prelude::*; import.
Install
[dependencies]
rlx = { version = "0.2", features = ["cpu"] }
For common platforms, single-flag aggregates compose the right fragments:
rlx = { version = "0.2", features = ["apple-silicon"] } # cpu + metal + Accelerate
rlx = { version = "0.2", features = ["nvidia"] }        # cpu + cuda
rlx = { version = "0.2", features = ["edge"] }          # cpu + cortexm
rlx = { version = "0.2", features = ["all-cpu"] }       # cpu + gguf + linalg
mlx and rocm features: rlx-mlx and rlx-rocm aren't on crates.io (vendor-bundled submodule / workspace-relative kernel sources). Enabling those features on a crates.io build will fail to resolve. Use a git source instead:
rlx = { git = "https://github.com/MIT-RLX/rlx", features = ["apple-silicon", "mlx"] }
Quickstart
use rlx::prelude::*;
let mut g = new;
let x = g.input;
let w = g.param;
let y = g.matmul;
g.set_outputs;
let mut compiled = new.compile;
compiled.set_param;
let out = compiled.run;
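When checking the output of a graph like the one above, it can help to compare against a plain-Rust reference of the product a matmul node is expected to compute. The following is a minimal sketch that does not use rlx at all: the name matmul_ref, the f32 element type, and the row-major layout are assumptions made for illustration, not part of the crate's API.

```rust
/// Reference matrix product for row-major buffers.
/// `x` is m×k, `w` is k×n, and the returned buffer is m×n.
/// Hypothetical helper for testing; not an rlx function.
fn matmul_ref(x: &[f32], w: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(x.len(), m * k, "x must hold m*k elements");
    assert_eq!(w.len(), k * n, "w must hold k*n elements");

    let mut y = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            // Hoist x[i, p] so the inner loop walks one row of w contiguously.
            let xv = x[i * k + p];
            for j in 0..n {
                y[i * n + j] += xv * w[p * n + j];
            }
        }
    }
    y
}
```

Calling matmul_ref(&x, &w, m, k, n) with the same data fed to the compiled graph gives a baseline to compare against, within whatever floating-point tolerance the chosen backend warrants.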
Prelude + namespaces
| import | gives you |
|---|---|
| use rlx::prelude::*; | Graph, Session, DType, Device, Result, Activation, BinaryOp, jvp, vmap, … |
| use rlx::ops::*; | IR helper enums: Activation, BinaryOp, CmpOp, MaskKind, ChainStep, ChainOperand |
| use rlx::quant::*; | QuantScheme, QuantMap |
| use rlx::gguf::*; | GGUF parser + dequant (gguf feature) |
| use rlx::autodiff::*; | jvp, hvp, vmap |
| use rlx::ir::… | full rlx-ir surface (everything the prelude doesn't lift) |
| use rlx::runtime::… | full rlx-runtime surface (backends, custom Session config) |
rlx::Result<T> and rlx::Error are aliases of anyhow::Result<T>
and anyhow::Error — the whole stack returns those.
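In JAX terminology, jvp is a Jacobian-vector product: it evaluates a function and its directional derivative in a single forward pass. The sketch below illustrates that idea on scalar functions using dual numbers. It is a conceptual illustration only: Dual and jvp_scalar are hypothetical names, and rlx's own jvp works on its IR with a signature that is not shown in this README.

```rust
use std::ops::{Add, Mul};

/// A dual number: `val` carries the function value, `tan` the tangent
/// (directional derivative) propagated alongside it.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    val: f64,
    tan: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, rhs: Dual) -> Dual {
        // Sum rule: (f + g)' = f' + g'
        Dual { val: self.val + rhs.val, tan: self.tan + rhs.tan }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, rhs: Dual) -> Dual {
        // Product rule: (f * g)' = f' * g + f * g'
        Dual {
            val: self.val * rhs.val,
            tan: self.tan * rhs.val + self.val * rhs.tan,
        }
    }
}

/// Forward-mode JVP of a scalar function `f` at `x` along tangent `v`.
/// Returns (f(x), f'(x) * v).
fn jvp_scalar<F: Fn(Dual) -> Dual>(f: F, x: f64, v: f64) -> (f64, f64) {
    let out = f(Dual { val: x, tan: v });
    (out.val, out.tan)
}
```

For f(x) = x·x·x + x, jvp_scalar(|x| x * x * x + x, 2.0, 1.0) returns the value 10 together with the derivative 13, since f'(x) = 3x² + 1.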
Feature matrix
Backends
| feature | backend | platform |
|---|---|---|
| cpu (default) | NEON / AVX + Accelerate / OpenBLAS | every host |
| metal | Metal Performance Shaders + MSL | macOS (Apple Silicon) |
| mlx | Apple MLX (vendored) | macOS (Apple Silicon) |
| gpu | wgpu (Vulkan / DX12 / WebGPU / Metal) | cross-platform |
| cuda | cuBLAS / cuDNN / NVRTC | Linux / Windows + NVIDIA |
| rocm | hipBLAS / MIOpen | Linux + AMD |
| tpu | libtpu PJRT plugin | Linux + GCP TPU |
| blas-accelerate | macOS Accelerate | macOS |
| blas-mkl | Intel MKL | Intel / AMD CPUs |
| blas-openblas | OpenBLAS | cross-platform CPU |
Companion crates
Off by default; turn on per workload:
| feature | what |
|---|---|
| gguf | GGUF v1 / v2 / v3 parser + dequant → rlx::gguf |
| bench | uniform benchmark harness → rlx::bench |
| sparse | sparse linear algebra (custom-op scaffold) → rlx::sparse |
| linalg | dense linalg via LAPACK (custom-op scaffold) → rlx::linalg |
| cortexm | INT8 ARMv7E-M kernels → rlx::cortexm (no Backend impl) |
| fpga | IR → SystemVerilog datapath synthesis → rlx::fpga (no Backend) |
cortexm and fpga don't go through the Session / Backend
pipeline — they're specialty targets exposed for direct use.
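To make the GGUF version range in the table concrete, here is a minimal sketch of how a file's version can be read before handing it to a parser. It relies only on the public GGUF layout (a 4-byte magic "GGUF" followed by a 32-bit version) and assumes a little-endian file. The function read_gguf_version is hypothetical and is not the rlx::gguf API.

```rust
/// Reads the GGUF format version from the first 8 bytes of a file.
/// Hypothetical helper; assumes a little-endian GGUF file.
fn read_gguf_version(bytes: &[u8]) -> Result<u32, String> {
    if bytes.len() < 8 {
        return Err("need at least 8 bytes for magic + version".to_string());
    }
    // Bytes 0..4 hold the ASCII magic "GGUF".
    if !bytes.starts_with(b"GGUF") {
        return Err("missing GGUF magic".to_string());
    }
    // Bytes 4..8 hold the version as an unsigned 32-bit integer.
    let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    match version {
        1..=3 => Ok(version),
        other => Err(format!("unsupported GGUF version {other}")),
    }
}
```

A caller could read the first 8 bytes of a model file, pass them to read_gguf_version, and bail out early on anything outside v1 to v3.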
Convenience aggregates
| feature | expands to |
|---|---|
| apple-silicon | cpu + metal + blas-accelerate |
| nvidia | cpu + cuda |
| edge | cpu + cortexm |
| all-cpu | cpu + gguf + linalg |
mlx and rocm aren't in any aggregate (vendor-bundled). To opt
in, add the feature explicitly to a git-source dep:
rlx = { git = "https://github.com/MIT-RLX/rlx", features = ["apple-silicon", "mlx"] }
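Cargo features are additive, so an aggregate combined with an explicit feature enables the union of both. The sketch below models that resolution in plain Rust, with the expansions copied from the aggregates table. The functions expand_aggregate and resolve are hypothetical and exist only for illustration; Cargo performs this resolution itself.

```rust
use std::collections::BTreeSet;

/// Fragments enabled by each convenience aggregate (from the table above).
/// Returns an empty slice for anything that is not an aggregate.
fn expand_aggregate(feature: &str) -> &'static [&'static str] {
    match feature {
        "apple-silicon" => &["cpu", "metal", "blas-accelerate"],
        "nvidia" => &["cpu", "cuda"],
        "edge" => &["cpu", "cortexm"],
        "all-cpu" => &["cpu", "gguf", "linalg"],
        _ => &[],
    }
}

/// Union of everything a requested feature list turns on.
/// Aggregates are replaced by their fragments; other names pass through.
fn resolve<'a>(requested: &[&'a str]) -> BTreeSet<&'a str> {
    let mut enabled = BTreeSet::new();
    for &feature in requested {
        let parts = expand_aggregate(feature);
        if parts.is_empty() {
            enabled.insert(feature);
        } else {
            for &part in parts {
                enabled.insert(part);
            }
        }
    }
    enabled
}
```

With this model, resolve(&["apple-silicon", "mlx"]) yields the set of blas-accelerate, cpu, metal, and mlx, matching the git-source dependency line above.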
Documentation
- API reference: https://docs.rs/rlx
- Workspace overview + per-crate READMEs: https://github.com/MIT-RLX/rlx
License
GPL-3.0-only. See LICENSE.