§RLX
A small ML compiler + runtime for transformer inference and training,
with a JAX-shaped IR + autodiff + transforms (jvp, hvp, vmap)
on top of CPU / Apple Silicon (Metal / MLX) / NVIDIA (CUDA) / AMD
(ROCm) / Google TPU / cross-platform GPU (wgpu) / FPGA / Cortex-M
backends.
This is the prelude crate: it pulls in the framework-level
workspace members and re-exports the common types, so a one-line
`use rlx::prelude::*;` covers most usage.
§Three usage patterns
§1. Build + run a graph by hand
use rlx::prelude::*;
let mut g = Graph::new("hello");
let x = g.input("x", Shape::new(&[1, 4], DType::F32));
let w = g.param("w", Shape::new(&[4, 2], DType::F32));
let y = g.matmul(x, w, Shape::new(&[1, 2], DType::F32));
g.set_outputs(vec![y]);
let mut compiled = Session::new(Device::Cpu).compile(g);
compiled.set_param("w", &[1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]);
let out = compiled.run(&[("x", &[1.0, 2.0, 3.0, 4.0])]);
§Module map
Every workspace crate is reachable as a module on rlx:
| path | crate | what |
|---|---|---|
| `rlx::ir` | `rlx-ir` | IR types, ops, graph builder |
| `rlx::opt` | `rlx-opt` | facade: `rlx-fusion` + `rlx-autodiff` + `rlx-compile` |
| `rlx::driver` | `rlx-driver` | `Device` enum, registries |
| `rlx::runtime` | `rlx-runtime` | `Session`, `CompiledGraph` |
| `rlx::macros` | `rlx-macros` | `#[rlx_model]` proc macro |
| `rlx::gguf` | `rlx-gguf` | GGUF parser + dequant (feature `gguf`) |
| `rlx::bench` | `rlx-bench` | benchmark harness (feature `bench`) |
| `rlx::sparse` | `rlx-sparse` | downstream: sparse linalg (feature `sparse`) |
| `rlx::splat` | `rlx-splat` | 3D Gaussian splatting (feature `splat`); `register()`, decomposed IR ops |
| `rlx::linalg` | `rlx-linalg` | downstream: dense linalg via LAPACK (feature `linalg`) |
| `rlx::cortexm` | `rlx-cortexm` | INT8 ARMv7E-M kernels (feature `cortexm`); kernels only, no `Backend` impl |
| `rlx::fpga` | `rlx-fpga` | IR → SystemVerilog datapath synthesis (feature `fpga`); no `Backend` impl |
§Convenience namespaces
Grouped re-exports for related concerns — use these when you want one focused subset without star-importing the whole prelude:
| namespace | what |
|---|---|
| `rlx::quant` | `QuantScheme`, `QuantMap` (IR quantization metadata) |
| `rlx::ops` | `Activation`, `BinaryOp`, `CmpOp`, `MaskKind`, `ChainStep`, `ChainOperand` |
| `rlx::autodiff` | `jvp`, `hvp`, `vmap` + the autodiff entry points |
| `rlx::prelude` | star-import target covering the 95% case |
§Backend feature gates
Pick the ones that match your hardware. Multiple backends can be
enabled at once; the runtime picks one per Session.
| feature | backend | platform |
|---|---|---|
| `cpu` (default) | NEON / AVX + Accelerate / OpenBLAS | every host |
| `metal` | Metal Performance Shaders + MSL | macOS (Apple Silicon) |
| `mlx` | Apple MLX (vendored) | macOS (Apple Silicon) |
| `gpu` | wgpu (Vulkan / DX12 / WebGPU / Metal) | cross-platform |
| `cuda` | cuBLAS / cuDNN / NVRTC | Linux / Windows + NVIDIA |
| `rocm` | hipBLAS / MIOpen | Linux + AMD |
| `tpu` | libtpu PJRT plugin | Linux + GCP TPU |
| `blas-accelerate` | macOS Accelerate | macOS |
| `blas-mkl` | Intel MKL | Intel / AMD CPUs |
| `blas-openblas` | OpenBLAS | cross-platform CPU |
§Convenience aggregates
Single-flag setups for common platforms. Each composes the fragments most users want for that target.
| feature | expands to |
|---|---|
| `apple-silicon` | `cpu` + `metal` + `blas-accelerate` |
| `nvidia` | `cpu` + `cuda` |
| `edge` | `cpu` + `cortexm` |
| `all-cpu` | `cpu` + `gguf` + `linalg` |
`mlx` and `rocm` aren’t in any aggregate because their crates
aren’t on crates.io (vendor-bundled submodule / workspace-relative
kernel sources). To opt in, depend on the workspace via
git and add the feature explicitly:
rlx = { git = "https://github.com/MIT-RLX/rlx", features = ["apple-silicon", "mlx"] }
Re-exports§
pub use rlx_ir as ir;
pub use rlx_opt as opt;
pub use rlx_driver as driver;
pub use rlx_runtime as runtime;
pub use rlx_macros as macros;
pub use rlx_gguf as gguf;
pub use rlx_bench as bench;
pub use rlx_sparse as sparse;
pub use rlx_linalg as linalg;
pub use rlx_cortexm as cortexm;
pub use rlx_fpga as fpga;
Modules§
- `autodiff`: Autodiff and transforms. Re-exports the public entry points from `rlx_opt`. Use these when computing gradients or doing `vmap`/`jvp`/`hvp` over a graph.
- `ops`: Op-builder helper enums: the variants the graph builder methods (`g.binary`, `g.compare`, `g.activation`, `g.attention_kind`, …) take as their first argument, plus the fused-chain primitives used by `Op::ElementwiseRegion`.
- `prelude`: Star-import target covering the 95% case.
- `quant`: Quantization metadata: schemes the IR carries per-tensor, plus the `QuantMap` graph-level annotation. Use these when wiring `Op::DequantMatMul` or attaching quant info to your own ops.
- `vmap`
Structs§
- `CompilePipeline`: End-to-end compiler pipeline configuration.
- `CompileResult`: End-to-end compiler output: optimized LIR + fusion diagnostics.
- `CompiledGraph`: A compiled graph ready for execution.
- `Element`: Per-element semantics that don’t fit into a flat `DType` enum (plan #40). Mirrors MAX’s `layout/element.mojo` `Element` type: `DType` says “f8”, but two FP8 variants exist (e4m3 and e5m2) with different range/precision tradeoffs. Saturation policy (clamp on overflow vs. wrap) is similarly orthogonal.
- `FusionOptions`: Per-target fusion toggles (env-driven on Metal today).
- `FusionReport`: Before/after fusion statistics and missed-pattern tally.
- `Graph`: A computation graph, the core IR data structure.
- `GraphModule`: Unified model module, the primary builder surface above HIR/MIR/LIR.
- `HirModule`: High-level module; model builder output.
- `LirModule`: Low-level module; backend compile input after optimization.
- `MirModule`: Mid-level module; optimizer input.
- `MissedFusion`: A single fusion opportunity that remains in the graph.
- `Node`: A single node in the computation graph.
- `NodeId`: Stable identifier for a node in the graph. Indices are never reused.
- `NodeOrigin`: Where a MIR node came from and how it was produced.
- `PipelineInspect`: Text dump of each compiler pipeline stage.
- `Session`: A session manages graph compilation and execution on a device.
- `Shape`: Tensor shape: ordered list of dimensions + element type.
- `Tick`: Opaque tick reading. Subtract two of these to get a `Duration`.
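The `Tick` contract described above (an opaque reading whose difference is a `Duration`) can be sketched with the standard library alone. This is a minimal illustration and not RLX's definition: the wrapped `Instant`, the `now()` constructor, and the saturating subtraction are all assumptions made for the example.

```rust
use std::ops::Sub;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an opaque tick type (not RLX's `Tick`).
#[derive(Clone, Copy, Debug)]
pub struct Tick(Instant);

impl Tick {
    /// Take a reading from the monotonic clock (assumed constructor).
    pub fn now() -> Self {
        Tick(Instant::now())
    }
}

impl Sub for Tick {
    type Output = Duration;

    /// `later - earlier` yields the elapsed time; if the operands are
    /// swapped, the result saturates to zero instead of panicking.
    fn sub(self, earlier: Tick) -> Duration {
        self.0.saturating_duration_since(earlier.0)
    }
}
```

Under these assumptions, timing a region is `let start = Tick::now(); /* work */ let elapsed = Tick::now() - start;`. Keeping the inner clock value private is what makes the reading opaque: callers can only compare two ticks, never interpret one on its own.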
Enums§
- `DType`: Scalar element type. Matches hardware-supported types.
- `Device`: Target device for graph execution.
- `FusionPolicy`: How HIR block ops lower to MIR.
- `FusionTarget`: Compile target that selects a fusion pipeline.
- `GraphStage`: Which stage of the HIR → MIR → LIR pipeline a `GraphModule` holds.
- `HirOp`: High-level operation: blocks and escape hatches.
- `MissReason`: Why a recognizable fusion pattern was not collapsed.
- `Op`: An operation in the RLX IR graph.
- `OpKind`: PLAN L4: discriminant for each `Op` variant. Used by `Op::kind` + the `Backend::supported_ops` trait method to declare which ops a backend can lower; the `LegalizeForBackend` pass in `rlx-opt` checks the graph against this set and fails the compile when an unsupported op is present (instead of silent fallback).
- `Precision`: Which numeric precision to use for an op. (Subset of `DType`: only the ones we currently dispatch on.)
- `PrecisionPolicy`: Declarative precision policy for graph compilation.
- `QuantScheme`: How a tensor is quantized. Mirrors the schemes RLX needs for LLM inference on Apple Silicon: blockwise int8 (GPTQ-style), blockwise int4 (Q4_K), and per-tensor fp8 (e4m3 / e5m2).
Traits§
- `Pass`: A graph-to-graph transformation pass.
Functions§
- `fusion_passes`: Return the ordered fusion passes for `target`.
- `fusion_passes_for_supported`: Return the ordered fusion passes allowed for `supported`.
- `hvp`: Hessian-vector product via forward-over-reverse.
- `inspect_graph`: Annotated graph dump (MIR body). Alias for `pretty_print`.
- `inspect_graph_diff`: Summarize graph changes between pipeline stages.
- `inspect_hir`: Annotated HIR module dump.
- `inspect_hir_stats`: One-line HIR summary (header + op histogram).
- `inspect_lir`: Annotated LIR dump: optimized MIR + buffer plan + schedule.
- `inspect_mir`: Annotated MIR module dump (optimized tensor DAG).
- `inspect_mir_diff`: Diff two MIR snapshots (typically pre/post fusion).
- `inspect_mir_stats`: One-line MIR summary.
- `inspect_pipeline`: Inspect every lowering stage for `hir` through `pipeline`.
- `jvp`: Compute the JVP graph for `forward`, perturbing each `Input`/`Param` named in `tangent_for`. Returns a new graph whose outputs are `[primals..., tangents...]`, in the order `forward` listed them.
- `maybe_dump_pipeline`: Write a full pipeline dump when `RLX_IR_DUMP` is set (path prefix or directory).
- `node_label`: Best-effort label for diagnostics (origin label, node name, or id).
- `supported_for_target`: Per-target op claims used when a backend doesn’t supply an explicit `supported_ops` slice. Must stay aligned with each backend’s `*_SUPPORTED_OPS` in `rlx-runtime/src/backend.rs`.
- `supports_op`: True when `supported` is empty (no claim) or contains `kind`.
- `vmap`: Vectorize `forward` over a leading batch axis.
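To make "forward-over-reverse" concrete, here is a standalone sketch that computes a Hessian-vector product by pushing dual numbers (forward mode) through a gradient function. Everything in it is hypothetical and unrelated to RLX's graph-level `hvp`, whose signature is not shown on this page: `Dual`, `grad_f`, and `hvp_sketch` are illustration-only names, and the gradient of f(x, y) = x²y + y³ is written by hand, standing in for the reverse pass.

```rust
use std::ops::{Add, Mul};

/// Dual number: a value plus its directional derivative (tangent).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Dual {
    pub primal: f64,
    pub tangent: f64,
}

impl Dual {
    pub fn new(primal: f64, tangent: f64) -> Self {
        Dual { primal, tangent }
    }
    /// A constant carries no tangent.
    pub fn constant(c: f64) -> Self {
        Dual::new(c, 0.0)
    }
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, o: Dual) -> Dual {
        Dual::new(self.primal + o.primal, self.tangent + o.tangent)
    }
}

impl Mul for Dual {
    type Output = Dual;
    // Product rule: d(ab) = a*db + da*b.
    fn mul(self, o: Dual) -> Dual {
        Dual::new(
            self.primal * o.primal,
            self.primal * o.tangent + self.tangent * o.primal,
        )
    }
}

/// Hand-written gradient of f(x, y) = x^2*y + y^3 (the "reverse" half):
/// df/dx = 2xy, df/dy = x^2 + 3y^2.
pub fn grad_f(x: Dual, y: Dual) -> [Dual; 2] {
    let two = Dual::constant(2.0);
    let three = Dual::constant(3.0);
    [two * x * y, x * x + three * y * y]
}

/// Forward-mode pass over the gradient: seeding the inputs with `v`
/// makes the output tangents equal to H(point) * v.
pub fn hvp_sketch(point: [f64; 2], v: [f64; 2]) -> [f64; 2] {
    let g = grad_f(Dual::new(point[0], v[0]), Dual::new(point[1], v[1]));
    [g[0].tangent, g[1].tangent]
}
```

At the point (1, 2) the Hessian is [[4, 2], [2, 12]], so `hvp_sketch([1.0, 2.0], [1.0, 0.0])` returns its first column, `[4.0, 2.0]`. The product is obtained without ever materializing the full Hessian, which is the reason for composing the two modes.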
Type Aliases§
- `CalibrationRecord`: Map of tap `NodeId` → calibrated quant params.
- `Error`: Crate-wide error type, an alias of `anyhow::Error`.
- `Result`: Crate-wide result type, an alias of `anyhow::Result<T>`. Use this in `main()` and library boundaries.