
Crate rlx


§RLX

A small ML compiler + runtime for transformer inference and training, with a JAX-shaped IR + autodiff + transforms (jvp, hvp, vmap) on top of CPU / Apple Silicon (Metal / MLX) / NVIDIA (CUDA) / AMD (ROCm) / Google TPU / cross-platform GPU (wgpu) / FPGA / Cortex-M backends.

This is the prelude crate: it pulls in the framework-level workspace members and re-exports the common types, so the single line use rlx::prelude::*; covers most usage.

§Three usage patterns

§1. Build + run a graph by hand

use rlx::prelude::*;

// Build the graph: y = x · w, with a [1, 4] input and a [4, 2] parameter.
let mut g = Graph::new("hello");
let x = g.input("x", Shape::new(&[1, 4], DType::F32));
let w = g.param("w", Shape::new(&[4, 2], DType::F32));
let y = g.matmul(x, w, Shape::new(&[1, 2], DType::F32));
g.set_outputs(vec![y]);

// Compile for the CPU device, bind the 8 values of w, then run with the 4 values of x.
let mut compiled = Session::new(Device::Cpu).compile(g);
compiled.set_param("w", &[1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]);
let out = compiled.run(&[("x", &[1.0, 2.0, 3.0, 4.0])]);
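
As a sanity check on the graph above, the following plain-Rust reference matmul computes the value the example should produce. It does not use any rlx types. It assumes the flat slices passed to set_param and run are row-major, which the page does not state, and the helper names matmul_ref and hello_expected are made up for this sketch.

```rust
/// Reference row-major matmul: (m x k) * (k x n) -> (m x n).
/// Plain Rust, no rlx types; the row-major layout is an assumption.
fn matmul_ref(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "lhs length must equal m * k");
    assert_eq!(b.len(), k * n, "rhs length must equal k * n");
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            out[i * n + j] = acc;
        }
    }
    out
}

/// Same data as the hand-built graph: x is [1, 4], w is [4, 2].
fn hello_expected() -> Vec<f32> {
    let x = [1.0, 2.0, 3.0, 4.0];
    let w = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0];
    matmul_ref(&x, &w, 1, 4, 2)
}
```

Under the row-major assumption, hello_expected() returns [4.0, 6.0], which is a useful value to compare against out when trying the example on a new backend.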

§Module map

Every workspace crate is reachable as a module on rlx:

| path | crate | what |
|---|---|---|
| rlx::ir | rlx-ir | IR types, ops, graph builder |
| rlx::opt | rlx-opt | facade: rlx-fusion + rlx-autodiff + rlx-compile |
| rlx::driver | rlx-driver | Device enum, registries |
| rlx::runtime | rlx-runtime | Session, CompiledGraph |
| rlx::macros | rlx-macros | #[rlx_model] proc macro |
| rlx::gguf | rlx-gguf | GGUF parser + dequant (feature gguf) |
| rlx::bench | rlx-bench | benchmark harness (feature bench) |
| rlx::sparse | rlx-sparse | downstream: sparse linalg (feature sparse) |
| rlx::splat | rlx-splat | 3D Gaussian splatting (feature splat); register(), decomposed IR ops |
| rlx::linalg | rlx-linalg | downstream: dense linalg via LAPACK (feature linalg) |
| rlx::cortexm | rlx-cortexm | INT8 ARMv7E-M kernels (feature cortexm); no Backend impl, kernels only |
| rlx::fpga | rlx-fpga | IR → SystemVerilog datapath synthesis (feature fpga); no Backend impl |

§Convenience namespaces

Grouped re-exports for related concerns — use these when you want one focused subset without star-importing the whole prelude:

| namespace | what |
|---|---|
| rlx::quant | QuantScheme, QuantMap (IR quantization metadata) |
| rlx::ops | Activation, BinaryOp, CmpOp, MaskKind, ChainStep, ChainOperand |
| rlx::autodiff | jvp, hvp, vmap + the autodiff entry points |
| rlx::prelude | star-import target covering the 95% case |

§Backend feature gates

Pick the ones that match your hardware. Multiple backends can be enabled at once; the runtime picks one per Session.

| feature | backend | platform |
|---|---|---|
| cpu (default) | NEON / AVX + Accelerate / OpenBLAS | every host |
| metal | Metal Performance Shaders + MSL | macOS (Apple Silicon) |
| mlx | Apple MLX (vendored) | macOS (Apple Silicon) |
| gpu | wgpu (Vulkan / DX12 / WebGPU / Metal) | cross-platform |
| cuda | cuBLAS / cuDNN / NVRTC | Linux / Windows + NVIDIA |
| rocm | hipBLAS / MIOpen | Linux + AMD |
| tpu | libtpu PJRT plugin | Linux + GCP TPU |
| blas-accelerate | macOS Accelerate | macOS |
| blas-mkl | Intel MKL | Intel / AMD CPUs |
| blas-openblas | OpenBLAS | cross-platform CPU |

§Convenience aggregates

Single-flag setups for common platforms. Each composes the fragments most users want for that target.

| feature | expands to |
|---|---|
| apple-silicon | cpu + metal + blas-accelerate |
| nvidia | cpu + cuda |
| edge | cpu + cortexm |
| all-cpu | cpu + gguf + linalg |

mlx and rocm aren’t in any aggregate because their crates aren’t on crates.io (vendor-bundled submodule / workspace-relative kernel sources). To opt in, depend on the workspace via git and add the feature explicitly:

rlx = { git = "https://github.com/MIT-RLX/rlx", features = ["apple-silicon", "mlx"] }
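
To make the effect of that dependency line concrete, here is a small model of how an aggregate plus an explicit feature flattens into individual fragments. Cargo performs this union itself; the sketch only encodes the aggregates table above. The names expand_aggregate and resolve_features are hypothetical and not part of rlx or Cargo.

```rust
use std::collections::BTreeSet;

// Fragment lists copied from the aggregates table on this page.
const APPLE_SILICON: &[&str] = &["cpu", "metal", "blas-accelerate"];
const NVIDIA: &[&str] = &["cpu", "cuda"];
const EDGE: &[&str] = &["cpu", "cortexm"];
const ALL_CPU: &[&str] = &["cpu", "gguf", "linalg"];

/// Returns the fragments an aggregate expands to, or None for a plain feature.
fn expand_aggregate(name: &str) -> Option<&'static [&'static str]> {
    match name {
        "apple-silicon" => Some(APPLE_SILICON),
        "nvidia" => Some(NVIDIA),
        "edge" => Some(EDGE),
        "all-cpu" => Some(ALL_CPU),
        _ => None,
    }
}

/// Flattens requested features into a sorted, deduplicated fragment list.
fn resolve_features(requested: &[&'static str]) -> Vec<&'static str> {
    let mut set = BTreeSet::new();
    for &feat in requested {
        match expand_aggregate(feat) {
            Some(parts) => set.extend(parts.iter().copied()),
            None => {
                // Not an aggregate: keep the feature name as-is.
                set.insert(feat);
            }
        }
    }
    set.into_iter().collect()
}
```

Calling resolve_features(&["apple-silicon", "mlx"]) yields blas-accelerate, cpu, metal, and mlx, which matches what the git dependency line above asks for.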

Re-exports§

pub use rlx_ir as ir;
pub use rlx_opt as opt;
pub use rlx_driver as driver;
pub use rlx_runtime as runtime;
pub use rlx_macros as macros;
pub use rlx_gguf as gguf;
pub use rlx_bench as bench;
pub use rlx_sparse as sparse;
pub use rlx_linalg as linalg;
pub use rlx_cortexm as cortexm;
pub use rlx_fpga as fpga;

Modules§

autodiff
Autodiff + transforms — re-exports the public entry points from rlx_opt. Use these when computing gradients or doing vmap / jvp / hvp over a graph.
ops
Op-builder helper enums — the variants the graph builder methods (g.binary, g.compare, g.activation, g.attention_kind, …) take as their first argument, plus the fused-chain primitives used by Op::ElementwiseRegion.
prelude
Star-import target covering the 95% case.
quant
Quantization metadata — schemes the IR carries per-tensor, plus the QuantMap graph-level annotation. Use these when wiring Op::DequantMatMul or attaching quant info to your own ops.
vmap
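
For readers new to the transforms these modules expose, the following standalone sketch shows what jvp, hvp, and vmap compute on a toy scalar function. It uses dual numbers on plain f64 values rather than rlx graphs, so it is a conceptual illustration under that assumption and not the rlx implementation. Every name in it (Dual, f, grad_f, toy_jvp, toy_hvp, toy_vmap) is hypothetical, and the hand-written grad_f stands in for a reverse-mode pass.

```rust
use std::ops::{Add, Mul};

/// Dual number: a primal value paired with its tangent.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    val: f64,
    tan: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, rhs: Dual) -> Dual {
        Dual { val: self.val + rhs.val, tan: self.tan + rhs.tan }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, rhs: Dual) -> Dual {
        // Product rule: (a * b)' = a' * b + a * b'.
        Dual { val: self.val * rhs.val, tan: self.tan * rhs.val + self.val * rhs.tan }
    }
}

/// Toy function f(x) = x0 * x0 * x1, generic so it runs on f64 or Dual.
fn f<T: Copy + Mul<Output = T>>(x: &[T]) -> T {
    x[0] * x[0] * x[1]
}

/// Hand-written gradient of f: [2 * x0 * x1, x0 * x0].
fn grad_f<T: Copy + Add<Output = T> + Mul<Output = T>>(x: &[T]) -> Vec<T> {
    let x0x1 = x[0] * x[1];
    vec![x0x1 + x0x1, x[0] * x[0]]
}

/// Pairs each primal input with its tangent component.
fn seed(x: &[f64], v: &[f64]) -> Vec<Dual> {
    x.iter().zip(v).map(|(&val, &tan)| Dual { val, tan }).collect()
}

/// JVP: returns (f(x), J_f(x) . v), primal first and tangent second.
fn toy_jvp(x: &[f64], v: &[f64]) -> (f64, f64) {
    let out = f(&seed(x, v));
    (out.val, out.tan)
}

/// HVP, forward-over-reverse: push the tangent v through the gradient.
fn toy_hvp(x: &[f64], v: &[f64]) -> Vec<f64> {
    grad_f(&seed(x, v)).iter().map(|d| d.tan).collect()
}

/// vmap: apply a per-example function across the leading batch axis.
fn toy_vmap<F: Fn(&[f64]) -> f64>(per_example: F, batch: &[Vec<f64>]) -> Vec<f64> {
    batch.iter().map(|row| per_example(row.as_slice())).collect()
}
```

At x = [3, 2] with direction v = [1, 0], toy_jvp gives (18, 12) and toy_hvp gives [4, 6], matching the analytic derivative 2·x0·x1 and the Hessian column [2·x1, 2·x0].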

Structs§

CompilePipeline
End-to-end compiler pipeline configuration.
CompileResult
End-to-end compiler output: optimized LIR + fusion diagnostics.
CompiledGraph
A compiled graph ready for execution.
Element
Per-element semantics that don’t fit into a flat DType enum (plan #40). Mirrors MAX’s layout/element.mojo Element type: DType says “f8”, but two FP8 variants exist (e4m3 and e5m2) with different range/precision tradeoffs. Saturation policy (clamp on overflow vs. wrap) is similarly orthogonal.
FusionOptions
Per-target fusion toggles (env-driven on Metal today).
FusionReport
Before/after fusion statistics and missed-pattern tally.
Graph
A computation graph — the core IR data structure.
GraphModule
Unified model module — primary builder surface above HIR/MIR/LIR.
HirModule
High-level module — model builder output.
LirModule
Low-level module — backend compile input after optimization.
MirModule
Mid-level module — optimizer input.
MissedFusion
A single fusion opportunity that remains in the graph.
Node
A single node in the computation graph.
NodeId
Stable identifier for a node in the graph. Indices are never reused.
NodeOrigin
Where a MIR node came from and how it was produced.
PipelineInspect
Text dump of each compiler pipeline stage.
Session
A session manages graph compilation and execution on a device.
Shape
Tensor shape: ordered list of dimensions + element type.
Tick
Opaque tick reading. Subtract two of these to get a Duration.
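
The subtract-two-readings pattern described for Tick can be sketched with the standard library alone. ToyTick below is a hypothetical stand-in built on std::time::Instant; it is not the rlx Tick type, whose constructor and internals are not shown on this page.

```rust
use std::ops::Sub;
use std::time::{Duration, Instant};

/// Hypothetical opaque tick: wraps a monotonic Instant.
#[derive(Clone, Copy, Debug)]
struct ToyTick(Instant);

impl ToyTick {
    fn now() -> Self {
        ToyTick(Instant::now())
    }
}

impl Sub for ToyTick {
    type Output = Duration;
    /// later - earlier; saturates to zero if the operands are swapped.
    fn sub(self, earlier: ToyTick) -> Duration {
        self.0.saturating_duration_since(earlier.0)
    }
}

/// Times a closure by subtracting two tick readings.
fn time_it<R>(work: impl FnOnce() -> R) -> (R, Duration) {
    let start = ToyTick::now();
    let result = work();
    (result, ToyTick::now() - start)
}
```

Keeping the reading opaque and exposing only subtraction means callers can measure elapsed time without depending on the clock's epoch or resolution.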

Enums§

DType
Scalar element type. Matches hardware-supported types.
Device
Target device for graph execution.
FusionPolicy
How HIR block ops lower to MIR.
FusionTarget
Compile target that selects a fusion pipeline.
GraphStage
Which stage of the HIR → MIR → LIR pipeline a GraphModule holds.
HirOp
High-level operation — blocks and escape hatches.
MissReason
Why a recognizable fusion pattern was not collapsed.
Op
An operation in the RLX IR graph.
OpKind
PLAN L4: discriminant for each Op variant. Used by Op::kind + the Backend::supported_ops trait method to declare which ops a backend can lower; the LegalizeForBackend pass in rlx-opt checks the graph against this set and fails the compile when an unsupported op is present (instead of silent fallback).
Precision
Which numeric precision to use for an op. (Subset of DType — only the ones we currently dispatch on.)
PrecisionPolicy
Declarative precision policy for graph compilation.
QuantScheme
How a tensor is quantized. Mirrors the schemes RLX needs for LLM inference on Apple Silicon: blockwise int8 (GPTQ-style), blockwise int4 (Q4_K), and per-tensor fp8 (e4m3 / e5m2).
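
To illustrate the blockwise int8 idea mentioned for QuantScheme, here is a simplified symmetric absmax quantizer with one f32 scale per block. It is a sketch under assumptions: real GPTQ-style and Q4_K formats carry more structure than this, the QuantScheme variants are not shown on this page, and Int8Block, quantize_block, dequantize_block, and quantize_blockwise are hypothetical names.

```rust
/// One quantized block: i8 codes plus a single f32 scale (no zero point).
struct Int8Block {
    scale: f32,
    codes: Vec<i8>,
}

/// Quantizes one block so that the largest magnitude maps to +/-127.
fn quantize_block(values: &[f32]) -> Int8Block {
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // An all-zero block gets scale 1.0 to avoid dividing by zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let codes = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    Int8Block { scale, codes }
}

/// Reconstructs approximate f32 values from a block.
fn dequantize_block(block: &Int8Block) -> Vec<f32> {
    block.codes.iter().map(|&c| c as f32 * block.scale).collect()
}

/// Quantizes a tensor in fixed-size blocks; the last block may be shorter.
fn quantize_blockwise(values: &[f32], block_size: usize) -> Vec<Int8Block> {
    assert!(block_size > 0, "block_size must be positive");
    values.chunks(block_size).map(quantize_block).collect()
}
```

Because each block has its own scale, an outlier only degrades precision inside its own block, which is the main reason blockwise schemes are preferred over a single per-tensor scale for int8 and int4 weights.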

Traits§

Pass
A graph-to-graph transformation pass.

Functions§

fusion_passes
Return the ordered fusion passes for target.
fusion_passes_for_supported
Return the ordered fusion passes allowed for supported.
hvp
Hessian-vector product via forward-over-reverse.
inspect_graph
Annotated graph dump (MIR body). Alias for pretty_print.
inspect_graph_diff
Summarize graph changes between pipeline stages.
inspect_hir
Annotated HIR module dump.
inspect_hir_stats
One-line HIR summary (header + op histogram).
inspect_lir
Annotated LIR dump: optimized MIR + buffer plan + schedule.
inspect_mir
Annotated MIR module dump (optimized tensor DAG).
inspect_mir_diff
Diff two MIR snapshots (typically pre/post fusion).
inspect_mir_stats
One-line MIR summary.
inspect_pipeline
Inspect every lowering stage for hir through pipeline.
jvp
Compute the JVP graph for forward, perturbing each Input / Param named in tangent_for. Returns a new graph whose outputs are [primals..., tangents...], in the order forward listed them.
maybe_dump_pipeline
Write a full pipeline dump when RLX_IR_DUMP is set (path prefix or directory).
node_label
Best-effort label for diagnostics (origin label, node name, or id).
supported_for_target
Per-target op claims used when a backend doesn’t supply an explicit supported_ops slice. Must stay aligned with each backend’s *_SUPPORTED_OPS in rlx-runtime/src/backend.rs.
supports_op
True when supported is empty (no claim) or contains kind.
vmap
Vectorize forward over a leading batch axis.
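
The rule documented for supports_op (an empty slice means no claim, so every op passes) and the fail-instead-of-fallback behavior described for OpKind can be sketched in a few lines. ToyOpKind, toy_supports_op, and toy_legalize are hypothetical stand-ins; the real OpKind variants and the LegalizeForBackend pass signature are not shown on this page.

```rust
/// Hypothetical op discriminant; the real OpKind mirrors each Op variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyOpKind {
    MatMul,
    Softmax,
    Conv2d,
}

/// Mirrors the documented rule: empty slice = no claim = everything allowed.
fn toy_supports_op(supported: &[ToyOpKind], kind: ToyOpKind) -> bool {
    supported.is_empty() || supported.contains(&kind)
}

/// Legalization-style check: fail on the first unsupported op
/// instead of silently falling back.
fn toy_legalize(graph_ops: &[ToyOpKind], supported: &[ToyOpKind]) -> Result<(), String> {
    match graph_ops.iter().find(|&&k| !toy_supports_op(supported, k)) {
        Some(k) => Err(format!("unsupported op: {:?}", k)),
        None => Ok(()),
    }
}
```

One consequence of the empty-means-no-claim rule is that a backend which genuinely supports nothing cannot express that with an empty slice, so a backend that wants strict checking has to list its ops explicitly.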

Type Aliases§

CalibrationRecord
Map of tap NodeId → calibrated quant params.
Error
Crate-wide error type — alias of anyhow::Error.
Result
Crate-wide result type — alias of anyhow::Result<T>. Use this in main() and library boundaries.