mlx-native

Pure-Rust Metal GPU compute library for MLX-compatible inference on Apple Silicon.

What is this?

mlx-native provides a thin, safe wrapper around Apple's Metal framework focused on compute shader dispatch for neural network inference. It handles buffer management, shader compilation, and GPU command encoding so you can focus on model logic.

Apple Silicon only — requires a Mac with an M-series chip (or Intel Mac with discrete AMD GPU, though optimized for Apple Silicon unified memory).

Features

No panics — all public APIs return Result<T, MlxError>
Zero-copy — StorageModeShared buffers on unified memory
Thread-safe — MlxDevice and MlxBuffer are Send + Sync
Lazy compilation — MSL shaders compiled on first use, then cached
Buffer pooling — arena allocator with power-of-two bucketing for reuse
Compute graph — record and replay GPU dispatch sequences

Quick start

use mlx_native::{MlxDevice, DType};

let device = MlxDevice::new()?;
let buf = device.alloc_buffer(1024, DType::F32, vec![256])?;
let encoder = device.command_encoder()?;

Key types

Type	Purpose
`MlxDevice`	Metal device + command queue (entry point)
`CommandEncoder`	Batched compute command submission
`MlxBuffer`	Typed Metal buffer with shape/dtype metadata
`MlxBufferPool`	Arena allocator with power-of-two bucketing
`KernelRegistry`	Lazy MSL compilation + pipeline cache
`ComputeGraph`	Recorded GPU dispatch sequence for replay
`DType`	Element data type enum
`MlxError`	Unified error type

GPU operations

mlx-native includes optimized Metal compute shaders for:

Quantized matrix multiplication (MLX and GGML formats)
Flash attention (scalar and TurboQuant variants, with sliding window)
Fused RMSNorm + residual add
Fused head norm + RoPE
RoPE positional encoding
Softmax and softcap
GeLU activation
Embedding lookup
Mixture-of-experts gating and dispatch
Hadamard transform (standalone and quantize-KV)
Gather, transpose, copy, argmax, argsort

Weight loading

Load safetensors and GGUF model files directly into Metal buffers:

use mlx_native::{MlxDevice, SafetensorsFile};

let device = MlxDevice::new()?;
let st = SafetensorsFile::open("model.safetensors")?;
let buffer = st.load_tensor(&device, "model.layers.0.self_attn.q_proj.weight")?;

Third-party licenses

This crate includes code derived from:

candle (Apache-2.0) — see LICENSE-APACHE-candle
llama.cpp (MIT) — see LICENSE-MIT-llamacpp

License

MIT — see LICENSE.

mlx-native 0.2.0