§mlx-native
Pure-Rust Metal GPU compute library for MLX-compatible inference on Apple Silicon.
This crate provides a thin, safe wrapper around Apple’s Metal framework
focused on compute shader dispatch for neural network inference. It is
designed to be the GPU backend for the hf2q inference engine.
§Key Types
| Type | Purpose |
|---|---|
| `MlxDevice` | Metal device + command queue (entry point) |
| `CommandEncoder` | Batched compute command submission |
| `MlxBuffer` | Typed Metal buffer with shape/dtype metadata |
| `MlxBufferPool` | Arena allocator with power-of-two bucketing |
| `KernelRegistry` | Lazy MSL compilation + pipeline cache |
| `DType` | Element data type enum |
| `MlxError` | Unified error type (never panics) |
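To illustrate what "power-of-two bucketing" means for a buffer pool, here is a minimal sketch. It is not the crate's `MlxBufferPool` implementation: `BucketPool`, `bucket_size`, `acquire`, and `release` are hypothetical names, and a plain `Vec<u8>` stands in for a Metal buffer.

```rust
use std::collections::HashMap;

/// Round a requested byte length up to its power-of-two bucket.
/// Zero-length requests share the smallest bucket (1 byte) in this sketch.
fn bucket_size(requested: usize) -> usize {
    requested.max(1).next_power_of_two()
}

/// Hypothetical pool: free allocations keyed by bucket size.
/// `Vec<u8>` stands in for a Metal buffer (assumption for illustration).
#[derive(Default)]
struct BucketPool {
    free: HashMap<usize, Vec<Vec<u8>>>,
}

impl BucketPool {
    /// Reuse a free allocation from the matching bucket, or create a new one.
    fn acquire(&mut self, requested: usize) -> Vec<u8> {
        let size = bucket_size(requested);
        self.free
            .get_mut(&size)
            .and_then(|list| list.pop())
            .unwrap_or_else(|| vec![0u8; size])
    }

    /// Return an allocation to the bucket that matches its length.
    fn release(&mut self, buf: Vec<u8>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }
}
```

Rounding up trades some wasted bytes per allocation for a much higher chance that a released buffer fits a later request of a slightly different size.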
§Quick Start
use mlx_native::{MlxDevice, DType};
let device = MlxDevice::new()?;
let buf = device.alloc_buffer(1024, DType::F32, vec![256])?;
let encoder = device.command_encoder()?;
§Design Principles
- No panics: all public APIs return `Result<T, MlxError>`.
- Zero-copy: `StorageModeShared` buffers on Apple Silicon unified memory.
- Thread-safe: `MlxDevice` and `MlxBuffer` are `Send + Sync`.
- Lazy compilation: MSL shaders compiled on first use, then cached.
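The lazy-compilation principle can be sketched as a compile-on-first-use cache. This is an illustration only, not `KernelRegistry` itself: `LazyPipelineCache`, `Pipeline`, `get_or_compile`, and the `compile_calls` counter are hypothetical, and the "compile" step is a placeholder where a real registry would build a Metal pipeline from MSL source.

```rust
use std::collections::HashMap;

/// Stand-in for a compiled pipeline (assumption: a real one would wrap
/// a Metal compute pipeline state).
#[derive(Debug, Clone, PartialEq)]
struct Pipeline {
    name: String,
}

/// Hypothetical cache that compiles each kernel at most once.
#[derive(Default)]
struct LazyPipelineCache {
    compiled: HashMap<String, Pipeline>,
    /// Number of times the placeholder compile step actually ran.
    compile_calls: usize,
}

impl LazyPipelineCache {
    /// Return the cached pipeline, compiling on the first request only.
    /// Errors are returned as values rather than panicking.
    fn get_or_compile(&mut self, name: &str) -> Result<&Pipeline, String> {
        if !self.compiled.contains_key(name) {
            if name.is_empty() {
                return Err("kernel name must not be empty".to_string());
            }
            // Placeholder for the expensive shader compilation step.
            self.compile_calls += 1;
            self.compiled
                .insert(name.to_string(), Pipeline { name: name.to_string() });
        }
        Ok(&self.compiled[name])
    }
}
```

Requesting the same kernel name twice leaves `compile_calls` at 1, which is the property the cache exists to provide.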
Re-exports§
- pub use graph::ComputeGraph;
- pub use graph::GraphExecutor;
- pub use graph::GraphSession;
- pub use graph::OpKind;
- pub use gguf::GgufFile;
- pub use gguf::MetadataValue;
- pub use gguf::TensorInfo;
- pub use ops::dense_mm_bf16::dense_matmul_bf16_f32_tensor;
- pub use ops::dense_mm_bf16::DenseMmBf16F32Params;
- pub use ops::dense_mm_f16::dense_matmul_f16_f32_tensor;
- pub use ops::dense_mm_f16::DenseMmF16F32Params;
- pub use ops::dense_mm_f32_f32::dense_matmul_f32_f32_tensor;
- pub use ops::dense_mm_f32_f32::DenseMmF32F32Params;
- pub use ops::quantized_matmul::quantized_matmul;
- pub use ops::quantized_matmul::quantized_matmul_simd;
- pub use ops::quantized_matmul::QuantizedMatmulParams;
- pub use ops::quantized_matmul_ggml::quantized_matmul_ggml;
- pub use ops::quantized_matmul_ggml::quantized_matmul_mm_tensor_perm021;
- pub use ops::quantized_matmul_ggml::quantized_matmul_mm_tensor_perm021_f16;
- pub use ops::quantized_matmul_ggml::GgmlQuantizedMatmulParams;
- pub use ops::quantized_matmul_ggml::GgmlQuantizedMatmulPerm021Params;
- pub use ops::quantized_matmul_ggml::GgmlType;
- pub use ops::quantized_matmul_ggml::MM_ROUTING_THRESHOLD;
- pub use ops::mul_mv_ext::mul_mv_ext_dispatch;
- pub use ops::mul_mv_ext::MulMvExtParams;
- pub use ops::quantized_matmul_id::quantized_matmul_id;
- pub use ops::quantized_matmul_id::quantized_matmul_id_into;
- pub use ops::quantized_matmul_id::QuantizedMatmulIdParams;
- pub use ops::quantized_matmul_id_ggml::quantized_matmul_id_ggml;
- pub use ops::quantized_matmul_id_ggml::quantized_matmul_id_ggml_pooled;
- pub use ops::quantized_matmul_id_ggml::quantized_matmul_id_swiglu_q4_0;
- pub use ops::quantized_matmul_id_ggml::GgmlIdMmDispatchParams;
- pub use ops::quantized_matmul_id_ggml::GgmlQuantizedMatmulIdParams;
- pub use ops::quantized_matmul_id_ggml::IdMmScratch;
- pub use ops::quantized_matmul_id_ggml::MM_ID_ROUTING_THRESHOLD;
- pub use weight::load_quantized_weights;
- pub use weight::safetensors_to_metal_buffer;
- pub use weight::QuantizationConfig;
- pub use weight::QuantizedWeight;
- pub use weight::SafetensorsFile;
- pub use weight::TensorQuantConfig;
- pub use metal;
Modules§
- `encoder_worker`: Persistent encoder worker thread (ADR-028 iter-380).
- `gguf`: GGUF v3 file format parser.
- `graph`: `GraphExecutor`, batched Metal dispatch for single-encoder forward passes.
- `kernel_profile`: Per-command-buffer + per-dispatch GPU timing accumulator for kernel-level profiling.
- `metal_capture`: Programmatic Metal Frame Capture wrapping (ADR-015 iter63 Part B).
- `ops`: GPU kernel host-side dispatch functions.
- `tq_oracle`: ADR-007 Path C F-0.1: CPU F32 oracle for `flash_attn_vec_tq_hb` decode.
- `turboquant`: TurboQuant KV cache compression, CPU reference implementation.
- `weight`: Weight loading from safetensors files into Metal GPU buffers.
Structs§
- `BufferRange`: A buffer region recorded for dataflow tracking.
- `CommandEncoder`: A batched compute command encoder.
- `DispatchRecord`: Pre-baked dispatch record for hot decode paths.
- `EncoderSession`: Session-level wrapper around a `CommandEncoder` for one or more logical transformer stages.
- `KernelRegistry`: Registry that lazily compiles and caches Metal compute pipelines from embedded MSL source.
- `MTLSize`: See https://developer.apple.com/documentation/metal/mtlsize
- `MemRanges`: Cumulative dataflow state for a sequence of concurrent dispatches.
- `MlxBuffer`: A Metal GPU buffer annotated with element dtype and tensor shape.
- `MlxBufferPool`: Arena-style buffer pool that reuses Metal buffer allocations.
- `MlxDevice`: Wraps a Metal device and its command queue.
Enums§
- `CapturedNode`: A single captured compute dispatch or barrier sentinel.
- `CapturedOpKind`: Operation kind tag for captured nodes, used by the fusion pass (4e.2).
- `DType`: Element data type carried by an `MlxBuffer`.
- `DispatchKind`: How to dispatch the recorded kernel.
- `KernelArg`: A buffer or inline-bytes binding for a compute kernel argument slot.
- `MemRangeRole`: Whether a recorded range was read by a dispatch (`Src`) or written by a dispatch (`Dst`). Mirrors `ggml_mem_range_type` in `ggml-metal-common.h:14-17`.
- `MlxError`: Unified error type for all Metal GPU operations.
- `RecordedBinding`: A recorded kernel argument binding.
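The `Src`/`Dst` roles on recorded ranges suggest a read/write hazard check between concurrent dispatches. The sketch below shows one common way such tracking works. It assumes half-open byte ranges and a "clear on barrier" policy; `Role`, `Range`, `Tracker`, and their methods are hypothetical names, not the crate's `MemRanges` or `MemRangeRole` API.

```rust
/// Whether a range was read (`Src`) or written (`Dst`) by a dispatch.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Role {
    Src,
    Dst,
}

/// Half-open byte range [start, end) inside the buffer `buffer_id`.
#[derive(Clone, Copy, Debug)]
struct Range {
    buffer_id: u64,
    start: usize,
    end: usize,
    role: Role,
}

impl Range {
    /// Ranges overlap only if they are in the same buffer and intersect.
    fn overlaps(&self, other: &Range) -> bool {
        self.buffer_id == other.buffer_id
            && self.start < other.end
            && other.start < self.end
    }
}

/// Hypothetical tracker for one group of concurrent dispatches.
#[derive(Default)]
struct Tracker {
    recorded: Vec<Range>,
    barriers: usize,
}

impl Tracker {
    /// A conflict needs an overlap where at least one side is a write;
    /// two reads of the same bytes never conflict.
    fn conflicts(&self, incoming: &Range) -> bool {
        self.recorded.iter().any(|r| {
            r.overlaps(incoming) && (r.role == Role::Dst || incoming.role == Role::Dst)
        })
    }

    /// Record a dispatch's ranges. Returns true if a barrier was needed first.
    fn dispatch(&mut self, ranges: &[Range]) -> bool {
        let needs_barrier = ranges.iter().any(|r| self.conflicts(r));
        if needs_barrier {
            self.barriers += 1;
            // A barrier starts a new concurrent group.
            self.recorded.clear();
        }
        self.recorded.extend_from_slice(ranges);
        needs_barrier
    }
}
```

Under this policy, dispatches that touch disjoint buffers, or that only read the same bytes, stay in one concurrent group, and a barrier is counted only when a read-after-write, write-after-read, or write-after-write overlap appears.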
Functions§
- `auto_barrier_concurrent_count`: Read the cumulative number of `dispatch_tracked` calls that did NOT emit a barrier (ran concurrent with the previous group).
- `auto_barrier_count`: Read the cumulative number of auto-emitted barriers across all encoders since process start (or last `reset_counters`).
- `barrier_count`: Read the current value of `BARRIER_COUNT`.
- `barrier_total_ns`: Read the total nanoseconds spent in the `memoryBarrierWithScope:` `objc::msg_send!` site. Only non-zero when `MLX_PROFILE_BARRIERS=1` was in the environment at the time of the first `memory_barrier()` call (the env check is cached on first use).
- `cmd_buf_count`: Read the current value of `CMD_BUF_COUNT`.
- `dispatch_count`: Read the current value of `DISPATCH_COUNT`.
- `pipeline_dispatch_buckets`: Public dump of `MLX_DISP_BUCKET` data: `Vec<(label, count)>` sorted descending by count. Returns empty when env-flag is off / never recorded.
- `reset_counters`: Reset all counters to zero.
- `reset_pipeline_dispatch_buckets`: Reset the per-pipeline dispatch buckets (typically called at decode start to ignore prefill / warmup contributions).
- `sync_count`: Read the current value of `SYNC_COUNT`.
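The counter functions above read process-wide statics, and the profiling flag is described as being checked once and cached. A minimal sketch of that pattern follows, assuming atomics with relaxed ordering and `std::sync::OnceLock` for the cached environment check. The helper names `record_dispatch`, `record_barrier`, and `profiling_enabled` are hypothetical; the crate's actual internals are not shown in this page.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

// Process-wide counters; the names mirror the statics mentioned in the docs.
static DISPATCH_COUNT: AtomicU64 = AtomicU64::new(0);
static BARRIER_COUNT: AtomicU64 = AtomicU64::new(0);

/// Hypothetical hook called once per kernel dispatch.
fn record_dispatch() {
    DISPATCH_COUNT.fetch_add(1, Ordering::Relaxed);
}

/// Hypothetical hook called once per emitted barrier.
fn record_barrier() {
    BARRIER_COUNT.fetch_add(1, Ordering::Relaxed);
}

/// Read the current value of `DISPATCH_COUNT`.
fn dispatch_count() -> u64 {
    DISPATCH_COUNT.load(Ordering::Relaxed)
}

/// Read the current value of `BARRIER_COUNT`.
fn barrier_count() -> u64 {
    BARRIER_COUNT.load(Ordering::Relaxed)
}

/// Reset all counters in this sketch to zero.
fn reset_counters() {
    DISPATCH_COUNT.store(0, Ordering::Relaxed);
    BARRIER_COUNT.store(0, Ordering::Relaxed);
}

/// Check `MLX_PROFILE_BARRIERS=1` once and cache the answer, so later
/// changes to the environment have no effect (hypothetical helper).
fn profiling_enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| {
        std::env::var("MLX_PROFILE_BARRIERS")
            .map(|v| v == "1")
            .unwrap_or(false)
    })
}
```

A typical use is to call `reset_counters()` at the start of the phase being measured, run the workload, then read the counters to see how many dispatches and barriers that phase issued.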
Type Aliases§
- `Result`: Convenience alias used throughout the crate.
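A crate-wide `Result` alias usually fixes the error type so signatures stay short. The sketch below shows the general pattern under stated assumptions: the `InvalidArgument` variant and the `element_count` helper are hypothetical and are not taken from the crate, and everything sits in a `sketch` module so the alias does not shadow `std::result::Result` elsewhere.

```rust
mod sketch {
    use std::fmt;

    /// Stand-in error type with one hypothetical variant.
    #[derive(Debug)]
    pub enum MlxError {
        InvalidArgument(String),
    }

    impl fmt::Display for MlxError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                MlxError::InvalidArgument(msg) => write!(f, "invalid argument: {msg}"),
            }
        }
    }

    impl std::error::Error for MlxError {}

    /// Alias with the error type fixed to `MlxError`.
    pub type Result<T> = std::result::Result<T, MlxError>;

    /// Hypothetical helper: multiply shape dimensions, returning an error
    /// instead of panicking if the product overflows `usize`.
    pub fn element_count(shape: &[usize]) -> Result<usize> {
        shape.iter().try_fold(1usize, |acc, &dim| {
            acc.checked_mul(dim).ok_or_else(|| {
                MlxError::InvalidArgument(format!("shape {shape:?} overflows usize"))
            })
        })
    }
}
```

Callers can then propagate failures with `?` and match on the error, which is how a "no panics" policy tends to surface in function signatures.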