§mlx-native
Pure-Rust Metal GPU compute library for MLX-compatible inference on Apple Silicon.
This crate provides a thin, safe wrapper around Apple’s Metal framework
focused on compute shader dispatch for neural network inference. It is
designed to be the GPU backend for the hf2q inference engine.
§Key Types
| Type | Purpose |
|---|---|
| `MlxDevice` | Metal device + command queue (entry point) |
| `CommandEncoder` | Batched compute command submission |
| `MlxBuffer` | Typed Metal buffer with shape/dtype metadata |
| `MlxBufferPool` | Arena allocator with power-of-two bucketing |
| `KernelRegistry` | Lazy MSL compilation + pipeline cache |
| `DType` | Element data type enum |
| `MlxError` | Unified error type (never panics) |
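The "power-of-two bucketing" in the table can be illustrated with a self-contained sketch. `Pool`, `bucket_for`, and the `Vec<u8>` backing store below are hypothetical stand-ins for `MlxBufferPool` and its Metal allocations, not the crate's actual code:

```rust
use std::collections::HashMap;

/// Maps a requested byte size to its bucket: the next power of two.
fn bucket_for(size: usize) -> usize {
    size.max(1).next_power_of_two()
}

/// Free lists keyed by bucket size. A freed 300-byte request and a new
/// 257-byte request both resolve to the 512-byte bucket, so the old
/// allocation can be reused instead of hitting the allocator again.
struct Pool {
    free: HashMap<usize, Vec<Vec<u8>>>, // bucket size -> recycled buffers
}

impl Pool {
    fn new() -> Self {
        Pool { free: HashMap::new() }
    }

    fn alloc(&mut self, size: usize) -> Vec<u8> {
        let bucket = bucket_for(size);
        match self.free.get_mut(&bucket).and_then(|v| v.pop()) {
            Some(buf) => buf,          // reuse a recycled allocation
            None => vec![0u8; bucket], // fresh allocation, rounded up
        }
    }

    fn release(&mut self, buf: Vec<u8>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }
}

fn main() {
    let mut pool = Pool::new();
    assert_eq!(bucket_for(300), 512);
    let a = pool.alloc(300);
    assert_eq!(a.len(), 512);
    pool.release(a);
    let b = pool.alloc(257); // same 512-byte bucket: reused, not reallocated
    assert_eq!(b.len(), 512);
    println!("ok");
}
```

Rounding every request up to a power of two trades some internal fragmentation for a small, fixed number of free lists and high reuse rates across slightly varying tensor sizes.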
§Quick Start
```rust
use mlx_native::{MlxDevice, DType};

let device = MlxDevice::new()?;
let buf = device.alloc_buffer(1024, DType::F32, vec![256])?;
let encoder = device.command_encoder()?;
```

§Design Principles
- No panics — all public APIs return `Result<T, MlxError>`.
- Zero-copy — `StorageModeShared` buffers on Apple Silicon unified memory.
- Thread-safe — `MlxDevice` and `MlxBuffer` are `Send + Sync`.
- Lazy compilation — MSL shaders compiled on first use, then cached.
Re-exports§
- `pub use graph::ComputeGraph;`
- `pub use graph::GraphExecutor;`
- `pub use graph::GraphSession;`
- `pub use graph::OpKind;`
- `pub use gguf::GgufFile;`
- `pub use gguf::MetadataValue;`
- `pub use gguf::TensorInfo;`
- `pub use ops::quantized_matmul::quantized_matmul;`
- `pub use ops::quantized_matmul::quantized_matmul_simd;`
- `pub use ops::quantized_matmul::QuantizedMatmulParams;`
- `pub use ops::quantized_matmul_ggml::quantized_matmul_ggml;`
- `pub use ops::quantized_matmul_ggml::GgmlQuantizedMatmulParams;`
- `pub use ops::quantized_matmul_ggml::GgmlType;`
- `pub use ops::quantized_matmul_id::quantized_matmul_id;`
- `pub use ops::quantized_matmul_id::QuantizedMatmulIdParams;`
- `pub use ops::quantized_matmul_id_ggml::quantized_matmul_id_ggml;`
- `pub use ops::quantized_matmul_id_ggml::GgmlQuantizedMatmulIdParams;`
- `pub use weight::load_quantized_weights;`
- `pub use weight::safetensors_to_metal_buffer;`
- `pub use weight::QuantizationConfig;`
- `pub use weight::QuantizedWeight;`
- `pub use weight::SafetensorsFile;`
- `pub use weight::TensorQuantConfig;`
- `pub use metal;`
Modules§
- gguf — GGUF v3 file format parser.
- graph — `GraphExecutor`: batched Metal dispatch for single-encoder forward passes.
- ops — GPU kernel host-side dispatch functions.
- turboquant — TurboQuant KV cache compression (CPU reference implementation).
- weight — Weight loading from safetensors files into Metal GPU buffers.
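For intuition about what the `gguf` module handles: a GGUF v3 file begins with the magic bytes `GGUF`, then a little-endian version, tensor count, and metadata key/value count. The sketch below validates just that fixed 24-byte prefix; it is a hypothetical simplification, far smaller than the module's full parser:

```rust
/// Parse the fixed GGUF header prefix: magic, version, tensor count,
/// metadata key/value count (all little-endian).
fn parse_header(bytes: &[u8]) -> Result<(u32, u64, u64), String> {
    if bytes.len() < 24 {
        return Err("truncated header".to_string());
    }
    if &bytes[0..4] != b"GGUF" {
        return Err("bad magic".to_string());
    }
    let version = u32::from_le_bytes(bytes[4..8].try_into().unwrap());
    let tensor_count = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
    let metadata_kv_count = u64::from_le_bytes(bytes[16..24].try_into().unwrap());
    Ok((version, tensor_count, metadata_kv_count))
}

fn main() {
    // Build a synthetic v3 header: 2 tensors, 5 metadata entries.
    let mut hdr = Vec::new();
    hdr.extend_from_slice(b"GGUF");
    hdr.extend_from_slice(&3u32.to_le_bytes());
    hdr.extend_from_slice(&2u64.to_le_bytes());
    hdr.extend_from_slice(&5u64.to_le_bytes());
    assert_eq!(parse_header(&hdr), Ok((3, 2, 5)));
    assert!(parse_header(b"nope").is_err());
    println!("ok");
}
```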
Structs§
- CommandEncoder — A batched compute command encoder.
- KernelRegistry — Registry that lazily compiles and caches Metal compute pipelines from embedded MSL source.
- MTLSize — See https://developer.apple.com/documentation/metal/mtlsize
- MlxBuffer — A Metal GPU buffer annotated with element dtype and tensor shape.
- MlxBufferPool — Arena-style buffer pool that reuses Metal buffer allocations.
- MlxDevice — Wraps a Metal device and its command queue.
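The lazy-compile-then-cache pattern that `KernelRegistry` is described as using can be sketched in plain Rust. Everything here is a hypothetical simplification: "compilation" is stubbed out with a string, whereas the real registry builds Metal compute pipeline state objects from embedded MSL source:

```rust
use std::collections::HashMap;

struct Registry {
    source: HashMap<&'static str, &'static str>, // kernel name -> MSL source
    pipelines: HashMap<&'static str, String>,    // stand-in for compiled pipelines
    compile_count: usize,                        // how often "compilation" ran
}

impl Registry {
    /// Returns the pipeline for `name`, compiling it on first use.
    fn pipeline(&mut self, name: &'static str) -> Result<&String, String> {
        if !self.pipelines.contains_key(name) {
            let src = self
                .source
                .get(name)
                .ok_or_else(|| format!("unknown kernel: {name}"))?;
            self.compile_count += 1; // the expensive step, paid once per kernel
            self.pipelines.insert(name, format!("pipeline({src})"));
        }
        Ok(&self.pipelines[name])
    }
}

fn main() {
    let mut reg = Registry {
        source: HashMap::from([("add", "kernel void add(...) {}")]),
        pipelines: HashMap::new(),
        compile_count: 0,
    };
    reg.pipeline("add").unwrap();
    reg.pipeline("add").unwrap(); // cache hit: no recompilation
    assert_eq!(reg.compile_count, 1);
    assert!(reg.pipeline("missing").is_err());
    println!("ok");
}
```

Deferring compilation to first use keeps startup fast when an inference run only touches a subset of the embedded kernels.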
Enums§
- CapturedNode — A single captured compute dispatch or barrier sentinel.
- DType — Element data type carried by an `MlxBuffer`.
- DispatchKind — How to dispatch the recorded kernel.
- MlxError — Unified error type for all Metal GPU operations.
- RecordedBinding — A recorded kernel argument binding.
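A dtype enum like `DType` typically pairs each variant with a per-element byte size so a buffer's length can be derived from its shape. The sketch below is hypothetical: only `F32` is confirmed by the Quick Start above; the other variants and the `byte_len` helper are assumptions for illustration.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum DType {
    F32,
    F16,
    U8,
}

impl DType {
    /// Bytes occupied by one element of this type.
    fn size_in_bytes(self) -> usize {
        match self {
            DType::F32 => 4,
            DType::F16 => 2,
            DType::U8 => 1,
        }
    }
}

/// Bytes needed for a tensor of `shape` with element type `dtype`.
fn byte_len(dtype: DType, shape: &[usize]) -> usize {
    shape.iter().product::<usize>() * dtype.size_in_bytes()
}

fn main() {
    // Matches the Quick Start: 256 F32 elements occupy 1024 bytes.
    assert_eq!(byte_len(DType::F32, &[256]), 1024);
    assert_eq!(byte_len(DType::F16, &[2, 3]), 12);
    println!("ok");
}
```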
Functions§
- dispatch_count — Read the current value of `DISPATCH_COUNT`.
- reset_counters — Reset both `SYNC_COUNT` and `DISPATCH_COUNT` to zero.
- sync_count — Read the current value of `SYNC_COUNT`.
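The counter functions above suggest a pair of global atomic counters for profiling dispatches and CPU-GPU syncs. The sketch below shows that pattern with std atomics; the counter names come from the docs, but the implementation details (ordering, types) are assumptions, not the crate's actual code:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

static DISPATCH_COUNT: AtomicU64 = AtomicU64::new(0);
static SYNC_COUNT: AtomicU64 = AtomicU64::new(0);

/// Read the current value of DISPATCH_COUNT.
fn dispatch_count() -> u64 {
    DISPATCH_COUNT.load(Ordering::Relaxed)
}

/// Read the current value of SYNC_COUNT.
fn sync_count() -> u64 {
    SYNC_COUNT.load(Ordering::Relaxed)
}

/// Reset both counters to zero.
fn reset_counters() {
    DISPATCH_COUNT.store(0, Ordering::Relaxed);
    SYNC_COUNT.store(0, Ordering::Relaxed);
}

fn main() {
    DISPATCH_COUNT.fetch_add(3, Ordering::Relaxed); // e.g. 3 kernel dispatches
    SYNC_COUNT.fetch_add(1, Ordering::Relaxed);     // e.g. 1 CPU-GPU sync
    assert_eq!((dispatch_count(), sync_count()), (3, 1));
    reset_counters();
    assert_eq!((dispatch_count(), sync_count()), (0, 0));
    println!("ok");
}
```

Resetting between forward passes makes the counters useful for per-pass profiling, e.g. verifying that a batched encoder really did reduce the number of syncs.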
Type Aliases§
- Result — Convenience alias used throughout the crate.