Crate mlx_native


§mlx-native

Pure-Rust Metal GPU compute library for MLX-compatible inference on Apple Silicon.

This crate provides a thin, safe wrapper around Apple’s Metal framework focused on compute shader dispatch for neural network inference. It is designed to be the GPU backend for the hf2q inference engine.

§Key Types

| Type | Purpose |
|------|---------|
| MlxDevice | Metal device + command queue (entry point) |
| CommandEncoder | Batched compute command submission |
| MlxBuffer | Typed Metal buffer with shape/dtype metadata |
| MlxBufferPool | Arena allocator with power-of-two bucketing |
| KernelRegistry | Lazy MSL compilation + pipeline cache |
| DType | Element data type enum |
| MlxError | Unified error type (never panics) |
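
The power-of-two bucketing that MlxBufferPool uses can be illustrated in isolation. The sketch below is an assumption about the scheme (a common arena design: round each request up to the next power of two and index buckets by log2), not the crate's exact implementation.

```rust
/// Round a requested byte size up to the next power of two and
/// return (rounded size, bucket index = log2 of the rounded size).
/// In a typical power-of-two arena, a freed 1000-byte buffer lands
/// in the 1024-byte bucket and can satisfy any later request of
/// 513..=1024 bytes without touching the Metal allocator.
fn bucket_for(size: usize) -> (usize, u32) {
    let rounded = size.max(1).next_power_of_two();
    (rounded, rounded.trailing_zeros())
}

fn main() {
    assert_eq!(bucket_for(1000), (1024, 10)); // rounds up
    assert_eq!(bucket_for(1024), (1024, 10)); // exact power stays
    assert_eq!(bucket_for(1025), (2048, 11)); // spills to next bucket
    println!("bucket_for(1000) = {:?}", bucket_for(1000));
}
```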

§Quick Start

```rust
use mlx_native::{MlxDevice, DType};

let device = MlxDevice::new()?;
let buf = device.alloc_buffer(1024, DType::F32, vec![256])?;
let encoder = device.command_encoder()?;
```

§Design Principles

  • No panics — all public APIs return Result<T, MlxError>.
  • Zero-copy — StorageModeShared buffers on Apple Silicon unified memory.
  • Thread-safe — MlxDevice and MlxBuffer are Send + Sync.
  • Lazy compilation — MSL shaders compiled on first use, then cached.
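
The "no panics" principle means every fallible call surfaces an MlxError through the crate's Result alias. The sketch below shows that pattern with an illustrative stand-in error type; the variant names (DeviceUnavailable, ShaderCompile) are hypothetical, since the real enum's variants are not listed on this page.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's unified error type;
/// the actual variants are not documented here.
#[derive(Debug)]
enum MlxError {
    DeviceUnavailable,
    ShaderCompile(String),
}

impl fmt::Display for MlxError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MlxError::DeviceUnavailable => write!(f, "no Metal device available"),
            MlxError::ShaderCompile(msg) => write!(f, "MSL compilation failed: {msg}"),
        }
    }
}

impl std::error::Error for MlxError {}

/// Mirrors the crate-wide `Result` convenience alias.
type Result<T> = std::result::Result<T, MlxError>;

fn compile(src: &str) -> Result<()> {
    if src.is_empty() {
        Err(MlxError::ShaderCompile("empty source".into()))
    } else {
        Ok(())
    }
}

fn main() {
    // Errors propagate with `?` instead of panicking.
    assert!(compile("kernel void k() {}").is_ok());
    assert!(matches!(compile(""), Err(MlxError::ShaderCompile(_))));
}
```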

Re-exports§

pub use graph::ComputeGraph;
pub use graph::GraphExecutor;
pub use graph::GraphSession;
pub use graph::OpKind;
pub use gguf::GgufFile;
pub use gguf::MetadataValue;
pub use gguf::TensorInfo;
pub use ops::quantized_matmul::quantized_matmul;
pub use ops::quantized_matmul::quantized_matmul_simd;
pub use ops::quantized_matmul::QuantizedMatmulParams;
pub use ops::quantized_matmul_ggml::quantized_matmul_ggml;
pub use ops::quantized_matmul_ggml::GgmlQuantizedMatmulParams;
pub use ops::quantized_matmul_ggml::GgmlType;
pub use ops::quantized_matmul_id::quantized_matmul_id;
pub use ops::quantized_matmul_id::QuantizedMatmulIdParams;
pub use ops::quantized_matmul_id_ggml::quantized_matmul_id_ggml;
pub use ops::quantized_matmul_id_ggml::GgmlQuantizedMatmulIdParams;
pub use weight::load_quantized_weights;
pub use weight::safetensors_to_metal_buffer;
pub use weight::QuantizationConfig;
pub use weight::QuantizedWeight;
pub use weight::SafetensorsFile;
pub use weight::TensorQuantConfig;
pub use metal;

Modules§

gguf
GGUF v3 file format parser.
graph
GraphExecutor — batched Metal dispatch for single-encoder forward passes.
ops
GPU kernel host-side dispatch functions.
turboquant
TurboQuant KV cache compression — CPU reference implementation.
weight
Weight loading from safetensors files into Metal GPU buffers.
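
As a taste of what the gguf module's parser must do first, the sketch below validates a GGUF v3 header: the "GGUF" magic, then a little-endian u32 version, u64 tensor count, and u64 metadata KV count (per the public GGUF v3 layout). This is a minimal standalone sketch; GgufFile's actual API is not shown on this page.

```rust
/// Parse the fixed 24-byte GGUF v3 header, returning
/// (version, tensor_count, metadata_kv_count) or None on mismatch.
fn parse_header(bytes: &[u8]) -> Option<(u32, u64, u64)> {
    if bytes.len() < 24 || &bytes[0..4] != b"GGUF" {
        return None;
    }
    let version = u32::from_le_bytes(bytes[4..8].try_into().ok()?);
    let tensors = u64::from_le_bytes(bytes[8..16].try_into().ok()?);
    let kv_count = u64::from_le_bytes(bytes[16..24].try_into().ok()?);
    Some((version, tensors, kv_count))
}

fn main() {
    // Build a synthetic header in memory.
    let mut header = Vec::new();
    header.extend_from_slice(b"GGUF");
    header.extend_from_slice(&3u32.to_le_bytes());   // version 3
    header.extend_from_slice(&291u64.to_le_bytes()); // tensor count
    header.extend_from_slice(&24u64.to_le_bytes());  // metadata KV count
    assert_eq!(parse_header(&header), Some((3, 291, 24)));
    assert_eq!(parse_header(b"not a gguf file!"), None);
}
```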

Structs§

CommandEncoder
A batched compute command encoder.
KernelRegistry
Registry that lazily compiles and caches Metal compute pipelines from embedded MSL source.
MTLSize
See https://developer.apple.com/documentation/metal/mtlsize
MlxBuffer
A Metal GPU buffer annotated with element dtype and tensor shape.
MlxBufferPool
Arena-style buffer pool that reuses Metal buffer allocations.
MlxDevice
Wraps a Metal device and its command queue.

Enums§

CapturedNode
A single captured compute dispatch or barrier sentinel.
DType
Element data type carried by an MlxBuffer.
DispatchKind
How to dispatch the recorded kernel.
MlxError
Unified error type for all Metal GPU operations.
RecordedBinding
A recorded kernel argument binding.

Functions§

dispatch_count
Read the current value of DISPATCH_COUNT.
reset_counters
Reset both SYNC_COUNT and DISPATCH_COUNT to zero.
sync_count
Read the current value of SYNC_COUNT.
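
Profiling counters like DISPATCH_COUNT and SYNC_COUNT are typically plain process-wide atomics read and reset through accessor functions. The sketch below assumes that design; the real statics are private, so the names and orderings here are illustrative.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative stand-ins for the crate's private counters,
// exposed only through the accessor functions documented above.
static DISPATCH_COUNT: AtomicU64 = AtomicU64::new(0);
static SYNC_COUNT: AtomicU64 = AtomicU64::new(0);

fn dispatch_count() -> u64 {
    DISPATCH_COUNT.load(Ordering::Relaxed)
}

fn sync_count() -> u64 {
    SYNC_COUNT.load(Ordering::Relaxed)
}

fn reset_counters() {
    DISPATCH_COUNT.store(0, Ordering::Relaxed);
    SYNC_COUNT.store(0, Ordering::Relaxed);
}

fn main() {
    // Simulate three dispatches and one sync, then reset.
    DISPATCH_COUNT.fetch_add(3, Ordering::Relaxed);
    SYNC_COUNT.fetch_add(1, Ordering::Relaxed);
    assert_eq!((dispatch_count(), sync_count()), (3, 1));
    reset_counters();
    assert_eq!((dispatch_count(), sync_count()), (0, 0));
}
```

Relaxed ordering suffices here because the counters are independent statistics, not synchronization points.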

Type Aliases§

Result
Convenience alias used throughout the crate.