Crate mlx_native


§mlx-native

Pure-Rust Metal GPU compute library for MLX-compatible inference on Apple Silicon.

This crate provides a thin, safe wrapper around Apple’s Metal framework focused on compute shader dispatch for neural network inference. It is designed to be the GPU backend for the hf2q inference engine.

§Key Types

| Type | Purpose |
|------|---------|
| MlxDevice | Metal device + command queue (entry point) |
| CommandEncoder | Batched compute command submission |
| MlxBuffer | Typed Metal buffer with shape/dtype metadata |
| MlxBufferPool | Arena allocator with power-of-two bucketing |
| KernelRegistry | Lazy MSL compilation + pipeline cache |
| DType | Element data type enum |
| MlxError | Unified error type (never panics) |
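
The power-of-two bucketing that MlxBufferPool uses can be illustrated in isolation. The sketch below is an assumption about the scheme (a common arena design: round each request up to the next power of two and index buckets by log2), not the crate's exact implementation.

```rust
/// Round a requested byte size up to the next power of two and
/// return (rounded size, bucket index = log2 of the rounded size).
/// In a typical power-of-two arena, a freed 1000-byte buffer lands
/// in the 1024-byte bucket and can satisfy any later request of
/// 513..=1024 bytes without touching the Metal allocator.
fn bucket_for(size: usize) -> (usize, u32) {
    let rounded = size.max(1).next_power_of_two();
    (rounded, rounded.trailing_zeros())
}

fn main() {
    assert_eq!(bucket_for(1000), (1024, 10)); // rounds up
    assert_eq!(bucket_for(1024), (1024, 10)); // exact power stays
    assert_eq!(bucket_for(1025), (2048, 11)); // spills to next bucket
    println!("bucket_for(1000) = {:?}", bucket_for(1000));
}
```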

§Quick Start

```rust
use mlx_native::{MlxDevice, DType};

let device = MlxDevice::new()?;
let buf = device.alloc_buffer(1024, DType::F32, vec![256])?;
let encoder = device.command_encoder()?;
```

§Design Principles

  • No panics — all public APIs return Result<T, MlxError>.
  • Zero-copy — StorageModeShared buffers on Apple Silicon unified memory.
  • Thread-safe — MlxDevice and MlxBuffer are Send + Sync.
  • Lazy compilation — MSL shaders compiled on first use, then cached.
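
The "no panics" principle means every fallible call surfaces an MlxError through the crate's Result alias. The sketch below shows that pattern with an illustrative stand-in error type; the variant names (DeviceUnavailable, ShaderCompile) are hypothetical, since the real enum's variants are not listed on this page.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's unified error type;
/// the actual variants are not documented here.
#[derive(Debug)]
enum MlxError {
    DeviceUnavailable,
    ShaderCompile(String),
}

impl fmt::Display for MlxError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MlxError::DeviceUnavailable => write!(f, "no Metal device available"),
            MlxError::ShaderCompile(msg) => write!(f, "MSL compilation failed: {msg}"),
        }
    }
}

impl std::error::Error for MlxError {}

/// Mirrors the crate-wide `Result` convenience alias.
type Result<T> = std::result::Result<T, MlxError>;

fn compile(src: &str) -> Result<()> {
    if src.is_empty() {
        Err(MlxError::ShaderCompile("empty source".into()))
    } else {
        Ok(())
    }
}

fn main() {
    // Errors propagate with `?` instead of panicking.
    assert!(compile("kernel void k() {}").is_ok());
    assert!(matches!(compile(""), Err(MlxError::ShaderCompile(_))));
}
```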

Re-exports§

pub use graph::ComputeGraph;
pub use graph::GraphExecutor;
pub use graph::GraphSession;
pub use graph::OpKind;
pub use gguf::GgufFile;
pub use gguf::MetadataValue;
pub use gguf::TensorInfo;
pub use ops::quantized_matmul::quantized_matmul;
pub use ops::quantized_matmul::quantized_matmul_simd;
pub use ops::quantized_matmul::QuantizedMatmulParams;
pub use ops::quantized_matmul_ggml::quantized_matmul_ggml;
pub use ops::quantized_matmul_ggml::GgmlQuantizedMatmulParams;
pub use ops::quantized_matmul_ggml::GgmlType;
pub use ops::quantized_matmul_id::quantized_matmul_id;
pub use ops::quantized_matmul_id::QuantizedMatmulIdParams;
pub use ops::quantized_matmul_id_ggml::quantized_matmul_id_ggml;
pub use ops::quantized_matmul_id_ggml::GgmlQuantizedMatmulIdParams;
pub use weight::load_quantized_weights;
pub use weight::safetensors_to_metal_buffer;
pub use weight::QuantizationConfig;
pub use weight::QuantizedWeight;
pub use weight::SafetensorsFile;
pub use weight::TensorQuantConfig;
pub use metal;

Modules§

gguf
GGUF v3 file format parser.
graph
GraphExecutor — batched Metal dispatch for single-encoder forward passes.
ops
GPU kernel host-side dispatch functions.
turboquant
TurboQuant KV cache compression — CPU reference implementation.
weight
Weight loading from safetensors files into Metal GPU buffers.
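
As a taste of what the gguf module's parser must do first, the sketch below validates a GGUF v3 header: the "GGUF" magic, then a little-endian u32 version, u64 tensor count, and u64 metadata KV count (per the public GGUF v3 layout). This is a minimal standalone sketch; GgufFile's actual API is not shown on this page.

```rust
/// Parse the fixed 24-byte GGUF v3 header, returning
/// (version, tensor_count, metadata_kv_count) or None on mismatch.
fn parse_header(bytes: &[u8]) -> Option<(u32, u64, u64)> {
    if bytes.len() < 24 || &bytes[0..4] != b"GGUF" {
        return None;
    }
    let version = u32::from_le_bytes(bytes[4..8].try_into().ok()?);
    let tensors = u64::from_le_bytes(bytes[8..16].try_into().ok()?);
    let kv_count = u64::from_le_bytes(bytes[16..24].try_into().ok()?);
    Some((version, tensors, kv_count))
}

fn main() {
    // Build a synthetic header in memory.
    let mut header = Vec::new();
    header.extend_from_slice(b"GGUF");
    header.extend_from_slice(&3u32.to_le_bytes());   // version 3
    header.extend_from_slice(&291u64.to_le_bytes()); // tensor count
    header.extend_from_slice(&24u64.to_le_bytes());  // metadata KV count
    assert_eq!(parse_header(&header), Some((3, 291, 24)));
    assert_eq!(parse_header(b"not a gguf file!"), None);
}
```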

Structs§

CommandEncoder
A batched compute command encoder.
KernelRegistry
Registry that lazily compiles and caches Metal compute pipelines from embedded MSL source.
MTLSize
See https://developer.apple.com/documentation/metal/mtlsize
MlxBuffer
A Metal GPU buffer annotated with element dtype and tensor shape.
MlxBufferPool
Arena-style buffer pool that reuses Metal buffer allocations.
MlxDevice
Wraps a Metal device and its command queue.

Enums§

CapturedNode
A single captured compute dispatch or barrier sentinel.
DType
Element data type carried by an MlxBuffer.
DispatchKind
How to dispatch the recorded kernel.
MlxError
Unified error type for all Metal GPU operations.
RecordedBinding
A recorded kernel argument binding.

Functions§

dispatch_count
Read the current value of DISPATCH_COUNT.
reset_counters
Reset both SYNC_COUNT and DISPATCH_COUNT to zero.
sync_count
Read the current value of SYNC_COUNT.
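
Profiling counters like DISPATCH_COUNT and SYNC_COUNT are typically plain process-wide atomics read and reset through accessor functions. The sketch below assumes that design; the real statics are private, so the names and orderings here are illustrative.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative stand-ins for the crate's private counters,
// exposed only through the accessor functions documented above.
static DISPATCH_COUNT: AtomicU64 = AtomicU64::new(0);
static SYNC_COUNT: AtomicU64 = AtomicU64::new(0);

fn dispatch_count() -> u64 {
    DISPATCH_COUNT.load(Ordering::Relaxed)
}

fn sync_count() -> u64 {
    SYNC_COUNT.load(Ordering::Relaxed)
}

fn reset_counters() {
    DISPATCH_COUNT.store(0, Ordering::Relaxed);
    SYNC_COUNT.store(0, Ordering::Relaxed);
}

fn main() {
    // Simulate three dispatches and one sync, then reset.
    DISPATCH_COUNT.fetch_add(3, Ordering::Relaxed);
    SYNC_COUNT.fetch_add(1, Ordering::Relaxed);
    assert_eq!((dispatch_count(), sync_count()), (3, 1));
    reset_counters();
    assert_eq!((dispatch_count(), sync_count()), (0, 0));
}
```

Relaxed ordering suffices here because the counters are independent statistics, not synchronization points.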

Type Aliases§

Result
Convenience alias used throughout the crate.