Safe Rust bindings for Apple MLX — core arrays & ops, plus LM, VLM, audio, and embeddings
Overview
mlxrs provides safe, idiomatic Rust bindings to MLX —
Apple's array framework for machine learning on Apple silicon — through the
mlx-c FFI layer. As in MLX, operations build a
lazy computation graph and run only when a result is read (to_vec, item) or
eval is called, preserving the unified-memory CPU/GPU execution model.
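To make the lazy model concrete, here is a toy sketch in plain Rust. It does not use mlxrs or MLX at all: the `Lazy` type, its variants, and the `evals` counter are hypothetical names invented for illustration. It only shows the general idea that building an expression records work, and reading the result is what triggers it.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Toy lazy node: records an operation instead of running it.
/// Hypothetical stand-in for illustration; this is NOT the mlxrs `Array` type.
pub enum Lazy {
    Leaf(Vec<f32>),
    Add(Rc<Lazy>, Rc<Lazy>),
    Scale(Rc<Lazy>, f32),
}

impl Lazy {
    /// Walks the recorded graph and computes the result.
    /// `evals` counts how many operations were actually executed.
    pub fn to_vec(&self, evals: &Cell<usize>) -> Vec<f32> {
        match self {
            Lazy::Leaf(v) => v.clone(),
            Lazy::Add(a, b) => {
                evals.set(evals.get() + 1);
                let (a, b) = (a.to_vec(evals), b.to_vec(evals));
                a.iter().zip(b.iter()).map(|(x, y)| x + y).collect()
            }
            Lazy::Scale(a, k) => {
                evals.set(evals.get() + 1);
                a.to_vec(evals).iter().map(|x| x * k).collect()
            }
        }
    }
}

/// Builds `(a + b) * 2` and returns the operation count before the read,
/// the computed values, and the operation count after the read.
pub fn lazy_demo() -> (usize, Vec<f32>, usize) {
    let evals = Cell::new(0);
    let a = Rc::new(Lazy::Leaf(vec![1.0, 2.0, 3.0]));
    let b = Rc::new(Lazy::Leaf(vec![10.0, 20.0, 30.0]));
    let sum = Rc::new(Lazy::Add(a, b));
    let scaled = Lazy::Scale(sum, 2.0);

    // The graph is fully built here, but no operation has run yet.
    let before = evals.get();

    // Reading the result is what executes the recorded operations.
    let out = scaled.to_vec(&evals);
    (before, out, evals.get())
}
```

Calling `lazy_demo()` returns a zero operation count before the read and a count of two afterwards, which mirrors the "run only when a result is read" behaviour described above in a much simplified form.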
On top of the core Array / Dtype / ops surface, optional features port the
higher-level support of MLX's companion projects — mlx-lm, mlx-vlm, mlx-audio,
and mlx-embeddings (loaders, tokenizers, KV-caches, samplers, quantization,
generation loops, audio DSP, pooling).
Installation
Core only:
[dependencies]
mlxrs = "0.1"
Or enable one of the higher-level feature sets:
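# language models
[dependencies]
mlxrs = { version = "0.1", features = ["lm"] }
# vision-language models (implies lm)
[dependencies]
mlxrs = { version = "0.1", features = ["vlm"] }
# audio (implies lm)
[dependencies]
mlxrs = { version = "0.1", features = ["audio"] }
# embedding utilities
[dependencies]
mlxrs = { version = "0.1", features = ["embeddings"] }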
Quick start
use mlxrs::Array;
The fallible a.add(&b)? form is always available; operator overloads
(&a + &b) are opt-in behind unstable-ops-overload (see Caveats).
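The difference between the two forms can be sketched with a toy type. Everything below (`Toy`, `ToyError`, the length check) is a hypothetical stand-in written against the standard library only; it is not the mlxrs implementation, and the real error types and shape rules will differ. The point is the pattern: a fallible inherent method that returns `Result`, and an operator overload that wraps it and panics on error.

```rust
use std::ops::Add;

/// Hypothetical stand-in for an array type; NOT the mlxrs `Array`.
#[derive(Debug, Clone, PartialEq)]
pub struct Toy {
    data: Vec<f32>,
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum ToyError {
    ShapeMismatch { left: usize, right: usize },
}

impl Toy {
    pub fn new(data: Vec<f32>) -> Self {
        Self { data }
    }

    /// Fallible form: a length mismatch is reported as an `Err`.
    pub fn add(&self, other: &Toy) -> Result<Toy, ToyError> {
        if self.data.len() != other.data.len() {
            return Err(ToyError::ShapeMismatch {
                left: self.data.len(),
                right: other.data.len(),
            });
        }
        let data = self
            .data
            .iter()
            .zip(other.data.iter())
            .map(|(x, y)| x + y)
            .collect();
        Ok(Toy { data })
    }
}

/// Operator form: convenient, but it has no way to return an error,
/// so a mismatch becomes a panic.
impl Add for &Toy {
    type Output = Toy;

    fn add(self, rhs: Self) -> Toy {
        Toy::add(self, rhs).expect("shape mismatch in operator +")
    }
}
```

With this sketch, `a.add(&b)` resolves to the inherent method and yields a `Result` that the caller can propagate with `?`, while `&a + &b` goes through the `Add` impl and panics on a mismatch. That is why a panicking operator is reasonable for prototyping but a poor default for library code.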
Features
Higher-level surfaces are off by default — enable what you need:
- lm (language models): tokenizers (BPE / SPM / chat templates / tool parsing), KV-caches, samplers + logits processors, quantization, LoRA/DoRA, optimizers, and the generation loop.
- vlm (vision-language models, implies lm): image preprocessing, prompt assembly, multimodal generation.
- audio (implies lm): STFT/mel DSP, WAV I/O, STT/TTS, playback.
- embeddings: embedding-model loading, pooling, and the encode pipeline.
Finer-grained flags (individual tokenizer-*, gguf, llguidance,
unstable-ops-overload) are listed in mlxrs/Cargo.toml and
the API docs.
Platform support
mlxrs targets aarch64-apple-darwin (Apple silicon). Other platforms
(x86_64-apple-darwin, Linux + CUDA, distributed) are on the roadmap.
Caveats
- Array is !Send and !Sync: single-thread use only. The underlying C++ array_desc is shared by Array::try_clone (refcount-bumped) and mutates non-atomic state internally, so cross-thread sharing is unsound without external synchronization. Array does not implement Clone; the only duplication is the fallible try_clone. There is no shared-array wrapper. To use array data on another thread, extract owned data via to_vec / item (which yield Send values) and move that.
- GPU work is single-stream serialized per thread: the internal default stream is per-thread and maps to one Metal command queue per thread. mlxrs exposes a public Stream / Device API. Note that Stream is a thread-affine, non-RAII handle: it is !Send + !Sync, Drop frees only the mlx-c handle box (mlx has no per-stream teardown), and Stream::new_on permanently grows mlx's process-global stream state. Allocate a bounded set at startup, never per request/task. The only reclaim path is the bulk, end-of-thread Stream::clear_current_thread_streams() (a worker's last mlx action before it exits), not per-value lifetime control.
- Async Metal kernel failures bypass Result and abort the process: the rc/sentinel chain only catches synchronous errors. A set_terminate-style recovery shim is not implementable (mlx-c exposes no hook); only diagnostics are planned.
- Each thread that calls into mlxrs allocates a GPU stream that lives until process exit. mlxrs is designed for a small, long-lived worker pool; patterns that spawn a fresh OS thread per request will accumulate streams without bound.
- Operator overloads (&a + &b, -&a, etc.) are gated behind the unstable-ops-overload feature flag and panic on shape/dtype error. Library authors must NEVER enable it transitively, because Cargo unifies features across the dependency graph. End-user binaries may opt in for prototyping. The fallible a.add(&b)? form is always available and is the load-bearing API.
- Per-model architectures for the lm / vlm / audio / embeddings features are added per use case rather than bulk-ported from the upstream Python projects. These features ship the support surface (loaders, tokenizers, pooling, generation loops, processors, audio I/O), not the model implementations.
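The threading caveats above suggest a particular shape for concurrent code: a fixed set of long-lived workers, thread-affine values that never leave the thread that created them, and only owned data crossing thread boundaries. The following is a minimal sketch of that shape using the standard library only. `LocalArray` is a hypothetical stand-in (made `!Send` by holding an `Rc`), and `run_pool` is an illustrative helper; neither is part of mlxrs.

```rust
use std::rc::Rc;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a thread-affine array.
/// Holding an `Rc` makes it `!Send` and `!Sync`, like the `Array` caveat above.
struct LocalArray {
    data: Rc<Vec<f32>>,
}

impl LocalArray {
    fn from_slice(v: &[f32]) -> Self {
        Self { data: Rc::new(v.to_vec()) }
    }

    /// Placeholder for "some computation": doubles every element.
    fn doubled(&self) -> LocalArray {
        LocalArray { data: Rc::new(self.data.iter().map(|x| x * 2.0).collect()) }
    }

    /// Extracts owned data, which is `Send` and can be moved to another thread.
    fn to_vec(&self) -> Vec<f32> {
        self.data.as_ref().clone()
    }
}

/// Runs `jobs` on a fixed pool of long-lived worker threads and returns
/// the results in job order. The thread count is bounded up front rather
/// than growing with the number of jobs.
pub fn run_pool(workers: usize, jobs: Vec<Vec<f32>>) -> Vec<Vec<f32>> {
    let n = jobs.len();
    let (job_tx, job_rx) = mpsc::channel::<(usize, Vec<f32>)>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel::<(usize, Vec<f32>)>();

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is released at the end of this statement.
                let job = job_rx.lock().unwrap().recv();
                let (id, input) = match job {
                    Ok(j) => j,
                    Err(_) => break, // job channel closed: worker exits
                };
                // The thread-affine value is created and dropped on this thread.
                let arr = LocalArray::from_slice(&input);
                let out = arr.doubled().to_vec();
                if res_tx.send((id, out)).is_err() {
                    break;
                }
            })
        })
        .collect();
    drop(res_tx); // only the workers hold result senders now

    for (id, job) in jobs.into_iter().enumerate() {
        job_tx.send((id, job)).unwrap();
    }
    drop(job_tx); // signals the workers that no more jobs are coming

    let mut results = vec![Vec::new(); n];
    for (id, out) in res_rx {
        results[id] = out;
    }
    for h in handles {
        h.join().unwrap();
    }
    results
}
```

For example, `run_pool(2, vec![vec![1.0, 2.0], vec![3.0]])` processes both jobs on two workers and returns the doubled vectors in the original order. In real code the worker body would build mlxrs arrays and call `to_vec` or `item` before sending, and per the stream caveat the worker would perform any bulk stream reclaim as its last mlx action before exiting.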
License
mlxrs is licensed under the terms of both the MIT license and the
Apache License (Version 2.0).
See LICENSE-APACHE and LICENSE-MIT for details.
Copyright (c) 2026 FinDIT Studio authors.