# coreml-native

Safe, ergonomic Rust bindings for Apple CoreML inference with full Apple Neural Engine (ANE) acceleration. Built on `objc2-core-ml` — no C bridge, no Swift runtime, pure Rust.
## Features

- Load compiled `.mlmodelc` models with configurable compute units (CPU/GPU/ANE)
- Zero-copy tensors from Rust slices via `MLMultiArray::initWithDataPointer`
- Predict with named inputs/outputs, automatic Float16-to-Float32 conversion
- Async APIs — `load_async`, `predict_async`, `compile_model_async` with runtime-agnostic `CompletionFuture`
- Batch prediction — submit multiple inputs in one call via `BatchProvider`
- Model lifecycle — `ModelHandle` with type-safe `unload`/`reload` for GPU/ANE memory management
- 9 tensor data types — Float16, Float32, Float64, Int32, Int16, Int8, UInt32, UInt16, UInt8
- Introspect model inputs, outputs, shapes, data types, shape constraints, and metadata
- Stateful prediction via `MLState` for RNN decoders and KV-cache (macOS 15+)
- Compile `.mlmodel`/`.mlpackage` to `.mlmodelc` at runtime (sync and async)
- Device enumeration — discover available CPU, GPU, and Neural Engine devices
- ndarray integration — optional feature flag for zero-copy `ndarray` ↔ tensor conversion
- Stride-aware output copy — correctly handles non-contiguous GPU/ANE tensor layouts
- Thread-safe — `Model`, `Prediction`, and tensor types are `Send + Sync`
- Cross-platform — compiles on Linux/Windows with stub types (no-op)
- No external toolchain — pure `cargo build`, no Xcode project needed
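The feature list mentions configurable compute units; a minimal sketch of what selecting them could look like. Note that `ComputeUnits` and `load_with_compute_units` are illustrative names assumed here, not confirmed API — they mirror CoreML's own `MLComputeUnits` options:

```rust
use coreml_native::{Model, ComputeUnits};

// Illustrative: restrict execution to CPU + Neural Engine, skipping the GPU.
// The crate exposes compute-unit selection per the feature list above,
// but the exact function and enum names may differ from this sketch.
let model = Model::load_with_compute_units(
    "model.mlmodelc",
    ComputeUnits::CpuAndNeuralEngine,
)?;
```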
## Quick Start

```rust
use coreml_native::{Model, BorrowedTensor};

let model = Model::load("model.mlmodelc")?;
let data = vec![0.5f32; 1 * 3 * 224 * 224];
// Zero-copy tensor over the slice; shape and names are illustrative
let tensor = BorrowedTensor::from_f32(&data, &[1, 3, 224, 224])?;
let prediction = model.predict(&[("input", &tensor)])?;
let (output, shape) = prediction.get_f32("output")?;
println!("output shape: {:?}", shape);
```
## Async Loading & Prediction

`CompletionFuture` works with any async runtime (tokio, async-std, smol) or can be blocked on synchronously:

```rust
use coreml_native::{Model, BorrowedTensor};

// Async load — .await or .block_on()
let model = Model::load_async("model.mlmodelc")?.block_on()?;

// Async prediction
let data = vec![0.5f32; 1 * 3 * 224 * 224];
let tensor = BorrowedTensor::from_f32(&data, &[1, 3, 224, 224])?;
let prediction = model.predict_async(&[("input", &tensor)])?.block_on()?;

// Load from in-memory bytes (macOS 14.4+)
let spec_bytes = std::fs::read("model.mlmodel")?;
let model = Model::load_from_bytes(&spec_bytes)?.block_on()?;
```
## Batch Prediction

More efficient than calling `predict()` in a loop — CoreML optimizes across the batch:

```rust
use coreml_native::{Model, BorrowedTensor, BatchProvider};

let model = Model::load("model.mlmodelc")?;

let data_a = vec![0.1f32; 4];
let data_b = vec![0.2f32; 4];
let tensor_a = BorrowedTensor::from_f32(&data_a, &[1, 4])?;
let tensor_b = BorrowedTensor::from_f32(&data_b, &[1, 4])?;

let inputs: Vec<BorrowedTensor> = vec![tensor_a, tensor_b];
let input_refs: Vec<_> = inputs.iter().map(|t| ("input", t)).collect();
let batch = BatchProvider::new(&input_refs)?;

let results = model.predict_batch(&batch)?;
for i in 0..results.count() {
    let (output, _shape) = results.get(i)?.get_f32("output")?;
    // ...
}
```
## Model Lifecycle Management

`ModelHandle` uses move semantics to prevent use-after-unload at compile time:

```rust
use coreml_native::ModelHandle;

let handle = ModelHandle::load("model.mlmodelc")?;

// Use the model
let prediction = handle.predict(&[("input", &tensor)])?;

// Free GPU/ANE memory when idle
let handle = handle.unload()?;
assert!(!handle.is_loaded());

// Reload when needed — same path and compute config preserved
let handle = handle.reload()?;
```
## Tensor Types

### `BorrowedTensor` — zero-copy from Rust slices

```rust
use coreml_native::BorrowedTensor;

// Supported: f32, i32, f64, f16 (as u16 bits), i16, i8, u32, u16, u8
let tensor = BorrowedTensor::from_f32(&f32_data, &[2, 3])?;
let tensor = BorrowedTensor::from_i32(&i32_data, &[2, 3])?;
let tensor = BorrowedTensor::from_f16_bits(&u16_data, &[2, 3])?;
```

### `OwnedTensor` — CoreML-allocated memory

```rust
use coreml_native::OwnedTensor;

let tensor = OwnedTensor::zeros(&[2, 3])?;
let data = tensor.to_vec_f32()?;    // copy out as Vec<f32>
let data = tensor.to_vec_i32()?;    // copy out as Vec<i32>
let bytes = tensor.to_raw_bytes()?; // raw byte copy
```
## Output Extraction

```rust
// Allocating
let (data, shape) = prediction.get_f32("output")?;
let (data, shape) = prediction.get_i32("output")?;
let (data, shape) = prediction.get_f64("output")?;
let (bytes, shape) = prediction.get_raw("output")?;

// Zero-alloc — reuse a buffer across predictions
let mut buf = vec![0.0f32; 1000];
let shape = prediction.get_f32_into("output", &mut buf)?;
```
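The zero-alloc variant pays off when the same model runs in a loop. A sketch, assuming the `predict`/`from_f32` call shapes used elsewhere in this README and a hypothetical `frames` iterator of `Vec<f32>` inputs:

```rust
use coreml_native::{Model, BorrowedTensor};

let model = Model::load("model.mlmodelc")?;
let mut buf = vec![0.0f32; 1000];

for frame in frames {
    let tensor = BorrowedTensor::from_f32(&frame, &[1, 3, 224, 224])?;
    let prediction = model.predict(&[("input", &tensor)])?;
    // Writes into `buf` without allocating; returns the output shape
    let shape = prediction.get_f32_into("output", &mut buf)?;
    // ... consume &buf[..] here before the next iteration overwrites it
}
```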
## Model Introspection

```rust
let model = Model::load("model.mlmodelc")?;

for input in model.inputs() {
    println!("{:?}", input); // name, shape, data type, shape constraints
}

let meta = model.metadata();
println!("{:?}", meta);
```
## Device Enumeration

```rust
use coreml_native::available_devices;

for device in available_devices() {
    println!("{:?}", device);
}
```
## Stateful Prediction (macOS 15+)

For RNN decoders, KV-cache models, and other stateful architectures:

```rust
let model = Model::load("decoder.mlmodelc")?;
let state = model.new_state()?;

// State persists across calls
let pred1 = model.predict_stateful(&[("input", &tensor)], &state)?;
let pred2 = model.predict_stateful(&[("input", &tensor)], &state)?;
```
## Runtime Compilation

```rust
use coreml_native::{compile_model, compile_model_async};

// Sync
let compiled_path = compile_model("model.mlmodel")?;

// Async (macOS 14.4+)
let compiled_path = compile_model_async("model.mlmodel")?.block_on()?;
```
## ndarray Integration

Enable with the `ndarray` feature flag:

```toml
[dependencies]
coreml-native = { version = "0.2", features = ["ndarray"] }
```

```rust
use ndarray::array;
use coreml_native::{BorrowedTensor, PredictionNdarray};

// ndarray → tensor (zero-copy, must be standard layout)
let input = array![[1.0f32, 2.0], [3.0, 4.0]].into_dyn();
let tensor = BorrowedTensor::from_ndarray_f32(&input)?;

// prediction → ndarray
let output = prediction.get_ndarray_f32("output")?;
println!("{:?}", output);
```
## Requirements

- macOS 12+ (Monterey) for core features
- macOS 14+ for async prediction
- macOS 14.4+ for `load_from_bytes` and `compile_model_async`
- macOS 15+ for stateful prediction (`MLState`)
- Apple Silicon recommended for ANE acceleration
- Rust 1.75+
## Comparison

| Crate | Approach | ANE | Standalone | Maintained |
|---|---|---|---|---|
| coreml-native | objc2 bindings | Full | Yes | Yes |
| objc2-core-ml | Raw auto-gen | Full | Yes* | Yes |
| coreml-rs (swarnimarun) | Swift bridge | Yes | No (Swift runtime) | Minimal |
| candle-coreml | objc2 + Candle | Yes | No (Candle dep) | Yes |
| ort CoreML EP | ONNX Runtime | Partial | Yes | Yes |

\*Raw unsafe API without ergonomic wrappers.
## License

Apache-2.0 OR MIT