arrow-ml
High-performance machine learning kernels built on Apache Arrow, written in Rust.
arrow-ml provides optimized tensor operations, linear algebra primitives, and neural network building blocks that operate directly on Arrow's in-memory format. It features a tiered execution strategy — naive loops for small inputs, SIMD for medium workloads, and pluggable GPU backends for large computations.
Features
- 40+ operations covering linear algebra, activations, normalization, convolution, pooling, and more
- Arrow-native — works directly with
arrow::tensor::TensorandPrimitiveArray, zero-copy where possible - Tiered dispatch — automatically selects between naive, SIMD (
portable_simd), and GPU paths based on input size - Pluggable GPU backends — ships with an Apple Metal backend; extensible via a C ABI plugin system
- Type-generic — optimized fast paths for
f32/f64, generic fallback for all Arrow numeric types - Null-propagating — correctly handles nullable Arrow arrays throughout
Crate Structure
arrow-ml # Unified public API with TensorOps / ArrayOps traits
├── arrow-ml-linalg # Linear algebra, reductions, reshaping, conv, pooling
├── arrow-ml-activations # Activation functions (ReLU, GELU, Sigmoid, etc.)
├── arrow-ml-common # Shared error types & backend plugin registry
└── arrow-ml-backend-metal # Metal GPU backend (macOS, cdylib)
Quick Start
Add to your Cargo.toml:
[]
= "0.1"
= { = ">=56, <59", = false }
arrow-ml is tested against arrow 56–58. Pick whichever version in that
range fits the rest of your dependency tree.
Tensor Operations
use ;
use Float32Type;
use Tensor;
use TensorOps;
// Create 2x3 and 3x2 tensors
let a_buf = from;
let a = new_row_major.unwrap;
let b_buf = from;
let b = new_row_major.unwrap;
// Matrix multiply, transpose, then sum along last axis
let result = a.?
.?
.?;
Activation Functions
use Float32Array;
use ArrayOps;
let input = from;
let activated = input.relu; // [0.0, 0.0, 1.0, null]
let gelu = input.gelu; // GELU approximation
let sig = input.sigmoid; // Sigmoid
Direct Kernel Calls
use matmul;
use layer_norm;
use conv2d;
let c = matmul?; // C = A * B
Operations
Linear Algebra
matmul, gemm, matvec, gemv, dot, axpy, scal
Activations
relu, leaky_relu, gelu, gelu_exact, sigmoid, tanh, silu, softmax
Normalization
layer_norm, rms_norm, batch_norm, group_norm, instance_norm, l1_norm, l2_norm
Reductions
reduce_sum, reduce_mean, reduce_max, reduce_min, reduce_prod, cumsum, argmax, argmin, topk
Tensor Manipulation
reshape, flatten, squeeze, unsqueeze, expand, transpose, concat, gather, gather_elements, scatter_nd, pad
Convolution & Pooling
conv2d, conv_transpose2d, avg_pool2d, max_pool2d, resize
Element-wise Math
pow, erf, reciprocal, cos, sin, floor, ceil, round, clip
Other
embedding, where_cond
GPU Backend
On macOS, the Metal backend accelerates matmul for large matrices automatically. The dispatch threshold is configurable but defaults to 256×256.
The backend plugin system discovers backends at runtime via two parallel mechanisms:
- JSON manifests (Vulkan-ICD style) —
*.jsonfiles in~/.arrow-ml/backends/,/etc/arrow-ml/backends/, or$ARROW_ML_BACKEND_MANIFEST_DIR. A reference manifest ships atcrates/arrow-ml-backend-metal/manifests/metal.json. - Glob fallback — shared libraries matching
libarrow_ml_backend_*sitting next to the running executable, in$ARROW_ML_BACKEND_DIR, or in the workspacetarget/{debug,release}during development.
To build the Metal backend:
Custom backends can be added by implementing the C ABI contract — every
backend must export am_backend_abi_version, am_backend_name, and
am_backend_priority. See arrow_ml_common::backend for the full
function-pointer types and the current ARROW_ML_BACKEND_ABI_VERSION.
Benchmarks
Benchmarks cover matmul performance across sizes (16–4096) for both f32 and f64, comparing naive vs SIMD vs GPU paths.
Requirements
- Rust nightly (uses
portable_simd) - Arrow 56–58
- macOS for Metal GPU backend (optional)
License
Apache-2.0