arrow-ml

High-performance machine learning kernels built on Apache Arrow, written in Rust.

arrow-ml provides optimized tensor operations, linear algebra primitives, and neural network building blocks that operate directly on Arrow's in-memory format. It features a tiered execution strategy — naive loops for small inputs, SIMD for medium workloads, and pluggable GPU backends for large computations.

Features

40+ operations covering linear algebra, activations, normalization, convolution, pooling, and more
Arrow-native — works directly with arrow::tensor::Tensor and PrimitiveArray, zero-copy where possible
Tiered dispatch — automatically selects between naive, SIMD (portable_simd), and GPU paths based on input size
Pluggable GPU backends — ships with an Apple Metal backend; extensible via a C ABI plugin system
Type-generic — optimized fast paths for f32/f64, generic fallback for all Arrow numeric types
Null-propagating — correctly handles nullable Arrow arrays throughout

Crate Structure

arrow-ml              # Unified public API with TensorOps / ArrayOps traits
├── arrow-ml-linalg         # Linear algebra, reductions, reshaping, conv, pooling
├── arrow-ml-activations    # Activation functions (ReLU, GELU, Sigmoid, etc.)
├── arrow-ml-common         # Shared error types & backend plugin registry
└── arrow-ml-backend-metal  # Metal GPU backend (macOS, cdylib)

Quick Start

Add to your Cargo.toml:

[dependencies]
arrow-ml = "0.1"
arrow = { version = ">=56, <59", default-features = false }

arrow-ml is tested against arrow 56–58. Pick whichever version in that range fits the rest of your dependency tree.

Tensor Operations

use arrow::buffer::{Buffer, ScalarBuffer};
use arrow::datatypes::Float32Type;
use arrow::tensor::Tensor;
use arrow_ml::tensor_ops::TensorOps;

// Create 2x3 and 3x2 tensors
let a_buf = Buffer::from(ScalarBuffer::<f32>::from(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).into_inner());
let a = Tensor::new_row_major(a_buf, Some(vec![2, 3]), None).unwrap();

let b_buf = Buffer::from(ScalarBuffer::<f32>::from(vec![7.0, 8.0, 9.0, 10.0, 11.0, 12.0]).into_inner());
let b = Tensor::new_row_major(b_buf, Some(vec![3, 2]), None).unwrap();

// Matrix multiply, transpose, then sum along last axis
let result = a.dot::<Float32Type>(&b)?
    .t::<Float32Type>()?
    .sum::<Float32Type>(&[-1], false)?;

Activation Functions

use arrow::array::Float32Array;
use arrow_ml::array_ops::ArrayOps;

let input = Float32Array::from(vec![Some(-1.0), Some(0.0), Some(1.0), None]);
let activated = input.relu();       // [0.0, 0.0, 1.0, null]
let gelu = input.gelu();            // GELU approximation
let sig = input.sigmoid();          // Sigmoid

Direct Kernel Calls

use arrow_ml_linalg::matmul::matmul;
use arrow_ml_linalg::layernorm::layer_norm;
use arrow_ml_linalg::conv::conv2d;

let c = matmul(&a, &b)?;            // C = A * B

Operations

Linear Algebra

matmul, gemm, matvec, gemv, dot, axpy, scal

Activations

relu, leaky_relu, gelu, gelu_exact, sigmoid, tanh, silu, softmax

Normalization

layer_norm, rms_norm, batch_norm, group_norm, instance_norm, l1_norm, l2_norm

Reductions

reduce_sum, reduce_mean, reduce_max, reduce_min, reduce_prod, cumsum, argmax, argmin, topk

Tensor Manipulation

reshape, flatten, squeeze, unsqueeze, expand, transpose, concat, gather, gather_elements, scatter_nd, pad

Convolution & Pooling

conv2d, conv_transpose2d, avg_pool2d, max_pool2d, resize

Element-wise Math

pow, erf, reciprocal, cos, sin, floor, ceil, round, clip

Other

embedding, where_cond

GPU Backend

On macOS, the Metal backend accelerates matmul for large matrices automatically. The dispatch threshold is configurable but defaults to 256×256.

The backend plugin system discovers backends at runtime via two parallel mechanisms:

JSON manifests (Vulkan-ICD style) — *.json files in ~/.arrow-ml/backends/, /etc/arrow-ml/backends/, or $ARROW_ML_BACKEND_MANIFEST_DIR. A reference manifest ships at crates/arrow-ml-backend-metal/manifests/metal.json.
Glob fallback — shared libraries matching libarrow_ml_backend_* sitting next to the running executable, in $ARROW_ML_BACKEND_DIR, or in the workspace target/{debug,release} during development.

To build the Metal backend:

cargo build --release -p arrow-ml-backend-metal

Custom backends can be added by implementing the C ABI contract — every backend must export am_backend_abi_version, am_backend_name, and am_backend_priority. See arrow_ml_common::backend for the full function-pointer types and the current ARROW_ML_BACKEND_ABI_VERSION.

Benchmarks

cargo bench -p arrow-ml-linalg

Benchmarks cover matmul performance across sizes (16–4096) for both f32 and f64, comparing naive vs SIMD vs GPU paths.

Requirements

Rust nightly (uses portable_simd)
Arrow 56–58
macOS for Metal GPU backend (optional)

License

Apache-2.0

arrow-ml-activations 0.1.0