arrow-ml-activations 0.1.0

# arrow-ml

High-performance machine learning kernels built on [Apache Arrow](https://arrow.apache.org/), written in Rust.

`arrow-ml` provides optimized tensor operations, linear algebra primitives, and neural network building blocks that operate directly on Arrow's in-memory format. It features a tiered execution strategy — naive loops for small inputs, SIMD for medium workloads, and pluggable GPU backends for large computations.

## Features

- **40+ operations** covering linear algebra, activations, normalization, convolution, pooling, and more
- **Arrow-native** — works directly with `arrow::tensor::Tensor` and `PrimitiveArray`, zero-copy where possible
- **Tiered dispatch** — automatically selects between naive, SIMD (`portable_simd`), and GPU paths based on input size
- **Pluggable GPU backends** — ships with an Apple Metal backend; extensible via a C ABI plugin system
- **Type-generic** — optimized fast paths for `f32`/`f64`, generic fallback for all Arrow numeric types
- **Null-propagating** — correctly handles nullable Arrow arrays throughout

## Crate Structure

```
arrow-ml              # Unified public API with TensorOps / ArrayOps traits
├── arrow-ml-linalg         # Linear algebra, reductions, reshaping, conv, pooling
├── arrow-ml-activations    # Activation functions (ReLU, GELU, Sigmoid, etc.)
├── arrow-ml-common         # Shared error types & backend plugin registry
└── arrow-ml-backend-metal  # Metal GPU backend (macOS, cdylib)
```

## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
arrow-ml = "0.1"
arrow = { version = ">=56, <59", default-features = false }
```

`arrow-ml` is tested against arrow 56–58. Pick whichever version in that
range fits the rest of your dependency tree.

### Tensor Operations

```rust
use arrow::buffer::{Buffer, ScalarBuffer};
use arrow::datatypes::Float32Type;
use arrow::tensor::Tensor;
use arrow_ml::tensor_ops::TensorOps;

// Create 2x3 and 3x2 tensors
let a_buf = Buffer::from(ScalarBuffer::<f32>::from(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).into_inner());
let a = Tensor::new_row_major(a_buf, Some(vec![2, 3]), None).unwrap();

let b_buf = Buffer::from(ScalarBuffer::<f32>::from(vec![7.0, 8.0, 9.0, 10.0, 11.0, 12.0]).into_inner());
let b = Tensor::new_row_major(b_buf, Some(vec![3, 2]), None).unwrap();

// Matrix multiply, transpose, then sum along last axis
let result = a.dot::<Float32Type>(&b)?
    .t::<Float32Type>()?
    .sum::<Float32Type>(&[-1], false)?;
```

### Activation Functions

```rust
use arrow::array::Float32Array;
use arrow_ml::array_ops::ArrayOps;

let input = Float32Array::from(vec![Some(-1.0), Some(0.0), Some(1.0), None]);
let activated = input.relu();       // [0.0, 0.0, 1.0, null]
let gelu = input.gelu();            // GELU approximation
let sig = input.sigmoid();          // Sigmoid
```

### Direct Kernel Calls

```rust
use arrow_ml_linalg::matmul::matmul;
use arrow_ml_linalg::layernorm::layer_norm;
use arrow_ml_linalg::conv::conv2d;

let c = matmul(&a, &b)?;            // C = A * B
```

## Operations

### Linear Algebra

`matmul`, `gemm`, `matvec`, `gemv`, `dot`, `axpy`, `scal`

### Activations

`relu`, `leaky_relu`, `gelu`, `gelu_exact`, `sigmoid`, `tanh`, `silu`, `softmax`

### Normalization

`layer_norm`, `rms_norm`, `batch_norm`, `group_norm`, `instance_norm`, `l1_norm`, `l2_norm`

### Reductions

`reduce_sum`, `reduce_mean`, `reduce_max`, `reduce_min`, `reduce_prod`, `cumsum`, `argmax`, `argmin`, `topk`

### Tensor Manipulation

`reshape`, `flatten`, `squeeze`, `unsqueeze`, `expand`, `transpose`, `concat`, `gather`, `gather_elements`, `scatter_nd`, `pad`

### Convolution & Pooling

`conv2d`, `conv_transpose2d`, `avg_pool2d`, `max_pool2d`, `resize`

### Element-wise Math

`pow`, `erf`, `reciprocal`, `cos`, `sin`, `floor`, `ceil`, `round`, `clip`

### Other

`embedding`, `where_cond`

## GPU Backend

On macOS, the Metal backend accelerates `matmul` for large matrices automatically. The dispatch threshold is configurable but defaults to 256×256.

The backend plugin system discovers backends at runtime via two parallel
mechanisms:

1. **JSON manifests** (Vulkan-ICD style) — `*.json` files in
   `~/.arrow-ml/backends/`, `/etc/arrow-ml/backends/`, or
   `$ARROW_ML_BACKEND_MANIFEST_DIR`. A reference manifest ships at
   `crates/arrow-ml-backend-metal/manifests/metal.json`.
2. **Glob fallback** — shared libraries matching `libarrow_ml_backend_*`
   sitting next to the running executable, in `$ARROW_ML_BACKEND_DIR`, or
   in the workspace `target/{debug,release}` during development.

To build the Metal backend:

```sh
cargo build --release -p arrow-ml-backend-metal
```

Custom backends can be added by implementing the C ABI contract — every
backend must export `am_backend_abi_version`, `am_backend_name`, and
`am_backend_priority`. See `arrow_ml_common::backend` for the full
function-pointer types and the current `ARROW_ML_BACKEND_ABI_VERSION`.

## Benchmarks

```sh
cargo bench -p arrow-ml-linalg
```

Benchmarks cover matmul performance across sizes (16–4096) for both `f32` and `f64`, comparing naive vs SIMD vs GPU paths.

## Requirements

- **Rust nightly** (uses `portable_simd`)
- Arrow 56–58
- macOS for Metal GPU backend (optional)

## License

Apache-2.0