Fast, minimal, and ergonomic Apache Arrow implementation in Rust, with PyO3 and Python bindings.
Highlights:
- < 2s compilation times (0.15s rebuilds)
- Rust to Python for 1m rows in under 200ns¹
- Guaranteed 64-byte SIMD alignment
- Fully-typed throughout
- Embed Python ML in Rust
- Plug into Polars, Arrow, and Python with zero-copy FFI, PyCapsule, and PyO3
Limitations:
- Tabular data only. No Arrow lists or structs.
Why Minarrow?
- Productive: Lean base dependencies keep the library fast to build and help you stay in flow.
- Fully typed: Enum dispatch ensures constant feedback through a strong compiler and IDE loop.
- Fail fast: Errors that would otherwise slip through a dynamic-dispatch boundary and surface at run time are caught at compile time.
- Ergonomics: Convenient dot syntax, composable opt-up abstractions, and flexible signatures via liberal From trait implementations.
- Speed: Builds on the Vec64 crate so data is fully SIMD-compatible for an extra layer of low-latency parallelism, with out-of-the-box numeric, bitmask, and string SIMD kernels to help get you started.
- Minimal dependencies: Small security surface, built from foundations. Only log, num-traits (and the custom-built Vec64) in the base build.
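To make the 64-byte alignment point concrete, here is a minimal sketch using only the standard allocator on stable Rust. It is not how Vec64 is implemented; `AlignedBuf` and its methods are hypothetical names used purely for illustration.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Hypothetical fixed-size i64 buffer whose storage starts on a 64-byte boundary.
pub struct AlignedBuf {
    ptr: NonNull<i64>,
    len: usize,
    layout: Layout,
}

impl AlignedBuf {
    const ALIGN: usize = 64;

    /// Allocates `len` zeroed i64 values with 64-byte alignment.
    pub fn zeroed(len: usize) -> Self {
        assert!(len > 0, "zero-length buffers are left out of this sketch");
        let layout = Layout::array::<i64>(len)
            .and_then(|l| l.align_to(Self::ALIGN))
            .expect("buffer too large");
        // SAFETY: `layout` has a non-zero size because `len > 0`.
        let raw = unsafe { alloc_zeroed(layout) }.cast::<i64>();
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        Self { ptr, len, layout }
    }

    pub fn as_slice(&self) -> &[i64] {
        // SAFETY: `ptr` points to `len` initialised (zeroed) i64 values owned by `self`.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [i64] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.len) }
    }

    /// True when the first element sits on a 64-byte boundary.
    pub fn is_aligned(&self) -> bool {
        self.ptr.as_ptr() as usize % Self::ALIGN == 0
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this layout.
        unsafe { dealloc(self.ptr.as_ptr().cast::<u8>(), self.layout) };
    }
}
```

A buffer aligned this way can be read with aligned SIMD loads without a scalar prologue, which is the property the alignment guarantee is after.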
Installation
Minarrow uses the nightly toolchain for allocator_api and portable_simd:
Quick Start
use ;
// Create arrays with macros
let ids = arr_i32!;
let prices = arr_f64!;
let names = arr_str32!;
let flags = arr_bool!;
// Direct typed access - no downcasting
assert_eq!;
assert_eq!;
use ;
// Build tables via FieldArrays with constructor macros
let table = tbl!;
table.print;
Core Features
Typed Arrays
Six typed arrays back standard workloads:
| Type | Description |
|---|---|
| IntegerArray<T> | i8 through u64 |
| FloatArray<T> | f32, f64 |
| StringArray<T> | UTF-8 with u32 or u64 offsets |
| BooleanArray | Bit-packed with validity mask |
| CategoricalArray<T> | Dictionary-encoded |
| DatetimeArray<T> | Timestamps, dates, durations |
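As a rough illustration of two layouts named in the table, the sketch below shows an Arrow-style string column (one UTF-8 buffer plus `len + 1` offsets) guarded by a bit-packed validity mask with least-significant-bit ordering. `Bitmask` and `StrColumn` are hypothetical types written for this example, not Minarrow's own.

```rust
/// Hypothetical bit-packed mask: one bit per row, least-significant bit first.
pub struct Bitmask {
    bytes: Vec<u8>,
    len: usize,
}

impl Bitmask {
    /// Creates a mask of `len` bits, all set to `value`.
    pub fn new_set(len: usize, value: bool) -> Self {
        let fill = if value { 0xFF } else { 0x00 };
        Self { bytes: vec![fill; (len + 7) / 8], len }
    }

    pub fn get(&self, i: usize) -> bool {
        assert!(i < self.len, "bit index out of bounds");
        ((self.bytes[i / 8] >> (i % 8)) & 1) == 1
    }

    pub fn set(&mut self, i: usize, value: bool) {
        assert!(i < self.len, "bit index out of bounds");
        if value {
            self.bytes[i / 8] |= 1u8 << (i % 8);
        } else {
            self.bytes[i / 8] &= !(1u8 << (i % 8));
        }
    }
}

/// Hypothetical string column: contiguous UTF-8 bytes, u32 offsets, validity mask.
pub struct StrColumn {
    data: Vec<u8>,
    offsets: Vec<u32>,
    validity: Bitmask,
}

impl StrColumn {
    pub fn from_options(values: &[Option<&str>]) -> Self {
        let mut data = Vec::new();
        let mut offsets = vec![0u32];
        let mut validity = Bitmask::new_set(values.len(), true);
        for (i, value) in values.iter().enumerate() {
            match value {
                Some(s) => data.extend_from_slice(s.as_bytes()),
                // A null row adds no bytes; its two offsets are equal.
                None => validity.set(i, false),
            }
            offsets.push(u32::try_from(data.len()).expect("u32 offsets overflowed"));
        }
        Self { data, offsets, validity }
    }

    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns row `i`, or `None` when the validity bit is cleared.
    pub fn get(&self, i: usize) -> Option<&str> {
        if !self.validity.get(i) {
            return None;
        }
        let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
        Some(std::str::from_utf8(&self.data[start..end]).expect("valid UTF-8 by construction"))
    }
}
```

Switching the offset type from `u32` to `u64` is what lifts the 4 GiB limit on the data buffer, which is the trade-off behind the two offset widths listed for strings.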
Semantic groupings (NumericArray, TextArray, TemporalArray) support flexible call-site signatures and static dispatch.
Array and Table (RecordBatch) sit on top, with chunked Super (Vec<Table>) versions for streaming.
Bonus LAPACK-compatible Matrix and Cube types support analytical workload variations.
Zero-Copy Views
Ergonomic zero-copy row and column selection.
use *;
let table = create_table;
// Pandas-style selection
let view = table.c; // columns
let view = table.r; // rows
let view = table.c.r; // both
// Materialise when needed
let owned = view.to_table;
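The idea behind a zero-copy view can be sketched with plain standard-library types: the view shares the underlying buffer and stores only an offset and a length, so narrowing it never copies rows. `ColumnView` and its methods are hypothetical and do not reflect Minarrow's actual view types.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical column view: shares the buffer and records only a window into it.
#[derive(Clone)]
pub struct ColumnView {
    data: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl ColumnView {
    /// A view covering the whole buffer.
    pub fn new(data: Arc<Vec<i64>>) -> Self {
        let len = data.len();
        Self { data, offset: 0, len }
    }

    /// Narrows the view to `range`, relative to the current window. No data is copied.
    pub fn rows(&self, range: Range<usize>) -> Self {
        assert!(
            range.start <= range.end && range.end <= self.len,
            "row range out of bounds"
        );
        Self {
            data: Arc::clone(&self.data),
            offset: self.offset + range.start,
            len: range.end - range.start,
        }
    }

    pub fn as_slice(&self) -> &[i64] {
        &self.data[self.offset..self.offset + self.len]
    }

    /// Materialises the window into an owned vector; this is the only copying step.
    pub fn to_owned_vec(&self) -> Vec<i64> {
        self.as_slice().to_vec()
    }
}
```

Chained selections only adjust the offset and length, so the cost of a view stays constant regardless of how many rows it spans.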
Off-the-Wire pattern
use Arc;
use ;
// Append batches as they arrive
let mut stream = new;
stream.push;
stream.push;
// Assemble from existing batches
let stream = st!;
// Consolidate to a single table when ready
let table = stream.consolidate;
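The append-then-consolidate pattern can be sketched independently of Minarrow as follows. `Batch` and `ChunkedTable` are hypothetical stand-ins for a table and its chunked counterpart, with two fixed columns to keep the example short.

```rust
/// Hypothetical two-column batch standing in for a table.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub ids: Vec<i32>,
    pub prices: Vec<f64>,
}

/// Hypothetical chunked container: batches are kept as they arrive.
#[derive(Default)]
pub struct ChunkedTable {
    batches: Vec<Batch>,
}

impl ChunkedTable {
    /// Appends a batch without touching the rows already held.
    pub fn push(&mut self, batch: Batch) {
        assert_eq!(batch.ids.len(), batch.prices.len(), "columns must have equal length");
        self.batches.push(batch);
    }

    pub fn n_rows(&self) -> usize {
        self.batches.iter().map(|b| b.ids.len()).sum()
    }

    /// Concatenates all batches into one, allocating each column once.
    pub fn consolidate(self) -> Batch {
        let n = self.n_rows();
        let mut ids = Vec::with_capacity(n);
        let mut prices = Vec::with_capacity(n);
        for batch in self.batches {
            ids.extend(batch.ids);
            prices.extend(batch.prices);
        }
        Batch { ids, prices }
    }
}
```

Appending is cheap because each push only moves a batch into the list; the copy into contiguous columns is deferred until a single table is actually needed.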
Arrow Interop
Embed Python in Rust. Bench in /examples.
// Run a Random Forest Classifier using Python in Rust
let value = rt.with_python?;
Normal Rust - use Polars, Arrow, convert at the boundary, stay native internally:
// To Arrow (feature: cast_arrow)
let batch = table.to_apache_arrow; // RecordBatch
let column = table.cols.to_apache_arrow; // ArrayRef
// To Polars (feature: cast_polars)
let df = table.to_polars; // DataFrame
// FFI via the Arrow C Data Interface
use export_to_c;
let = export_to_c;
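For readers unfamiliar with the Arrow C Data Interface, the sketch below mirrors the `ArrowArray` struct from the specification and exports a non-null i64 column through it. The struct fields follow the spec; `Int64Holder`, `release_int64`, and `export_int64` are hypothetical names, and this is not Minarrow's `export_to_c`.

```rust
use std::ffi::c_void;
use std::ptr;

/// Mirror of `struct ArrowArray` from the Arrow C Data Interface specification.
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// Producer-side state kept alive until the consumer calls `release`.
struct Int64Holder {
    values: Vec<i64>,
    /// [validity bitmap (null pointer: no nulls), data buffer]
    buffers: [*const c_void; 2],
}

/// Release callback: frees the holder and marks the struct as released.
unsafe extern "C" fn release_int64(array: *mut ArrowArray) {
    if array.is_null() {
        return;
    }
    // SAFETY: a non-null pointer passed here refers to a struct built by `export_int64`.
    let array = unsafe { &mut *array };
    if array.release.is_none() {
        return; // already released
    }
    // SAFETY: `private_data` was produced by `Box::into_raw` in `export_int64`.
    drop(unsafe { Box::from_raw(array.private_data as *mut Int64Holder) });
    array.private_data = ptr::null_mut();
    array.release = None;
}

/// Hypothetical exporter: hands ownership of `values` to the returned struct.
pub fn export_int64(values: Vec<i64>) -> ArrowArray {
    let length = values.len() as i64;
    let holder = Box::into_raw(Box::new(Int64Holder { values, buffers: [ptr::null(); 2] }));
    // SAFETY: `holder` was just created above and is uniquely owned here.
    let holder_ref = unsafe { &mut *holder };
    holder_ref.buffers[1] = holder_ref.values.as_ptr() as *const c_void;
    let buffers = holder_ref.buffers.as_mut_ptr();
    ArrowArray {
        length,
        null_count: 0,
        offset: 0,
        n_buffers: 2,
        n_children: 0,
        buffers,
        children: ptr::null_mut(),
        dictionary: ptr::null_mut(),
        release: Some(release_int64),
        private_data: holder as *mut c_void,
    }
}
```

The consumer reads the data through `buffers[1]` and must call `release` exactly once when done; nothing is copied at the boundary, only pointers change hands.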
Architecture
Minarrow uses enums for type dispatch instead of trait objects.
// Static dispatch routing, full compiler inlining
match array
// Get fully typed IntegerArray<i64> instead of Array:
let array = array.num.i64;
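The general shape of enum-based dispatch can be sketched without Minarrow itself. The enums below are simplified, hypothetical stand-ins backed by plain `Vec`s; they show how a `match` resolves the concrete type statically, how typed accessors avoid downcasting, and how `From` implementations keep call-site signatures flexible.

```rust
/// Hypothetical numeric grouping, standing in for a semantic enum.
#[derive(Debug, Clone, PartialEq)]
pub enum NumericArray {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
    Float64(Vec<f64>),
}

/// Hypothetical top-level array enum.
#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Numeric(NumericArray),
    Text(Vec<String>),
}

impl NumericArray {
    /// Typed accessor: returns the i64 values, or `None` for other variants.
    pub fn i64(&self) -> Option<&[i64]> {
        match self {
            NumericArray::Int64(v) => Some(v.as_slice()),
            _ => None,
        }
    }

    /// Every arm is a concrete type, so the compiler can inline each loop.
    pub fn sum(&self) -> f64 {
        match self {
            NumericArray::Int32(v) => v.iter().map(|&x| x as f64).sum(),
            NumericArray::Int64(v) => v.iter().map(|&x| x as f64).sum(),
            NumericArray::Float64(v) => v.iter().sum(),
        }
    }
}

impl Array {
    /// Narrows to the numeric grouping without any `dyn` downcast.
    pub fn num(&self) -> Option<&NumericArray> {
        match self {
            Array::Numeric(n) => Some(n),
            Array::Text(_) => None,
        }
    }
}

impl From<Vec<i64>> for Array {
    fn from(values: Vec<i64>) -> Self {
        Array::Numeric(NumericArray::Int64(values))
    }
}

impl From<Vec<f64>> for Array {
    fn from(values: Vec<f64>) -> Self {
        Array::Numeric(NumericArray::Float64(values))
    }
}

/// Accepts anything convertible into `Array`; returns `None` for non-numeric data.
pub fn total(array: impl Into<Array>) -> Option<f64> {
    let array: Array = array.into();
    array.num().map(NumericArray::sum)
}
```

Because the set of variants is closed, adding a new variant makes every non-exhaustive `match` a compile error, which is the fail-fast behaviour that trait objects cannot offer.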
Benchmarks
Array performance
Sum of 1,000 integers, averaged over 1,000 runs¹:
| Implementation | Time |
|---|---|
| Raw Vec<i64> | 85 ns |
| Minarrow IntegerArray (direct) | 88 ns |
| Minarrow IntegerArray (via enum) | 124 ns |
| Arrow-rs Int64Array (struct) | 147 ns |
| Arrow-rs Int64Array (dyn) | 181 ns |
Minarrow's direct access is within the noise threshold of raw Vec performance whilst maintaining SIMD-compatible alignment.
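For anyone wanting to reproduce a measurement of this kind, a minimal timing harness might look like the sketch below. It is an assumption about the general approach (a mean over repeated runs, with `black_box` to stop the optimiser eliding the work), not the project's actual benchmark code; `mean_time` and `sum_slice` are hypothetical names.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and returns the last result with the mean wall-clock time per run.
pub fn mean_time<F: FnMut() -> i64>(runs: u32, mut f: F) -> (i64, Duration) {
    assert!(runs > 0, "at least one run is required");
    let mut last = 0;
    let start = Instant::now();
    for _ in 0..runs {
        // `black_box` keeps the compiler from discarding the computed value.
        last = black_box(f());
    }
    (last, start.elapsed() / runs)
}

/// The workload under test: a plain sum over a slice.
pub fn sum_slice(values: &[i64]) -> i64 {
    values.iter().sum()
}
```

Calling `mean_time(1_000, || sum_slice(black_box(&data)))` on a 1,000-element vector gives a figure comparable in kind to the table above. Timings at this scale are sensitive to CPU frequency scaling and cache state, so a release build and several repetitions are advisable.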
Python Roundtrip
- Rust to Python¹, 1m rows, 2 columns: 165 ns
- Python to Rust¹, 1m rows, 2 columns: 2.8 μs
See minarrow-pyo3/examples
Test machine
¹ = Intel Core Ultra 7 155H · 32 GB · Ubuntu 24.04 · 1.97-nightly release build.
Feature Flags
Default features: views, chunked, large_string, simd, select.
| Feature | Description |
|---|---|
| views | Zero-copy windowed access |
| chunked | SuperArray/SuperTable for streaming |
| large_string | String arrays with 64-bit offsets |
| simd | SIMD kernels for Bitmask and arithmetic |
| select | Pandas-esque .c() / .r() selection |
Interop:
| Feature | Description |
|---|---|
| cast_arrow | Arrow-rs conversion via to_apache_arrow() |
| cast_polars | Polars conversion via to_polars() / from_polars() |
| memfd | Memfd-backed buffers for zero-copy cross-process sharing (Linux) |
Additional types:
| Feature | Description |
|---|---|
| datetime | Temporal array types as raw integer offsets |
| datetime_ops | Datetime library functionality: ISO 8601 parsing, timezone-aware operations, arithmetic, component extraction |
| extended_numeric_types | i8, i16, u8, u16 variants |
| extended_categorical | Categorical8/16/64 dictionary index widths |
| scalar_type | Unified Scalar for aggregation results |
| value_type | Catch-all Value enum for unified typing |
| matrix | 2D matrix with BLAS/LAPACK-compatible layout |
| cube | Stacks tables along an extra axis |
| shared_dict | Shared source of truth for categorical dictionaries |
Performance:
| Feature | Description |
|---|---|
| parallel_proc | Rayon parallel iterators |
| fast_dict | ~30% faster shared_dict, at cost of 3 dependencies |
| fast_hash | Swaps hashing to ahash |
| arena | Bump allocator for bulk array and Table construction |
| vmap64 | Mmap-backed Vec64 on Linux |
| lbuffer | Atomically updated array source |
Extras:
| Feature | Description |
|---|---|
| broadcast | Typed arithmetic broadcasting |
| str_arithmetic | String arithmetic kernels for outlandish concatenation |
| hash | Hash and Eq for Scalar |
| size | Byte size estimation |
| table_metadata | Schema-level metadata map on Table |
See Cargo.toml for the full list with detailed notes on each.
Ecosystem
| Project | Purpose |
|---|---|
| minarrow-py | Minarrow Python bindings |
| minarrow-pyo3 | Zero-copy Python interop via PyArrow |
| vec64 | Custom 64-byte aligned Vec for SIMD-compatible workloads |
| lightstream | Zero-copy Arrow streaming over Tokio, TCP, QUIC, WebSocket, Unix sockets, and Stdio |
| Lightning Analytics Engine | Sub-millisecond, zero-config live streaming engine with statistical modelling and data processing |
Contributing
Contributions are welcome, particularly in the following areas:
- Nested types - List and Struct support
- Bug fixes
All contributions are subject to the Contributor Licence Agreement (CLA). See CONTRIBUTING.md for details.
License
Copyright © 2025–2026 Peter Garfield Bower.
Released under the Apache 2.0 License. See LICENSE for details.
Acknowledgements
Minarrow is a from-scratch implementation of the Apache Arrow memory layout, inspired by the standards pioneered by Apache Arrow, Arrow2, and Polars.
Minarrow is not affiliated with Apache Arrow.
SpaceCell
Minarrow is maintained by SpaceCell and forms part of its open-source foundation for high-performance data computing.