Minarrow
A fast, minimal columnar data library for Rust with Arrow compatibility.
Minarrow gives you typed columnar arrays that compile in ~2 seconds, run with SIMD alignment, and convert to Arrow when you need interop. It keeps the common path concrete and lightweight, so iteration stays fast.
Why Minarrow?
Minarrow buys you productivity.
Problems:
- Compile times: Other Arrow Rust libraries are powerful but heavy, and build times can take minutes. Lean base dependencies keep yours fast.
- Type erasure: Other libraries use
dyn Array, where the concrete backing vector is hidden behind the trait object. The impact:- Reliability: A class of errors moves from compile-time to run-time, as the compiler can't see through the downcasting boundary.
- Ergonomics: You need to downcast to recover the types.
- Productivity: If you are using AI, it needs to iterate types against the compiler instead of leaving run-time downcast bombs. Let it work for you.
The solution: Minarrow keeps concrete types throughout. An IntegerArray<i64> stays fully typed through composable abstractions. You get direct access, ergonomics, IDE autocomplete, and fast compilation. When you need to talk to Arrow, Polars, or PyArrow, zero-copy conversion is one method call away.
Installation
Minarrow uses the nightly toolchain for allocator_api and portable_simd:
Quick Start
use ;
// Create arrays with macros
let ids = arr_i32!;
let prices = arr_f64!;
let names = arr_str32!;
let flags = arr_bool!;
// Direct typed access - no downcasting
assert_eq!;
assert_eq!;
use ;
// Build tables via FieldArrays with constructor macros
let table = tbl!;
table.print;
Core Features
Typed Arrays
Six typed arrays back standard workloads:
| Type | Description |
|---|---|
IntegerArray<T> |
i8 through u64 |
FloatArray<T> |
f32, f64 |
StringArray<T> |
UTF-8 with u32 or u64 offsets |
BooleanArray |
Bit-packed with validity mask |
CategoricalArray<T> |
Dictionary-encoded |
DatetimeArray<T> |
Timestamps, dates, durations |
Semantic groupings (NumericArray, TextArray, TemporalArray) let you write generic functions while keeping static dispatch.
Array and Table complete the picture, with chunked Super versions for streaming.
Bonus LAPACK-compatible Matrix and Cube types.
Fast Compilation
| Metric | Time |
|---|---|
| Clean build | < 2s |
| Incremental rebuild | < 0.15s |
Achieved through minimal dependencies: num-traits, vec64, and log, with optional rayon for parallelism.
SIMD Alignment
All buffers use 64-byte alignment via Vec64. There is no reallocation step to fix alignment: data is ready for vectorised operations from the moment it's created.
Zero-Copy Views
Select columns and rows without copying data:
use *;
let table = create_table;
// Pandas-style selection
let view = table.c; // columns
let view = table.r; // rows
let view = table.c.r; // both
// Materialise only when needed
let owned = view.to_table;
Streaming with SuperArrays
For streaming workloads, SuperArray and SuperTable hold multiple chunks with consistent schema:
use Arc;
use ;
// Append batches as they arrive
let mut stream = new;
stream.push;
stream.push;
// Or assemble from existing batches in one line
let stream = st!;
// Consolidate to a single table when ready
let table = stream.consolidate;
Arrow Interop
Convert at the boundary, stay native internally:
// To Arrow (feature: cast_arrow)
let batch = table.to_apache_arrow; // RecordBatch
let column = table.cols.to_apache_arrow; // ArrayRef
// To Polars (feature: cast_polars)
let df = table.to_polars; // DataFrame
// FFI via the Arrow C Data Interface
use export_to_c;
let = export_to_c;
Architecture
Minarrow uses enums for type dispatch instead of trait objects:
// Static dispatch, full inlining
match array
This gives you:
- Performance - Compiler inlines through the dispatch
- Type safety - No
Any, no runtime downcasts - Ergonomics - Direct accessors like
array.num().i64(), an Arc bump when the variant matches
Benchmarks
Sum of 1,000 integers, averaged over 1,000 runs (Intel Ultra 7 155H):
| Implementation | Time |
|---|---|
Raw Vec<i64> |
85 ns |
Minarrow IntegerArray (direct) |
88 ns |
Minarrow IntegerArray (via enum) |
124 ns |
Arrow-rs Int64Array (struct) |
147 ns |
Arrow-rs Int64Array (dyn) |
181 ns |
Minarrow's direct access matches raw Vec performance. Even through enum dispatch, it outperforms arrow-rs.
With SIMD + Rayon, summing 1 billion integers takes ~114ms.
Feature Flags
Default features: views, chunked, large_string, simd, select.
| Feature | Description |
|---|---|
views |
Zero-copy windowed access (default) |
chunked |
SuperArray/SuperTable for streaming (default) |
large_string |
String arrays with 64-bit offsets (default) |
simd |
SIMD kernels for Bitmask and arithmetic (default) |
select |
Pandas-style .c() / .r() selection (default) |
Interop:
| Feature | Description |
|---|---|
cast_arrow |
Arrow-rs conversion via to_apache_arrow() |
cast_polars |
Polars conversion via to_polars() / from_polars() |
memfd |
Memfd-backed buffers for zero-copy cross-process sharing (Linux) |
Additional types:
| Feature | Description |
|---|---|
datetime |
Temporal array types as raw integer offsets |
datetime_ops |
Full datetime functionality: ISO 8601 parsing, timezone-aware operations, arithmetic, component extraction |
extended_numeric_types |
i8, i16, u8, u16 variants |
extended_categorical |
Categorical8/16/64 dictionary index widths |
scalar_type |
Unified Scalar for aggregation results |
value_type |
Catch-all Value enum for engine-level orchestration |
matrix |
2D matrix with BLAS/LAPACK-compatible layout |
cube |
Stacks tables along an extra axis for time series and group analytics |
Performance:
| Feature | Description |
|---|---|
parallel_proc |
Rayon parallel iterators |
shared_dict |
Cross-batch shared categorical dictionaries |
fast_dict |
~30% faster shared_dict, at the cost of 3 dependencies |
fast_hash |
Swaps hashing to ahash |
arena |
Bump allocator for bulk array and Table construction |
vmap64 |
Mmap-backed Vec64 on Linux |
Extras:
| Feature | Description |
|---|---|
broadcast |
Typed arithmetic broadcasting |
str_arithmetic |
String arithmetic kernels, e.g. + concatenation |
hash |
Hash and Eq for Scalar |
size |
Byte size estimation |
table_metadata |
Schema-level metadata map on Table |
See Cargo.toml for the full list with detailed notes on each.
Ecosystem
| Project | Purpose |
|---|---|
minarrow-pyo3 |
Zero-copy Python interop via PyArrow |
lightstream |
Zero-copy Arrow streaming over Tokio, TCP, QUIC, WebSocket, Unix sockets, and Stdio |
vec64 |
64-byte aligned Vec for optimal SIMD |
| Lightning Analytics Engine | Sub-millisecond, zero-config live streaming engine with statistical modelling and data processing. Built on Minarrow |
Limitations
Minarrow focuses on flat columnar data, covering the common 80% of analytical workloads. Nested types (List, Struct) are not currently supported. If you need deeply nested schemas, arrow-rs is the better choice.
Contributing
Contributions are welcome, particularly in the following areas:
- Connectors – Data source and sink integrations
- Optimisations – Performance improvements
- Nested types – List and Struct support
- Bug fixes
All contributions are subject to the Contributor Licence Agreement (CLA). See CONTRIBUTING.md for details.
License
Copyright © 2025–2026 Peter Garfield Bower.
Released under the Apache 2.0 License. See LICENSE for details.
Acknowledgements
Minarrow is a from-scratch implementation of the Apache Arrow memory layout inspired by the standards pioneered by Apache Arrow, Arrow2, and Polars.
Minarrow is not affiliated with Apache Arrow.
Backed by SpaceCell
Minarrow is backed by SpaceCell, ensuring ongoing support for the project.
For an edge in high-performance data computing, visit spacecell.com