Crate minarrow_pyo3

§minarrow-pyo3 - PyO3 Bindings for MinArrow

Zero-copy Python bindings for MinArrow via the Arrow C Data Interface and PyCapsules.

This crate provides transparent wrapper types that enable zero-copy conversion between MinArrow’s Rust types and PyArrow’s Python types.

§Features

  • Zero-copy data transfer via Arrow C Data Interface
  • Transparent wrappers (PyArray, PyRecordBatch) implementing PyO3 traits
  • Idiomatic Rust API for building Python extensions

§Copy Semantics

§Zero-copy

All primary data buffers are transferred without copying, in both directions. This covers every export path, and on import: single arrays, ChunkedArray chunks, and RecordBatch/Table columns, via both the PyCapsule stream and the legacy _import_from_c path.

§Copied by design

The following are copied during import because they require structural transformation between MinArrow and Arrow representations:

  • Null bitmasks — reconstructed into MinArrow’s Bitmask type on import. These are small: ceil(N/8) bytes for N elements.
  • String offsets — reconstructed into MinArrow’s offset representation.
  • Categorical dictionary strings — Arrow stores dictionaries as contiguous offsets+data; MinArrow stores them as Vec64<String> with individual heap allocations. The integer codes buffer is zero-copy.
  • Field metadata — names, types, and flags are lightweight and always copied.
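To make the bitmask cost concrete, here is a quick sketch of the ceil(N/8) sizing rule in plain Python (illustrative only, independent of either library):

```python
def bitmask_bytes(n_elements: int) -> int:
    """Size in bytes of a validity bitmask holding 1 bit per element."""
    return (n_elements + 7) // 8  # integer form of ceil(n / 8)

# Even at a million elements, the copied bitmask is only ~122 KiB.
print(bitmask_bytes(3))          # 1
print(bitmask_bytes(1_000_000))  # 125000
```

The copies above are thus bounded by metadata size, not data size; the large value buffers stay zero-copy.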

§Type Mappings

MinArrow calls an object with a schema, rows, and columns a ‘Table’, favouring plain terminology. Apache Arrow calls the same layout a ‘RecordBatch’, and in PyArrow a ‘Table’ is instead a chunked composition of RecordBatches. The table below maps the equivalent memory and object layouts to one another.

MinArrow      PyArrow          Wrapper Type
Array         pa.Array         PyArray
Table         pa.RecordBatch   PyRecordBatch
SuperTable    pa.Table         PyTable
SuperArray    pa.ChunkedArray  PyChunkedArray

§Conversion Protocols

Two protocols are supported for data exchange:

  1. Arrow PyCapsule Interface - the standard __arrow_c_array__ / __arrow_c_stream__ protocol. Works with any Arrow-compatible Python library including PyArrow, Polars, DuckDB, nanoarrow, and pandas with ArrowDtype.

  2. Legacy _export_to_c - PyArrow-specific fallback using raw pointer integers.

Import functions try the PyCapsule protocol first, falling back to the legacy approach for older PyArrow versions.
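The capsule-first, legacy-fallback dispatch can be sketched in plain Python. This is an illustrative model of the import logic, not the crate's actual API; the helper name `import_array` and the returned tags are hypothetical:

```python
def import_array(obj):
    """Try the Arrow PyCapsule protocol first, then the legacy pointer path."""
    if hasattr(obj, "__arrow_c_array__"):
        # Standard protocol: returns (schema_capsule, array_capsule).
        return ("pycapsule", obj.__arrow_c_array__())
    if hasattr(obj, "_export_to_c"):
        # Legacy PyArrow fallback: the caller allocates ArrowSchema/ArrowArray
        # C structs and passes their addresses as raw integers (omitted here).
        return ("legacy", None)
    raise TypeError("object does not support Arrow C data export")

class FakeArrowArray:
    """Stand-in for any object implementing the PyCapsule protocol."""
    def __arrow_c_array__(self, requested_schema=None):
        return ("schema-capsule", "array-capsule")

kind, payload = import_array(FakeArrowArray())
print(kind)  # pycapsule
```

Because the check is duck-typed, the same path accepts arrays from any PyCapsule-aware producer, while older PyArrow objects still succeed through the `_export_to_c` branch.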

For the complete array data type mapping including numeric, temporal, boolean, text, and categorical types, see the ffi module documentation.

§Example

use minarrow_pyo3::{PyArray, PyRecordBatch};
use pyo3::prelude::*;

#[pyfunction]
fn process_batch(input: PyRecordBatch) -> PyResult<PyRecordBatch> {
    let table: minarrow::Table = input.into();
    // Process the table using MinArrow...
    Ok(PyRecordBatch::from(table))
}

#[pymodule]
fn my_extension(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(process_batch, m)?)?;
    Ok(())
}

In Python:

import pyarrow as pa
import my_extension

batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
result = my_extension.process_batch(batch)

Re-exports§

pub use error::PyMinarrowError;
pub use error::PyMinarrowResult;
pub use types::PyArray;
pub use types::PyChunkedArray;
pub use types::PyField;
pub use types::PyRecordBatch;
pub use types::PyTable;

Modules§

error
Error Module for minarrow-pyo3
ffi
FFI Module for minarrow-pyo3
types
Type Wrappers for minarrow-pyo3

Structs§

Field
FieldArray
SuperArray
SuperTable
Table

Enums§

Array
NumericArray
TextArray

Traits§

MaskedArray