§minarrow-pyo3 - PyO3 Bindings for MinArrow
Zero-copy Python bindings for MinArrow via the Arrow C Data Interface and PyCapsules.
This crate provides transparent wrapper types that enable zero-copy conversion between MinArrow’s Rust types and PyArrow’s Python types.
§Features
- Zero-copy data transfer via the Arrow C Data Interface
- Transparent wrappers (`PyArray`, `PyRecordBatch`) implementing PyO3 traits
- Idiomatic Rust API for building Python extensions
§Copy Semantics
§Zero-copy
All primary data buffers are transferred without copying in both directions.
This applies to all export paths, single array imports, ChunkedArray chunk
imports, and RecordBatch/Table column imports via both the PyCapsule stream
and legacy _import_from_c paths.
§Copied by design
The following are copied during import because they require structural transformation between MinArrow and Arrow representations:
- Null bitmasks — reconstructed into MinArrow's `Bitmask` type on import. These are small: ceil(N/8) bytes for N elements.
- String offsets — reconstructed into MinArrow's offset representation.
- Categorical dictionary strings — Arrow stores dictionaries as contiguous offsets+data; MinArrow stores them as `Vec64<String>` with individual heap allocations. The integer codes buffer is zero-copy.
- Field metadata — names, types, and flags are lightweight and always copied.
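To put the bitmask copy cost in perspective, the ceil(N/8) figure above can be sketched directly (a hypothetical helper for illustration, not part of this crate):

```python
def null_bitmask_bytes(n_elements: int) -> int:
    """Size in bytes of an Arrow-style validity bitmask: one bit per element."""
    return (n_elements + 7) // 8  # ceil(N/8) without floating point

# Even a million-row column carries only ~122 KiB of bitmask to copy.
print(null_bitmask_bytes(1_000_000))  # 125000
```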
§Type Mappings
MinArrow calls an object with a header, rows, and columns a 'Table', favouring plain naming. Apache Arrow calls the same layout a 'RecordBatch', and in PyArrow a 'Table' is instead a chunked composition of those RecordBatches. The table below shows how the equivalent memory and object layouts map to one another.
| MinArrow | PyArrow | Wrapper Type |
|---|---|---|
| Array | pa.Array | PyArray |
| Table | pa.RecordBatch | PyRecordBatch |
| SuperTable | pa.Table | PyTable |
| SuperArray | pa.ChunkedArray | PyChunkedArray |
§Conversion Protocols
Two protocols are supported for data exchange:
- Arrow PyCapsule Interface — the standard `__arrow_c_array__`/`__arrow_c_stream__` protocol. Works with any Arrow-compatible Python library including PyArrow, Polars, DuckDB, nanoarrow, and pandas with ArrowDtype.
- Legacy `_export_to_c` — PyArrow-specific fallback using raw pointer integers.
Import functions try the PyCapsule protocol first, falling back to the legacy approach for older PyArrow versions.
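The fallback order can be sketched from the Python side (a simplified, hypothetical dispatcher for illustration; the real import logic lives in this crate's Rust code):

```python
def import_array(obj):
    """Prefer the PyCapsule protocol; fall back to the legacy pointer API."""
    if hasattr(obj, "__arrow_c_array__"):
        # Standard path: the dunder returns (schema_capsule, array_capsule).
        return ("pycapsule", obj.__arrow_c_array__())
    if hasattr(obj, "_export_to_c"):
        # Legacy PyArrow path: the caller allocates C structs and passes
        # their addresses as raw integers (omitted here).
        return ("legacy", None)
    raise TypeError("object does not support Arrow C data export")

class CapsuleArray:
    def __arrow_c_array__(self, requested_schema=None):
        return ("schema-capsule", "array-capsule")  # stand-ins for PyCapsules

kind, _ = import_array(CapsuleArray())
print(kind)  # pycapsule
```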
For the complete array data type mapping including numeric, temporal, boolean, text,
and categorical types, see the ffi module documentation.
§Example
```rust
use minarrow_pyo3::{PyArray, PyRecordBatch};
use pyo3::prelude::*;

#[pyfunction]
fn process_batch(input: PyRecordBatch) -> PyResult<PyRecordBatch> {
    let table: minarrow::Table = input.into();
    // Process the table using MinArrow...
    Ok(PyRecordBatch::from(table))
}

#[pymodule]
fn my_extension(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(process_batch, m)?)?;
    Ok(())
}
```

In Python:
```python
import pyarrow as pa
import my_extension

batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
result = my_extension.process_batch(batch)
```

Re-exports§
- pub use error::PyMinarrowError;
- pub use error::PyMinarrowResult;
- pub use types::PyArray;
- pub use types::PyChunkedArray;
- pub use types::PyField;
- pub use types::PyRecordBatch;
- pub use types::PyTable;
Modules§
- error
- Error Module for minarrow-pyo3
- ffi
- FFI Module for minarrow-pyo3
- types
- Type Wrappers for minarrow-pyo3
Structs§
- Field
- FieldArray
- SuperArray
- SuperTable
- Table
Enums§
- Array
- NumericArray
- TextArray
Traits§
- MaskedArray