Crate minarrow_pyo3

§minarrow-pyo3 - PyO3 Bindings for MinArrow

Zero-copy Python bindings for MinArrow via the Arrow C Data Interface and PyCapsules.

This crate provides transparent wrapper types that enable zero-copy conversion between MinArrow’s Rust types and PyArrow’s Python types.

§Features

  • Zero-copy data transfer via Arrow C Data Interface
  • Transparent wrappers (PyArray, PyRecordBatch) implementing PyO3 traits
  • Idiomatic Rust API for building Python extensions

§Copy Semantics

§Zero-copy

All primary data buffers are transferred without copying, in both directions. This covers every export path, and on import: single arrays, ChunkedArray chunks, and RecordBatch/Table columns, via both the PyCapsule stream and the legacy _import_from_c path.

§Copied by design

The following are copied during import because they require structural transformation between MinArrow and Arrow representations:

  • Null bitmasks — reconstructed into MinArrow’s Bitmask type on import. These are small: ceil(N/8) bytes for N elements.
  • String offsets — reconstructed into MinArrow’s offset representation.
  • Categorical dictionary strings — Arrow stores dictionaries as contiguous offsets+data; MinArrow stores them as Vec64<String> with individual heap allocations. The integer codes buffer is zero-copy.
  • Field metadata — names, types, and flags are lightweight and always copied.
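To make the bitmask cost concrete, here is a quick sketch of the ceil(N/8) sizing rule in plain Python (illustrative only, independent of either library):

```python
def bitmask_bytes(n_elements: int) -> int:
    """Size in bytes of a validity bitmask holding 1 bit per element."""
    return (n_elements + 7) // 8  # integer form of ceil(n / 8)

# Even at a million elements, the copied bitmask is only ~122 KiB.
print(bitmask_bytes(3))          # 1
print(bitmask_bytes(1_000_000))  # 125000
```

The copies above are thus bounded by metadata size, not data size; the large value buffers stay zero-copy.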

§Type Mappings

MinArrow calls an object with a schema, rows, and columns a ‘Table’, favouring plain terminology. Apache Arrow calls the same layout a ‘RecordBatch’, and in PyArrow a ‘Table’ is instead a chunked composition of RecordBatches. The table below maps the equivalent memory and object layouts to one another.

MinArrow      PyArrow          Wrapper Type
Array         pa.Array         PyArray
Table         pa.RecordBatch   PyRecordBatch
SuperTable    pa.Table         PyTable
SuperArray    pa.ChunkedArray  PyChunkedArray

§Conversion Protocols

Two protocols are supported for data exchange:

  1. Arrow PyCapsule Interface - the standard __arrow_c_array__ / __arrow_c_stream__ protocol. Works with any Arrow-compatible Python library including PyArrow, Polars, DuckDB, nanoarrow, and pandas with ArrowDtype.

  2. Legacy _export_to_c - PyArrow-specific fallback using raw pointer integers.

Import functions try the PyCapsule protocol first, falling back to the legacy approach for older PyArrow versions.
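The capsule-first, legacy-fallback dispatch can be sketched in plain Python. This is an illustrative model of the import logic, not the crate's actual API; the helper name `import_array` and the returned tags are hypothetical:

```python
def import_array(obj):
    """Try the Arrow PyCapsule protocol first, then the legacy pointer path."""
    if hasattr(obj, "__arrow_c_array__"):
        # Standard protocol: returns (schema_capsule, array_capsule).
        return ("pycapsule", obj.__arrow_c_array__())
    if hasattr(obj, "_export_to_c"):
        # Legacy PyArrow fallback: the caller allocates ArrowSchema/ArrowArray
        # C structs and passes their addresses as raw integers (omitted here).
        return ("legacy", None)
    raise TypeError("object does not support Arrow C data export")

class FakeArrowArray:
    """Stand-in for any object implementing the PyCapsule protocol."""
    def __arrow_c_array__(self, requested_schema=None):
        return ("schema-capsule", "array-capsule")

kind, payload = import_array(FakeArrowArray())
print(kind)  # pycapsule
```

Because the check is duck-typed, the same path accepts arrays from any PyCapsule-aware producer, while older PyArrow objects still succeed through the `_export_to_c` branch.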

For the complete array data type mapping including numeric, temporal, boolean, text, and categorical types, see the ffi module documentation.

§Example

use minarrow_pyo3::{PyArray, PyRecordBatch};
use pyo3::prelude::*;

#[pyfunction]
fn process_batch(input: PyRecordBatch) -> PyResult<PyRecordBatch> {
    let table: minarrow::Table = input.into();
    // Process the table using MinArrow...
    Ok(PyRecordBatch::from(table))
}

#[pymodule]
fn my_extension(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(process_batch, m)?)?;
    Ok(())
}

In Python:

import pyarrow as pa
import my_extension

batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
result = my_extension.process_batch(batch)

Re-exports§

pub use error::PyMinarrowError;
pub use error::PyMinarrowResult;
pub use types::PyArray;
pub use types::PyChunkedArray;
pub use types::PyField;
pub use types::PyRecordBatch;
pub use types::PyTable;

Modules§

error
Error Module for minarrow-pyo3
ffi
FFI Module for minarrow-pyo3
types
Type Wrappers for minarrow-pyo3

Structs§

Field
FieldArray
SuperArray
SuperTable
Table

Enums§

Array
NumericArray
TextArray

Traits§

MaskedArray