Pass Arrow objects from and to PyArrow, using Arrow's C Data Interface and pyo3.
For underlying implementation, see the [ffi] module.
One can use these to write Python functions that take and return PyArrow objects, with automatic conversion to corresponding arrow-rs types.
#[pyfunction]
fn double_array(array: PyArrowType<ArrayData>) -> PyResult<PyArrowType<ArrayData>> {
let array = array.0; // Extract from PyArrowType wrapper
let array: Arc<dyn Array> = make_array(array); // Convert ArrayData to ArrayRef
let array: &Int32Array = array.as_any().downcast_ref()
.ok_or_else(|| PyValueError::new_err("expected int32 array"))?;
let array: Int32Array = array.iter().map(|x| x.map(|x| x * 2)).collect();
Ok(PyArrowType(array.into_data()))
}
| pyarrow type | arrow-rs type |
|---|---|
pyarrow.DataType |
[DataType] |
pyarrow.Field |
[Field] |
pyarrow.Schema |
[Schema] |
pyarrow.Array |
[ArrayData] |
pyarrow.RecordBatch |
[RecordBatch] |
pyarrow.RecordBatchReader |
[ArrowArrayStreamReader] / Box<dyn RecordBatchReader + Send> (1) |
(1) pyarrow.RecordBatchReader can be imported as [ArrowArrayStreamReader]. Either
[ArrowArrayStreamReader] or Box<dyn RecordBatchReader + Send> can be exported
as pyarrow.RecordBatchReader. (Box<dyn RecordBatchReader + Send> is typically
easier to create.)
PyArrow has the notion of chunked arrays and tables, but arrow-rs doesn't
have these same concepts. A chunked table is instead represented with
Vec<RecordBatch>. A pyarrow.Table can be imported to Rust by calling
pyarrow.Table.to_reader()
and then importing the reader as a [ArrowArrayStreamReader].