Crate ferray_numpy_interop

§ferray-numpy-interop

A companion crate providing owning conversions between ferray arrays and external array ecosystems:

  • NumPy (via PyO3) — feature "python"
  • Apache Arrow — feature "arrow"
  • Polars — feature "polars"

All three backends are feature-gated and disabled by default. Enable them in your Cargo.toml:

[dependencies.ferray-numpy-interop]
version = "0.1"
features = ["arrow"]  # or "python", "polars"

§Memory semantics

Every conversion in this crate currently copies the data buffer. The previous documentation claimed “zero-copy where possible”, but in practice all six conversion paths (NumPy / Arrow / Polars × both directions) allocate a new buffer and memcpy the elements:

Path              Reason
NumPy → ferray    PyReadonlyArray::iter().cloned().collect()
ferray → NumPy    Array::to_vec_flat() then from_vec
Arrow ↔ ferray    PrimitiveArray::values() cloned into Vec
Polars ↔ ferray   ChunkedArray ↔ Vec<T> via per-chunk copy
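
As a rough illustration of what "copy" means for every path listed above, the sketch below uses only the standard library. The names convert_by_copy and shares_buffer are hypothetical and are not part of ferray, PyO3, Arrow, or Polars; they only show that the destination owns a separate buffer.

```rust
/// Hypothetical stand-in for a copying conversion. It uses only the
/// standard library, not ferray, PyO3, Arrow, or Polars.
pub fn convert_by_copy(src: &[f64]) -> Vec<f64> {
    // `to_vec` allocates a fresh buffer and copies every element into it.
    src.to_vec()
}

/// Returns true when both slices start at the same address, which for
/// non-empty slices means they share a backing buffer.
pub fn shares_buffer(a: &[f64], b: &[f64]) -> bool {
    std::ptr::eq(a.as_ptr(), b.as_ptr())
}
```

Because the result is an independent allocation, writing to the converted data never affects the source, and the source can be dropped immediately after the conversion.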

True zero-copy ferray↔NumPy would require ferray arrays to share the raw buffer with a Python-owned PyArray (refcount handshake plus pinning), which is a significant design change. Zero-copy to Arrow would require ferray arrays to expose their backing buffer as an arrow::buffer::Buffer with a compatible Drop hook. Both are tracked as potential follow-ups; for now the crate provides a correct, allocation-aware API that clearly acknowledges the copy.
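
To make the contrast concrete, here is a minimal sketch of buffer sharing through reference counting, again with the standard library only. SharedArray is a hypothetical type invented for this example; it does not model NumPy's or Arrow's actual ownership protocols, only the general idea that two handles can point at one allocation.

```rust
use std::sync::Arc;

/// Hypothetical array type (not a ferray type) whose buffer is
/// reference-counted, so a second owner can be added without copying.
#[derive(Debug)]
pub struct SharedArray {
    data: Arc<[f64]>,
}

impl SharedArray {
    /// Moves the elements into a reference-counted allocation.
    /// This step still allocates once; sharing afterwards does not.
    pub fn from_vec(v: Vec<f64>) -> Self {
        Self { data: Arc::from(v) }
    }

    /// Hands out another handle to the same buffer. Only the
    /// reference count changes; no element is copied.
    pub fn share(&self) -> SharedArray {
        SharedArray { data: Arc::clone(&self.data) }
    }

    /// Read-only view of the shared elements.
    pub fn as_slice(&self) -> &[f64] {
        &self.data
    }

    /// Number of handles currently keeping the buffer alive.
    pub fn owners(&self) -> usize {
        Arc::strong_count(&self.data)
    }
}
```

The buffer is freed only when the last handle is dropped, which is the property a cross-ecosystem zero-copy design would have to guarantee on both sides of the boundary.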

The copies are usually still cheap enough for interop boundaries — they are a single memcpy per conversion, not per element — but callers on hot paths should prefer to stay inside one ecosystem.

§Design principles

  1. Safety first — every conversion validates dtypes and memory layout before returning. No silent reinterpretation of memory.
  2. Honest about allocation — see the table above. The docstrings on individual functions say “copy” explicitly.
  3. Explicit errors — dtype mismatches, null values, and unsupported types produce clear FerrayError messages.
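
The principles above can be sketched as a validate-then-copy function. Everything in this block is an assumption made for illustration: ElemType stands in for a dtype tag, ConvertError stands in for FerrayError, and to_dense_f64 is a hypothetical function, not one exported by this crate.

```rust
use std::fmt;

/// Hypothetical element-type tag; the real ferray DType may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElemType {
    F32,
    F64,
    I64,
}

/// Hypothetical error type standing in for FerrayError.
#[derive(Debug, PartialEq)]
pub enum ConvertError {
    DTypeMismatch { expected: ElemType, found: ElemType },
    NullValue { index: usize },
}

impl fmt::Display for ConvertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::DTypeMismatch { expected, found } => {
                write!(f, "dtype mismatch: expected {expected:?}, found {found:?}")
            }
            Self::NullValue { index } => write!(f, "null value at index {index}"),
        }
    }
}

impl std::error::Error for ConvertError {}

/// Copies a nullable f64 column into a dense Vec. The dtype is checked
/// before any element is read, and the first null aborts the conversion.
pub fn to_dense_f64(
    found: ElemType,
    values: &[Option<f64>],
) -> Result<Vec<f64>, ConvertError> {
    if found != ElemType::F64 {
        return Err(ConvertError::DTypeMismatch {
            expected: ElemType::F64,
            found,
        });
    }
    values
        .iter()
        .copied()
        .enumerate()
        .map(|(index, v)| v.ok_or(ConvertError::NullValue { index }))
        .collect()
}
```

Returning a Result rather than reinterpreting or silently filling values keeps the failure visible at the call site, where the caller can decide whether to cast, drop nulls, or propagate the error.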

§Modules

dtype_map
Mapping between ferray DType, Arrow DataType, and NumPy dtype codes.