arrow_extendr
arrow-extendr is a crate that facilitates the transfer of Apache Arrow memory between R and Rust. It utilizes extendr, the {nanoarrow} R package, and arrow-rs.
Motivating Example
Suppose we have the following DBI connection, which we will query using Arrow.
The result of dbGetQueryArrow() is a nanoarrow_array_stream. We want to
count the number of rows in each batch of the stream using Rust.
# adapted from https://github.com/r-dbi/DBI/blob/main/vignettes/DBI-arrow.Rmd
con <-
data <-
We can write an extendr function that creates an ArrowArrayStreamReader
from an &Robj. In the function we instantiate a counter that accumulates
the total number of rows. For each chunk we print its row count and add it to the counter.
use *;
use FromArrowRobj;
use ArrowArrayStreamReader;
/// @export
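The counting logic described above can be sketched without any R or Arrow dependencies. This is a minimal illustration, not the crate's code: `Batch` and `count_rows` are hypothetical names, and `Batch` only stands in for an Arrow record batch. It assumes the stream behaves as an iterator of fallible chunks.

```rust
/// Hypothetical stand-in for an Arrow record batch; only the row count matters here.
struct Batch {
    rows: usize,
}

/// Walk a stream of fallible chunks, print each chunk's row count,
/// and return the total. The first failed chunk aborts the count.
fn count_rows<I, E>(stream: I) -> Result<usize, E>
where
    I: IntoIterator<Item = Result<Batch, E>>,
{
    let mut total = 0;
    for chunk in stream {
        // Propagate the error instead of panicking on a bad chunk.
        let rows = chunk?.rows;
        println!("Found {rows} rows");
        total += rows;
    }
    Ok(total)
}
```

Under these assumptions, eleven batches of 256 rows followed by one batch of 143 rows sum to 2959, which is consistent with the total in the output shown below. Returning a `Result` rather than unwrapping each chunk is a design choice of this sketch.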
This extendr function can then be called on the output of dbGetQueryArrow() or other
Arrow-related DBI functions.
query <-
#> Processing `ArrowArrayStreamReader`...
#> Found 256 rows
#> Found 256 rows
#> Found 256 rows
#> ... truncated ...
#> Found 256 rows
#> Found 256 rows
#> Found 143 rows
#> [1] 2959
Polars interop
arrow-extendr provides optional interop with Polars via the polars feature flag. Add to your Cargo.toml:
= { = "58.0.0", = ["polars"], = false }
= "0.53.0"
= "1"
This enables the following conversions via the Arrow C Stream interface:
| Type | Direction | R object |
|---|---|---|
| polars_core::frame::DataFrame | IntoArrowRobj | nanoarrow_array_stream |
| polars_core::frame::DataFrame | FromArrowRobj | nanoarrow_array_stream |
| polars_arrow::ffi::ArrowArrayStream | IntoArrowRobj | nanoarrow_array_stream |
| polars_arrow::ffi::ArrowArrayStreamReader | FromArrowRobj | nanoarrow_array_stream |
Example: round-trip a Polars DataFrame through R
use *;
use anyhow;
use ;
use DataFrame;
/// @export
df <-
stream <-
Using arrow-extendr in a package
To use arrow-extendr in an R package, first create the package and then make it an extendr package with:
usethis::
rextendr::;
Next, make nanoarrow a dependency of the package, since arrow-extendr calls functions from nanoarrow to convert between R and Arrow memory. To do this, run usethis::use_package("nanoarrow"), which adds it to the Imports field of the DESCRIPTION file.
Versioning
At present, different versions of arrow-rs are not compatible with each other. This means that if your crate uses arrow-rs version 48.0.1, then arrow-extendr must use that same version. For this reason, arrow-extendr follows the arrow-rs version numbers, which makes it easy to pick the release that matches your arrow-rs dependency.
Versions:
- 58.0.0
- 55.1.0
- 54.0.0
- 53.0.0
- 52.0.0
- 51.0.0
- 50.0.0 (compatible with geoarrow-rs 0.1.0)
- 49.0.0-geoarrow (not available on crates.io but is the current Git version)
- 48.0.1
- 49.0.0
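The matching rule can be sketched as a small lookup. This is an illustration only: `RELEASES` and `matching_release` are hypothetical names, the list is copied from the versions above (leaving out the Git-only 49.0.0-geoarrow entry), and requiring an exact string match is an assumption based on the "same version" wording.

```rust
/// Releases copied from the list above; 49.0.0-geoarrow is omitted
/// because it is described as not available on crates.io.
const RELEASES: &[&str] = &[
    "58.0.0", "55.1.0", "54.0.0", "53.0.0", "52.0.0",
    "51.0.0", "50.0.0", "49.0.0", "48.0.1",
];

/// Return the arrow-extendr release to pair with a given arrow-rs version.
/// Assumes only an identical version string counts as a match.
fn matching_release(arrow_rs_version: &str) -> Option<&'static str> {
    RELEASES.iter().copied().find(|v| *v == arrow_rs_version)
}
```

With this sketch, `matching_release("48.0.1")` finds a release, while a version that is not in the list, such as `"56.0.0"`, returns `None`.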