§serde_arrow - convert sequences of Rust objects to / from arrow arrays
The arrow in-memory format is a powerful way to work with data-frame-like
structures. However, the API of the underlying Rust crates can at times be
cumbersome to use due to the statically typed nature of Rust. serde_arrow
offers a simple way to convert Rust objects into Arrow arrays and back.
serde_arrow relies on Serde to interpret Rust objects. Therefore, adding
support for serde_arrow to custom types is as easy as using Serde’s derive
macros.
In the Rust ecosystem there are two competing implementations of the arrow
in-memory format, arrow and arrow2. serde_arrow supports both. The supported
arrow implementations can be selected via features.
serde_arrow relies on a schema to translate between Rust and Arrow, as
their type systems do not directly match. The schema is expressed as a
collection of Arrow fields with additional metadata describing the arrays.
E.g., to convert Rust strings containing timestamps to Date64 arrays, the
schema should contain a Date64 field. serde_arrow can derive the schema from
the data itself via schema tracing, but does not require it. It is always
possible to specify the schema manually. See the schema module and
SchemaLike for further details.
§Overview
Operation | arrow-* | arrow2-* |
---|---|---|
Rust to Arrow | to_record_batch, to_arrow | to_arrow2 |
Arrow to Rust | from_record_batch, from_arrow | from_arrow2 |
Array Builder | ArrayBuilder::from_arrow | ArrayBuilder::from_arrow2 |
Serializer | ArrayBuilder::from_arrow + Serializer::new | ArrayBuilder::from_arrow2 + Serializer::new |
Deserializer | Deserializer::from_record_batch, Deserializer::from_arrow | Deserializer::from_arrow2 |
See also:
- the quickstart guide for more examples of how to use this package
- the status summary for an overview of the supported Arrow and Rust constructs
§arrow Example
use arrow::datatypes::FieldRef;
use serde::{Deserialize, Serialize};
use serde_arrow::schema::{SchemaLike, TracingOptions};
#[derive(Serialize, Deserialize)]
struct Record {
    a: f32,
    b: i32,
}
let records = vec![
    Record { a: 1.0, b: 1 },
    Record { a: 2.0, b: 2 },
    Record { a: 3.0, b: 3 },
];
// Determine the Arrow schema by tracing the Record type
let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;
// Build the record batch
let batch = serde_arrow::to_record_batch(&fields, &records)?;
The RecordBatch can then be written to disk, e.g., as parquet using the
ArrowWriter from the parquet crate.
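To build intuition for what the conversion produces, the following is a minimal sketch of the row-to-column transposition in plain Rust. It does not use serde_arrow or Arrow at all: the Columns struct and the to_columns / from_columns helpers are hypothetical stand-ins for a record batch and for the to_* / from_* functions, and they ignore everything Arrow adds on top (schema metadata, validity bitmaps, nested types).

```rust
/// Row-oriented input, mirroring the `Record` struct from the example above.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    a: f32,
    b: i32,
}

/// Hypothetical stand-in for a record batch: one contiguous vector per field.
#[derive(Debug, Default, PartialEq)]
struct Columns {
    a: Vec<f32>,
    b: Vec<i32>,
}

/// Transpose a sequence of records into one column per field.
fn to_columns(records: &[Record]) -> Columns {
    let mut columns = Columns::default();
    for record in records {
        columns.a.push(record.a);
        columns.b.push(record.b);
    }
    columns
}

/// Transpose the columns back into a sequence of records.
fn from_columns(columns: &Columns) -> Vec<Record> {
    columns
        .a
        .iter()
        .zip(columns.b.iter())
        .map(|(&a, &b)| Record { a, b })
        .collect()
}
```

Calling to_columns on the three records from the example would yield one f32 column and one i32 column of length three, and from_columns restores the original rows. serde_arrow generalizes this idea to any type that implements Serde's traits, which is why no per-type transposition code has to be written by hand.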
§arrow2 Example
Requires one of the arrow2-* features (see below).
use arrow2::datatypes::Field;
use serde::{Deserialize, Serialize};
use serde_arrow::schema::{SchemaLike, TracingOptions};
#[derive(Serialize, Deserialize)]
struct Record {
    a: f32,
    b: i32,
}
let records = vec![
    Record { a: 1.0, b: 1 },
    Record { a: 2.0, b: 2 },
    Record { a: 3.0, b: 3 },
];
// Determine the Arrow schema by tracing the Record type
let fields = Vec::<Field>::from_type::<Record>(TracingOptions::default())?;
// Build the arrow2 arrays
let arrays = serde_arrow::to_arrow2(&fields, &records)?;
The generated arrays can then be written to disk, e.g., as parquet:
use arrow2::{chunk::Chunk, datatypes::Schema};
// see https://jorgecarleitao.github.io/arrow2/io/parquet_write.html
write_chunk(
    "example.pq",
    Schema::from(fields),
    Chunk::new(arrays),
)?;
§Features:
The version of arrow or arrow2 used can be selected via features. By
default no arrow implementation is used. In that case only the base features
of serde_arrow are available.
The arrow-* and arrow2-* feature groups are compatible with each other.
I.e., it is possible to use arrow and arrow2 together. Within each group
the highest version is selected if multiple features are activated. E.g.,
when selecting arrow2-0-16 and arrow2-0-17, arrow2=0.17 will be used.
Available features:
Arrow Feature | Arrow Version |
---|---|
arrow-53 | arrow=53 |
arrow-52 | arrow=52 |
arrow-51 | arrow=51 |
arrow-50 | arrow=50 |
arrow-49 | arrow=49 |
arrow-48 | arrow=48 |
arrow-47 | arrow=47 |
arrow-46 | arrow=46 |
arrow-45 | arrow=45 |
arrow-44 | arrow=44 |
arrow-43 | arrow=43 |
arrow-42 | arrow=42 |
arrow-41 | arrow=41 |
arrow-40 | arrow=40 |
arrow-39 | arrow=39 |
arrow-38 | arrow=38 |
arrow-37 | arrow=37 |
arrow2-0-17 | arrow2=0.17 |
arrow2-0-16 | arrow2=0.16 |
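The rule that the highest version within each feature group wins can be sketched as a small standalone function. This is only an illustration of the documented behavior, not how serde_arrow implements it (the crate resolves features at compile time); parse_feature and selected_feature are hypothetical names introduced for this sketch.

```rust
/// Parse a feature name such as "arrow-53" or "arrow2-0-17" into its group
/// ("arrow" or "arrow2") and numeric version components.
/// Returns `None` for names that do not follow this pattern.
fn parse_feature(feature: &str) -> Option<(&str, Vec<u32>)> {
    let (group, version) = feature.split_once('-')?;
    if group != "arrow" && group != "arrow2" {
        return None;
    }
    let parts = version
        .split('-')
        .map(|part| part.parse::<u32>().ok())
        .collect::<Option<Vec<u32>>>()?;
    Some((group, parts))
}

/// Among the activated features, pick the one with the highest version
/// inside the requested group. Other groups are ignored, so "arrow" and
/// "arrow2" are resolved independently of each other.
fn selected_feature<'a>(group: &str, features: &[&'a str]) -> Option<&'a str> {
    features
        .iter()
        .copied()
        .filter_map(|feature| {
            let (feature_group, version) = parse_feature(feature)?;
            (feature_group == group).then_some((version, feature))
        })
        // Version components compare lexicographically: [0, 17] > [0, 16].
        .max_by(|left, right| left.0.cmp(&right.0))
        .map(|(_, feature)| feature)
}
```

With the features arrow2-0-16, arrow2-0-17 and arrow-52 activated, selected_feature("arrow2", ...) picks arrow2-0-17 and selected_feature("arrow", ...) picks arrow-52, matching the example given in the text.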
Modules§
- Internal. Do not use
- The mapping between Rust and Arrow types
- Helpers that may be useful when using serde_arrow
Structs§
- ArrayBuilder: Construct arrays by pushing individual records
- Deserializer: A structure to deserialize Arrow arrays into Rust objects
- Serializer: Wrap an ArrayBuilder as a Serializer
Enums§
- Error: Common errors during serde_arrow’s usage
Functions§
- from_arrow: Deserialize items from arrow arrays (requires one of the arrow-* features)
- from_arrow2: Deserialize items from the given arrow2 arrays (requires one of the arrow2-* features)
- from_record_batch: Deserialize items from a record batch (requires one of the arrow-* features)
- to_arrow: Build arrow arrays from the given items (requires one of the arrow-* features)
- to_arrow2: Build arrow2 arrays from the given items (requires one of the arrow2-* features)
- to_record_batch: Build a record batch from the given items (requires one of the arrow-* features)
Type Aliases§
- Result: A Result type that defaults to serde_arrow’s Error type