§serde_arrow - convert sequences of Rust objects to / from arrow arrays
The arrow in-memory format is a powerful way to work with data-frame-like
structures. However, the API of the underlying Rust crates can at times be
cumbersome to use due to the statically typed nature of Rust. serde_arrow
offers a simple way to convert Rust objects into Arrow arrays and back.
serde_arrow relies on Serde to interpret Rust objects.
Therefore, adding support for serde_arrow to custom types is as easy as
using Serde’s derive macros.
In the Rust ecosystem there are two competing implementations of the arrow
in-memory format, arrow and arrow2. serde_arrow supports both. The
supported arrow implementations can be selected via features.
serde_arrow relies on a schema to translate between Rust and Arrow as
their type systems do not directly match. The schema is expressed as a
collection of Arrow fields with additional metadata describing the arrays.
E.g., to convert Rust strings containing timestamps to Date64 arrays, the
schema should contain a Date64 field. serde_arrow can derive the schema
from the data itself via schema tracing, but does not require it: it is
always possible to specify the schema manually. See the schema module and
SchemaLike for further details.
§Overview
| Operation | arrow-* | arrow2-* |
|---|---|---|
| Rust to Arrow | to_record_batch, to_arrow | to_arrow2 |
| Arrow to Rust | from_record_batch, from_arrow | from_arrow2 |
| Array Builder | ArrayBuilder::from_arrow | ArrayBuilder::from_arrow2 |
| Serializer | ArrayBuilder::from_arrow + Serializer::new | ArrayBuilder::from_arrow2 + Serializer::new |
| Deserializer | Deserializer::from_record_batch, Deserializer::from_arrow | Deserializer::from_arrow2 |
See also:
- the quickstart guide for more examples of how to use this package
- the status summary for an overview of the supported Arrow and Rust constructs
§arrow Example
```rust
use arrow::datatypes::FieldRef;
use serde::{Deserialize, Serialize};
use serde_arrow::schema::{SchemaLike, TracingOptions};

#[derive(Serialize, Deserialize)]
struct Record {
    a: f32,
    b: i32,
}

// Note: `?` assumes an enclosing function that returns `serde_arrow::Result<_>`
let records = vec![
    Record { a: 1.0, b: 1 },
    Record { a: 2.0, b: 2 },
    Record { a: 3.0, b: 3 },
];

// Determine Arrow schema
let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;

// Build the record batch
let batch = serde_arrow::to_record_batch(&fields, &records)?;
```

The RecordBatch can then be written to disk, e.g., as parquet using
the ArrowWriter from the parquet crate.
§arrow2 Example
Requires one of the arrow2-* features (see below).
```rust
use arrow2::datatypes::Field;
use serde::{Deserialize, Serialize};
use serde_arrow::schema::{SchemaLike, TracingOptions};

#[derive(Serialize, Deserialize)]
struct Record {
    a: f32,
    b: i32,
}

// Note: `?` assumes an enclosing function that returns `serde_arrow::Result<_>`
let records = vec![
    Record { a: 1.0, b: 1 },
    Record { a: 2.0, b: 2 },
    Record { a: 3.0, b: 3 },
];

// Determine the arrow2 schema
let fields = Vec::<Field>::from_type::<Record>(TracingOptions::default())?;

// Build the arrays
let arrays = serde_arrow::to_arrow2(&fields, &records)?;
```

The generated arrays can then be written to disk, e.g., as parquet:
```rust
use arrow2::{chunk::Chunk, datatypes::Schema};

// `write_chunk` is a helper that has to be defined by the user,
// see https://jorgecarleitao.github.io/arrow2/io/parquet_write.html
write_chunk(
    "example.pq",
    Schema::from(fields),
    Chunk::new(arrays),
)?;
```

§Features:
The version of arrow or arrow2 used can be selected via features. By
default no arrow implementation is used. In that case only the base features
of serde_arrow are available.
The arrow-* and arrow2-* feature groups are compatible with each other.
I.e., it is possible to use arrow and arrow2 together. Within each group
the highest version is selected if multiple features are activated. E.g.,
when selecting arrow2-0-16 and arrow2-0-17, arrow2=0.17 will be used.
Available features:
| Arrow Feature | Arrow Version |
|---|---|
| arrow-53 | arrow=53 |
| arrow-52 | arrow=52 |
| arrow-51 | arrow=51 |
| arrow-50 | arrow=50 |
| arrow-49 | arrow=49 |
| arrow-48 | arrow=48 |
| arrow-47 | arrow=47 |
| arrow-46 | arrow=46 |
| arrow-45 | arrow=45 |
| arrow-44 | arrow=44 |
| arrow-43 | arrow=43 |
| arrow-42 | arrow=42 |
| arrow-41 | arrow=41 |
| arrow-40 | arrow=40 |
| arrow-39 | arrow=39 |
| arrow-38 | arrow=38 |
| arrow-37 | arrow=37 |
| arrow2-0-17 | arrow2=0.17 |
| arrow2-0-16 | arrow2=0.16 |
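As an illustration of the feature selection, a `Cargo.toml` for a project using arrow 53 might contain the following dependencies. The serde_arrow version number shown here is an assumption; use the release that actually provides the chosen `arrow-*` feature.

```toml
[dependencies]
serde = { version = "1", features = ["derive"] }
arrow = "53"
# Assumed version; the feature name must match the arrow major version above
serde_arrow = { version = "0.12", features = ["arrow-53"] }
```

Keeping the arrow dependency and the `arrow-*` feature on the same version ensures that the arrays produced by serde_arrow have the types the rest of the project expects.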
Modules§
- Internal. Do not use
- The mapping between Rust and Arrow types
- Helpers that may be useful when using serde_arrow
Structs§
- Construct arrays by pushing individual records
- A structure to deserialize Arrow arrays into Rust objects
- Wrap an ArrayBuilder as a Serializer
Enums§
- Common errors during serde_arrow’s usage
Functions§
- Deserialize items from arrow arrays (requires one of the arrow-* features)
- Deserialize items from the given arrow2 arrays (requires one of the arrow2-* features)
- Deserialize items from a record batch (requires one of the arrow-* features)
- Build arrow arrays from the given items (requires one of the arrow-* features)
- Build arrow2 arrays from the given items (requires one of the arrow2-* features)
- Build a record batch from the given items (requires one of the arrow-* features)
Type Aliases§
- A Result type that defaults to serde_arrow’s Error type