§serde_arrow
- convert sequences of Rust objects to / from Arrow arrays
The Arrow in-memory format is a powerful way to work with data-frame-like structures. However,
the API of the underlying Rust crates can at times be cumbersome to use due to the statically
typed nature of Rust. serde_arrow offers a simple way to convert Rust objects into Arrow
arrays and back. serde_arrow relies on Serde to interpret Rust objects. Therefore, adding
support for serde_arrow to custom types is as easy as using Serde’s derive macros.
serde_arrow mainly targets the arrow crate, but also supports the deprecated arrow2 crate.
The arrow implementations can be selected via features.
serde_arrow relies on a schema to translate between Rust and Arrow, as their type systems do
not directly match. The schema is expressed as a collection of Arrow fields with additional
metadata describing the arrays. E.g., to convert a vector of Rust strings representing
timestamps to an Arrow Timestamp array, the schema should contain a field with data type
Timestamp. serde_arrow can derive the schema from the data or from the Rust types themselves
via schema tracing, but does not require it. It is always possible to specify the schema
manually. See the schema module and SchemaLike for further details.
§Overview
See also:
- the quickstart guide for more examples of how to use this package
- the status summary for an overview of the supported Arrow and Rust constructs
§Example
use arrow::datatypes::FieldRef;
use serde::{Deserialize, Serialize};
use serde_arrow::schema::{SchemaLike, TracingOptions};

#[derive(Serialize, Deserialize)]
struct Record {
    a: f32,
    b: i32,
}

let records = vec![
    Record { a: 1.0, b: 1 },
    Record { a: 2.0, b: 2 },
    Record { a: 3.0, b: 3 },
];

// Determine Arrow schema
let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;

// Build the record batch
let batch = serde_arrow::to_record_batch(&fields, &records)?;
The RecordBatch can then be written to disk, e.g., as parquet using the ArrowWriter from
the parquet crate.
§Features:
The version of arrow or arrow2 used can be selected via features. By default, no arrow
implementation is used. In that case only the base features of serde_arrow are available.
The arrow-* and arrow2-* feature groups are compatible with each other, i.e., it is possible
to use arrow and arrow2 together. Within each group, the highest version is selected if
multiple features are activated. E.g., when selecting arrow2-0-16 and arrow2-0-17,
arrow2=0.17 will be used.
Note that because the highest version is selected, the features are not additive. In particular,
it is not possible to use serde_arrow::to_arrow for multiple different arrow versions at the
same time. See the next section for how to use serde_arrow in library code.
Available features:
Arrow Feature | Arrow Version |
---|---|
arrow-56 | arrow=56 |
arrow-55 | arrow=55 |
arrow-54 | arrow=54 |
arrow-53 | arrow=53 |
arrow-52 | arrow=52 |
arrow-51 | arrow=51 |
arrow-50 | arrow=50 |
arrow-49 | arrow=49 |
arrow-48 | arrow=48 |
arrow-47 | arrow=47 |
arrow-46 | arrow=46 |
arrow-45 | arrow=45 |
arrow-44 | arrow=44 |
arrow-43 | arrow=43 |
arrow-42 | arrow=42 |
arrow-41 | arrow=41 |
arrow-40 | arrow=40 |
arrow-39 | arrow=39 |
arrow-38 | arrow=38 |
arrow-37 | arrow=37 |
arrow2-0-17 | arrow2=0.17 |
arrow2-0-16 | arrow2=0.16 |
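The "highest version wins" rule described above can be modelled in a few lines of plain Rust. This is only a hypothetical model for illustration: select_highest is not part of serde_arrow, and the actual selection happens at compile time through Cargo features, not through a runtime function.

```rust
/// Hypothetical model: among the active features of one group ("arrow" or
/// "arrow2"), return the feature with the highest version number.
fn select_highest<'a>(group: &str, active: &[&'a str]) -> Option<&'a str> {
    active
        .iter()
        .copied()
        .filter_map(|feature| {
            // "arrow2-0-17" with group "arrow2" leaves "0-17";
            // with group "arrow" the '-' check fails and the feature is skipped.
            let rest = feature.strip_prefix(group)?.strip_prefix('-')?;
            // Parse "0-17" into [0, 17] so versions compare component-wise.
            let version: Option<Vec<u32>> =
                rest.split('-').map(|part| part.parse().ok()).collect();
            Some((version?, feature))
        })
        .max_by(|left, right| left.0.cmp(&right.0))
        .map(|(_, feature)| feature)
}
```

The model also shows why the features are not additive: enabling an extra feature in the same group can change which version is selected, whereas the two groups are resolved independently of each other.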
§Usage in libraries
In libraries, it is not recommended to use the arrow and arrow2 functions directly. Rather,
it is recommended to rely on the marrow-based functionality, as the features of marrow
are designed to be strictly additive.
For example, to build a record batch, first build the corresponding marrow types and then use
them to build the record batch:
use std::sync::Arc;
use serde_arrow::schema::{SchemaLike, TracingOptions};

// Determine Arrow schema (Record and records as in the example above)
let fields = Vec::<marrow::datatypes::Field>::from_type::<Record>(TracingOptions::default())?;

// Build the marrow arrays
let arrays = serde_arrow::to_marrow(&fields, &records)?;

// Build the record batch
let arrow_fields = fields.iter()
    .map(arrow::datatypes::Field::try_from)
    .collect::<Result<Vec<_>, _>>()?;
let arrow_arrays = arrays.into_iter()
    .map(arrow::array::ArrayRef::try_from)
    .collect::<Result<Vec<_>, _>>()?;

// try_new returns a Result, so propagate the error to obtain the batch itself
let record_batch = arrow::array::RecordBatch::try_new(
    Arc::new(arrow::datatypes::Schema::new(arrow_fields)),
    arrow_arrays,
)?;
Re-exports§
pub use marrow;
Modules§
- _impl
- Internal. Do not use
- deserializer
- Deserialization of items
- schema
- The mapping between Rust and Arrow types
- utils
- Helpers that may be useful when using serde_arrow
Structs§
- ArrayBuilder
- Construct arrays by pushing individual records
- Deserializer
- A structure to deserialize Arrow arrays into Rust objects
- Serializer
- Wrap an ArrayBuilder as a Serializer
Enums§
- Error
- Common errors during serde_arrow’s usage
Functions§
- from_arrow
- Deserialize items from arrow arrays (requires one of the arrow-* features)
- from_arrow2
- Deserialize items from the given arrow2 arrays (requires one of the arrow2-* features)
- from_marrow
- Deserialize items from marrow views
- from_record_batch
- Deserialize items from a record batch (requires one of the arrow-* features)
- to_arrow
- Build arrow arrays from the given items (requires one of the arrow-* features)
- to_arrow2
- Build arrow2 arrays from the given items (requires one of the arrow2-* features)
- to_marrow
- Build marrow arrays from the given items
- to_record_batch
- Build a record batch from the given items (requires one of the arrow-* features)