Crate serde_arrow


§serde_arrow - convert sequences of Rust objects to / from arrow arrays

The arrow in-memory format is a powerful way to work with data-frame-like structures. However, due to the statically typed nature of Rust, the API of the underlying Rust crates can at times be cumbersome to use. serde_arrow offers a simple way to convert Rust objects into Arrow arrays and back. It relies on Serde to interpret Rust objects. Therefore, adding support for serde_arrow to custom types is as easy as using Serde's derive macros.

serde_arrow mainly targets the arrow crate, but also supports the deprecated arrow2 crate. The arrow implementation is selected via Cargo features.

serde_arrow relies on a schema to translate between Rust and Arrow, as their type systems do not directly match. The schema is expressed as a collection of Arrow fields with additional metadata describing the arrays. E.g., to convert a vector of Rust strings representing timestamps to an arrow Timestamp array, the schema should contain a field with data type Timestamp. serde_arrow supports deriving the schema from the data or from the Rust types themselves via schema tracing, but does not require it: it is always possible to specify the schema manually. See the schema module and SchemaLike for further details.

§Overview

See also:

§Example

use arrow::datatypes::FieldRef;
use serde::{Deserialize, Serialize};
use serde_arrow::schema::{SchemaLike, TracingOptions};

#[derive(Serialize, Deserialize)]
struct Record {
    a: f32,
    b: i32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let records = vec![
        Record { a: 1.0, b: 1 },
        Record { a: 2.0, b: 2 },
        Record { a: 3.0, b: 3 },
    ];

    // Determine Arrow schema by tracing the Record type
    let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;

    // Build the record batch
    let batch = serde_arrow::to_record_batch(&fields, &records)?;
    assert_eq!(batch.num_rows(), 3);

    Ok(())
}

The RecordBatch can then be written to disk, e.g., as parquet using the ArrowWriter from the parquet crate.

§Features

The version of arrow or arrow2 used can be selected via features. By default, no arrow implementation is enabled. In that case, only the base features of serde_arrow are available.

The arrow-* and arrow2-* feature groups are compatible with each other, i.e., it is possible to use arrow and arrow2 together. Within each group, the highest version is selected if multiple features are activated. E.g., when selecting arrow2-0-16 and arrow2-0-17, arrow2=0.17 will be used.

Note that because the highest version is selected, the features are not additive. In particular, it is not possible to use serde_arrow::to_arrow for multiple different arrow versions at the same time. See the next section for how to use serde_arrow in library code.

Available features:

Arrow feature    Arrow version
arrow-56         arrow=56
arrow-55         arrow=55
arrow-54         arrow=54
arrow-53         arrow=53
arrow-52         arrow=52
arrow-51         arrow=51
arrow-50         arrow=50
arrow-49         arrow=49
arrow-48         arrow=48
arrow-47         arrow=47
arrow-46         arrow=46
arrow-45         arrow=45
arrow-44         arrow=44
arrow-43         arrow=43
arrow-42         arrow=42
arrow-41         arrow=41
arrow-40         arrow=40
arrow-39         arrow=39
arrow-38         arrow=38
arrow-37         arrow=37
arrow2-0-17      arrow2=0.17
arrow2-0-16      arrow2=0.16

§Usage in libraries

In libraries, it is not recommended to use the arrow and arrow2 functions directly. Instead, rely on the marrow-based functionality, as the features of marrow are designed to be strictly additive.

For example, to build a record batch, first build the corresponding marrow types and then convert them into arrow types to assemble the record batch:

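use std::sync::Arc;

use serde::{Deserialize, Serialize};
use serde_arrow::schema::{SchemaLike, TracingOptions};

#[derive(Serialize, Deserialize)]
struct Record {
    a: f32,
    b: i32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let records = vec![Record { a: 1.0, b: 1 }, Record { a: 2.0, b: 2 }];

    // Determine Arrow schema
    let fields = Vec::<marrow::datatypes::Field>::from_type::<Record>(TracingOptions::default())?;

    // Build the marrow arrays
    let arrays = serde_arrow::to_marrow(&fields, &records)?;

    // Convert the marrow fields and arrays into their arrow counterparts
    let arrow_fields = fields.iter()
        .map(arrow::datatypes::Field::try_from)
        .collect::<Result<Vec<_>, _>>()?;

    let arrow_arrays = arrays.into_iter()
        .map(arrow::array::ArrayRef::try_from)
        .collect::<Result<Vec<_>, _>>()?;

    // Build the record batch (try_new returns a Result, hence the `?`)
    let record_batch = arrow::array::RecordBatch::try_new(
        Arc::new(arrow::datatypes::Schema::new(arrow_fields)),
        arrow_arrays,
    )?;
    println!("{} rows", record_batch.num_rows());

    Ok(())
}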

Re-exports§

pub use marrow;

Modules§

_impl
Internal. Do not use
deserializer
Deserialization of items
schema
The mapping between Rust and Arrow types
utils
Helpers that may be useful when using serde_arrow

Structs§

ArrayBuilder
Construct arrays by pushing individual records
Deserializer
A structure to deserialize Arrow arrays into Rust objects
Serializer
Wrap an ArrayBuilder as a Serializer

Enums§

Error
Common errors during serde_arrow’s usage

Functions§

from_arrow
Deserialize items from arrow arrays (requires one of the arrow-* features)
from_arrow2
Deserialize items from the given arrow2 arrays (requires one of the arrow2-* features)
from_marrow
Deserialize items from marrow views
from_record_batch
Deserialize items from a record batch (requires one of the arrow-* features)
to_arrow
Build arrow arrays from the given items (requires one of the arrow-* features)
to_arrow2
Build arrow2 arrays from the given items (requires one of the arrow2-* features)
to_marrow
Build marrow arrays from the given items
to_record_batch
Build a record batch from the given items (requires one of the arrow-* features)

Type Aliases§

Result
A Result type that defaults to serde_arrow’s Error type