Crate serde_arrow


§serde_arrow - convert sequences of Rust objects to / from arrow arrays

The arrow in-memory format is a powerful way to work with data-frame-like structures. However, the API of the underlying Rust crates can at times be cumbersome to use due to the statically typed nature of Rust. serde_arrow offers a simple way to convert Rust objects into Arrow arrays and back. It relies on Serde to interpret Rust objects. Therefore, adding support for serde_arrow to custom types is as easy as using Serde’s derive macros.
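Conceptually, converting a sequence of Rust objects into Arrow arrays means turning row-oriented records into one column per field, and converting back means zipping those columns into records again. The std-only sketch below illustrates only that transposition. `Columns`, `to_columns` and `from_columns` are hypothetical names, and plain `Vec`s stand in for Arrow arrays; serde_arrow itself builds real Arrow arrays by driving Serde, not with hand-written code like this.

```rust
// Hypothetical record type mirroring the examples on this page.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    a: f32,
    b: i32,
}

// Hypothetical columnar container: one Vec per field, a stand-in for Arrow arrays.
#[derive(Debug, PartialEq)]
struct Columns {
    a: Vec<f32>,
    b: Vec<i32>,
}

// Rows -> columns: push every field value into the column for that field.
fn to_columns(records: &[Record]) -> Columns {
    let mut cols = Columns {
        a: Vec::with_capacity(records.len()),
        b: Vec::with_capacity(records.len()),
    };
    for r in records {
        cols.a.push(r.a);
        cols.b.push(r.b);
    }
    cols
}

// Columns -> rows: zip the columns back together, one record per index.
fn from_columns(cols: &Columns) -> Vec<Record> {
    cols.a
        .iter()
        .zip(&cols.b)
        .map(|(&a, &b)| Record { a, b })
        .collect()
}

fn main() {
    let records = vec![
        Record { a: 1.0, b: 1 },
        Record { a: 2.0, b: 2 },
        Record { a: 3.0, b: 3 },
    ];
    let cols = to_columns(&records);
    assert_eq!(cols.b, vec![1, 2, 3]);
    // The round trip restores the original records.
    assert_eq!(from_columns(&cols), records);
}
```

Real Arrow arrays additionally carry validity bitmaps and typed buffers, which is why serde_arrow needs the schema described next.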

In the Rust ecosystem, there are two competing implementations of the arrow in-memory format: arrow and arrow2. serde_arrow supports both. The implementation to use is selected via features.

serde_arrow relies on a schema to translate between Rust and Arrow, as their type systems do not directly match. The schema is expressed as a collection of Arrow fields with additional metadata describing the arrays. For example, to convert Rust strings containing timestamps to Date64 arrays, the schema should contain a Date64 field. serde_arrow supports deriving the schema from the data itself via schema tracing, but does not require it: it is always possible to specify the schema manually. See the schema module and SchemaLike for further details.
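To illustrate why the schema matters, the std-only sketch below converts the same Rust string differently depending on the requested target type: it is kept as text for a Utf8 target, or parsed into milliseconds since the UNIX epoch for a Date64 target (Arrow's Date64 stores an i64 count of milliseconds). `TargetType`, `Cell`, `parse_timestamp_ms` and `convert` are hypothetical names. The parser is deliberately simplified: it accepts only `YYYY-MM-DDTHH:MM:SS`, with no time zone, fractional seconds, or range checks. None of this is serde_arrow's actual conversion code.

```rust
// Hypothetical, simplified stand-ins for two Arrow data types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TargetType {
    Utf8,
    Date64,
}

// Hypothetical converted value for a single cell.
#[derive(Debug, PartialEq)]
enum Cell {
    Utf8(String),
    Date64(i64), // milliseconds since 1970-01-01T00:00:00
}

// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

// Parses "YYYY-MM-DDTHH:MM:SS" into milliseconds since the UNIX epoch.
fn parse_timestamp_ms(s: &str) -> Option<i64> {
    let (date, time) = s.split_once('T')?;
    let mut d = date.split('-').map(|p| p.parse::<i64>().ok());
    let (y, m, day) = (d.next()??, d.next()??, d.next()??);
    let mut t = time.split(':').map(|p| p.parse::<i64>().ok());
    let (hh, mm, ss) = (t.next()??, t.next()??, t.next()??);
    let secs = days_from_civil(y, m, day) * 86_400 + hh * 3_600 + mm * 60 + ss;
    Some(secs * 1_000)
}

// The target type, i.e. the schema, decides how the same string is stored.
fn convert(value: &str, target: TargetType) -> Option<Cell> {
    match target {
        TargetType::Utf8 => Some(Cell::Utf8(value.to_string())),
        TargetType::Date64 => parse_timestamp_ms(value).map(Cell::Date64),
    }
}

fn main() {
    let s = "2000-03-01T00:00:00";
    // With a Utf8 target the string is kept as-is ...
    assert_eq!(convert(s, TargetType::Utf8), Some(Cell::Utf8(s.to_string())));
    // ... with a Date64 target it becomes milliseconds since the epoch.
    assert_eq!(convert(s, TargetType::Date64), Some(Cell::Date64(951_868_800_000)));
}
```

Without a schema, nothing in the `String` value alone says which of the two representations is wanted, which is the gap serde_arrow's schema fills.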

§Overview

§arrow Example

use arrow::datatypes::FieldRef;
use serde::{Deserialize, Serialize};
use serde_arrow::schema::{SchemaLike, TracingOptions};

#[derive(Serialize, Deserialize)]
struct Record {
    a: f32,
    b: i32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let records = vec![
        Record { a: 1.0, b: 1 },
        Record { a: 2.0, b: 2 },
        Record { a: 3.0, b: 3 },
    ];

    // Determine the Arrow schema by tracing the Record type
    let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;

    // Build the record batch: one row per record, one column per field
    let batch = serde_arrow::to_record_batch(&fields, &records)?;
    assert_eq!(batch.num_rows(), 3);

    Ok(())
}

The RecordBatch can then be written to disk, e.g., as parquet using the ArrowWriter from the parquet crate.

§arrow2 Example

Requires one of the arrow2-* features (see below).

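use arrow2::datatypes::Field;
use serde::{Deserialize, Serialize};
use serde_arrow::schema::{SchemaLike, TracingOptions};

#[derive(Serialize, Deserialize)]
struct Record {
    a: f32,
    b: i32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let records = vec![
        Record { a: 1.0, b: 1 },
        Record { a: 2.0, b: 2 },
        Record { a: 3.0, b: 3 },
    ];

    // Determine the arrow2 fields by tracing the Record type
    let fields = Vec::<Field>::from_type::<Record>(TracingOptions::default())?;

    // Build one arrow2 array per field
    let arrays = serde_arrow::to_arrow2(&fields, &records)?;
    assert_eq!(arrays.len(), 2);

    Ok(())
}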

The generated arrays can then be written to disk, e.g., as parquet:

use arrow2::{chunk::Chunk, datatypes::Schema};

// `write_chunk` is not part of arrow2; it is the helper defined in the arrow2 guide,
// see https://jorgecarleitao.github.io/arrow2/io/parquet_write.html
write_chunk(
    "example.pq",
    Schema::from(fields),
    Chunk::new(arrays),
)?;

§Features

The version of arrow or arrow2 used can be selected via features. By default, no arrow implementation is enabled; in that case, only the base features of serde_arrow are available.

The arrow-* and arrow2-* feature groups are compatible with each other, i.e., it is possible to use arrow and arrow2 together. If multiple features within one group are activated, the highest version is selected. E.g., when selecting arrow2-0-16 and arrow2-0-17, arrow2=0.17 will be used.
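The selection rule above amounts to taking the maximum enabled version separately in each group. The std-only sketch below is a hypothetical model of that documented rule; `parse_feature` and `selected_version` are invented names. serde_arrow applies the rule at compile time through Cargo features, not with runtime code like this.

```rust
// Hypothetical model of the documented feature-selection rule; not serde_arrow's code.

/// Splits a feature name into its group and numeric version parts,
/// e.g. "arrow2-0-17" -> ("arrow2", [0, 17]) and "arrow-53" -> ("arrow", [53]).
fn parse_feature(feature: &str) -> Option<(&str, Vec<u32>)> {
    let (group, version) = feature.split_once('-')?;
    if group != "arrow" && group != "arrow2" {
        return None; // not an arrow-* or arrow2-* feature
    }
    let parts = version
        .split('-')
        .map(|p| p.parse::<u32>().ok())
        .collect::<Option<Vec<u32>>>()?;
    Some((group, parts))
}

/// Returns the highest enabled version within `group`, comparing the numeric
/// parts lexicographically (so [0, 17] > [0, 16]), or None if none is enabled.
fn selected_version(enabled: &[&str], group: &str) -> Option<Vec<u32>> {
    enabled
        .iter()
        .copied()
        .filter_map(parse_feature)
        .filter(|(g, _)| *g == group)
        .map(|(_, version)| version)
        .max()
}

fn main() {
    let enabled = ["arrow2-0-16", "arrow2-0-17", "arrow-52"];
    // Each group is resolved independently, so arrow and arrow2 can coexist.
    assert_eq!(selected_version(&enabled, "arrow2"), Some(vec![0, 17]));
    assert_eq!(selected_version(&enabled, "arrow"), Some(vec![52]));
    // No arrow2-* feature enabled: only the base features are available.
    assert_eq!(selected_version(&["arrow-53"], "arrow2"), None);
}
```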

Available features:

Arrow Feature   Arrow Version
arrow-53        arrow=53
arrow-52        arrow=52
arrow-51        arrow=51
arrow-50        arrow=50
arrow-49        arrow=49
arrow-48        arrow=48
arrow-47        arrow=47
arrow-46        arrow=46
arrow-45        arrow=45
arrow-44        arrow=44
arrow-43        arrow=43
arrow-42        arrow=42
arrow-41        arrow=41
arrow-40        arrow=40
arrow-39        arrow=39
arrow-38        arrow=38
arrow-37        arrow=37
arrow2-0-17     arrow2=0.17
arrow2-0-16     arrow2=0.16

§Modules

  • Internal. Do not use.
  • The mapping between Rust and Arrow types
  • Helpers that may be useful when using serde_arrow

§Structs

§Enums

  • Common errors during serde_arrow’s usage

§Functions

  • Deserialize items from arrow arrays (requires one of the arrow-* features)
  • Deserialize items from the given arrow2 arrays (requires one of the arrow2-* features)
  • Deserialize items from a record batch (requires one of the arrow-* features)
  • Build arrow arrays from the given items (requires one of the arrow-* features)
  • Build arrow2 arrays from the given items (requires one of the arrow2-* features)
  • Build a record batch from the given items (requires one of the arrow-* features)

§Type Aliases

  • A Result type that defaults to serde_arrow’s Error type
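A Result alias whose error type "defaults" to a crate's Error type is commonly written with a default type parameter, so callers can write `Result<T>` for the crate's own errors and still spell out a different error type when needed. The sketch below shows that pattern with a hypothetical stand-in `Error` enum (its single variant is invented), plus two hypothetical functions `parse_len` and `parse_int`; it assumes, but does not confirm, that serde_arrow's alias has the same shape.

```rust
use std::fmt;

// Hypothetical stand-in for a crate's error enum; the variant is invented.
#[derive(Debug, PartialEq)]
enum Error {
    Custom(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Custom(msg) => write!(f, "{msg}"),
        }
    }
}

impl std::error::Error for Error {}

// The error type defaults to `Error`, but can be overridden per use site.
type Result<T, E = Error> = std::result::Result<T, E>;

// Uses the default: equivalent to std::result::Result<usize, Error>.
fn parse_len(s: &str) -> Result<usize> {
    if s.is_empty() {
        Err(Error::Custom("empty input".into()))
    } else {
        Ok(s.len())
    }
}

// Overrides the default error type.
fn parse_int(s: &str) -> Result<i32, std::num::ParseIntError> {
    s.parse()
}

fn main() {
    assert_eq!(parse_len("abc"), Ok(3));
    assert_eq!(parse_len(""), Err(Error::Custom("empty input".into())));
    assert_eq!(parse_int("42"), Ok(42));
}
```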