Trait SchemaLike 

pub trait SchemaLike: Sized + Sealed {
    // Required methods
    fn from_value<T: Serialize>(value: T) -> Result<Self>;
    fn from_type<'de, T: Deserialize<'de>>(
        options: TracingOptions,
    ) -> Result<Self>;
    fn from_samples<T: Serialize>(
        samples: T,
        options: TracingOptions,
    ) -> Result<Self>;
}

A sealed trait to add support for constructing schema-like objects

There are three main ways to specify the schema:

  1. SchemaLike::from_value: specify the schema manually, e.g., as a JSON value
  2. SchemaLike::from_type: determine the schema from the record type
  3. SchemaLike::from_samples: determine the schema from samples of data

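Types that implement SchemaLike, and can therefore be constructed with the methods mentioned above, include:

  • Vec<FieldRef> for arrow::datatypes::FieldRef (requires one of the arrow-* features)
  • Vec<Field> for arrow::datatypes::Field (requires one of the arrow-* features)
  • Vec<Field> for arrow2::datatypes::Field (requires one of the arrow2-* features)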

Instances of SerdeArrowSchema can be serialized and deserialized directly. The serialized format is the one described in SchemaLike::from_value.

use serde_arrow::schema::SerdeArrowSchema;

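// `json_schema_str` is assumed to hold a schema in the format described in SchemaLike::from_value
let schema: SerdeArrowSchema = serde_json::from_str(json_schema_str)?;
// Serialize the schema back into a JSON string
serde_json::to_string(&schema)?;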

Required Methods

fn from_value<T: Serialize>(value: T) -> Result<Self>

Build the schema from an object that implements Serialize (e.g., serde_json::Value).

use arrow::datatypes::FieldRef;
use serde_arrow::schema::SchemaLike;

let schema = serde_json::json!([
    {"name": "foo", "data_type": "U8"},
    {"name": "bar", "data_type": "Utf8"},
]);

let fields = Vec::<FieldRef>::from_value(&schema)?;

The schema can be given in two ways:

  • an array of fields
  • or an object with a "fields" key that contains an array of fields

Each field is an object with the following keys:

  • "name" (required): the name of the field
  • "data_type" (required): the data type of the field as a string
  • "nullable" (optional): if true, the field can contain null values
  • "strategy" (optional): if given, a string describing the strategy to use
  • "children" (optional): a list of child fields, the semantics depend on the data type

The following data types are supported:

  • booleans: "Bool"
  • signed integers: "I8", "I16", "I32", "I64"
  • unsigned integers: "U8", "U16", "U32", "U64"
  • floats: "F16", "F32", "F64"
  • strings: "Utf8", "LargeUtf8"
  • decimals: "Decimal128(precision, scale)", as in "Decimal128(5, 2)"
  • date objects: "Date32", "Date64"
  • date time objects: "Timestamp(unit, optional_timezone)" with unit being one of Second, Millisecond, Microsecond, Nanosecond and optional_timezone being either None or Some("Utc").
  • time objects: "Time32(unit)", "Time64(unit)" with unit being one of Second, Millisecond, Microsecond, Nanosecond.
  • durations: "Duration(unit)" with unit being one of Second, Millisecond, Microsecond, Nanosecond.
  • lists: "List", "LargeList". "children" must contain a single field named "element" that describes the element type
  • structs: "Struct". "children" must contain the child fields
  • maps: "Map". "children" must contain two fields, named "key" and "value" that encode the key and value types
  • unions: "Union". "children" must contain the different variants
  • dictionaries: "Dictionary". "children" must contain two fields: "key" with an integer type and "value" with a string type
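
The "children" rules for lists and maps in the list above can be sketched as a small standalone check. This is only an illustration that assumes a simplified model of a field: FieldSpec and check_children are hypothetical names and are not part of serde_arrow, and the sketch accepts the "key" and "value" children of a map in either order, which the text above does not specify.

```rust
// Hypothetical model of a field description. `FieldSpec` and
// `check_children` are illustrative names, not part of serde_arrow.
#[derive(Debug)]
struct FieldSpec {
    name: String,
    data_type: String,
    children: Vec<FieldSpec>,
}

impl FieldSpec {
    fn new(name: &str, data_type: &str, children: Vec<FieldSpec>) -> Self {
        FieldSpec {
            name: name.to_string(),
            data_type: data_type.to_string(),
            children,
        }
    }
}

// Check the "children" rules for lists and maps, then recurse into the children.
fn check_children(field: &FieldSpec) -> Result<(), String> {
    let names: Vec<&str> = field.children.iter().map(|c| c.name.as_str()).collect();
    let valid = match field.data_type.as_str() {
        // Lists need exactly one child named "element".
        "List" | "LargeList" => names == ["element"],
        // Maps need exactly two children named "key" and "value"
        // (this sketch accepts either order).
        "Map" => names.len() == 2 && names.contains(&"key") && names.contains(&"value"),
        // Other data types are not checked in this sketch.
        _ => true,
    };
    if !valid {
        return Err(format!("invalid children for field {:?}: {:?}", field.name, names));
    }
    field.children.iter().try_for_each(check_children)
}

fn main() {
    // A list with a single child named "element" passes the check.
    let element = FieldSpec::new("element", "F32", vec![]);
    let list = FieldSpec::new("values", "List", vec![element]);
    assert!(check_children(&list).is_ok());

    // A list whose child has a different name is rejected.
    let wrong = FieldSpec::new("item", "F32", vec![]);
    let bad_list = FieldSpec::new("values", "List", vec![wrong]);
    assert!(check_children(&bad_list).is_err());
}
```

The same shape carries over to the JSON form: the "children" array of a "List" field would hold one object whose "name" is "element".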

fn from_type<'de, T: Deserialize<'de>>(options: TracingOptions) -> Result<Self>

Determine the schema from the given record type. See TracingOptions for customization options.

This approach requires the type T to implement Deserialize. As only type information is used, it is not possible to detect data-dependent properties. Examples of unsupported features:

  • auto detection of date time strings
  • non-self-describing types such as serde_json::Value
  • flattened structures (#[serde(flatten)])
  • types that require specific data to be deserialized, such as the DateTime type of chrono or the Uuid type of the uuid crate

Consider using from_samples in these cases.

use arrow::datatypes::{DataType, FieldRef};
use serde::Deserialize;
use serde_arrow::schema::{SchemaLike, TracingOptions};

#[derive(Deserialize)]
struct Record {
    int: i32,
    float: f64,
    string: String,
}

let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;

assert_eq!(fields[0].data_type(), &DataType::Int32);
assert_eq!(fields[1].data_type(), &DataType::Float64);
assert_eq!(fields[2].data_type(), &DataType::LargeUtf8);

Note that the type T must encode a single “row” in the resulting data frame. To encode single values, consider using the Item wrapper.

use arrow::datatypes::{DataType, FieldRef};
use serde_arrow::{schema::{SchemaLike, TracingOptions}, utils::Item};

let fields = Vec::<FieldRef>::from_type::<Item<f32>>(TracingOptions::default())?;

assert_eq!(fields[0].data_type(), &DataType::Float32);

fn from_samples<T: Serialize>(samples: T, options: TracingOptions) -> Result<Self>

Determine the schema from samples. See TracingOptions for customization options.

This approach requires the type T to implement Serialize and the samples to include all relevant values. It uses only the information encoded in the samples to generate the schema. Therefore, the following requirements must be met:

  • at least one Some value for Option<..> fields
  • all variants of enum fields
  • at least one element for sequence fields (e.g., Vec<..>)
  • at least one example for map types (e.g., HashMap<.., ..>). If options.map_as_struct == true, all possible keys must be given.

use arrow::datatypes::{DataType, FieldRef};
use serde::Serialize;
use serde_arrow::schema::{SchemaLike, TracingOptions};

#[derive(Serialize)]
struct Record {
    int: i32,
    float: f64,
    string: String,
}

let samples = vec![
    Record {
        int: 1,
        float: 2.0,
        string: String::from("hello")
    },
    Record {
        int: -1,
        float: 32.0,
        string: String::from("world")
    },
    // ...
];

let fields = Vec::<FieldRef>::from_samples(&samples, TracingOptions::default())?;

assert_eq!(fields[0].data_type(), &DataType::Int32);
assert_eq!(fields[1].data_type(), &DataType::Float64);
assert_eq!(fields[2].data_type(), &DataType::LargeUtf8);
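
To see why the samples need at least one Some value for Option<..> fields, consider the following toy model of tracing. It is a minimal sketch and not how serde_arrow is implemented: Traced and trace_samples are hypothetical names, and the model only handles optional i64 values.

```rust
// Hypothetical, simplified model of schema tracing; not serde_arrow's implementation.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Traced {
    // No non-null sample was seen, so the data type cannot be determined.
    Unknown { nullable: bool },
    // At least one integer value was seen.
    Int64 { nullable: bool },
}

// Infer a data type and nullability from the samples alone.
fn trace_samples(samples: &[Option<i64>]) -> Traced {
    let mut nullable = false;
    let mut seen_value = false;
    for sample in samples {
        match sample {
            Some(_) => seen_value = true,
            None => nullable = true,
        }
    }
    if seen_value {
        Traced::Int64 { nullable }
    } else {
        Traced::Unknown { nullable }
    }
}

fn main() {
    // Only None values: the field is known to be nullable, but its type is not.
    assert_eq!(trace_samples(&[None, None]), Traced::Unknown { nullable: true });
    // A single Some value is enough to determine the type.
    assert_eq!(trace_samples(&[None, Some(3)]), Traced::Int64 { nullable: true });
    // Without any None value the field is not marked as nullable.
    assert_eq!(trace_samples(&[Some(1), Some(2)]), Traced::Int64 { nullable: false });
}
```

The same reasoning applies to the other requirements: a variant, element, or map entry that never appears in the samples leaves no information from which its type could be inferred.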

Note that the samples must encode “rows” in the resulting data frame. To encode single values, consider using the Items wrapper.

use arrow::datatypes::{DataType, FieldRef};
use serde_arrow::{schema::{SchemaLike, TracingOptions}, utils::Items};

let fields = Vec::<FieldRef>::from_samples(
    &Items(&[1.0_f32, 2.0_f32, 3.0_f32]),
    TracingOptions::default(),
)?;

assert_eq!(fields[0].data_type(), &DataType::Float32);

Dyn Compatibility

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.

Implementations on Foreign Types

impl SchemaLike for Vec<Field>

Schema support for Vec<arrow2::datatypes::Field> (requires one of the arrow2-* features)

fn from_value<T: Serialize>(value: T) -> Result<Self>

fn from_type<'de, T: Deserialize<'de>>(options: TracingOptions) -> Result<Self>

fn from_samples<T: Serialize>(samples: T, options: TracingOptions) -> Result<Self>

impl SchemaLike for Vec<Field>

Schema support for Vec<arrow::datatypes::Field> (requires one of the arrow-* features)

fn from_value<T: Serialize>(value: T) -> Result<Self>

fn from_type<'de, T: Deserialize<'de>>(options: TracingOptions) -> Result<Self>

fn from_samples<T: Serialize>(samples: T, options: TracingOptions) -> Result<Self>

impl SchemaLike for Vec<Field>

fn from_value<T: Serialize>(value: T) -> Result<Self>

fn from_samples<T: Serialize>(samples: T, options: TracingOptions) -> Result<Self>

fn from_type<'de, T: Deserialize<'de>>(options: TracingOptions) -> Result<Self>

impl SchemaLike for Vec<FieldRef>

Schema support for Vec<arrow::datatypes::FieldRef> (requires one of the arrow-* features)

fn from_value<T: Serialize>(value: T) -> Result<Self>

fn from_type<'de, T: Deserialize<'de>>(options: TracingOptions) -> Result<Self>

fn from_samples<T: Serialize>(samples: T, options: TracingOptions) -> Result<Self>

Implementors