pub trait SchemaLike: Sized + Sealed {
    // Required methods
    fn from_value<T: Serialize>(value: T) -> Result<Self>;
    fn from_type<'de, T: Deserialize<'de>>(
        options: TracingOptions,
    ) -> Result<Self>;
    fn from_samples<T: Serialize>(
        samples: T,
        options: TracingOptions,
    ) -> Result<Self>;
}
A sealed trait that adds support for constructing schema-like objects.
There are three main ways to specify the schema:
- SchemaLike::from_value: specify the schema manually, e.g., as a JSON value
- SchemaLike::from_type: determine the schema from the record type
- SchemaLike::from_samples: determine the schema from samples of data
The following types implement SchemaLike and can be constructed with the methods mentioned
above:
- SerdeArrowSchema
- Vec<marrow::datatypes::Field>
- Vec<arrow::datatypes::FieldRef>
- Vec<arrow::datatypes::Field>
- Vec<arrow2::datatypes::Field>
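Because the trait is sealed, the set of implementing types is closed: code outside the crate cannot add its own implementation. The following is a minimal std-only sketch of the usual sealing pattern. All names in it (`private`, `ToySchemaLike`, `ToySchema`, `from_names`) are hypothetical and chosen for illustration; it does not reproduce serde_arrow's internals.

```rust
// Hypothetical illustration of the sealed-trait pattern; none of these names
// are part of serde_arrow.
mod private {
    // A public trait inside a private module: other crates cannot name it,
    // so they cannot implement it.
    pub trait Sealed {}
}

/// Toy counterpart of a schema-like trait with a constructor-style method.
pub trait ToySchemaLike: Sized + private::Sealed {
    fn from_names(names: &[&str]) -> Result<Self, String>;
}

#[derive(Debug, PartialEq)]
pub struct ToySchema {
    pub names: Vec<String>,
}

// Only the defining crate can write this impl, which "unlocks" ToySchemaLike.
impl private::Sealed for ToySchema {}

impl ToySchemaLike for ToySchema {
    fn from_names(names: &[&str]) -> Result<Self, String> {
        if names.is_empty() {
            return Err(String::from("a schema needs at least one field"));
        }
        Ok(ToySchema {
            names: names.iter().map(|n| n.to_string()).collect(),
        })
    }
}
```

Callers can still use the trait methods on the listed types, for example `ToySchema::from_names(&["foo", "bar"])` in this sketch; only adding new implementations is blocked.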
Instances of SerdeArrowSchema can be directly serialized and deserialized. The format is that
described in SchemaLike::from_value.
use serde_arrow::schema::SerdeArrowSchema;
let schema: SerdeArrowSchema = serde_json::from_str(json_schema_str)?;
serde_json::to_string(&schema)?;
Required Methods
fn from_value<T: Serialize>(value: T) -> Result<Self>
Build the schema from an object that implements Serialize (e.g., serde_json::Value).
use arrow::datatypes::FieldRef;
use serde_arrow::schema::SchemaLike;
let schema = serde_json::json!([
    {"name": "foo", "data_type": "U8"},
    {"name": "bar", "data_type": "Utf8"},
]);
let fields = Vec::<FieldRef>::from_value(&schema)?;
The schema can be given in two ways:
- an array of fields
- an object with a "fields" key that contains an array of fields
Each field is an object with the following keys:
- "name" (required): the name of the field
- "data_type" (required): the data type of the field as a string
- "nullable" (optional): if true, the field can contain null values
- "strategy" (optional): if given, a string describing the strategy to use
- "children" (optional): a list of child fields; the semantics depend on the data type
The following data types are supported:
- booleans: "Bool"
- signed integers: "I8", "I16", "I32", "I64"
- unsigned integers: "U8", "U16", "U32", "U64"
- floats: "F16", "F32", "F64"
- strings: "Utf8", "LargeUtf8"
- decimals: "Decimal128(precision, scale)", as in "Decimal128(5, 2)"
- date objects: "Date32", "Date64"
- date time objects: "Timestamp(unit, optional_timezone)" with unit being one of Second, Millisecond, Microsecond, Nanosecond and optional_timezone being either None or Some("Utc")
- time objects: "Time32(unit)", "Time64(unit)" with unit being one of Second, Millisecond, Microsecond, Nanosecond
- durations: "Duration(unit)" with unit being one of Second, Millisecond, Microsecond, Nanosecond
- lists: "List", "LargeList". "children" must contain a single field named "element" that describes the element type
- structs: "Struct". "children" must contain the child fields
- maps: "Map". "children" must contain two fields, named "key" and "value", that encode the key and value types
- unions: "Union". "children" must contain the different variants
- dictionaries: "Dictionary". "children" must contain two different fields, named "key" of integer type and named "value" of string type
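The parameterized data type names accepted by from_value, such as "Decimal128(5, 2)" or "Timestamp(unit, optional_timezone)", share a `Name(arg, arg)` shape. As an illustration only, the std-only sketch below splits such a string into its name and its arguments. `split_data_type` is a hypothetical helper, not the parser serde_arrow uses, and it assumes that arguments never contain commas themselves.

```rust
/// Split a data type string such as "Decimal128(5, 2)" into its name and its
/// comma-separated arguments. Hypothetical helper for illustration only.
fn split_data_type(s: &str) -> Option<(&str, Vec<&str>)> {
    let s = s.trim();
    match s.find('(') {
        // Plain names such as "Utf8" carry no arguments.
        None => Some((s, Vec::new())),
        Some(open) => {
            // A parameterized name must end with a closing parenthesis.
            let inner = s[open + 1..].strip_suffix(')')?;
            // Naive split: assumes no argument contains a comma.
            let args = inner.split(',').map(|a| a.trim()).collect();
            Some((&s[..open], args))
        }
    }
}
```

With this sketch, `split_data_type("Decimal128(5, 2)")` yields the name `Decimal128` with the arguments `5` and `2`, while an unterminated string such as `"Time32(Second"` yields `None`.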
fn from_type<'de, T: Deserialize<'de>>(options: TracingOptions) -> Result<Self>
Determine the schema from the given record type. See TracingOptions for customization
options.
This approach requires the type T to implement Deserialize. As
only type information is used, it is not possible to detect data dependent properties.
Examples of unsupported features:
- auto detection of date time strings
- non self-describing types such as serde_json::Value
- flattened structures (#[serde(flatten)])
- types that require specific data to be deserialized, such as the DateTime type of chrono or the Uuid type of the uuid package
Consider using from_samples in these cases.
use arrow::datatypes::{DataType, FieldRef};
use serde::Deserialize;
use serde_arrow::schema::{SchemaLike, TracingOptions};
#[derive(Deserialize)]
struct Record {
    int: i32,
    float: f64,
    string: String,
}
let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;
assert_eq!(fields[0].data_type(), &DataType::Int32);
assert_eq!(fields[1].data_type(), &DataType::Float64);
assert_eq!(fields[2].data_type(), &DataType::LargeUtf8);
Note, the type T must encode a single “row” in the resulting data frame. When encoding
single values, consider using the Item wrapper.
use arrow::datatypes::{DataType, FieldRef};
use serde_arrow::{schema::{SchemaLike, TracingOptions}, utils::Item};
let fields = Vec::<FieldRef>::from_type::<Item<f32>>(TracingOptions::default())?;
assert_eq!(fields[0].data_type(), &DataType::Float32);
fn from_samples<T: Serialize>(samples: T, options: TracingOptions) -> Result<Self>
Determine the schema from samples. See TracingOptions for customization options.
This approach requires the type T to implement Serialize and the
samples to include all relevant values. It uses only the information encoded in the samples
to generate the schema. Therefore, the following requirements must be met:
- at least one Some value for Option<..> fields
- all variants of enum fields
- at least one element for sequence fields (e.g., Vec<..>)
- at least one example for map types (e.g., HashMap<.., ..>). All possible keys must be given if options.map_as_struct == true.
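The first requirement can be seen in a toy model of tracing: a tracer that only observes serialized values cannot name an inner type it never saw. The std-only sketch below is an assumption-laden simplification; `ToyEvent`, `ToyType`, `ToyField`, and `trace` are hypothetical names and are much simpler than whatever serde_arrow does internally.

```rust
// Hypothetical toy model of sample-based tracing; not serde_arrow code.

/// The kind of a single serialized value as seen by the toy tracer.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyEvent {
    Null,
    Int,
    Str,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Unknown,
    Int64,
    Utf8,
}

#[derive(Debug, PartialEq)]
struct ToyField {
    data_type: ToyType,
    nullable: bool,
}

/// Fold the observed events of one column into a field description.
fn trace(samples: &[ToyEvent]) -> Result<ToyField, String> {
    let mut field = ToyField { data_type: ToyType::Unknown, nullable: false };
    for sample in samples {
        let seen = match sample {
            // A null marks the field as nullable but reveals no type.
            ToyEvent::Null => {
                field.nullable = true;
                continue;
            }
            ToyEvent::Int => ToyType::Int64,
            ToyEvent::Str => ToyType::Utf8,
        };
        if field.data_type == ToyType::Unknown {
            field.data_type = seen;
        } else if field.data_type != seen {
            return Err(format!(
                "conflicting types: {:?} and {:?}",
                field.data_type, seen
            ));
        }
    }
    Ok(field)
}
```

In this model, samples consisting only of `Null` events leave the data type at `Unknown`, which is the situation the requirement of at least one Some value avoids.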
use arrow::datatypes::{DataType, FieldRef};
use serde::Serialize;
use serde_arrow::schema::{SchemaLike, TracingOptions};
#[derive(Serialize)]
struct Record {
    int: i32,
    float: f64,
    string: String,
}
let samples = vec![
    Record {
        int: 1,
        float: 2.0,
        string: String::from("hello"),
    },
    Record {
        int: -1,
        float: 32.0,
        string: String::from("world"),
    },
    // ...
];
let fields = Vec::<FieldRef>::from_samples(&samples, TracingOptions::default())?;
assert_eq!(fields[0].data_type(), &DataType::Int32);
assert_eq!(fields[1].data_type(), &DataType::Float64);
assert_eq!(fields[2].data_type(), &DataType::LargeUtf8);
Note, the samples must encode “rows” in the resulting data frame. When
encoding single values, consider using the Items wrapper.
use arrow::datatypes::{DataType, FieldRef};
use serde_arrow::{schema::{SchemaLike, TracingOptions}, utils::Items};
let fields = Vec::<FieldRef>::from_samples(
    &Items(&[1.0_f32, 2.0_f32, 3.0_f32]),
    TracingOptions::default(),
)?;
assert_eq!(fields[0].data_type(), &DataType::Float32);
Dyn Compatibility
This trait is not dyn compatible.
In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.
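One reason is visible in the trait declaration: it requires `Sized`, and its methods are constructors without a `self` receiver, so there is no value to dispatch on. The target type is chosen statically instead. A std-only sketch of that shape follows; `FromNames`, `Names`, `Count`, and `build` are hypothetical names, not serde_arrow items.

```rust
// Hypothetical names throughout; this is not serde_arrow code.
trait FromNames: Sized {
    // No `self` receiver: there is no existing value to dispatch on.
    fn from_names(names: &[&str]) -> Self;
}

struct Names(Vec<String>);
struct Count(usize);

impl FromNames for Names {
    fn from_names(names: &[&str]) -> Self {
        Names(names.iter().map(|n| n.to_string()).collect())
    }
}

impl FromNames for Count {
    fn from_names(names: &[&str]) -> Self {
        Count(names.len())
    }
}

// The target type is a generic parameter, resolved at compile time.
// Writing `Box<dyn FromNames>` instead would be rejected by the compiler,
// because the `Sized` supertrait makes the trait not dyn compatible.
fn build<S: FromNames>(names: &[&str]) -> S {
    S::from_names(names)
}
```

The caller picks the result type through an annotation or turbofish, for example `let names: Names = build(&["a", "b"]);`, in the same way that `Vec::<FieldRef>::from_type::<Record>(..)` names its target type up front.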
Implementations on Foreign Types
impl SchemaLike for Vec<Field>
Schema support for Vec<arrow2::datatypes::Field> (requires one of the
arrow2-* features)
fn from_value<T: Serialize>(value: T) -> Result<Self>
fn from_type<'de, T: Deserialize<'de>>(options: TracingOptions) -> Result<Self>
fn from_samples<T: Serialize>(samples: T, options: TracingOptions) -> Result<Self>
impl SchemaLike for Vec<Field>
Schema support for Vec<arrow::datatypes::Field> (requires one of the
arrow-* features)
fn from_value<T: Serialize>(value: T) -> Result<Self>
fn from_type<'de, T: Deserialize<'de>>(options: TracingOptions) -> Result<Self>
fn from_samples<T: Serialize>(samples: T, options: TracingOptions) -> Result<Self>
impl SchemaLike for Vec<Field>
fn from_value<T: Serialize>(value: T) -> Result<Self>
fn from_samples<T: Serialize>(samples: T, options: TracingOptions) -> Result<Self>
fn from_type<'de, T: Deserialize<'de>>(options: TracingOptions) -> Result<Self>
impl SchemaLike for Vec<FieldRef>
Schema support for Vec<arrow::datatypes::FieldRef> (requires one of the
arrow-* features)