Struct DefaultSchemaAdapterFactory

pub struct DefaultSchemaAdapterFactory;

Default SchemaAdapterFactory for mapping schemas.

This can be used to adapt file-level record batches to a table schema and implement schema evolution.

Given an input file schema and a table schema, this factory returns SchemaAdapters, which in turn return SchemaMappers that:

  1. Reorder columns
  2. Cast columns to the correct type
  3. Fill missing columns with nulls
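Steps 1 and 2 depend on first matching each file column to a table column by name. Below is a minimal plain-Rust sketch of that matching step. It is loosely modeled on the `(mapper, indices)` pair that `map_schema` returns in the example on this page. `Field`, `DataType`, `can_cast`, and this `map_schema` function are hypothetical, simplified stand-ins, not the Arrow or DataFusion API. The set of allowed casts is also an assumption chosen for illustration.

```rust
/// Hypothetical simplified stand-in for an Arrow data type (not the real API).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int32,
    Float64,
    Utf8,
}

/// Hypothetical simplified stand-in for an Arrow field: just a name and a type.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
}

impl Field {
    pub fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }
}

/// Assumed cast rules for this sketch only: identity, anything to Utf8,
/// and Int32 to Float64.
fn can_cast(from: DataType, to: DataType) -> bool {
    from == to || to == DataType::Utf8 || (from == DataType::Int32 && to == DataType::Float64)
}

/// Returns `(field_mappings, projection)`. `projection` lists the file column
/// indices to read. `field_mappings[i]` is the position of table column `i`
/// within those projected columns, or `None` if the file lacks it.
pub fn map_schema(
    table: &[Field],
    file: &[Field],
) -> Result<(Vec<Option<usize>>, Vec<usize>), String> {
    let mut field_mappings = vec![None; table.len()];
    let mut projection = Vec::new();
    for (file_idx, file_field) in file.iter().enumerate() {
        // File columns that the table does not have are simply not read.
        if let Some(table_idx) = table.iter().position(|f| f.name == file_field.name) {
            let table_field = &table[table_idx];
            if !can_cast(file_field.data_type, table_field.data_type) {
                return Err(format!(
                    "cannot cast column '{}' from {:?} to {:?}",
                    file_field.name, file_field.data_type, table_field.data_type
                ));
            }
            field_mappings[table_idx] = Some(projection.len());
            projection.push(file_idx);
        }
    }
    Ok((field_mappings, projection))
}
```

Take a table schema `a, b, c` and a file schema `c, b`. The sketch returns `field_mappings = [None, Some(1), Some(0)]`, since column `a` is missing, and `projection = [0, 1]`.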

§Errors

  • If a column in the table schema is non-nullable but is not present in the file schema (i.e. it is missing), the returned mapper tries to fill it with nulls, which results in a schema error.
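The sketch below illustrates the rule above using hypothetical simplified types (`TableField`, `fill_missing`) rather than the real mapper. A nullable missing column becomes a run of nulls, while a non-nullable one produces an error. In this model the error is returned explicitly. In the real mapper it is reported as the schema error described above.

```rust
/// Hypothetical simplified table field: only a name and a nullability flag.
#[derive(Debug, Clone, PartialEq)]
pub struct TableField {
    pub name: String,
    pub nullable: bool,
}

/// Builds the output column for a table field that the file does not contain.
/// Values are modeled as `Option<String>` for brevity (`None` = null).
pub fn fill_missing(field: &TableField, num_rows: usize) -> Result<Vec<Option<String>>, String> {
    if field.nullable {
        // A nullable column can simply be filled with nulls.
        Ok(vec![None; num_rows])
    } else {
        // A non-nullable column cannot hold nulls, so mapping fails.
        Err(format!(
            "column '{}' is non-nullable but missing from the file schema",
            field.name
        ))
    }
}
```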

§Illustration of Schema Mapping

┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─                  ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
 ┌───────┐   ┌───────┐ │                  ┌───────┐   ┌───────┐   ┌───────┐ │
││  1.0  │   │ "foo" │                   ││ NULL  │   │ "foo" │   │ "1.0" │
 ├───────┤   ├───────┤ │ Schema mapping   ├───────┤   ├───────┤   ├───────┤ │
││  2.0  │   │ "bar" │                   ││  NULL │   │ "bar" │   │ "2.0" │
 └───────┘   └───────┘ │────────────────▶ └───────┘   └───────┘   └───────┘ │
│                                        │
 column "c"  column "b"│                  column "a"  column "b"  column "c"│
│ Float64       Utf8                     │  Int32        Utf8        Utf8
 ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘                  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
    Input Record Batch                         Output Record Batch

    Schema {                                   Schema {
     "c": Float64,                              "a": Int32,
     "b": Utf8,                                 "b": Utf8,
    }                                           "c": Utf8,
                                               }
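The mapping in the illustration can be reproduced with a small plain-Rust model. This sketch rests on several assumptions. `Column`, `DataType`, `cast`, and `map_batch` are hypothetical simplifications rather than Arrow types. Only the Float64-to-Utf8 cast is implemented. Floats are formatted with Rust's `{:?}` rather than Arrow's cast kernel. `field_mappings[i]` gives the position of table column `i` among the columns read from the file, or `None` if the column is missing.

```rust
/// Hypothetical simplified data types and columns (not the Arrow API).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int32,
    Float64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Int32(Vec<Option<i32>>),
    Float64(Vec<Option<f64>>),
    Utf8(Vec<Option<String>>),
}

/// A column of `len` nulls of the given type (step 3: fill missing columns).
fn null_column(data_type: DataType, len: usize) -> Column {
    match data_type {
        DataType::Int32 => Column::Int32(vec![None; len]),
        DataType::Float64 => Column::Float64(vec![None; len]),
        DataType::Utf8 => Column::Utf8(vec![None; len]),
    }
}

/// Step 2: cast. Only identity casts and Float64 -> Utf8 are modeled; floats
/// are rendered with Rust's `{:?}` formatting (e.g. "1.0").
fn cast(col: &Column, to: DataType) -> Result<Column, String> {
    match (col, to) {
        (Column::Int32(_), DataType::Int32)
        | (Column::Float64(_), DataType::Float64)
        | (Column::Utf8(_), DataType::Utf8) => Ok(col.clone()),
        (Column::Float64(v), DataType::Utf8) => Ok(Column::Utf8(
            v.iter().map(|x| x.map(|f| format!("{:?}", f))).collect(),
        )),
        (c, t) => Err(format!("unsupported cast of {:?} to {:?}", c, t)),
    }
}

/// Builds one output column per table field, in table order (step 1: reorder).
pub fn map_batch(
    table_types: &[DataType],
    field_mappings: &[Option<usize>],
    projected: &[Column],
    num_rows: usize,
) -> Result<Vec<Column>, String> {
    table_types
        .iter()
        .zip(field_mappings)
        .map(|(&ty, mapping)| match mapping {
            Some(i) => cast(&projected[*i], ty),
            None => Ok(null_column(ty, num_rows)),
        })
        .collect()
}
```

With the illustration's input (`c: Float64 [1.0, 2.0]`, `b: Utf8 ["foo", "bar"]`) and `field_mappings = [None, Some(1), Some(0)]`, the result is `a` as two Int32 nulls, `b` unchanged, and `c` as `["1.0", "2.0"]`.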

§Example of using the DefaultSchemaAdapterFactory to map RecordBatches

Note that SchemaMapping also supports mapping partial batches, a capability used as part of predicate pushdown.
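A partial batch carries only some of the file's columns, for example the ones a pushed-down predicate needs. The sketch below assumes the behavior of partial mapping rather than reproducing it. Columns are matched to the table schema by name and cast to the table type. Table columns absent from the batch are not filled with nulls, and batch columns unknown to the table are dropped. `DataType`, `Column`, `cast`, and `map_partial_batch` are hypothetical simplified stand-ins.

```rust
/// Hypothetical simplified types (not the Arrow API).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Float64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Float64(Vec<Option<f64>>),
    Utf8(Vec<Option<String>>),
}

/// Cast a column to `to`; only identity and Float64 -> Utf8 are supported here.
fn cast(col: &Column, to: DataType) -> Result<Column, String> {
    match (col, to) {
        (Column::Float64(_), DataType::Float64) | (Column::Utf8(_), DataType::Utf8) => {
            Ok(col.clone())
        }
        (Column::Float64(v), DataType::Utf8) => Ok(Column::Utf8(
            v.iter().map(|x| x.map(|f| format!("{:?}", f))).collect(),
        )),
        (Column::Utf8(_), DataType::Float64) => {
            Err("Utf8 -> Float64 is not supported in this sketch".to_string())
        }
    }
}

/// Map a partial batch to table types. Columns are matched by name and cast.
/// Missing table columns are NOT added, and batch columns unknown to the
/// table are dropped (both assumptions for this sketch).
pub fn map_partial_batch(
    table: &[(&str, DataType)],
    batch: &[(&str, Column)],
) -> Result<Vec<(String, Column)>, String> {
    batch
        .iter()
        .filter_map(|(name, col)| {
            table
                .iter()
                .find(|(table_name, _)| table_name == name)
                .map(|&(_, ty)| cast(col, ty).map(|c| (name.to_string(), c)))
        })
        .collect()
}
```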

use std::sync::Arc;
use arrow::datatypes::{DataType, Field, Schema};
// Module paths may differ between DataFusion versions
use datafusion::datasource::schema_adapter::DefaultSchemaAdapterFactory;
use datafusion_common::record_batch;

// Table has fields "a", "b" and "c"
let table_schema = Schema::new(vec![
    Field::new("a", DataType::Int32, true),
    Field::new("b", DataType::Utf8, true),
    Field::new("c", DataType::Utf8, true),
]);

// Create an adapter that maps file schemas to the table schema
let adapter = DefaultSchemaAdapterFactory::from_schema(Arc::new(table_schema));

// The file schema has fields "c" and "b", but "b" is stored as a 'Float64'
// instead of 'Utf8'
let file_schema = Schema::new(vec![
    Field::new("c", DataType::Utf8, true),
    Field::new("b", DataType::Float64, true),
]);

// Get a mapping from the file schema to the table schema, plus the indices
// of the file columns that must be read
let (mapper, _indices) = adapter.map_schema(&file_schema).unwrap();

let file_batch = record_batch!(
    ("c", Utf8, vec!["foo", "bar"]),
    ("b", Float64, vec![1.0, 2.0])
).unwrap();

let mapped_batch = mapper.map_batch(file_batch).unwrap();

// The mapped batch has the table schema and the "b" column has been cast to Utf8
let expected_batch = record_batch!(
    ("a", Int32, vec![None, None]),  // missing column filled with nulls
    ("b", Utf8, vec!["1.0", "2.0"]), // b was cast to string and its position changed
    ("c", Utf8, vec!["foo", "bar"])
).unwrap();
assert_eq!(mapped_batch, expected_batch);

Implementations§

impl DefaultSchemaAdapterFactory

pub fn from_schema(table_schema: SchemaRef) -> Box<dyn SchemaAdapter>

Create a new SchemaAdapter for mapping batches from a file schema to a table schema.

This is a convenience for DefaultSchemaAdapterFactory::create with the same schema for both the projected table schema and the table schema.
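The delegation described above can be sketched with simplified stand-ins. `Schema`, `Adapter`, `AdapterFactory`, `SimpleAdapter`, and `DefaultFactory` are hypothetical names, not DataFusion types. Two parts of the shape follow the text: a `create(projected, table)` method that returns a boxed trait object, and a `from_schema` that passes one `Arc`-shared schema as both arguments.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for Arrow's `Schema`: just column names here.
#[derive(Debug, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}
pub type SchemaRef = Arc<Schema>;

/// Hypothetical, simplified adapter trait (not DataFusion's `SchemaAdapter`).
pub trait Adapter {
    fn projected_table_schema(&self) -> &SchemaRef;
}

struct SimpleAdapter {
    projected_table_schema: SchemaRef,
}

impl Adapter for SimpleAdapter {
    fn projected_table_schema(&self) -> &SchemaRef {
        &self.projected_table_schema
    }
}

/// Hypothetical factory trait mirroring the shape of `create(projected, table)`.
pub trait AdapterFactory {
    fn create(&self, projected_table_schema: SchemaRef, table_schema: SchemaRef) -> Box<dyn Adapter>;
}

#[derive(Clone, Debug, Default)]
pub struct DefaultFactory;

impl AdapterFactory for DefaultFactory {
    // Like `_table_schema` in the signature documented below, the full
    // table schema is ignored in this sketch.
    fn create(&self, projected_table_schema: SchemaRef, _table_schema: SchemaRef) -> Box<dyn Adapter> {
        Box::new(SimpleAdapter { projected_table_schema })
    }
}

impl DefaultFactory {
    /// Convenience constructor: pass the same schema as both arguments.
    /// `Arc::clone` only bumps a reference count; the schema is not copied.
    pub fn from_schema(table_schema: SchemaRef) -> Box<dyn Adapter> {
        DefaultFactory.create(Arc::clone(&table_schema), table_schema)
    }
}
```

Sharing the schema through `Arc` keeps the convenience constructor cheap: both arguments point at the same allocation.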

Trait Implementations§

impl Clone for DefaultSchemaAdapterFactory

fn clone(&self) -> DefaultSchemaAdapterFactory

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for DefaultSchemaAdapterFactory

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for DefaultSchemaAdapterFactory

fn default() -> DefaultSchemaAdapterFactory

Returns the “default value” for a type.

impl SchemaAdapterFactory for DefaultSchemaAdapterFactory

fn create(&self, projected_table_schema: SchemaRef, _table_schema: SchemaRef) -> Box<dyn SchemaAdapter>

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.
impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

impl<T> ErasedDestructor for T
where T: 'static,