TableSchema

Struct TableSchema 

pub struct TableSchema { /* private fields */ }

Helper to hold table schema information for partitioned data sources.

When reading partitioned data (such as Hive-style partitioning), a table’s schema consists of two parts:

  1. File schema: The schema of the actual data files on disk
  2. Partition columns: Columns that are encoded in the directory structure, not stored in the files themselves

§Example: Partitioned Table

Consider a table with the following directory structure:

/data/date=2025-10-10/region=us-west/data.parquet
/data/date=2025-10-11/region=us-east/data.parquet

In this case:

  • File schema: The schema of data.parquet files (e.g., [user_id, amount])
  • Partition columns: [date, region] extracted from the directory path
  • Table schema: The full schema combining both (e.g., [user_id, amount, date, region])
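To make the split concrete, the following std-only sketch pulls the `key=value` segments out of a Hive-style path. The function name `parse_partition_values` is hypothetical, and this is not how DataFusion itself parses paths; it only illustrates where the values for `date` and `region` come from.

```rust
/// Extract Hive-style `key=value` partition segments from a file path.
/// Hypothetical helper for illustration; not part of DataFusion's API.
fn parse_partition_values(path: &str) -> Vec<(String, String)> {
    let mut segments: Vec<&str> = path.split('/').collect();
    // The last segment is the file name, never a partition directory.
    segments.pop();
    segments
        .into_iter()
        // Keep only directory names of the form `key=value`.
        .filter_map(|segment| segment.split_once('='))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}
```

Applied to the first path above, the sketch yields the pairs (date, 2025-10-10) and (region, us-west), in directory order. A path with no `key=value` directories yields an empty list, which corresponds to a table with no partition columns.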

§When to Use

Use TableSchema when:

  • Reading partitioned data sources (Parquet, CSV, etc. with Hive-style partitioning)
  • You need to efficiently access different schema representations without reconstructing them
  • You want to avoid repeatedly concatenating file and partition schemas

For non-partitioned data or when working with a single schema representation, working directly with Arrow’s Schema or SchemaRef is simpler.

§Performance

This struct pre-computes and caches the full table schema, allowing cheap references to any representation without repeated allocations or reconstructions.
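The caching idea can be sketched with a small std-only model. Everything here is an assumption made for illustration: `CachedSchemas` and `ColumnsRef` are hypothetical names, and plain column-name lists stand in for Arrow schemas. It is not DataFusion's actual implementation, only a picture of "build the combined schema once, then hand out references".

```rust
use std::sync::Arc;

/// Simplified stand-in for an Arrow schema: just a list of column names.
type ColumnsRef = Arc<Vec<String>>;

/// Hypothetical model of the caching idea; not DataFusion's actual struct.
struct CachedSchemas {
    file: ColumnsRef,
    partition_cols: Vec<String>,
    /// File columns followed by partition columns, built once up front.
    table: ColumnsRef,
}

impl CachedSchemas {
    fn new(file: ColumnsRef, partition_cols: Vec<String>) -> Self {
        // The only place where the combined column list is allocated.
        let mut all = file.as_ref().clone();
        all.extend(partition_cols.iter().cloned());
        Self { file, partition_cols, table: Arc::new(all) }
    }

    /// Returns the stored file columns; no allocation happens here.
    fn file(&self) -> &ColumnsRef {
        &self.file
    }

    /// Returns the stored partition columns; no allocation happens here.
    fn partition_cols(&self) -> &Vec<String> {
        &self.partition_cols
    }

    /// Returns the cached combined columns; no allocation happens here.
    fn table(&self) -> &ColumnsRef {
        &self.table
    }
}
```

Because every accessor returns a reference to a value stored at construction time, repeated calls are cheap, and callers that need ownership can clone the `Arc` without copying the underlying column list.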

Implementations§

impl TableSchema

pub fn new(file_schema: SchemaRef, table_partition_cols: Vec<FieldRef>) -> Self

Create a new TableSchema from a file schema and partition columns.

The table schema is automatically computed by appending the partition columns to the file schema.

Prefer this method over chaining TableSchema::from_file_schema and TableSchema::with_table_partition_cols when both the file schema and the partition columns are available at construction time, since it avoids recomputing the table schema.

§Arguments

  • file_schema - Schema of the data files (without partition columns)
  • table_partition_cols - Partition columns to append to each row

§Example

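use std::sync::Arc;

use arrow::datatypes::{DataType, Field, Schema};
// Import path assumed; adjust to wherever TableSchema is exported in your DataFusion version.
use datafusion_datasource::TableSchema;

// Schema of the columns physically stored in the data files.
let file_schema = Arc::new(Schema::new(vec![
    Field::new("user_id", DataType::Int64, false),
    Field::new("amount", DataType::Float64, false),
]));

// Columns encoded in the directory structure (e.g. date=.../region=...).
let partition_cols = vec![
    Arc::new(Field::new("date", DataType::Utf8, false)),
    Arc::new(Field::new("region", DataType::Utf8, false)),
];

let table_schema = TableSchema::new(file_schema, partition_cols);

// Table schema will have 4 columns: user_id, amount, date, region
assert_eq!(table_schema.table_schema().fields().len(), 4);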

pub fn from_file_schema(file_schema: SchemaRef) -> Self

Create a new TableSchema with no partition columns.

Prefer TableSchema::new if the partition columns are available at construction time, since it avoids recomputing the table schema.

pub fn with_table_partition_cols(self, partition_cols: Vec<FieldRef>) -> Self

Add partition columns to an existing TableSchema, returning a new instance.

If the partition columns are available at construction time, prefer TableSchema::new over chaining TableSchema::from_file_schema into this method, since it avoids recomputing the table schema.
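The cost of chaining can be illustrated with a std-only model that counts how often the combined column list is built. `SchemaModel` and its `builds` counter are hypothetical, and column names stand in for Arrow fields; the model mirrors the behavior described above rather than DataFusion's actual code.

```rust
/// Hypothetical model that counts how often the combined column list is built.
/// Column names stand in for Arrow fields; this is not DataFusion's code.
#[derive(Debug)]
struct SchemaModel {
    file_cols: Vec<String>,
    table_cols: Vec<String>,
    /// Number of times `table_cols` has been built for this value.
    builds: usize,
}

impl SchemaModel {
    /// Build the combined list once: file columns, then partition columns.
    fn new(file_cols: Vec<String>, partition_cols: Vec<String>) -> Self {
        let table_cols = file_cols.iter().cloned().chain(partition_cols).collect();
        Self { file_cols, table_cols, builds: 1 }
    }

    /// No partition columns yet, but the combined list is still built.
    fn from_file_schema(file_cols: Vec<String>) -> Self {
        Self::new(file_cols, Vec::new())
    }

    fn with_table_partition_cols(self, partition_cols: Vec<String>) -> Self {
        // The combined list built earlier is discarded and rebuilt here.
        let mut rebuilt = Self::new(self.file_cols, partition_cols);
        rebuilt.builds = self.builds + 1;
        rebuilt
    }
}
```

In this model both construction paths end with the same combined columns, but the chained path builds the list twice while the direct path builds it once. That extra build is the recomputation the recommendation above is meant to avoid.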

pub fn file_schema(&self) -> &SchemaRef

Get the file schema (without partition columns).

This is the schema of the actual data files on disk.

pub fn table_partition_cols(&self) -> &Vec<FieldRef>

Get the table partition columns.

These are the columns derived from the directory structure that will be appended to each row during query execution.

pub fn table_schema(&self) -> &SchemaRef

Get the full table schema (file schema + partition columns).

This is the complete schema that will be seen by queries, combining both the columns from the files and the partition columns.

Trait Implementations§

impl Clone for TableSchema

fn clone(&self) -> TableSchema

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for TableSchema

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
