pub struct SuperTable {
    pub batches: Vec<Arc<Table>>,
    pub schema: Vec<Arc<Field>>,
    pub n_rows: usize,
    pub name: String,
}
§SuperTable
Higher-order container representing a sequence of Table batches with consistent schema.
§Overview
- Each batch is a Table (record batch) with identical column metadata.
- Stored as Vec<Arc<Table>>, preserving order and schema consistency.
- Row counts per batch may vary, but are consistent across all Table columns.
- When exported via Arrow FFI, the batches are viewed as a single logical table.
- Useful for open-ended streams, partitioned datasets, or other scenarios where batches are processed independently.
§Fields
- batches: ordered collection of Table batches.
- schema: cached schema from the first batch for fast access.
- n_rows: total row count across all batches.
- name: super table name.
§Use cases
- Streaming and mini-batch processing.
- Reading multiple Arrow IPC/memory-mapped files as one dataset.
- Parallel or windowed in-memory analytics.
- Incremental table construction where batches arrive over time.
Fields§
batches: Vec<Arc<Table>>
schema: Vec<Arc<Field>>
n_rows: usize
name: String
Implementations§
impl SuperTable
pub fn new(name: String) -> SuperTable
Creates a new empty SuperTable with the specified name.
pub fn from_batches(
    batches: Vec<Arc<Table>>,
    name_override: Option<String>,
) -> SuperTable
Builds from a collection of Table batches.
Panics if the column count or field metadata is inconsistent across batches.
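The consistency requirement can be illustrated with a minimal stand-in (not the crate's actual internals): field metadata is modelled as plain (name, dtype) pairs, and each batch's schema must match the first batch exactly.

```rust
// Minimal sketch of the schema-consistency check behind `from_batches`.
// `Field` is simplified to a (name, dtype) pair; the real crate compares
// full field metadata including nullability.
fn schemas_consistent(batches: &[Vec<(&str, &str)>]) -> bool {
    match batches.split_first() {
        None => true, // no batches: trivially consistent
        Some((first, rest)) => rest.iter().all(|b| b == first),
    }
}

fn main() {
    let a = vec![("id", "i64"), ("name", "utf8")];
    let b = vec![("id", "i64"), ("name", "utf8")];
    let c = vec![("id", "i32"), ("name", "utf8")]; // dtype mismatch
    assert!(schemas_consistent(&[a.clone(), b.clone()]));
    assert!(!schemas_consistent(&[a, b, c]));
    println!("schema checks passed");
}
```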
pub fn insert_rows(
    &mut self,
    index: usize,
    other: impl Into<SuperTable>,
) -> Result<(), MinarrowError>
Inserts rows from another SuperTable (or Table) at the specified index.
This is an O(n) operation where n is the number of rows in the batch containing the insertion point.
§Arguments
- index - Global row position before which to insert (0 = prepend, n_rows = append)
- other - SuperTable or Table to insert (via Into<SuperTable>)
§Requirements
- Schema (column names, types, nullability) must match
- index must be <= self.n_rows
§Strategy
Finds the batch containing the insertion point, splits it at that position, then inserts other’s batches in between the split halves. This redistributes rows across batches while preserving chunked structure.
§Errors
- IndexError if index > n_rows
- Schema mismatch if field metadata doesn’t match
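The split-and-splice strategy can be sketched on plain row batches, with Vec<i64> standing in for a Table batch (an illustration of the approach, not the crate's code):

```rust
// Sketch of the insert strategy: locate the batch containing the global
// insertion point, split it there if the point falls mid-batch, then
// splice the incoming batches between the halves. Assumes `index` is
// already validated (the real method returns IndexError otherwise).
fn insert_batches(batches: &mut Vec<Vec<i64>>, index: usize, other: Vec<Vec<i64>>) {
    let mut remaining = index;
    let mut pos = batches.len(); // insertion position in the batch list
    for (i, b) in batches.iter().enumerate() {
        if remaining <= b.len() {
            pos = i;
            break;
        }
        remaining -= b.len();
    }
    if pos < batches.len() && remaining > 0 && remaining < batches[pos].len() {
        // Insertion point falls inside a batch: split it in two.
        let tail = batches[pos].split_off(remaining);
        batches.insert(pos + 1, tail);
        pos += 1;
    } else if pos < batches.len() && remaining == batches[pos].len() {
        pos += 1; // insertion lands exactly on a batch boundary
    }
    for b in other {
        batches.insert(pos, b);
        pos += 1;
    }
}

fn main() {
    let mut batches = vec![vec![1, 2, 3], vec![4, 5, 6]];
    insert_batches(&mut batches, 1, vec![vec![10, 11]]);
    // Batch [1, 2, 3] was split at row 1; the new batch sits in between.
    assert_eq!(batches, vec![vec![1], vec![10, 11], vec![2, 3], vec![4, 5, 6]]);
    println!("insert ok");
}
```

Only the batch containing the insertion point is touched, which is why the cost is O(n) in that batch's rows rather than in the whole table.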
pub fn n_cols(&self) -> usize
pub fn cols(&self) -> Vec<Arc<Field>>
Returns the columns of the SuperTable.
Assumes that all inner tables have the same fields.
pub fn n_rows(&self) -> usize
pub fn n_batches(&self) -> usize
pub fn len(&self) -> usize
pub fn is_empty(&self) -> bool
pub fn schema(&self) -> &[Arc<Field>]
pub fn batches(&self) -> &[Arc<Table>]
pub fn batch(&self, idx: usize) -> Option<&Arc<Table>>
pub fn view(&self, offset: usize, len: usize) -> SuperTableV
pub fn from_views(slices: &[TableV], name: String) -> SuperTable
pub fn rechunk(
    &mut self,
    strategy: RechunkStrategy,
) -> Result<(), MinarrowError>
Rechunks the table according to the specified strategy.
Redistributes rows across batches using an efficient incremental approach that avoids full materialization:
- Count(n): Creates batches of n rows (last batch may be smaller)
- Auto: Uses a default size of 8192 rows
- Memory(bytes): Targets a specific memory size per batch
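The Count(n) path can be sketched on plain row batches (Vec<i64> standing in for a Table batch); rows are drained incrementally into fixed-size chunks rather than materialised into one large buffer first:

```rust
// Sketch of Count(n) rechunking: stream rows from the old batches into
// new fixed-size chunks. Only one output chunk is under construction at
// a time, mirroring the incremental approach described above.
fn rechunk_count(batches: Vec<Vec<i64>>, n: usize) -> Vec<Vec<i64>> {
    assert!(n > 0, "Count(0) is invalid"); // real method returns IndexError
    let mut out: Vec<Vec<i64>> = Vec::new();
    let mut current = Vec::with_capacity(n);
    for batch in batches {
        for row in batch {
            current.push(row);
            if current.len() == n {
                out.push(std::mem::replace(&mut current, Vec::with_capacity(n)));
            }
        }
    }
    if !current.is_empty() {
        out.push(current); // last batch may be smaller
    }
    out
}

fn main() {
    let chunks = rechunk_count(vec![vec![1, 2, 3], vec![4, 5]], 2);
    assert_eq!(chunks, vec![vec![1, 2], vec![3, 4], vec![5]]);
    println!("rechunk ok");
}
```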
§Arguments
- strategy - The rechunking strategy to use
§Errors
- Returns IndexError if Count(0) is specified
- Returns IndexError if memory-based calculation results in 0 chunk size
§Example
// Rechunk into 1024-row batches
table.rechunk(RechunkStrategy::Count(1024))?;
// Rechunk with default size
table.rechunk(RechunkStrategy::Auto)?;
// Target 64KB per batch
table.rechunk(RechunkStrategy::Memory(65536))?;
pub fn rechunk_to(
    &mut self,
    up_to_row: usize,
    strategy: RechunkStrategy,
) -> Result<(), MinarrowError>
Rechunks only the first up_to_row rows, leaving the rest untouched.
This is useful for streaming scenarios where new data is being appended and you want to rechunk stable data while leaving recent additions alone.
§Arguments
- up_to_row - Rechunk only rows before this index
- strategy - The rechunking strategy to use
§Errors
- Returns IndexError if up_to_row is greater than total row count
- Returns the same errors as rechunk() for invalid strategies
§Example
// Rechunk first 1000 rows, leave the rest untouched
table.rechunk_to(1000, RechunkStrategy::Count(512))?;
Trait Implementations§
impl AsRef<SuperTable> for PyTable
fn as_ref(&self) -> &SuperTable
impl Clone for SuperTable
fn clone(&self) -> SuperTable
fn clone_from(&mut self, source: &Self)
impl Concatenate for SuperTable
fn concat(self, other: SuperTable) -> Result<SuperTable, MinarrowError>
Concatenates two SuperTables by appending all batches from other to self.
§Requirements
- Both SuperTables must have the same schema (column names and types)
§Returns
A new SuperTable containing all batches from self followed by all batches from other
§Errors
- IncompatibleTypeError if schemas don’t match
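The shape of this operation can be sketched with a simplified stand-in type (the `Super` struct and string error below are illustrations, not the crate's API): batch handles are appended without copying any row data.

```rust
// Simplified stand-in for a SuperTable: a schema plus a list of batches.
#[derive(Debug, PartialEq)]
struct Super {
    schema: Vec<String>,
    batches: Vec<Vec<i64>>,
}

// Sketch of Concatenate: verify both sides share a schema, then append
// the batches from `other`. Cost is O(number of batches); row data is
// never copied.
fn concat(mut a: Super, b: Super) -> Result<Super, String> {
    if a.schema != b.schema {
        return Err("IncompatibleTypeError: schemas don't match".to_string());
    }
    a.batches.extend(b.batches);
    Ok(a)
}

fn main() {
    let a = Super { schema: vec!["id".to_string()], batches: vec![vec![1, 2]] };
    let b = Super { schema: vec!["id".to_string()], batches: vec![vec![3]] };
    let merged = concat(a, b).unwrap();
    assert_eq!(merged.batches, vec![vec![1, 2], vec![3]]);
    println!("concat ok");
}
```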
impl Consolidate for SuperTable
fn consolidate(self) -> Table
Consolidates all batches into a single contiguous Table.
Materialises all rows from all batches into one table. Use this when you need contiguous memory for operations or APIs that require single buffers.
Uses self.name for the resulting table. Rename afterwards if needed.
When the arena feature is enabled, all column buffers are written
into a single allocation then sliced into typed views, reducing
allocation count from O(columns) to O(1). The resulting buffers
are SharedBuffer-backed; mutations trigger copy-on-write.
Without the arena feature, falls back to per-column concat.
§Panics
Panics if the SuperTable is empty.
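The single-allocation idea behind the arena path can be sketched on plain row batches (Vec<i64> standing in for one column's data across batches; the real implementation does this per column with typed buffers):

```rust
// Sketch of Consolidate: flatten every batch into one contiguous buffer.
// Panics on empty input, matching the documented behaviour.
fn consolidate(batches: Vec<Vec<i64>>) -> Vec<i64> {
    assert!(!batches.is_empty(), "cannot consolidate an empty SuperTable");
    // Size the destination up front so only one allocation is made,
    // analogous to the arena path's single backing allocation.
    let total: usize = batches.iter().map(|b| b.len()).sum();
    let mut out = Vec::with_capacity(total);
    for b in batches {
        out.extend(b);
    }
    out
}

fn main() {
    let flat = consolidate(vec![vec![1, 2], vec![3], vec![4, 5]]);
    assert_eq!(flat, vec![1, 2, 3, 4, 5]);
    println!("consolidate ok");
}
```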
impl Debug for SuperTable
impl Default for SuperTable
fn default() -> SuperTable
impl Display for SuperTable
impl From<PyTable> for SuperTable
impl From<SuperTable> for PyTable
fn from(table: SuperTable) -> Self
impl From<SuperTableV> for SuperTable
Available on crate feature views only.