Function split_record_batch

pub fn split_record_batch(batch: RecordBatch, max: usize) -> Vec<RecordBatch>

Splits a RecordBatch into multiple RecordBatches, each containing at most max rows.

§Arguments

  • batch - The input RecordBatch to split. It is taken by value, not by reference.
  • max - The maximum number of rows per output RecordBatch. Must be non-zero to avoid an empty result.

§Returns

A Vec<RecordBatch> in which each RecordBatch has at most max rows.

§Edge Cases

  • If max is 0, returns an empty Vec.
  • If the input batch has 0 rows, returns a Vec containing only the original RecordBatch.
  • If the number of rows is not evenly divisible by max, the last RecordBatch will contain the remaining rows.
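The edge cases above can be checked with plain arithmetic. The following is a minimal sketch, not the function's actual implementation: chunk_lengths is a hypothetical helper that computes only the row counts the documented behavior implies. The order of the two early returns (max checked before the row count) is an assumption, since the documentation does not say what happens when both are 0.

```rust
/// Hypothetical helper (not part of the documented API): the row counts of
/// the chunks implied by the documented edge cases, for an input of
/// `num_rows` rows and a limit of `max` rows per chunk.
fn chunk_lengths(num_rows: usize, max: usize) -> Vec<usize> {
    if max == 0 {
        // Documented: max == 0 yields an empty Vec.
        // Assumption: this check takes priority when num_rows is also 0.
        return Vec::new();
    }
    if num_rows == 0 {
        // Documented: an empty input comes back as a single empty batch.
        return vec![0];
    }
    // Ceiling division: one extra chunk when there is a remainder.
    let n_chunks = num_rows / max + usize::from(num_rows % max != 0);
    let mut lengths = Vec::with_capacity(n_chunks);
    let mut offset = 0;
    while offset < num_rows {
        // Every chunk is full except possibly the last one.
        let len = max.min(num_rows - offset);
        lengths.push(len);
        offset += len;
    }
    lengths
}
```

For 10 rows and max = 3 this gives three full chunks followed by one chunk holding the single remaining row.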

§Performance Notes

  • Zero-copy: Uses RecordBatch::slice for zero-copy access to the underlying data buffers, avoiding deep copies of row data.
  • Single allocation: Allocates a single Vec with pre-computed capacity to store the output RecordBatches, avoiding reallocations.
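To illustrate what zero-copy slicing and a pre-sized output Vec mean in practice, here is a std-only sketch. View and split_view are hypothetical stand-ins, not Arrow types: View models a batch as a window (offset and length) over reference-counted storage, which is roughly the idea behind RecordBatch::slice(offset, length). The real function's internals may differ.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a batch: a window over shared storage.
#[derive(Clone, Debug)]
struct View {
    data: Arc<Vec<i64>>, // shared, reference-counted values
    offset: usize,
    len: usize,
}

impl View {
    fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        View { data: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: bumps the reference count, copies no values.
    fn slice(&self, offset: usize, len: usize) -> View {
        assert!(offset + len <= self.len, "slice out of bounds");
        View { data: Arc::clone(&self.data), offset: self.offset + offset, len }
    }

    /// The values visible through this window.
    fn values(&self) -> &[i64] {
        &self.data[self.offset..self.offset + self.len]
    }
}

/// Splits a view into windows of at most `max` values each.
fn split_view(view: &View, max: usize) -> Vec<View> {
    if max == 0 {
        return Vec::new();
    }
    if view.len == 0 {
        return vec![view.clone()];
    }
    // Pre-compute the chunk count so the output Vec is allocated once.
    let n_chunks = view.len / max + usize::from(view.len % max != 0);
    let mut out = Vec::with_capacity(n_chunks);
    let mut offset = 0;
    while offset < view.len {
        let len = max.min(view.len - offset);
        out.push(view.slice(offset, len));
        offset += len;
    }
    out
}
```

Every View returned by split_view points at the same allocation as the input, so the cost of splitting grows with the number of chunks rather than with the number of values.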

§Example

use arrow::record_batch::RecordBatch;

// `batch` is an existing RecordBatch; it is moved into the call.
let max_rows = 3;
// The function returns a Vec directly, so no `?` is needed.
let chunks = split_record_batch(batch, max_rows);
for (i, chunk) in chunks.iter().enumerate() {
    println!("Chunk {}: {} rows", i, chunk.num_rows());
}