pub fn split_record_batch(batch: RecordBatch, max: usize) -> Vec<RecordBatch>Expand description
Splits a RecordBatch into multiple RecordBatches, each containing at most max rows.
§Arguments
batch- A reference to the inputRecordBatchto split.max- The maximum number of rows per outputRecordBatch. Must be non-zero to avoid an empty result.
§Returns
A Result containing:
Vec<RecordBatch>- A vector ofRecordBatches, each with at mostmax_rowsrows.
§Edge Cases
- If
maxis 0, returns an emptyVec. - If the input
batchhas 0 rows, returns an the originalRecordBatch. - If the number of rows is not evenly divisible by
max, the lastRecordBatchwill contain the remaining rows.
§Performance Notes
- Zero-copy: Uses
RecordBatch::slicefor zero-copy access to the underlying data buffers, avoiding deep copies of row data. - Single allocation: Allocates a single
Vecwith pre-computed capacity to store the outputRecordBatches, avoiding reallocations.
§Example
ⓘ
use arrow::record_batch::RecordBatch;
use arrow::error::ArrowError;
let max_rows = 3;
let chunks = split_record_batch_by_rows(batch, max_rows)?;
for (i, chunk) in chunks.iter().enumerate() {
println!("Chunk {}: {} rows", i, chunk.num_rows());
}