Struct arrow::record_batch::RecordBatch

pub struct RecordBatch { /* private fields */ }

A two-dimensional batch of column-oriented data with a defined schema.

A RecordBatch is a two-dimensional dataset of a number of contiguous arrays, each the same length. A record batch has a schema which must match its arrays’ datatypes.

Record batches are a convenient unit of work for various serialization and computation functions, including ones that process data incrementally.

Implementations§

impl RecordBatch

pub fn try_new( schema: Arc<Schema>, columns: Vec<Arc<dyn Array>> ) -> Result<RecordBatch, ArrowError>

Creates a RecordBatch from a schema and columns.

Expects the following:

the vec of columns to not be empty
the schema and column data types to have equal lengths and match
each array in columns to have the same length

If the conditions are not met, an error is returned.

§Example


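use std::sync::Arc;
use arrow_array::{Int32Array, RecordBatch};
use arrow_schema::{DataType, Field, Schema};

let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);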
let schema = Schema::new(vec![
    Field::new("id", DataType::Int32, false)
]);

let batch = RecordBatch::try_new(
    Arc::new(schema),
    vec![Arc::new(id_array)]
).unwrap();

pub fn try_new_with_options( schema: Arc<Schema>, columns: Vec<Arc<dyn Array>>, options: &RecordBatchOptions ) -> Result<RecordBatch, ArrowError>

Creates a RecordBatch from a schema and columns, with additional options, such as whether to strictly validate field names.

See RecordBatch::try_new for the expected conditions.

pub fn new_empty(schema: Arc<Schema>) -> RecordBatch

Creates a new empty RecordBatch.

pub fn with_schema(self, schema: Arc<Schema>) -> Result<RecordBatch, ArrowError>

Overrides the schema of this RecordBatch.

Returns an error if schema is not a superset of the current schema, as determined by Schema::contains.

pub fn schema(&self) -> Arc<Schema>

Returns the Schema of the record batch.

pub fn schema_ref(&self) -> &Arc<Schema>

Returns a reference to the Schema of the record batch.

pub fn project(&self, indices: &[usize]) -> Result<RecordBatch, ArrowError>

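Returns a new RecordBatch containing only the columns at the given indices, in the order given, with the schema projected to match. Returns an error if any index is out of bounds.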

pub fn num_columns(&self) -> usize

Returns the number of columns in the record batch.

§Example


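use std::sync::Arc;
use arrow_array::{Int32Array, RecordBatch};
use arrow_schema::{DataType, Field, Schema};

let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);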
let schema = Schema::new(vec![
    Field::new("id", DataType::Int32, false)
]);

let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(id_array)]).unwrap();

assert_eq!(batch.num_columns(), 1);

pub fn num_rows(&self) -> usize

Returns the number of rows in each column.

§Example


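use std::sync::Arc;
use arrow_array::{Int32Array, RecordBatch};
use arrow_schema::{DataType, Field, Schema};

let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);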
let schema = Schema::new(vec![
    Field::new("id", DataType::Int32, false)
]);

let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(id_array)]).unwrap();

assert_eq!(batch.num_rows(), 5);

pub fn column(&self, index: usize) -> &Arc<dyn Array>

Get a reference to a column’s array by index.

§Panics

Panics if index is outside of 0..num_columns.

pub fn column_by_name(&self, name: &str) -> Option<&Arc<dyn Array>>

Get a reference to a column’s array by name.

pub fn columns(&self) -> &[Arc<dyn Array>]

Get a reference to all columns in the record batch.

pub fn remove_column(&mut self, index: usize) -> Arc<dyn Array>

Removes the column at index and returns it as an ArrayRef.

§Panics

Panics if index is out of bounds.

§Example

use std::sync::Arc;
use arrow_array::{BooleanArray, Int32Array, RecordBatch};
use arrow_schema::{DataType, Field, Schema};
let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
let bool_array = BooleanArray::from(vec![true, false, false, true, true]);
let schema = Schema::new(vec![
    Field::new("id", DataType::Int32, false),
    Field::new("bool", DataType::Boolean, false),
]);

let mut batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(id_array), Arc::new(bool_array)]).unwrap();

let removed_column = batch.remove_column(0);
assert_eq!(removed_column.as_any().downcast_ref::<Int32Array>().unwrap(), &Int32Array::from(vec![1, 2, 3, 4, 5]));
assert_eq!(batch.num_columns(), 1);

pub fn slice(&self, offset: usize, length: usize) -> RecordBatch

Returns a new RecordBatch where each column is sliced according to offset and length.

§Panics

Panics if offset + length is greater than the column length.

pub fn try_from_iter<I, F>(value: I) -> Result<RecordBatch, ArrowError>
where I: IntoIterator<Item = (F, Arc<dyn Array>)>, F: AsRef<str>,

Create a RecordBatch from an iterable list of pairs of the form (field_name, array), with the same requirements on fields and arrays as RecordBatch::try_new. This method is often used to create a single RecordBatch from arrays, e.g. for testing.

Each column in the resulting schema is marked as nullable if the array for that column has any nulls. To explicitly specify nullability, use RecordBatch::try_from_iter_with_nullable.

Example:


use std::sync::Arc;
use arrow_array::{ArrayRef, Int32Array, RecordBatch, StringArray};

let a: ArrayRef = Arc::new(Int32Array::from(vec![1, 2]));
let b: ArrayRef = Arc::new(StringArray::from(vec!["a", "b"]));

let record_batch = RecordBatch::try_from_iter(vec![
  ("a", a),
  ("b", b),
]);

pub fn try_from_iter_with_nullable<I, F>( value: I ) -> Result<RecordBatch, ArrowError>
where I: IntoIterator<Item = (F, Arc<dyn Array>, bool)>, F: AsRef<str>,

Create a RecordBatch from an iterable list of tuples of the form (field_name, array, nullable), with the same requirements on fields and arrays as RecordBatch::try_new. This method is often used to create a single RecordBatch from arrays, e.g. for testing.

Example:


use std::sync::Arc;
use arrow_array::{ArrayRef, Int32Array, RecordBatch, StringArray};

let a: ArrayRef = Arc::new(Int32Array::from(vec![1, 2]));
let b: ArrayRef = Arc::new(StringArray::from(vec![Some("a"), Some("b")]));

// Note that neither `a` nor `b` has any actual nulls, but we mark
// `b` as nullable
let record_batch = RecordBatch::try_from_iter_with_nullable(vec![
  ("a", a, false),
  ("b", b, true),
]);