Struct arrow::record_batch::RecordBatch
source · [−]pub struct RecordBatch { /* private fields */ }
Expand description
A two-dimensional batch of column-oriented data with a defined schema.
A RecordBatch
is a two-dimensional dataset of a number of
contiguous arrays, each the same length.
A record batch has a schema which must match its arrays’
datatypes.
Record batches are a convenient unit of work for various serialization and computation functions, possibly incremental. See also CSV reader and JSON reader.
Implementations
Creates a RecordBatch
from a schema and columns.
Expects the following:
- the vec of columns to not be empty
- the schema and column data types to have equal lengths and match
- each array in columns to have the same length
If the conditions are not met, an error is returned.
Example
use std::sync::Arc;
use arrow::array::Int32Array;
use arrow::datatypes::{Schema, Field, DataType};
use arrow::record_batch::RecordBatch;
let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
let schema = Schema::new(vec![
Field::new("id", DataType::Int32, false)
]);
let batch = RecordBatch::try_new(
Arc::new(schema),
vec![Arc::new(id_array)]
)?;
pub fn try_new_with_options(
schema: SchemaRef,
columns: Vec<ArrayRef>,
options: &RecordBatchOptions
) -> Result<Self>
pub fn try_new_with_options(
schema: SchemaRef,
columns: Vec<ArrayRef>,
options: &RecordBatchOptions
) -> Result<Self>
Creates a RecordBatch
from a schema and columns, with additional options,
such as whether to strictly validate field names.
See RecordBatch::try_new
for the expected conditions.
Creates a new empty RecordBatch
.
Projects the schema onto the specified columns
Returns the number of columns in the record batch.
Example
use std::sync::Arc;
use arrow::array::Int32Array;
use arrow::datatypes::{Schema, Field, DataType};
use arrow::record_batch::RecordBatch;
let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
let schema = Schema::new(vec![
Field::new("id", DataType::Int32, false)
]);
let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(id_array)])?;
assert_eq!(batch.num_columns(), 1);
Returns the number of rows in each column.
Panics
Panics if the RecordBatch
contains no columns.
Example
use std::sync::Arc;
use arrow::array::Int32Array;
use arrow::datatypes::{Schema, Field, DataType};
use arrow::record_batch::RecordBatch;
let id_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
let schema = Schema::new(vec![
Field::new("id", DataType::Int32, false)
]);
let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(id_array)])?;
assert_eq!(batch.num_rows(), 5);
Return a new RecordBatch where each column is sliced
according to offset
and length
Panics
Panics if offset
with length
is greater than column length.
pub fn try_from_iter<I, F>(value: I) -> Result<Self> where
I: IntoIterator<Item = (F, ArrayRef)>,
F: AsRef<str>,
pub fn try_from_iter<I, F>(value: I) -> Result<Self> where
I: IntoIterator<Item = (F, ArrayRef)>,
F: AsRef<str>,
Create a RecordBatch
from an iterable list of pairs of the
form (field_name, array)
, with the same requirements on
fields and arrays as RecordBatch::try_new
. This method is
often used to create a single RecordBatch
from arrays,
e.g. for testing.
The resulting schema is marked as nullable for each column if
the array for that column is has any nulls. To explicitly
specify nullibility, use RecordBatch::try_from_iter_with_nullable
Example:
use std::sync::Arc;
use arrow::array::{ArrayRef, Int32Array, StringArray};
use arrow::datatypes::{Schema, Field, DataType};
use arrow::record_batch::RecordBatch;
let a: ArrayRef = Arc::new(Int32Array::from(vec![1, 2]));
let b: ArrayRef = Arc::new(StringArray::from(vec!["a", "b"]));
let record_batch = RecordBatch::try_from_iter(vec![
("a", a),
("b", b),
]);
pub fn try_from_iter_with_nullable<I, F>(value: I) -> Result<Self> where
I: IntoIterator<Item = (F, ArrayRef, bool)>,
F: AsRef<str>,
pub fn try_from_iter_with_nullable<I, F>(value: I) -> Result<Self> where
I: IntoIterator<Item = (F, ArrayRef, bool)>,
F: AsRef<str>,
Create a RecordBatch
from an iterable list of tuples of the
form (field_name, array, nullable)
, with the same requirements on
fields and arrays as RecordBatch::try_new
. This method is often
used to create a single RecordBatch
from arrays, e.g. for
testing.
Example:
use std::sync::Arc;
use arrow::array::{ArrayRef, Int32Array, StringArray};
use arrow::datatypes::{Schema, Field, DataType};
use arrow::record_batch::RecordBatch;
let a: ArrayRef = Arc::new(Int32Array::from(vec![1, 2]));
let b: ArrayRef = Arc::new(StringArray::from(vec![Some("a"), Some("b")]));
// Note neither `a` nor `b` has any actual nulls, but we mark
// b an nullable
let record_batch = RecordBatch::try_from_iter_with_nullable(vec![
("a", a, false),
("b", b, true),
]);
Trait Implementations
Create a record batch from struct array, where each field of
the StructArray
becomes a Field
in the schema.
This currently does not flatten and nested struct types
Performs the conversion.
This method tests for self
and other
values to be equal, and is used
by ==
. Read more
This method tests for !=
.
Auto Trait Implementations
impl !RefUnwindSafe for RecordBatch
impl Send for RecordBatch
impl Sync for RecordBatch
impl Unpin for RecordBatch
impl !UnwindSafe for RecordBatch
Blanket Implementations
Mutably borrows from an owned value. Read more