Crate arrow_array
The central type in Apache Arrow is the array: a known-length sequence of values
all having the same type. This crate provides concrete implementations of each type, as
well as an Array trait that can be used for type-erasure.
Downcasting an Array
Arrays are often passed around as a dynamically typed &dyn Array or ArrayRef.
For example, RecordBatch stores columns as ArrayRef.
Whilst these arrays can be passed directly to the compute, csv, json, etc. APIs,
you will often want to interact with the data directly.
This requires downcasting to the concrete type of the array:
use arrow_array::{Array, Float32Array, Int32Array};

// Panics if `array` is not an Int32Array; nulls are counted as 0
fn sum_int32(array: &dyn Array) -> i32 {
    let integers: &Int32Array = array.as_any().downcast_ref().unwrap();
    integers.iter().map(|val| val.unwrap_or_default()).sum()
}

// Note: the values for positions corresponding to nulls will be arbitrary
fn as_f32_slice(array: &dyn Array) -> &[f32] {
    array.as_any().downcast_ref::<Float32Array>().unwrap().values()
}
The cast::AsArray extension trait can make this more ergonomic:
use arrow_array::{cast::AsArray, types::Float32Type, Array};

fn as_f32_slice(array: &dyn Array) -> &[f32] {
    array.as_primitive::<Float32Type>().values()
}
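The downcasts above call unwrap and so panic on a type mismatch. The following is a minimal, std-only sketch of the same type-erasure pattern with the mismatch handled as None. DynArray, Ints, Floats and try_sum_ints are hypothetical stand-ins invented for illustration, not items from this crate; the only real API used is std::any::Any::downcast_ref.

```rust
use std::any::Any;

// Hypothetical stand-in for a type-erased array trait (not the crate's `Array`).
trait DynArray {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

// Hypothetical concrete array types, kept deliberately tiny.
struct Ints(Vec<Option<i32>>);
struct Floats(Vec<f32>);

impl DynArray for Ints {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}

impl DynArray for Floats {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}

// Returns `None` when `array` is not an `Ints`, instead of panicking.
// Null entries are counted as 0, mirroring `unwrap_or_default` above.
fn try_sum_ints(array: &dyn DynArray) -> Option<i32> {
    let ints = array.as_any().downcast_ref::<Ints>()?;
    Some(ints.0.iter().map(|v| v.unwrap_or_default()).sum())
}
```

Calling try_sum_ints on an Ints value yields Some(sum), while passing a Floats value yields None, so the caller decides how to handle an unexpected type.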
Building an Array
Most Array implementations can be constructed directly from iterators or Vec:
Int32Array::from(vec![1, 2]);
Int32Array::from(vec![Some(1), None]);
Int32Array::from_iter([1, 2, 3, 4]);
Int32Array::from_iter([Some(1), Some(2), None, Some(4)]);
StringArray::from(vec!["foo", "bar"]);
StringArray::from(vec![Some("foo"), None]);
StringArray::from_iter([Some("foo"), None]);
StringArray::from_iter_values(["foo", "bar"]);
ListArray::from_iter_primitive::<Int32Type, _, _>([
    Some(vec![Some(1), None, Some(3)]),
    None,
    Some(vec![]),
]);
Additionally, ArrayBuilder implementations can be used to construct arrays
with a push-based interface:
// Create a new builder with a capacity of 100
let mut builder = Int16Array::builder(100);
// Append a single primitive value
builder.append_value(1);
// Append a null value
builder.append_null();
// Append a slice of primitive values
builder.append_slice(&[2, 3, 4]);
// Build the array
let array = builder.finish();
assert_eq!(
    5,
    array.len(),
    "The array has 5 values, counting the null value"
);
assert_eq!(2, array.value(2), "Get the value with index 2");
assert_eq!(
    &array.values()[3..5],
    &[3, 4],
    "Get slice of len 2 starting at idx 3"
);
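To see why the null counts towards the length and why values()[3..5] is [3, 4], it can help to model the builder by hand. The sketch below is a simplified, std-only model and not the crate's implementation: ToyInt16Builder, ToyInt16Array and build_example are hypothetical names, validity is a Vec<bool> rather than a packed bitmap, and storing 0 in the slot of a null is an assumption made for illustration.

```rust
// Hypothetical model of a push-based builder: every element, null or not,
// occupies one slot in `values`; `validity` records which slots are real.
struct ToyInt16Builder {
    values: Vec<i16>,
    validity: Vec<bool>, // true = valid, false = null
}

struct ToyInt16Array {
    values: Vec<i16>,
    validity: Vec<bool>,
}

impl ToyInt16Builder {
    fn with_capacity(capacity: usize) -> Self {
        Self {
            values: Vec::with_capacity(capacity),
            validity: Vec::with_capacity(capacity),
        }
    }

    fn append_value(&mut self, v: i16) {
        self.values.push(v);
        self.validity.push(true);
    }

    // Assumption: a null is stored as a placeholder 0 in the values buffer.
    fn append_null(&mut self) {
        self.values.push(0);
        self.validity.push(false);
    }

    fn append_slice(&mut self, vs: &[i16]) {
        self.values.extend_from_slice(vs);
        self.validity.extend(std::iter::repeat(true).take(vs.len()));
    }

    fn finish(self) -> ToyInt16Array {
        ToyInt16Array {
            values: self.values,
            validity: self.validity,
        }
    }
}

impl ToyInt16Array {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn is_null(&self, i: usize) -> bool {
        !self.validity[i]
    }
    fn values(&self) -> &[i16] {
        &self.values
    }
}

// Replays the same sequence of appends as the example above.
fn build_example() -> ToyInt16Array {
    let mut builder = ToyInt16Builder::with_capacity(100);
    builder.append_value(1);
    builder.append_null();
    builder.append_slice(&[2, 3, 4]);
    builder.finish()
}
```

In this model the null sits at index 1, so the appended slice starts at index 2 and the last two slots hold 3 and 4.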
Zero-Copy Slicing
Given an Array of arbitrary length, it is possible to create an owned slice of this
data. Internally this just increments some ref-counts rather than copying values, and so is very cheap:
let array = Arc::new(Int32Array::from_iter([1, 2, 3])) as ArrayRef;
// Slice with offset 1 and length 2
let sliced = array.slice(1, 2);
let ints = sliced.as_any().downcast_ref::<Int32Array>().unwrap();
assert_eq!(ints.values(), &[2, 3]);
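The idea behind such a slice can be sketched with std alone: a view holds a shared pointer to immutable storage plus an offset and a length. ToySlice below is a hypothetical model written for illustration, not how the crate defines its arrays or buffers.

```rust
use std::sync::Arc;

// Hypothetical model of a zero-copy view over shared, immutable storage.
struct ToySlice {
    data: Arc<[i32]>,
    offset: usize,
    len: usize,
}

impl ToySlice {
    fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        Self {
            data: Arc::from(values),
            offset: 0,
            len,
        }
    }

    // Creates an owned view: the reference count goes up, no element is copied.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            data: Arc::clone(&self.data),
            offset: self.offset + offset,
            len,
        }
    }

    fn values(&self) -> &[i32] {
        &self.data[self.offset..self.offset + self.len]
    }

    fn ref_count(&self) -> usize {
        Arc::strong_count(&self.data)
    }
}
```

Because the view owns its own reference to the storage, it stays valid after the original is dropped, and both views point at the same memory while they coexist.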
Internal Representation
Internally, arrays are represented by one or more Buffers, the number and meaning of
which depend on the array’s data type, as documented in the Arrow specification.
For example, the type Int16Array represents an array of 16-bit integers and consists of:
- An optional NullBuffer identifying any null values
- A contiguous Buffer of 16-bit integers
Similarly, the type StringArray represents an array of UTF-8 strings and consists of:
- An optional NullBuffer identifying any null values
- An offsets Buffer of 32-bit integers identifying valid UTF-8 sequences within the values buffer
- A values Buffer of UTF-8 encoded string data
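The three-buffer string layout above can be modelled in a few lines of std-only Rust. This is a simplified sketch under stated assumptions: ToyStringArray and its methods are hypothetical, validity is a Vec<bool> instead of a packed bitmap, and the offsets follow the Arrow format's convention of one more entry than there are elements, so that element i spans offsets[i]..offsets[i + 1].

```rust
// Hypothetical model of the null / offsets / values layout of a string array.
struct ToyStringArray {
    validity: Option<Vec<bool>>, // None means the array contains no nulls
    offsets: Vec<i32>,           // len + 1 entries into `values`
    values: Vec<u8>,             // all strings concatenated as UTF-8 bytes
}

impl ToyStringArray {
    fn from_options(items: &[Option<&str>]) -> Self {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        let mut validity = Vec::with_capacity(items.len());
        for item in items {
            // A null contributes no bytes, so its start and end offsets coincide.
            if let Some(s) = item {
                values.extend_from_slice(s.as_bytes());
            }
            validity.push(item.is_some());
            offsets.push(values.len() as i32);
        }
        // Drop the validity buffer entirely when every element is valid.
        let validity = if validity.iter().all(|v| *v) {
            None
        } else {
            Some(validity)
        };
        Self {
            validity,
            offsets,
            values,
        }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn get(&self, i: usize) -> Option<&str> {
        if let Some(validity) = &self.validity {
            if !validity[i] {
                return None;
            }
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).ok()
    }
}
```

For the input [Some("foo"), None, Some("ba")] this model stores the offsets [0, 3, 3, 5] and the bytes of "fooba"; the null is visible only through the validity buffer.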
Re-exports
pub use array::*;
Modules
- The concrete array definitions
- Defines builders that can be used to safely build arrays
- Defines helper functions for downcasting dyn Array to concrete types
- Idiomatic iterators for Array
- Idiomatic iterator for RunArray
- Conversion methods for dates and times.
- Timezone for timestamp arrays
- Zero-sized types used to parameterize generic array implementations
Macros
- Downcast an Array to a DictionaryArray based on its DataType, accepts a number of subsequent patterns to match the data type
- Given one or more expressions evaluating to an integer DataType, invokes the provided macro m with the corresponding integer ArrowPrimitiveType, followed by any additional arguments
- Given one or more expressions evaluating to primitive DataType, invokes the provided macro m with the corresponding ArrowPrimitiveType, followed by any additional arguments
- Downcast an Array to a PrimitiveArray based on its DataType, accepts a number of subsequent patterns to match the data type
- Given one or more expressions evaluating to an integer DataType, invokes the provided macro m with the corresponding integer RunEndIndexType, followed by any additional arguments
- Given one or more expressions evaluating to primitive DataType, invokes the provided macro m with the corresponding ArrowPrimitiveType, followed by any additional arguments
- Downcast an Array to a temporal PrimitiveArray based on its DataType, accepts a number of subsequent patterns to match the data type
Structs
- A two-dimensional batch of column-oriented data with a defined schema.
- Generic implementation of RecordBatchReader that wraps an iterator.
- Options that control the behaviour used when creating a RecordBatch.
Traits
- A subtype of primitive type that represents numeric float values
- Trait for ArrowNativeType that adds checked and unchecked arithmetic operations, and totally ordered comparison operations
- A subtype of primitive type that represents numeric values.
- Trait for types that can read RecordBatch’s.