Crate arrow_array
The central type in Apache Arrow is the array: a known-length sequence of values
all having the same type. This crate provides concrete implementations of each array type, as
well as an Array trait that can be used for type erasure.
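To make the idea of type erasure concrete, here is a minimal sketch using only the standard library. The names ToyArray, ToyInt32Array, ToyUtf8Array, try_sum_int32 and toy_columns are hypothetical and exist only for this illustration; they are not part of arrow_array, whose real Array trait has many more methods. The sketch assumes only that the trait exposes an as_any method returning &dyn Any.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical stand-in for a type-erased array trait (not arrow_array's `Array`).
trait ToyArray {
    /// Number of elements in the array.
    fn len(&self) -> usize;
    /// Exposes the value as `Any` so callers can recover the concrete type.
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical concrete array of 32-bit integers.
struct ToyInt32Array(Vec<i32>);

/// Hypothetical concrete array of UTF-8 strings.
struct ToyUtf8Array(Vec<String>);

impl ToyArray for ToyInt32Array {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl ToyArray for ToyUtf8Array {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Sums the array if it holds 32-bit integers; returns `None` for any other type
/// instead of panicking.
fn try_sum_int32(array: &dyn ToyArray) -> Option<i32> {
    let ints = array.as_any().downcast_ref::<ToyInt32Array>()?;
    Some(ints.0.iter().sum())
}

/// Columns of different concrete types stored behind one trait-object type.
fn toy_columns() -> Vec<Arc<dyn ToyArray>> {
    vec![
        Arc::new(ToyInt32Array(vec![1, 2, 3])) as Arc<dyn ToyArray>,
        Arc::new(ToyUtf8Array(vec!["a".to_string()])) as Arc<dyn ToyArray>,
    ]
}
```

Calling try_sum_int32 on the first column yields Some(6), and on the second column yields None because the downcast does not match. Returning an Option is a design choice for this sketch; a caller that already knows the type may prefer to unwrap.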
Downcasting an Array
Arrays are often passed around as a dynamically typed &dyn Array or ArrayRef.
For example, RecordBatch stores columns as ArrayRef.
Whilst these arrays can be passed directly to the compute, csv, json, etc… APIs,
it is often the case that you wish to interact with the data directly.
This requires downcasting to the concrete type of the array:
use arrow_array::{Array, Float32Array, Int32Array};
fn sum_int32(array: &dyn Array) -> i32 {
    // Panics if `array` is not an Int32Array
    let integers: &Int32Array = array.as_any().downcast_ref().unwrap();
    // Null positions contribute the default value, 0
    integers.iter().map(|val| val.unwrap_or_default()).sum()
}
// Note: the values for positions corresponding to nulls will be arbitrary
fn as_f32_slice(array: &dyn Array) -> &[f32] {
    array.as_any().downcast_ref::<Float32Array>().unwrap().values()
}
The cast::AsArray extension trait can make this more ergonomic:
use arrow_array::{cast::AsArray, types::Float32Type, Array};
fn as_f32_slice(array: &dyn Array) -> &[f32] {
    array.as_primitive::<Float32Type>().values()
}
Building an Array
Most Array implementations can be constructed directly from iterators or Vec:
Int32Array::from(vec![1, 2]);
Int32Array::from(vec![Some(1), None]);
Int32Array::from_iter([1, 2, 3, 4]);
Int32Array::from_iter([Some(1), Some(2), None, Some(4)]);
StringArray::from(vec!["foo", "bar"]);
StringArray::from(vec![Some("foo"), None]);
StringArray::from_iter([Some("foo"), None]);
StringArray::from_iter_values(["foo", "bar"]);
ListArray::from_iter_primitive::<Int32Type, _, _>([
    Some(vec![Some(1), None, Some(3)]),
    None,
    Some(vec![])
]);
Additionally, ArrayBuilder implementations can be used to construct arrays with a push-based interface:
// Create a new builder with a capacity of 100
let mut builder = Int16Array::builder(100);
// Append a single primitive value
builder.append_value(1);
// Append a null value
builder.append_null();
// Append a slice of primitive values
builder.append_slice(&[2, 3, 4]);
// Build the array
let array = builder.finish();
assert_eq!(
    5,
    array.len(),
    "The array has 5 values, counting the null value"
);
assert_eq!(2, array.value(2), "Get the value with index 2");
assert_eq!(
    &array.values()[3..5],
    &[3, 4],
    "Get slice of len 2 starting at idx 3"
);
Zero-Copy Slicing
Given an Array of arbitrary length, it is possible to create an owned slice of its
data. Internally this only increments some reference counts, so no values are copied and the operation is cheap:
let array = Arc::new(Int32Array::from_iter([1, 2, 3])) as ArrayRef;
// Slice with offset 1 and length 2
let sliced = array.slice(1, 2);
let ints = sliced.as_any().downcast_ref::<Int32Array>().unwrap();
assert_eq!(ints.values(), &[2, 3]);
Internal Representation
Internally, arrays are represented by one or more Buffers, the number and meaning of
which depend on the array’s data type, as documented in the Arrow specification.
For example, the type Int16Array represents an array of 16-bit integers and consists of:
- An optional NullBuffer identifying any null values
- A contiguous Buffer of 16-bit integers
Similarly, the type StringArray represents an array of UTF-8 strings and consists of:
- An optional NullBuffer identifying any null values
- An offsets Buffer of 32-bit integers identifying valid UTF-8 sequences within the values buffer
- A values Buffer of UTF-8 encoded string data
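The three-buffer layout described above can be sketched with standard-library types. This is a simplified model, not the arrow_array implementation: ToyStringArray and its fields and methods are hypothetical names, validity is stored as one bool per element where Arrow uses a packed bitmap, and a slice is modelled as a start and length over shared buffers.

```rust
use std::sync::Arc;

/// Simplified, hypothetical model of a UTF-8 string array's buffers.
/// This is NOT how arrow_array defines StringArray.
#[derive(Clone)]
struct ToyStringArray {
    /// Optional validity flags (true = valid); `None` means no nulls at all.
    nulls: Option<Arc<[bool]>>,
    /// One more offset than there are elements; element i occupies
    /// values[offsets[i]..offsets[i + 1]].
    offsets: Arc<[i32]>,
    /// Concatenated UTF-8 bytes of every element.
    values: Arc<[u8]>,
    /// Logical window into `offsets` and `nulls`, used for slicing.
    start: usize,
    len: usize,
}

impl ToyStringArray {
    /// Builds the three buffers from optional strings.
    fn from_options(items: &[Option<&str>]) -> Self {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        let mut nulls = Vec::with_capacity(items.len());
        for item in items {
            if let Some(s) = item {
                values.extend_from_slice(s.as_bytes());
            }
            nulls.push(item.is_some());
            // A null repeats the previous offset, so it has zero length.
            offsets.push(values.len() as i32);
        }
        let has_nulls = nulls.iter().any(|valid| !valid);
        Self {
            nulls: has_nulls.then(|| Arc::from(nulls)),
            offsets: Arc::from(offsets),
            values: Arc::from(values),
            start: 0,
            len: items.len(),
        }
    }

    /// Returns element `i`, or `None` if it is null.
    fn get(&self, i: usize) -> Option<&str> {
        assert!(i < self.len, "index out of bounds");
        let idx = self.start + i;
        if let Some(nulls) = &self.nulls {
            if !nulls[idx] {
                return None;
            }
        }
        let begin = self.offsets[idx] as usize;
        let end = self.offsets[idx + 1] as usize;
        Some(std::str::from_utf8(&self.values[begin..end]).expect("valid UTF-8"))
    }

    /// Returns a view of `len` elements starting at `offset`.
    /// Only the `Arc` reference counts change; no string data is copied.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            start: self.start + offset,
            len,
            ..self.clone()
        }
    }
}
```

For the input [Some("foo"), None, Some("barbaz")], the model stores the offsets [0, 3, 3, 9] and the nine value bytes of "foobarbaz". Because slice clones only the Arc handles, the sliced view and the original share the same values buffer, which is the same reason slicing an Array is cheap.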
Re-exports
pub use array::*;
Modules
- The concrete array definitions
- Defines builders for the various array types
- Defines helper functions for downcasting dyn Array to concrete types
- Idiomatic iterators for Array
- Idiomatic iterator for RunArray
- Conversion methods for dates and times.
- Timezone for timestamp arrays
- Zero-sized types used to parameterize generic array implementations
Macros
- Downcast an Array to a DictionaryArray based on its DataType, accepts a number of subsequent patterns to match the data type
- Given one or more expressions evaluating to an integer DataType invokes the provided macro m with the corresponding integer ArrowPrimitiveType, followed by any additional arguments
- Given one or more expressions evaluating to primitive DataType invokes the provided macro m with the corresponding ArrowPrimitiveType, followed by any additional arguments
- Downcast an Array to a PrimitiveArray based on its DataType accepts a number of subsequent patterns to match the data type
- Given one or more expressions evaluating to an integer DataType invokes the provided macro m with the corresponding integer RunEndIndexType, followed by any additional arguments
- Given one or more expressions evaluating to primitive DataType invokes the provided macro m with the corresponding ArrowPrimitiveType, followed by any additional arguments
- Downcast an Array to a temporal PrimitiveArray based on its DataType accepts a number of subsequent patterns to match the data type
Structs
- A two-dimensional batch of column-oriented data with a defined schema.
- Generic implementation of RecordBatchReader that wraps an iterator.
- Options that control the behaviour used when creating a RecordBatch.
Traits
- A subtype of primitive type that represents numeric float values
- Trait for ArrowNativeType that adds checked and unchecked arithmetic operations, and totally ordered comparison operations
- A subtype of primitive type that represents numeric values.
- Trait for types that can read RecordBatches.