The central type in Apache Arrow is the array: a known-length sequence of values that all
have the same type. This module provides concrete implementations of each type, as
well as an Array trait that can be used for type-erasure.
Downcasting an Array
Arrays are often passed around as a dynamically typed &dyn Array or ArrayRef.
For example, RecordBatch stores columns as ArrayRef.
Whilst these arrays can be passed directly to the compute, csv, json, etc. APIs, it is often
the case that you wish to interact with the data directly. This requires downcasting to the
concrete type of the array:
use arrow::array::{Array, Float32Array, Int32Array};

// Sums the values of an Int32Array, treating nulls as 0.
// Panics if `array` is not an Int32Array.
fn sum_int32(array: &dyn Array) -> i32 {
    let integers: &Int32Array = array.as_any().downcast_ref().unwrap();
    integers.iter().map(|val| val.unwrap_or_default()).sum()
}

// Note: the values for positions corresponding to nulls will be arbitrary
fn as_f32_slice(array: &dyn Array) -> &[f32] {
    array.as_any().downcast_ref::<Float32Array>().unwrap().values()
}
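The functions above panic when the array is not of the expected type. As a rough, std-only illustration of the same downcasting mechanism with a non-panicking fallback, the sketch below uses hypothetical stand-ins (DynArray, IntArray, TextArray and try_sum are invented for this example and are not part of the arrow crate):

```rust
use std::any::Any;

/// Hypothetical stand-in for a type-erased array trait.
trait DynArray {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

/// Hypothetical concrete array of optional 32-bit integers.
struct IntArray {
    values: Vec<Option<i32>>,
}

/// Hypothetical concrete array of optional strings.
struct TextArray {
    values: Vec<Option<String>>,
}

impl DynArray for IntArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.values.len()
    }
}

impl DynArray for TextArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.values.len()
    }
}

/// Sums the values, treating nulls as 0.
/// Returns None (instead of panicking) if `array` is not an IntArray.
fn try_sum(array: &dyn DynArray) -> Option<i32> {
    let ints = array.as_any().downcast_ref::<IntArray>()?;
    Some(ints.values.iter().map(|v| v.unwrap_or_default()).sum())
}
```

Returning an Option moves the decision about a type mismatch to the caller, which may be preferable when the column types are not controlled by the calling code.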
Building an Array
Most Array implementations can be constructed directly from iterators or Vec:
use arrow::array::{Int32Array, ListArray, StringArray};
use arrow::datatypes::Int32Type;

Int32Array::from(vec![1, 2]);
Int32Array::from(vec![Some(1), None]);
Int32Array::from_iter([1, 2, 3, 4]);
Int32Array::from_iter([Some(1), Some(2), None, Some(4)]);
StringArray::from(vec!["foo", "bar"]);
StringArray::from(vec![Some("foo"), None]);
StringArray::from_iter([Some("foo"), None]);
StringArray::from_iter_values(["foo", "bar"]);
ListArray::from_iter_primitive::<Int32Type, _, _>([
    Some(vec![Some(1), None, Some(3)]),
    None,
    Some(vec![]),
]);
Additionally, ArrayBuilder implementations can be used to construct arrays with a
push-based interface:
use arrow::array::{Array, Int16Array};

// Create a new builder with a capacity of 100
let mut builder = Int16Array::builder(100);
// Append a single primitive value
builder.append_value(1).unwrap();
// Append a null value
builder.append_null().unwrap();
// Append a slice of primitive values
builder.append_slice(&[2, 3, 4]).unwrap();
// Build the array
let array = builder.finish();
assert_eq!(
    5,
    array.len(),
    "The array has 5 values, counting the null value"
);
assert_eq!(2, array.value(2), "Get the value with index 2");
assert_eq!(
    &array.values()[3..5],
    &[3, 4],
    "Get slice of len 2 starting at idx 3"
);
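To make the push-based idea concrete, here is a minimal std-only sketch of a builder that tracks values and validity side by side. ToyInt16Builder and ToyInt16Array are hypothetical names invented for this illustration; the real builders in this module differ (for instance, their append methods return a Result in the example above).

```rust
/// Hypothetical finished array: a value per slot plus a validity flag per slot.
struct ToyInt16Array {
    values: Vec<i16>,
    validity: Vec<bool>,
}

/// Hypothetical push-based builder for optional 16-bit integers.
struct ToyInt16Builder {
    values: Vec<i16>,
    validity: Vec<bool>,
}

impl ToyInt16Builder {
    /// Reserves room for `capacity` slots up front.
    fn with_capacity(capacity: usize) -> Self {
        Self {
            values: Vec::with_capacity(capacity),
            validity: Vec::with_capacity(capacity),
        }
    }

    fn append_value(&mut self, value: i16) {
        self.values.push(value);
        self.validity.push(true);
    }

    /// A null still occupies a slot in `values`; its content is a placeholder.
    fn append_null(&mut self) {
        self.values.push(0);
        self.validity.push(false);
    }

    fn append_slice(&mut self, slice: &[i16]) {
        self.values.extend_from_slice(slice);
        // Mark every newly added slot as valid.
        self.validity.resize(self.values.len(), true);
    }

    fn finish(self) -> ToyInt16Array {
        ToyInt16Array {
            values: self.values,
            validity: self.validity,
        }
    }
}

impl ToyInt16Array {
    fn len(&self) -> usize {
        self.values.len()
    }

    fn is_null(&self, i: usize) -> bool {
        !self.validity[i]
    }

    /// Raw value at slot `i`; a placeholder if the slot is null.
    fn value(&self, i: usize) -> i16 {
        self.values[i]
    }

    fn get(&self, i: usize) -> Option<i16> {
        if self.validity[i] {
            Some(self.values[i])
        } else {
            None
        }
    }
}
```

Because a null keeps its slot in the values vector, positions stay aligned with indices, which is why the example above finds the value 2 at index 2 even though index 1 is null.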
Zero-Copy Slicing
Given an Array of arbitrary length, it is possible to create an owned slice of this
data. Internally this just increments some ref-counts, and so is incredibly cheap:
use std::sync::Arc;
use arrow::array::{Array, ArrayRef, Int32Array};

let array = Arc::new(Int32Array::from_iter([1, 2, 3])) as ArrayRef;
// Slice with offset 1 and length 2
let sliced = array.slice(1, 2);
let ints = sliced.as_any().downcast_ref::<Int32Array>().unwrap();
assert_eq!(ints.values(), &[2, 3]);
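The ref-counting idea behind such a slice can be sketched with the standard library alone. SharedInts below is a hypothetical type made up for this illustration, not the arrow crate's implementation: a slice is a new (offset, length) view that clones an Arc rather than copying the values.

```rust
use std::sync::Arc;

/// Hypothetical view over a shared, immutable buffer of i32 values.
struct SharedInts {
    data: Arc<[i32]>,
    offset: usize,
    len: usize,
}

impl SharedInts {
    fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        Self {
            data: Arc::from(values),
            offset: 0,
            len,
        }
    }

    /// Returns an owned view of `len` values starting at `offset`.
    /// Only the reference count is incremented; no values are copied.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            data: Arc::clone(&self.data),
            offset: self.offset + offset,
            len,
        }
    }

    fn values(&self) -> &[i32] {
        &self.data[self.offset..self.offset + self.len]
    }

    /// True if both views point at the same underlying allocation.
    fn shares_buffer_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.data, &other.data)
    }
}
```

The slice is owned in the sense that it keeps the buffer alive on its own: dropping the original view leaves the sliced view valid.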
Internal Representation
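Internally, arrays are represented by one or several Buffers, the number and meaning of
which depend on the array’s data type, as documented in the Arrow specification.
For example, the type Int16Array represents an array of 16-bit integers and consists of an
optional validity bitmap identifying any null values and a contiguous Buffer of 16-bit integers.
Similarly, the type StringArray represents an array of UTF-8 strings and consists of an
optional validity bitmap identifying any null values, an offsets Buffer of 32-bit integers
delimiting each string within the values buffer, and a values Buffer of UTF-8 encoded string data.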
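As a rough illustration of the variable-length layout in the Arrow specification (a validity bitmap, an offsets buffer and a values buffer), the following std-only sketch models a string array. ToyStringArray is a hypothetical name, and the model ignores details such as buffer alignment and padding.

```rust
/// Hypothetical, simplified model of a UTF-8 string array's buffers.
struct ToyStringArray {
    /// One bit per slot, least-significant bit first; a set bit means "valid".
    validity: Vec<u8>,
    /// `len + 1` offsets into `values`; slot `i` spans `offsets[i]..offsets[i + 1]`.
    offsets: Vec<i32>,
    /// All string bytes, concatenated without separators.
    values: Vec<u8>,
}

impl ToyStringArray {
    fn from_options(items: &[Option<&str>]) -> Self {
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        for (i, item) in items.iter().enumerate() {
            if let Some(s) = item {
                validity[i / 8] |= 1u8 << (i % 8);
                values.extend_from_slice(s.as_bytes());
            }
            // A null slot repeats the previous offset, so it has zero length.
            offsets.push(values.len() as i32);
        }
        Self {
            validity,
            offsets,
            values,
        }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn get(&self, i: usize) -> Option<&str> {
        if (self.validity[i / 8] & (1u8 << (i % 8))) == 0 {
            return None;
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).ok()
    }
}
```

The 32-bit offsets are what bound the total byte length of such an array; the Large variants listed under Type Definitions use 64-bit offsets instead.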
Structs
A generic representation of Arrow array data which encapsulates common attributes and
operations for Arrow arrays. Specific operations for different array types (e.g.,
primitive, list, struct) are implemented in Array.
Builder for ArrayData type
Array of bools
Array builder for fixed-width primitive types
An iterator that returns Some(bool) or None.
Builder for creating a Buffer object.
DecimalArray stores fixed width decimal numbers, with a fixed precision and scale.
Array Builder for DecimalArray
An iterator that returns Some(i128) or None, that can be used on a DecimalArray
A dictionary array where each element is a single value indexed by an integer key. This is mostly used to represent strings or a limited set of primitive types as integers, for example when doing NLP analysis or representing chromosomes by name.
An array where each element is a fixed-size sequence of bytes.
A list array where each element is a fixed-size sequence of values with the same type whose maximum length is represented by an i32.
Array builder for ListArray
See BinaryArray and LargeBinaryArray for storing binary data.
An iterator that returns Some(&[u8]) or None, for binary arrays
Generic struct for a variable-size list array.
Array builder for ListArray
Generic struct for [Large]StringArray
An iterator that returns Some(&str) or None, for string arrays
A nested array type where each record is a key-value map. Keys should always be non-null, but values can be null.
An Array where all elements are nulls
Array whose elements are of primitive types.
Array builder for fixed-width primitive types
Array builder for DictionaryArray. For example to map a set of byte indices to f32 values. Note that the use of a HashMap here will not scale to very large arrays or result in an ordered dictionary.
An iterator that returns Some(T) or None, that can be used on any PrimitiveArray
Array builder for DictionaryArray that stores Strings. For example to map a set of byte indices to String values. Note that the use of a HashMap here will not scale to very large arrays or result in an ordered dictionary.
A nested array type where each child (called field) is represented by a separate array.
Array builder for Struct types.
An Array that can represent slots of varying types.
Builder type for creating a new UnionArray.
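The dictionary entries in the list above describe values indexed by integer keys, with builders that rely on a HashMap. A minimal std-only sketch of that idea follows; ToyDictionary and ToyDictionaryBuilder are hypothetical names, and the sketch assumes u8 keys and String values purely for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-encoded column: `keys[i]` indexes into `values`.
struct ToyDictionary {
    keys: Vec<Option<u8>>,
    values: Vec<String>,
}

/// Hypothetical builder that deduplicates strings through a HashMap.
struct ToyDictionaryBuilder {
    lookup: HashMap<String, u8>,
    keys: Vec<Option<u8>>,
    values: Vec<String>,
}

impl ToyDictionaryBuilder {
    fn new() -> Self {
        Self {
            lookup: HashMap::new(),
            keys: Vec::new(),
            values: Vec::new(),
        }
    }

    /// Appends `value` and returns its key, or None if the u8 key space
    /// (256 distinct values) is already exhausted.
    fn append(&mut self, value: &str) -> Option<u8> {
        let existing = self.lookup.get(value).copied();
        let key = match existing {
            Some(k) => k,
            None => {
                if self.values.len() > u8::MAX as usize {
                    return None;
                }
                let k = self.values.len() as u8;
                self.lookup.insert(value.to_string(), k);
                self.values.push(value.to_string());
                k
            }
        };
        self.keys.push(Some(key));
        Some(key)
    }

    fn append_null(&mut self) {
        self.keys.push(None);
    }

    fn finish(self) -> ToyDictionary {
        ToyDictionary {
            keys: self.keys,
            values: self.values,
        }
    }
}

impl ToyDictionary {
    /// Resolves slot `i` to its string, or None if the slot is null.
    fn get(&self, i: usize) -> Option<&str> {
        self.keys[i].map(|k| self.values[k as usize].as_str())
    }
}
```

In this sketch the dictionary values end up in order of first appearance rather than sorted, and the lookup table grows with the number of distinct values, which mirrors the caveats noted for the builders above.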
Enums
Define capacities of child data or data buffers.
Traits
Trait for dealing with different types of array at runtime when the type of the array is not known in advance.
Trait for dealing with different array builders at runtime
Trait for comparing arrow array with json array
Trait declaring an offset size, relevant for i32 vs i64 array types.
Functions
Force downcast ArrayRef to BooleanArray
Force downcast ArrayRef to DecimalArray
Force downcast ArrayRef to DictionaryArray
Force downcast ArrayRef to GenericBinaryArray
Force downcast ArrayRef to GenericListArray
Force downcast ArrayRef to LargeListArray
Force downcast ArrayRef to LargeStringArray
Force downcast ArrayRef to ListArray
Force downcast ArrayRef to MapArray
Force downcast ArrayRef to NullArray
Force downcast ArrayRef to PrimitiveArray
Force downcast ArrayRef to StringArray
Force downcast ArrayRef to StructArray
Force downcast ArrayRef to UnionArray
Returns a comparison function that compares two values at two different positions between the two arrays. The arrays’ types must be equal.
Exports an array to raw pointers of the C Data Interface provided by the consumer.
Constructs an array using the input data. Returns a reference-counted Array instance.
Creates a new array from two FFI pointers. Used to import arrays from the C Data Interface.
Returns a builder with the given capacity that corresponds to the datatype DataType. This function is useful to construct arrays from arbitrary vectors with a known/expected schema.
Creates a new empty array
Creates a new array of data_type with the given length, filled entirely with NULL values
Type Definitions
A reference-counted reference to a generic Array.
An array where each element contains 0 or more bytes. The byte length of each element is represented by an i32.
Compare the values at two arbitrary indices in two arrays.
Example: Using collect
Example: Using collect
Example: Using collect
Example: Using collect
A dictionary array where each element is a single value indexed by an integer key.
Example: Using collect
A dictionary array where each element is a single value indexed by an integer key.
Example: Using collect
A dictionary array where each element is a single value indexed by an integer key.
Example: Using collect
A dictionary array where each element is a single value indexed by an integer key.
An array where each element contains 0 or more bytes. The byte length of each element is represented by an i64.
A list array where each element is a variable-sized sequence of values with the same type whose memory offsets between elements are represented by an i64.
An array where each element is a variable-sized sequence of bytes representing a string whose maximum length (in bytes) is represented by an i64.
A list array where each element is a variable-sized sequence of values with the same type whose memory offsets between elements are represented by an i32.
An array where each element is a variable-sized sequence of bytes representing a string whose maximum length (in bytes) is represented by an i32.
A primitive array where each element is of type TimestampMicrosecondType.
See examples for TimestampSecondArray.
A primitive array where each element is of type TimestampMillisecondType.
See examples for TimestampSecondArray.
A primitive array where each element is of type TimestampNanosecondType.
See examples for TimestampSecondArray.
A primitive array where each element is of type TimestampSecondType.
See also Timestamp.
Example: Using collect
A dictionary array where each element is a single value indexed by an integer key.
Example: Using collect
A dictionary array where each element is a single value indexed by an integer key.
Example: Using collect
A dictionary array where each element is a single value indexed by an integer key.
Example: Using collect
A dictionary array where each element is a single value indexed by an integer key.