Expand description

The central type in Apache Arrow are arrays, which are a known-length sequence of values all having the same type. This module provides concrete implementations of each type, as well as an Array trait that can be used for type-erasure.

Downcasting an Array

Arrays are often passed around as a dynamically typed &dyn Array or ArrayRef. For example, RecordBatch stores columns as ArrayRef.

Whilst these arrays can be passed directly to the compute, csv, json, etc… APIs, it is often the case that you wish to interact with the data directly. This requires downcasting to the concrete type of the array:

fn sum_int32(array: &dyn Array) -> i32 {
    let integers: &Int32Array = array.as_any().downcast_ref().unwrap();
    integers.iter().map(|val| val.unwrap_or_default()).sum()
}

// Note: the values for positions corresponding to nulls will be arbitrary
fn as_f32_slice(array: &dyn Array) -> &[f32] {
    array.as_any().downcast_ref::<Float32Array>().unwrap().values()
}

Additionally, there are convenient functions to do this casting such as as_primitive_array<T> and as_string_array:

fn as_f32_slice(array: &dyn Array) -> &[f32] {
    // use as_primtive_array
    as_primitive_array::<Float32Type>(array).values()
}

Building an Array

Most Array implementations can be constructed directly from iterators or Vec

Int32Array::from(vec![1, 2]);
Int32Array::from(vec![Some(1), None]);
Int32Array::from_iter([1, 2, 3, 4]);
Int32Array::from_iter([Some(1), Some(2), None, Some(4)]);

StringArray::from(vec!["foo", "bar"]);
StringArray::from(vec![Some("foo"), None]);
StringArray::from_iter([Some("foo"), None]);
StringArray::from_iter_values(["foo", "bar"]);

ListArray::from_iter_primitive::<Int32Type, _, _>([
    Some(vec![Some(1), None, Some(3)]),
    None,
    Some(vec![])
]);

Additionally ArrayBuilder implementations can be used to construct arrays with a push-based interface

// Create a new builder with a capacity of 100
let mut builder = Int16Array::builder(100);

// Append a single primitive value
builder.append_value(1);

// Append a null value
builder.append_null();

// Append a slice of primitive values
builder.append_slice(&[2, 3, 4]);

// Build the array
let array = builder.finish();

assert_eq!(
    5,
    array.len(),
    "The array has 5 values, counting the null value"
);

assert_eq!(2, array.value(2), "Get the value with index 2");

assert_eq!(
    &array.values()[3..5],
    &[3, 4],
    "Get slice of len 2 starting at idx 3"
)

Zero-Copy Slicing

Given an Array of arbitrary length, it is possible to create an owned slice of this data. Internally this just increments some ref-counts, and so is incredibly cheap

let array = Arc::new(Int32Array::from_iter([1, 2, 3])) as ArrayRef;

// Slice with offset 1 and length 2
let sliced = array.slice(1, 2);
let ints = sliced.as_any().downcast_ref::<Int32Array>().unwrap();
assert_eq!(ints.values(), &[2, 3]);

Internal Representation

Internally, arrays are represented by one or several Buffer, the number and meaning of which depend on the array’s data type, as documented in the Arrow specification.

For example, the type Int16Array represents an array of 16-bit integers and consists of:

  • An optional Bitmap identifying any null values
  • A contiguous Buffer of 16-bit integers

Similarly, the type StringArray represents an array of UTF-8 strings and consists of:

  • An optional Bitmap identifying any null values
  • An offsets Buffer of 32-bit integers identifying valid UTF-8 sequences within the values buffer
  • A values Buffer of UTF-8 encoded string data

Structs

An generic representation of Arrow array data which encapsulates common attributes and operations for Arrow array. Specific operations for different arrays types (e.g., primitive, list, struct) are implemented in Array.

Builder for ArrayData type

an iterator that returns Some(T) or None, that can be used on any ArrayAccessor

Array of bools

Array builder for fixed-width primitive types

Builder for creating a Buffer object.

A generic Array for fixed width decimal numbers

A dictionary array where each element is a single value indexed by an integer key. This is mostly used to represent strings or a limited set of primitive types as integers, for example when doing NLP analysis or representing chromosomes by name.

An array where each element is a fixed-size sequence of bytes.

A list array where each element is a fixed-size sequence of values with the same type whose maximum length is represented by a i32.

See BinaryArray and LargeBinaryArray for storing binary data.

Generic struct for a variable-size list array.

Generic struct for [Large]StringArray

A nested array type where each record is a key-value map. Keys should always be non-null, but values can be null.

Struct to efficiently and interactively create an ArrayData from an existing ArrayData by copying chunks. The main use case of this struct is to perform unary operations to arrays of arbitrary types, such as filter and take.

An Array where all elements are nulls

Array whose elements are of primitive types.

Array builder for fixed-width primitive types

Array builder for DictionaryArray. For example to map a set of byte indices to f32 values. Note that the use of a HashMap here will not scale to very large arrays or result in an ordered dictionary.

Array builder for DictionaryArray that stores Strings. For example to map a set of byte indices to String values. Note that the use of a HashMap here will not scale to very large arrays or result in an ordered dictionary.

A nested array type where each child (called field) is represented by a separate array.

Array builder for Struct types.

A strongly-typed wrapper around a DictionaryArray that implements ArrayAccessor allowing fast access to its elements

An Array that can represent slots of varying types.

Builder type for creating a new UnionArray.

Enums

Define capacities of child data or data buffers.

Traits

Trait for dealing with different types of array at runtime when the type of the array is not known in advance.

A generic trait for accessing the values of an Array

Trait for dealing with different array builders at runtime

trait declaring an offset size, relevant for i32 vs i64 array types.

Functions

Force downcast of an Array, such as an ArrayRef to BooleanArray, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to Decimal128Array, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to DictionaryArray<T>, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to GenericBinaryArray<S>, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to GenericListArray<T>, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to LargeListArray, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to LargeStringArray, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to ListArray, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to MapArray, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to NullArray, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef, to PrimitiveArray<T>, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to StringArray, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to StructArray, panic’ing on failure.

Force downcast of an Array, such as an ArrayRef to UnionArray, panic’ing on failure.

returns a comparison function that compares two values at two different positions between the two arrays. The arrays’ types must be equal.

Constructs an array using the input data. Returns a reference-counted Array instance.

Returns a builder with capacity capacity that corresponds to the datatype DataType This function is useful to construct arrays from an arbitrary vectors with known/expected schema.

Creates a new empty array

Creates a new array of data_type of length length filled entirely of NULL values

Type Definitions

A reference-counted reference to a generic Array.

An array where each element contains 0 or more bytes. The byte length of each element is represented by an i32.

Decimal128Array stores fixed width decimal numbers, with a fixed precision and scale.

an iterator that returns Some(Decimal128) or None, that can be used on a super::Decimal128Array

Decimal256Array stores fixed width decimal numbers, with a fixed precision and scale

an iterator that returns Some(Decimal256) or None, that can be used on a super::Decimal256Array

DecimalBuilderDeprecated

Compare the values at two arbitrary indices in two arrays.

Example: Using collect

Example: Using collect

Example: Using collect

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

An array where each element contains 0 or more bytes. The byte length of each element is represented by an i64.

A list array where each element is a variable-sized sequence of values with the same type whose memory offsets between elements are represented by a i64.

An array where each element is a variable-sized sequence of bytes representing a string whose maximum length (in bytes) is represented by a i64.

A list array where each element is a variable-sized sequence of values with the same type whose memory offsets between elements are represented by a i32.

an iterator that returns Some(T) or None, that can be used on any PrimitiveArray

An array where each element is a variable-sized sequence of bytes representing a string whose maximum length (in bytes) is represented by a i32.

A primitive array where each element is of type TimestampMicrosecondType. See examples for TimestampSecondArray.

A primitive array where each element is of type TimestampMillisecondType. See examples for TimestampSecondArray.

A primitive array where each element is of type TimestampNanosecondType. See examples for TimestampSecondArray.

A primitive array where each element is of type TimestampSecondType. See also Timestamp.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.