Module arrow::array

Arrays are the central type in Apache Arrow: known-length sequences of values that all have the same type. This module provides concrete implementations for each data type, as well as an Array trait that can be used for type erasure.

Downcasting an Array

Arrays are often passed around as a dynamically typed &dyn Array or ArrayRef. For example, RecordBatch stores columns as ArrayRef.

While these arrays can be passed directly to APIs such as compute, csv, and json, you will often want to interact with the data directly. This requires downcasting to the concrete type of the array:

use arrow::array::{Array, Float32Array, Int32Array};

fn sum_int32(array: &dyn Array) -> i32 {
    let integers: &Int32Array = array.as_any().downcast_ref().unwrap();
    integers.iter().map(|val| val.unwrap_or_default()).sum()
}

// Note: the values for positions corresponding to nulls will be arbitrary
fn as_f32_slice(array: &dyn Array) -> &[f32] {
    array.as_any().downcast_ref::<Float32Array>().unwrap().values()
}
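
The unwrap calls above panic when the array is not of the expected type. The following is a minimal, standard-library-only sketch of the same as_any / downcast_ref pattern. It is written against a hypothetical DynArray trait and toy ToyInt32 / ToyFloat32 types (none of these are part of arrow) and shows how a type mismatch can be surfaced as None instead of a panic:

```rust
use std::any::Any;

/// Toy stand-in for a type-erased array trait (hypothetical, not arrow's `Array`).
trait DynArray {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

struct ToyInt32(Vec<i32>);
struct ToyFloat32(Vec<f32>);

impl DynArray for ToyInt32 {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}

impl DynArray for ToyFloat32 {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}

/// Returns the sum if `array` is a `ToyInt32`, or `None` for any other type.
fn try_sum_int32(array: &dyn DynArray) -> Option<i32> {
    // `downcast_ref` yields `None` on a type mismatch; `?` propagates it.
    let ints = array.as_any().downcast_ref::<ToyInt32>()?;
    Some(ints.0.iter().sum())
}
```

Returning Option leaves the caller to decide whether a mismatch is a bug (unwrap) or an expected case that should be handled.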

Building an Array

Most Array implementations can be constructed directly from iterators or Vec:

use arrow::array::{Int32Array, ListArray, StringArray};
use arrow::datatypes::Int32Type;

Int32Array::from(vec![1, 2]);
Int32Array::from(vec![Some(1), None]);
Int32Array::from_iter([1, 2, 3, 4]);
Int32Array::from_iter([Some(1), Some(2), None, Some(4)]);

StringArray::from(vec!["foo", "bar"]);
StringArray::from(vec![Some("foo"), None]);
StringArray::from_iter([Some("foo"), None]);
StringArray::from_iter_values(["foo", "bar"]);

ListArray::from_iter_primitive::<Int32Type, _, _>([
    Some(vec![Some(1), None, Some(3)]),
    None,
    Some(vec![])
]);

Additionally, ArrayBuilder implementations can be used to construct arrays with a push-based interface:

use arrow::array::{Array, Int16Array};

// Create a new builder with a capacity of 100
let mut builder = Int16Array::builder(100);

// Append a single primitive value
builder.append_value(1).unwrap();

// Append a null value
builder.append_null().unwrap();

// Append a slice of primitive values
builder.append_slice(&[2, 3, 4]).unwrap();

// Build the array
let array = builder.finish();

assert_eq!(
    5,
    array.len(),
    "The array has 5 values, counting the null value"
);

assert_eq!(2, array.value(2), "Get the value with index 2");

assert_eq!(
    &array.values()[3..5],
    &[3, 4],
    "Get slice of len 2 starting at idx 3"
);

Zero-Copy Slicing

Given an Array of arbitrary length, it is possible to create an owned slice of this data. Internally, this only increments some reference counts, so it is very cheap:

use std::sync::Arc;
use arrow::array::{Array, ArrayRef, Int32Array};

let array = Arc::new(Int32Array::from_iter([1, 2, 3])) as ArrayRef;

// Slice with offset 1 and length 2
let sliced = array.slice(1, 2);
let ints = sliced.as_any().downcast_ref::<Int32Array>().unwrap();
assert_eq!(ints.values(), &[2, 3]);
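
To illustrate why slicing can avoid copying, here is a simplified standard-library-only model. ToySlice is a hypothetical type, not arrow's actual layout: it pairs a reference-counted buffer with an offset and a length, so a slice only clones the Arc and adjusts two integers.

```rust
use std::sync::Arc;

/// Toy model of a sliceable array: a shared buffer plus an offset and length.
/// (Hypothetical type for illustration; arrow's real layout is more involved.)
struct ToySlice {
    data: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

impl ToySlice {
    fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        ToySlice { data: Arc::new(values), offset: 0, len }
    }

    /// Creates an owned view without copying the values; only the
    /// reference count of the shared buffer is incremented.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        ToySlice {
            data: Arc::clone(&self.data),
            offset: self.offset + offset,
            len,
        }
    }

    /// Returns the values visible through this view.
    fn values(&self) -> &[i32] {
        &self.data[self.offset..self.offset + self.len]
    }
}
```

In this model both the original and the slice keep the underlying buffer alive, so the memory is released only once every view has been dropped.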

Internal Representation

Internally, arrays are represented by one or more Buffers, the number and meaning of which depend on the array’s data type, as documented in the Arrow specification.

For example, the type Int16Array represents an array of 16-bit integers and consists of:

  • An optional Bitmap identifying any null values
  • A contiguous Buffer of 16-bit integers
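
As a rough sketch of this two-part layout, the model below stores a validity bitmap next to a values buffer. ToyInt16 and its methods are hypothetical names used only for illustration. The bit order (least-significant bit first, 1 meaning valid) follows the Arrow specification, but the code is not arrow's implementation.

```rust
/// Toy model of a nullable 16-bit integer array (hypothetical type, not arrow's).
struct ToyInt16 {
    /// One bit per slot, least-significant bit first; 1 = valid, 0 = null.
    /// `None` means every slot is valid.
    validity: Option<Vec<u8>>,
    values: Vec<i16>,
}

impl ToyInt16 {
    fn from_options(items: &[Option<i16>]) -> Self {
        let mut bitmap = vec![0u8; (items.len() + 7) / 8];
        let mut values = Vec::with_capacity(items.len());
        let mut has_null = false;
        for (i, item) in items.iter().enumerate() {
            match *item {
                Some(v) => {
                    // Mark slot `i` as valid.
                    bitmap[i / 8] |= 1u8 << (i % 8);
                    values.push(v);
                }
                None => {
                    has_null = true;
                    // A null still occupies a slot in the values buffer.
                    values.push(0);
                }
            }
        }
        // The bitmap is optional: drop it when there are no nulls.
        let validity = if has_null { Some(bitmap) } else { None };
        ToyInt16 { validity, values }
    }

    fn is_valid(&self, i: usize) -> bool {
        match &self.validity {
            Some(bits) => (bits[i / 8] & (1u8 << (i % 8))) != 0,
            None => true,
        }
    }

    fn get(&self, i: usize) -> Option<i16> {
        if self.is_valid(i) { Some(self.values[i]) } else { None }
    }
}
```

Because a null slot still has an entry in the values buffer, reading the raw values without consulting the bitmap yields arbitrary data for null positions.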

Similarly, the type StringArray represents an array of UTF-8 strings and consists of:

  • An optional Bitmap identifying any null values
  • An offsets Buffer of 32-bit integers identifying valid UTF-8 sequences within the values buffer
  • A values Buffer of UTF-8 encoded string data
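
The offsets-plus-values layout can be sketched as follows. ToyStrings is a hypothetical type for illustration only, and it omits the validity bitmap for brevity. Element i occupies the byte range offsets[i]..offsets[i + 1] of the values buffer, so an array of n strings needs n + 1 offsets.

```rust
/// Toy model of a UTF-8 string array (hypothetical type, not arrow's).
struct ToyStrings {
    /// `offsets[i]..offsets[i + 1]` is the byte range of element `i`.
    offsets: Vec<i32>,
    /// All string bytes stored back to back.
    values: Vec<u8>,
}

impl ToyStrings {
    fn from_strs(items: &[&str]) -> Self {
        // The first offset is always 0.
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        for s in items {
            values.extend_from_slice(s.as_bytes());
            // Record where this element ends (and the next one starts).
            offsets.push(values.len() as i32);
        }
        ToyStrings { offsets, values }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn value(&self, i: usize) -> &str {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        // The bytes came from `&str` inputs, so each range is valid UTF-8.
        std::str::from_utf8(&self.values[start..end]).unwrap()
    }
}
```

With 32-bit offsets the total size of the values buffer is bounded by the range of i32, which is why a separate 64-bit-offset variant exists for larger data.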

Structs

A generic representation of Arrow array data which encapsulates common attributes and operations for Arrow arrays. Specific operations for different array types (e.g., primitive, list, struct) are implemented in Array.

Builder for ArrayData type

Array of bools

Array builder for fixed-width primitive types

An iterator that returns Some(bool) or None.

Builder for creating a Buffer object.

DecimalArray stores fixed width decimal numbers, with a fixed precision and scale.

Array Builder for DecimalArray

An iterator that returns Some(i128) or None, that can be used on a DecimalArray.

A dictionary array where each element is a single value indexed by an integer key. This is mostly used to represent strings or a limited set of primitive types as integers, for example when doing NLP analysis or representing chromosomes by name.

An array where each element is a fixed-size sequence of bytes.

A list array where each element is a fixed-size sequence of values with the same type whose maximum length is represented by an i32.

See BinaryArray and LargeBinaryArray for storing binary data.

An iterator that returns Some(&[u8]) or None, for binary arrays.

Generic struct for a variable-size list array.

Array builder for ListArray

Generic struct for [Large]StringArray

An iterator that returns Some(&str) or None, for string arrays.

A nested array type where each record is a key-value map. Keys should always be non-null, but values can be null.

Struct to efficiently and interactively create an ArrayData from an existing ArrayData by copying chunks. The main use case of this struct is to perform unary operations to arrays of arbitrary types, such as filter and take.

An Array where all elements are nulls

Array whose elements are of primitive types.

Array builder for fixed-width primitive types

Array builder for DictionaryArray. For example to map a set of byte indices to f32 values. Note that the use of a HashMap here will not scale to very large arrays or result in an ordered dictionary.

An iterator that returns Some(T) or None, that can be used on any PrimitiveArray.

Array builder for DictionaryArray that stores Strings. For example to map a set of byte indices to String values. Note that the use of a HashMap here will not scale to very large arrays or result in an ordered dictionary.

A nested array type where each child (called field) is represented by a separate array.

Array builder for Struct types.

An Array that can represent slots of varying types.

Builder type for creating a new UnionArray.
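
The dictionary array described in this list stores each distinct value once and refers to it by an integer key. The sketch below is a standard-library-only illustration of that idea. ToyDictionary and dictionary_encode are hypothetical names, and the u8 key type is an arbitrary choice for the example. The HashMap lookup mirrors the caveat noted for the dictionary builders: the dictionary is ordered by first appearance, not sorted.

```rust
use std::collections::HashMap;

/// Toy dictionary encoding: integer keys index into a list of distinct values.
/// (Hypothetical type for illustration; not arrow's `DictionaryArray`.)
struct ToyDictionary {
    keys: Vec<u8>,
    values: Vec<String>,
}

impl ToyDictionary {
    /// Resolves element `i` through its key.
    fn value(&self, i: usize) -> &str {
        &self.values[self.keys[i] as usize]
    }
}

fn dictionary_encode(items: &[&str]) -> ToyDictionary {
    let mut lookup: HashMap<&str, u8> = HashMap::new();
    let mut keys = Vec::with_capacity(items.len());
    let mut values: Vec<String> = Vec::new();
    for &item in items {
        let key = match lookup.get(item) {
            // Seen before: reuse the existing key.
            Some(&k) => k,
            // First occurrence: append to the dictionary and assign a new key.
            None => {
                assert!(values.len() < 256, "too many distinct values for u8 keys");
                let k = values.len() as u8;
                values.push(item.to_string());
                lookup.insert(item, k);
                k
            }
        };
        keys.push(key);
    }
    ToyDictionary { keys, values }
}
```

For a column with few distinct values, such as chromosome names, this replaces every repeated string with a one-byte key.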

Enums

Define capacities of child data or data buffers.

Traits

Trait for dealing with different types of array at runtime when the type of the array is not known in advance.

Trait for dealing with different array builders at runtime

Trait for comparing an Arrow array with a JSON array

Trait declaring an offset size, relevant for i32 vs i64 array types.

Functions

Force downcast ArrayRef to BooleanArray

Force downcast ArrayRef to DecimalArray

Force downcast ArrayRef to DictionaryArray

Force downcast ArrayRef to GenericBinaryArray

Force downcast ArrayRef to GenericListArray

Force downcast ArrayRef to LargeListArray

Force downcast ArrayRef to LargeStringArray

Force downcast ArrayRef to ListArray

Force downcast ArrayRef to MapArray

Force downcast ArrayRef to NullArray

Force downcast ArrayRef to PrimitiveArray

Force downcast ArrayRef to StringArray

Force downcast ArrayRef to StructArray

Force downcast ArrayRef to UnionArray

Returns a comparison function that compares two values at two different positions between the two arrays. The arrays’ types must be equal.

Exports an array to raw pointers of the C Data Interface provided by the consumer.

Constructs an array using the input data. Returns a reference-counted Array instance.

Creates a new array from two FFI pointers. Used to import arrays from the C Data Interface

Returns a builder with capacity capacity that corresponds to the datatype DataType. This function is useful for constructing arrays from arbitrary vectors with a known or expected schema.

Creates a new empty array

Creates a new array of data_type of length length filled entirely with NULL values
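
The comparison-function entry in this list describes a closure that takes one index into each of two arrays. A minimal sketch of that shape, using plain i32 slices and the hypothetical names ToyComparator and build_toy_compare (this is not the arrow function, and its signature is an assumption made for illustration):

```rust
use std::cmp::Ordering;

/// Boxed closure comparing `left[i]` with `right[j]` (hypothetical alias).
type ToyComparator<'a> = Box<dyn Fn(usize, usize) -> Ordering + 'a>;

/// Builds a comparator once, so the element type is resolved up front and
/// each later call only needs two indices.
fn build_toy_compare<'a>(left: &'a [i32], right: &'a [i32]) -> ToyComparator<'a> {
    Box::new(move |i: usize, j: usize| left[i].cmp(&right[j]))
}
```

Building the closure once and reusing it is useful in sort or merge routines, where the same pair of arrays is compared at many positions.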

Type Definitions

A reference-counted reference to a generic Array.

An array where each element contains 0 or more bytes. The byte length of each element is represented by an i32.

Compare the values at two arbitrary indices in two arrays.

Example: Using collect

Example: Using collect

Example: Using collect

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

An array where each element contains 0 or more bytes. The byte length of each element is represented by an i64.

A list array where each element is a variable-sized sequence of values with the same type whose memory offsets between elements are represented by an i64.

An array where each element is a variable-sized sequence of bytes representing a string whose maximum length (in bytes) is represented by an i64.

A list array where each element is a variable-sized sequence of values with the same type whose memory offsets between elements are represented by an i32.

An array where each element is a variable-sized sequence of bytes representing a string whose maximum length (in bytes) is represented by an i32.

A primitive array where each element is of type TimestampMicrosecondType. See examples for TimestampSecondArray.

A primitive array where each element is of type TimestampMillisecondType. See examples for TimestampSecondArray.

A primitive array where each element is of type TimestampNanosecondType. See examples for TimestampSecondArray.

A primitive array where each element is of type TimestampSecondType. See also Timestamp.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.

Example: Using collect

A dictionary array where each element is a single value indexed by an integer key.