Crate arrow_array

arrow_array 53.3.0

The central types in Apache Arrow are arrays, each of which is a known-length sequence of values all having the same type. This crate provides concrete implementations of each array type, as well as an Array trait that can be used for type erasure.

§Building an Array

Most Array implementations can be constructed directly from an iterator or a Vec:

Int32Array::from(vec![1, 2]);
Int32Array::from(vec![Some(1), None]);
Int32Array::from_iter([1, 2, 3, 4]);
Int32Array::from_iter([Some(1), Some(2), None, Some(4)]);

StringArray::from(vec!["foo", "bar"]);
StringArray::from(vec![Some("foo"), None]);
StringArray::from_iter([Some("foo"), None]);
StringArray::from_iter_values(["foo", "bar"]);

ListArray::from_iter_primitive::<Int32Type, _, _>([
    Some(vec![Some(1), None, Some(3)]),
    None,
    Some(vec![])
]);

Additionally, ArrayBuilder implementations can be used to construct arrays with a push-based interface:

// Create a new builder with a capacity of 100
let mut builder = Int16Array::builder(100);

// Append a single primitive value
builder.append_value(1);
// Append a null value
builder.append_null();
// Append a slice of primitive values
builder.append_slice(&[2, 3, 4]);

// Build the array
let array = builder.finish();

assert_eq!(5, array.len());
assert_eq!(2, array.value(2));
assert_eq!(&array.values()[3..5], &[3, 4]);

§Low-level API

Internally, arrays consist of one or more shared memory regions backed by a Buffer, the number and meaning of which depend on the array’s data type, as documented in the Arrow specification.

For example, the type Int16Array represents an array of 16-bit integers and consists of:

  • An optional NullBuffer identifying any null values
  • A contiguous ScalarBuffer<i16> of values

Similarly, the type StringArray represents an array of UTF-8 strings and consists of:

  • An optional NullBuffer identifying any null values
  • An offsets OffsetBuffer<i32> identifying valid UTF-8 sequences within the values buffer
  • A values Buffer of UTF-8 encoded string data

Array constructors such as PrimitiveArray::try_new provide the ability to cheaply construct an array from these parts, with functions such as PrimitiveArray::into_parts providing the reverse operation.

// Create an Int32Array from a Vec without copying
let array = Int32Array::new(vec![1, 2, 3].into(), None);
assert_eq!(array.values(), &[1, 2, 3]);
assert_eq!(array.null_count(), 0);

// Create a StringArray from parts
let offsets = OffsetBuffer::new(vec![0, 5, 10].into());
let array = StringArray::new(offsets, b"helloworld".into(), None);
let values: Vec<_> = array.iter().map(|x| x.unwrap()).collect();
assert_eq!(values, &["hello", "world"]);

Because Buffer and its derivatives can be created from a Vec without copying, this provides an efficient way not only to interoperate with other Rust code, but also to implement kernels optimised for the Arrow data layout, e.g. by operating on whole buffers instead of individual values.

§Zero-Copy Slicing

Given an Array of arbitrary length, it is possible to create an owned slice of its data. Internally this just increments some reference counts, so it is very cheap:

let array = Int32Array::from_iter([1, 2, 3]);

// Slice with offset 1 and length 2
let sliced = array.slice(1, 2);
assert_eq!(sliced.values(), &[2, 3]);

§Downcasting an Array

Arrays are often passed around as a dynamically typed &dyn Array or ArrayRef. For example, RecordBatch stores columns as ArrayRef.

While these arrays can be passed directly to the compute, csv, json, and similar APIs, you will often want to interact with the concrete arrays directly.

This requires downcasting to the concrete type of the array:

// Safely downcast an `Array` to an `Int32Array` and compute the sum
// using native i32 values
// Note: panics if `array` is not an `Int32Array`; null values count as 0
fn sum_int32(array: &dyn Array) -> i32 {
    let integers: &Int32Array = array.as_any().downcast_ref().unwrap();
    integers.iter().map(|val| val.unwrap_or_default()).sum()
}

// Safely downcasts the array to a `Float32Array` and returns a &[f32] view of the data
// Note: the values for positions corresponding to nulls will be arbitrary (but still valid f32)
fn as_f32_slice(array: &dyn Array) -> &[f32] {
    array.as_any().downcast_ref::<Float32Array>().unwrap().values()
}

The cast::AsArray extension trait can make this more ergonomic:

fn as_f32_slice(array: &dyn Array) -> &[f32] {
    array.as_primitive::<Float32Type>().values()
}

§Alternatives to ChunkedArray Support

The Rust implementation does not provide the ChunkedArray abstraction implemented by the Python and C++ Arrow implementations. The recommended alternative is to use one of the following:

  • Vec<ArrayRef>: a simple, eager version of a ChunkedArray
  • impl Iterator<Item=ArrayRef>: a lazy version of a ChunkedArray
  • impl Stream<Item=ArrayRef>: a lazy async version of a ChunkedArray

Similar patterns can be applied at the RecordBatch level. For example, DataFusion makes extensive use of RecordBatchStream.

This approach integrates well into the Rust ecosystem, simplifies the implementation and encourages the use of performant lazy and async patterns.

use std::sync::Arc;
use arrow_array::{ArrayRef, Float32Array, RecordBatch, StringArray};
use arrow_array::cast::AsArray;
use arrow_array::types::Float32Type;

// Two batches with the same schema stand in for the chunks of a ChunkedArray
let batches = [
    RecordBatch::try_from_iter(vec![
        ("label", Arc::new(StringArray::from(vec!["A", "B", "C"])) as ArrayRef),
        ("value", Arc::new(Float32Array::from(vec![0.1, 0.2, 0.3])) as ArrayRef),
    ]).unwrap(),
    RecordBatch::try_from_iter(vec![
        ("label", Arc::new(StringArray::from(vec!["D", "E"])) as ArrayRef),
        ("value", Arc::new(Float32Array::from(vec![0.4, 0.5])) as ArrayRef),
    ]).unwrap(),
];

// Flatten the "label" column of every batch into a single Vec
let labels: Vec<&str> = batches
    .iter()
    .flat_map(|batch| batch.column(0).as_string::<i32>())
    .map(Option::unwrap)
    .collect();

// Flatten the "value" column of every batch into a single Vec
let values: Vec<f32> = batches
    .iter()
    .flat_map(|batch| batch.column(1).as_primitive::<Float32Type>().values())
    .copied()
    .collect();

assert_eq!(labels, ["A", "B", "C", "D", "E"]);
assert_eq!(values, [0.1, 0.2, 0.3, 0.4, 0.5]);

Re-exports§

  • pub use array::*;

Modules§

  • array
    The concrete array definitions
  • builder
    Defines push-based APIs for constructing arrays
  • cast
    Defines helper functions for downcasting dyn Array to concrete types
  • iterator
    Idiomatic iterators for Array
  • run_iterator
    Idiomatic iterator for RunArray
  • temporal_conversions
    Conversion methods for dates and times.
  • timezone
    Timezone for timestamp arrays
  • types
    Zero-sized types used to parameterize generic array implementations

Macros§

  • create_array
    Creates an array from a literal slice of values, suitable for rapid testing and development.
  • downcast_dictionary_array
    Downcast an Array to a DictionaryArray based on its DataType; accepts a number of subsequent patterns to match the data type
  • downcast_integer
    Given one or more expressions evaluating to an integer DataType, invokes the provided macro m with the corresponding integer ArrowPrimitiveType, followed by any additional arguments
  • downcast_primitive
    Given one or more expressions evaluating to a primitive DataType, invokes the provided macro m with the corresponding ArrowPrimitiveType, followed by any additional arguments
  • downcast_primitive_array
    Downcast an Array to a PrimitiveArray based on its DataType; accepts a number of subsequent patterns to match the data type
  • downcast_run_array
    Downcast an Array to a RunArray based on its DataType; accepts a number of subsequent patterns to match the data type
  • downcast_run_end_index
    Given one or more expressions evaluating to an integer DataType, invokes the provided macro m with the corresponding integer RunEndIndexType, followed by any additional arguments
  • downcast_temporal
    Given one or more expressions evaluating to a primitive DataType, invokes the provided macro m with the corresponding ArrowPrimitiveType, followed by any additional arguments
  • downcast_temporal_array
    Downcast an Array to a temporal PrimitiveArray based on its DataType; accepts a number of subsequent patterns to match the data type
  • record_batch
    Creates a record batch from literal slices of values, suitable for rapid testing and development.

Structs§

  • RecordBatch
    A two-dimensional batch of column-oriented data with a defined schema.
  • RecordBatchIterator
    Generic implementation of RecordBatchReader that wraps an iterator.
  • RecordBatchOptions
    Options that control the behaviour used when creating a RecordBatch.
  • Scalar
    A wrapper around a single-value Array that implements Datum and indicates that compute kernels should treat this array as a scalar value.

Traits§

  • ArrowNativeTypeOp
    Trait for ArrowNativeType that adds checked and unchecked arithmetic operations, and totally ordered comparison operations
  • ArrowNumericType
    A subtype of primitive type that represents numeric values.
  • Datum
    A possibly Scalar Array
  • RecordBatchReader
    Trait for types that can read RecordBatches.
  • RecordBatchWriter
    Trait for types that can write RecordBatches.
