A native Rust implementation of Apache Arrow, a cross-language development platform for in-memory data.
Every Array in this crate has an associated DataType
that specifies how its data is laid out in memory and represented.
Thus, a central enum of this crate is
DataType, which contains the set of valid
data types in the specification. For example, DataType::Utf8.
The central trait of this crate is the dynamically typed Array, which
represents a fixed-size, immutable, Send + Sync array of nullable elements. An example of such an array is UInt32Array.
One way to think about an Arrow
Array is as an
Arc<[Option<T>; len]>, where T can be anything ranging from an integer to a string, or even
another Array.
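The mental model above can be sketched with the standard library alone. This only illustrates the analogy and is not how the crate stores data; the helper names model_array and null_count are made up for this example.

```rust
use std::sync::Arc;

/// Builds the shared, immutable slice of nullable values used as a mental model.
/// (Hypothetical helper; the real crate uses buffers, not `Option<T>` slots.)
pub fn model_array(values: Vec<Option<u32>>) -> Arc<[Option<u32>]> {
    Arc::from(values)
}

/// Counts the null slots of the modelled array.
pub fn null_count(array: &[Option<u32>]) -> usize {
    array.iter().filter(|slot| slot.is_none()).count()
}
```

Cloning the Arc returned by model_array shares one allocation instead of copying the values.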
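Arrays have len() and data_type(), and the nullability of each element
can be obtained via
is_null(index). When you hold a specific implementation, such as UInt32Array, typed accessors like value(index) are also available:

    use arrow::array::{Array, UInt32Array};

    // Build a typed array whose middle slot is null.
    let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
    assert_eq!(array.len(), 3);
    assert_eq!(array.value(0), 1);
    assert_eq!(array.is_null(1), true);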
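To make the array dynamically typed, we wrap it in an Arc:

    use std::sync::Arc;
    use arrow::datatypes::DataType;
    use arrow::array::{ArrayRef, UInt32Array};

    let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
    let array: ArrayRef = Arc::new(array);
    assert_eq!(array.len(), 3);
    // array.value(0) is not available in the dynamically-typed version
    assert_eq!(array.is_null(1), true);
    assert_eq!(array.data_type(), &DataType::UInt32);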
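To downcast back to the concrete type, use as_any():

    use std::sync::Arc;
    use arrow::array::{ArrayRef, UInt32Array};

    let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
    let array: ArrayRef = Arc::new(array);
    // as_any() exposes the array as &dyn Any so it can be downcast.
    let array = array.as_any().downcast_ref::<UInt32Array>().unwrap();
    assert_eq!(array.value(0), 1);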
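Downcasting through as_any() relies on a common Rust pattern: the trait exposes a method returning &dyn Any, which callers then downcast to the concrete type. A minimal std-only sketch of that pattern follows; MyArray, U32s and first_u32 are hypothetical stand-ins, not the crate's types.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical stand-in for a dynamically typed array trait.
pub trait MyArray {
    fn len(&self) -> usize;
    /// Exposes the concrete value as `&dyn Any` so callers can downcast it.
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical concrete implementation holding nullable `u32` values.
pub struct U32s(pub Vec<Option<u32>>);

impl MyArray for U32s {
    fn len(&self) -> usize {
        self.0.len()
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns the first value if `array` is a `U32s` and its first slot is non-null.
pub fn first_u32(array: &Arc<dyn MyArray>) -> Option<u32> {
    // `downcast_ref` yields `None` when the concrete type does not match.
    let concrete = array.as_any().downcast_ref::<U32s>()?;
    concrete.0.first().copied().flatten()
}
```

Because downcast_ref returns an Option, a mismatched type surfaces as None rather than a panic; calling unwrap() is only appropriate when the concrete type is known.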
Memory and Buffers
Data in an Array is stored in
ArrayData, which in turn
is a collection of other ArrayData and Buffers.
Buffer is the central struct that array implementations use to keep allocated memory and pointers.
MutableBuffer is the mutable counterpart of Buffer.
These are the lowest abstractions of this crate, and are used throughout the crate to
efficiently allocate, write, read and deallocate memory.
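To make the role of buffers more concrete, here is a simplified, std-only model of a nullable u32 array laid out as two contiguous buffers: one holding values and one holding a validity bitmap (in the Arrow format a set bit marks a valid slot, with bits numbered from the least significant bit). The type NullableU32 is hypothetical and far simpler than the crate's ArrayData and Buffer.

```rust
/// Hypothetical, simplified layout: a values buffer plus a validity bitmap.
pub struct NullableU32 {
    /// One slot per element; null slots hold a placeholder.
    values: Vec<u32>,
    /// Bit `i` set means element `i` is valid (least significant bit first).
    validity: Vec<u8>,
}

impl NullableU32 {
    pub fn from_options(items: &[Option<u32>]) -> Self {
        let mut values = Vec::with_capacity(items.len());
        // One bit per element, rounded up to whole bytes.
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        for (i, item) in items.iter().enumerate() {
            if let Some(v) = item {
                validity[i / 8] |= 1u8 << (i % 8);
                values.push(*v);
            } else {
                values.push(0); // placeholder, never returned as a value
            }
        }
        NullableU32 { values, validity }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }

    pub fn is_null(&self, i: usize) -> bool {
        (self.validity[i / 8] & (1u8 << (i % 8))) == 0
    }

    /// Returns the value at `i`, or `None` if the slot is null.
    pub fn get(&self, i: usize) -> Option<u32> {
        if self.is_null(i) {
            None
        } else {
            Some(self.values[i])
        }
    }
}
```

Keeping nullability in a separate bitmap means the values stay densely packed, which is what lets lower-level code read and write them as plain contiguous memory.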
Field, Schema and RecordBatch
Field is a struct that contains an array's metadata (datatype and whether its values
can be null), and a name.
Schema is a vector of fields with optional metadata.
Together, they form the basis of a schematic representation of a group of Arrays.
RecordBatch is a struct with a
Schema and a vector of
Arrays, all with the same
len. A record batch is the highest-level struct that this crate currently offers
and is broadly used to represent a table where each column is an Array.
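The invariant described above, one column per field and all columns of equal length, can be sketched with a toy std-only model. The names below mirror the crate's types for readability, but every definition here is a hypothetical simplification (columns are plain Vec<Option<i64>>), not the crate's API.

```rust
/// Hypothetical, simplified field: a name plus a nullability flag.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub nullable: bool,
}

/// Toy record batch: a schema (list of fields) plus one column per field.
#[derive(Debug)]
pub struct RecordBatch {
    schema: Vec<Field>,
    columns: Vec<Vec<Option<i64>>>,
}

impl RecordBatch {
    /// Checks the invariants before building the batch.
    pub fn try_new(schema: Vec<Field>, columns: Vec<Vec<Option<i64>>>) -> Result<Self, String> {
        if schema.len() != columns.len() {
            return Err("number of columns must match number of fields".to_string());
        }
        let num_rows = columns.first().map_or(0, |c| c.len());
        for (field, column) in schema.iter().zip(&columns) {
            if column.len() != num_rows {
                return Err(format!("column '{}' has a different length", field.name));
            }
        }
        Ok(RecordBatch { schema, columns })
    }

    pub fn schema(&self) -> &[Field] {
        &self.schema
    }

    pub fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }

    pub fn num_columns(&self) -> usize {
        self.columns.len()
    }
}
```

Validating in the constructor means any batch that exists can be treated as a well-formed table without rechecking column lengths.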
This crate offers many operations (called kernels) that operate on
Arrays, which you can find at [compute::kernels].
It has both vertical and horizontal operations, and some of them have a SIMD implementation.
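As an illustration of the two kinds of operations, here is a std-only sketch. It assumes that "vertical" means element-wise across arrays and "horizontal" means reducing one array to a single value. The functions model, on plain slices, the null handling typical of such kernels; they are hypothetical and are not the crate's kernel signatures.

```rust
/// Vertical (element-wise) operation: a result slot is null if either input slot is null.
pub fn subtract(left: &[Option<i64>], right: &[Option<i64>]) -> Result<Vec<Option<i64>>, String> {
    if left.len() != right.len() {
        return Err("inputs must have the same length".to_string());
    }
    Ok(left
        .iter()
        .zip(right)
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(l - r),
            _ => None,
        })
        .collect())
}

/// Horizontal (aggregating) operation: nulls are skipped, and an empty or
/// all-null input yields `None`.
pub fn sum(values: &[Option<i64>]) -> Option<i64> {
    values.iter().flatten().copied().reduce(|acc, v| acc + v)
}
```

A vertical operation returns an array of the same length as its inputs, while a horizontal one collapses an array into a single nullable scalar.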
This crate implements most of the Arrow specification. Specifically, it supports the following types:
- All Arrow primitive types, such as Int32Array, BooleanArray and Float64Array
- All Arrow variable-length types, such as StringArray and BinaryArray
- All composite types, such as StructArray and ListArray
- Dictionary types (DictionaryArray)
This crate also implements many common vertical operations:
- all mathematical binary operators, such as subtract
- all boolean binary operators, such as equality (eq)
- some string operators, such as substring and length
as well as some horizontal operations, such as min, max and sum.
Finally, this crate implements readers and writers for several formats, including JSON, CSV and IPC.
The Parquet implementation lives in a separate crate.