Welcome to arrow2’s documentation. Thanks for checking it out!
This is a library for efficient in-memory data operations with the
Arrow in-memory format.
It is a re-write from the bottom up of the official arrow crate with soundness
and type safety in mind.
Check out the guide for an introduction. Below is an example of some of the things you can do with it:
use std::sync::Arc;
use arrow2::array::*;
use arrow2::datatypes::{Field, DataType, Schema};
use arrow2::compute::arithmetics;
use arrow2::error::Result;
use arrow2::io::parquet::write::*;
use arrow2::chunk::Chunk;
fn main() -> Result<()> {
    // declare arrays
    let a = Int32Array::from(&[Some(1), None, Some(3)]);
    let b = Int32Array::from(&[Some(2), None, Some(6)]);
    // compute (probably the fastest implementation of a nullable op you can find out there)
    let c = arithmetics::basic::mul_scalar(&a, &2);
    assert_eq!(c, b);
    // declare a schema with fields
    let schema = Schema::from(vec![
        Field::new("c1", DataType::Int32, true),
        Field::new("c2", DataType::Int32, true),
    ]);
    // declare chunk
    let chunk = Chunk::new(vec![a.arced(), b.arced()]);
    // write to parquet (probably the fastest implementation of writing to parquet out there)
    let options = WriteOptions {
        write_statistics: true,
        compression: CompressionOptions::Snappy,
        version: Version::V1,
        data_pagesize_limit: None,
    };
    let row_groups = RowGroupIterator::try_new(
        vec![Ok(chunk)].into_iter(),
        &schema,
        options,
        vec![vec![Encoding::Plain], vec![Encoding::Plain]],
    )?;
    // anything implementing `std::io::Write` works
    let file = vec![];
    let mut writer = FileWriter::try_new(file, schema, options)?;
    // Write the file.
    for group in row_groups {
        writer.write(group?)?;
    }
    let _ = writer.end(None)?;
    Ok(())
}
§Cargo features
This crate has a significant number of cargo features to reduce compilation
time and number of dependencies. The feature "full" activates most
functionality, such as:
- io_ipc: to interact with the Arrow IPC format
- io_ipc_compression: to read and write compressed Arrow IPC (v2)
- io_csv: to read and write CSV
- io_json: to read and write JSON
- io_flight: to read and write to Arrow’s Flight protocol
- io_parquet: to read and write parquet
- io_parquet_compression: to read and write compressed parquet
- io_print: to write batches to formatted ASCII tables
- compute: to operate on arrays (addition, sum, sort, etc.)
The feature simd (not part of full) produces more explicit SIMD instructions
via std::simd, but requires the
nightly channel.
Modules§
- array
- Contains the Array and MutableArray trait objects declaring arrays, as well as concrete arrays (such as Utf8Array and MutableUtf8Array).
- bitmap
- Contains Bitmap and MutableBitmap, containers of bool.
- buffer
- Contains Buffer, an immutable container for all Arrow physical types (e.g. i32, f64).
- chunk
- Contains Chunk, a container of Array where every array has the same length.
- compute
- Contains a wide range of compute operations (e.g. arithmetics, aggregate, filter, comparison, and sort).
- datatypes
- Contains all metadata, such as PhysicalType, DataType, Field and Schema.
- error
- Defines Error, representing all errors returned by this crate.
- ffi
- Contains FFI bindings to import and export Array via Arrow’s C Data Interface.
- io
- Contains modules to interface with other formats such as csv, parquet, json, ipc, print and avro.
- mmap (io_ipc feature)
- Memory maps regions defined on the IPC format into Array.
- offset
- Contains the declaration of Offset
- scalar
- Contains the Scalar trait object representing individual items of Arrays, as well as concrete implementations such as BooleanScalar.
- temporal_conversions
- Conversion methods for dates and times.
- trusted_len
- Declares TrustedLen.
- types
- Sealed traits and implementations to handle all physical types used in this crate.
- util
- Misc utilities used in different places in the crate.
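The array and bitmap modules above rest on the Arrow layout for nullable data: a contiguous values buffer paired with a bit-packed validity bitmap, where bit i (least-significant bit first) says whether slot i is valid. The following is a minimal, standard-library-only sketch of that idea, mirroring the `[Some(1), None, Some(3)]` times 2 computation from the example at the top of this page. It is not arrow2's implementation: `NullableI32` and all of its methods are hypothetical names chosen for illustration, and the claim that a scalar kernel can reuse the validity bitmap unchanged is a property of this sketch, not a statement about how arrow2's `mul_scalar` is written.

```rust
/// Hypothetical, simplified stand-in for a nullable 32-bit integer array.
/// Not arrow2's types: a values buffer plus a bit-packed validity bitmap.
struct NullableI32 {
    values: Vec<i32>,
    /// Bit `i` lives in byte `i / 8` at position `i % 8` (LSB first);
    /// 1 means valid, 0 means null.
    validity: Vec<u8>,
}

impl NullableI32 {
    /// Build the two buffers from a slice of optional values.
    fn from_options(items: &[Option<i32>]) -> Self {
        let mut values = Vec::with_capacity(items.len());
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity[i / 8] |= 1u8 << (i % 8);
                }
                // Null slots still occupy a value slot; the bitmap masks it.
                None => values.push(0),
            }
        }
        Self { values, validity }
    }

    /// Check the validity bit of slot `i`.
    fn is_valid(&self, i: usize) -> bool {
        (self.validity[i / 8] & (1u8 << (i % 8))) != 0
    }

    /// Read slot `i`, returning `None` when the slot is null.
    fn get(&self, i: usize) -> Option<i32> {
        if self.is_valid(i) {
            Some(self.values[i])
        } else {
            None
        }
    }

    /// Multiply every slot by a scalar without branching on nullness:
    /// values are transformed in one pass and the bitmap is copied as-is.
    fn mul_scalar(&self, rhs: i32) -> Self {
        Self {
            values: self.values.iter().map(|v| v.wrapping_mul(rhs)).collect(),
            validity: self.validity.clone(),
        }
    }
}

fn main() {
    let a = NullableI32::from_options(&[Some(1), None, Some(3)]);
    let c = a.mul_scalar(2);
    let out: Vec<Option<i32>> = (0..3).map(|i| c.get(i)).collect();
    assert_eq!(out, vec![Some(2), None, Some(6)]);
}
```

Keeping values and validity in separate buffers is what lets the loop over `values` stay free of per-element null checks in this sketch; the null slot's placeholder value is computed but never observed because `get` consults the bitmap first.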
Macros§
- with_match_primitive_without_interval_type
- Match PrimitiveType to standard Rust types
Structs§
- AHashMap
- A HashMap using RandomState to hash the items. (Requires the std feature to be enabled.)
Enums§
- Either
- The enum Either with variants Left and Right is a general purpose sum type with two cases.