Crate re_arrow2


Welcome to arrow2’s documentation. Thanks for checking it out!

This is a library for efficient in-memory data operations using the Arrow in-memory format. It is a ground-up rewrite of the official arrow crate, designed with soundness and type safety in mind.

Check out the guide for an introduction. Below is an example of some of the things you can do with it:

use re_arrow2::array::*;
use re_arrow2::datatypes::{Field, DataType, Schema};
use re_arrow2::compute::arithmetics;
use re_arrow2::error::Result;
// use re_arrow2::io::parquet::write::*;
use re_arrow2::chunk::Chunk;

fn main() -> Result<()> {
    // declare arrays
    let a = Int32Array::from(&[Some(1), None, Some(3)]);
    let b = Int32Array::from(&[Some(2), None, Some(6)]);

    // compute: multiply every element by 2; null entries stay null, so the result equals `b`
    let c = arithmetics::basic::mul_scalar(&a, &2);
    assert_eq!(c, b);

    // declare a schema with fields
    let schema = Schema::from(vec![
        Field::new("c1", DataType::Int32, true),
        Field::new("c2", DataType::Int32, true),
    ]);

    // declare a chunk: a set of arrays that all have the same length
    let chunk = Chunk::new(vec![a.arced(), b.arced()]);

    // // write to parquet (probably the fastest implementation of writing to parquet out there)

    // let options = WriteOptions {
    //     write_statistics: true,
    //     compression: CompressionOptions::Snappy,
    //     version: Version::V1,
    //     data_pagesize_limit: None,
    // };

    // let row_groups = RowGroupIterator::try_new(
    //     vec![Ok(chunk)].into_iter(),
    //     &schema,
    //     options,
    //     vec![vec![Encoding::Plain], vec![Encoding::Plain]],
    // )?;

    // // anything implementing `std::io::Write` works
    // let mut file = vec![];

    // let mut writer = FileWriter::try_new(file, schema, options)?;

    // // Write the file.
    // for group in row_groups {
    //     writer.write(group?)?;
    // }
    // let _ = writer.end(None)?;

    Ok(())
}
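The `mul_scalar` call above returns `None` wherever the input is null. A common way to implement such a nullable operation in Arrow-style libraries is to apply the operation to every value slot, nulls included, and reuse the input's validity mask unchanged, which avoids a branch per element. The std-only sketch below illustrates that idea with a hypothetical `NullableI32` type (a `Vec<i32>` of values plus a `Vec<bool>` mask); it is an assumption-laden illustration, not re_arrow2's implementation or API.

```rust
/// Hypothetical nullable column: `values[i]` is only meaningful when `validity[i]` is true.
#[derive(Debug)]
struct NullableI32 {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl NullableI32 {
    fn from_options(items: &[Option<i32>]) -> Self {
        // Null slots still occupy a value slot; 0 is used as a placeholder.
        let values = items.iter().map(|&x| x.unwrap_or(0)).collect();
        let validity = items.iter().map(|x| x.is_some()).collect();
        NullableI32 { values, validity }
    }

    fn get(&self, i: usize) -> Option<i32> {
        if self.validity[i] {
            Some(self.values[i])
        } else {
            None
        }
    }
}

/// Multiplies every slot by `rhs` without inspecting validity, then copies the mask.
/// `wrapping_mul` is used because values behind null slots are arbitrary and must not panic.
fn mul_scalar(array: &NullableI32, rhs: i32) -> NullableI32 {
    NullableI32 {
        values: array.values.iter().map(|v| v.wrapping_mul(rhs)).collect(),
        validity: array.validity.clone(),
    }
}

fn main() {
    let a = NullableI32::from_options(&[Some(1), None, Some(3)]);
    let c = mul_scalar(&a, 2);
    let got: Vec<Option<i32>> = (0..a.values.len()).map(|i| c.get(i)).collect();
    assert_eq!(got, vec![Some(2), None, Some(6)]);
    println!("{:?}", got);
}
```

The trade-off is that some work is wasted on null slots, in exchange for a tight loop over contiguous values that the compiler can vectorize.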

Cargo features

This crate has a significant number of cargo features to reduce compilation time and the number of dependencies. The feature "full" activates most functionality, such as:

  • io_ipc: to interact with the Arrow IPC format
  • io_ipc_compression: to read and write compressed Arrow IPC (v2)
  • io_csv: to read and write CSV
  • io_json: to read and write JSON
  • io_flight: to read and write to Arrow’s Flight protocol
  • io_parquet: to read and write Parquet
  • io_parquet_compression: to read and write compressed Parquet
  • io_print: to write batches to formatted ASCII tables
  • compute: to operate on arrays (addition, sum, sort, etc.)

The feature simd (not part of full) emits explicit SIMD instructions via std::simd, but requires the nightly toolchain.

Modules

array
Contains the Array and MutableArray trait objects declaring arrays, as well as concrete arrays (such as Utf8Array and MutableUtf8Array).
bitmap
Contains Bitmap and MutableBitmap, containers of bool.
buffer
Contains Buffer, an immutable container for all Arrow physical types (e.g. i32, f64).
chunk
Contains Chunk, a container of Array where every array has the same length.
compute
Contains a wide range of compute operations (e.g. arithmetics, aggregate, filter, comparison, and sort).
datatypes
Contains all metadata, such as PhysicalType, DataType, Field and Schema.
error
Defines Error, representing all errors returned by this crate.
ffi
Contains FFI bindings to import and export Array via Arrow’s C Data Interface.
io
Contains modules to interface with other formats such as csv, parquet, json, ipc, print and avro.
mmap (available on crate feature io_ipc only)
Memory-maps regions defined in the IPC format into Array.
offset
Contains the declaration of Offset.
scalar
Contains the Scalar trait object representing individual items of Arrays, as well as concrete implementations such as BooleanScalar.
temporal_conversions
Conversion methods for dates and times.
trusted_len
Declares TrustedLen.
types
Sealed traits and implementations to handle all physical types used in this crate.
util
Misc utilities used in different places in the crate.
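The buffer, bitmap and offset modules model the Arrow columnar layout. As a rough, std-only illustration of that layout (not re_arrow2's API), the sketch below stores a nullable string column the way the Arrow specification describes a variable-size UTF-8 array: an i32 offsets buffer with one more entry than there are slots, one concatenated byte buffer, and a validity bitmap packed least-significant bit first. The `Utf8Column` type and its methods are hypothetical.

```rust
/// Hypothetical, std-only model of an Arrow variable-size UTF-8 column.
struct Utf8Column {
    offsets: Vec<i32>, // len + 1 entries; slot i spans offsets[i]..offsets[i + 1]
    values: Vec<u8>,   // all strings concatenated
    validity: Vec<u8>, // bit-packed, least-significant bit first
    len: usize,
}

impl Utf8Column {
    fn from_options(items: &[Option<&str>]) -> Self {
        let mut offsets = Vec::with_capacity(items.len() + 1);
        offsets.push(0i32);
        let mut values = Vec::new();
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        for (i, item) in items.iter().enumerate() {
            if let Some(s) = item {
                values.extend_from_slice(s.as_bytes());
                validity[i / 8] |= 1 << (i % 8); // set bit i: slot i is valid
            }
            // A null slot gets an empty range: offsets[i + 1] == offsets[i].
            offsets.push(values.len() as i32);
        }
        Utf8Column { offsets, values, validity, len: items.len() }
    }

    fn is_valid(&self, i: usize) -> bool {
        self.validity[i / 8] & (1 << (i % 8)) != 0
    }

    fn get(&self, i: usize) -> Option<&str> {
        if !self.is_valid(i) {
            return None;
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        // The bytes were copied from whole &str values, so this range is valid UTF-8.
        Some(std::str::from_utf8(&self.values[start..end]).unwrap())
    }
}

fn main() {
    let col = Utf8Column::from_options(&[Some("hello"), None, Some("arrow")]);
    assert_eq!(col.offsets, vec![0, 5, 5, 10]);
    assert_eq!(col.validity, vec![0b0000_0101]);
    let items: Vec<Option<&str>> = (0..col.len).map(|i| col.get(i)).collect();
    assert_eq!(items, vec![Some("hello"), None, Some("arrow")]);
}
```

Because offsets are cumulative, reading slot i needs only two offset lookups, and the null slot costs one offset entry and one bit rather than any bytes in the value buffer.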

Macros

with_match_primitive_without_interval_type (available on crate feature compute_sort only)
Match PrimitiveType to standard Rust types.
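A macro of this kind lets generic code be written once and dispatched on a runtime type tag. The sketch below shows the pattern with a hypothetical three-variant `PrimitiveType` and a hypothetical `with_match_primitive!` macro that aliases a caller-chosen name to the matching Rust type in each arm; the real re_arrow2 macro's invocation syntax and type coverage differ, so treat this only as an illustration of the technique.

```rust
/// Hypothetical runtime tag for a few primitive types.
#[derive(Debug, Clone, Copy)]
enum PrimitiveType {
    Int32,
    Int64,
    Float64,
}

/// Expands to a `match` where, in each arm, `$T` is a type alias for the
/// corresponding Rust type; `$body` is therefore type-checked once per arm.
macro_rules! with_match_primitive {
    ($key:expr, $T:ident => $body:block) => {
        match $key {
            PrimitiveType::Int32 => {
                type $T = i32;
                $body
            }
            PrimitiveType::Int64 => {
                type $T = i64;
                $body
            }
            PrimitiveType::Float64 => {
                type $T = f64;
                $body
            }
        }
    };
}

/// Size in bytes of the Rust type behind a runtime tag.
fn byte_width(ty: PrimitiveType) -> usize {
    with_match_primitive!(ty, T => { std::mem::size_of::<T>() })
}

/// Parses `s` as the tagged type, doubles it, and formats the result.
fn parse_and_double(ty: PrimitiveType, s: &str) -> String {
    with_match_primitive!(ty, T => {
        let v: T = s.parse().unwrap();
        format!("{}", v + v)
    })
}

fn main() {
    for ty in [PrimitiveType::Int32, PrimitiveType::Int64, PrimitiveType::Float64] {
        println!("{:?} is {} bytes wide", ty, byte_width(ty));
    }
    println!("{}", parse_and_double(PrimitiveType::Int32, "21"));
}
```

Each arm's expansion is like a separate monomorphization of a generic function, which is why every arm must produce the same result type.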

Structs

AHashMap
A HashMap using RandomState to hash the items. (Requires the std feature to be enabled.)

Enums

Either
The enum Either with variants Left and Right is a general purpose sum type with two cases.
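One typical use of a two-case sum type in array libraries is a conversion that either succeeds with an owned, mutable value or hands the original shared value back untouched. The std-only sketch below shows that idea with a local stand-in `Either` enum and a hypothetical `try_into_owned` function built on `Arc::try_unwrap`; neither is re_arrow2's API, and the function name is chosen only for illustration.

```rust
use std::sync::Arc;

/// Local stand-in for a general-purpose two-case sum type.
#[derive(Debug)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

/// Returns `Right(Vec)` when this handle is the only owner, so the data can be
/// mutated without copying; otherwise returns the shared `Arc` unchanged as `Left`.
fn try_into_owned(shared: Arc<Vec<i32>>) -> Either<Arc<Vec<i32>>, Vec<i32>> {
    match Arc::try_unwrap(shared) {
        Ok(owned) => Either::Right(owned),
        Err(still_shared) => Either::Left(still_shared),
    }
}

fn main() {
    // Sole owner: we get the Vec back and can mutate it in place.
    match try_into_owned(Arc::new(vec![1, 2, 3])) {
        Either::Right(mut v) => {
            v.push(4);
            assert_eq!(v, vec![1, 2, 3, 4]);
        }
        Either::Left(_) => unreachable!("there is only one owner"),
    }

    // Shared: a second handle exists, so the Arc is returned unchanged.
    let a = Arc::new(vec![1]);
    let b = Arc::clone(&a);
    assert!(matches!(try_into_owned(a), Either::Left(_)));
    assert_eq!(b.len(), 1);
}
```

Returning the original value in `Left` instead of an error lets the caller fall back to copying only when sharing actually prevents in-place mutation.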