Crate deltalake

Native Delta Lake implementation in Rust

Usage

Load a Delta Table by path:

async {
  let table = deltalake::open_table("./tests/data/simple_table").await.unwrap();
  let files = table.get_files();
};

Load a specific version of a Delta Table by path, then filter its files by partition values:

async {
  let table = deltalake::open_table_with_version("./tests/data/simple_table", 0).await.unwrap();
  let files = table.get_files_by_partitions(&[deltalake::PartitionFilter {
      key: "month",
      value: deltalake::PartitionValue::Equal("12"),
  }]);
};

Load the version of a Delta Table as of a given datetime:

async {
  let table = deltalake::open_table_with_ds(
      "./tests/data/simple_table",
      "2020-05-02T23:47:31-07:00",
  ).await.unwrap();
  let files = table.get_files();
};

Optional cargo package features

  • s3, gcs, azure - enable the storage backends for AWS S3, Google Cloud Storage (GCS), or Azure Blob Storage / Azure Data Lake Storage Gen2 (ADLS2). Use s3-rustls to build the S3 backend with rustls instead of the native TLS implementation.
  • glue - enable the Glue data catalog integration for working with Delta Tables registered in AWS Glue.
  • datafusion-ext - enable the datafusion::datasource::TableProvider trait implementation for Delta Tables, allowing them to be queried with DataFusion (an example feature configuration is shown after this list).
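
As a sketch, a Cargo.toml dependency declaration that enables the S3 backend and the DataFusion integration could look like the following; the version number is illustrative, so substitute the release you are targeting:

# Version is illustrative; pin the deltalake release you actually use.
deltalake = { version = "0.5", features = ["s3", "datafusion-ext"] }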

Querying Delta Tables with DataFusion

Querying a table on the local filesystem:

use std::sync::Arc;
use datafusion::execution::context::ExecutionContext;

async {
  let mut ctx = ExecutionContext::new();
  let table = deltalake::open_table("./tests/data/simple_table")
      .await
      .unwrap();
  ctx.register_table("demo", Arc::new(table)).unwrap();

  let batches = ctx
      .sql("SELECT * FROM demo").await.unwrap()
      .collect()
      .await.unwrap();
};

It’s important to note that the DataFusion library is evolving quickly, often with breaking API changes, which may cause compilation issues. If you run into problems with the most recently released delta-rs, you can pin DataFusion to a specific branch or commit in your Cargo.toml:

datafusion = { git = "https://github.com/apache/arrow-datafusion.git", rev = "07bc2c754805f536fe1cd873dbe6adfc0a21cbb3" }

Re-exports

pub use self::data_catalog::get_data_catalog;
pub use self::data_catalog::DataCatalog;
pub use self::data_catalog::DataCatalogError;
pub use arrow;
pub use operations::DeltaOps;
pub use parquet;
pub use self::builder::*;
pub use self::delta::*;
pub use self::partitions::*;
pub use self::schema::*;

Modules

action - Actions included in Delta table transaction logs
builder - Create or load DeltaTables
checkpoints - Implementation for writing delta checkpoints
data_catalog - Catalog abstraction for Delta Table
delta - Delta Table read and write implementation
delta_arrow - Conversion between Delta Table schema and Arrow schema
delta_config - Delta Table configuration
operations - High level operations API to interact with Delta tables
optimize - Optimize a Delta Table
partitions - Delta Table partition handling logic
schema - Delta Table schema implementation
storage - Object storage backend abstraction layer for Delta Table transaction logs and data
table_state - The module for delta table state
time_utils - Utility functions for converting time formats
vacuum - Vacuum a Delta table
writer - Abstractions and implementations for writing data to delta tables

Structs

ObjectMeta - The metadata that describes an object
Path - A parsed path representation that can be safely written to object storage

Enums

ObjectStoreError - A specialized Error for object store-related errors

Traits

ObjectStore - Universal API to multiple object store services