Crate delta_kernel

§Delta Kernel

Delta-kernel-rs is an experimental Delta implementation focused on interoperability with a wide range of query engines. It supports reads and experimental writes; the write path currently handles only blind appends. This library defines a number of traits that must be implemented to provide a working Delta implementation; they are detailed below. A provided “default engine” implements all of these traits and can be used to ease integration work. See DefaultEngine for more information.

A full Rust example for reading table data using the default engine can be found in the read-table-single-threaded example; for a more complex multi-threaded reader, see the read-table-multi-threaded example. An example for reading a table's change data using the default engine can be found in the read-table-changes example.

Simple write examples can be found in the write.rs integration tests. Standalone write examples are coming soon!

§Engine traits

The Engine trait allows connectors to bring their own implementation of functionality such as reading Parquet files, listing files in a file system, parsing a JSON string, etc. This trait exposes methods to get sub-engines, which in turn expose the core functionality that connectors can customize.
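The "trait that hands out sub-engines" shape described above can be sketched with the standard library alone. Everything in this sketch (MiniEngine, JsonParser, FileLister, and their methods) is a hypothetical stand-in chosen for illustration, not the actual delta_kernel API; the real signatures are documented on the Engine trait.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for sub-engine traits.
// None of these names or signatures come from delta_kernel.
trait JsonParser: Send + Sync {
    /// Count the non-empty lines of a newline-delimited JSON string.
    fn count_records(&self, ndjson: &str) -> usize;
}

trait FileLister: Send + Sync {
    /// List known paths that start with `prefix`, in sorted order.
    fn list(&self, prefix: &str) -> Vec<String>;
}

/// The top-level trait does no work itself; it only hands out sub-engines.
trait MiniEngine {
    fn json_parser(&self) -> Arc<dyn JsonParser>;
    fn file_lister(&self) -> Arc<dyn FileLister>;
}

struct LineCountingJson;

impl JsonParser for LineCountingJson {
    fn count_records(&self, ndjson: &str) -> usize {
        ndjson.lines().filter(|line| !line.trim().is_empty()).count()
    }
}

struct InMemoryLister {
    paths: Vec<String>,
}

impl FileLister for InMemoryLister {
    fn list(&self, prefix: &str) -> Vec<String> {
        let mut found: Vec<String> = self
            .paths
            .iter()
            .filter(|path| path.starts_with(prefix))
            .cloned()
            .collect();
        found.sort();
        found
    }
}

/// A connector composes whichever sub-engine implementations it prefers.
struct ConnectorEngine {
    json: Arc<dyn JsonParser>,
    files: Arc<dyn FileLister>,
}

impl MiniEngine for ConnectorEngine {
    fn json_parser(&self) -> Arc<dyn JsonParser> {
        Arc::clone(&self.json)
    }
    fn file_lister(&self) -> Arc<dyn FileLister> {
        Arc::clone(&self.files)
    }
}

/// Build an engine over a small, made-up set of paths.
fn sample_engine() -> ConnectorEngine {
    ConnectorEngine {
        json: Arc::new(LineCountingJson),
        files: Arc::new(InMemoryLister {
            paths: vec![
                "log/b.json".to_string(),
                "log/a.json".to_string(),
                "data/x.parquet".to_string(),
            ],
        }),
    }
}
```

Returning each sub-engine behind an Arc of a trait object lets a connector replace one piece (say, the file lister) while reusing the others, which is the kind of customization the paragraph above describes.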

§Expression handling

Expression handling is done via the ExpressionHandler, which in turn allows the creation of ExpressionEvaluators. Each evaluator is created for a specific predicate Expression and evaluates that predicate against batches of data.
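As a rough illustration of the handler-creates-evaluator split, the sketch below models a predicate over a single integer column. The Predicate enum, the Evaluator and Handler traits, and the use of a slice as a "batch" are all simplifying assumptions made for this example; the real ExpressionHandler and ExpressionEvaluator work on engine data and kernel Expression values.

```rust
// Hypothetical, simplified model: a predicate over one integer column.
// These types are illustrative and are not the delta_kernel types.
#[derive(Clone, Debug)]
enum Predicate {
    GreaterThan(i64),
    LessThan(i64),
    And(Box<Predicate>, Box<Predicate>),
}

/// Evaluates one fixed predicate against any number of batches.
trait Evaluator {
    fn evaluate(&self, batch: &[i64]) -> Vec<bool>;
}

/// Creates evaluators; the predicate is bound once, at creation time.
trait Handler {
    fn evaluator(&self, predicate: Predicate) -> Box<dyn Evaluator>;
}

/// Recursively test a single value against the predicate tree.
fn matches(predicate: &Predicate, value: i64) -> bool {
    match predicate {
        Predicate::GreaterThan(bound) => value > *bound,
        Predicate::LessThan(bound) => value < *bound,
        Predicate::And(left, right) => matches(left, value) && matches(right, value),
    }
}

struct SimpleEvaluator {
    predicate: Predicate,
}

impl Evaluator for SimpleEvaluator {
    fn evaluate(&self, batch: &[i64]) -> Vec<bool> {
        // One boolean per input row, in the same order as the batch.
        batch.iter().map(|value| matches(&self.predicate, *value)).collect()
    }
}

struct SimpleHandler;

impl Handler for SimpleHandler {
    fn evaluator(&self, predicate: Predicate) -> Box<dyn Evaluator> {
        Box::new(SimpleEvaluator { predicate })
    }
}
```

Binding the predicate when the evaluator is created means any per-predicate setup happens once, and the same evaluator can then be reused for every batch.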

§File system interactions

Delta Kernel needs to perform some basic operations against file systems, such as listing and reading files. These interactions are encapsulated in the FileSystemClient trait. Implementors must take care that all assumptions about the behavior of the functions (for example, that results are returned sorted) are respected.
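The sorted-results assumption is easy to violate by accident when the backing store has no inherent order. The sketch below uses a hypothetical ListFiles trait over an in-memory HashMap; the trait, its list_from method, and the "paths greater than or equal to the start path" semantics are assumptions made for illustration, so check the FileSystemClient documentation for the actual contract.

```rust
use std::collections::HashMap;

// Hypothetical, simplified listing contract (not the delta_kernel trait):
// return every path that compares >= `start`, in ascending order.
trait ListFiles {
    fn list_from(&self, start: &str) -> Vec<String>;
}

struct InMemoryStore {
    files: HashMap<String, Vec<u8>>,
}

impl ListFiles for InMemoryStore {
    fn list_from(&self, start: &str) -> Vec<String> {
        let mut paths: Vec<String> = self
            .files
            .keys()
            .filter(|path| path.as_str() >= start)
            .cloned()
            .collect();
        // HashMap iteration order is arbitrary, so the sort must be explicit.
        paths.sort();
        paths
    }
}

/// Helper a connector's tests could use to check the ordering assumption.
fn is_sorted_ascending(paths: &[String]) -> bool {
    paths.windows(2).all(|pair| pair[0] <= pair[1])
}

/// Build a store holding three made-up commit files.
fn sample_store() -> InMemoryStore {
    let mut files = HashMap::new();
    for version in [2u64, 0, 1] {
        // Zero-padded names make lexicographic order match numeric order.
        files.insert(format!("_delta_log/{version:020}.json"), Vec::new());
    }
    InMemoryStore { files }
}
```

An implementation backed by an object store or a directory walk would need the same explicit ordering step unless the backend already guarantees it.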

§Reading log and data files

Delta Kernel requires the capability to read and write JSON files and to read Parquet files, which is exposed via the JsonHandler and ParquetHandler respectively. When reading files, connectors are asked to provide the context information required to execute the actual operation. This is done by invoking methods on the FileSystemClient trait.

Re-exports§

pub use engine_data::EngineData;
pub use engine_data::RowVisitor;
pub use error::DeltaResult;
pub use error::Error;
pub use expressions::Expression;
pub use expressions::ExpressionRef;
pub use table::Table;
pub use delta_kernel_derive;

Modules§

actions
Provides parsing and manipulation of the various actions defined in the Delta specification
arrow
This module exists to help re-export the version of arrow used by default-engine and other parts of kernel that need arrow
engine (features: default-engine or sync-engine or arrow-conversion)
Provides engine implementations that implement the required traits. These engines can optionally be built into the kernel by setting the default-engine or sync-engine feature flags. See the related modules for more information.
engine_data
Traits that engines need to implement in order to pass data between themselves and kernel.
error
Definitions of errors that the delta kernel can encounter
expressions
Definitions and functions to create and manipulate kernel expressions
log_segment (feature: developer-visibility)
Represents a segment of a delta log. LogSegment wraps a set of checkpoint and commit files.
parquet
This module exists to help re-export the version of parquet used by default-engine and other parts of kernel that need parquet
path (feature: developer-visibility)
Utilities to make working with directory and file paths easier
scan
Functionality to create and execute scans (reads) over data stored in a delta table
schema
Definitions and functions to create and manipulate kernel schema
snapshot
In-memory representation of snapshots of tables (a snapshot is a table at a given point in time; it has a schema, etc.)
table
In-memory representation of a Delta table, which acts as an immutable root entity for reading the different versions
table_changes
Provides an API to read the table’s change data feed between two versions.
table_configuration
This module defines TableConfiguration, a high-level API to check feature support and feature enablement for a table at a given version. This encapsulates Protocol, Metadata, Schema, TableProperties, and ColumnMappingMode. These structs in isolation should be considered raw and unvalidated if they are not part of a TableConfiguration. We unify these fields because they are deeply intertwined when dealing with table features. For example, to check that deletion vector writes are enabled, you must check both the protocol’s reader/writer features and that the deletion vector table property is enabled in the TableProperties.
table_features
table_properties
Delta Table properties. Note this module implements per-table configuration which governs how table-level capabilities/properties are configured (turned on/off etc.). This is orthogonal to protocol-level ‘table features’ which enable or disable reader/writer features (which then usually must be enabled/configured by table properties).
transaction
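
The table_configuration entry above notes that deletion vector writes require two independent checks. A minimal sketch of that combined check follows, assuming simplified Protocol and TableConfig structs that are hypothetical and far smaller than the real types. The feature name and property key follow the Delta protocol's naming, but the method names here are invented for illustration.

```rust
use std::collections::{HashMap, HashSet};

// Names as used by the Delta protocol; the structs below are hypothetical.
const DV_FEATURE: &str = "deletionVectors";
const DV_PROPERTY: &str = "delta.enableDeletionVectors";

struct Protocol {
    reader_features: HashSet<String>,
    writer_features: HashSet<String>,
}

struct TableConfig {
    protocol: Protocol,
    properties: HashMap<String, String>,
}

impl TableConfig {
    /// Protocol-level check: the feature is listed for readers and writers.
    fn dv_supported(&self) -> bool {
        self.protocol.reader_features.contains(DV_FEATURE)
            && self.protocol.writer_features.contains(DV_FEATURE)
    }

    /// Combined check: protocol support AND the table property set to "true".
    fn dv_writes_enabled(&self) -> bool {
        self.dv_supported()
            && self
                .properties
                .get(DV_PROPERTY)
                .map_or(false, |value| value.as_str() == "true")
    }
}

/// Build a config with or without the feature and the property.
fn make_config(has_feature: bool, property: Option<&str>) -> TableConfig {
    let features: HashSet<String> = if has_feature {
        HashSet::from([DV_FEATURE.to_string()])
    } else {
        HashSet::new()
    };
    let mut properties = HashMap::new();
    if let Some(value) = property {
        properties.insert(DV_PROPERTY.to_string(), value.to_string());
    }
    TableConfig {
        protocol: Protocol {
            reader_features: features.clone(),
            writer_features: features,
        },
        properties,
    }
}
```

Keeping both checks behind one method is the point of unifying the fields: a caller that inspects only the property, or only the protocol, can reach the wrong answer.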

Structs§

FileMeta
The metadata that describes an object.

Traits§

AsAny
Extension trait that makes it easier to work with trait objects that implement Any, implemented automatically for any type that satisfies Any, Send, and Sync. In particular, given some trait T: Any + Send + Sync, it allows upcasting T to dyn Any + Send + Sync, which in turn allows downcasting the result to a concrete type.
Engine
The Engine trait encapsulates all the functionality an engine or connector needs to provide to the Delta Kernel in order to read the Delta table.
ExpressionEvaluator
Trait for implementing an Expression evaluator.
ExpressionHandler
Provides expression evaluation capability to Delta Kernel.
FileSystemClient
Provides file system related functionalities to Delta Kernel.
JsonHandler
Provides JSON handling functionality to Delta Kernel.
ParquetHandler
Provides Parquet file related functionalities to Delta Kernel.
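
The upcast-then-downcast pattern described for AsAny can be reproduced with the standard library. The sketch below defines its own AsAnySketch trait with a single as_any method; that trait, along with Shape, Circle, and Square, is hypothetical and only mirrors the idea, so consult the AsAny documentation for the methods the crate actually provides.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical re-creation of the pattern; not the delta_kernel trait.
trait AsAnySketch: Any + Send + Sync {
    /// Upcast a shared pointer to `dyn Any` so it can be downcast later.
    fn as_any(self: Arc<Self>) -> Arc<dyn Any + Send + Sync>;
}

// Blanket impl: every sized `Any + Send + Sync` type gets the upcast for free.
impl<T: Any + Send + Sync> AsAnySketch for T {
    fn as_any(self: Arc<Self>) -> Arc<dyn Any + Send + Sync> {
        self
    }
}

// A user trait opts in by naming the extension trait as a supertrait.
trait Shape: AsAnySketch {
    fn name(&self) -> &'static str;
}

struct Circle {
    radius: u32,
}

impl Shape for Circle {
    fn name(&self) -> &'static str {
        "circle"
    }
}

struct Square;

impl Shape for Square {
    fn name(&self) -> &'static str {
        "square"
    }
}

/// Recover the concrete type behind a trait object, if it is a `Circle`.
fn radius_of(shape: Arc<dyn Shape>) -> Option<u32> {
    shape
        .as_any() // Arc<dyn Shape> -> Arc<dyn Any + Send + Sync>
        .downcast::<Circle>() // Err(..) when the concrete type differs
        .ok()
        .map(|circle| circle.radius)
}
```

Because the blanket impl covers every eligible type, implementors of Shape never write the upcast themselves; they only implement name.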

Type Aliases§

FileDataReadResult
Data read from a Delta table file and the corresponding scan file information.
FileDataReadResultIterator
An iterator of data read from specified files
FileSlice
A specification for a range of bytes to read from a file location
Version
A Delta table version is an 8-byte unsigned integer