§Delta Kernel
Delta-kernel-rs is an experimental Delta implementation
focused on interoperability with a wide range of query engines. It supports reads and
(experimental) writes; the write path currently supports only blind appends. This library defines a
number of traits which must be implemented to provide a working Delta implementation. They are
detailed below. A provided “default engine” implements all of these traits and can
be used to ease integration work. See DefaultEngine for more information.
A full Rust example for reading table data using the default engine can be found in the
read-table-single-threaded example (and for a more complex multi-threaded reader see the
read-table-multi-threaded example). An example for reading the table changes for a table
using the default engine can be found in the read-table-changes example.
Simple write examples can be found in the write.rs integration tests. Standalone write
examples are coming soon!
§Engine traits
The Engine trait allows connectors to bring their own implementation of functionality such
as reading Parquet files, listing files in a file system, or parsing a JSON string. This
trait exposes methods to get sub-engines, which expose the core functionality that
connectors can customize.
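The shape described above can be sketched as follows. This is a minimal illustration, not the kernel's actual API: the `StorageHandler` and `JsonHandler` traits here are simplified stand-ins with a hypothetical `name` method, and `LocalStorage`, `SimpleJson`, `MyEngine`, and `describe` are invented for the example. The point is only that the engine hands out shared sub-engines and does no work itself.

```rust
use std::sync::Arc;

// Simplified stand-ins for two of the kernel's handler traits.
// The real traits have richer, different method signatures.
trait StorageHandler: Send + Sync {
    fn name(&self) -> &'static str;
}

trait JsonHandler: Send + Sync {
    fn name(&self) -> &'static str;
}

// The engine performs no I/O itself; it only hands out its sub-engines.
trait Engine: Send + Sync {
    fn storage_handler(&self) -> Arc<dyn StorageHandler>;
    fn json_handler(&self) -> Arc<dyn JsonHandler>;
}

// Hypothetical connector-provided implementations.
struct LocalStorage;
impl StorageHandler for LocalStorage {
    fn name(&self) -> &'static str {
        "local-storage"
    }
}

struct SimpleJson;
impl JsonHandler for SimpleJson {
    fn name(&self) -> &'static str {
        "simple-json"
    }
}

struct MyEngine {
    storage: Arc<LocalStorage>,
    json: Arc<SimpleJson>,
}

impl MyEngine {
    fn new() -> Self {
        Self {
            storage: Arc::new(LocalStorage),
            json: Arc::new(SimpleJson),
        }
    }
}

impl Engine for MyEngine {
    // Cloning an Arc is cheap; callers share the same handler instance.
    fn storage_handler(&self) -> Arc<dyn StorageHandler> {
        self.storage.clone()
    }
    fn json_handler(&self) -> Arc<dyn JsonHandler> {
        self.json.clone()
    }
}

// Code written against `dyn Engine` never needs to know the concrete types.
fn describe(engine: &dyn Engine) -> Vec<&'static str> {
    vec![engine.storage_handler().name(), engine.json_handler().name()]
}

fn main() {
    let engine = MyEngine::new();
    println!("{:?}", describe(&engine));
}
```

Returning `Arc<dyn …>` keeps each sub-engine independently replaceable, so a connector could swap its JSON handling without touching storage.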
§Expression handling
Expression handling is done via the EvaluationHandler, which in turn allows the creation of
ExpressionEvaluators. These evaluators are created for a specific predicate Expression
and allow evaluation of that predicate for specific batches of data.
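To illustrate the handler/evaluator split, here is a small sketch under stated assumptions. The `Predicate` enum, the `Batch` alias, and the `new_evaluator` and `evaluate` signatures are hypothetical simplifications; the kernel's real traits operate on its own expression and engine data types. What the sketch shows is that an evaluator is bound to one predicate at creation time and then reused across batches.

```rust
use std::sync::Arc;

// A tiny predicate language standing in for the kernel's Expression type.
#[derive(Clone, Copy, Debug)]
enum Predicate {
    GreaterThan(i64),
    LessThan(i64),
}

// A batch is just a column of integers in this sketch.
type Batch = Vec<i64>;

// Evaluates one fixed predicate against any number of batches.
trait ExpressionEvaluator: Send + Sync {
    fn evaluate(&self, batch: &Batch) -> Vec<bool>;
}

// Factory for evaluators; one evaluator per predicate.
trait EvaluationHandler: Send + Sync {
    fn new_evaluator(&self, predicate: Predicate) -> Arc<dyn ExpressionEvaluator>;
}

struct SimpleEvaluator {
    predicate: Predicate,
}

impl ExpressionEvaluator for SimpleEvaluator {
    fn evaluate(&self, batch: &Batch) -> Vec<bool> {
        batch
            .iter()
            .map(|value| match self.predicate {
                Predicate::GreaterThan(n) => *value > n,
                Predicate::LessThan(n) => *value < n,
            })
            .collect()
    }
}

struct SimpleHandler;

impl EvaluationHandler for SimpleHandler {
    fn new_evaluator(&self, predicate: Predicate) -> Arc<dyn ExpressionEvaluator> {
        Arc::new(SimpleEvaluator { predicate })
    }
}

fn main() {
    let handler = SimpleHandler;
    // Create the evaluator once, then apply it to several batches.
    let evaluator = handler.new_evaluator(Predicate::GreaterThan(2));
    for batch in [vec![1, 2, 3, 4], vec![10, 0]] {
        println!("{:?}", evaluator.evaluate(&batch));
    }
    let below = handler.new_evaluator(Predicate::LessThan(0));
    println!("{:?}", below.evaluate(&vec![-1, 1]));
}
```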
§File system interactions
Delta Kernel needs to perform some basic operations against file systems like listing and
reading files. These interactions are encapsulated in the StorageHandler trait.
Implementers must take care that all assumptions about the behavior of the functions, such as sorted
results, are respected.
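As an illustration of the sorted-results assumption, the sketch below backs a hypothetical in-memory store with a `BTreeMap`, so listings come back in lexicographic order no matter how files were inserted. The `InMemoryStorage` type and its `put`, `list_from`, and `read` methods are invented for this example and are not the kernel's `StorageHandler` signatures; in particular, whether the starting path itself is included in a listing is an assumption here (this sketch includes it).

```rust
use std::collections::BTreeMap;

// Hypothetical in-memory store; keys are full paths, values are file bytes.
struct InMemoryStorage {
    files: BTreeMap<String, Vec<u8>>,
}

impl InMemoryStorage {
    fn new() -> Self {
        Self { files: BTreeMap::new() }
    }

    fn put(&mut self, path: &str, data: &[u8]) {
        self.files.insert(path.to_string(), data.to_vec());
    }

    // Lists every path at or after `start`, in lexicographic order.
    // BTreeMap iteration is ordered by key, so no extra sort is needed.
    fn list_from(&self, start: &str) -> Vec<String> {
        self.files
            .range(start.to_string()..)
            .map(|(path, _)| path.clone())
            .collect()
    }

    fn read(&self, path: &str) -> Option<&[u8]> {
        self.files.get(path).map(|data| data.as_slice())
    }
}

fn main() {
    let mut storage = InMemoryStorage::new();
    // Insert out of order on purpose.
    storage.put("log/00000000000000000002.json", b"c");
    storage.put("log/00000000000000000000.json", b"a");
    storage.put("log/00000000000000000001.json", b"b");

    for path in storage.list_from("log/00000000000000000001.json") {
        println!("{path}");
    }
}
```

A store that lists in arbitrary order (many object stores and directory APIs do) would need an explicit sort before returning results to satisfy the same expectation.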
§Reading log and data files
Delta Kernel requires the capability to read and write JSON files and to read Parquet files, which
is exposed via the JsonHandler and ParquetHandler traits, respectively. When reading files,
connectors are asked to provide the context information required to execute the actual
operation. This is done by invoking methods on the StorageHandler trait.
Re-exports§
pub use engine_data::EngineData;
pub use engine_data::RowVisitor;
pub use error::DeltaResult;
pub use error::Error;
pub use expressions::Expression;
pub use expressions::ExpressionRef;
pub use expressions::Predicate;
pub use expressions::PredicateRef;
pub use table::Table;
pub use delta_kernel_derive;
pub use arrow_55 as arrow;
pub use object_store_55 as object_store;
pub use parquet_55 as parquet;
Modules§
- actions
- Provides parsing and manipulation of the various actions defined in the Delta specification
- checkpoint
- This module implements the API for writing single-file checkpoints.
- engine (feature default-engine or arrow-conversion)
- Provides an engine implementation that implements the required traits. The engine can optionally
be built into the kernel by setting the default-engine feature flag. See the related module for more information.
- engine_data
- Traits that engines need to implement in order to pass data between themselves and kernel.
- error
- Definitions of errors that the delta kernel can encounter
- expressions
- Definitions and functions to create and manipulate kernel expressions
- history_manager (internal-api)
- kernel_predicates
- Support for kernel-driven predicate evaluation via the KernelPredicateEvaluator trait. Various trait implementations are used for partition pruning, stats-based data skipping, and parquet row group filtering. The evaluation is normally performed over Scalar values, but data skipping “evaluation” actually produces a transformed predicate that replaces column references with stats column references, which log replay will instruct the engine to evaluate.
- log_replay (internal-api)
- This module provides log replay utilities.
- log_segment (internal-api)
- Represents a segment of a delta log. LogSegment wraps a set of checkpoint and commit files.
- path (internal-api)
- Utilities to make working with directory and file paths easier
- scan
- Functionality to create and execute scans (reads) over data stored in a delta table
- schema
- Definitions and functions to create and manipulate kernel schema
- snapshot
- In-memory representation of snapshots of tables (snapshot is a table at given point in time, it has schema etc.)
- table
- In-memory representation of a Delta table, which acts as an immutable root entity for reading the different versions
- table_changes
- Provides an API to read the table’s change data feed between two versions.
- table_configuration
- This module defines TableConfiguration, a high-level API to check feature support and feature enablement for a table at a given version. This encapsulates Protocol, Metadata, Schema, TableProperties, and ColumnMappingMode. These structs in isolation should be considered raw and unvalidated if they are not a part of TableConfiguration. We unify these fields because they are deeply intertwined when dealing with table features. For example: to check that deletion vector writes are enabled, you must check the protocol’s reader/writer features and also ensure that the deletion vector table property is enabled in the TableProperties.
- table_features
- table_properties
- Delta Table properties. Note this module implements per-table configuration which governs how table-level capabilities/properties are configured (turned on/off etc.). This is orthogonal to protocol-level ‘table features’ which enable or disable reader/writer features (which then usually must be enabled/configured by table properties).
- transaction
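
The deletion vector check described for table_configuration above can be sketched as follows. This is an illustration only: the struct layouts and the `is_deletion_vector_supported` and `is_deletion_vector_enabled` method names are hypothetical, not the kernel's definitions. The feature name `deletionVectors` and the property `delta.enableDeletionVectors` come from the Delta protocol.

```rust
// Hypothetical, simplified versions of the structs that the module unifies.
struct Protocol {
    reader_features: Vec<String>,
    writer_features: Vec<String>,
}

struct TableProperties {
    // Parsed value of the `delta.enableDeletionVectors` table property.
    enable_deletion_vectors: Option<bool>,
}

struct TableConfiguration {
    protocol: Protocol,
    table_properties: TableProperties,
}

const DELETION_VECTORS: &str = "deletionVectors";

impl TableConfiguration {
    // Supported: the protocol lists the feature for both readers and writers.
    fn is_deletion_vector_supported(&self) -> bool {
        let has = |features: &[String]| features.iter().any(|f| f == DELETION_VECTORS);
        has(&self.protocol.reader_features) && has(&self.protocol.writer_features)
    }

    // Enabled: supported by the protocol AND switched on by the table property.
    // Checking only one of the two would give a wrong answer.
    fn is_deletion_vector_enabled(&self) -> bool {
        self.is_deletion_vector_supported()
            && self.table_properties.enable_deletion_vectors == Some(true)
    }
}

fn config(features: &[&str], property: Option<bool>) -> TableConfiguration {
    let features: Vec<String> = features.iter().map(|f| f.to_string()).collect();
    TableConfiguration {
        protocol: Protocol {
            reader_features: features.clone(),
            writer_features: features,
        },
        table_properties: TableProperties {
            enable_deletion_vectors: property,
        },
    }
}

fn main() {
    let supported_only = config(&["deletionVectors"], None);
    let enabled = config(&["deletionVectors"], Some(true));
    let property_only = config(&[], Some(true));
    println!("{}", supported_only.is_deletion_vector_enabled());
    println!("{}", enabled.is_deletion_vector_enabled());
    println!("{}", property_only.is_deletion_vector_enabled());
}
```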
Structs§
- FileMeta
- The metadata that describes an object.
Traits§
- AsAny
- Extension trait that makes it easier to work with trait objects that implement Any, implemented automatically for any type that satisfies Any, Send, and Sync. In particular, given some trait T: Any + Send + Sync, it allows upcasting T to dyn Any + Send + Sync, which in turn allows downcasting the result to a concrete type.
- Engine
- The Engine trait encapsulates all the functionality an engine or connector needs to provide to the Delta Kernel in order to read the Delta table.
- EvaluationHandler
- Provides expression evaluation capability to Delta Kernel.
- ExpressionEvaluator
- Trait for implementing an Expression evaluator.
- JsonHandler
- Provides JSON handling functionality to Delta Kernel.
- ParquetHandler
- Provides Parquet file related functionalities to Delta Kernel.
- PredicateEvaluator
- Trait for implementing a Predicate evaluator.
- StorageHandler
- Provides file system related functionalities to Delta Kernel.
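
The upcast-then-downcast pattern described for AsAny can be sketched like this. The trait below is a reduced illustration with a single `as_any` method and a blanket implementation; it should not be read as the kernel's exact definition. `Handler`, `Concrete`, `Other`, and `label_of` are hypothetical names used only for the example.

```rust
use std::any::Any;
use std::sync::Arc;

// Reduced sketch of an AsAny-style extension trait.
trait AsAny: Any + Send + Sync {
    // Upcasts a shared trait object to `dyn Any + Send + Sync`.
    fn as_any(self: Arc<Self>) -> Arc<dyn Any + Send + Sync>;
}

// Blanket implementation: every eligible sized type gets it for free.
impl<T: Any + Send + Sync> AsAny for T {
    fn as_any(self: Arc<Self>) -> Arc<dyn Any + Send + Sync> {
        self
    }
}

// A hypothetical trait that wants its objects to be downcastable.
trait Handler: AsAny {
    fn id(&self) -> u32;
}

struct Concrete {
    id: u32,
    label: String,
}

impl Handler for Concrete {
    fn id(&self) -> u32 {
        self.id
    }
}

struct Other;

impl Handler for Other {
    fn id(&self) -> u32 {
        0
    }
}

// Recovers the concrete type behind the trait object, if it is `Concrete`.
fn label_of(handler: Arc<dyn Handler>) -> Option<String> {
    handler
        .as_any() // Arc<dyn Handler> -> Arc<dyn Any + Send + Sync>
        .downcast::<Concrete>() // -> Result<Arc<Concrete>, _>
        .ok()
        .map(|concrete| concrete.label.clone())
}

fn main() {
    let handler: Arc<dyn Handler> = Arc::new(Concrete {
        id: 7,
        label: "primary".to_string(),
    });
    println!("id = {}", handler.id());
    println!("{:?}", label_of(handler));
    println!("{:?}", label_of(Arc::new(Other)));
}
```

Taking `self: Arc<Self>` keeps the method callable on `Arc<dyn Handler>` directly; a `&self` variant needs more care, because the blanket implementation also applies to the `Arc` itself.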
Type Aliases§
- FileDataReadResult
- Data read from a Delta table file and the corresponding scan file information.
- FileDataReadResultIterator
- An iterator of data read from specified files
- FileIndex
- FileSize
- FileSlice
- A specification for a range of bytes to read from a file location
- Version
- A Delta table version, stored as an 8-byte unsigned integer
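
As a small illustration of how a version value is typically used, the sketch below maps a `Version` to a commit file name and back. The alias mirrors the description above (an 8-byte unsigned integer); the 20-digit zero-padded `.json` naming follows the Delta protocol's convention for commit files. The helper names `commit_file_name` and `parse_commit_version` are hypothetical and not part of this crate.

```rust
// Mirrors the documented alias: an 8-byte unsigned integer.
type Version = u64;

// Commit files in the Delta log are named with a 20-digit, zero-padded version.
fn commit_file_name(version: Version) -> String {
    format!("{version:020}.json")
}

// Parses a commit file name back into a version.
// Returns None for names that do not follow the convention.
fn parse_commit_version(file_name: &str) -> Option<Version> {
    let stem = file_name.strip_suffix(".json")?;
    if stem.len() != 20 || !stem.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // `parse` also rejects 20-digit values that overflow u64.
    stem.parse().ok()
}

fn main() {
    let name = commit_file_name(10);
    println!("{name}");
    println!("{:?}", parse_commit_version(&name));
    println!("{:?}", parse_commit_version("not-a-commit.json"));
}
```

Zero padding makes lexicographic order match numeric order, which is why sorted file listings are enough to replay commits in sequence.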