§Delta Kernel
Delta-kernel-rs is an experimental Delta implementation focused on interoperability with a wide
range of query engines. It supports reads and (experimental) writes; the write path currently
supports only blind appends. This library defines a number of traits which must be implemented
to provide a working Delta implementation. They are detailed below. A provided “default engine”
implements all of these traits and can be used to ease integration work. See DefaultEngine for
more information.
A full Rust example for reading table data using the default engine can be found in the
read-table-single-threaded example (for a more complex multi-threaded reader, see the
read-table-multi-threaded example). An example for reading the table changes for a table
using the default engine can be found in the read-table-changes example.
Simple write examples can be found in the write.rs integration tests. Standalone write
examples are coming soon!
§Engine traits
The Engine trait allows connectors to bring their own implementation of functionality such
as reading Parquet files, listing files in a file system, parsing a JSON string, etc. This
trait exposes methods to get sub-engines which expose the core functionalities customizable by
connectors.
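The "trait that hands out sub-engines" shape described above can be sketched in plain Rust. This is only an illustration of the pattern: `MiniEngine`, `JsonParser`, `FileLister`, and `MyConnector` are hypothetical names invented for this sketch and are not the kernel's actual traits or method signatures.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for the kernel's handler traits.
trait JsonParser: Send + Sync {
    fn name(&self) -> &'static str;
}

trait FileLister: Send + Sync {
    fn name(&self) -> &'static str;
}

// Hypothetical top-level trait: it does no work itself, it only hands out sub-engines.
trait MiniEngine {
    fn json_parser(&self) -> Arc<dyn JsonParser>;
    fn file_lister(&self) -> Arc<dyn FileLister>;
}

// A connector's own implementations of each piece of functionality.
struct MyJson;
impl JsonParser for MyJson {
    fn name(&self) -> &'static str {
        "my-json"
    }
}

struct MyFiles;
impl FileLister for MyFiles {
    fn name(&self) -> &'static str {
        "my-files"
    }
}

// The connector bundles its sub-engines behind shared pointers.
struct MyConnector {
    json: Arc<dyn JsonParser>,
    files: Arc<dyn FileLister>,
}

impl MiniEngine for MyConnector {
    fn json_parser(&self) -> Arc<dyn JsonParser> {
        Arc::clone(&self.json)
    }
    fn file_lister(&self) -> Arc<dyn FileLister> {
        Arc::clone(&self.files)
    }
}

fn build_connector() -> MyConnector {
    MyConnector {
        json: Arc::new(MyJson),
        files: Arc::new(MyFiles),
    }
}

// Library-side code depends only on the trait, never on the concrete connector.
fn describe(engine: &dyn MiniEngine) -> String {
    format!("{} + {}", engine.json_parser().name(), engine.file_lister().name())
}
```

Calling `describe(&build_connector())` shows library-side code reaching connector-specific functionality purely through the trait object.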
§Expression handling
Expression handling is done via the ExpressionHandler, which in turn allows the creation of
ExpressionEvaluators. These evaluators are created for a specific predicate Expression
and allow evaluation of that predicate for specific batches of data.
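A minimal sketch of the handler/evaluator split, assuming a toy predicate and plain integer batches. `GreaterThan`, `Handler`, `Evaluator`, and `evaluate_batches` are hypothetical names for illustration; the real ExpressionHandler and ExpressionEvaluator traits operate on kernel expressions and engine data and have different signatures.

```rust
// Hypothetical predicate: "value > threshold".
#[derive(Clone, Debug)]
struct GreaterThan {
    threshold: i64,
}

// Hypothetical evaluator: bound to one predicate, applied to many batches.
trait Evaluator {
    fn evaluate(&self, batch: &[i64]) -> Vec<bool>;
}

// Hypothetical handler: a factory that creates evaluators for a given predicate.
trait Handler {
    fn new_evaluator(&self, predicate: GreaterThan) -> Box<dyn Evaluator>;
}

struct SimpleEvaluator {
    predicate: GreaterThan,
}

impl Evaluator for SimpleEvaluator {
    fn evaluate(&self, batch: &[i64]) -> Vec<bool> {
        // One boolean per input row.
        batch.iter().map(|v| *v > self.predicate.threshold).collect()
    }
}

struct SimpleHandler;

impl Handler for SimpleHandler {
    fn new_evaluator(&self, predicate: GreaterThan) -> Box<dyn Evaluator> {
        Box::new(SimpleEvaluator { predicate })
    }
}

// Create the evaluator once, then reuse it for every batch.
fn evaluate_batches(batches: &[Vec<i64>], threshold: i64) -> Vec<Vec<bool>> {
    let evaluator = SimpleHandler.new_evaluator(GreaterThan { threshold });
    batches.iter().map(|b| evaluator.evaluate(b)).collect()
}
```

The design point is that any per-predicate setup happens once, in `new_evaluator`, rather than once per batch.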
§File system interactions
Delta Kernel needs to perform some basic operations against file systems, like listing and
reading files. These interactions are encapsulated in the FileSystemClient trait.
Implementors must take care that all assumptions about the behavior of the functions (such as
sorted results) are respected.
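To illustrate why a behavioral assumption such as sorted results matters, here is a hedged sketch of an in-memory file system whose listing method sorts explicitly. `MiniFileSystem`, `InMemoryFs`, and `list_prefix` are hypothetical names and do not reflect the actual FileSystemClient method signatures; consult that trait's documentation for the exact contract.

```rust
use std::collections::HashMap;

// Hypothetical trait with an ordering contract on its listing method.
trait MiniFileSystem {
    /// Returns every path starting with `prefix`, sorted lexicographically.
    fn list_prefix(&self, prefix: &str) -> Vec<String>;
}

struct InMemoryFs {
    // HashMap iteration order is unspecified, so sorting must be done explicitly.
    files: HashMap<String, Vec<u8>>,
}

impl MiniFileSystem for InMemoryFs {
    fn list_prefix(&self, prefix: &str) -> Vec<String> {
        let mut paths: Vec<String> = self
            .files
            .keys()
            .filter(|p| p.starts_with(prefix))
            .cloned()
            .collect();
        // Honor the contract: callers may rely on the order without re-sorting.
        paths.sort();
        paths
    }
}

fn sample_fs() -> InMemoryFs {
    let mut files = HashMap::new();
    for name in [
        "_delta_log/00000000000000000002.json",
        "_delta_log/00000000000000000000.json",
        "data/part-0.parquet",
        "_delta_log/00000000000000000001.json",
    ] {
        files.insert(name.to_string(), Vec::new());
    }
    InMemoryFs { files }
}
```

Because the commit file names are zero-padded, lexicographic order coincides with version order, which is what lets a caller consume the listing as a version sequence.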
§Reading log and data files
Delta Kernel requires the capability to read and write JSON files and read Parquet files, which
is exposed via the JsonHandler and ParquetHandler respectively. When reading files,
connectors are asked to provide the context information required to execute the actual
operation. This is done by invoking methods on the FileSystemClient trait.
Re-exports§
pub use engine_data::EngineData;
pub use engine_data::RowVisitor;
pub use error::DeltaResult;
pub use error::Error;
pub use expressions::Expression;
pub use expressions::ExpressionRef;
pub use table::Table;
pub use delta_kernel_derive;
Modules§
- actions
- Provides parsing and manipulation of the various actions defined in the Delta specification
- arrow
- This module exists to help re-export the version of arrow used by default-engine and other parts of kernel that need arrow
- engine (feature default-engine or sync-engine or arrow-conversion)
- Provides engine implementations that implement the required traits. These engines can optionally be built into the kernel by setting the default-engine or sync-engine feature flags. See the related modules for more information.
- engine_data
- Traits that engines need to implement in order to pass data between themselves and kernel.
- error
- Definitions of errors that the delta kernel can encounter
- expressions
- Definitions and functions to create and manipulate kernel expressions
- log_segment (feature developer-visibility)
- Represents a segment of a delta log. LogSegment wraps a set of checkpoint and commit files.
- parquet
- This module exists to help re-export the version of arrow used by default-engine and other parts of kernel that need arrow
- path (feature developer-visibility)
- Utilities to make working with directory and file paths easier
- scan
- Functionality to create and execute scans (reads) over data stored in a delta table
- schema
- Definitions and functions to create and manipulate kernel schema
- snapshot
- In-memory representation of snapshots of tables (a snapshot is a table at a given point in time; it has a schema, etc.)
- table
- In-memory representation of a Delta table, which acts as an immutable root entity for reading the different versions
- table_changes
- Provides an API to read the table’s change data feed between two versions.
- table_configuration
- This module defines TableConfiguration, a high-level API to check feature support and feature enablement for a table at a given version. This encapsulates Protocol, Metadata, Schema, TableProperties, and ColumnMappingMode. These structs in isolation should be considered raw and unvalidated if they are not a part of TableConfiguration. We unify these fields because they are deeply intertwined when dealing with table features. For example: to check that deletion vector writes are enabled, you must check the protocol’s reader/writer features and also ensure that the deletion vector table property is enabled in the TableProperties.
- table_features
- table_properties
- Delta Table properties. Note this module implements per-table configuration which governs how table-level capabilities/properties are configured (turned on/off etc.). This is orthogonal to protocol-level ‘table features’ which enable or disable reader/writer features (which then usually must be enabled/configured by table properties).
- transaction
Structs§
- FileMeta
- The metadata that describes an object.
Traits§
- AsAny
- Extension trait that makes it easier to work with trait objects that implement Any, implemented automatically for any type that satisfies Any, Send, and Sync. In particular, given some trait T: Any + Send + Sync, it allows upcasting T to dyn Any + Send + Sync, which in turn allows downcasting the result to a concrete type.
- Engine
- The Engine trait encapsulates all the functionality an engine or connector needs to provide to the Delta Kernel in order to read the Delta table.
- ExpressionEvaluator
- Trait for implementing an Expression evaluator.
- ExpressionHandler
- Provides expression evaluation capability to Delta Kernel.
- FileSystemClient
- Provides file system related functionalities to Delta Kernel.
- JsonHandler
- Provides JSON handling functionality to Delta Kernel.
- ParquetHandler
- Provides Parquet file related functionalities to Delta Kernel.
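The upcast-then-downcast pattern described for AsAny can be sketched using only the standard library. `AsAnyLike`, `Handler`, `Concrete`, `Other`, and `recover_id` are hypothetical names for this illustration; this is not the crate's actual definition of AsAny, only a small re-creation of the idea.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical extension trait: upcasts a shared pointer to `dyn Any + Send + Sync`.
trait AsAnyLike: Any + Send + Sync {
    fn as_any(self: Arc<Self>) -> Arc<dyn Any + Send + Sync>;
}

// Blanket impl: every sized type that is Any + Send + Sync gets the upcast for free.
impl<T: Any + Send + Sync> AsAnyLike for T {
    fn as_any(self: Arc<Self>) -> Arc<dyn Any + Send + Sync> {
        self
    }
}

// A hypothetical trait that wants its trait objects to be downcastable.
trait Handler: AsAnyLike {
    fn label(&self) -> &'static str;
}

struct Concrete {
    id: u32,
}

impl Handler for Concrete {
    fn label(&self) -> &'static str {
        "concrete"
    }
}

struct Other;

impl Handler for Other {
    fn label(&self) -> &'static str {
        "other"
    }
}

// Given only a trait object, recover the concrete type if it is a `Concrete`.
fn recover_id(handler: Arc<dyn Handler>) -> Option<u32> {
    handler
        .as_any() // upcast: Arc<dyn Handler> -> Arc<dyn Any + Send + Sync>
        .downcast::<Concrete>() // downcast: succeeds only for the matching type
        .ok()
        .map(|concrete| concrete.id)
}
```

A connector can use this shape to get back its own concrete handler type from a trait object that the library returned to it.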
Type Aliases§
- FileDataReadResult
- Data read from a Delta table file and the corresponding scan file information.
- FileDataReadResultIterator
- An iterator of data read from specified files
- FileSlice
- A specification for a range of bytes to read from a file location
- Version
- Delta table version, an 8-byte unsigned integer
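As a small illustration of how an 8-byte unsigned version is typically used, the sketch below maps a version to a commit file name and back, relying on the Delta protocol's convention of zero-padding the version to 20 digits. The local `Version` alias mirrors the description above, while `commit_file_name` and `parse_commit_version` are hypothetical helpers, not functions provided by this crate.

```rust
// Assumption: a local alias mirroring the documented 8-byte unsigned version.
type Version = u64;

/// Formats a commit file name, zero-padding the version to 20 digits.
fn commit_file_name(version: Version) -> String {
    format!("{version:020}.json")
}

/// Parses a commit file name back into a version; returns None for any other name.
fn parse_commit_version(name: &str) -> Option<Version> {
    let stem = name.strip_suffix(".json")?;
    // Require exactly 20 ASCII digits so that other file names are rejected.
    if stem.len() != 20 || !stem.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // `parse` also rejects 20-digit values that overflow u64.
    stem.parse().ok()
}
```

Twenty digits is exactly enough to hold the largest `u64` value, so every representable version round-trips through this naming scheme.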