rhei-datafusion 1.5.0

DataFusion OLAP backend for the Rhei HTAP engine.

Architecture position

rhei-datafusion is the default OLAP backend in Rhei (enabled by the datafusion-backend workspace feature, which is on by default). It implements [rhei_core::OlapEngine] for Apache DataFusion 53 and is used by the rhei facade's OlapBackend enum.

Storage modes

The engine is parameterised by a [StorageMode] chosen at construction time:

| Variant | Durability | Notes |
|---|---|---|
| [StorageMode::InMemory] | None; lost on shutdown | Fastest; ideal for tests and ephemeral workloads |
| [StorageMode::ArrowIpc] | Durable; .arrow files | Zero serialisation overhead; no compression |
| [StorageMode::Parquet] | Durable; .parquet files | Columnar compression + predicate pushdown; slower writes |
| StorageMode::S3Parquet | Cloud (Amazon S3) | Available with the cloud-storage feature only |
| StorageMode::GcsParquet | Cloud (Google Cloud Storage) | Available with the cloud-storage feature only |
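
The trade-offs in the table can be restated as a small lookup. The sketch below is illustrative only: ModeSketch is a hypothetical stand-in, not the crate's StorageMode (whose variants may carry paths, bucket names, or other data). Treating the cloud variants as durable and as writing .parquet objects is an assumption drawn from their names.

```rust
/// Hypothetical mirror of the storage modes listed in the table.
/// This is NOT the crate's `StorageMode`; it only encodes the table's columns.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModeSketch {
    InMemory,
    ArrowIpc,
    Parquet,
    S3Parquet,
    GcsParquet,
}

impl ModeSketch {
    /// True when data outlives the process. Only the in-memory mode loses data
    /// on shutdown; counting the cloud modes as durable is an assumption.
    pub fn is_durable(self) -> bool {
        !matches!(self, ModeSketch::InMemory)
    }

    /// True when the variant is gated behind the `cloud-storage` feature.
    pub fn needs_cloud_feature(self) -> bool {
        matches!(self, ModeSketch::S3Parquet | ModeSketch::GcsParquet)
    }

    /// File extension used by the mode, if it writes files at all.
    /// The extension for the cloud variants is assumed from their names.
    pub fn file_extension(self) -> Option<&'static str> {
        match self {
            ModeSketch::InMemory => None,
            ModeSketch::ArrowIpc => Some("arrow"),
            ModeSketch::Parquet | ModeSketch::S3Parquet | ModeSketch::GcsParquet => {
                Some("parquet")
            }
        }
    }
}
```

A caller could use such a helper to reject non-durable modes in a production configuration check, for example by asserting that is_durable() holds before starting the engine.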

Streaming queries

[rhei_core::OlapEngine::query_stream] on [DataFusionEngine] returns a [rhei_core::RecordBatchBoxStream] backed by DataFusion's own DataFrame::execute_stream(). A thin StreamAdapter newtype adapts SendableRecordBatchStream to the common RecordBatchBoxStream type without buffering: results flow to the caller one record batch at a time, as DataFusion produces them. This is a key advantage over the DuckDB backend, which buffers the full result set before streaming.
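
The non-buffering newtype pattern described above can be illustrated with a synchronous analogue. This is a minimal sketch, not the crate's StreamAdapter: it uses std's Iterator in place of an async stream, and Batch, BackendError, CommonError, and AdapterSketch are all hypothetical stand-ins. The point it shows is that the wrapper holds no queue and pulls exactly one item from the inner source per call.

```rust
/// Stand-in for an Arrow record batch (hypothetical; just a vector of values).
#[derive(Debug, Clone, PartialEq)]
pub struct Batch(pub Vec<u32>);

/// Stand-in for the backend-specific error type (hypothetical).
#[derive(Debug, PartialEq)]
pub struct BackendError(pub String);

/// Stand-in for the common error type exposed to callers (hypothetical).
#[derive(Debug, PartialEq)]
pub struct CommonError(pub String);

/// Newtype that adapts a backend batch source to the common item type.
/// It stores nothing besides the inner source, so nothing is buffered.
pub struct AdapterSketch<I>(pub I);

impl<I> Iterator for AdapterSketch<I>
where
    I: Iterator<Item = Result<Batch, BackendError>>,
{
    type Item = Result<Batch, CommonError>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull one item from the inner source and convert only the error type.
        self.0.next().map(|r| r.map_err(|e| CommonError(e.0)))
    }
}
```

In an async setting the same shape would implement a stream trait's polling method instead of Iterator::next, forwarding each poll to the inner stream.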

No transactions

DataFusion does not support SQL transactions. [rhei_core::OlapEngine::supports_transactions] returns false for this backend. The Rhei sync engine handles partial-failure recovery via CDC sequence numbers instead of relying on BEGIN/COMMIT.
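
To make the recovery idea concrete, here is a minimal sketch of sequence-number-based replay. It is not the Rhei sync engine's code: CdcEvent, ReplicaSketch, and the fail_at parameter (used only to simulate a failure) are hypothetical. It assumes events carry strictly increasing sequence numbers and that the replica records the last one it applied.

```rust
/// Hypothetical CDC event: an increasing sequence number plus a payload.
#[derive(Debug, Clone)]
pub struct CdcEvent {
    pub seq: u64,
    pub payload: String,
}

/// Hypothetical OLAP-side state: applied payloads and a high-water mark.
#[derive(Debug, Default)]
pub struct ReplicaSketch {
    pub last_applied_seq: u64,
    pub applied: Vec<String>,
}

impl ReplicaSketch {
    /// Applies events in order, skipping any at or below the high-water mark.
    /// `fail_at` simulates a failure at a given sequence number. On failure the
    /// method returns `Err(seq)` and the high-water mark stays at the last success.
    pub fn apply_batch(&mut self, events: &[CdcEvent], fail_at: Option<u64>) -> Result<usize, u64> {
        let mut applied_now = 0;
        for ev in events {
            if ev.seq <= self.last_applied_seq {
                continue; // already applied in an earlier attempt
            }
            if Some(ev.seq) == fail_at {
                return Err(ev.seq);
            }
            self.applied.push(ev.payload.clone());
            self.last_applied_seq = ev.seq;
            applied_now += 1;
        }
        Ok(applied_now)
    }
}
```

Because already-applied events are skipped, re-sending the whole batch after a partial failure is safe: the second attempt resumes from the first unapplied sequence number without needing a rollback.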

DML strategy

DataFusion's MemTable and ListingTable are read-only. DML (INSERT / UPDATE / DELETE) is implemented by mutating the engine's own table store and re-registering tables with the SessionContext. SQL statements are parsed with sqlparser-rs (SQLite dialect) so that the sync engine's generated SQL is handled correctly.
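
The mutate-then-re-register approach can be sketched without DataFusion itself. In the illustration below, ContextSketch stands in for a query context that only ever holds immutable table snapshots, and EngineSketch stands in for the engine's own table store; both types, the Row alias, and every method name are hypothetical. SQL parsing is omitted.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical row shape: (id, value).
pub type Row = (i64, String);
/// An immutable, cheaply shareable view of a table.
pub type Snapshot = Arc<Vec<Row>>;

/// Stand-in for a query context that only holds read-only snapshots.
#[derive(Default)]
pub struct ContextSketch {
    tables: HashMap<String, Snapshot>,
}

impl ContextSketch {
    /// Registers a snapshot, replacing any previous one under the same name.
    pub fn register(&mut self, name: &str, snap: Snapshot) {
        self.tables.insert(name.to_string(), snap);
    }

    pub fn scan(&self, name: &str) -> Option<Snapshot> {
        self.tables.get(name).cloned()
    }
}

/// Engine-owned mutable store plus the read-only context.
#[derive(Default)]
pub struct EngineSketch {
    store: HashMap<String, Vec<Row>>,
    ctx: ContextSketch,
}

impl EngineSketch {
    /// Publishes the current contents of `table` as a fresh snapshot.
    fn reregister(&mut self, table: &str) {
        let rows = self.store.get(table).cloned().unwrap_or_default();
        self.ctx.register(table, Arc::new(rows));
    }

    /// INSERT: mutate the store, then re-register the table.
    pub fn insert(&mut self, table: &str, row: Row) {
        self.store.entry(table.to_string()).or_default().push(row);
        self.reregister(table);
    }

    /// DELETE by id: mutate the store, re-register, return rows removed.
    pub fn delete_where_id(&mut self, table: &str, id: i64) -> usize {
        let removed = match self.store.get_mut(table) {
            Some(rows) => {
                let before = rows.len();
                rows.retain(|(rid, _)| *rid != id);
                before - rows.len()
            }
            None => 0,
        };
        self.reregister(table);
        removed
    }

    /// Queries read only from the registered snapshot.
    pub fn scan(&self, table: &str) -> Option<Snapshot> {
        self.ctx.scan(table)
    }
}
```

One consequence of this design is that a snapshot obtained before a write keeps its old contents, while later scans see the re-registered table. The sketch copies the whole table on every write for simplicity; a real engine would likely be more economical.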

Cloud storage

S3 and GCS backends are gated behind the cloud-storage workspace feature. When enabled, object_store (with aws and gcp sub-features) is pulled in. Credentials are resolved from the environment at engine construction time following object_store conventions (e.g. AWS_ACCESS_KEY_ID / GOOGLE_APPLICATION_CREDENTIALS).
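
Since credentials are read from the environment when the engine is constructed, a pre-flight check can report missing variables before construction fails. The helper below is a hypothetical sketch, not part of rhei-datafusion or object_store. It checks only the two variable names mentioned above; object_store consults further variables and fallbacks that this sketch does not model.

```rust
/// Variable names taken from the examples in this section.
pub const S3_KEY_VAR: &str = "AWS_ACCESS_KEY_ID";
pub const GCS_CREDS_VAR: &str = "GOOGLE_APPLICATION_CREDENTIALS";

/// Returns the subset of `names` that `lookup` cannot resolve to a
/// non-empty value. `lookup` is injected so the check can be exercised
/// without touching the real process environment.
pub fn missing_vars<F>(names: &[&'static str], lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    names
        .iter()
        .copied()
        .filter(|k| lookup(*k).map_or(true, |v| v.is_empty()))
        .collect()
}

/// Convenience wrapper that consults the real process environment.
pub fn missing_vars_in_env(names: &[&'static str]) -> Vec<&'static str> {
    missing_vars(names, |k| std::env::var(k).ok())
}
```

Calling missing_vars_in_env(&[S3_KEY_VAR]) before building an S3-backed engine would return an empty vector when the variable is set, and the variable's name otherwise.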

Feature flags

  • cloud-storage — enables StorageMode::S3Parquet and StorageMode::GcsParquet via the object_store crate.