DataFusion OLAP backend for the Rhei HTAP engine.
Architecture position
rhei-datafusion is the default OLAP backend in Rhei (enabled by the
datafusion-backend workspace feature, which is on by default). It
implements [rhei_core::OlapEngine] for Apache DataFusion 53 and is used
by the rhei facade's OlapBackend enum.
Storage modes
The engine is parameterised by a [StorageMode] chosen at construction time:
| Variant | Durability | Notes |
|---|---|---|
| [StorageMode::InMemory] | None (lost on shutdown) | Fastest; ideal for tests and ephemeral workloads |
| [StorageMode::ArrowIpc] | Durable (.arrow files) | Zero serialisation overhead; no compression |
| [StorageMode::Parquet] | Durable (.parquet files) | Columnar compression + predicate pushdown; slower writes |
| StorageMode::S3Parquet | Cloud (Amazon S3) | Available with the cloud-storage feature only |
| StorageMode::GcsParquet | Cloud (Google Cloud Storage) | Available with the cloud-storage feature only |
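The table can be read as a simple mapping from variant to durability class. The sketch below restates it with a stand-in enum; StorageModeSketch, Durability, and both methods are hypothetical names for illustration, and the real StorageMode very likely carries extra data (paths, bucket names) that is omitted here.

```rust
/// Durability classes as listed in the table (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Durability {
    /// Data is lost on shutdown.
    None,
    /// Data is written to local files with the given extension.
    LocalFiles { extension: &'static str },
    /// Data lives in a cloud object store.
    Cloud { provider: &'static str },
}

/// Stand-in for the crate's StorageMode; NOT the real definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageModeSketch {
    InMemory,
    ArrowIpc,
    Parquet,
    S3Parquet,
    GcsParquet,
}

impl StorageModeSketch {
    /// Maps each variant to the durability column of the table.
    pub fn durability(self) -> Durability {
        match self {
            Self::InMemory => Durability::None,
            Self::ArrowIpc => Durability::LocalFiles { extension: "arrow" },
            Self::Parquet => Durability::LocalFiles { extension: "parquet" },
            Self::S3Parquet => Durability::Cloud { provider: "Amazon S3" },
            Self::GcsParquet => Durability::Cloud { provider: "Google Cloud Storage" },
        }
    }

    /// True for the variants the table marks as cloud-storage only.
    pub fn requires_cloud_feature(self) -> bool {
        matches!(self.durability(), Durability::Cloud { .. })
    }
}
```

A caller could use a helper like requires_cloud_feature to reject a configuration early when the cloud-storage feature is not compiled in.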
Streaming queries
[rhei_core::OlapEngine::query_stream] on [DataFusionEngine] returns a [rhei_core::RecordBatchBoxStream]
backed by DataFusion's own DataFrame::execute_stream(). A thin
StreamAdapter newtype adapts SendableRecordBatchStream to the common
RecordBatchBoxStream type without buffering, so results flow to the caller
one record batch at a time. This is a key advantage over the DuckDB
backend, which buffers the full result set before streaming.
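The difference between adapting and buffering can be sketched with the standard library alone. The example below uses a synchronous Iterator as a stand-in for an async stream, and Vec<i64> as a stand-in for a record batch; the StreamAdapter here only shares its name with the crate's type, and counting_source, adapted, and buffered are hypothetical helpers.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Stand-in for an Arrow record batch.
pub type Batch = Vec<i64>;

/// Stand-in for the common boxed stream type.
pub type BoxBatchIter = Box<dyn Iterator<Item = Result<Batch, String>>>;

/// Newtype over a batch source. It forwards one batch per call and
/// converts the source error type; it never stores batches itself.
pub struct StreamAdapter<I>(pub I);

impl<I, E> Iterator for StreamAdapter<I>
where
    I: Iterator<Item = Result<Batch, E>>,
    E: std::fmt::Display,
{
    type Item = Result<Batch, String>;

    fn next(&mut self) -> Option<Self::Item> {
        self.0.next().map(|r| r.map_err(|e| e.to_string()))
    }
}

/// Source that records how many batches it has produced so far.
pub fn counting_source(
    total: usize,
    produced: Rc<Cell<usize>>,
) -> impl Iterator<Item = Result<Batch, std::fmt::Error>> {
    (0..total).map(move |i| {
        produced.set(produced.get() + 1);
        Ok(vec![i as i64])
    })
}

/// Adapter path: batches are produced only as the caller pulls them.
pub fn adapted(total: usize, produced: Rc<Cell<usize>>) -> BoxBatchIter {
    Box::new(StreamAdapter(counting_source(total, produced)))
}

/// Buffering path: the whole result is materialised before the first
/// batch is handed to the caller.
pub fn buffered(total: usize, produced: Rc<Cell<usize>>) -> BoxBatchIter {
    let all: Vec<_> = StreamAdapter(counting_source(total, produced)).collect();
    Box::new(all.into_iter())
}
```

With adapted, the counter advances by one per pulled batch; with buffered, it reaches the total before the caller sees anything, which is the memory cost the adapter avoids.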
No transactions
DataFusion does not support SQL transactions.
[rhei_core::OlapEngine::supports_transactions] returns false for this
backend. The Rhei sync engine handles partial-failure recovery via CDC
sequence numbers instead of relying on BEGIN/COMMIT.
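One way such sequence-number recovery can work is with a watermark that advances only after each event is applied successfully. The sketch below is an assumption about the general technique, not the Rhei sync engine's actual code; CdcEvent, SyncCursor, and apply_batch are hypothetical names.

```rust
/// A change event tagged with a monotonically increasing sequence number.
#[derive(Debug, Clone, PartialEq)]
pub struct CdcEvent {
    pub seq: u64,
    pub payload: String,
}

/// Tracks the highest sequence number known to be applied.
#[derive(Debug, Default)]
pub struct SyncCursor {
    pub last_applied_seq: u64,
}

impl SyncCursor {
    /// Applies events in order through `write`, a fallible sink.
    /// Events at or below the watermark are skipped, so replaying the same
    /// batch after a failure does not apply anything twice. The watermark
    /// moves only after a successful write, and the first error stops the
    /// batch.
    pub fn apply_batch<F>(&mut self, events: &[CdcEvent], mut write: F) -> Result<(), String>
    where
        F: FnMut(&CdcEvent) -> Result<(), String>,
    {
        for ev in events {
            if ev.seq <= self.last_applied_seq {
                continue; // already applied in an earlier attempt
            }
            write(ev)?;
            self.last_applied_seq = ev.seq;
        }
        Ok(())
    }
}
```

If a batch fails partway, the caller can resubmit the same batch: the cursor resumes at the first unapplied event, which gives the effect of a retryable unit without BEGIN/COMMIT.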
DML strategy
DataFusion's MemTable and ListingTable are read-only. DML
(INSERT / UPDATE / DELETE) is implemented by mutating the engine's own
table store and re-registering tables with the SessionContext. SQL
statements are parsed with sqlparser-rs (SQLite dialect) so that the sync
engine's generated SQL is handled correctly.
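The mutate-then-re-register pattern can be illustrated without DataFusion. In this sketch, ReadOnlyTable stands in for an immutable table provider and the catalog map stands in for SessionContext registrations; EngineSketch and all of its methods are hypothetical, and SQL parsing is left out entirely.

```rust
use std::collections::HashMap;
use std::sync::Arc;

pub type Row = (i64, String);

/// Stand-in for a read-only table provider: an immutable snapshot of rows.
pub struct ReadOnlyTable {
    rows: Vec<Row>,
}

impl ReadOnlyTable {
    pub fn scan(&self) -> &[Row] {
        &self.rows
    }
}

#[derive(Default)]
pub struct EngineSketch {
    /// Engine-owned, mutable source of truth.
    store: HashMap<String, Vec<Row>>,
    /// Stand-in for the query context's table registrations.
    catalog: HashMap<String, Arc<ReadOnlyTable>>,
}

impl EngineSketch {
    /// Replaces the registered snapshot with the current store contents.
    fn reregister(&mut self, table: &str) {
        let rows = self.store.get(table).cloned().unwrap_or_default();
        self.catalog
            .insert(table.to_string(), Arc::new(ReadOnlyTable { rows }));
    }

    /// INSERT: mutate the store, then publish a fresh snapshot.
    pub fn insert(&mut self, table: &str, row: Row) {
        self.store.entry(table.to_string()).or_default().push(row);
        self.reregister(table);
    }

    /// DELETE: drop matching rows, then publish a fresh snapshot.
    /// Returns the number of rows removed.
    pub fn delete_where<P: Fn(&Row) -> bool>(&mut self, table: &str, pred: P) -> usize {
        let Some(rows) = self.store.get_mut(table) else {
            return 0;
        };
        let before = rows.len();
        rows.retain(|r| !pred(r));
        let removed = before - rows.len();
        self.reregister(table);
        removed
    }

    /// What a query would see: the most recently registered snapshot.
    pub fn table(&self, name: &str) -> Option<Arc<ReadOnlyTable>> {
        self.catalog.get(name).cloned()
    }
}
```

A reader that grabbed a snapshot before a DELETE keeps seeing the old rows, while new lookups see the updated table. Cloning the rows on every statement keeps the sketch short; a real engine would presumably avoid that copy.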
Cloud storage
S3 and GCS backends are gated behind the cloud-storage workspace feature.
When enabled, object_store (with aws and gcp sub-features) is pulled
in. Credentials are resolved from the environment at engine construction
time following object_store conventions (e.g. AWS_ACCESS_KEY_ID /
GOOGLE_APPLICATION_CREDENTIALS).
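Because credentials are read at construction time, a missing variable surfaces when the engine is built rather than on the first query. An application that wants a clearer error message could run a pre-flight check like the hypothetical one below. It covers only the two variables named above, whereas object_store accepts other credential sources as well, so treat it as a sketch and not as what the crate does; Provider, required_var, check_with, and check_env are illustrative names.

```rust
use std::env;

/// Cloud providers mentioned in this section (illustrative enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    S3,
    Gcs,
}

/// The environment variable this sketch checks for each provider.
pub fn required_var(provider: Provider) -> &'static str {
    match provider {
        Provider::S3 => "AWS_ACCESS_KEY_ID",
        Provider::Gcs => "GOOGLE_APPLICATION_CREDENTIALS",
    }
}

/// Checks the variable through an injectable lookup so the logic can be
/// tested without touching the real process environment.
/// Empty values are treated as missing.
pub fn check_with<F>(provider: Provider, lookup: F) -> Result<(), String>
where
    F: Fn(&str) -> Option<String>,
{
    let name = required_var(provider);
    match lookup(name) {
        Some(value) if !value.is_empty() => Ok(()),
        _ => Err(format!("{name} is not set; {provider:?} storage cannot be configured")),
    }
}

/// Same check against the real process environment.
pub fn check_env(provider: Provider) -> Result<(), String> {
    check_with(provider, |key| env::var(key).ok())
}
```

Calling check_env(Provider::S3) before constructing the engine turns a late, backend-specific failure into an early, readable one.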
Feature flags
cloud-storage: enables StorageMode::S3Parquet and StorageMode::GcsParquet via the object_store crate.