DataFusion OLAP backend for the Rhei HTAP engine.
§Architecture position
rhei-datafusion is the default OLAP backend in Rhei (enabled by the
datafusion-backend workspace feature, which is on by default). It
implements rhei_core::OlapEngine for Apache DataFusion 53 and is used
by the rhei facade’s OlapBackend enum.
§Storage modes
The engine is parameterised by a StorageMode chosen at construction time:
| Variant | Durability | Notes |
|---|---|---|
| StorageMode::InMemory | None (lost on shutdown) | Fastest; ideal for tests and ephemeral workloads |
| StorageMode::ArrowIpc | Durable (.arrow files) | Zero serialisation overhead; no compression |
| StorageMode::Parquet | Durable (.parquet files) | Columnar compression + predicate pushdown; slower writes |
| StorageMode::S3Parquet | Cloud (Amazon S3) | Available with the cloud-storage feature only |
| StorageMode::GcsParquet | Cloud (Google Cloud Storage) | Available with the cloud-storage feature only |
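
The trade-offs in the table can be restated as a small lookup. The sketch below uses a hypothetical `ModeSketch` enum that only mirrors the variant names; it is not the crate's `StorageMode`, whose variants may carry paths or bucket configuration that are not shown on this page. Treating the cloud variants as surviving a restart is an assumption drawn from the "Cloud" entries above.

```rust
/// Illustrative stand-in for `StorageMode` (hypothetical; payloads omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModeSketch {
    InMemory,
    ArrowIpc,
    Parquet,
    S3Parquet,
    GcsParquet,
}

impl ModeSketch {
    /// True when data outlives the process. Cloud variants are assumed
    /// to persist because they write to object storage.
    pub fn survives_restart(self) -> bool {
        !matches!(self, ModeSketch::InMemory)
    }

    /// True for the variants gated behind the `cloud-storage` feature.
    pub fn needs_cloud_feature(self) -> bool {
        matches!(self, ModeSketch::S3Parquet | ModeSketch::GcsParquet)
    }

    /// File extension implied by the table, if the mode writes files.
    pub fn file_extension(self) -> Option<&'static str> {
        match self {
            ModeSketch::InMemory => None,
            ModeSketch::ArrowIpc => Some("arrow"),
            ModeSketch::Parquet
            | ModeSketch::S3Parquet
            | ModeSketch::GcsParquet => Some("parquet"),
        }
    }
}
```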
§Streaming queries
rhei_core::OlapEngine::query_stream on DataFusionEngine returns a rhei_core::RecordBatchBoxStream
backed by DataFusion’s own DataFrame::execute_stream(). A thin
StreamAdapter newtype adapts SendableRecordBatchStream to the common
RecordBatchBoxStream type without buffering — results flow row-batch by
row-batch directly to the caller. This is a key advantage over the DuckDB
backend, which buffers the full result set before streaming.
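
The difference between a pass-through adapter and a buffering one can be shown with the standard library alone. The following is a minimal sketch, not the crate's code: `Batch`, `BoxBatchIter`, `StreamAdapterSketch`, `adapt_unbuffered` and `adapt_buffered` are hypothetical names, and a synchronous `Iterator` stands in for the asynchronous `SendableRecordBatchStream` and `RecordBatchBoxStream` types.

```rust
/// Stand-in for a record batch: a small vector of values.
pub type Batch = Vec<i64>;

/// Stand-in for the common boxed stream type handed to the caller.
pub type BoxBatchIter = Box<dyn Iterator<Item = Result<Batch, String>> + Send>;

/// Newtype that forwards each batch from the inner source as it is produced.
pub struct StreamAdapterSketch<I>(pub I);

impl<I> Iterator for StreamAdapterSketch<I>
where
    I: Iterator<Item = Result<Batch, String>>,
{
    type Item = Result<Batch, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // No internal queue: one pull by the caller is one pull from the source.
        self.0.next()
    }
}

/// Pass-through adaptation: nothing is produced until the caller asks.
pub fn adapt_unbuffered<I>(source: I) -> BoxBatchIter
where
    I: Iterator<Item = Result<Batch, String>> + Send + 'static,
{
    Box::new(StreamAdapterSketch(source))
}

/// Buffered alternative: drains the whole source before returning.
pub fn adapt_buffered<I>(source: I) -> BoxBatchIter
where
    I: Iterator<Item = Result<Batch, String>> + Send + 'static,
{
    let all: Vec<Result<Batch, String>> = source.collect();
    Box::new(all.into_iter())
}
```

With the pass-through version, memory use is bounded by one batch at a time and the first batch reaches the caller before the rest of the query has run; the buffered version pays for the full result set up front.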
§No transactions
DataFusion does not support SQL transactions.
rhei_core::OlapEngine::supports_transactions returns false for this
backend. The Rhei sync engine handles partial-failure recovery via CDC
sequence numbers instead of relying on BEGIN/COMMIT.
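
One way to picture recovery by sequence number is a watermark that advances only after each statement succeeds, so a retried batch skips what was already applied. This is a sketch under stated assumptions rather than the sync engine's implementation: `CdcEvent`, `SyncTarget` and `apply_batch` are hypothetical names, sequence numbers are assumed to start at 1 and increase monotonically, and the watermark is kept in memory here.

```rust
/// One change event captured from the OLTP side (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct CdcEvent {
    pub seq: u64,
    pub sql: String,
}

/// Hypothetical sync target that remembers the last applied sequence number.
#[derive(Debug, Default)]
pub struct SyncTarget {
    pub last_applied_seq: u64,
    pub applied: Vec<String>,
}

impl SyncTarget {
    /// Applies events in order and returns how many were newly applied.
    /// Events at or below the watermark are skipped; the first failure
    /// stops the batch and leaves the watermark at the last success.
    pub fn apply_batch<F>(&mut self, events: &[CdcEvent], mut exec: F) -> Result<usize, String>
    where
        F: FnMut(&str) -> Result<(), String>,
    {
        let mut count = 0;
        for ev in events {
            if ev.seq <= self.last_applied_seq {
                continue; // already applied during an earlier attempt
            }
            exec(&ev.sql)?;
            self.applied.push(ev.sql.clone());
            self.last_applied_seq = ev.seq;
            count += 1;
        }
        Ok(count)
    }
}
```

Replaying the same batch after a partial failure is then safe without BEGIN/COMMIT: statements that already succeeded are not executed a second time.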
§DML strategy
DataFusion’s MemTable and ListingTable are read-only. DML
(INSERT / UPDATE / DELETE) is implemented by mutating the engine’s own
table store and re-registering tables with the SessionContext. SQL
statements are parsed with sqlparser-rs (SQLite dialect) so that the sync
engine’s generated SQL is handled correctly.
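
The mutate-then-re-register pattern can be sketched without DataFusion. In the example below every name (`Row`, `Snapshot`, `ContextSketch`, `EngineSketch`) is hypothetical: an immutable `Arc<Vec<Row>>` stands in for a read-only table provider, a `HashMap` stands in for the SessionContext catalog, and SQL parsing is left out, so the methods take already-decoded arguments.

```rust
use std::collections::HashMap;
use std::sync::Arc;

pub type Row = (i64, String);

/// Stand-in for a read-only table provider: an immutable snapshot of rows.
pub type Snapshot = Arc<Vec<Row>>;

/// Stand-in for the query context's catalog: table name to snapshot.
#[derive(Default)]
pub struct ContextSketch {
    registered: HashMap<String, Snapshot>,
}

impl ContextSketch {
    /// Registers a snapshot, replacing any previous one under that name.
    pub fn register(&mut self, name: &str, snap: Snapshot) {
        self.registered.insert(name.to_string(), snap);
    }

    /// Returns the snapshot that queries would currently read.
    pub fn scan(&self, name: &str) -> Option<Snapshot> {
        self.registered.get(name).cloned()
    }
}

/// Engine-owned mutable store; each DML call ends with a re-registration.
#[derive(Default)]
pub struct EngineSketch {
    store: HashMap<String, Vec<Row>>,
    pub ctx: ContextSketch,
}

impl EngineSketch {
    /// Publishes a fresh immutable snapshot of `table` to the context.
    fn republish(&mut self, table: &str) {
        let rows = self.store.get(table).cloned().unwrap_or_default();
        self.ctx.register(table, Arc::new(rows));
    }

    pub fn insert(&mut self, table: &str, row: Row) {
        self.store.entry(table.to_string()).or_default().push(row);
        self.republish(table);
    }

    /// Deletes matching rows and returns how many were removed.
    pub fn delete_where<P: Fn(&Row) -> bool>(&mut self, table: &str, pred: P) -> usize {
        let Some(rows) = self.store.get_mut(table) else { return 0 };
        let before = rows.len();
        rows.retain(|r| !pred(r));
        let removed = before - rows.len();
        self.republish(table);
        removed
    }
}
```

A query that obtained a snapshot before a write keeps reading the old rows, while later queries see the re-registered table. The sketch copies the whole table on every write, which keeps the example short but would be costly for large tables.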
§Cloud storage
S3 and GCS backends are gated behind the cloud-storage workspace feature.
When enabled, object_store (with aws and gcp sub-features) is pulled
in. Credentials are resolved from the environment at engine construction
time following object_store conventions (e.g. AWS_ACCESS_KEY_ID /
GOOGLE_APPLICATION_CREDENTIALS).
§Feature flags
cloud-storage: enables StorageMode::S3Parquet and StorageMode::GcsParquet via the object_store crate.
Re-exports§
pub use engine::DataFusionEngine;
pub use error::DfOlapError;
pub use storage::StorageMode;