Crate rhei_datafusion

DataFusion OLAP backend for the Rhei HTAP engine.

Architecture position

rhei-datafusion is the default OLAP backend in Rhei (enabled by the datafusion-backend workspace feature, which is on by default). It implements rhei_core::OlapEngine for Apache DataFusion 53 and is used by the rhei facade’s OlapBackend enum.
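The facade-level dispatch can be pictured as a trait plus an enum. This is a minimal sketch under stated assumptions: the two-method `OlapEngine` trait is a heavily simplified, synchronous stand-in for `rhei_core::OlapEngine`, and the `OlapBackend` variants, `DataFusionStub`, and `default_backend` are hypothetical names chosen for illustration, not the real facade API.

```rust
/// Hypothetical, simplified stand-in for `rhei_core::OlapEngine`.
/// The real trait is richer (queries, streaming, DML); only what is
/// needed to illustrate backend dispatch is kept here.
pub trait OlapEngine {
    fn name(&self) -> &'static str;
    fn supports_transactions(&self) -> bool;
}

/// Hypothetical stand-in for `DataFusionEngine`.
pub struct DataFusionStub;

impl OlapEngine for DataFusionStub {
    fn name(&self) -> &'static str {
        "datafusion"
    }
    fn supports_transactions(&self) -> bool {
        // Mirrors the documented behaviour: no SQL transactions in DataFusion.
        false
    }
}

/// Hypothetical shape of the facade's `OlapBackend` enum: one variant per
/// compiled-in backend, plus an escape hatch for any other engine.
pub enum OlapBackend {
    DataFusion(DataFusionStub),
    Other(Box<dyn OlapEngine>),
}

impl OlapBackend {
    /// Uniform access to whichever engine is active.
    pub fn engine(&self) -> &dyn OlapEngine {
        match self {
            OlapBackend::DataFusion(e) => e as &dyn OlapEngine,
            OlapBackend::Other(e) => &**e,
        }
    }
}

/// DataFusion is the default backend when the datafusion-backend feature is on.
pub fn default_backend() -> OlapBackend {
    OlapBackend::DataFusion(DataFusionStub)
}
```

An enum keeps the set of built-in backends closed and cheap to match on, while the boxed trait-object variant shows where an out-of-tree engine could plug in; whether the real enum offers such a variant is not stated here.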

Storage modes

The engine is parameterised by a StorageMode chosen at construction time:

Variant | Durability | Notes
StorageMode::InMemory | None (lost on shutdown) | Fastest; ideal for tests and ephemeral workloads
StorageMode::ArrowIpc | Durable (.arrow files) | Zero serialisation overhead; no compression
StorageMode::Parquet | Durable (.parquet files) | Columnar compression and predicate pushdown; slower writes
StorageMode::S3Parquet | Cloud (Amazon S3) | Available with the cloud-storage feature only
StorageMode::GcsParquet | Cloud (Google Cloud Storage) | Available with the cloud-storage feature only
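The table can be read as an enum whose cloud variants only exist when the feature is compiled in. This is a minimal sketch, not the crate's definition: the variant names come from the table, but the payload fields (`dir`, `bucket`, `prefix`) and the helper methods `is_durable` and `file_extension` are hypothetical. The `#[cfg(feature = "cloud-storage")]` attributes show how feature gating removes the S3 and GCS variants entirely from builds without that feature.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of `rhei_datafusion::StorageMode`.
/// Variant names follow the documentation; payload fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum StorageMode {
    InMemory,
    ArrowIpc { dir: PathBuf },
    Parquet { dir: PathBuf },
    #[cfg(feature = "cloud-storage")]
    S3Parquet { bucket: String, prefix: String },
    #[cfg(feature = "cloud-storage")]
    GcsParquet { bucket: String, prefix: String },
}

impl StorageMode {
    /// True if table data survives an engine restart.
    pub fn is_durable(&self) -> bool {
        !matches!(self, StorageMode::InMemory)
    }

    /// File extension used for persisted tables, if any.
    pub fn file_extension(&self) -> Option<&'static str> {
        match self {
            StorageMode::InMemory => None,
            StorageMode::ArrowIpc { .. } => Some("arrow"),
            StorageMode::Parquet { .. } => Some("parquet"),
            // Assumption: the cloud variants persist Parquet, as their names suggest.
            #[cfg(feature = "cloud-storage")]
            StorageMode::S3Parquet { .. } | StorageMode::GcsParquet { .. } => Some("parquet"),
        }
    }
}
```

Because the cloud match arm carries the same `cfg` attribute as the variants, the `match` stays exhaustive whether or not cloud-storage is enabled.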

Streaming queries

rhei_core::OlapEngine::query_stream on DataFusionEngine returns a rhei_core::RecordBatchBoxStream backed by DataFusion’s own DataFrame::execute_stream(). A thin StreamAdapter newtype adapts SendableRecordBatchStream to the common RecordBatchBoxStream type without buffering, so record batches flow to the caller one at a time as DataFusion produces them. This is a key advantage over the DuckDB backend, which buffers the full result set before streaming it.
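The no-buffering property of a forwarding newtype can be shown with a synchronous analog. This is a sketch under explicit assumptions: the real types are async `Stream`s of Arrow `RecordBatch`es, whereas here `Iterator` replaces `Stream`, the hypothetical `Batch` alias (`Vec<i64>`) replaces `RecordBatch`, and `String` replaces the error type. `BatchBoxIter` plays the role of `RecordBatchBoxStream`.

```rust
/// Hypothetical stand-in for an Arrow `RecordBatch`.
pub type Batch = Vec<i64>;

/// Synchronous analog of `RecordBatchBoxStream`: a boxed, sendable
/// source of fallible batches (the real type is an async stream).
pub type BatchBoxIter = Box<dyn Iterator<Item = Result<Batch, String>> + Send>;

/// Newtype that forwards batches from an inner source one at a time.
pub struct StreamAdapter<I> {
    inner: I,
}

impl<I> StreamAdapter<I>
where
    I: Iterator<Item = Result<Batch, String>> + Send + 'static,
{
    pub fn new(inner: I) -> Self {
        Self { inner }
    }

    /// Erase the concrete source type into the common boxed type.
    pub fn boxed(self) -> BatchBoxIter {
        Box::new(self)
    }
}

impl<I> Iterator for StreamAdapter<I>
where
    I: Iterator<Item = Result<Batch, String>>,
{
    type Item = Result<Batch, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // One batch pulled from the source per call: no look-ahead,
        // no intermediate Vec collecting the full result set.
        self.inner.next()
    }
}
```

Pulling the first item from the boxed adapter causes the source to produce exactly one batch; a buffering design would have drained the whole source first.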

No transactions

DataFusion does not support SQL transactions. rhei_core::OlapEngine::supports_transactions returns false for this backend. The Rhei sync engine handles partial-failure recovery via CDC sequence numbers instead of relying on BEGIN/COMMIT.
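Sequence-number recovery can be sketched without any transaction support. This sketch assumes the sync engine keeps a cursor holding the last CDC sequence number it applied, advances it after each successful statement, and skips already-applied events on retry; `CdcEvent`, `SyncCursor`, and `apply_batch` are hypothetical names, and the real sync engine's bookkeeping may differ.

```rust
/// Hypothetical CDC event: a monotonically increasing sequence number
/// plus the statement to replay against the OLAP side.
#[derive(Debug, Clone, PartialEq)]
pub struct CdcEvent {
    pub seq: u64,
    pub sql: String,
}

/// Hypothetical cursor recording the last sequence number applied.
pub struct SyncCursor {
    pub last_applied: u64,
}

impl SyncCursor {
    /// Apply events in order. Without BEGIN/COMMIT each statement stands
    /// alone, so the cursor advances per statement and a failure leaves it
    /// pointing at the last event that succeeded.
    pub fn apply_batch<F>(&mut self, events: &[CdcEvent], mut apply: F) -> Result<(), String>
    where
        F: FnMut(&CdcEvent) -> Result<(), String>,
    {
        for ev in events {
            // Already applied in an earlier, possibly partial, run: skip it.
            if ev.seq <= self.last_applied {
                continue;
            }
            apply(ev)?;
            self.last_applied = ev.seq;
        }
        Ok(())
    }
}
```

Re-running the same batch after a failure resumes exactly at the failed event, which is what lets the engine avoid relying on atomic commits.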

DML strategy

DataFusion’s MemTable and ListingTable are read-only. DML (INSERT / UPDATE / DELETE) is implemented by mutating the engine’s own table store and re-registering tables with the SessionContext. SQL statements are parsed with sqlparser-rs (SQLite dialect) so that the sync engine’s generated SQL is handled correctly.
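The mutate-then-re-register pattern can be modelled without DataFusion. A minimal sketch, assuming rows of `(id, value)`: the hypothetical `DmlStore` owns the mutable rows, and the hypothetical `Catalog` stands in for the `SessionContext`, holding immutable `Arc` snapshots much as a registered `MemTable` is read-only. SQL parsing with sqlparser-rs is left out; DML arrives here as typed method calls.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical row type: (id, value).
pub type Row = (i64, String);

/// Stand-in for the SessionContext catalog: registered tables are
/// immutable snapshots that cannot be edited in place.
#[derive(Default)]
pub struct Catalog {
    tables: HashMap<String, Arc<Vec<Row>>>,
}

impl Catalog {
    /// Replacing the entry models deregister-then-register.
    pub fn register(&mut self, name: &str, rows: Arc<Vec<Row>>) {
        self.tables.insert(name.to_string(), rows);
    }

    pub fn scan(&self, name: &str) -> Option<Arc<Vec<Row>>> {
        self.tables.get(name).cloned()
    }
}

/// Engine-owned mutable table store; every DML call ends by re-registering.
#[derive(Default)]
pub struct DmlStore {
    store: HashMap<String, Vec<Row>>,
    pub catalog: Catalog,
}

impl DmlStore {
    pub fn insert(&mut self, table: &str, row: Row) {
        self.store.entry(table.to_string()).or_default().push(row);
        self.reregister(table);
    }

    /// Returns the number of rows changed.
    pub fn update(&mut self, table: &str, id: i64, value: &str) -> usize {
        let mut changed = 0;
        if let Some(rows) = self.store.get_mut(table) {
            for row in rows.iter_mut().filter(|r| r.0 == id) {
                row.1 = value.to_string();
                changed += 1;
            }
        }
        self.reregister(table);
        changed
    }

    /// Returns the number of rows removed.
    pub fn delete(&mut self, table: &str, id: i64) -> usize {
        let rows = self.store.entry(table.to_string()).or_default();
        let before = rows.len();
        rows.retain(|(rid, _)| *rid != id);
        let removed = before - rows.len();
        self.reregister(table);
        removed
    }

    /// Publish a fresh immutable snapshot of the table to the catalog.
    fn reregister(&mut self, table: &str) {
        let snapshot = Arc::new(self.store.get(table).cloned().unwrap_or_default());
        self.catalog.register(table, snapshot);
    }
}
```

In this sketch, a reader holding an earlier snapshot keeps seeing consistent data while later DML swaps in a new one; the cost is copying the table on every write.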

Cloud storage

S3 and GCS backends are gated behind the cloud-storage workspace feature. When enabled, object_store (with aws and gcp sub-features) is pulled in. Credentials are resolved from the environment at engine construction time following object_store conventions (e.g. AWS_ACCESS_KEY_ID / GOOGLE_APPLICATION_CREDENTIALS).
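Resolving credentials at construction time means a misconfigured environment can surface before any query runs. The sketch below assumes a fail-fast check over a deliberately minimal variable list; real object_store resolution accepts more sources (for example instance metadata), and `Provider`, `check_credentials`, and `check_credentials_from_env` are hypothetical names. The `lookup` closure abstracts `std::env::var` so the check can be exercised without touching the process environment.

```rust
/// Hypothetical cloud provider selector for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Provider {
    S3,
    Gcs,
}

/// Minimal set of environment variables this sketch treats as sufficient.
/// Assumption: real resolution is more permissive than this list.
fn required_vars(p: Provider) -> &'static [&'static str] {
    match p {
        Provider::S3 => &["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
        Provider::Gcs => &["GOOGLE_APPLICATION_CREDENTIALS"],
    }
}

/// Fail fast at engine construction if any required variable is missing.
pub fn check_credentials<F>(p: Provider, lookup: F) -> Result<(), String>
where
    F: Fn(&str) -> Option<String>,
{
    let missing: Vec<&str> = required_vars(p)
        .iter()
        .copied()
        .filter(|&name| lookup(name).is_none())
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("missing credentials for {p:?}: {}", missing.join(", ")))
    }
}

/// Convenience wrapper that reads the real process environment.
pub fn check_credentials_from_env(p: Provider) -> Result<(), String> {
    check_credentials(p, |k| std::env::var(k).ok())
}
```

Reporting every missing variable at once, rather than the first one, saves a round of trial and error when configuring a deployment.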

Feature flags

  • cloud-storage — enables StorageMode::S3Parquet and StorageMode::GcsParquet via the object_store crate.

Re-exports

pub use engine::DataFusionEngine;
pub use engine::SharedDataFusionEngine;
pub use error::DfOlapError;
pub use storage::StorageMode;

Modules

engine
DataFusion-backed OLAP engine.
error
Error types for the DataFusion OLAP backend.
storage
Pluggable storage modes for the DataFusion OLAP engine.