Crate rhei_datafusion


DataFusion OLAP backend for the Rhei HTAP engine.

§Architecture position

rhei-datafusion is the default OLAP backend in Rhei (enabled by the datafusion-backend workspace feature, which is on by default). It implements rhei_core::OlapEngine for Apache DataFusion 53 and is used by the rhei facade’s OlapBackend enum.

§Storage modes

The engine is parameterised by a StorageMode chosen at construction time:

| Variant | Durability | Notes |
|---|---|---|
| StorageMode::InMemory | None; lost on shutdown | Fastest; ideal for tests and ephemeral workloads |
| StorageMode::Vortex | Durable; .vortex files (local or S3) | Auto-detects local vs S3 from URL scheme |
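
The scheme-based detection mentioned for StorageMode::Vortex can be pictured with a small standard-library sketch. The `Location` enum and `classify` function are hypothetical names introduced for illustration only; they are not part of this crate's API, and the real detection logic may differ.

```rust
/// Hypothetical classification of a storage location (illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum Location {
    /// A path on the local filesystem.
    Local(String),
    /// An S3 bucket plus a key prefix inside it.
    S3 { bucket: String, prefix: String },
}

/// Classifies `url` by its scheme: `s3://` selects object storage,
/// while `file://` or no scheme at all is treated as a local path.
pub fn classify(url: &str) -> Location {
    if let Some(rest) = url.strip_prefix("s3://") {
        // Split "bucket/some/prefix" at the first slash; a bare bucket has an empty prefix.
        let (bucket, prefix) = rest.split_once('/').unwrap_or((rest, ""));
        Location::S3 {
            bucket: bucket.to_string(),
            prefix: prefix.to_string(),
        }
    } else {
        // Strip an optional `file://` scheme and keep the remaining path.
        let path = url.strip_prefix("file://").unwrap_or(url);
        Location::Local(path.to_string())
    }
}
```

Under these assumptions, `classify("s3://lake/tables/orders")` yields the `S3` variant with bucket `lake`, and a plain path such as `/var/lib/rhei` yields `Local`.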

§Streaming queries

rhei_core::OlapEngine::query_stream on DataFusionEngine returns a rhei_core::RecordBatchBoxStream backed by DataFusion’s own DataFrame::execute_stream(). A thin StreamAdapter newtype adapts SendableRecordBatchStream to the common RecordBatchBoxStream type without buffering, so results flow batch by batch directly to the caller.
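
The non-buffering newtype idea can be illustrated with a synchronous analogue. This sketch swaps the async stream for a standard-library `Iterator` so it stays self-contained; `Adapter`, `BackendError`, `CommonError`, and `Batch` are hypothetical stand-ins, and the error conversion shown is an assumption about what such an adapter might do, not a description of StreamAdapter itself.

```rust
/// Hypothetical backend-specific error (illustration only).
#[derive(Debug, PartialEq)]
pub struct BackendError(pub String);

/// Hypothetical engine-agnostic error (illustration only).
#[derive(Debug, PartialEq)]
pub struct CommonError(pub String);

/// Stand-in for a record batch.
pub type Batch = Vec<i64>;

/// Newtype wrapping any backend source of batches.
pub struct Adapter<I>(pub I);

impl<I> Iterator for Adapter<I>
where
    I: Iterator<Item = Result<Batch, BackendError>>,
{
    type Item = Result<Batch, CommonError>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull exactly one batch from the inner source and convert only its
        // error type; nothing is collected or held back between calls.
        self.0.next().map(|r| r.map_err(|e| CommonError(e.0)))
    }
}
```

Because each call to `next` forwards a single item, memory use in this sketch is bounded by one batch regardless of result size, which is the property the adapter description above emphasises.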

§No transactions

DataFusion does not support SQL transactions. rhei_core::OlapEngine::supports_transactions returns false for this backend. The Rhei sync engine handles partial-failure recovery via CDC sequence numbers instead of relying on BEGIN/COMMIT.
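
One way to picture recovery by sequence number rather than by transaction is the following sketch. It assumes events carry a monotonically increasing sequence number starting at 1 and that the sink remembers the last one it applied; `CdcEvent`, `Sink`, and the `fail_at` parameter (used only to simulate a failure) are hypothetical and do not reflect the actual Rhei sync engine.

```rust
/// Hypothetical CDC event: a sequence number plus a payload (illustration only).
#[derive(Debug, Clone)]
pub struct CdcEvent {
    pub seq: u64,
    pub payload: String,
}

/// Hypothetical sink that tracks the highest sequence number applied so far.
#[derive(Debug, Default)]
pub struct Sink {
    pub last_applied: u64,
    pub rows: Vec<String>,
}

impl Sink {
    /// Applies events in order, skipping any that were already applied.
    /// Stops at the first failure, leaving `last_applied` at the last success,
    /// so the same batch can simply be replayed later.
    pub fn apply_batch(&mut self, events: &[CdcEvent], fail_at: Option<u64>) -> Result<(), String> {
        for ev in events {
            if ev.seq <= self.last_applied {
                continue; // applied during an earlier attempt
            }
            if fail_at == Some(ev.seq) {
                return Err(format!("simulated failure at seq {}", ev.seq));
            }
            self.rows.push(ev.payload.clone());
            self.last_applied = ev.seq;
        }
        Ok(())
    }
}
```

Replaying the whole batch after a partial failure is safe in this sketch because already-applied events are skipped, so no rollback is needed.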

§DML strategy

InMemory: mutations update an in-memory HashMap of Vec<RecordBatch> and re-register a MemTable with DataFusion after each change.
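
The mutate-then-re-register pattern can be sketched with standard-library types only. Here `Batch` stands in for RecordBatch and the `registered` map stands in for DataFusion's table registration; `InMemoryStore` and its methods are hypothetical names, not the crate's implementation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for a record batch (illustration only).
pub type Batch = Vec<i64>;

/// Stand-in for an immutable table snapshot that queries read from.
pub type Snapshot = Arc<Vec<Batch>>;

#[derive(Default)]
pub struct InMemoryStore {
    /// Source of truth: table name to its list of batches.
    batches: HashMap<String, Vec<Batch>>,
    /// What queries see: replaced wholesale after every mutation.
    registered: HashMap<String, Snapshot>,
}

impl InMemoryStore {
    /// Appends a batch, then refreshes the registered snapshot.
    pub fn insert(&mut self, table: &str, batch: Batch) {
        self.batches.entry(table.to_string()).or_default().push(batch);
        self.reregister(table);
    }

    /// Removes matching values, drops emptied batches, then refreshes the snapshot.
    pub fn delete_where(&mut self, table: &str, pred: impl Fn(i64) -> bool) {
        if let Some(batches) = self.batches.get_mut(table) {
            for b in batches.iter_mut() {
                b.retain(|v| !pred(*v));
            }
            batches.retain(|b| !b.is_empty());
        }
        self.reregister(table);
    }

    /// Builds a fresh snapshot from the current batches and swaps it in.
    fn reregister(&mut self, table: &str) {
        let snapshot = self.batches.get(table).cloned().unwrap_or_default();
        self.registered.insert(table.to_string(), Arc::new(snapshot));
    }

    /// Reads all values through the registered snapshot, as a query would.
    pub fn scan(&self, table: &str) -> Vec<i64> {
        self.registered
            .get(table)
            .map(|s| s.iter().flatten().copied().collect())
            .unwrap_or_default()
    }
}
```

Swapping in a new snapshot after each change keeps readers on a consistent view in this sketch, at the cost of cloning the batch list on every mutation.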

Vortex: INSERT routes through DataFusion’s VortexFormatFactory sink (SQL INSERT INTO … SELECT * FROM tmp). UPDATE and DELETE use a read-modify-write cycle: read all data via SELECT *, apply the mutations in memory, clear the table directory, and re-insert the result.
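
The four steps of the read-modify-write cycle can be sketched against a plain directory. In this illustration, text files with one integer per line stand in for .vortex files, and `read_all`, `delete_where`, and the `part-0.txt` file name are hypothetical; the real backend goes through DataFusion rather than `std::fs`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads every row from every file in `dir` (one integer per line).
pub fn read_all(dir: &Path) -> io::Result<Vec<i64>> {
    let mut paths: Vec<_> = fs::read_dir(dir)?
        .collect::<io::Result<Vec<_>>>()?
        .into_iter()
        .map(|entry| entry.path())
        .collect();
    paths.sort(); // deterministic read order
    let mut rows = Vec::new();
    for path in paths {
        for line in fs::read_to_string(&path)?.lines() {
            let value = line
                .parse::<i64>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            rows.push(value);
        }
    }
    Ok(rows)
}

/// Deletes matching rows with a read-modify-write cycle over the whole table.
pub fn delete_where(dir: &Path, pred: impl Fn(i64) -> bool) -> io::Result<()> {
    let mut rows = read_all(dir)?; // 1. read all data
    rows.retain(|v| !pred(*v)); // 2. apply the mutation in memory
    fs::remove_dir_all(dir)?; // 3. clear the table directory
    fs::create_dir_all(dir)?;
    let body: String = rows.iter().map(|v| format!("{v}\n")).collect();
    fs::write(dir.join("part-0.txt"), body) // 4. re-insert the surviving rows
}
```

Note that this sketch is not atomic: a crash between steps 3 and 4 would lose the table, and every mutation rewrites all rows regardless of how many match.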

§Cloud storage

S3-compatible backends are gated behind the cloud-storage workspace feature. When enabled, object_store (with aws sub-feature) is pulled in. Credentials are resolved from the environment at engine construction time following object_store conventions (e.g. AWS_ACCESS_KEY_ID). S3-compatible services (MinIO, Cloudflare R2, Wasabi, Ceph RGW) work via AWS_ENDPOINT_URL.
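
Resolving settings from the environment once, at construction time, might look like the sketch below. `S3Settings`, `resolve`, and `resolve_from_env` are hypothetical helpers, not this crate's or object_store's API; treating AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as mandatory is an assumption made for the example, since real deployments may obtain credentials in other ways.

```rust
/// Hypothetical resolved settings (illustration only).
#[derive(Debug, PartialEq)]
pub struct S3Settings {
    pub access_key_id: String,
    pub secret_access_key: String,
    /// Set for S3-compatible services such as MinIO; `None` means AWS itself.
    pub endpoint_url: Option<String>,
}

/// Resolves settings through `lookup`, which maps a variable name to its value.
/// Taking a closure keeps the logic testable without touching the real environment.
pub fn resolve(lookup: impl Fn(&str) -> Option<String>) -> Result<S3Settings, String> {
    let require = |name: &str| {
        lookup(name).ok_or_else(|| format!("missing environment variable {name}"))
    };
    Ok(S3Settings {
        access_key_id: require("AWS_ACCESS_KEY_ID")?,
        secret_access_key: require("AWS_SECRET_ACCESS_KEY")?,
        endpoint_url: lookup("AWS_ENDPOINT_URL"),
    })
}

/// Convenience wrapper that reads the process environment.
pub fn resolve_from_env() -> Result<S3Settings, String> {
    resolve(|name| std::env::var(name).ok())
}
```

Pointing AWS_ENDPOINT_URL at, for example, a local MinIO address would populate `endpoint_url` in this sketch, while leaving it unset keeps the default endpoint.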

§Feature flags

  • cloud-storage — enables StorageMode::Vortex with s3:// URLs via the object_store crate.

Re-exports§

pub use engine::DataFusionEngine;
pub use engine::SharedDataFusionEngine;
pub use error::DfOlapError;
pub use storage::StorageMode;

Modules§

engine
DataFusion-backed OLAP engine.
error
Error types for the DataFusion OLAP backend.
storage
Pluggable storage modes for the DataFusion OLAP engine.