rhei-datafusion 1.5.0

DataFusion OLAP backend for Rhei HTAP engine
//! DataFusion OLAP backend for the Rhei HTAP engine.
//!
//! # Architecture position
//!
//! `rhei-datafusion` is the **default OLAP backend** in Rhei (enabled by the
//! `datafusion-backend` workspace feature, which is on by default).  It
//! implements [`rhei_core::OlapEngine`] on top of Apache DataFusion 53 and is used
//! by the [`rhei`](https://docs.rs/rhei) facade's `OlapBackend` enum.
//!
//! # Storage modes
//!
//! The engine is parameterised by a [`StorageMode`] chosen at construction time:
//!
//! | Variant | Durability | Notes |
//! |---------|------------|-------|
//! | [`StorageMode::InMemory`] | None — lost on shutdown | Fastest; ideal for tests and ephemeral workloads |
//! | [`StorageMode::ArrowIpc`] | Durable — `.arrow` files | Zero serialisation overhead; no compression |
//! | [`StorageMode::Parquet`] | Durable — `.parquet` files | Columnar compression + predicate pushdown; slower writes |
//! | `StorageMode::S3Parquet` | Cloud (Amazon S3) | Available with the `cloud-storage` feature only |
//! | `StorageMode::GcsParquet` | Cloud (Google Cloud Storage) | Available with the `cloud-storage` feature only |
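//!
//! A minimal construction sketch (the constructor name and the `Parquet`
//! variant's field layout are assumptions for illustration, not the verified
//! API):
//!
//! ```rust,ignore
//! use rhei_datafusion::{DataFusionEngine, StorageMode};
//!
//! // In-memory: fastest, but nothing survives shutdown.
//! let ephemeral = DataFusionEngine::new(StorageMode::InMemory);
//!
//! // Parquet: durable columnar files with compression and predicate pushdown.
//! let durable = DataFusionEngine::new(StorageMode::Parquet {
//!     path: "/var/lib/rhei/olap".into(), // hypothetical field name
//! });
//! ```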
//!
//! # Streaming queries
//!
//! [`rhei_core::OlapEngine::query_stream`] on [`DataFusionEngine`] returns a [`rhei_core::RecordBatchBoxStream`]
//! backed by DataFusion's own `DataFrame::execute_stream()`.  A thin
//! `StreamAdapter` newtype adapts `SendableRecordBatchStream` to the common
//! `RecordBatchBoxStream` type without buffering — results flow row-batch by
//! row-batch directly to the caller.  This is a key advantage over the DuckDB
//! backend, which buffers the full result set before streaming.
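//!
//! A consumption sketch, assuming an already-constructed `engine` and the
//! `futures` crate (everything except `query_stream` itself is illustrative):
//!
//! ```rust,ignore
//! use futures::StreamExt;
//!
//! // Each item is an Arrow `RecordBatch`; nothing is buffered up front.
//! let mut stream = engine.query_stream("SELECT * FROM events").await?;
//! while let Some(batch) = stream.next().await {
//!     let batch = batch?;
//!     println!("got {} rows", batch.num_rows());
//! }
//! ```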
//!
//! # No transactions
//!
//! DataFusion does not support SQL transactions.
//! [`rhei_core::OlapEngine::supports_transactions`] returns `false` for this
//! backend.  The Rhei sync engine handles partial-failure recovery via CDC
//! sequence numbers instead of relying on `BEGIN`/`COMMIT`.
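//!
//! Callers that branch on transactional capability can check at runtime (a
//! sketch; only the trait method name comes from `rhei_core::OlapEngine`):
//!
//! ```rust,ignore
//! if !engine.supports_transactions() {
//!     // No BEGIN/COMMIT available: rely on CDC sequence numbers for
//!     // partial-failure recovery instead.
//! }
//! ```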
//!
//! # DML strategy
//!
//! DataFusion's `MemTable` and `ListingTable` are read-only.  DML
//! (INSERT / UPDATE / DELETE) is implemented by mutating the engine's own
//! table store and re-registering tables with the `SessionContext`.  SQL
//! statements are parsed with `sqlparser-rs` (SQLite dialect) so that the sync
//! engine's generated SQL is handled correctly.
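//!
//! For reference, the dialect choice can be sketched standalone against the
//! real `sqlparser` crate (this snippet is independent of any Rhei API):
//!
//! ```rust,ignore
//! use sqlparser::dialect::SQLiteDialect;
//! use sqlparser::parser::Parser;
//!
//! // The SQLite dialect accepts the sync engine's generated SQL before the
//! // engine mutates its table store and re-registers the table.
//! let statements = Parser::parse_sql(&SQLiteDialect {}, "DELETE FROM t WHERE id = 1")?;
//! assert_eq!(statements.len(), 1);
//! ```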
//!
//! # Cloud storage
//!
//! S3 and GCS backends are gated behind the `cloud-storage` workspace feature.
//! When enabled, `object_store` (with `aws` and `gcp` sub-features) is pulled
//! in.  Credentials are resolved from the environment at engine construction
//! time following `object_store` conventions (e.g. `AWS_ACCESS_KEY_ID` /
//! `GOOGLE_APPLICATION_CREDENTIALS`).
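//!
//! A configuration sketch for S3 (the variant's fields are assumptions; only
//! the environment-variable names follow `object_store` conventions):
//!
//! ```rust,ignore
//! // Credentials are read from the environment at construction time:
//! //   AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY for S3, or
//! //   GOOGLE_APPLICATION_CREDENTIALS for GCS.
//! #[cfg(feature = "cloud-storage")]
//! let mode = StorageMode::S3Parquet {
//!     bucket: "rhei-olap".into(), // hypothetical field names
//!     prefix: "prod/".into(),
//! };
//! ```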
//!
//! # Feature flags
//!
//! - `cloud-storage` — enables `StorageMode::S3Parquet` and
//!   `StorageMode::GcsParquet` via the `object_store` crate.

pub mod engine;
pub mod error;
pub mod storage;

pub use engine::{DataFusionEngine, SharedDataFusionEngine};
pub use error::DfOlapError;
pub use storage::StorageMode;