// rhei_datafusion/lib.rs
//! DataFusion OLAP backend for the Rhei HTAP engine.
//!
//! # Architecture position
//!
//! `rhei-datafusion` is the **default OLAP backend** in Rhei (enabled by the
//! `datafusion-backend` workspace feature, which is on by default). It
//! implements [`rhei_core::OlapEngine`] for Apache DataFusion 53 and is used
//! by the [`rhei`](https://docs.rs/rhei) facade's `OlapBackend` enum.
//!
//! # Storage modes
//!
//! The engine is parameterised by a [`StorageMode`] chosen at construction time:
//!
//! | Variant | Durability | Notes |
//! |---------|------------|-------|
//! | [`StorageMode::InMemory`] | None — lost on shutdown | Fastest; ideal for tests and ephemeral workloads |
//! | [`StorageMode::ArrowIpc`] | Durable — `.arrow` files | Zero serialisation overhead; no compression |
//! | [`StorageMode::Parquet`] | Durable — `.parquet` files | Columnar compression + predicate pushdown; slower writes |
//! | `StorageMode::S3Parquet` | Cloud (Amazon S3) | Available with the `cloud-storage` feature only |
//! | `StorageMode::GcsParquet` | Cloud (Google Cloud Storage) | Available with the `cloud-storage` feature only |
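//!
//! A minimal sketch of choosing a mode at construction time. The constructor
//! name and the `Parquet` variant's field are assumed here for illustration;
//! consult [`StorageMode`] and [`DataFusionEngine`] for the exact signatures.
//!
//! ```rust,ignore
//! use rhei_datafusion::{DataFusionEngine, StorageMode};
//!
//! // Ephemeral engine: fastest, but all tables are lost on shutdown.
//! let ephemeral = DataFusionEngine::new(StorageMode::InMemory);
//!
//! // Durable engine writing compressed `.parquet` files to a local directory
//! // (hypothetical `path` field shown for illustration).
//! let durable = DataFusionEngine::new(StorageMode::Parquet {
//!     path: "./olap-data".into(),
//! });
//! ```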
//!
//! # Streaming queries
//!
//! [`rhei_core::OlapEngine::query_stream`] on [`DataFusionEngine`] returns a [`rhei_core::RecordBatchBoxStream`]
//! backed by DataFusion's own `DataFrame::execute_stream()`. A thin
//! `StreamAdapter` newtype adapts `SendableRecordBatchStream` to the common
//! `RecordBatchBoxStream` type without buffering — results flow batch by
//! batch directly to the caller. This is a key advantage over the DuckDB
//! backend, which buffers the full result set before streaming.
//!
//! # No transactions
//!
//! DataFusion does not support SQL transactions.
//! [`rhei_core::OlapEngine::supports_transactions`] returns `false` for this
//! backend. The Rhei sync engine handles partial-failure recovery via CDC
//! sequence numbers instead of relying on `BEGIN`/`COMMIT`.
//!
//! # DML strategy
//!
//! DataFusion's `MemTable` and `ListingTable` are read-only. DML
//! (INSERT / UPDATE / DELETE) is implemented by mutating the engine's own
//! table store and re-registering tables with the `SessionContext`. SQL
//! statements are parsed with `sqlparser-rs` (SQLite dialect) so that the sync
//! engine's generated SQL is handled correctly.
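//!
//! For example, a statement like the following is parsed, applied to the
//! engine's table store, and the affected table is re-registered with the
//! `SessionContext` (the `execute` method name is assumed for illustration):
//!
//! ```rust,ignore
//! // Routed through the sqlparser-rs (SQLite dialect) DML path rather than
//! // DataFusion's read-only table providers.
//! engine.execute("INSERT INTO users (id, name) VALUES (1, 'ada')").await?;
//! ```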
//!
//! # Cloud storage
//!
//! S3 and GCS backends are gated behind the `cloud-storage` workspace feature.
//! When enabled, `object_store` (with `aws` and `gcp` sub-features) is pulled
//! in. Credentials are resolved from the environment at engine construction
//! time following `object_store` conventions (e.g. `AWS_ACCESS_KEY_ID` /
//! `GOOGLE_APPLICATION_CREDENTIALS`).
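//!
//! A sketch of enabling the S3 backend. The variant's fields are assumed for
//! illustration (see [`StorageMode`] for the actual shape), and the crate must
//! be built with the `cloud-storage` feature:
//!
//! ```rust,ignore
//! // Credentials are read from the environment per object_store conventions,
//! // e.g. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_DEFAULT_REGION.
//! let engine = DataFusionEngine::new(StorageMode::S3Parquet {
//!     bucket: "rhei-olap".into(),
//!     prefix: "prod/".into(),
//! });
//! ```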
//!
//! # Feature flags
//!
//! - `cloud-storage` — enables `StorageMode::S3Parquet` and
//!   `StorageMode::GcsParquet` via the `object_store` crate.

pub mod engine;
pub mod error;
pub mod storage;

pub use engine::{DataFusionEngine, SharedDataFusionEngine};
pub use error::DfOlapError;
pub use storage::StorageMode;