//! DataFusion OLAP backend for the Rhei HTAP engine.
//!
//! # Architecture position
//!
//! `rhei-datafusion` is the **default OLAP backend** in Rhei (enabled by the
//! `datafusion-backend` workspace feature, which is on by default).  It
//! implements [`rhei_core::OlapEngine`] for Apache DataFusion 53 and is used
//! by the [`rhei`](https://docs.rs/rhei) facade's `OlapBackend` enum.
//!
//! # Storage modes
//!
//! The engine is parameterised by a [`StorageMode`] chosen at construction time:
//!
//! | Variant | Durability | Notes |
//! |---------|------------|-------|
//! | [`StorageMode::InMemory`] | None — lost on shutdown | Fastest; ideal for tests and ephemeral workloads |
//! | [`StorageMode::Vortex`] | Durable — `.vortex` files (local or S3) | Auto-detects local vs S3 from URL scheme |
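//!
//! As a rough illustration of scheme-based detection, here is a minimal
//! sketch.  `StorageTarget` and `detect_target` are hypothetical names, not
//! this crate's API, and the real engine may accept other schemes or parse
//! URLs differently.
//!
//! ```rust
//! /// Hypothetical classification of a storage location (not this crate's API).
//! #[derive(Debug, PartialEq)]
//! enum StorageTarget {
//!     /// A local filesystem directory.
//!     Local(String),
//!     /// An S3 (or S3-compatible) bucket plus key prefix.
//!     S3 { bucket: String, prefix: String },
//! }
//!
//! /// Picks the backend from the URL scheme: `s3://bucket/prefix` is S3;
//! /// `file:///path` or a bare path is treated as local.
//! fn detect_target(url: &str) -> StorageTarget {
//!     if let Some(rest) = url.strip_prefix("s3://") {
//!         // Split "bucket/some/prefix" at the first '/'; the prefix may be empty.
//!         let (bucket, prefix) = rest.split_once('/').unwrap_or((rest, ""));
//!         StorageTarget::S3 {
//!             bucket: bucket.to_string(),
//!             prefix: prefix.to_string(),
//!         }
//!     } else if let Some(path) = url.strip_prefix("file://") {
//!         StorageTarget::Local(path.to_string())
//!     } else {
//!         StorageTarget::Local(url.to_string())
//!     }
//! }
//! ```
//!
//! With this sketch, `detect_target("s3://bucket/tables/orders")` yields an
//! `S3` target with bucket `bucket` and prefix `tables/orders`, while
//! `"./data"` falls through to `Local`.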
//!
//! # Streaming queries
//!
//! [`rhei_core::OlapEngine::query_stream`] on [`DataFusionEngine`] returns a
//! [`rhei_core::RecordBatchBoxStream`] backed by DataFusion's own
//! `DataFrame::execute_stream()`.  A thin `StreamAdapter` newtype adapts the
//! resulting `SendableRecordBatchStream` to the common `RecordBatchBoxStream`
//! type without buffering: record batches flow to the caller one at a time,
//! as DataFusion produces them.
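//!
//! The real adapter wraps DataFusion's async `SendableRecordBatchStream` and
//! implements the `futures` `Stream` trait.  To stay dependency-free, the
//! sketch below models the same idea with a synchronous `Iterator`.  The error
//! types are hypothetical placeholders, and the sketch assumes the adaptation
//! consists mainly of converting each item's error type.
//!
//! ```rust
//! /// Hypothetical engine-specific error (stands in for `DataFusionError`).
//! #[derive(Debug, PartialEq)]
//! struct EngineError(String);
//!
//! /// Hypothetical common error type shared by all OLAP backends.
//! #[derive(Debug, PartialEq)]
//! struct CommonError(String);
//!
//! /// Newtype adapter: wraps the engine's result source and converts each item
//! /// on demand.  It owns no buffer, so every `next()` pulls exactly one item
//! /// from the inner source.
//! struct StreamAdapter<I>(I);
//!
//! impl<I, T> Iterator for StreamAdapter<I>
//! where
//!     I: Iterator<Item = Result<T, EngineError>>,
//! {
//!     type Item = Result<T, CommonError>;
//!
//!     fn next(&mut self) -> Option<Self::Item> {
//!         // Forward one item, translating only the error variant.
//!         self.0.next().map(|item| item.map_err(|e| CommonError(e.0)))
//!     }
//! }
//! ```
//!
//! Because the adapter only forwards `next()`, nothing is read from the inner
//! source until the caller asks for it, which is what keeps memory use
//! independent of the result size.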
//!
//! # No transactions
//!
//! DataFusion does not support SQL transactions.
//! [`rhei_core::OlapEngine::supports_transactions`] returns `false` for this
//! backend.  The Rhei sync engine handles partial-failure recovery via CDC
//! sequence numbers instead of relying on `BEGIN`/`COMMIT`.
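//!
//! The recovery protocol is not detailed here.  One common pattern, assumed
//! purely for illustration, is a sequence-number watermark: the sink
//! remembers the highest CDC sequence number it has applied and skips
//! anything at or below it when a batch is replayed.  `ChangeEvent` and
//! `Sink` are hypothetical types, not part of Rhei.
//!
//! ```rust
//! /// Hypothetical CDC change event with a monotonically increasing sequence number.
//! struct ChangeEvent {
//!     seq: u64,
//!     row: String,
//! }
//!
//! /// Hypothetical sink that tracks the highest sequence number applied so far.
//! #[derive(Default)]
//! struct Sink {
//!     applied_through: u64,
//!     rows: Vec<String>,
//! }
//!
//! impl Sink {
//!     /// Applies events in order, skipping any already covered by the watermark,
//!     /// so replaying a partially applied batch is harmless.
//!     fn apply(&mut self, events: &[ChangeEvent]) {
//!         for ev in events {
//!             if ev.seq <= self.applied_through {
//!                 continue; // applied during an earlier, partially failed attempt
//!             }
//!             self.rows.push(ev.row.clone());
//!             self.applied_through = ev.seq;
//!         }
//!     }
//! }
//! ```
//!
//! In a durable setting the watermark would need to be persisted together
//! with the data it covers for this scheme to survive restarts.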
//!
//! # DML strategy
//!
//! `InMemory`: mutations update an in-memory `HashMap` of `Vec<RecordBatch>`
//! and re-register a `MemTable` with DataFusion after each change.
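//!
//! A minimal, dependency-free sketch of that update-then-re-register pattern
//! follows.  `Batch` is a hypothetical stand-in for Arrow's `RecordBatch`,
//! and the `registered` map models the snapshot DataFusion sees after a
//! `MemTable` is re-registered; neither name comes from this crate.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical stand-in for an Arrow `RecordBatch`: just a vector of values.
//! type Batch = Vec<i64>;
//!
//! /// Hypothetical store: `tables` is the mutable source of truth, while
//! /// `registered` models what the query engine currently sees.
//! #[derive(Default)]
//! struct InMemoryStore {
//!     tables: HashMap<String, Vec<Batch>>,
//!     registered: HashMap<String, Vec<Batch>>,
//! }
//!
//! impl InMemoryStore {
//!     fn insert(&mut self, table: &str, batch: Batch) {
//!         self.tables.entry(table.to_string()).or_default().push(batch);
//!         self.reregister(table);
//!     }
//!
//!     fn delete_where(&mut self, table: &str, pred: impl Fn(i64) -> bool) {
//!         if let Some(batches) = self.tables.get_mut(table) {
//!             for b in batches.iter_mut() {
//!                 b.retain(|&v| !pred(v));
//!             }
//!             batches.retain(|b| !b.is_empty()); // drop emptied batches
//!         }
//!         self.reregister(table);
//!     }
//!
//!     /// Models re-registering a `MemTable`: replace the engine-visible
//!     /// snapshot with a fresh copy of the table's batches.
//!     fn reregister(&mut self, table: &str) {
//!         let snapshot = self.tables.get(table).cloned().unwrap_or_default();
//!         self.registered.insert(table.to_string(), snapshot);
//!     }
//! }
//! ```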
//!
//! `Vortex`: INSERT is routed through DataFusion's `VortexFormatFactory` sink
//! (SQL `INSERT INTO … SELECT * FROM tmp`).  UPDATE and DELETE use a
//! read-modify-write cycle: read all data via `SELECT *`, apply the mutations
//! in memory, clear the table directory, and re-insert the result.
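//!
//! The read-modify-write cycle can be sketched with an in-memory stand-in for
//! the table directory.  `Row` and `TableDir` are hypothetical; each
//! `Vec<Row>` plays the role of one immutable `.vortex` file, and the real
//! engine performs these steps through DataFusion SQL rather than by
//! manipulating vectors directly.
//!
//! ```rust
//! /// Hypothetical row type.
//! #[derive(Clone, Debug, PartialEq)]
//! struct Row {
//!     id: u32,
//!     value: String,
//! }
//!
//! /// Hypothetical table directory: files can be added or removed, never edited.
//! #[derive(Default)]
//! struct TableDir {
//!     files: Vec<Vec<Row>>,
//! }
//!
//! impl TableDir {
//!     /// INSERT: each call writes one new file.
//!     fn insert(&mut self, rows: Vec<Row>) {
//!         self.files.push(rows);
//!     }
//!
//!     /// UPDATE via read-modify-write.
//!     fn update_where(&mut self, pred: impl Fn(&Row) -> bool, new_value: &str) {
//!         // 1. Read every row from every file.
//!         let mut rows: Vec<Row> = self.files.iter().flatten().cloned().collect();
//!         // 2. Apply the mutation in memory.
//!         for r in rows.iter_mut() {
//!             if pred(&*r) {
//!                 r.value = new_value.to_string();
//!             }
//!         }
//!         // 3. Clear the directory, then 4. re-insert the full result.
//!         self.files.clear();
//!         self.insert(rows);
//!     }
//!
//!     /// DELETE via the same cycle, keeping only rows that do not match.
//!     fn delete_where(&mut self, pred: impl Fn(&Row) -> bool) {
//!         let rows: Vec<Row> = self.files.iter().flatten().cloned().filter(|r| !pred(r)).collect();
//!         self.files.clear();
//!         self.insert(rows);
//!     }
//! }
//! ```
//!
//! Because every UPDATE or DELETE rewrites all files, its cost grows with the
//! size of the table rather than with the number of affected rows.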
//!
//! # Cloud storage
//!
//! S3-compatible backends are gated behind the `cloud-storage` workspace feature.
//! When enabled, `object_store` (with its `aws` feature) is pulled in.
//! Credentials are resolved from the environment at engine construction time,
//! following `object_store` conventions (e.g. `AWS_ACCESS_KEY_ID`).
//! S3-compatible services (MinIO, Cloudflare R2, Wasabi, Ceph RGW) are
//! reached by setting `AWS_ENDPOINT_URL`.
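//!
//! A sketch of environment-based resolution using only `std`.  `S3Settings`
//! is hypothetical, `AWS_SECRET_ACCESS_KEY` and `AWS_REGION` are the usual
//! AWS variable names assumed here for illustration, and the real engine
//! delegates this lookup to `object_store`, whose exact rules may differ.
//!
//! ```rust
//! /// Hypothetical subset of S3 connection settings.
//! #[derive(Debug, PartialEq)]
//! struct S3Settings {
//!     access_key_id: Option<String>,
//!     secret_access_key: Option<String>,
//!     region: Option<String>,
//!     /// Set for S3-compatible services such as MinIO or Cloudflare R2.
//!     endpoint: Option<String>,
//! }
//!
//! impl S3Settings {
//!     /// Reads settings through `get`, so the logic can be tested without
//!     /// touching the real process environment.
//!     fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Self {
//!         S3Settings {
//!             access_key_id: get("AWS_ACCESS_KEY_ID"),
//!             secret_access_key: get("AWS_SECRET_ACCESS_KEY"),
//!             region: get("AWS_REGION"),
//!             endpoint: get("AWS_ENDPOINT_URL"),
//!         }
//!     }
//!
//!     /// Resolves settings once from the process environment, as would
//!     /// happen at engine construction time.
//!     fn from_env() -> Self {
//!         Self::from_lookup(|key| std::env::var(key).ok())
//!     }
//! }
//! ```
//!
//! Injecting the lookup function keeps the resolution logic deterministic in
//! tests, while `from_env` covers the production path.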
//!
//! # Feature flags
//!
//! - `cloud-storage` — enables `StorageMode::Vortex` with `s3://` URLs via
//!   the `object_store` crate.

pub mod engine;
pub mod error;
pub mod storage;

pub use engine::{DataFusionEngine, SharedDataFusionEngine};
pub use error::DfOlapError;
pub use storage::StorageMode;