Expand description
CDC sync engine, SQL query router, and CDC-to-DML converter for Rhei.
rhei-sync is the glue crate that ties the OLTP and OLAP halves of
Rhei’s HTAP pipeline together. It has three core responsibilities:
§1. Query routing (router)
SqlParserRouter parses every incoming SQL statement with
sqlparser-rs (SQLite dialect) and directs it to the right backend:
- OLTP — writes (INSERT / UPDATE / DELETE), DDL, transactions, and simple point-lookup SELECTs.
- OLAP — aggregates, GROUP BY / HAVING, JOINs, window functions, CTEs, set operations (UNION / INTERSECT / EXCEPT), and subqueries.
- Heuristic fallback — when the parser cannot handle the SQL (e.g.
SQLite
PRAGMA), a keyword-scan fallback is applied. The default is OLTP (safety-first).
§2. CDC-to-DML conversion (converter, temporal_converter)
CDC events (INSERT / UPDATE / DELETE) captured from OLTP triggers are
converted into OLAP DML:
- Destructive mode — mirror semantics: UPDATE overwrites the existing row, DELETE removes it.
- Temporal mode (SCD Type 2) — append-only with validity intervals.
Every change produces an INSERT; UPDATEs and DELETEs additionally close
the previous row version by setting
_rhei_valid_to. Point-in-time queries use:WHERE _rhei_valid_from <= T AND (_rhei_valid_to IS NULL OR _rhei_valid_to > T)
§Arrow-native bulk INSERT path
Consecutive INSERT events for the same table are grouped into a
SyncOp::BatchInsert (see sync_engine). The engine first attempts to
build a typed Arrow RecordBatch via converter::cdc_events_to_batch (or
temporal_converter::cdc_events_to_temporal_batch in temporal mode) and
calls OlapEngine::load_arrow, bypassing SQL parsing entirely. If the
schema contains a type that the Arrow builder does not handle (Date,
Timestamp, Decimal, List, Struct, …) the engine falls back to a SQL
INSERT … VALUES statement.
§3. Background sync loop (background)
spawn_sync_loop starts a Tokio task that calls
rhei_core::SyncEngine::sync_once on a configurable interval. Shutdown
is cooperative: the caller holds a
tokio_util::sync::CancellationToken and cancels it to stop the loop
cleanly.
A DDL read-lock is acquired before every cycle so that schema mutations (which take the write lock) never interleave with an in-progress sync.
§SyncMode semantics
| Mode | UPDATE | DELETE |
|---|---|---|
Destructive | overwrites the row | removes the row |
Temporal | closes previous version + inserts new version | closes previous version + inserts tombstone |
§SQL injection prevention
All table and column identifiers are validated against [A-Za-z0-9_] at
register_table time. The converter re-validates them at conversion time
for defense-in-depth.
Re-exports§
pub use background::spawn_sync_loop;pub use error::SyncError;pub use router::HeuristicRouter;pub use router::SqlParserRouter;pub use sync_engine::CdcSyncEngine;pub use temporal_converter::temporalize_schema;
Modules§
- background
- Background sync loop that continuously drives CDC-to-OLAP replication.
- converter
- Destructive-mode CDC-to-DML converter and Arrow batch builder.
- error
- Error types for the CDC sync pipeline.
- router
- AST-based SQL query router that classifies statements as OLTP or OLAP.
- sync_
engine - Core CDC-to-OLAP sync engine.
- temporal_
converter - Temporal (SCD Type 2) CDC-to-DML converter and Arrow batch builder.