CDC sync engine, SQL query router, and CDC-to-DML converter for Rhei.
rhei-sync is the glue crate that ties the OLTP and OLAP halves of
Rhei's HTAP pipeline together. It has three core responsibilities:
1. Query routing ([router])
[SqlParserRouter] parses every incoming SQL statement with
sqlparser-rs (SQLite dialect) and directs it to the right backend:
- OLTP — writes (INSERT / UPDATE / DELETE), DDL, transactions, and simple point-lookup SELECTs.
- OLAP — aggregates, GROUP BY / HAVING, JOINs, window functions, CTEs, set operations (UNION / INTERSECT / EXCEPT), and subqueries.
- Heuristic fallback — when the parser cannot handle the SQL (e.g.
SQLite
PRAGMA), a keyword-scan fallback is applied. The default is OLTP (safety-first).
2. CDC-to-DML conversion ([converter], [temporal_converter])
CDC events (INSERT / UPDATE / DELETE) captured from OLTP triggers are
converted into OLAP DML:
- Destructive mode — mirror semantics: UPDATE overwrites the existing row, DELETE removes it.
- Temporal mode (SCD Type 2) — append-only with validity intervals.
Every change produces an INSERT; UPDATEs and DELETEs additionally close
the previous row version by setting
_rhei_valid_to. Point-in-time queries use:WHERE _rhei_valid_from <= T AND (_rhei_valid_to IS NULL OR _rhei_valid_to > T)
Arrow-native bulk INSERT path
Consecutive INSERT events for the same table are grouped into a
SyncOp::BatchInsert (see [sync_engine]). The engine first attempts to
build a typed Arrow RecordBatch via [converter::cdc_events_to_batch] (or
[temporal_converter::cdc_events_to_temporal_batch] in temporal mode) and
calls OlapEngine::load_arrow, bypassing SQL parsing entirely. If the
schema contains a type that the Arrow builder does not handle (Date,
Timestamp, Decimal, List, Struct, …) the engine falls back to a SQL
INSERT … VALUES statement.
3. Background sync loop ([background])
[spawn_sync_loop] starts a Tokio task that calls
[rhei_core::SyncEngine::sync_once] on a configurable interval. Shutdown
is cooperative: the caller holds a
[tokio_util::sync::CancellationToken] and cancels it to stop the loop
cleanly.
A DDL read-lock is acquired before every cycle so that schema mutations (which take the write lock) never interleave with an in-progress sync.
SyncMode semantics
| Mode | UPDATE | DELETE |
|---|---|---|
Destructive |
overwrites the row | removes the row |
Temporal |
closes previous version + inserts new version | closes previous version + inserts tombstone |
SQL injection prevention
All table and column identifiers are validated against [A-Za-z0-9_] at
register_table time. The converter re-validates them at conversion time
for defense-in-depth.