rhei-sync 1.5.0

CDC sync engine and query router for Rhei
Documentation

CDC sync engine, SQL query router, and CDC-to-DML converter for Rhei.

rhei-sync is the glue crate that ties the OLTP and OLAP halves of Rhei's HTAP pipeline together. It has three core responsibilities:

1. Query routing ([router])

[SqlParserRouter] parses every incoming SQL statement with sqlparser-rs (SQLite dialect) and directs it to the right backend:

  • OLTP — writes (INSERT / UPDATE / DELETE), DDL, transactions, and simple point-lookup SELECTs.
  • OLAP — aggregates, GROUP BY / HAVING, JOINs, window functions, CTEs, set operations (UNION / INTERSECT / EXCEPT), and subqueries.
  • Heuristic fallback — when the parser cannot handle the SQL (e.g. SQLite PRAGMA), a keyword-scan fallback is applied. The default is OLTP (safety-first).

2. CDC-to-DML conversion ([converter], [temporal_converter])

CDC events (INSERT / UPDATE / DELETE) captured from OLTP triggers are converted into OLAP DML:

  • Destructive mode — mirror semantics: UPDATE overwrites the existing row, DELETE removes it.
  • Temporal mode (SCD Type 2) — append-only with validity intervals. Every change produces an INSERT; UPDATEs and DELETEs additionally close the previous row version by setting _rhei_valid_to. Point-in-time queries use: WHERE _rhei_valid_from <= T AND (_rhei_valid_to IS NULL OR _rhei_valid_to > T)

Arrow-native bulk INSERT path

Consecutive INSERT events for the same table are grouped into a SyncOp::BatchInsert (see [sync_engine]). The engine first attempts to build a typed Arrow RecordBatch via [converter::cdc_events_to_batch] (or [temporal_converter::cdc_events_to_temporal_batch] in temporal mode) and calls OlapEngine::load_arrow, bypassing SQL parsing entirely. If the schema contains a type that the Arrow builder does not handle (Date, Timestamp, Decimal, List, Struct, …) the engine falls back to a SQL INSERT … VALUES statement.

3. Background sync loop ([background])

[spawn_sync_loop] starts a Tokio task that calls [rhei_core::SyncEngine::sync_once] on a configurable interval. Shutdown is cooperative: the caller holds a [tokio_util::sync::CancellationToken] and cancels it to stop the loop cleanly.

A DDL read-lock is acquired before every cycle so that schema mutations (which take the write lock) never interleave with an in-progress sync.

SyncMode semantics

Mode UPDATE DELETE
Destructive overwrites the row removes the row
Temporal closes previous version + inserts new version closes previous version + inserts tombstone

SQL injection prevention

All table and column identifiers are validated against [A-Za-z0-9_] at register_table time. The converter re-validates them at conversion time for defense-in-depth.