Crate rhei_sync

CDC sync engine, SQL query router, and CDC-to-DML converter for Rhei.

rhei-sync is the glue crate that ties the OLTP and OLAP halves of Rhei’s HTAP pipeline together. It has three core responsibilities:

§1. Query routing (router)

SqlParserRouter parses every incoming SQL statement with sqlparser-rs (SQLite dialect) and directs it to the right backend:

  • OLTP — writes (INSERT / UPDATE / DELETE), DDL, transactions, and simple point-lookup SELECTs.
  • OLAP — aggregates, GROUP BY / HAVING, JOINs, window functions, CTEs, set operations (UNION / INTERSECT / EXCEPT), and subqueries.
  • Heuristic fallback — when the parser cannot handle the SQL (e.g. SQLite PRAGMA), a keyword-scan fallback is applied. The default is OLTP (safety-first).
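The safety-first fallback described above can be illustrated with a small keyword scan. This is a minimal sketch under stated assumptions, not the crate's HeuristicRouter: the `Backend` enum, the `route_by_keywords` function, and both keyword lists are hypothetical names chosen for illustration.

```rust
/// Backend a statement is routed to (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Oltp,
    Olap,
}

/// Keywords that force OLTP whenever they appear anywhere in the statement.
/// The list is an assumption made for this sketch.
const WRITE_OR_DDL: &[&str] = &[
    "INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "DROP", "ALTER",
    "PRAGMA", "BEGIN", "COMMIT", "ROLLBACK",
];

/// Keywords that suggest an analytical read. Also an assumption.
const ANALYTIC: &[&str] = &[
    "GROUP", "HAVING", "JOIN", "OVER", "UNION", "INTERSECT", "EXCEPT",
    "COUNT", "SUM", "AVG", "MIN", "MAX",
];

/// Keyword-scan routing: anything not clearly an analytical read goes to OLTP.
/// The scan ignores quoting, so a string literal containing a keyword can
/// change the result; a real router would need to handle that.
pub fn route_by_keywords(sql: &str) -> Backend {
    let upper = sql.to_ascii_uppercase();
    // Split on every character that cannot be part of a bare word.
    let tokens: Vec<&str> = upper
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        .collect();

    // Only statements that start like a read are OLAP candidates.
    let first = match tokens.first().copied() {
        Some(t) if t == "SELECT" || t == "WITH" => t,
        _ => return Backend::Oltp,
    };

    // A write or DDL keyword anywhere (e.g. WITH ... INSERT) keeps it on OLTP.
    if tokens.iter().any(|t| WRITE_OR_DDL.contains(t)) {
        return Backend::Oltp;
    }

    // CTEs, subqueries (more than one SELECT), and analytic keywords go to OLAP.
    let selects = tokens.iter().filter(|t| **t == "SELECT").count();
    if first == "WITH" || selects > 1 || tokens.iter().any(|t| ANALYTIC.contains(t)) {
        Backend::Olap
    } else {
        Backend::Oltp
    }
}
```

In this sketch an unparseable `PRAGMA journal_mode = WAL` and a plain `SELECT * FROM users WHERE id = 1` both land on OLTP, while `SELECT region, SUM(amount) FROM orders GROUP BY region` lands on OLAP.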

§2. CDC-to-DML conversion (converter, temporal_converter)

CDC events (INSERT / UPDATE / DELETE) captured from OLTP triggers are converted into OLAP DML:

  • Destructive mode — mirror semantics: UPDATE overwrites the existing row, DELETE removes it.
  • Temporal mode (SCD Type 2) — append-only with validity intervals. Every change produces an INSERT; UPDATEs and DELETEs additionally close the previous row version by setting _rhei_valid_to. Point-in-time queries use: WHERE _rhei_valid_from <= T AND (_rhei_valid_to IS NULL OR _rhei_valid_to > T)
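The temporal semantics can be modelled in memory to see how closing a version interacts with the point-in-time predicate. This is a minimal sketch, assuming integer timestamps and a single `payload` column; `RowVersion`, `apply_change`, and `as_of` are hypothetical names, and DELETE tombstones are left out because their representation is not described here.

```rust
/// One stored row version in a temporal (SCD Type 2) table.
/// Hypothetical in-memory model; timestamps are plain integers here.
#[derive(Debug, Clone, PartialEq)]
pub struct RowVersion {
    pub id: i64,
    pub payload: String,
    /// Models the _rhei_valid_from column.
    pub valid_from: i64,
    /// Models the _rhei_valid_to column; `None` stands for SQL NULL (still open).
    pub valid_to: Option<i64>,
}

impl RowVersion {
    /// Same predicate as the point-in-time WHERE clause:
    /// valid_from <= T AND (valid_to IS NULL OR valid_to > T).
    pub fn visible_at(&self, t: i64) -> bool {
        self.valid_from <= t && self.valid_to.map_or(true, |end| end > t)
    }
}

/// Apply an INSERT or UPDATE at time `ts`: close the open version of `id`
/// (if there is one) and append the new version. Nothing is overwritten.
pub fn apply_change(table: &mut Vec<RowVersion>, id: i64, payload: &str, ts: i64) {
    for v in table.iter_mut().filter(|v| v.id == id && v.valid_to.is_none()) {
        v.valid_to = Some(ts);
    }
    table.push(RowVersion {
        id,
        payload: payload.to_string(),
        valid_from: ts,
        valid_to: None,
    });
}

/// Return the version of `id` that was current at time `t`, if any.
pub fn as_of(table: &[RowVersion], id: i64, t: i64) -> Option<&RowVersion> {
    table.iter().find(|v| v.id == id && v.visible_at(t))
}
```

Because the upper bound is exclusive (`valid_to > T`), a query at exactly the moment of an update sees the new version and not the one that was just closed.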

§Arrow-native bulk INSERT path

Consecutive INSERT events for the same table are grouped into a SyncOp::BatchInsert (see sync_engine). The engine first attempts to build a typed Arrow RecordBatch via converter::cdc_events_to_batch (or temporal_converter::cdc_events_to_temporal_batch in temporal mode) and calls OlapEngine::load_arrow, bypassing SQL parsing entirely. If the schema contains a type that the Arrow builder does not handle (Date, Timestamp, Decimal, List, Struct, …) the engine falls back to a SQL INSERT … VALUES statement.
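The grouping step can be sketched independently of Arrow. The types below (`CdcKind`, `CdcEvent`, `PlannedOp`) and the `plan_ops` function are hypothetical stand-ins for illustration, not the crate's SyncOp or event types; the sketch also assumes that a lone INSERT becomes a batch of one, which the text above does not specify.

```rust
/// Kind of change captured by CDC (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum CdcKind {
    Insert,
    Update,
    Delete,
}

/// A simplified CDC event: only the table name and a row id.
#[derive(Debug, Clone, PartialEq)]
pub struct CdcEvent {
    pub table: String,
    pub kind: CdcKind,
    pub row_id: i64,
}

/// Planned operation: a run of INSERTs for one table, or a single other event.
#[derive(Debug, PartialEq)]
pub enum PlannedOp {
    BatchInsert { table: String, row_ids: Vec<i64> },
    Single(CdcEvent),
}

/// Group consecutive INSERT events for the same table into one batch.
/// Event order is preserved: an UPDATE or DELETE, or an INSERT for another
/// table, ends the current batch.
pub fn plan_ops(events: &[CdcEvent]) -> Vec<PlannedOp> {
    let mut ops: Vec<PlannedOp> = Vec::new();
    for ev in events {
        if ev.kind == CdcKind::Insert {
            // Extend the previous op if it is a batch for the same table.
            if let Some(PlannedOp::BatchInsert { table, row_ids }) = ops.last_mut() {
                if *table == ev.table {
                    row_ids.push(ev.row_id);
                    continue;
                }
            }
            // Otherwise start a new batch (assumption: a batch may hold one row).
            ops.push(PlannedOp::BatchInsert {
                table: ev.table.clone(),
                row_ids: vec![ev.row_id],
            });
        } else {
            ops.push(PlannedOp::Single(ev.clone()));
        }
    }
    ops
}
```

Only adjacent events are merged, so two INSERTs into the same table separated by an UPDATE end up in separate batches and the original ordering of changes is kept.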

§3. Background sync loop (background)

spawn_sync_loop starts a Tokio task that calls rhei_core::SyncEngine::sync_once on a configurable interval. Shutdown is cooperative: the caller holds a tokio_util::sync::CancellationToken and cancels it to stop the loop cleanly.

A DDL read-lock is acquired before every cycle so that schema mutations (which take the write lock) never interleave with an in-progress sync.
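The shape of such a loop (cooperative cancellation plus a read lock held for each cycle) can be shown with the standard library alone. Note the substitution: this sketch uses an OS thread, an `AtomicBool`, and `std::sync::RwLock` in place of a Tokio task, a CancellationToken, and whatever lock type the crate actually uses. `spawn_loop_sketch` and its parameters are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Run `sync_once` repeatedly until `cancelled` is set, then return the
/// number of completed cycles. Std-only stand-in for a Tokio-based loop.
pub fn spawn_loop_sketch<F>(
    interval: Duration,
    ddl_lock: Arc<RwLock<()>>,
    cancelled: Arc<AtomicBool>,
    mut sync_once: F,
) -> JoinHandle<usize>
where
    F: FnMut() + Send + 'static,
{
    thread::spawn(move || {
        let mut cycles = 0usize;
        // Cooperative shutdown: the flag is checked once per cycle.
        while !cancelled.load(Ordering::Acquire) {
            {
                // Hold the DDL read lock for the whole cycle. A schema change
                // needs the write lock, so it waits until the cycle finishes.
                // (unwrap: this sketch treats a poisoned lock as fatal.)
                let _guard = ddl_lock.read().unwrap();
                sync_once();
                cycles += 1;
            } // Read lock released here, before sleeping.
            thread::sleep(interval);
        }
        cycles
    })
}
```

One difference worth keeping in mind: here cancellation is only noticed after the sleep ends, so shutdown latency is up to one interval. An async loop can instead wait on the interval tick and the token's cancellation at the same time (for example with `tokio::select!`) and stop immediately.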

§SyncMode semantics

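| Mode        | UPDATE                                        | DELETE                                      |
|-------------|-----------------------------------------------|---------------------------------------------|
| Destructive | overwrites the row                            | removes the row                             |
| Temporal    | closes previous version + inserts new version | closes previous version + inserts tombstone |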

§SQL injection prevention

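All table and column identifiers are checked at register_table time and must consist solely of characters in [A-Za-z0-9_]. The converter re-validates them at conversion time as defense in depth, so an identifier that bypassed registration still cannot reach generated SQL.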
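An allow-list check of this kind is short to write. The following is a minimal sketch, not the crate's validator: `is_safe_identifier` and `quote_checked` are hypothetical names, and rejecting the empty string is an added assumption that the character class alone does not imply.

```rust
/// Return true if `name` consists only of ASCII letters, digits, and '_'.
/// Rejecting the empty string is an assumption made for this sketch.
pub fn is_safe_identifier(name: &str) -> bool {
    !name.is_empty()
        && name
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_')
}

/// Re-check an identifier right before it is spliced into SQL text
/// (the defense-in-depth step), returning it double-quoted on success.
pub fn quote_checked(name: &str) -> Result<String, String> {
    if is_safe_identifier(name) {
        Ok(format!("\"{name}\""))
    } else {
        Err(format!("invalid identifier: {name:?}"))
    }
}
```

Because quotes, spaces, semicolons, and non-ASCII characters all fall outside the allowed set, an input such as `users; DROP TABLE users` is rejected before any SQL string is built.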

§Re-exports

pub use background::spawn_sync_loop;
pub use error::SyncError;
pub use router::HeuristicRouter;
pub use router::SqlParserRouter;
pub use sync_engine::CdcSyncEngine;
pub use temporal_converter::temporalize_schema;

§Modules

background
Background sync loop that continuously drives CDC-to-OLAP replication.
converter
Destructive-mode CDC-to-DML converter and Arrow batch builder.
error
Error types for the CDC sync pipeline.
router
AST-based SQL query router that classifies statements as OLTP or OLAP.
sync_engine
Core CDC-to-OLAP sync engine.
temporal_converter
Temporal (SCD Type 2) CDC-to-DML converter and Arrow batch builder.