MVCC commit log records — the WAL-resident representation of
BEGIN CONCURRENT writes (Phase 11.9).
Per docs/concurrent-writes-plan.md:
WAL log record format: a new frame kind carrying
(table_id, rowid, op, payload) tuples. Distinct from the existing per-page commit frame; the checkpointer flattens log records into page-level updates.
§What this gives us in our hybrid architecture
Phase 11.4 ships BEGIN CONCURRENT commits that mirror writes
into both MvStore (in-memory) and Database::tables (legacy
save path). The legacy save handles durability — the tables
are page-encoded into the WAL and fsync’d. But MvStore lives
only in memory, so it starts empty on every reopen. That’s
correct for single-session workloads (each session re-derives
conflict-detection state from new commits) but means MVCC’s
conflict-detection window doesn’t survive a process restart.
Phase 11.9 closes that gap by also appending an MVCC log
record frame to the WAL on every successful concurrent commit.
On reopen, the WAL replay walks the MVCC frames in addition to
the page frames and re-populates MvStore with the committed
versions. Same fsync covers both — the MVCC frame is written
to the WAL buffer right before the legacy save fsync, so a
crash either loses both or commits both.
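The replay step described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: `VersionStore`, `ReplayedBatch`, `ReplayedRecord`, the simplified `Row` type, and the "committed after my begin timestamp" conflict rule are all assumptions made for the example, standing in for MvStore and whatever rule it really applies.

```rust
use std::collections::{BTreeMap, HashMap};

/// Stand-in row type (assumption); the real store holds typed column values.
type Row = Vec<(String, i64)>;

/// One decoded record. `row == None` models a Tombstone.
struct ReplayedRecord {
    table: String,
    rowid: i64,
    row: Option<Row>,
}

/// One decoded MVCC frame: every record shares the batch's commit timestamp.
struct ReplayedBatch {
    commit_ts: u64,
    records: Vec<ReplayedRecord>,
}

/// Hypothetical stand-in for MvStore: per-row version chains keyed by commit_ts.
#[derive(Default)]
struct VersionStore {
    chains: HashMap<(String, i64), BTreeMap<u64, Option<Row>>>,
}

impl VersionStore {
    /// Re-apply one committed batch, as WAL replay would on reopen.
    fn apply(&mut self, batch: &ReplayedBatch) {
        for rec in &batch.records {
            self.chains
                .entry((rec.table.clone(), rec.rowid))
                .or_default()
                .insert(batch.commit_ts, rec.row.clone());
        }
    }

    /// Newest commit timestamp recorded for a row, if any.
    fn latest_commit_ts(&self, table: &str, rowid: i64) -> Option<u64> {
        self.chains
            .get(&(table.to_string(), rowid))
            .and_then(|chain| chain.keys().next_back().copied())
    }

    /// Assumed conflict rule: a writer that began at `begin_ts` conflicts
    /// if the row was committed after that point.
    fn would_conflict(&self, table: &str, rowid: i64, begin_ts: u64) -> bool {
        self.latest_commit_ts(table, rowid)
            .map_or(false, |ts| ts > begin_ts)
    }
}
```

Because the store is rebuilt purely from replayed batches, a writer whose begin timestamp predates a replayed commit would still be flagged after a restart, which is the window the MVCC frames are meant to preserve.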
§Body layout (fits inside a 4 KiB frame body)
bytes 0..8     magic: "MVCC0001" (ASCII, no NUL)
bytes 8..16    commit_ts: u64 LE
bytes 16..18   record count: u16 LE (max ~256 records / tx for v0)
bytes 18..     record stream; each record:
    byte  0      op tag: 0 = Tombstone, 1 = Present
    bytes 1..3   table-name length: u16 LE
    bytes ..     table name: N bytes UTF-8
    bytes ..     rowid: i64 LE (8 bytes)
    if op = Present:
        bytes ..   column count: u16 LE
        for each column:
            bytes ..   name length: u16 LE
            bytes ..   name: N bytes UTF-8
            byte  ..   value type tag: 0 Null, 1 Int, 2 Real, 3 Text,
                       4 Bool, 5 Vector
            bytes ..   value:
                Int:    i64 LE (8 bytes)
                Real:   f64 LE (8 bytes)
                Text:   u32 LE length + N bytes UTF-8
                Bool:   1 byte (0 / 1)
                Vector: u32 LE length + 4*N bytes f32 LE
    (Tombstone has no payload after rowid.)
bytes N..PAGE_SIZE   zero-padded

The whole batch must fit in 4096 bytes (the frame body size). v0 surfaces a typed error if encoding overflows; multi-frame batches (for very large transactions) are a separate slice.
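To make the layout concrete, here is a minimal encoder sketch that follows the byte layout above field by field. The type and function names (`Value`, `Op`, `Record`, `EncodeError`, `encode_batch`) are hypothetical and chosen for the example; the crate's real types and error variants may look different. The sketch assumes a Null value carries no bytes after its tag and that the Vector length counts elements.

```rust
use std::convert::TryFrom;

const FRAME_BODY_SIZE: usize = 4096; // frame body size from the layout
const MAGIC: &[u8; 8] = b"MVCC0001";

#[derive(Debug, Clone, PartialEq)]
enum Value { Null, Int(i64), Real(f64), Text(String), Bool(bool), Vector(Vec<f32>) }

#[derive(Debug, Clone, PartialEq)]
enum Op { Tombstone, Present(Vec<(String, Value)>) }

#[derive(Debug, Clone, PartialEq)]
struct Record { table: String, rowid: i64, op: Op }

#[derive(Debug, PartialEq)]
enum EncodeError { BatchTooLarge { needed: usize }, FieldTooLong }

/// Write a u16 LE length prefix followed by the UTF-8 bytes.
fn put_str16(out: &mut Vec<u8>, s: &str) -> Result<(), EncodeError> {
    let len = u16::try_from(s.len()).map_err(|_| EncodeError::FieldTooLong)?;
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(s.as_bytes());
    Ok(())
}

/// Write the one-byte type tag, then the value bytes for that tag.
fn put_value(out: &mut Vec<u8>, v: &Value) -> Result<(), EncodeError> {
    match v {
        Value::Null => out.push(0),
        Value::Int(i) => { out.push(1); out.extend_from_slice(&i.to_le_bytes()); }
        Value::Real(r) => { out.push(2); out.extend_from_slice(&r.to_le_bytes()); }
        Value::Text(s) => {
            out.push(3);
            let len = u32::try_from(s.len()).map_err(|_| EncodeError::FieldTooLong)?;
            out.extend_from_slice(&len.to_le_bytes());
            out.extend_from_slice(s.as_bytes());
        }
        Value::Bool(b) => { out.push(4); out.push(*b as u8); }
        Value::Vector(xs) => {
            out.push(5);
            let len = u32::try_from(xs.len()).map_err(|_| EncodeError::FieldTooLong)?;
            out.extend_from_slice(&len.to_le_bytes());
            for x in xs { out.extend_from_slice(&x.to_le_bytes()); }
        }
    }
    Ok(())
}

/// Encode one commit batch into a zero-padded frame body, or report overflow.
fn encode_batch(commit_ts: u64, records: &[Record]) -> Result<Vec<u8>, EncodeError> {
    let count = u16::try_from(records.len()).map_err(|_| EncodeError::FieldTooLong)?;
    let mut out = Vec::with_capacity(FRAME_BODY_SIZE);
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&commit_ts.to_le_bytes());
    out.extend_from_slice(&count.to_le_bytes());
    for rec in records {
        match &rec.op { Op::Tombstone => out.push(0), Op::Present(_) => out.push(1) }
        put_str16(&mut out, &rec.table)?;
        out.extend_from_slice(&rec.rowid.to_le_bytes());
        if let Op::Present(cols) = &rec.op {
            let n = u16::try_from(cols.len()).map_err(|_| EncodeError::FieldTooLong)?;
            out.extend_from_slice(&n.to_le_bytes());
            for (name, value) in cols {
                put_str16(&mut out, name)?;
                put_value(&mut out, value)?;
            }
        }
    }
    if out.len() > FRAME_BODY_SIZE {
        return Err(EncodeError::BatchTooLarge { needed: out.len() });
    }
    out.resize(FRAME_BODY_SIZE, 0); // zero-pad up to the frame body size
    Ok(out)
}
```

A single tombstone for table `t` occupies 30 bytes (18 header bytes plus 1 + 2 + 1 + 8), and the remainder of the 4096-byte body is zero padding. Checking the size only after serialising keeps the sketch short; a real encoder might instead track the remaining budget as it goes.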
§Why one batch per commit
A transaction’s writes are committed atomically. Bundling them into one frame means a single WAL fsync covers the whole batch: we never end up with half a transaction durable. A torn frame (the checksum catches it) drops the whole transaction, which is the right rollback semantics.
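The all-or-nothing behaviour can be illustrated with a small sketch. The text does not say which checksum the WAL uses, so FNV-1a (64-bit) stands in here purely for illustration, and the `Frame` struct, `seal`, and `intact_body` are hypothetical names rather than the crate's API.

```rust
/// Hypothetical frame: a batch body plus the checksum stored alongside it.
struct Frame {
    body: Vec<u8>,
    checksum: u64,
}

/// FNV-1a 64-bit, used only as a stand-in for the WAL's real checksum.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Build a frame whose checksum matches its body.
fn seal(body: Vec<u8>) -> Frame {
    let checksum = fnv1a64(&body);
    Frame { body, checksum }
}

/// Return the body only if the checksum verifies. A torn frame yields `None`,
/// so none of the batch's records are ever applied.
fn intact_body(frame: &Frame) -> Option<&[u8]> {
    if fnv1a64(&frame.body) == frame.checksum {
        Some(&frame.body)
    } else {
        None
    }
}
```

Since the whole transaction lives in one body, the replayer never has to reason about a partially valid batch: the checksum either admits every record or none of them.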
Structs§
- MvccCommitBatch
  All the writes a single BEGIN CONCURRENT transaction produced at its commit. Encoded into one WAL frame body; replayed atomically (a torn batch drops the whole transaction).
- MvccLogRecord
  One row’s worth of state at the moment of commit. Decoded from the WAL on reopen, applied to MvStore (and re-applied to Database::tables for the snapshot reader path) by the replayer.
Constants§
- MVCC_BODY_MAGIC
  Magic bytes at the start of every encoded MVCC commit batch. Reserves space for future format-version bumps without changing the frame-level discriminator. The trailing 0001 is the v1 payload format version; bump on incompatible body changes.
- MVCC_BODY_PAYLOAD_CAP
  Maximum batch payload size: the frame body size, with the magic + commit_ts + record-count header stripped off. Encoders reject batches whose serialised form would exceed this.
- MVCC_FRAME_MARKER
  Marker stored in the frame header’s page_num field that distinguishes MVCC log-record frames from page-commit frames. u32::MAX is safely outside the legal page-number range (max realistic database has at most a few hundred million pages, far short of u32::MAX).
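As a rough sketch of how these constants fit together, the values below are re-derived from the descriptions on this page (a 4096-byte body minus the 8 + 8 + 2 byte header), not copied from the crate, so the actual definitions and types may differ. `FrameKind` and `classify` are hypothetical helpers added for the example.

```rust
const FRAME_BODY_SIZE: usize = 4096;
const MVCC_BODY_MAGIC: &[u8; 8] = b"MVCC0001";
/// magic (8) + commit_ts (8) + record count (2)
const MVCC_HEADER_LEN: usize = 8 + 8 + 2;
/// Assumed value: what is left of the body once the header is stripped off.
const MVCC_BODY_PAYLOAD_CAP: usize = FRAME_BODY_SIZE - MVCC_HEADER_LEN;
const MVCC_FRAME_MARKER: u32 = u32::MAX;

#[derive(Debug, PartialEq)]
enum FrameKind {
    /// Ordinary page-commit frame for the given page.
    PageCommit { page_num: u32 },
    /// MVCC log-record frame carrying one commit batch.
    MvccBatch,
}

/// Decide how replay should treat a frame, given its header `page_num`
/// field and its body.
fn classify(page_num: u32, body: &[u8]) -> Result<FrameKind, &'static str> {
    if page_num != MVCC_FRAME_MARKER {
        return Ok(FrameKind::PageCommit { page_num });
    }
    // The marker says "MVCC frame"; the magic confirms the body format version.
    if body.starts_with(MVCC_BODY_MAGIC) {
        Ok(FrameKind::MvccBatch)
    } else {
        Err("MVCC frame marker present but body magic does not match")
    }
}
```

Under these assumptions the payload cap works out to 4078 bytes. Keeping the version in the body magic rather than in the frame header means a future "MVCC0002" body could reuse the same u32::MAX marker.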