Module log


MVCC commit log records — the WAL-resident representation of BEGIN CONCURRENT writes (Phase 11.9).

Per docs/concurrent-writes-plan.md:

WAL log record format: a new frame kind carrying (table_id, rowid, op, payload) tuples. Distinct from the existing per-page commit frame; the checkpointer flattens log records into page-level updates.

§What this gives us in our hybrid architecture

Phase 11.4 ships BEGIN CONCURRENT commits that mirror each write into both MvStore (in-memory) and Database::tables (the legacy save path). The legacy save handles durability: the tables are page-encoded into the WAL and fsynced. MvStore, however, lives only in memory, so it starts empty on every reopen. That is correct for single-session workloads (each session re-derives its conflict-detection state from new commits), but it means MVCC’s conflict-detection window does not survive a process restart.

Phase 11.9 closes that gap by also appending an MVCC log record frame to the WAL on every successful concurrent commit. On reopen, WAL replay walks the MVCC frames in addition to the page frames and repopulates MvStore with the committed versions. A single fsync covers both: the MVCC frame is written to the WAL buffer immediately before the legacy save’s fsync, so a crash either loses both or commits both.
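The ordering described above can be modelled with a toy in-memory WAL. This is only a sketch: `ToyWal`, `Frame`, and `commit_concurrent` are hypothetical names that do not appear in this crate, and `sync` merely stands in for the real fsync.

```rust
// Toy model of "MVCC frame first, then page frames, then one fsync".
// Every name here is hypothetical; the real WAL types are not shown on this page.

#[derive(Debug, Clone, PartialEq)]
enum Frame {
    MvccBatch(Vec<u8>),
    Page { page_num: u32, body: Vec<u8> },
}

#[derive(Default)]
struct ToyWal {
    buffered: Vec<Frame>, // appended but not yet synced
    durable: Vec<Frame>,  // what a reopen would see
}

impl ToyWal {
    fn append(&mut self, frame: Frame) {
        self.buffered.push(frame);
    }

    /// Stand-in for the single fsync: everything buffered becomes durable together.
    fn sync(&mut self) {
        self.durable.append(&mut self.buffered);
    }

    /// Simulated crash: anything not yet synced is lost.
    fn crash(&mut self) {
        self.buffered.clear();
    }
}

fn commit_concurrent(
    wal: &mut ToyWal,
    mvcc_body: Vec<u8>,
    pages: Vec<(u32, Vec<u8>)>,
    crash_before_sync: bool,
) {
    // 1. The MVCC log record goes into the WAL buffer first.
    wal.append(Frame::MvccBatch(mvcc_body));
    // 2. The legacy save appends its page frames behind it.
    for (page_num, body) in pages {
        wal.append(Frame::Page { page_num, body });
    }
    // 3. One sync covers both kinds of frame; a crash before it loses both.
    if crash_before_sync {
        wal.crash();
        return;
    }
    wal.sync();
}
```

In this model, a commit that crashes before `sync` leaves `durable` untouched, and a commit that reaches `sync` makes the MVCC frame and the page frames visible together.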

§Body layout (fits inside a 4 KiB frame body)

  bytes  0..8    magic: "MVCC0001" (ASCII, no NUL)
  bytes  8..16   commit_ts: u64 LE
  bytes 16..18   record count: u16 LE  (max ~256 records / tx for v0)
  bytes 18..     record stream; each record:
    byte  0           op tag: 0 = Tombstone, 1 = Present
    bytes 1..3        table-name length: u16 LE
    bytes ..          table name: N bytes UTF-8
    bytes ..          rowid: i64 LE (8 bytes)
    if op = Present:
      bytes ..          column count: u16 LE
      for each column:
        bytes ..          name length: u16 LE
        bytes ..          name: N bytes UTF-8
        byte  ..          value type tag: 0 Null, 1 Int, 2 Real, 3 Text,
                                           4 Bool, 5 Vector
        bytes ..          value:
          Int:  i64 LE (8 bytes)
          Real: f64 LE (8 bytes)
          Text: u32 LE length + N bytes UTF-8
          Bool: 1 byte (0 / 1)
          Vector: u32 LE length + 4*N bytes f32 LE
    (Tombstone has no payload after rowid.)
  bytes N..PAGE_SIZE  zero-padded

The whole batch must fit in 4096 bytes (the frame body size). v0 surfaces a typed error if encoding overflows; multi-frame batches (for very large transactions) are a separate slice.
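As an illustration of the layout above, here is a minimal encoder sketch. Only the byte layout is taken from this page; the types (`Value`, `Op`, `Record`, `EncodeError`) and the function `encode_batch` are hypothetical stand-ins and are not the crate's actual API.

```rust
use std::convert::TryFrom;

// Hypothetical types; only the byte layout comes from the module docs.
const FRAME_BODY_SIZE: usize = 4096;

#[derive(Debug, Clone, PartialEq)]
enum Value { Null, Int(i64), Real(f64), Text(String), Bool(bool), Vector(Vec<f32>) }

#[derive(Debug, Clone, PartialEq)]
enum Op { Tombstone, Present(Vec<(String, Value)>) }

#[derive(Debug, Clone, PartialEq)]
struct Record { table: String, rowid: i64, op: Op }

#[derive(Debug, PartialEq)]
enum EncodeError { TooLarge { needed: usize }, FieldTooLong }

/// Writes a u16 LE length prefix followed by the UTF-8 bytes.
fn put_str16(out: &mut Vec<u8>, s: &str) -> Result<(), EncodeError> {
    let len = u16::try_from(s.len()).map_err(|_| EncodeError::FieldTooLong)?;
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(s.as_bytes());
    Ok(())
}

fn put_value(out: &mut Vec<u8>, value: &Value) -> Result<(), EncodeError> {
    match value {
        Value::Null => out.push(0),
        Value::Int(v) => { out.push(1); out.extend_from_slice(&v.to_le_bytes()); }
        Value::Real(v) => { out.push(2); out.extend_from_slice(&v.to_le_bytes()); }
        Value::Text(s) => {
            out.push(3);
            let len = u32::try_from(s.len()).map_err(|_| EncodeError::FieldTooLong)?;
            out.extend_from_slice(&len.to_le_bytes());
            out.extend_from_slice(s.as_bytes());
        }
        Value::Bool(b) => { out.push(4); out.push(*b as u8); }
        Value::Vector(v) => {
            out.push(5);
            // The length prefix counts f32 elements, not bytes.
            let len = u32::try_from(v.len()).map_err(|_| EncodeError::FieldTooLong)?;
            out.extend_from_slice(&len.to_le_bytes());
            for x in v { out.extend_from_slice(&x.to_le_bytes()); }
        }
    }
    Ok(())
}

/// Encodes one commit into a zero-padded 4096-byte frame body.
fn encode_batch(commit_ts: u64, records: &[Record]) -> Result<Vec<u8>, EncodeError> {
    let mut out = Vec::with_capacity(FRAME_BODY_SIZE);
    out.extend_from_slice(b"MVCC0001");
    out.extend_from_slice(&commit_ts.to_le_bytes());
    let count = u16::try_from(records.len()).map_err(|_| EncodeError::FieldTooLong)?;
    out.extend_from_slice(&count.to_le_bytes());
    for r in records {
        out.push(match r.op { Op::Tombstone => 0, Op::Present(_) => 1 });
        put_str16(&mut out, &r.table)?;
        out.extend_from_slice(&r.rowid.to_le_bytes());
        if let Op::Present(cols) = &r.op {
            let n = u16::try_from(cols.len()).map_err(|_| EncodeError::FieldTooLong)?;
            out.extend_from_slice(&n.to_le_bytes());
            for (name, value) in cols {
                put_str16(&mut out, name)?;
                put_value(&mut out, value)?;
            }
        }
    }
    // Typed overflow error instead of silently truncating the batch.
    if out.len() > FRAME_BODY_SIZE {
        return Err(EncodeError::TooLarge { needed: out.len() });
    }
    out.resize(FRAME_BODY_SIZE, 0); // zero-pad to the frame body size
    Ok(out)
}
```

Under these assumptions a single tombstone for table `t` occupies 30 bytes (18 header bytes plus 12 record bytes); the rest of the frame body is zero padding.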

§Why one batch per commit

A transaction’s writes are committed atomically. Bundling them into one frame means a single WAL fsync covers the whole batch: we never end up with half a transaction durable. A torn frame (the checksum catches it) drops the whole transaction, which is the right rollback semantics.
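The all-or-nothing behaviour can be sketched as a decoder that stages every record and hands back nothing if any part of the body is malformed. The names below (`Reader`, `decode_batch`, `replay_into`, and the simplified `Record`) are hypothetical, and checksum verification is assumed to happen at the frame layer before this code runs.

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
enum Value { Null, Int(i64), Real(f64), Text(String), Bool(bool), Vector(Vec<f32>) }

/// `columns == None` marks a tombstone. Hypothetical shape, not the crate's struct.
#[derive(Debug, PartialEq)]
struct Record { table: String, rowid: i64, columns: Option<Vec<(String, Value)>> }

/// Bounds-checked cursor: every read returns None instead of panicking.
struct Reader<'a> { buf: &'a [u8], pos: usize }

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let slice = self.buf.get(self.pos..end)?;
        self.pos = end;
        Some(slice)
    }
    fn u8(&mut self) -> Option<u8> { self.take(1).map(|b| b[0]) }
    fn u16(&mut self) -> Option<u16> { Some(u16::from_le_bytes(self.take(2)?.try_into().ok()?)) }
    fn u32(&mut self) -> Option<u32> { Some(u32::from_le_bytes(self.take(4)?.try_into().ok()?)) }
    fn u64(&mut self) -> Option<u64> { Some(u64::from_le_bytes(self.take(8)?.try_into().ok()?)) }
    fn string(&mut self, len: usize) -> Option<String> {
        String::from_utf8(self.take(len)?.to_vec()).ok()
    }
}

fn read_value(r: &mut Reader) -> Option<Value> {
    Some(match r.u8()? {
        0 => Value::Null,
        1 => Value::Int(r.u64()? as i64), // same bits, reinterpreted as signed
        2 => Value::Real(f64::from_bits(r.u64()?)),
        3 => { let n = r.u32()? as usize; Value::Text(r.string(n)?) }
        4 => Value::Bool(match r.u8()? { 0 => false, 1 => true, _ => return None }),
        5 => {
            let n = r.u32()? as usize;
            let raw = r.take(n.checked_mul(4)?)?;
            Value::Vector(raw.chunks_exact(4)
                .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
                .collect())
        }
        _ => return None,
    })
}

/// Returns (commit_ts, records), or None if any part of the body is malformed.
fn decode_batch(body: &[u8]) -> Option<(u64, Vec<Record>)> {
    let mut r = Reader { buf: body, pos: 0 };
    if r.take(8)? != &b"MVCC0001"[..] { return None; }
    let commit_ts = r.u64()?;
    let count = r.u16()?;
    let mut staged = Vec::with_capacity(count as usize);
    for _ in 0..count {
        let tag = r.u8()?;
        let name_len = r.u16()? as usize;
        let table = r.string(name_len)?;
        let rowid = r.u64()? as i64;
        let columns = match tag {
            0 => None, // Tombstone: nothing follows the rowid
            1 => {
                let n = r.u16()?;
                let mut cols = Vec::with_capacity(n as usize);
                for _ in 0..n {
                    let len = r.u16()? as usize;
                    let name = r.string(len)?;
                    cols.push((name, read_value(&mut r)?));
                }
                Some(cols)
            }
            _ => return None,
        };
        staged.push(Record { table, rowid, columns });
    }
    Some((commit_ts, staged))
}

/// Applies a batch only if it decoded completely; otherwise `store` is untouched.
fn replay_into(store: &mut Vec<Record>, body: &[u8]) -> bool {
    match decode_batch(body) {
        Some((_commit_ts, records)) => { store.extend(records); true }
        None => false,
    }
}
```

Because records are pushed into `staged` and only moved into `store` after the whole body has decoded, a batch that fails partway contributes no rows at all.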

Structs§

MvccCommitBatch
All the writes a single BEGIN CONCURRENT transaction produced at its commit. Encoded into one WAL frame body; replayed atomically (a torn batch drops the whole transaction).
MvccLogRecord
One row’s worth of state at the moment of commit. Decoded from the WAL on reopen, applied to MvStore (and re-applied to Database::tables for the snapshot reader path) by the replayer.

Constants§

MVCC_BODY_MAGIC
Magic bytes at the start of every encoded MVCC commit batch. Reserves space for future format-version bumps without changing the frame-level discriminator. The trailing 0001 is the v1 payload format version; bump on incompatible body changes.
MVCC_BODY_PAYLOAD_CAP
Maximum batch payload size: the frame body size with the magic + commit_ts + record-count header stripped off (18 bytes per the body layout above, which leaves 4078 of the 4096-byte frame body). Encoders reject batches whose serialised form would exceed this.
MVCC_FRAME_MARKER
Marker stored in the frame header’s page_num field that distinguishes MVCC log-record frames from page-commit frames. u32::MAX is safely outside the legal page-number range: a realistic database has at most a few hundred million pages, far short of u32::MAX.
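The three constants can be related to each other with a short sketch. The constant names match this page, but their types and values here are assumptions derived from the descriptions and the body layout; `FRAME_BODY_SIZE`, `HEADER_LEN`, `FrameKind`, `classify`, and `payload_version` are hypothetical helpers. Parsing the trailing digits of the magic as a decimal version number is likewise an assumption.

```rust
// Values inferred from the docs above; the real definitions may differ.
const FRAME_BODY_SIZE: usize = 4096;
const HEADER_LEN: usize = 8 + 8 + 2; // magic + commit_ts + record count
const MVCC_BODY_MAGIC: &[u8; 8] = b"MVCC0001";
const MVCC_BODY_PAYLOAD_CAP: usize = FRAME_BODY_SIZE - HEADER_LEN;
const MVCC_FRAME_MARKER: u32 = u32::MAX;

#[derive(Debug, PartialEq)]
enum FrameKind {
    MvccLog,
    PageCommit(u32),
}

/// Decides how replay should treat a frame, based on its header's page_num field.
fn classify(page_num: u32) -> FrameKind {
    if page_num == MVCC_FRAME_MARKER {
        FrameKind::MvccLog
    } else {
        FrameKind::PageCommit(page_num)
    }
}

/// Extracts the payload format version from the magic, e.g. "MVCC0001" -> 1.
/// Returns None if the prefix is wrong or the suffix is not a decimal number.
fn payload_version(body: &[u8]) -> Option<u32> {
    let magic = body.get(..MVCC_BODY_MAGIC.len())?;
    if !magic.starts_with(b"MVCC") {
        return None;
    }
    std::str::from_utf8(&magic[4..]).ok()?.parse().ok()
}
```

Keeping the version inside the body magic means a future payload format could change the trailing digits while `classify` and the frame-level marker stay as they are.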