Write-Ahead Log (WAL) file format.
Phase 4b introduces the .sqlrite-wal sidecar file. Writes don’t go to
the main .sqlrite file anymore once the WAL is wired in (Phase 4c);
instead they append frames to this log, and a periodic checkpoint
(Phase 4d) applies frames back into the main file.
This module is the format layer — header, frame, codec, reader,
writer. It doesn’t know anything about the Pager yet; that wiring is
the next slice.
On-disk layout

  byte 0..32    WAL header
    0..8          magic "SQLRWAL\0"
    8..12         format version (u32 LE)
                    v1: pre-Phase-11
                    v2: Phase 11.2, adds clock_high_water in bytes 24..32
    12..16        page size (u32 LE) = 4096
    16..20        salt (u32 LE): random on create, re-rolled per checkpoint
    20..24        checkpoint seq (u32 LE): bumps per checkpoint
    24..32        v2: clock_high_water (u64 LE), the last persisted MVCC
                    logical clock value; `crate::mvcc::MvccClock::observe`'d
                    on reopen so timestamps don't reuse values across restarts.
                  v1: reserved / zero (read as 0).
  byte 32..     sequence of frames, each `FRAME_SIZE` bytes:
    0..4          page number (u32 LE)
    4..8          commit-page-count (u32 LE)
                    0  = dirty frame (part of an open write)
                    >0 = commit frame; value = page count at commit
    8..12         salt (u32 LE): copied from WAL header, detects
                    truncation / file swap
    12..16        checksum (u32 LE): rolling sum over the frame header
                    bytes [0..12] + the payload
    16..16+PAGE_SIZE  page bytes

Format version compatibility. v1 WALs (written by pre-Phase-11
builds) open cleanly: their reserved bytes are zero, which we
interpret as clock_high_water = 0 — exactly what a fresh-from-
checkpoint clock would carry. The next time the WAL is rewritten
(any checkpoint, including the auto-checkpoint that fires past the
frame threshold) it lands on disk as v2. There’s no “you must
upgrade your files” step.
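To make the compatibility rule concrete, here is a minimal sketch of header parsing that follows the documented byte layout. It is not the crate's actual code: `HeaderSketch`, `parse_header`, the helper readers, and the `String` error type are hypothetical names chosen for illustration, and the supported version range (1 through 3) is an assumption taken from the constants documented for this module.

```rust
// Illustrative sketch only; names and error handling are assumptions,
// not the module's real API.
const WAL_MAGIC: &[u8; 8] = b"SQLRWAL\0";
const WAL_HEADER_SIZE: usize = 32;
const PAGE_SIZE: u32 = 4096;
const MIN_VERSION: u32 = 1; // assumed lowest readable version
const CURRENT_VERSION: u32 = 3; // assumed version written today

#[derive(Debug, PartialEq)]
struct HeaderSketch {
    version: u32,
    page_size: u32,
    salt: u32,
    checkpoint_seq: u32,
    clock_high_water: u64,
}

fn read_u32_le(buf: &[u8], at: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[at..at + 4]);
    u32::from_le_bytes(b)
}

fn read_u64_le(buf: &[u8], at: usize) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&buf[at..at + 8]);
    u64::from_le_bytes(b)
}

fn parse_header(buf: &[u8]) -> Result<HeaderSketch, String> {
    if buf.len() < WAL_HEADER_SIZE {
        return Err("header shorter than 32 bytes".to_string());
    }
    if buf[0..8] != WAL_MAGIC[..] {
        return Err("bad magic".to_string());
    }
    let version = read_u32_le(buf, 8);
    if version < MIN_VERSION || version > CURRENT_VERSION {
        return Err(format!("unsupported WAL format version {}", version));
    }
    let page_size = read_u32_le(buf, 12);
    if page_size != PAGE_SIZE {
        return Err(format!("page size mismatch: {}", page_size));
    }
    // v1 kept bytes 24..32 reserved; treat them as "clock never persisted".
    let clock_high_water = if version >= 2 { read_u64_le(buf, 24) } else { 0 };
    Ok(HeaderSketch {
        version,
        page_size,
        salt: read_u32_le(buf, 16),
        checkpoint_seq: read_u32_le(buf, 20),
        clock_high_water,
    })
}
```

Under these assumptions a v1 header parses to `clock_high_water == 0` without any migration step, which matches the behaviour described above.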
Checksum. A rolling rotate_left(1) + byte sum over the
concatenation of the frame’s first 12 header bytes (page_num,
commit-page-count, salt) and its PAGE_SIZE body. Catches bit flips
and most multi-byte corruption without pulling in a dep. Header bytes
12..16 (the checksum field itself) are excluded from the computation,
since a checksum cannot cover its own stored value.
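One way to read the checksum description is sketched below. The exact order of operations (rotate first, then add) and the zero seed are assumptions; the text only says "rolling rotate_left(1) + byte sum". `rolling_sum` and `frame_checksum` are hypothetical names, not the module's API.

```rust
// Illustrative sketch; the seed value and rotate-then-add ordering are guesses.
const PAGE_SIZE: usize = 4096;

/// Fold `bytes` into `seed`: rotate the accumulator left by one bit,
/// then add the next byte with wrapping arithmetic.
fn rolling_sum(seed: u32, bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(seed, |acc, &b| acc.rotate_left(1).wrapping_add(u32::from(b)))
}

/// Checksum over the first 12 header bytes (page_num, commit-page-count,
/// salt, all little-endian) followed by the page body. The checksum field
/// itself (header bytes 12..16) is not part of the input.
fn frame_checksum(
    page_num: u32,
    commit_page_count: u32,
    salt: u32,
    page: &[u8; PAGE_SIZE],
) -> u32 {
    let mut header = [0u8; 12];
    header[0..4].copy_from_slice(&page_num.to_le_bytes());
    header[4..8].copy_from_slice(&commit_page_count.to_le_bytes());
    header[8..12].copy_from_slice(&salt.to_le_bytes());
    let acc = rolling_sum(0, &header);
    rolling_sum(acc, page)
}
```

Because each step (a rotation followed by adding a constant) is a bijection on `u32`, two inputs of equal length that differ in exactly one byte always produce different sums under this sketch. The rotation also makes the result order-sensitive, unlike a plain byte sum.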
Torn-write recovery. On open, the reader walks frames from the
start and verifies each checksum. The first invalid or incomplete
frame marks where the WAL effectively ends; anything past it is
treated as if it doesn’t exist. Callers learn what’s committed vs
what’s speculative from Wal::last_commit_offset / the is_commit
flag of each scanned frame.
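The recovery walk can be sketched as a loop over fixed-size frames that stops at the first frame that is short or fails verification. Everything here is illustrative: `scan_frames`, `encode_frame`, `ScanResult`, and `last_commit_end` are hypothetical names, the checksum ordering is assumed, and checking the frame salt against the header salt during the scan is an assumption based on the salt's stated purpose.

```rust
// Illustrative sketch of the scan; not the module's actual reader.
const PAGE_SIZE: usize = 4096;
const WAL_HEADER_SIZE: usize = 32;
const FRAME_HEADER_SIZE: usize = 16;
const FRAME_SIZE: usize = FRAME_HEADER_SIZE + PAGE_SIZE;

fn checksum(header12: &[u8], page: &[u8]) -> u32 {
    header12
        .iter()
        .chain(page)
        .fold(0u32, |acc, &b| acc.rotate_left(1).wrapping_add(u32::from(b)))
}

fn read_u32_le(buf: &[u8], at: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[at..at + 4]);
    u32::from_le_bytes(b)
}

/// Build one frame: 16-byte header followed by the page body.
fn encode_frame(page_num: u32, commit_page_count: u32, salt: u32, page: &[u8]) -> Vec<u8> {
    assert_eq!(page.len(), PAGE_SIZE);
    let mut out = Vec::with_capacity(FRAME_SIZE);
    out.extend_from_slice(&page_num.to_le_bytes());
    out.extend_from_slice(&commit_page_count.to_le_bytes());
    out.extend_from_slice(&salt.to_le_bytes());
    let sum = checksum(&out[0..12], page);
    out.extend_from_slice(&sum.to_le_bytes());
    out.extend_from_slice(page);
    out
}

struct ScanResult {
    /// Frames that verified before the first invalid or incomplete one.
    valid_frames: usize,
    /// Byte offset just past the last valid commit frame, if any
    /// (hypothetical stand-in for what a caller would treat as committed).
    last_commit_end: Option<usize>,
}

fn scan_frames(wal: &[u8], header_salt: u32) -> ScanResult {
    let mut result = ScanResult { valid_frames: 0, last_commit_end: None };
    let mut offset = WAL_HEADER_SIZE;
    // A trailing partial frame fails this bound check and ends the walk.
    while offset + FRAME_SIZE <= wal.len() {
        let frame = &wal[offset..offset + FRAME_SIZE];
        let commit_page_count = read_u32_le(frame, 4);
        let salt = read_u32_le(frame, 8);
        let stored = read_u32_le(frame, 12);
        let computed = checksum(&frame[0..12], &frame[FRAME_HEADER_SIZE..]);
        if salt != header_salt || stored != computed {
            break; // everything from here on is treated as nonexistent
        }
        offset += FRAME_SIZE;
        result.valid_frames += 1;
        if commit_page_count > 0 {
            result.last_commit_end = Some(offset);
        }
    }
    result
}
```

In this sketch, valid dirty frames that follow the last commit frame are counted but do not move `last_commit_end`, which is how a caller could separate committed from speculative data.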
Structs
- Frame
- FrameHeader - Parsed per-frame header (everything but the page body).
- Wal
- WalHeader - Parsed WAL header. page_size is redundant with the engine’s compile-time constant; we persist it for forward-compat and reject anything that doesn’t match at open time.
Constants
- FRAME_HEADER_SIZE
- FRAME_SIZE
- WAL_FORMAT_VERSION - The version the engine writes today. Phase 11.2 bumped 1 → 2 to add clock_high_water in header bytes 24..32. Bumped 2 → 3 in Phase 11.9 to mark “may contain MVCC log-record frames”: frames whose page_num field carries crate::mvcc::MVCC_FRAME_MARKER (u32::MAX) instead of a real page number. v1 and v2 readers had no special case for that marker, and a pre-Phase-11.9 checkpoint would try to flush such a frame to the main file at offset u32::MAX * PAGE_SIZE (way past EOF), which is why the bump is needed.
- WAL_FORMAT_VERSION_MIN_SUPPORTED - Lowest format version we know how to open. v1 had the bytes that now hold clock_high_water reserved-as-zero, which is identical to “clock has never been persisted” and round-trips cleanly. v2 adds the clock_high_water field but no MVCC frames. v3 adds MVCC frames; handing a v3 WAL to a v2 reader would mishandle the MVCC marker page number, but a fresh truncate (every checkpoint) rewrites the header at the engine’s current version, so the cross-version exposure is bounded.
- WAL_HEADER_SIZE
- WAL_MAGIC
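The version constants and the MVCC marker interact as sketched below. This is a hedged illustration, not the crate's code: the concrete values (current version 3, minimum supported 1) are inferred from the descriptions above, and `check_version`, `FrameKind`, and `classify_frame` are hypothetical names.

```rust
// Illustrative sketch; constant values are inferred from the docs, and the
// helper names are made up for this example.
const WAL_FORMAT_VERSION: u32 = 3;
const WAL_FORMAT_VERSION_MIN_SUPPORTED: u32 = 1;
const MVCC_FRAME_MARKER: u32 = u32::MAX;
const PAGE_SIZE: u64 = 4096;

/// Accept any version between the minimum supported and the current one.
fn check_version(version: u32) -> Result<(), String> {
    if (WAL_FORMAT_VERSION_MIN_SUPPORTED..=WAL_FORMAT_VERSION).contains(&version) {
        Ok(())
    } else {
        Err(format!("unsupported WAL format version {}", version))
    }
}

#[derive(Debug, PartialEq)]
enum FrameKind {
    /// Ordinary page frame; a checkpoint would write it at this file offset.
    Page { file_offset: u64 },
    /// MVCC log-record frame; must never be flushed as a page.
    MvccLogRecord,
}

/// Decide how a checkpoint should treat a frame based on its page_num field.
fn classify_frame(page_num: u32) -> FrameKind {
    if page_num == MVCC_FRAME_MARKER {
        FrameKind::MvccLogRecord
    } else {
        FrameKind::Page { file_offset: u64::from(page_num) * PAGE_SIZE }
    }
}
```

A reader without the marker check would compute `u32::MAX * PAGE_SIZE` as a file offset for an MVCC frame, which is the failure mode the version bump guards against.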