Double-write buffer for torn write protection.
NVMe drives guarantee atomic 4 KiB sector writes but NOT atomic writes for larger pages (e.g., 16 KiB). If power fails mid-write on a 16 KiB page, the WAL page can be partially written (torn).
CRC32C detects torn writes during replay, but without the double-write buffer, the record is lost — even though it was acknowledged to the client.
The double-write buffer solves this:
- Before writing to WAL, write the record to the double-write file.
- fsync the double-write file.
- Write to the WAL file.
- fsync the WAL file.
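The ordering above can be sketched as follows. This is a minimal illustration, not this crate's implementation: `durable_append` is a hypothetical helper, the record is treated as an opaque byte slice, and `File::sync_all` stands in for fsync.

```rust
use std::fs::File;
use std::io::{self, Write};

/// Hypothetical helper (not part of this crate's API): make `record`
/// durable in the double-write file before it touches the WAL.
fn durable_append(dw: &mut File, wal: &mut File, record: &[u8]) -> io::Result<()> {
    // 1. Write the record to the double-write file.
    dw.write_all(record)?;
    // 2. Flush it to stable storage (File::sync_all issues an fsync).
    dw.sync_all()?;
    // 3. Only now write the same bytes to the WAL.
    wal.write_all(record)?;
    // 4. Flush the WAL. After this returns, the write can be acknowledged.
    wal.sync_all()?;
    Ok(())
}
```

Because step 2 completes before step 3 starts, a crash during the WAL write always leaves an intact copy in the double-write file.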
On recovery, if a WAL record’s CRC fails:
- Check the double-write buffer for an intact copy (verify CRC).
- If found, use the double-write copy to reconstruct the WAL page.
- If not found, the record is truly lost (pre-fsync crash).
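The recovery decision above might look roughly like this. The `Record` type, its `lsn` field, and the `recover` function are assumptions made for illustration; the actual on-disk format and API are not described on this page. The CRC32C routine is a plain bitwise implementation so that the sketch needs no external crates.

```rust
/// Bitwise CRC32C (Castagnoli polynomial, reflected form 0x82F63B78).
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Hypothetical record: a log sequence number, the payload, and the
/// CRC32C of the payload as stored on disk.
#[derive(Clone, Debug, PartialEq)]
struct Record {
    lsn: u64,
    payload: Vec<u8>,
    crc: u32,
}

impl Record {
    fn new(lsn: u64, payload: &[u8]) -> Self {
        Record { lsn, payload: payload.to_vec(), crc: crc32c(payload) }
    }

    /// True if the stored CRC matches the payload (no torn write).
    fn is_intact(&self) -> bool {
        crc32c(&self.payload) == self.crc
    }
}

/// Returns the record to replay at `lsn`, or `None` if it is lost.
fn recover(lsn: u64, wal: &Record, dw: &[Record]) -> Option<Record> {
    if wal.is_intact() {
        return Some(wal.clone());
    }
    // WAL copy is torn: look for an intact double-write copy with the same LSN.
    dw.iter().find(|r| r.lsn == lsn && r.is_intact()).cloned()
}
```

Note that the double-write copy is itself CRC-checked before use, since a crash during the double-write step can tear that copy instead.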
The double-write file is a fixed-size circular buffer. Only the most recent N records are kept — older ones are overwritten. This is fine because torn writes can only happen on the most recent write.
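One way to picture the circular layout is shown below. The slot count, slot size, and sequence-number addressing are assumptions for illustration only; the real `DoubleWriteBuffer` file layout may differ.

```rust
/// Hypothetical layout: `slots` fixed-size slots of `slot_size` bytes each.
/// Record number `seq` is stored in slot `seq % slots`.
struct Ring {
    slots: u64,
    slot_size: u64,
}

impl Ring {
    /// Byte offset in the double-write file where record `seq` is written.
    fn offset_for(&self, seq: u64) -> u64 {
        (seq % self.slots) * self.slot_size
    }

    /// True if record `seq` has not yet been overwritten, given that
    /// `newest` is the sequence number of the most recent write.
    fn is_retained(&self, seq: u64, newest: u64) -> bool {
        seq <= newest && newest - seq < self.slots
    }
}
```

With 4 slots, for example, records 2 through 5 are still present once record 5 has been written, while record 1 has been overwritten by record 5.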
Structs
- DoubleWriteBuffer - Double-write buffer file.