Struct Checkpointer

Source

pub struct Checkpointer { /* private fields */ }

Expand description

The Checkpointer flushes dirty IN nodes to the log.

Checkpoint flushes must be done in ascending order from the bottom of the tree up. This ensures that recovery can reconstruct the tree from the checkpoint.

§Checkpoint Algorithm

Generate checkpoint ID
Create and log CheckpointStart
Build dirty IN map (organized by Btree level)
Flush dirty INs level by level (bottom-up)
- Bottom levels logged provisionally
- Top level logged non-provisionally
Create and log CheckpointEnd
Update statistics

This implementation flushes dirty BINs via flush_dirty_bins_internal(), which writes full BIN or BINDelta log entries depending on the dirty-slot fraction (TREE_BIN_DELTA = 25%). Upper INs (level ≥ 2) are flushed by flush_upper_ins_internal() after the BIN pass, bottom-up, using Provisional::Yes for intermediate levels and Provisional::No for the root. File utilization summaries are persisted via persist_file_summaries() at the end of each checkpoint.

Implementations§

Source §

impl Checkpointer

Source

pub fn new(config: CheckpointConfig) -> Self

Create a new Checkpointer.

§Arguments

config - Checkpoint configuration

Source

pub fn with_bytes_interval(self, bytes: u64) -> Self

Set the bytes-written threshold that triggers an immediate checkpoint.

Source

pub fn with_time_interval(self, millis: u64) -> Self

Set the time-based checkpoint interval (milliseconds).

REC-D: wired from CHECKPOINTER_WAKEUP_INTERVAL. JE getWakeupPeriod computes bytes-OR-time with bytes taking precedence; isRunnable consults the time interval only when the byte interval is 0.

Source

pub fn with_log_manager(self, lm: Arc<LogManager>) -> Self

Attach a LogManager so that do_checkpoint writes real WAL entries.

Call this before invoking do_checkpoint when a writable log is available (i.e. from EnvironmentImpl).

Source

pub fn with_tree(self, tree: Arc<RwLock<Tree>>, db_id: u64) -> Self

Attach a Tree so that do_checkpoint flushes dirty BINs in step 4.

db_id is the database ID passed to Tree::collect_dirty_bins(). Checkpointer receiving the environment’s tree reference.

Source

pub fn with_db_trees_registry( self, registry: Arc<Mutex<HashMap<i64, Arc<RwLock<Tree>>>>>, ) -> Self

Wire the env-wide db-tree registry so the checkpointer flushes ALL user-database dirty BINs, not just the primary tree.

This is the Stage-1 fix: JE’s Checkpointer.processINList walks a single env-wide INList covering all databases. Noxu achieves the same effect by iterating db_trees_registry and flushing each tree.

Source

pub fn with_utilization_tracker( self, tracker: Arc<Mutex<UtilizationTracker>>, ) -> Self

Attach a UtilizationTracker so that persist_file_summaries() writes real FileSummaryLN WAL entries during each checkpoint.

Checkpointer receiving the environment’s utilization tracker.

Source

pub fn with_cleaner(self, cleaner: Arc<Cleaner>) -> Self

Wire a cleaner so that do_checkpoint calls cleaner.after_checkpoint() after a successful checkpoint.

This is the X-5 fix: it activates the three-state checkpoint barrier (cleaned → checkpointed → safe_to_delete) in FileSelector so that log files are only deleted after their migrations have been captured by two successive checkpoints.

Source

pub fn with_txn_manager(self, txn_manager: Arc<TxnManager>) -> Self

Wire the transaction manager so do_checkpoint can compute the real first_active_lsn for CkptEnd (T-F3/T-F4).

Safe to call only after Stage 1 (user-database BINs are checkpointed); before Stage 1 a non-zero first_active_lsn would cause recovery to skip committed LNs not captured in any BIN.

Source

pub fn with_id_sources(self, next_db_id: Arc<AtomicI64>) -> Self

REC-S: wire the env’s db-id counter so do_checkpoint writes the real last node/db/txn id values into CheckpointEnd instead of zeros.

next_db_id is the env’s AtomicI64 (last allocated db-id = next_db_id - 1). The last txn-id is read from the wired txn_manager; the last node-id from the tree-wide node counter.

JE Checkpointer.doCheckpoint reads envImpl.getNodeSequence() .getLastLocalNodeId(), getDbTree().getLastLocalDbId(), and getTxnManager().getLastLocalTxnId() into the CheckpointEnd.

Source

pub fn wakeup_after_write(&self, bytes: u64)

Accumulate bytes written and trigger a checkpoint when the threshold is exceeded.

Called after each WAL write from EnvironmentImpl (or LogManager) with the number of bytes appended. When the running total exceeds checkpoint_bytes_interval the counter is reset and do_checkpoint("wakeup") is invoked synchronously.

Source

pub fn is_runnable(&self, force: bool) -> bool

Whether a periodic (daemon) checkpoint should run now (JE Checkpointer.isRunnable). Without this gate the daemon wrote a checkpoint on every wakeup tick even on a fully idle environment (wasted I/O). Returns true if:

force, OR
REC-F: the cleaner has files pending reclaim (needCheckpointForCleanedFiles() → isCheckpointNeeded()), even with no writes — so an idle env still reclaims cleaned files, OR
bytes written since the last checkpoint >= the byte interval, OR
(only when the byte interval is disabled) the time interval elapsed AND something was written since the last checkpoint (bytes_since_checkpoint > 0 — JE’s lastUsedLsn != lastCheckpointEnd idle-guard).

JE ref: Checkpointer.isRunnable — order is force, then wakeupAfterNoWrites && needCheckpointForCleanedFiles(), then the bytes-OR-time interval (bytes takes precedence; the time branch only runs when logSizeBytesInterval == 0).

Source

pub fn is_checkpointed(node: &NodeRwLock<TreeNode>) -> bool

Returns true if the given BIN node has been checkpointed at least once (its last_full_lsn is not NULL_LSN).

The evictor calls this before evicting a node: a node that has never been checkpointed would be lost on eviction because it has no on-disk representation yet.

Source

pub fn persist_file_summaries(&self) -> Result<()>

Persist file utilization summaries to the WAL.

Writes a FileSummaryLN log entry for each tracked file summary so that utilization data survives a restart.

Requires both a LogManager (via with_log_manager) and a UtilizationTracker (via with_utilization_tracker) to be wired. Returns Ok(()) without writing if either is absent.

Source

pub fn do_checkpoint(&self, invoker: &str) -> Result<CheckpointResult>

Perform a checkpoint.

Source

pub fn get_last_checkpoint_start(&self) -> Lsn

Get the LSN of the last checkpoint start.

Source

pub fn get_last_checkpoint_end(&self) -> Lsn

Get the LSN of the last checkpoint end.

Source

pub fn is_checkpoint_in_progress(&self) -> bool

Check if a checkpoint is currently in progress.

Source

pub fn get_eviction_provisional( &self, db_id: u64, node_level: i32, ) -> Provisional

Choose the Provisional flag for a node being evicted by the evictor.

Returns Provisional::Yes when a checkpoint is in progress and the node’s level is strictly below the tree-specific highest flush level for db_id (meaning the checkpoint will write a non-provisional ancestor for that tree that subsumes this entry). Returns Provisional::No if no checkpoint is in progress, or if db_id has no dirty upper INs in this checkpoint (level absent from map → 0 → not covered).

§JE reference

Checkpointer.coordinateEvictionWithCheckpoint → DirtyINMap.coordinateEvictionWithCheckpoint which calls getHighestFlushLevel(db) — per-DatabaseImpl lookup. If the db is absent from highestFlushLevels, getHighestFlushLevel returns IN.MIN_LEVEL (≤ 0) making the comparison false → Provisional::NO.

§CC-4 residual

The prior implementation stored a single global max-level (AtomicI32) that was the maximum across ALL trees. A BIN evicted from tree A (no dirty upper INs) got Provisional::Yes because tree B’s level was non-zero, but NO non-provisional ancestor was written for tree A → recovery discards the provisional BIN → data loss on crash before the next checkpoint. Per-tree lookup (this method) fixes that: tree A’s level is absent → 0 → Provisional::No (authoritative log entry).

§Race window

Same benign race as JE: if the checkpoint finishes between the in_progress read and the log write, the BIN may be logged Provisional::Yes without a covering ancestor in this checkpoint, but the next checkpoint will cover it. Logging Yes without strict need is safe (log bloat only); the reverse is what causes recovery inconsistency.

Source

pub fn get_stats(&self) -> Arc<CheckpointStats>

Source

pub fn get_config(&self) -> &CheckpointConfig

Get the configuration.

Source

pub fn request_shutdown(&self)

Request shutdown of the checkpointer.

Sets the shutdown flag AND wakes up the daemon thread so it exits immediately without waiting the full sleep interval.

Source

pub fn is_shutdown(&self) -> bool

Check if shutdown has been requested.

Source

pub fn wakeup_after_no_writes(&self)

CLN-14: wake the checkpointer daemon promptly after a cleaning pass so cleaned files are deleted without waiting the full wakeup interval (default 60 s).

The cleaner registers this via Cleaner::set_checkpoint_wakeup_fn; it is invoked at the end of a successful do_clean. It notifies the daemon’s sleep condvar WITHOUT setting the shutdown flag, so the daemon thread returns early from wait_for_shutdown_or_timeout and re-checks is_runnable(false) — which returns true because needs_checkpoint_for_cleaned_files() now reports the just-cleaned files pending reclaim. The result is a prompt checkpoint that runs after_checkpoint() and lets delete_safe_files remove the files.

JE: FileProcessor.doClean calls envImpl.getCheckpointer().wakeupAfterNoWrites() (Cleaner/FileProcessor), which sets wakeupAfterNoWrites = true and wakes the checkpointer; Checkpointer.isRunnable then returns true via needCheckpointForCleanedFiles(). Noxu folds the flag into the cleaner query (is_runnable), so this method only has to wake the sleeping daemon.

Source

pub fn wait_for_shutdown_or_timeout(&self, duration: Duration)

Sleep for duration or until request_shutdown() is called.

Used by the daemon thread in EnvironmentImpl instead of thread::sleep() so that shutdown is immediate.

Source

pub fn peek_next_checkpoint_id(&self) -> u64

Get the next checkpoint ID (without incrementing).

Source

pub fn init_intervals( &self, last_checkpoint_start: Lsn, last_checkpoint_end: Lsn, )

REC-G: seed the checkpoint-interval baselines from a recovered checkpoint, so the FIRST post-recovery checkpoint interval is measured from the recovered CkptEnd rather than from process start.

Without this, last_checkpoint_start/_end start at NULL_LSN and bytes_since_checkpoint at 0 after recovery, so the bytes/time gate would treat all log written before the crash as “since the last checkpoint” — firing a redundant checkpoint immediately, or (for the time branch) measuring the interval from the wrong baseline.

JE ref: Checkpointer.initIntervals(lastCheckpointStart, lastCheckpointEnd, lastCheckpointMillis) — called from RecoveryManager.recover() after the recovery scan completes. Noxu passes the recovered checkpoint_start_lsn / checkpoint_end_lsn (NULL_LSN when the log had no prior checkpoint, matching JE).

Source

pub fn set_checkpoint_id(&self, last_checkpoint_id: u64)

REC-H: continue the checkpoint-ID sequence after recovery instead of restarting at 1. The next checkpoint will use last_checkpoint_id + 1.

The ID is a debug/log tag (not a correctness key), but it should not regress or collide across restarts. Seeded from the recovered CkptEnd.id.

JE ref: Checkpointer.setCheckpointId(lastCheckpointId) — “can only be done after recovery”; JE stores checkpointId = lastCheckpointId and incrementProgress/generateCheckpointId advances from there. Noxu’s do_checkpoint does fetch_add(1), so we seed next_checkpoint_id = last_checkpoint_id + 1 to make the next emitted ID strictly greater.

Source

pub fn flush_dirty_bins(&self) -> Result<()>

Flush all dirty BINs to the log (public, unit-result API).

Calls the internal flush logic and discards the detailed FlushResult, returning only success/failure. Use this from external callers (e.g. daemon threads) that do not need per-BIN counts.

Checkpointer.doCheckpoint() partial flush path.