pub struct Checkpointer { /* private fields */ }Expand description
The Checkpointer flushes dirty IN nodes to the log.
Checkpoint flushes must be done in ascending order from the bottom of the tree up. This ensures that recovery can reconstruct the tree from the checkpoint.
§Checkpoint Algorithm
- Generate checkpoint ID
- Create and log CheckpointStart
- Build dirty IN map (organized by Btree level)
- Flush dirty INs level by level (bottom-up)
- Bottom levels logged provisionally
- Top level logged non-provisionally
- Create and log CheckpointEnd
- Update statistics
This implementation flushes dirty BINs via flush_dirty_bins_internal(),
which writes full BIN or BINDelta log entries depending on the dirty-slot
fraction (TREE_BIN_DELTA = 25%). Upper INs (level ≥ 2) are flushed
by flush_upper_ins_internal() after the BIN pass, bottom-up, using
Provisional::Yes for intermediate levels and Provisional::No for
the root. File utilization summaries are persisted via
persist_file_summaries() at the end of each checkpoint.
Implementations§
Source§impl Checkpointer
impl Checkpointer
Sourcepub fn new(config: CheckpointConfig) -> Self
pub fn new(config: CheckpointConfig) -> Self
Sourcepub fn with_bytes_interval(self, bytes: u64) -> Self
pub fn with_bytes_interval(self, bytes: u64) -> Self
Set the bytes-written threshold that triggers an immediate checkpoint.
Sourcepub fn with_time_interval(self, millis: u64) -> Self
pub fn with_time_interval(self, millis: u64) -> Self
Set the time-based checkpoint interval (milliseconds).
REC-D: wired from CHECKPOINTER_WAKEUP_INTERVAL. JE getWakeupPeriod
computes bytes-OR-time with bytes taking precedence; isRunnable
consults the time interval only when the byte interval is 0.
Sourcepub fn with_log_manager(self, lm: Arc<LogManager>) -> Self
pub fn with_log_manager(self, lm: Arc<LogManager>) -> Self
Attach a LogManager so that do_checkpoint writes real WAL entries.
Call this before invoking do_checkpoint when a writable log is
available (i.e. from EnvironmentImpl).
Sourcepub fn with_tree(self, tree: Arc<RwLock<Tree>>, db_id: u64) -> Self
pub fn with_tree(self, tree: Arc<RwLock<Tree>>, db_id: u64) -> Self
Attach a Tree so that do_checkpoint flushes dirty BINs in step 4.
db_id is the database ID passed to Tree::collect_dirty_bins().
Checkpointer receiving the environment’s tree reference.
Sourcepub fn with_db_trees_registry(
self,
registry: Arc<Mutex<HashMap<i64, Arc<RwLock<Tree>>>>>,
) -> Self
pub fn with_db_trees_registry( self, registry: Arc<Mutex<HashMap<i64, Arc<RwLock<Tree>>>>>, ) -> Self
Wire the env-wide db-tree registry so the checkpointer flushes ALL user-database dirty BINs, not just the primary tree.
This is the Stage-1 fix: JE’s Checkpointer.processINList walks a
single env-wide INList covering all databases. Noxu achieves the
same effect by iterating db_trees_registry and flushing each tree.
Sourcepub fn with_utilization_tracker(
self,
tracker: Arc<Mutex<UtilizationTracker>>,
) -> Self
pub fn with_utilization_tracker( self, tracker: Arc<Mutex<UtilizationTracker>>, ) -> Self
Attach a UtilizationTracker so that persist_file_summaries() writes
real FileSummaryLN WAL entries during each checkpoint.
Checkpointer receiving the environment’s utilization tracker.
Sourcepub fn with_cleaner(self, cleaner: Arc<Cleaner>) -> Self
pub fn with_cleaner(self, cleaner: Arc<Cleaner>) -> Self
Wire a cleaner so that do_checkpoint calls
cleaner.after_checkpoint() after a successful checkpoint.
This is the X-5 fix: it activates the three-state checkpoint barrier
(cleaned → checkpointed → safe_to_delete) in FileSelector so that
log files are only deleted after their migrations have been captured by
two successive checkpoints.
Sourcepub fn with_txn_manager(self, txn_manager: Arc<TxnManager>) -> Self
pub fn with_txn_manager(self, txn_manager: Arc<TxnManager>) -> Self
Wire the transaction manager so do_checkpoint can compute the real
first_active_lsn for CkptEnd (T-F3/T-F4).
Safe to call only after Stage 1 (user-database BINs are checkpointed);
before Stage 1 a non-zero first_active_lsn would cause recovery to
skip committed LNs not captured in any BIN.
Sourcepub fn with_id_sources(self, next_db_id: Arc<AtomicI64>) -> Self
pub fn with_id_sources(self, next_db_id: Arc<AtomicI64>) -> Self
REC-S: wire the env’s db-id counter so do_checkpoint writes the real
last node/db/txn id values into CheckpointEnd instead of zeros.
next_db_id is the env’s AtomicI64 (last allocated db-id =
next_db_id - 1). The last txn-id is read from the wired
txn_manager; the last node-id from the tree-wide node counter.
JE Checkpointer.doCheckpoint reads envImpl.getNodeSequence() .getLastLocalNodeId(), getDbTree().getLastLocalDbId(), and
getTxnManager().getLastLocalTxnId() into the CheckpointEnd.
Sourcepub fn wakeup_after_write(&self, bytes: u64)
pub fn wakeup_after_write(&self, bytes: u64)
Accumulate bytes written and trigger a checkpoint when the threshold is exceeded.
Called after each WAL write from EnvironmentImpl (or LogManager) with
the number of bytes appended. When the running total exceeds
checkpoint_bytes_interval the counter is reset and
do_checkpoint("wakeup") is invoked synchronously.
Sourcepub fn is_runnable(&self, force: bool) -> bool
pub fn is_runnable(&self, force: bool) -> bool
Whether a periodic (daemon) checkpoint should run now (JE
Checkpointer.isRunnable). Without this gate the daemon wrote a
checkpoint on every wakeup tick even on a fully idle environment
(wasted I/O). Returns true if:
force, OR- REC-F: the cleaner has files pending reclaim
(
needCheckpointForCleanedFiles()→isCheckpointNeeded()), even with no writes — so an idle env still reclaims cleaned files, OR - bytes written since the last checkpoint >= the byte interval, OR
- (only when the byte interval is disabled) the time interval elapsed
AND something was written since the last checkpoint
(
bytes_since_checkpoint > 0— JE’slastUsedLsn != lastCheckpointEndidle-guard).
JE ref: Checkpointer.isRunnable — order is force, then
wakeupAfterNoWrites && needCheckpointForCleanedFiles(), then the
bytes-OR-time interval (bytes takes precedence; the time branch only
runs when logSizeBytesInterval == 0).
Sourcepub fn is_checkpointed(node: &NodeRwLock<TreeNode>) -> bool
pub fn is_checkpointed(node: &NodeRwLock<TreeNode>) -> bool
Returns true if the given BIN node has been checkpointed at least
once (its last_full_lsn is not NULL_LSN).
The evictor calls this before evicting a node: a node that has never been checkpointed would be lost on eviction because it has no on-disk representation yet.
Sourcepub fn persist_file_summaries(&self) -> Result<()>
pub fn persist_file_summaries(&self) -> Result<()>
Persist file utilization summaries to the WAL.
Writes a FileSummaryLN log entry for each tracked file summary so
that utilization data survives a restart.
Requires both a LogManager (via with_log_manager) and a
UtilizationTracker (via with_utilization_tracker) to be wired.
Returns Ok(()) without writing if either is absent.
Sourcepub fn do_checkpoint(&self, invoker: &str) -> Result<CheckpointResult>
pub fn do_checkpoint(&self, invoker: &str) -> Result<CheckpointResult>
Perform a checkpoint.
Sourcepub fn get_last_checkpoint_start(&self) -> Lsn
pub fn get_last_checkpoint_start(&self) -> Lsn
Get the LSN of the last checkpoint start.
Sourcepub fn get_last_checkpoint_end(&self) -> Lsn
pub fn get_last_checkpoint_end(&self) -> Lsn
Get the LSN of the last checkpoint end.
Sourcepub fn is_checkpoint_in_progress(&self) -> bool
pub fn is_checkpoint_in_progress(&self) -> bool
Check if a checkpoint is currently in progress.
Sourcepub fn get_eviction_provisional(
&self,
db_id: u64,
node_level: i32,
) -> Provisional
pub fn get_eviction_provisional( &self, db_id: u64, node_level: i32, ) -> Provisional
Choose the Provisional flag for a node being evicted by the evictor.
Returns Provisional::Yes when a checkpoint is in progress and the
node’s level is strictly below the tree-specific highest flush level
for db_id (meaning the checkpoint will write a non-provisional ancestor
for that tree that subsumes this entry). Returns Provisional::No if
no checkpoint is in progress, or if db_id has no dirty upper INs in
this checkpoint (level absent from map → 0 → not covered).
§JE reference
Checkpointer.coordinateEvictionWithCheckpoint →
DirtyINMap.coordinateEvictionWithCheckpoint which calls
getHighestFlushLevel(db) — per-DatabaseImpl lookup. If the db
is absent from highestFlushLevels, getHighestFlushLevel returns
IN.MIN_LEVEL (≤ 0) making the comparison false → Provisional::NO.
§CC-4 residual
The prior implementation stored a single global max-level (AtomicI32)
that was the maximum across ALL trees. A BIN evicted from tree A (no
dirty upper INs) got Provisional::Yes because tree B’s level was
non-zero, but NO non-provisional ancestor was written for tree A →
recovery discards the provisional BIN → data loss on crash before the
next checkpoint. Per-tree lookup (this method) fixes that: tree A’s
level is absent → 0 → Provisional::No (authoritative log entry).
§Race window
Same benign race as JE: if the checkpoint finishes between the
in_progress read and the log write, the BIN may be logged
Provisional::Yes without a covering ancestor in this checkpoint, but
the next checkpoint will cover it. Logging Yes without strict need is
safe (log bloat only); the reverse is what causes recovery inconsistency.
pub fn get_stats(&self) -> Arc<CheckpointStats>
Sourcepub fn get_config(&self) -> &CheckpointConfig
pub fn get_config(&self) -> &CheckpointConfig
Get the configuration.
Sourcepub fn request_shutdown(&self)
pub fn request_shutdown(&self)
Request shutdown of the checkpointer.
Sets the shutdown flag AND wakes up the daemon thread so it exits immediately without waiting the full sleep interval.
Sourcepub fn is_shutdown(&self) -> bool
pub fn is_shutdown(&self) -> bool
Check if shutdown has been requested.
Sourcepub fn wakeup_after_no_writes(&self)
pub fn wakeup_after_no_writes(&self)
CLN-14: wake the checkpointer daemon promptly after a cleaning pass so cleaned files are deleted without waiting the full wakeup interval (default 60 s).
The cleaner registers this via Cleaner::set_checkpoint_wakeup_fn; it
is invoked at the end of a successful do_clean. It notifies the
daemon’s sleep condvar WITHOUT setting the shutdown flag, so the daemon
thread returns early from wait_for_shutdown_or_timeout and re-checks
is_runnable(false) — which returns true because
needs_checkpoint_for_cleaned_files() now reports the just-cleaned
files pending reclaim. The result is a prompt checkpoint that runs
after_checkpoint() and lets delete_safe_files remove the files.
JE: FileProcessor.doClean calls
envImpl.getCheckpointer().wakeupAfterNoWrites() (Cleaner/FileProcessor),
which sets wakeupAfterNoWrites = true and wakes the checkpointer;
Checkpointer.isRunnable then returns true via
needCheckpointForCleanedFiles(). Noxu folds the flag into the
cleaner query (is_runnable), so this method only has to wake the
sleeping daemon.
Sourcepub fn wait_for_shutdown_or_timeout(&self, duration: Duration)
pub fn wait_for_shutdown_or_timeout(&self, duration: Duration)
Sleep for duration or until request_shutdown() is called.
Used by the daemon thread in EnvironmentImpl instead of
thread::sleep() so that shutdown is immediate.
Sourcepub fn peek_next_checkpoint_id(&self) -> u64
pub fn peek_next_checkpoint_id(&self) -> u64
Get the next checkpoint ID (without incrementing).
Sourcepub fn init_intervals(
&self,
last_checkpoint_start: Lsn,
last_checkpoint_end: Lsn,
)
pub fn init_intervals( &self, last_checkpoint_start: Lsn, last_checkpoint_end: Lsn, )
REC-G: seed the checkpoint-interval baselines from a recovered
checkpoint, so the FIRST post-recovery checkpoint interval is measured
from the recovered CkptEnd rather than from process start.
Without this, last_checkpoint_start/_end start at NULL_LSN and
bytes_since_checkpoint at 0 after recovery, so the bytes/time gate
would treat all log written before the crash as “since the last
checkpoint” — firing a redundant checkpoint immediately, or (for the
time branch) measuring the interval from the wrong baseline.
JE ref: Checkpointer.initIntervals(lastCheckpointStart, lastCheckpointEnd, lastCheckpointMillis) — called from
RecoveryManager.recover() after the recovery scan completes. Noxu
passes the recovered checkpoint_start_lsn / checkpoint_end_lsn
(NULL_LSN when the log had no prior checkpoint, matching JE).
Sourcepub fn set_checkpoint_id(&self, last_checkpoint_id: u64)
pub fn set_checkpoint_id(&self, last_checkpoint_id: u64)
REC-H: continue the checkpoint-ID sequence after recovery instead of
restarting at 1. The next checkpoint will use last_checkpoint_id + 1.
The ID is a debug/log tag (not a correctness key), but it should not
regress or collide across restarts. Seeded from the recovered
CkptEnd.id.
JE ref: Checkpointer.setCheckpointId(lastCheckpointId) — “can only be
done after recovery”; JE stores checkpointId = lastCheckpointId and
incrementProgress/generateCheckpointId advances from there. Noxu’s
do_checkpoint does fetch_add(1), so we seed next_checkpoint_id = last_checkpoint_id + 1 to make the next emitted ID strictly greater.
Sourcepub fn flush_dirty_bins(&self) -> Result<()>
pub fn flush_dirty_bins(&self) -> Result<()>
Flush all dirty BINs to the log (public, unit-result API).
Calls the internal flush logic and discards the detailed FlushResult,
returning only success/failure. Use this from external callers (e.g.
daemon threads) that do not need per-BIN counts.
Checkpointer.doCheckpoint() partial flush path.