pub struct Tree {
pub key_comparator: Option<KeyComparatorFn>,
pub memory_counter: Option<Arc<AtomicI64>>,
pub in_list_listener: Option<Arc<dyn InListListener>>,
pub log_manager: Option<Arc<LogManager>>,
pub key_prefixing: bool,
pub compact_max_key_length: i32,
/* private fields */
}Expand description
The B+tree.
This is the main tree structure that manages the B+tree nodes and provides operations for search, insert, delete, and tree maintenance.
Fields§
§key_comparator: Option<KeyComparatorFn>Optional custom key comparator for sorted-duplicate databases.
When Some, all key comparisons in tree traversal (upper IN routing
and BIN entry search/insert/delete) use this comparator instead of
lexicographic byte comparison.
/ dupComparator stored on the
database and consulted at every IN.findEntry() call.
memory_counter: Option<Arc<AtomicI64>>Shared memory counter for the evictor / MemoryBudget.
Updated on every BIN entry insert (+key+data+overhead) and delete (-key+overhead) so the evictor sees real cache pressure.
env.getMemoryBudget().updateTreeMemoryUsage(delta) call
in the equivalent IN.updateMemorySize(). In Noxu the counter is an
Arc<AtomicI64> shared with the Arbiter (and later MemoryBudget)
to avoid a circular crate dependency (noxu-tree → noxu-dbi).
in_list_listener: Option<Arc<dyn InListListener>>Optional listener fed on node add/access/remove, mirroring JE’s
INList feeding the evictor’s LRULists.
When None (the default — used by unit tests with no environment),
the notifications are no-ops. EnvironmentImpl installs the
Evictor here so production inserts/accesses populate the LRU lists
the evictor drains.
JE reference: IN.fetchTarget/split/rebuildINList → addBack,
access → moveBack, removal → remove.
log_manager: Option<Arc<LogManager>>Optional log manager so an evicted root IN can be re-materialized from
its persisted root_log_lsn on the next access (EV-14, piece B).
JE’s Tree reaches the log via database.getEnv().getLogManager();
Tree.getRootINRootAlreadyLatched calls root.fetchTarget(...) which
reads the root IN back from its ChildReference LSN when the in-memory
target is null (Tree.java:477-516, ChildReference.fetchTarget). Noxu
has no env back-reference here, so the log manager is installed
directly (the same one-way wiring as in_list_listener). When None
(unit tests with no environment), an evicted root cannot be re-fetched
— but evict_root refuses to evict without a log manager, so the root
is never made non-resident in that configuration.
key_prefixing: boolWhether key-prefix compression is enabled for this tree’s BINs.
JE DatabaseImpl.getKeyPrefixing() / DatabaseConfig.setKeyPrefixing().
When false, IN.computeKeyPrefix returns null in JE — no prefix
is ever set. Noxu mirrors this: insert_with_prefix is skipped in
favour of insert_raw, and recompute_key_prefix is not called on
BIN halves after a split.
Default: false (matches JE’s DatabaseConfig.KEY_PREFIXING_DEFAULT).
Ref: IN.java computeKeyPrefix ~line 2456.
compact_max_key_length: i32T-5: maximum post-prefix key length (bytes) for the compact key rep
(INKeyRep.MaxKeySize). A node packs all its keys into one fixed-width
byte array when every post-prefix key is <= this length; a longer key
inflates the node to the Default rep. <= 0 disables the compact
rep entirely.
Default 16 (TREE_COMPACT_MAX_KEY_LENGTH /
INKeyRep.MaxKeySize.DEFAULT_MAX_KEY_LENGTH). Wired from
EnvironmentConfig via Tree::set_compact_max_key_length
(IN.getCompactMaxKeyLength, IN.java:4929).
Implementations§
Source§impl Tree
impl Tree
Sourcepub fn new(database_id: u64, max_entries_per_node: usize) -> Self
pub fn new(database_id: u64, max_entries_per_node: usize) -> Self
Creates a new empty tree.
Constructor.
Sourcepub fn set_memory_counter(&mut self, counter: Arc<AtomicI64>)
pub fn set_memory_counter(&mut self, counter: Arc<AtomicI64>)
Installs a shared memory counter for evictor / MemoryBudget feedback.
→ env.getMemoryBudget().updateTreeMemoryUsage(delta)
. The counter is updated on every BIN entry insert/delete.
Sourcepub fn set_in_list_listener(&mut self, listener: Arc<dyn InListListener>)
pub fn set_in_list_listener(&mut self, listener: Arc<dyn InListListener>)
Installs the InListListener (the evictor) so node add/access/remove
feed the LRU lists. JE: INList registration that feeds
Evictor.addBack/moveBack/remove.
Sourcepub fn set_log_manager(&mut self, lm: Arc<LogManager>)
pub fn set_log_manager(&mut self, lm: Arc<LogManager>)
Installs the noxu_log::LogManager so an evicted root IN can be
re-materialized from its persisted LSN on the next access (EV-14).
JE: the tree reaches the log through database.getEnv().getLogManager()
for ChildReference.fetchTarget. Noxu installs it directly.
Sourcepub fn clear_log_manager(&mut self)
pub fn clear_log_manager(&mut self)
Drops this tree’s Arc<LogManager> reference (EV-14 teardown).
The env’s Drop calls this on every tree it owns so the
Tree -> Arc<LogManager> -> Arc<FileManager> chain cannot keep the
FileManager (and its on-disk exclusive lock) alive past environment
close. After this the tree can no longer re-fetch an evicted root
from the log — which is correct, because the environment is shutting
down and the tree is about to be dropped.
Sourcepub fn set_compact_max_key_length(&mut self, len: i32)
pub fn set_compact_max_key_length(&mut self, len: i32)
T-5: set the compact-key threshold (TREE_COMPACT_MAX_KEY_LENGTH /
IN.getCompactMaxKeyLength). New BINs created by this tree inherit it;
<= 0 disables the compact key rep. Default 16.
Sourcepub fn new_with_comparator(
database_id: u64,
max_entries_per_node: usize,
comparator: KeyComparatorFn,
) -> Self
pub fn new_with_comparator( database_id: u64, max_entries_per_node: usize, comparator: KeyComparatorFn, ) -> Self
Creates a new empty tree with a custom key comparator.
Used for sorted-duplicate databases where keys are two-part composite keys that require a custom ordering function.
Constructor with btreeComparator parameter.
Sourcepub fn set_key_prefixing(&mut self, enabled: bool)
pub fn set_key_prefixing(&mut self, enabled: bool)
Sets the key-prefixing flag.
When true, BIN key-prefix compression is enabled: shared leading
bytes are factored out of each slot’s key. When false (the
default), keys are stored verbatim — matching JE
DatabaseConfig.setKeyPrefixing(false) / IN.computeKeyPrefix
returning null.
Ref: IN.java computeKeyPrefix ~line 2456.
Sourcepub fn set_comparator(&mut self, comparator: KeyComparatorFn)
pub fn set_comparator(&mut self, comparator: KeyComparatorFn)
Sets the key comparator, replacing any existing one.
Sourcepub fn hint_redo_capacity(&mut self, capacity: usize)
pub fn hint_redo_capacity(&mut self, capacity: usize)
Store a capacity hint used by redo_insert when it creates the first
BIN for this tree (the first-key path).
The first BIN’s entries Vec is pre-allocated with
capacity.min(max_entries_per_node) slots, eliminating the
Vec-resize doubling cycle (1 → 2 → 4 → … → cap) that would
otherwise occur during the redo loop.
Call once before the redo loop. Has no effect on insert (the
normal, non-recovery path).
Wave 11-K optimisation (Fix 3).
Sourcepub fn get_redo_capacity_hint(&self) -> usize
pub fn get_redo_capacity_hint(&self) -> usize
Returns the current redo capacity hint (0 = no hint set).
Sourcepub fn take_comparator(&mut self) -> Option<KeyComparatorFn>
pub fn take_comparator(&mut self) -> Option<KeyComparatorFn>
Takes the key comparator out of this tree (leaving None).
Sourcepub fn get_comparator(&self) -> Option<&KeyComparatorFn>
pub fn get_comparator(&self) -> Option<&KeyComparatorFn>
Returns a reference to the key comparator, if configured.
Used by CursorImpl::find_bin_for_key (R4 fix) so the cursor’s own
IN-level descent uses the same comparator-aware floor slot as the
tree’s own search paths. Mirrors JE DatabaseImpl.getKeyComparator().
Sourcepub fn set_root(&self, node: TreeNode)
pub fn set_root(&self, node: TreeNode)
Sets the root of the tree.
Must hold root_latch exclusively before calling.
Sourcepub fn get_root(&self) -> Option<Arc<RwLock<TreeNode>>>
pub fn get_root(&self) -> Option<Arc<RwLock<TreeNode>>>
Returns the root Arc, if any.
Returns a cloned Arc rather than a reference so the caller does not
hold the inner RwLock guard.
EV-14: when the in-memory root has been evicted (evict_root) but a
persisted version exists (root_log_lsn set), this re-materializes it
from the log before returning — the faithful equivalent of JE
Tree.getRootIN always calling root.fetchTarget(...). Returns
None only for a genuinely empty tree (no resident root and no
persisted root LSN).
Sourcepub fn get_database_id(&self) -> u64
pub fn get_database_id(&self) -> u64
Returns the database ID.
Sourcepub fn count_entries(&self) -> u64
pub fn count_entries(&self) -> u64
Count the total number of live (non-deleted) entries across all BINs.
Used by DatabaseImpl::set_recovered_tree() to initialise the
per-database entry_count AtomicU64 after recovery replays the log.
Sourcepub fn resort_under_comparator(&self)
pub fn resort_under_comparator(&self)
DBI-14: rebuild this tree so that its on-disk byte-ordered slot layout is re-sorted under the currently-configured key comparator.
Recovery redo (redo_insert) has no access to the application’s
comparator function — only the persisted identity — so it lays keys
out in unsigned-byte order. After set_recovered_tree attaches the
real comparator, the slots must be re-sorted, or comparator-driven
searches would binary-search a tree ordered by the wrong relation.
No-op when no comparator is configured (byte order already matches the recovered layout) or when the tree is empty. Mirrors the effect of JE reconstructing the comparator at open and the tree always having been built under it.
Sourcepub fn total_budgeted_memory(&self) -> u64
pub fn total_budgeted_memory(&self) -> u64
Sum the real in-memory heap footprint of every resident node in the tree (DBI-23 oracle / reconciliation), in bytes.
Walks all resident IN/BIN nodes and adds each node’s
budgeted_memory_size (JE IN.getBudgetedMemorySize). This is the
authoritative “real heap” figure the incrementally-maintained
memory_counter is meant to approximate; an engine can call it to
reconcile counter drift, and the DBI-23 test uses it as the oracle the
live counter must stay within tolerance of.
Sourcepub fn search(&self, key: &[u8]) -> Option<SearchResult>
pub fn search(&self, key: &[u8]) -> Option<SearchResult>
Search for a BIN that should contain the given key.
This is the core tree traversal operation. It walks from root to BIN using latch-coupling (acquire child latch, then release parent latch).
. Descends the tree until a BIN is reached, following the child pointer at the slot whose key is the largest key <= the search key (the “LTE” rule). Slot 0 in every upper IN carries a virtual key (-infinity) so any search key routes through it when all real keys are larger.
Returns a SearchResult indicating where the key is or should be. Returns None if tree is empty.
Sourcepub fn search_with_data(&self, key: &[u8]) -> Option<SlotFetch>
pub fn search_with_data(&self, key: &[u8]) -> Option<SlotFetch>
Combined search-and-fetch: descend once to the BIN and return the slot’s data together with a reference to the BIN arc.
Replaces the previous three-descent sequence on the Database::get
hot path:
Tree::search— existence check only.CursorImpl::get_data_from_tree— re-descended to fetch data.CursorImpl::find_bin_for_key— re-descended for BIN pinning.
One descent now does all three jobs. At the BIN level it uses the
existing binary-search helper find_entry_compressed instead of the
O(n) iter().find() used by get_data_from_tree.
Returns None only when the tree is empty. Otherwise returns
Some(SlotFetch) — callers must inspect SlotFetch::found to
determine whether the key was present. The BIN read-guard is released
before this method returns so callers may safely call lock_ln
(which may block) without holding any tree latch.
Wave-11-I — see the 2026 review.
Sourcepub fn update_key_expiration(&self, key: &[u8], expiration_hours: u32) -> bool
pub fn update_key_expiration(&self, key: &[u8], expiration_hours: u32) -> bool
Sets the expiration time (in absolute hours since Unix epoch) for an existing key’s BIN slot.
Returns true if the key was found and updated, false otherwise.
Used by Database::put_with_options() to apply per-record TTL.
IN.entryExpiration / BIN.expirationInHours path.
Sourcepub fn first_entry_at_or_after(
&self,
key: &[u8],
) -> Option<(Vec<u8>, Vec<u8>, u64)>
pub fn first_entry_at_or_after( &self, key: &[u8], ) -> Option<(Vec<u8>, Vec<u8>, u64)>
Returns the key and data of the first BIN entry at or after key.
Descends with the tree’s key comparator (same path as search()), then
within the BIN finds the first slot whose stored key >= key using the
comparator. Returns None if every entry in the tree is < key.
Used by sorted-duplicate cursor search(Set) to position at the first
(key, data) pair whose two-part key >= lower_bound(primary_key).
→ BIN scan path.
Sourcepub fn first_entry_at_or_after_with_index(
&self,
key: &[u8],
) -> Option<(Vec<u8>, Vec<u8>, usize, u64, Arc<NodeRwLock<TreeNode>>)>
pub fn first_entry_at_or_after_with_index( &self, key: &[u8], ) -> Option<(Vec<u8>, Vec<u8>, usize, u64, Arc<NodeRwLock<TreeNode>>)>
Like Tree::first_entry_at_or_after but also returns the BIN node
(so callers may pin it) and the entry’s slot index inside that
BIN.
Wave 11-N (Bug 2): CursorImpl::search_dup previously stored
current_index = 0 after a sorted-dup Search, which broke the
fast-path of retrieve_next (and the slow path’s
next_index = current_index + 1 arithmetic) for any primary
that was not the first slot of its BIN. This helper hands back
the real index so the cursor can be positioned correctly.
CC-2 fix: uses the same read_arc() hand-over-hand latch coupling
as every other descent method (search, first_entry_at_or_after,
get_first_node, get_adjacent_bin_attempt). The original
implementation did arc.read().is_bin() (lock acquired and released)
then a SECOND arc.read() on the next line — a gap in which a
concurrent split can promote the node (BIN→upper IN) or move the
sought key to a new sibling, yielding a false “not found” for an
existing key. Mirrors JE Tree.searchSubTree / Tree.search
which hold the latch across the is_bin() test and the subsequent
entry lookup.
Sourcepub fn insert(
&self,
key: Vec<u8>,
data: Vec<u8>,
lsn: Lsn,
) -> Result<bool, TreeError>
pub fn insert( &self, key: Vec<u8>, data: Vec<u8>, lsn: Lsn, ) -> Result<bool, TreeError>
Insert a key/data pair into the tree.
. Handles the root-is-null case by
creating a two-level tree (upper IN + BIN) per initialisation path,
then delegates to insert_recursive which performs preemptive splitting
as it descends.
Returns Ok(true) if this was a new insert, Ok(false) if it was an update.
Sourcepub fn redo_insert(
&self,
key: &[u8],
data: &[u8],
lsn: Lsn,
) -> Result<bool, TreeError>
pub fn redo_insert( &self, key: &[u8], data: &[u8], lsn: Lsn, ) -> Result<bool, TreeError>
Recovery-redo variant of Tree::insert that accepts &[u8] slices.
Eliminates the two intermediate Vec<u8> allocations that the normal
insert path requires at the redo_ln call site (one for the key, one
for the data). The compressed key suffix and the data bytes are each
materialised into their BinEntry slots exactly once.
Semantics are identical to insert:
- Updates the existing slot when the key is already present.
- Inserts a new sorted entry when the key is absent.
- Triggers the same root-split and proactive-split logic.
data should be the raw value bytes, or an empty slice for a
deletion (which should not normally arrive here during redo, but is
handled gracefully).
Wave 11-K optimisation (Fix 1).
Sourcepub fn reserve_redo_capacity(&self, n: usize)
pub fn reserve_redo_capacity(&self, n: usize)
Pre-warm the tree’s internal Vec<BinEntry> capacity before a redo
pass that will insert approximately n records.
If the tree is empty, this is a no-op (there is no BIN yet to reserve
capacity on). If the tree already has a root BIN (from a previous
checkpoint), reserves n.min(max_entries_per_node) additional slots
in that BIN’s entries vector, eliminating the resize-double cycle
during the redo loop.
Wave 11-K optimisation (Fix 3).
Sourcepub fn get_first_node(&self) -> Option<SearchResult>
pub fn get_first_node(&self) -> Option<SearchResult>
Get the first (leftmost) BIN in the tree.
Descends to the leftmost BIN by always following the first child slot at each upper IN level.
Sourcepub fn get_last_node(&self) -> Option<SearchResult>
pub fn get_last_node(&self) -> Option<SearchResult>
Get the last (rightmost) BIN in the tree.
Descends to the rightmost BIN by always following the last child slot at each upper IN level.
Sourcepub fn get_root_splits(&self) -> u64
pub fn get_root_splits(&self) -> u64
Returns the number of root splits that have occurred.
Sourcepub fn get_relatches_required(&self) -> u64
pub fn get_relatches_required(&self) -> u64
Returns the number of relatches required.
Sourcepub fn delete(&self, key: &[u8]) -> bool
pub fn delete(&self, key: &[u8]) -> bool
Delete a key from the tree.
Traverses the tree to find the BIN that should contain the key, then removes the entry. Returns true if the key was found and removed.
Delete path in Tree from the.
In-memory removal only — WAL logging for deletes is handled by the
cursor layer (cursor_impl.rs::log_ln_write) before this is called,
matching separation between LN logging and tree mutation.
Sourcepub fn compress(&self)
pub fn compress(&self)
Merge under-full sibling BIN pairs and remove empty subtrees.
INCompressor / Tree.compressInternal() logic.
merges two adjacent siblings when their combined entry count is
≤ max_entries_per_node (the merge threshold equal to the node
capacity). The left sibling’s entries are prepended into the right
sibling; the parent key slot pointing at the left sibling is then
removed from the parent IN with deleteEntry. If the parent IN
becomes empty after the removal the process repeats recursively up
the tree.
This implementation performs a single post-order walk so that each level is compressed after all its children have been compressed.
Sourcepub fn compress_bin(&self, bin_arc: &Arc<RwLock<TreeNode>>) -> bool
pub fn compress_bin(&self, bin_arc: &Arc<RwLock<TreeNode>>) -> bool
Compress deleted slots from a BIN node, then prune it from its parent IN when it becomes empty.
(the in-place slot-removal
path, NOT the sibling-merge path handled by compress()).
§Algorithm
- If the BIN is a delta, skip — deltas cannot be compressed.
- Remove all slots where
entry.known_deletedis true. This mirrorsbin.compress(!bin.shouldLogDelta(), localTracker). - If the BIN is now empty, remove it from its parent IN. This mirrors
pruneBIN(db, binRef, idKey)→tree.delete(idKey).
§Arguments
bin_arc— the BIN to compress (must be aTreeNode::Bottom).
§Returns
true if compression made progress (slots were removed or the BIN was
pruned), false if the BIN was skipped (delta, no cursors issue, etc.).
Sourcepub fn compress_bin_with_lock_check(
&self,
bin_arc: &Arc<RwLock<TreeNode>>,
is_locked: Option<&dyn Fn(u64) -> bool>,
) -> bool
pub fn compress_bin_with_lock_check( &self, bin_arc: &Arc<RwLock<TreeNode>>, is_locked: Option<&dyn Fn(u64) -> bool>, ) -> bool
Like compress_bin, but consults a caller-supplied
is_locked predicate before physically removing each known_deleted
slot. If is_locked(slot_lsn) returns true, the slot is SKIPPED
(left for a later compression pass after the locking txn resolves).
This is the faithful port of JE BIN.compress (BIN.java:1141-1172):
We have to be able to lock the LN before we can compress the entry. If we can’t, then skip over it. … it is more efficient to call
isLockUncontendedthan to actually lock the LN, since we would release the lock immediately.
if (lsn != DbLsn.NULL_LSN &&
!lockManager.isLockUncontended(lsn)) {
anyLocked = true;
continue;
}JE’s isLockUncontended(lsn) (LockManager.java:692) returns
nWaiters() == 0 && nOwners() == 0. Our is_locked(lsn) is its
inverse: the dbi layer supplies a closure over the LockManager that
returns true iff the slot’s LSN has any owner or waiter
(LockManager::get_lock_info(lsn) != (0, 0)). A NULL_LSN slot is
always discardable without locking (JE: “Can discard a NULL_LSN entry
without locking”), so we never invoke the predicate for it.
§Layering (noxu-tree -/-> noxu-txn)
The predicate is a &dyn Fn(u64) -> bool, NOT a LockManager
reference, so noxu-tree never depends on noxu-txn. The lock knowledge
lives entirely in the dbi-supplied closure.
§Lock ordering (no deadlock)
is_locked is invoked while this method holds the BIN write latch.
The dbi closure calls LockManager::get_lock_info, which takes a
lock-table shard mutex for a single, non-blocking critical section
and releases it before returning — it never waits and never re-enters
the tree. The LockManager has no edge back into a BIN latch (lock
acquisition descends the tree from the dbi/cursor layer, never the
reverse). The only ordering is therefore BIN-latch -> shard-mutex,
which is acyclic; no lock cycle exists, so the predicate cannot
deadlock against the latch.
When is_locked is None (recovery, BIN-delta replay, unit tests with
no lock manager) behavior is identical to the historical
compress_bin: every known_deleted slot is removed.
Sourcepub fn prune_empty_bin(&self, id_key: &[u8]) -> bool
pub fn prune_empty_bin(&self, id_key: &[u8]) -> bool
Re-descend to the leaf BIN that should contain id_key and remove its
parent-IN child slot ONLY IF the BIN is still safe to prune.
This is the faithful port of JE Tree.delete(idKey) /
Tree.searchDeletableSubTree (Tree.java ~line 755-800) as invoked by
INCompressor.pruneBIN (INCompressor.java ~line 502-510). JE takes the
branch-parent latch, re-descends to the specific empty BIN, and aborts
the prune (removing NOTHING) if any of the following changed since the
compressor observed the BIN as empty:
bin.getNEntries() != 0→NodeNotEmptyException(a concurrent insert repopulated the BIN — IC-1: we must NOT delete a live entry).bin.isBINDelta()→unexpectedState(deltas are never empty).bin.nCursors() > 0→CursorsExistException(a cursor is parked on the empty BIN; requeue rather than orphan the cursor).
The re-check and the slot removal both happen while holding the
parent IN write latch. Holding the parent write latch blocks every
descender (insert / delete take parent.read() hand-over-hand), so a
concurrent insert cannot reach the BIN between our re-check and the
slot removal — the TOCTOU window IC-1 describes is closed.
Returns true iff a parent-IN slot was removed, false otherwise
(BIN repopulated, has a cursor, is a delta, vanished, or is the root —
in every false case NOTHING is removed).
Sourcepub fn detach_node_by_id(&self, node_id: u64) -> u64
pub fn detach_node_by_id(&self, node_id: u64) -> u64
Detach the resident child node node_id from its parent IN, dropping
the strong Arc so the node is actually freed from memory, and return
the heap bytes reclaimed (0 if not found / not detachable).
This is the faithful port of JE IN.detachNode(idx, updateLsn, newLsn)
(IN.java ~4019) as called from Evictor.evict (Evictor.java ~3035):
evict measures target.getBudgetedMemorySize() and then
parent.detachNode(index, ...) does setTarget(idx, null) to drop the
child reference and getInMemoryINs().remove(child) to drop it from
the INList.
EV-13: before this method existed, the evictor credited
node_size_fn(node_id) bytes back to the budget and removed the node
from the LRU lists, but the parent’s InEntry.child still held a
strong Arc — so the node was never dropped from the heap. The budget
over-credited (claimed bytes freed that were not), cache_usage
drifted below reality, and the evictor under-fired. Detaching here
drops the Arc for real and credits exactly the measured size.
The detach happens under the parent IN write latch (JE detaches
under the parent’s latch), so no concurrent descender can re-cache the
child between measurement and detach. The slot (key + LSN) is kept —
only the in-memory child target is cleared — matching JE’s
setTarget(idx, null) which leaves the ChildReference LSN intact so
the node can be re-fetched from the log later.
Returns 0 if the node is not a resident child of any IN (e.g. it is
the root, already detached, or was pinned and could not be latched).
Sourcepub fn evict_root(&self, db_id: u64) -> Option<(u64, bool)>
pub fn evict_root(&self, db_id: u64) -> Option<(u64, bool)>
Evict the root IN of this tree (EV-14).
Faithful port of JE Evictor.evictRoot (Evictor.java:3050-3110) plus
the RootEvictor.doWork + Tree.withRootLatchedExclusive framing
(Evictor.java:2529-2576, Tree.java:508-517). Unlike a normal IN, the
root has no parent slot to detach from; instead the tree’s root
reference is the equivalent of the RootChildReference, so eviction:
- Latches the root reference exclusively (
rootLatch.acquireExclusiveviawithRootLatchedExclusive). - Re-checks that the root is still resident and still evictable
(no resident children, no pinned BIN — JE
RootEvictor.doWorkre-latches and re-checksrootIN == target && rootIN.isRoot()). - If the root is dirty, LOGS it first so the on-disk version is
current and updates
root_log_lsnto the new LSN (JEevictRoot:long newLsn = target.log(...); rootRef.setLsn(newLsn)). - Clears the in-memory root (
rootRef.clearTarget()— JE leaves theChildReferenceLSN intact; hereroot_log_lsnis that LSN) andnote_removeds it from the evictor LRU (JEinList.remove(target)).
On the next access fetch_root_from_log re-materializes the root from
root_log_lsn (JE Tree.getRootINRootAlreadyLatched →
root.fetchTarget).
§Conditions (eviction is REFUSED, returning None, when)
- there is no log manager wired (the root could never be re-fetched),
- the tree has no resident root (already evicted),
- the root has any resident child (JE only evicts a childless root —
the
hasCachedChildrenskip inprocessTarget; a root with cached children would orphan them, the EV-6 invariant), - the root is a BIN pinned by a cursor (
cursor_count > 0), - the root is dirty but we have no clean persisted version AND logging it fails, or
- the root is clean but
root_log_lsnis null (never logged — cannot be re-fetched; happens only for a brand-new unlogged tree).
Returns Some((freed_bytes, was_dirty)) on success, where freed_bytes
is the root’s measured heap footprint (JE
target.getBudgetedMemorySize()) and was_dirty reports whether the
root had to be logged (JE rootEvictor.flushed, which drives
nDirtyNodesEvicted and modifyDbRoot).
Sourcepub fn fetch_root_from_log(&self) -> Option<Arc<RwLock<TreeNode>>>
pub fn fetch_root_from_log(&self) -> Option<Arc<RwLock<TreeNode>>>
Re-materialize an evicted root IN from its persisted root_log_lsn
(EV-14, piece B).
Faithful to JE Tree.getRootINRootAlreadyLatched (Tree.java:477-516)
which calls root.fetchTarget(database, null) when the in-memory
target is null. Idempotent and cheap when the root is already
resident: returns the resident root without touching the log.
Returns None only when the tree is genuinely empty (no resident root
AND root_log_lsn is null) or when the re-fetch fails (no log manager,
log read error, deserialize failure) — callers then see an empty tree,
never wrong data.
Sourcepub fn maybe_compress_bin_and_parent(
&self,
bin_arc: &Arc<RwLock<TreeNode>>,
) -> bool
pub fn maybe_compress_bin_and_parent( &self, bin_arc: &Arc<RwLock<TreeNode>>, ) -> bool
Check whether a BIN node is a candidate for slot compression and,
if so, trigger compress_bin.
from (the opportunistic / lazy compression path).
§Algorithm
- Skip the BIN if it is a delta or has no defunct (known-deleted) slots.
- If compression succeeds and the BIN becomes empty, it is pruned.
§Returns
true if compression was triggered (regardless of whether any slots
were actually removed), false if the BIN does not need compression.
Sourcepub fn validate_parent_child(
parent: &Arc<RwLock<TreeNode>>,
child_index: usize,
child_arc: &Arc<RwLock<TreeNode>>,
) -> bool
pub fn validate_parent_child( parent: &Arc<RwLock<TreeNode>>, child_index: usize, child_arc: &Arc<RwLock<TreeNode>>, ) -> bool
Validate that parent.entries[child_index].child still points at
child_arc after acquiring the child’s latch.
Re-latch validation step inside the
Tree.searchSplitsAllowed: after a concurrent split the parent
slot that previously held the child may have changed. Callers that
plan to mutate the child must verify the parent-child link is still
intact before proceeding.
Returns true if the parent-child link is intact.
Sourcepub fn search_with_coupling(&self, key: &[u8]) -> Option<SearchResult>
pub fn search_with_coupling(&self, key: &[u8]) -> Option<SearchResult>
Search for the BIN that should contain key, with latch-coupling
validation at every level of descent.
.
The difference from search() is that after obtaining the child
arc we call validate_parent_child to confirm the parent still
holds the expected Arc. If the link has been broken (e.g. by a
concurrent split that relocated the child) the traversal restarts
from the root.
Returns a SearchResult if the key is (or should be) in the tree,
None if the tree is empty.
Same as Tree::search but exposes the hand-over-hand latch
coupling explicitly. Kept as a public, equivalent API for
callers (today only tests) that want to verify the
latch-coupling behaviour against search() itself.
Both search() and this method use the same read_arc()
hand-over-hand: take the child read guard before dropping
the parent guard, so a concurrent split_child(parent, ..)
(which takes parent.write()) cannot run between when we
captured the child Arc and when we entered the child. There
is no validate-and-restart loop because the coupling makes
the race unreachable.
Sourcepub fn pin_bin(bin_arc: &Arc<RwLock<TreeNode>>)
pub fn pin_bin(bin_arc: &Arc<RwLock<TreeNode>>)
Increments the cursor-pin count on a BIN node.
Called by CursorImpl when it positions on (or enters) a BIN.
The evictor will not select a BIN with cursor_count > 0 for eviction
(RealNodeInfo.pin_count), matching BIN.incrementCursorCount().
Sourcepub fn unpin_bin(bin_arc: &Arc<RwLock<TreeNode>>)
pub fn unpin_bin(bin_arc: &Arc<RwLock<TreeNode>>)
Decrements the cursor-pin count on a BIN node.
Called by CursorImpl when it moves away from or closes on a BIN.
Uses saturating_sub to guard against an accidental double-unpin.
Matching BIN.decrementCursorCount().
Sourcepub fn bin_is_delta(bin: &BinStub) -> bool
pub fn bin_is_delta(bin: &BinStub) -> bool
Returns true if the given BinStub is a BIN-delta (not a full BIN).
IN.isBINDelta().
Sourcepub fn apply_delta_to_bin(
bin: &mut BinStub,
delta_entries: Vec<(Vec<u8>, Lsn, Option<Vec<u8>>)>,
)
pub fn apply_delta_to_bin( bin: &mut BinStub, delta_entries: Vec<(Vec<u8>, Lsn, Option<Vec<u8>>)>, )
Merge delta entries into a full BIN’s entry list.
- For each delta entry: if a matching key already exists in
bin, replace it (delta is authoritative). - Otherwise insert the delta entry in sorted position.
Delta entries carry full keys (prefix already prepended by the caller). After applying all delta entries the BIN’s prefix is recomputed so the final state is consistent.
All delta entries are considered to be the most-recently-dirtied state, exactly as in where delta slots supersede full-BIN slots.
Sourcepub fn mutate_to_full_bin(delta: &mut BinStub, base: BinStub)
pub fn mutate_to_full_bin(delta: &mut BinStub, base: BinStub)
Reconstitute a BIN-delta into a full BIN.
from the:
- Extract the delta entries from
self(this BIN-delta), decompressing them to full keys. - Apply them onto
base(the previously logged full BIN) viaapply_delta_to_bin. - Copy
base’s merged entries and prefix back intoself. - Clear the
is_deltaflag so subsequent code treatsselfas a full BIN.
After this call self is a full BIN; base should be discarded.
Sourcepub fn mutate_to_full_bin_from_log(
delta: &mut BinStub,
log_manager: &LogManager,
)
pub fn mutate_to_full_bin_from_log( delta: &mut BinStub, log_manager: &LogManager, )
Reconstitute a BIN-delta into a full BIN by reading the base from log.
— the
single-argument overload that calls fetchFullBIN(databaseImpl) to
read the last full BIN from the log manager automatically.
Algorithm:
- If
delta.last_full_lsn == NULL_LSN, the BIN was never written as a full entry; there is no base to merge so the delta IS the full BIN. Clearis_deltaand return. - Read the full-BIN log entry at
delta.last_full_lsnusinglog_manager.read_entry(lsn). - Deserialize the payload with
BinStub::deserialize_full(). - Delegate to
Self::mutate_to_full_bin(delta, base)to merge and replacedelta’s contents.
On any read / parse failure the function falls back to clearing the
is_delta flag without merging, so the caller always gets a non-delta
BIN (possibly missing some old slots). This mirrors the
EnvironmentFailureException path but gracefully degrades instead of
panicking.
BIN.fetchFullBIN(dbImpl) + BIN.mutateToFullBIN(boolean).
Sourcepub fn get_next_bin(
&self,
current_key: &[u8],
) -> Option<Vec<(BinEntry, Lsn, Vec<u8>)>>
pub fn get_next_bin( &self, current_key: &[u8], ) -> Option<Vec<(BinEntry, Lsn, Vec<u8>)>>
Return the entries of the BIN immediately to the right of the BIN
that contains (or would contain) current_key.
→ Tree.getNextIN(forward=true).
§Algorithm
- Build a root-to-BIN path for
current_key. - Walk the path back up looking for a parent that has a slot to the right of the slot we descended through.
- When found, descend to the leftmost BIN of that sibling subtree.
- If no such parent exists, return
None(no next BIN).
Source§impl Tree
impl Tree
Sourcepub fn collect_stats(&self) -> TreeStats
pub fn collect_stats(&self) -> TreeStats
Walks the entire tree and collects structural statistics.
TreeWalkerStatsAccumulator pattern — performs a simple
recursive DFS and counts INs, BINs, entries, and tree height.
Sourcepub fn collect_dirty_bins(
&self,
db_id: u64,
) -> Vec<(u64, Arc<RwLock<TreeNode>>)>
pub fn collect_dirty_bins( &self, db_id: u64, ) -> Vec<(u64, Arc<RwLock<TreeNode>>)>
Collects all dirty BINs as (Arc to node, db_id) pairs.
The checkpoint path calls this to enumerate BINs that need to be
logged. For each dirty BIN the checkpoint decides — based on the
BIN-delta threshold — whether to write a full BIN entry or a
BINDelta entry.
Checkpointer.processINList() which iterates the dirty
IN list accumulated during normal operation.
Sourcepub fn collect_bins_with_known_deleted(&self) -> Vec<Arc<RwLock<TreeNode>>>
pub fn collect_bins_with_known_deleted(&self) -> Vec<Arc<RwLock<TreeNode>>>
Collect all BINs that have at least one known_deleted slot.
INCompressor queue-drain scan in the: the daemon iterates
the in-memory IN list and identifies BINs that still hold zombie deleted
slots. Each returned Arc can be passed directly to compress_bin().
Sourcepub fn serialize_upper_in(&self, node_id: u64) -> Option<Vec<u8>>
pub fn serialize_upper_in(&self, node_id: u64) -> Option<Vec<u8>>
Collect all dirty upper (non-BIN) internal nodes, sorted ascending by level (bottom-up order, BIN level excluded).
Serialise an upper-IN node (level > 1) by node_id for off-heap storage.
Traverses the tree to find the internal node whose matches, then calls to produce a compact byte representation. Returns if the node is not found or is a BIN (BINs are not upper INs).
Mirrors OffHeapAllocator serialises the same bytes that would be written
to the log, allowing the evictor to store upper-INs off-heap and avoid
log-file reads on the next traversal.
Sourcepub fn collect_dirty_upper_ins(
&self,
_db_id: u64,
) -> Vec<(i32, Arc<RwLock<TreeNode>>)>
pub fn collect_dirty_upper_ins( &self, _db_id: u64, ) -> Vec<(i32, Arc<RwLock<TreeNode>>)>
Upper-IN traversal in Checkpointer.processINList() from
— visits all TreeNode::Internal nodes whose dirty flag is set
and returns them together with their level, sorted lowest-level-first
so the checkpointer can log them bottom-up. The root is always the
last entry (highest level), which must be logged Provisional::No.
Sourcepub fn is_root_resident(&self) -> bool
pub fn is_root_resident(&self) -> bool
Returns true if the root node is currently loaded in memory.
.
Sourcepub fn get_resident_root_in(&self) -> Option<Arc<RwLock<TreeNode>>>
pub fn get_resident_root_in(&self) -> Option<Arc<RwLock<TreeNode>>>
Returns the root node Arc if present, or None.
.
Sourcepub fn get_parent_bin_for_child_ln(
&self,
key: &[u8],
) -> Option<Arc<RwLock<TreeNode>>>
pub fn get_parent_bin_for_child_ln( &self, key: &[u8], ) -> Option<Arc<RwLock<TreeNode>>>
Returns the BIN that should contain a slot for key (the “parent” of
LN slots).
. Descends the tree
exactly like search() and returns the leaf-level BIN arc, or None
if the tree is empty.
Uses read_arc() hand-over-hand on the descent — the child
guard is taken before the parent guard is dropped, matching
search(). Returns the BIN Arc with no read lock held; the
caller must take whatever lock it needs to operate on the
returned BIN.
Sourcepub fn find_bin_for_insert(&self, key: &[u8]) -> Option<Arc<RwLock<TreeNode>>>
pub fn find_bin_for_insert(&self, key: &[u8]) -> Option<Arc<RwLock<TreeNode>>>
Returns the BIN where key should be inserted.
. Semantically identical to
get_parent_bin_for_child_ln — expressed as a separate method to match
API surface.
Implemented as a delegation to get_parent_bin_for_child_ln,
which uses read_arc() hand-over-hand on the descent.
Sourcepub fn search_splits_allowed(&self, key: &[u8]) -> Option<SearchResult>
pub fn search_splits_allowed(&self, key: &[u8]) -> Option<SearchResult>
Search for a BIN, allowing splits during descent (preemptive splitting).
. This thin wrapper
delegates to search() and returns the result wrapped in Some.
The full split-allowed descent is performed by insert() internally;
this method exposes the same result type for callers that only need to
locate the BIN.
Returns None if the tree is empty.
Sourcepub fn rebuild_in_list(&self) -> Vec<Arc<RwLock<TreeNode>>>
pub fn rebuild_in_list(&self) -> Vec<Arc<RwLock<TreeNode>>>
Traverses the entire tree and returns every IN and BIN node as a flat list.
. Used by recovery to rebuild
the in-memory IN list after log replay. The walk is a BFS from the
root; every Arc<RwLock<TreeNode>> encountered (both Internal and
Bottom variants) is included in the result.
Sourcepub fn validate_in_list(&self) -> bool
pub fn validate_in_list(&self) -> bool
Validates internal tree consistency.
. Primarily a debug/test tool.
Rules checked:
- An empty tree (no root) is trivially valid → returns
true. - A non-empty tree must have a non-null root.
- Every Internal node must have at least one entry.
- Every child pointer that is
Somemust be readable (lock must be acquirable — i.e., no poisoned locks).
Returns true if no inconsistencies are detected, false otherwise.
Sourcepub fn get_parent_in_for_child_in(
&self,
child_node_id: u64,
) -> Option<(Arc<RwLock<TreeNode>>, usize)>
pub fn get_parent_in_for_child_in( &self, child_node_id: u64, ) -> Option<(Arc<RwLock<TreeNode>>, usize)>
Traverses the tree to find the parent IN that contains child_node_id
as one of its child slots.
. Used by the cleaner migration path to re-insert migrated INs after eviction/fetch.
Returns (parent_arc, slot_index) where slot_index is the position
in the parent’s entries vector whose child matches child_node_id,
or None if no such parent is found.
Sourcepub fn reconstitute_bin_delta(
base_bytes: &[u8],
delta_bytes: &[u8],
) -> Option<BinStub>
pub fn reconstitute_bin_delta( base_bytes: &[u8], delta_bytes: &[u8], ) -> Option<BinStub>
Propagates the dirty flag upward from node_arc to the root.
Implicit dirty propagation: after modifying any node, all ancestors on the path to the root must also be marked dirty so the checkpointer logs them.
In this happens through IN.setDirty(true) calls at each level
during split/insert callbacks. Here we walk the weak parent chain.
Reconstitute a BIN-delta by merging it onto a base full BIN.
Implements JE BINDelta.reconstituteBIN(databaseImpl) for the recovery
path where the log manager is not available as a LogManager but as
raw serialized bytes.
Algorithm:
- Deserialise
base_bytesas a fullBinStub. - Apply
delta_bytesslots onto the base usingBinStub::apply_delta(raw slot overlay). - Recompute key prefix so prefix-compressed entries are consistent.
Returns None if either byte slice is malformed.
JE BINDelta.reconstituteBIN / BINDelta.applyDelta
(DRIFT-10 / Stage 3).
pub fn propagate_dirty_to_root(node_arc: &Arc<RwLock<TreeNode>>)
Sourcepub fn deserialize_upper_in(bytes: &[u8]) -> Option<InNodeStub>
pub fn deserialize_upper_in(bytes: &[u8]) -> Option<InNodeStub>
Deserialise an upper-IN node from bytes produced by
TreeNode::write_to_bytes() / flush_one_tree_upper_ins.
Format: node_id(u64BE) | level(i32BE) | n_entries(u32BE) | dirty(u8) | per-entry: key_len(u16BE) | key | lsn(u64BE)
JE INFileReader.getIN(db) / IN.readFromLog.
Sourcepub fn deserialize_bin(bytes: &[u8]) -> Option<BinStub>
pub fn deserialize_bin(bytes: &[u8]) -> Option<BinStub>
Deserialise a BIN from bytes produced by BinStub::serialize_full().
Thin wrapper so the recovery path does not need to import BinStub
directly from callers that only have the raw bytes.
JE INFileReader.getIN(db) for a BIN entry.
Sourcepub fn recover_in_redo(
&self,
log_lsn: Lsn,
is_root: bool,
is_bin: bool,
node_data: &[u8],
) -> InRedoResult
pub fn recover_in_redo( &self, log_lsn: Lsn, is_root: bool, is_bin: bool, node_data: &[u8], ) -> InRedoResult
Apply a logged IN/BIN to the in-memory tree during the recovery redo pass.
Implements JE RecoveryManager.recoverIN:
is_rootnodes are handled byrecover_root_in.- non-root nodes are handled by
recover_child_in.
log_lsn is the LSN at which this IN/BIN was logged. The currency
check in recover_child_in uses this to decide whether to replace the
in-memory slot (tree slot LSN < log_lsn → replace; equal → noop;
greater → skip).
JE RecoveryManager.recoverIN / replayOneIN
(RecoveryManager.java ~lines 1200–1280).