Module sweeper

Bounded Blob Cache sweeper — admin maintenance for L1 expirations and L2 orphan-chain reclamation.

§Issue #148 — Blob Cache admin maintenance

This module provides the bounded sweeper primitives that admin endpoints, the runtime maintenance scheduler, and the backup hook will call into. The actual HTTP wiring, runtime schedule, and backup integration are tracked as follow-up orchestrator-batch edits (see “FLAGGED HOOKUPS” at the bottom of this file).

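§Public surface

BlobCacheSweeper exposes three operations: sweep_expired (returns a SweepReport), reclaim_orphans (returns an OrphanReport), and flush_namespace (returns a NamespaceFlushReport).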

§Bounding

All three operations are bounded by SweepLimit:

  • Entries(N) — hard cap on the number of entries scanned.
  • Millis(N) — hard cap on wall-clock time. Checked at every iteration so the cap is honored within a few microseconds of overrun.
  • Either { entries, millis } — first cap to fire wins.

When a sweep terminates because it hit a limit (instead of running to completion) the report’s truncated_due_to_limit flag is set so admin callers can decide whether to schedule a follow-up sweep.
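The bounding rules above can be sketched as a small budget tracker. This is an illustrative sketch only, not the module's implementation: the variant payload types (usize, u64), the SweepBudget helper, and the bounded_scan function are assumptions introduced here to show how "first cap to fire wins" and the truncated_due_to_limit flag might fit together.

```rust
use std::time::{Duration, Instant};

/// Hypothetical mirror of the module's `SweepLimit`; payload types are assumed.
#[derive(Clone, Copy, Debug)]
pub enum SweepLimit {
    Entries(usize),
    Millis(u64),
    Either { entries: usize, millis: u64 },
}

/// Hypothetical helper that tracks how much of the limit a sweep has used.
pub struct SweepBudget {
    max_entries: Option<usize>,
    deadline: Option<Instant>,
    scanned: usize,
}

impl SweepBudget {
    pub fn start(limit: SweepLimit) -> Self {
        let now = Instant::now();
        let (max_entries, deadline) = match limit {
            SweepLimit::Entries(n) => (Some(n), None),
            SweepLimit::Millis(ms) => (None, Some(now + Duration::from_millis(ms))),
            SweepLimit::Either { entries, millis } => {
                (Some(entries), Some(now + Duration::from_millis(millis)))
            }
        };
        Self { max_entries, deadline, scanned: 0 }
    }

    /// Returns true (and counts the entry) if one more entry may be scanned.
    /// Both caps are checked on every call, so whichever fires first wins.
    pub fn try_take(&mut self) -> bool {
        if let Some(max) = self.max_entries {
            if self.scanned >= max {
                return false;
            }
        }
        if let Some(deadline) = self.deadline {
            if Instant::now() >= deadline {
                return false;
            }
        }
        self.scanned += 1;
        true
    }
}

/// Walks `items` under `limit`; returns (entries scanned, truncated_due_to_limit).
pub fn bounded_scan(items: &[u64], limit: SweepLimit) -> (usize, bool) {
    let mut budget = SweepBudget::start(limit);
    for _item in items {
        if !budget.try_take() {
            // Stopped with work remaining: report the truncation.
            return (budget.scanned, true);
        }
    }
    (budget.scanned, false)
}
```

In this sketch a sweep that consumes exactly its entry cap while also exhausting the input is reported as complete, not truncated, because the flag is only set when an entry had to be skipped.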

§Concurrency contract

All three operations are safe to call while concurrent readers (BlobCache::get, BlobCache::exists) and writers (BlobCache::put) are in flight:

  • The sweeper only uses BlobCache’s public, &self API (invalidate_key, invalidate_namespace, stats). Those methods take shard-level locks for the briefest possible critical sections; readers touching other shards are never blocked.
  • flush_namespace only bumps a generation counter under a brief write-lock. Concurrent reads against the same namespace either see the old generation (returning a hit if the entry is still alive) or the new generation (treating any cached entry as stale). Either is correct.
  • sweep_expired and reclaim_orphans cooperate with normal traffic by bounding their per-call work and yielding back to the caller. They never hold a global lock across the entire sweep.

The concurrent_reads_never_block_during_sweep property test below verifies the contract empirically. It runs 8 reader threads alongside 1 sweeper thread and requires the readers to complete within a tight time budget.
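The generation-counter behavior described for flush_namespace can be illustrated with a toy cache. This is a minimal sketch under stated assumptions: GenCache and its two-map layout are hypothetical stand-ins, not the real BlobCache (which the text says is sharded), and only the method names put, get, and flush_namespace echo the source.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the cache; not the real `BlobCache` layout.
pub struct GenCache {
    /// Current generation per namespace (a missing namespace means generation 0).
    generations: RwLock<HashMap<String, u64>>,
    /// (namespace, key) -> (generation at write time, value).
    entries: RwLock<HashMap<(String, String), (u64, Vec<u8>)>>,
}

impl GenCache {
    pub fn new() -> Self {
        Self {
            generations: RwLock::new(HashMap::new()),
            entries: RwLock::new(HashMap::new()),
        }
    }

    fn current_generation(&self, namespace: &str) -> u64 {
        self.generations.read().unwrap().get(namespace).copied().unwrap_or(0)
    }

    /// Stamps the entry with the namespace generation seen at write time.
    pub fn put(&self, namespace: &str, key: &str, value: Vec<u8>) {
        let generation = self.current_generation(namespace);
        self.entries
            .write()
            .unwrap()
            .insert((namespace.to_string(), key.to_string()), (generation, value));
    }

    /// An entry is a hit only if its stamp matches the current generation.
    pub fn get(&self, namespace: &str, key: &str) -> Option<Vec<u8>> {
        let generation = self.current_generation(namespace);
        let entries = self.entries.read().unwrap();
        match entries.get(&(namespace.to_string(), key.to_string())) {
            Some((stamp, value)) if *stamp == generation => Some(value.clone()),
            _ => None,
        }
    }

    /// Bumps the counter under a brief write lock; no entry is touched.
    pub fn flush_namespace(&self, namespace: &str) -> u64 {
        let mut generations = self.generations.write().unwrap();
        let generation = generations.entry(namespace.to_string()).or_insert(0);
        *generation += 1;
        *generation
    }
}
```

The flush cost in this sketch is independent of how many entries the namespace holds: stale entries simply stop matching and can be reclaimed later by a bounded sweep.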

§FLAGGED HOOKUPS (orchestrator-batch — not landed by this file)

Marked // FLAG: throughout; collected here for the orchestrator:

  1. mod.rs registration: a pub mod sweeper; line in crates/reddb-server/src/storage/cache/mod.rs, plus a pub use sweeper::{BlobCacheSweeper, SweepLimit, SweepReport, OrphanReport, NamespaceFlushReport, NamespaceSweepStats}; re-export so callers can reach the types without the long path.

  2. BlobCache accessor extensions: to walk L1 entries and L2 records, the sweeper needs read-only iterators on BlobCache. Neither surface exists today, so sweep_expired and reclaim_orphans are bounded scaffolding that report zero work until those accessors land. Required additions (in cache/blob/cache.rs):

    pub fn for_each_l1_entry<F>(&self, visit: F)
    where F: FnMut(&str /*namespace*/, &str /*key*/, L1EntryView<'_>);
    
    pub fn for_each_l2_record<F>(&self, visit: F)
    where F: FnMut(L2RecordView<'_>);
    
    pub fn l2_orphan_chains(&self) -> impl Iterator<Item = u32 /*root_page*/>;

    The L1EntryView projection should expose expires_at_unix_ms, namespace_generation, and size. The L2RecordView should expose namespace, key, root_page, byte_len. With those, the bodies of sweep_expired and reclaim_orphans become straightforward (sketches inline below).

  3. Backup integration: runtime/backup.rs (or the equivalent backup-orchestrator module) needs an include_blob_cache: bool flag and a matching dump/restore round-trip for the L2 metadata B+ tree and blob chains. The sweeper plays no part in backup itself, but the spec in docs/adr/0006-tiered-blob-cache.md ties them together: a backup triggered while a sweep is in flight must observe a consistent L2 snapshot.

  4. Admin HTTP handler: POST /admin/blob_cache/sweep and POST /admin/blob_cache/flush_namespace endpoints (likely under crates/reddb-server/src/http/admin/), each parsing a JSON body that carries a SweepLimit or a namespace name and returning the report struct as JSON. Both stay flagged for follow-up per the issue.

  5. Runtime config knob: a default SweepLimit for background-scheduled sweeps plus a sweep_on_startup: bool option in the server config struct. The runtime scheduler then calls BlobCacheSweeper::sweep_expired periodically.
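To make item 2 concrete, here is one way the body of sweep_expired might look once a for_each_l1_entry accessor exists. Everything below is a sketch against a mock: MockCache, MockEntry, the accessor-method form of L1EntryView, the Option<u64> expiry type, and collect_expired are all assumptions made for illustration. The real sweeper would presumably pass each returned pair to BlobCache::invalidate_key.

```rust
/// Hypothetical stored entry; the real L1 layout is not shown in the source.
pub struct MockEntry {
    pub namespace: String,
    pub key: String,
    pub expires_at_unix_ms: Option<u64>, // None = never expires (assumed)
    pub size: usize,
}

/// Assumed read-only projection borrowed from the cache during a visit.
pub struct L1EntryView<'a> {
    entry: &'a MockEntry,
}

impl L1EntryView<'_> {
    pub fn expires_at_unix_ms(&self) -> Option<u64> {
        self.entry.expires_at_unix_ms
    }
    pub fn size(&self) -> usize {
        self.entry.size
    }
}

/// Mock with the accessor shape proposed in item 2.
pub struct MockCache {
    pub entries: Vec<MockEntry>,
}

impl MockCache {
    pub fn for_each_l1_entry<F>(&self, mut visit: F)
    where
        F: FnMut(&str, &str, L1EntryView<'_>),
    {
        for entry in &self.entries {
            visit(&entry.namespace, &entry.key, L1EntryView { entry });
        }
    }
}

/// Collects expired (namespace, key) pairs, scanning at most `max_entries`.
/// Returns the pairs plus a flag that is true if the cap cut the scan short.
pub fn collect_expired(
    cache: &MockCache,
    now_unix_ms: u64,
    max_entries: usize,
) -> (Vec<(String, String)>, bool) {
    let mut scanned = 0usize;
    let mut truncated = false;
    let mut expired = Vec::new();
    cache.for_each_l1_entry(|namespace, key, view| {
        // The callback-style accessor cannot stop early, so once the cap
        // is reached the visitor only records the truncation and returns.
        if scanned >= max_entries {
            truncated = true;
            return;
        }
        scanned += 1;
        if matches!(view.expires_at_unix_ms(), Some(t) if t <= now_unix_ms) {
            expired.push((namespace.to_string(), key.to_string()));
        }
    });
    (expired, truncated)
}
```

One design point this sketch surfaces: a visitor that returns nothing cannot stop the walk early, so an accessor whose callback can signal "stop" (for example by returning std::ops::ControlFlow) may suit a bounded sweep better. That is a suggestion, not something the source specifies.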

Structs§

BlobCacheSweeper
Stateless namespace for sweeper operations against a BlobCache.
NamespaceFlushReport
Outcome of BlobCacheSweeper::flush_namespace.
NamespaceSweepStats
Per-namespace breakdown of a SweepReport.
OrphanReport
Outcome of BlobCacheSweeper::reclaim_orphans.
SweepReport
Outcome of BlobCacheSweeper::sweep_expired.

Enums§

SweepLimit
Bound for a single sweeper invocation.