Type Alias RecoveryHandler 

pub type RecoveryHandler = Box<dyn FnMut() -> Vec<u8> + Send>;

Recovery hook for replication / fork / standby groups whose slots got marked unhealthy after a placement failure.

The on_node_failure* paths in all three group types call mark_unhealthy on the affected slot BEFORE attempting placement, so traffic stops routing to a dead node immediately. If placement then fails (no healthy candidate at that moment), the slot stays unhealthy with the dead node’s origin_hash in the registry. The group’s per-node on_node_recovery only re-marks the slot healthy when the recovered node id matches the FAILED node id. When a DIFFERENT spare node recovers later and could host the slot, placement is never retried, and nothing reports the gap.
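The gap described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Slot` struct, its fields, and the method signatures are hypothetical simplifications and not the real group types; `node_id` stands in for the origin_hash kept in the registry.

```rust
/// Hypothetical, simplified slot state (illustrative names only).
#[derive(Debug, PartialEq)]
pub struct Slot {
    pub healthy: bool,
    /// Id of the node recorded for this slot (stand-in for origin_hash).
    pub node_id: u64,
}

impl Slot {
    /// Marks the slot unhealthy first, then tries to place it on a candidate.
    pub fn on_node_failure(&mut self, failed: u64, candidates: &[u64]) {
        if self.node_id != failed {
            return;
        }
        // Stop routing traffic before any placement attempt.
        self.healthy = false;
        if let Some(&next) = candidates.first() {
            self.node_id = next;
            self.healthy = true;
        }
        // With no candidate, the slot stays unhealthy and keeps the dead id.
    }

    /// Re-marks the slot healthy only when the recovered node is the failed one.
    pub fn on_node_recovery(&mut self, recovered: u64) {
        if recovered == self.node_id {
            self.healthy = true;
        }
    }
}

/// Node 1 fails with no candidates, then a different spare (node 2) arrives.
pub fn stuck_slot_demo() -> Slot {
    let mut slot = Slot { healthy: true, node_id: 1 };
    slot.on_node_failure(1, &[]);
    slot.on_node_recovery(2);
    slot
}
```

In this model the slot returned by `stuck_slot_demo` is still unhealthy and still records node 1, which is the situation the recovery hook is meant to address.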

Groups that opt in implement this trait and register themselves with the meshos runtime’s recovery registry; the loop’s reconcile tick walks every registered group, checks has_unhealthy_slots, and calls try_recover with the live scheduler. The cap per tick lets a pathological “every slot unhealthy” state make progress without wedging the loop.

Type-erased per-tick recovery handler. It holds a closure that captures everything try_recover needs (the group itself, a scheduler clone, a daemon-registry clone, and the daemon-factory closure) and returns the slot indices the closure successfully recovered this tick.
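The reconcile walk described above might look roughly like the following. This is a minimal sketch, not the meshos implementation: the `Recoverable` trait name, the `ToyGroup` type, and the `reconcile_tick` function are hypothetical, `try_recover` here takes only a cap instead of the scheduler and registry arguments, and placing the cap in `try_recover` is an assumption.

```rust
/// Hypothetical stand-in for the opt-in trait mentioned in the text.
pub trait Recoverable {
    fn has_unhealthy_slots(&self) -> bool;
    /// Recovers at most `cap` slots and returns the recovered slot indices.
    fn try_recover(&mut self, cap: usize) -> Vec<u8>;
}

/// Toy group that only tracks which slot indices are unhealthy.
pub struct ToyGroup {
    unhealthy: Vec<u8>,
}

impl ToyGroup {
    pub fn new(unhealthy: Vec<u8>) -> Self {
        ToyGroup { unhealthy }
    }
}

impl Recoverable for ToyGroup {
    fn has_unhealthy_slots(&self) -> bool {
        !self.unhealthy.is_empty()
    }

    fn try_recover(&mut self, cap: usize) -> Vec<u8> {
        // Bounded work per tick: take at most `cap` slots from the front.
        let n = cap.min(self.unhealthy.len());
        self.unhealthy.drain(..n).collect()
    }
}

/// One reconcile tick: visit every group, skip the healthy ones,
/// and collect the slot indices recovered during this tick.
pub fn reconcile_tick(groups: &mut [Box<dyn Recoverable>], cap: usize) -> Vec<u8> {
    let mut recovered = Vec::new();
    for group in groups.iter_mut() {
        if group.has_unhealthy_slots() {
            recovered.extend(group.try_recover(cap));
        }
    }
    recovered
}
```

With a cap of 2, a group holding three unhealthy slots needs two ticks to drain, so a single tick never does unbounded work even when every slot is unhealthy.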

The alias is a Box<dyn FnMut() -> Vec<u8> + Send> rather than the trait directly so that the registry can store heterogeneous group types (StandbyGroup, ForkGroup, ReplicaGroup, …) in one collection. Each caller constructs a handler like:

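// Share the group between its owner and the handler closure.
let group = Arc::new(parking_lot::Mutex::new(my_standby_group));
// Clone the handles so the closure owns everything it captures.
let scheduler = scheduler.clone();
let registry = daemon_registry.clone();
runtime.recovery_registry().register(Box::new(move || {
    // Lock the group for the duration of this tick's recovery attempt.
    group.lock().try_recover(
        &scheduler,
        &registry,
        // Daemon factory used for any slot that gets re-placed.
        &|| Box::new(MyDaemon::new()),
    )
}));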
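To show why the boxed closure allows heterogeneous groups in one collection, here is a self-contained sketch using only the standard library. The alias matches the declaration on this page; everything else is an assumption made for illustration: `RecoveryRegistry`, `run_tick`, `ToyStandbyGroup`, `ToyForkGroup`, and `build_registry` are hypothetical, the toy `try_recover` methods take no arguments, and `std::sync::Mutex` replaces `parking_lot::Mutex`.

```rust
use std::sync::{Arc, Mutex};

/// Same shape as the alias documented on this page.
pub type RecoveryHandler = Box<dyn FnMut() -> Vec<u8> + Send>;

/// Hypothetical registry: one Vec holds handlers for any group type.
#[derive(Default)]
pub struct RecoveryRegistry {
    handlers: Vec<RecoveryHandler>,
}

impl RecoveryRegistry {
    pub fn register(&mut self, handler: RecoveryHandler) {
        self.handlers.push(handler);
    }

    /// Calls every handler once and gathers the recovered slot indices.
    pub fn run_tick(&mut self) -> Vec<u8> {
        let mut recovered = Vec::new();
        for handler in self.handlers.iter_mut() {
            recovered.extend(handler());
        }
        recovered
    }
}

/// Two unrelated toy group types with different recovery behaviour.
pub struct ToyStandbyGroup {
    pending: Vec<u8>,
}

pub struct ToyForkGroup {
    pending: Vec<u8>,
}

impl ToyStandbyGroup {
    /// Recovers every pending slot in one call.
    fn try_recover(&mut self) -> Vec<u8> {
        std::mem::take(&mut self.pending)
    }
}

impl ToyForkGroup {
    /// Recovers a single pending slot per call.
    fn try_recover(&mut self) -> Vec<u8> {
        self.pending.pop().into_iter().collect()
    }
}

/// Registers one handler per group; each closure owns its Arc clone.
pub fn build_registry() -> RecoveryRegistry {
    let standby = Arc::new(Mutex::new(ToyStandbyGroup { pending: vec![3, 4] }));
    let fork = Arc::new(Mutex::new(ToyForkGroup { pending: vec![7, 9] }));
    let mut registry = RecoveryRegistry::default();
    registry.register(Box::new(move || standby.lock().unwrap().try_recover()));
    registry.register(Box::new(move || fork.lock().unwrap().try_recover()));
    registry
}
```

The registry never names `ToyStandbyGroup` or `ToyForkGroup`; it only sees closures with the same signature, which is the property the alias relies on. The `Send` bound would let such a registry be moved to whichever thread runs the loop.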

Aliased Type

pub struct RecoveryHandler(/* private fields */);