Module loop_metrics

Standardized control-loop metrics.

Every periodic loop in the cluster (Principle 2.4) exposes the same four observations:

{loop_name}_iterations_total — counter, incremented at the end of every tick (success or failure).
{loop_name}_last_iteration_duration_seconds — gauge, wall-time of the most recent tick.
{loop_name}_errors_total{kind} — counter keyed by error kind.
{loop_name}_up — gauge (0/1), set by the loop’s lifecycle owner when the driver task spawns/exits.

Loop-specific gauges (raft_tick_loop_pending_groups, health_loop_suspect_peers{peer_id}, etc.) are rendered by the Prometheus route directly from the owning subsystem — they are not part of this primitive because their sources are not uniform.
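As an illustration of the naming scheme above, the following sketch renders the four shared observations for one loop in the Prometheus text exposition format. `LoopSnapshot` and `render_loop` are hypothetical names introduced for this example; they are not part of this module, and the real render path may differ.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Point-in-time values of the four shared observations for one loop.
/// Hypothetical type, used only for this illustration.
pub struct LoopSnapshot {
    pub iterations_total: u64,
    pub last_iteration_duration_seconds: f64,
    /// Error counts keyed by the `kind` label.
    pub errors_total: BTreeMap<String, u64>,
    pub up: bool,
}

/// Render one loop's snapshot as Prometheus text exposition lines.
/// Assumes `kind` labels are plain identifiers that need no escaping.
pub fn render_loop(loop_name: &str, s: &LoopSnapshot) -> String {
    let mut out = String::new();
    // Writing into a String cannot fail, so the results are ignored.
    let _ = writeln!(out, "# TYPE {loop_name}_iterations_total counter");
    let _ = writeln!(out, "{loop_name}_iterations_total {}", s.iterations_total);
    let _ = writeln!(out, "# TYPE {loop_name}_last_iteration_duration_seconds gauge");
    let _ = writeln!(
        out,
        "{loop_name}_last_iteration_duration_seconds {}",
        s.last_iteration_duration_seconds
    );
    let _ = writeln!(out, "# TYPE {loop_name}_errors_total counter");
    for (kind, count) in &s.errors_total {
        let _ = writeln!(out, "{loop_name}_errors_total{{kind=\"{kind}\"}} {count}");
    }
    let _ = writeln!(out, "# TYPE {loop_name}_up gauge");
    // The up gauge is exposed as 0 or 1.
    let _ = writeln!(out, "{loop_name}_up {}", u8::from(s.up));
    out
}
```

Because every loop shares the same four series, a single function of this shape can serve all loops, with only the `loop_name` prefix varying.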

§Usage

A driver owns an Arc<LoopMetrics> and registers it with a cluster-scoped LoopMetricsRegistry on spawn. Inside the tick body:

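// Time the whole tick, including failed ones.
let t = Instant::now();
if let Err(e) = self.sweep().await {
    // Count the failure under its error kind.
    self.metrics.record_error(e.kind_label());
}
// Recorded on success and failure alike.
self.metrics.observe(t.elapsed());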

On spawn: metrics.set_up(true). On graceful shutdown: metrics.set_up(false).
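The tick body and the `set_up` calls can be combined into one driver, sketched below. This is a synchronous, std-only approximation rather than the async driver the snippet above implies. The `LoopMetrics` type here is a minimal stand-in whose fields are assumptions, `record_error` is assumed to take a `&'static str` label, and `UpGuard` and `run_loop` are hypothetical helpers that do not exist in this module.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Minimal stand-in for `LoopMetrics`; the real fields are not documented here.
#[derive(Default)]
pub struct LoopMetrics {
    iterations: AtomicU64,
    last_duration_nanos: AtomicU64,
    errors: Mutex<BTreeMap<&'static str, u64>>,
    up: AtomicBool,
}

impl LoopMetrics {
    /// Store the latest tick duration and count the iteration.
    pub fn observe(&self, elapsed: Duration) {
        self.last_duration_nanos
            .store(elapsed.as_nanos() as u64, Ordering::Relaxed);
        self.iterations.fetch_add(1, Ordering::Relaxed);
    }
    /// Increment the error counter for one error kind.
    pub fn record_error(&self, kind: &'static str) {
        *self.errors.lock().unwrap().entry(kind).or_insert(0) += 1;
    }
    pub fn set_up(&self, up: bool) {
        self.up.store(up, Ordering::Relaxed);
    }
    pub fn iterations(&self) -> u64 {
        self.iterations.load(Ordering::Relaxed)
    }
    pub fn error_count(&self, kind: &'static str) -> u64 {
        self.errors.lock().unwrap().get(kind).copied().unwrap_or(0)
    }
    pub fn is_up(&self) -> bool {
        self.up.load(Ordering::Relaxed)
    }
}

/// Hypothetical guard: marks the loop up on creation and down on drop,
/// so the gauge is also cleared if the loop body unwinds.
pub struct UpGuard(Arc<LoopMetrics>);

impl UpGuard {
    pub fn new(metrics: Arc<LoopMetrics>) -> Self {
        metrics.set_up(true);
        UpGuard(metrics)
    }
}

impl Drop for UpGuard {
    fn drop(&mut self) {
        self.0.set_up(false);
    }
}

/// Run `ticks` iterations, recording metrics around each call to `tick`.
pub fn run_loop(
    metrics: Arc<LoopMetrics>,
    ticks: u32,
    mut tick: impl FnMut(u32) -> Result<(), &'static str>,
) {
    let _up = UpGuard::new(Arc::clone(&metrics));
    for i in 0..ticks {
        let t = Instant::now();
        if let Err(kind) = tick(i) {
            metrics.record_error(kind);
        }
        // Observed whether the tick succeeded or failed.
        metrics.observe(t.elapsed());
    }
}
```

Tying the gauge to a guard is one possible design choice: it keeps the `set_up(false)` call from being skipped on early returns, at the cost of hiding the lifecycle transition inside a destructor.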

Structs§

LoopMetrics: Standardized per-loop observations.
LoopMetricsRegistry: Collection of LoopMetrics handles so a single Prometheus render pass can iterate every registered loop.
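To make the relationship between the two structs concrete, here is a rough sketch of how a registry of shared handles might support a single render pass. The internals are assumptions: the real `LoopMetricsRegistry` may store and key its handles differently, and the `register`, `snapshot`, `render`, and `tick` methods shown are hypothetical. The stand-in `LoopMetrics` carries only one counter to keep the example short.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Stand-in for `LoopMetrics` with a single counter.
#[derive(Default)]
pub struct LoopMetrics {
    iterations: AtomicU64,
}

impl LoopMetrics {
    pub fn tick(&self) {
        self.iterations.fetch_add(1, Ordering::Relaxed);
    }
    pub fn iterations(&self) -> u64 {
        self.iterations.load(Ordering::Relaxed)
    }
}

/// Hypothetical registry layout: named handles behind a mutex.
#[derive(Default)]
pub struct LoopMetricsRegistry {
    loops: Mutex<Vec<(String, Arc<LoopMetrics>)>>,
}

impl LoopMetricsRegistry {
    /// Called by a driver when it spawns.
    pub fn register(&self, name: impl Into<String>, metrics: Arc<LoopMetrics>) {
        self.loops.lock().unwrap().push((name.into(), metrics));
    }

    /// Clone the handle list so the lock is not held while rendering.
    pub fn snapshot(&self) -> Vec<(String, Arc<LoopMetrics>)> {
        self.loops.lock().unwrap().clone()
    }

    /// One pass over every registered loop, in registration order.
    pub fn render(&self) -> String {
        let mut out = String::new();
        for (name, m) in self.snapshot() {
            out.push_str(&format!("{name}_iterations_total {}\n", m.iterations()));
        }
        out
    }
}
```

Since the registry holds `Arc` handles, drivers keep updating their own metrics without touching the registry lock; only registration and the start of a render pass contend for it.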