Standardized control-loop metrics.
Every periodic loop in the cluster (Principle 2.4) exposes the same four observations:
- {loop_name}_iterations_total: counter, incremented at the end of every tick (success or failure).
- {loop_name}_last_iteration_duration_seconds: gauge, wall-time of the most recent tick.
- {loop_name}_errors_total{kind}: counter keyed by error kind.
- {loop_name}_up: gauge (0/1), set by the loop’s lifecycle owner when the driver task spawns/exits.
Loop-specific gauges (raft_tick_loop_pending_groups,
health_loop_suspect_peers{peer_id}, etc.) are rendered by the
Prometheus route directly from the owning subsystem — they are
not part of this primitive because their sources are not
uniform.
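As a rough illustration of how the four observations could be held in one handle, here is a minimal sketch using only the standard library. The method names record_error, observe, and set_up come from the usage notes in this module; the struct name LoopMetricsSketch, its fields, the read accessors, and the choice to bump the iteration counter inside observe are all assumptions, not the crate's actual layout.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Mutex;
use std::time::Duration;

/// Hypothetical layout; the real `LoopMetrics` fields are not shown in these docs.
#[derive(Default)]
pub struct LoopMetricsSketch {
    iterations_total: AtomicU64,
    /// Seconds of the most recent tick, stored as raw `f64` bits.
    last_duration_bits: AtomicU64,
    /// Error counts keyed by error kind.
    errors_total: Mutex<BTreeMap<String, u64>>,
    up: AtomicBool,
}

impl LoopMetricsSketch {
    /// Assumption: one call per tick updates the duration gauge and the counter.
    pub fn observe(&self, elapsed: Duration) {
        self.last_duration_bits
            .store(elapsed.as_secs_f64().to_bits(), Ordering::Relaxed);
        self.iterations_total.fetch_add(1, Ordering::Relaxed);
    }

    /// Increments the error counter for `kind`.
    pub fn record_error(&self, kind: &str) {
        let mut errors = self.errors_total.lock().unwrap();
        *errors.entry(kind.to_string()).or_insert(0) += 1;
    }

    /// Sets the 0/1 liveness gauge.
    pub fn set_up(&self, up: bool) {
        self.up.store(up, Ordering::Relaxed);
    }

    // Hypothetical read accessors, e.g. for a render pass.
    pub fn iterations(&self) -> u64 {
        self.iterations_total.load(Ordering::Relaxed)
    }

    pub fn last_duration_seconds(&self) -> f64 {
        f64::from_bits(self.last_duration_bits.load(Ordering::Relaxed))
    }

    pub fn errors(&self, kind: &str) -> u64 {
        self.errors_total
            .lock()
            .unwrap()
            .get(kind)
            .copied()
            .unwrap_or(0)
    }

    pub fn is_up(&self) -> bool {
        self.up.load(Ordering::Relaxed)
    }
}
```

Storing the gauge as f64 bits in an AtomicU64 keeps the per-tick path lock-free; only the labelled error counter takes a lock, and only on the failure path.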
Usage
A driver owns an Arc<LoopMetrics> and registers it with a
cluster-scoped LoopMetricsRegistry on spawn. Inside the tick
body:
    let t = Instant::now();
    match self.sweep().await {
        Ok(()) => {}
        // Count the failure under its error-kind label.
        Err(e) => self.metrics.record_error(e.kind_label()),
    }
    // Record the tick's duration on both success and failure.
    self.metrics.observe(t.elapsed());

On spawn: metrics.set_up(true). On graceful shutdown: metrics.set_up(false).
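The spawn and shutdown calls can be pictured with a small lifecycle owner. The sketch below uses std::thread instead of an async runtime so that it stays self-contained; MiniLoopMetrics, Driver, the stop flag, and the 1 ms tick interval are hypothetical stand-ins, and only the method names set_up, record_error, and observe are taken from this page.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for `LoopMetrics`; only the method names come from the docs.
#[derive(Default)]
pub struct MiniLoopMetrics {
    up: AtomicBool,
    iterations_total: AtomicU64,
    errors_total: AtomicU64,
}

impl MiniLoopMetrics {
    pub fn set_up(&self, up: bool) {
        self.up.store(up, Ordering::Release);
    }
    pub fn is_up(&self) -> bool {
        self.up.load(Ordering::Acquire)
    }
    /// The kind label is ignored here to keep the sketch short.
    pub fn record_error(&self, _kind: &str) {
        self.errors_total.fetch_add(1, Ordering::Relaxed);
    }
    /// Assumption: `observe` also counts the iteration.
    pub fn observe(&self, _elapsed: Duration) {
        self.iterations_total.fetch_add(1, Ordering::Relaxed);
    }
    pub fn iterations(&self) -> u64 {
        self.iterations_total.load(Ordering::Relaxed)
    }
    pub fn errors(&self) -> u64 {
        self.errors_total.load(Ordering::Relaxed)
    }
}

/// Hypothetical lifecycle owner: it flips `up`, the loop body never does.
pub struct Driver {
    metrics: Arc<MiniLoopMetrics>,
    stop: Arc<AtomicBool>,
    handle: JoinHandle<()>,
}

impl Driver {
    pub fn spawn<F>(metrics: Arc<MiniLoopMetrics>, mut sweep: F) -> Driver
    where
        F: FnMut() -> Result<(), &'static str> + Send + 'static,
    {
        let stop = Arc::new(AtomicBool::new(false));
        metrics.set_up(true); // on spawn
        let (m, s) = (Arc::clone(&metrics), Arc::clone(&stop));
        let handle = thread::spawn(move || {
            while !s.load(Ordering::Acquire) {
                let t = Instant::now();
                if let Err(kind) = sweep() {
                    m.record_error(kind);
                }
                m.observe(t.elapsed());
                thread::sleep(Duration::from_millis(1));
            }
        });
        Driver { metrics, stop, handle }
    }

    /// Graceful shutdown: stop the loop, wait for it, then clear `up`.
    pub fn shutdown(self) {
        self.stop.store(true, Ordering::Release);
        let _ = self.handle.join();
        self.metrics.set_up(false);
    }
}
```

Clearing the gauge only after the join means a scrape never reports the loop as down while a tick is still in flight. Whether the real driver orders it this way is not stated here.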
Structs
- LoopMetrics: Standardized per-loop observations.
- LoopMetricsRegistry: Collection of LoopMetrics handles so a single Prometheus render pass can iterate every registered loop.
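To make the single render pass concrete, the following sketch shows one way a registry of handles could be walked to produce Prometheus text-format samples. RegistrySketch, RenderableLoopMetrics, register, and render are hypothetical names chosen for illustration; the actual LoopMetricsRegistry API is not documented on this page.

```rust
use std::collections::BTreeMap;
use std::fmt::Write as _;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical per-loop handle with the four standardized observations.
#[derive(Default)]
pub struct RenderableLoopMetrics {
    pub iterations_total: AtomicU64,
    /// Seconds of the most recent tick, stored as raw `f64` bits.
    pub last_duration_bits: AtomicU64,
    pub errors_total: Mutex<BTreeMap<String, u64>>,
    pub up: AtomicBool,
}

/// Hypothetical registry: a list of (loop name, handle) pairs.
#[derive(Default)]
pub struct RegistrySketch {
    loops: Mutex<Vec<(String, Arc<RenderableLoopMetrics>)>>,
}

impl RegistrySketch {
    /// Called by a driver on spawn.
    pub fn register(&self, loop_name: &str, handle: Arc<RenderableLoopMetrics>) {
        self.loops
            .lock()
            .unwrap()
            .push((loop_name.to_string(), handle));
    }

    /// One pass over every registered loop, emitting one sample per line.
    pub fn render(&self) -> String {
        let mut out = String::new();
        for (name, m) in self.loops.lock().unwrap().iter() {
            let iterations = m.iterations_total.load(Ordering::Relaxed);
            let seconds = f64::from_bits(m.last_duration_bits.load(Ordering::Relaxed));
            let up = u8::from(m.up.load(Ordering::Relaxed));
            // Writing into a String cannot fail, so the results are ignored.
            let _ = writeln!(out, "{}_iterations_total {}", name, iterations);
            let _ = writeln!(out, "{}_last_iteration_duration_seconds {}", name, seconds);
            for (kind, count) in m.errors_total.lock().unwrap().iter() {
                let _ = writeln!(out, "{}_errors_total{{kind=\"{}\"}} {}", name, kind, count);
            }
            let _ = writeln!(out, "{}_up {}", name, up);
        }
        out
    }
}
```

Because every loop exposes the same four series, the render pass needs no per-loop branching; loop-specific gauges would still be rendered separately by their owning subsystems.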