orbit-metrics

orbit-metrics provides metrics snapshot collection on top of orbit-rs rings.

It is for periodic runtime measurements where readers usually want the newest valid sample, not a replayable history.

The intended shape is a multi-worker runtime where workers should not each expose their own metrics endpoint. Workers publish compact snapshots into Orbit; an aggregator folds fresh samples into one visible view.

Model

Hot paths should update process-local counters, gauges, atomics, or small structs. A background task captures those values into a compact snapshot and publishes the snapshot into an Orbit ring.

Collectors walk the ring backwards and keep the newest decodable sample per node or per logical metric key.

local counters/gauges
  -> compact snapshot
  -> OrbitMetricPublisher<T>
  -> orbit-rs ring for T::KIND
  -> OrbitMetricCollector<T>
  -> latest_by_node() / latest_by_key()

This keeps the hot path local and cheap. Only the publisher task touches Orbit, and each capture it writes is a bounded snapshot.
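The split between hot-path updates and background capture can be sketched with plain atomics. This is a minimal illustration using only the standard library; LocalCounters, CapturedSnapshot, and capture are hypothetical names, not part of orbit-metrics, and the hand-off to OrbitMetricPublisher<T> is left out.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-local state updated on the hot path (hypothetical type).
#[derive(Default)]
pub struct LocalCounters {
    requests: AtomicU64,
    in_flight: AtomicU64,
}

impl LocalCounters {
    /// Hot path: a single relaxed atomic add, no ring access.
    pub fn record_request(&self) {
        self.requests.fetch_add(1, Ordering::Relaxed);
    }

    /// Hot path: overwrite a gauge value.
    pub fn set_in_flight(&self, n: u64) {
        self.in_flight.store(n, Ordering::Relaxed);
    }
}

/// Bounded copy of the counters at one instant (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CapturedSnapshot {
    pub node: u16,
    pub captured_at: u64,
    pub requests: u64,
    pub in_flight: u64,
}

/// Called from the background task; the result is what would be
/// encoded and handed to the publisher.
pub fn capture(counters: &LocalCounters, node: u16, now_unix_secs: u64) -> CapturedSnapshot {
    CapturedSnapshot {
        node,
        captured_at: now_unix_secs,
        requests: counters.requests.load(Ordering::Relaxed),
        in_flight: counters.in_flight.load(Ordering::Relaxed),
    }
}
```

Relaxed ordering is assumed to be enough here because each value is read independently and the snapshot does not need to be consistent across fields.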

Core Types

OrbitMetricSnapshot
  Trait implemented by a compact metrics record.

OrbitMetricKeyedSnapshot
  Optional trait for row-like metric families with a logical key.

OrbitMetricFamily<T>
  Typed metrics family bound to an orbit-rs Fleet.

OrbitMetricPublisher<T>
  Writes encoded snapshots into the ring.

OrbitMetricCollector<T>
  Reads newest valid samples from the ring.

OrbitMetricSample<T>
  Decoded snapshot plus the NetId64 that carried it.

Semantics

newest valid sample per node wins;
newest valid sample per metric key wins for keyed families;
malformed old frames are ignored by collectors;
stale samples can be filtered by timestamp;
aggregation is left to the caller.

This crate does not sum counters, render Prometheus output, choose a serialization format, supervise workers, or define application policy.
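The per-node rules above can be sketched over a plain list of frames. This is not the crate's implementation: NodeSample, encode_frame, decode_frame, and newest_per_node are hypothetical, the ring is modeled as a slice ordered oldest first, and the 18-byte little-endian layout is borrowed from the example later in this document.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NodeSample {
    pub node: u16,
    pub captured_at: u64,
    pub requests: u64,
}

/// Assumed layout: node (2) | captured_at (8) | requests (8), little-endian.
pub fn encode_frame(node: u16, captured_at: u64, requests: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(18);
    out.extend_from_slice(&node.to_le_bytes());
    out.extend_from_slice(&captured_at.to_le_bytes());
    out.extend_from_slice(&requests.to_le_bytes());
    out
}

fn read_u64(bytes: &[u8]) -> u64 {
    // Callers pass exactly 8 bytes, so copy_from_slice cannot panic.
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    u64::from_le_bytes(buf)
}

fn decode_frame(bytes: &[u8]) -> Option<NodeSample> {
    if bytes.len() != 18 {
        return None; // malformed frame
    }
    Some(NodeSample {
        node: u16::from_le_bytes([bytes[0], bytes[1]]),
        captured_at: read_u64(&bytes[2..10]),
        requests: read_u64(&bytes[10..18]),
    })
}

/// Walk newest to oldest; the first decodable, non-stale frame seen
/// for each node is kept and older ones are skipped.
pub fn newest_per_node(frames_oldest_first: &[Vec<u8>], min_captured_at: u64) -> HashMap<u16, NodeSample> {
    let mut latest = HashMap::new();
    for frame in frames_oldest_first.iter().rev() {
        if let Some(sample) = decode_frame(frame) {
            if sample.captured_at >= min_captured_at {
                latest.entry(sample.node).or_insert(sample);
            }
        }
    }
    latest
}
```

The result is deliberately just a map of samples: summing or rendering them stays with the caller.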

Aggregation Pattern

Use one metric family per transport shape.

Fixed-width scalar families can usually be encoded as one compact snapshot per worker. Dynamic or labeled rows should use a keyed metric family so collectors can keep the newest row per key.
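For keyed families, the "newest row per key" rule might look like the sketch below. MetricRow and newest_per_key are hypothetical names, rows are assumed to be already decoded and ordered oldest first, and the key is assumed to be unique across workers (a caller could fold the node id into the key if it is not).

```rust
use std::collections::HashMap;

/// One labeled row of a keyed metric family (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MetricRow {
    pub key: String,
    pub captured_at: u64,
    pub value: u64,
}

/// Keep the most recently published row for each logical key.
pub fn newest_per_key(rows_oldest_first: &[MetricRow]) -> HashMap<String, MetricRow> {
    let mut latest = HashMap::new();
    // Iterate newest first so the first row seen for a key wins.
    for row in rows_oldest_first.iter().rev() {
        latest
            .entry(row.key.clone())
            .or_insert_with(|| row.clone());
    }
    latest
}
```

Unlike a fixed-width snapshot, the set of keys here can grow or shrink between publishes without changing the encoded shape of a single row.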

multi-worker mode:
  worker local state
    -> OrbitMetricPublisher<T>
    -> aggregator collector
    -> merged output

standalone mode:
  local state
    -> local output

The point is not to make metrics durable. The point is to expose current runtime state without requiring external systems to scrape every worker process.
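Since aggregation is left to the caller, the aggregator's fold step is application code. A minimal sketch, assuming the collector has already produced the newest sample per node; WorkerSampleView, MergedView, and merge are hypothetical names, and summing is only one possible policy.

```rust
use std::collections::HashMap;

/// The fields of a per-worker sample the aggregator cares about (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorkerSampleView {
    pub captured_at: u64,
    pub requests: u64,
}

/// The single view the aggregator exposes (hypothetical).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct MergedView {
    pub workers: usize,
    pub total_requests: u64,
    /// Capture time of the oldest contributing sample, if any.
    pub oldest_captured_at: Option<u64>,
}

/// Fold the newest sample of every node into one merged view.
pub fn merge(latest: &HashMap<u16, WorkerSampleView>) -> MergedView {
    let mut view = MergedView::default();
    for sample in latest.values() {
        view.workers += 1;
        view.total_requests = view.total_requests.saturating_add(sample.requests);
        view.oldest_captured_at = Some(match view.oldest_captured_at {
            Some(t) => t.min(sample.captured_at),
            None => sample.captured_at,
        });
    }
    view
}
```

Tracking the oldest contributing timestamp lets the output say how fresh the merged numbers are, which matters when the data is a current-state view rather than a durable record.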

Example Shape

use orbit_metrics::{OrbitMetricSnapshot, OrbitTyped};

struct WorkerSnapshot {
    node: u16,
    captured_at: u64,
    requests: u64,
}

impl OrbitTyped for WorkerSnapshot {
    const KIND: u8 = 42;
}

impl OrbitMetricSnapshot for WorkerSnapshot {
    const FAMILY: &'static str = "worker";

    fn node_id(&self) -> u16 {
        self.node
    }

    fn captured_at_unix_secs(&self) -> u64 {
        self.captured_at
    }

    // Wire layout (little-endian, 18 bytes):
    // node (2) | captured_at (8) | requests (8)
    fn encode(&self) -> Result<Vec<u8>, String> {
        let mut out = Vec::with_capacity(18);
        out.extend_from_slice(&self.node.to_le_bytes());
        out.extend_from_slice(&self.captured_at.to_le_bytes());
        out.extend_from_slice(&self.requests.to_le_bytes());
        Ok(out)
    }

    fn decode(bytes: &[u8]) -> Result<Self, String> {
        // Reject anything that is not exactly one 18-byte record.
        if bytes.len() != 18 {
            return Err("invalid worker snapshot length".into());
        }

        // The length check above guarantees each slice has the exact
        // size of its target array, so these unwraps cannot fail.
        Ok(Self {
            node: u16::from_le_bytes(bytes[0..2].try_into().unwrap()),
            captured_at: u64::from_le_bytes(bytes[2..10].try_into().unwrap()),
            requests: u64::from_le_bytes(bytes[10..18].try_into().unwrap()),
        })
    }
}

Use Cases

worker health snapshots;
runtime gauges;
aggregation inputs where stale data should be ignored;
dashboards that need the latest row per node or key;
compact operational metrics shared across sibling processes.