orbit-metrics 0.1.0

Metrics snapshot families over orbit-rs rings.
# orbit-metrics

`orbit-metrics` provides metrics snapshot collection on top of
`orbit-rs` rings.

It is for periodic runtime measurements where readers usually want the
newest valid sample, not a replayable history.

The intended shape is a multi-worker runtime where workers should not
each expose their own metrics endpoint. Workers publish compact snapshots
into Orbit; an aggregator folds fresh samples into one visible view.

## Model

Hot paths should update process-local counters, gauges, atomics, or
small structs. A background task captures those values into a compact
snapshot and publishes the snapshot into an Orbit ring.

Collectors walk the ring backwards and keep the newest decodable sample
per node or per logical metric key.

```text
local counters/gauges
  -> compact snapshot
  -> OrbitMetricPublisher<T>
  -> orbit-rs ring for T::KIND
  -> OrbitMetricCollector<T>
  -> latest_by_node() / latest_by_key()
```

This keeps the hot path local and cheap. Only the publisher task touches
Orbit, and only when it captures a bounded snapshot.
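
A minimal sketch of the local half of this flow, using only the standard library. `LocalCounters`, `CapturedSnapshot`, and their fields are hypothetical names chosen for illustration; they are not part of `orbit-metrics`. The hot path performs relaxed atomic adds, and a background task would call `capture` before handing the result to a publisher.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-local counters updated on the hot path (hypothetical type).
#[derive(Default)]
pub struct LocalCounters {
    requests: AtomicU64,
    errors: AtomicU64,
}

/// Plain-data copy of the counters at one instant (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CapturedSnapshot {
    pub node: u16,
    pub captured_at: u64,
    pub requests: u64,
    pub errors: u64,
}

impl LocalCounters {
    /// Hot path: relaxed atomic adds only, no ring access.
    pub fn record_request(&self, failed: bool) {
        self.requests.fetch_add(1, Ordering::Relaxed);
        if failed {
            self.errors.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Background task: copy the current values into a fixed-size snapshot.
    pub fn capture(&self, node: u16, now_unix_secs: u64) -> CapturedSnapshot {
        CapturedSnapshot {
            node,
            captured_at: now_unix_secs,
            requests: self.requests.load(Ordering::Relaxed),
            errors: self.errors.load(Ordering::Relaxed),
        }
    }
}
```

Relaxed ordering is enough here under the assumption that each counter is read independently and small skew between fields is acceptable for operational metrics.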

## Core Types

```text
OrbitMetricSnapshot
  Trait implemented by a compact metrics record.

OrbitMetricKeyedSnapshot
  Optional trait for row-like metric families with a logical key.

OrbitMetricFamily<T>
  Typed metrics family bound to an orbit-rs Fleet.

OrbitMetricPublisher<T>
  Writes encoded snapshots into the ring.

OrbitMetricCollector<T>
  Reads newest valid samples from the ring.

OrbitMetricSample<T>
  Decoded snapshot plus the NetId64 that carried it.
```

## Semantics

- newest valid sample per node wins;
- newest valid sample per metric key wins for keyed families;
- malformed old frames are ignored by collectors;
- stale samples can be filtered by timestamp;
- aggregation is left to the caller.

This crate does not sum counters, render Prometheus output, choose a
serialization format, supervise workers, or define application policy.
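
The selection rules above can be sketched without the crate. The code below is an illustration only: the ring is modelled as a slice of byte frames ordered oldest to newest, the 18-byte little-endian layout and the names `Sample`, `encode`, `decode`, and `newest_per_node` are assumptions, and the real collector may differ in how it iterates and filters.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Sample {
    pub node: u16,
    pub captured_at: u64,
    pub requests: u64,
}

/// Encode as u16 + u64 + u64, little-endian (assumed layout, 18 bytes).
pub fn encode(node: u16, captured_at: u64, requests: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(18);
    out.extend_from_slice(&node.to_le_bytes());
    out.extend_from_slice(&captured_at.to_le_bytes());
    out.extend_from_slice(&requests.to_le_bytes());
    out
}

/// Decode one frame; any frame of the wrong length counts as malformed.
pub fn decode(bytes: &[u8]) -> Option<Sample> {
    if bytes.len() != 18 {
        return None;
    }
    let mut node = [0u8; 2];
    let mut captured_at = [0u8; 8];
    let mut requests = [0u8; 8];
    node.copy_from_slice(&bytes[0..2]);
    captured_at.copy_from_slice(&bytes[2..10]);
    requests.copy_from_slice(&bytes[10..18]);
    Some(Sample {
        node: u16::from_le_bytes(node),
        captured_at: u64::from_le_bytes(captured_at),
        requests: u64::from_le_bytes(requests),
    })
}

/// Walk frames newest-first and keep the first decodable, non-stale
/// sample seen for each node.
pub fn newest_per_node(
    frames_oldest_first: &[Vec<u8>],
    now_unix_secs: u64,
    max_age_secs: u64,
) -> HashMap<u16, Sample> {
    let mut latest = HashMap::new();
    for frame in frames_oldest_first.iter().rev() {
        let sample = match decode(frame) {
            Some(s) => s,
            None => continue, // malformed frame: ignore it
        };
        if now_unix_secs.saturating_sub(sample.captured_at) > max_age_secs {
            continue; // stale sample: filtered by timestamp
        }
        // A newer sample for this node, if any, is already stored.
        latest.entry(sample.node).or_insert(sample);
    }
    latest
}
```

Summing or otherwise combining the surviving samples is deliberately absent from this sketch, matching the statement that aggregation belongs to the caller.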

## Aggregation Pattern

Use one metric family per transport shape, meaning one snapshot type with
one encoded layout.

Fixed-width scalar families can usually be encoded as one compact
snapshot per worker. Dynamic or labeled rows should use a keyed metric
family so collectors can keep the newest row per key.
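
As a rough illustration of the keyed case, the sketch below keeps the newest row per logical key. `RouteRow`, its fields, and `newest_per_key` are hypothetical stand-ins, not the crate's `OrbitMetricKeyedSnapshot` API, and the input is assumed to be already decoded and ordered oldest to newest.

```rust
use std::collections::HashMap;

/// One labeled row, here a per-route latency figure (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct RouteRow {
    pub node: u16,
    pub route: String, // the logical metric key
    pub captured_at: u64,
    pub p99_ms: u32,
}

/// Walk rows newest-first and keep the first row seen for each key.
pub fn newest_per_key(rows_oldest_first: &[RouteRow]) -> HashMap<String, RouteRow> {
    let mut latest: HashMap<String, RouteRow> = HashMap::new();
    for row in rows_oldest_first.iter().rev() {
        latest
            .entry(row.route.clone())
            .or_insert_with(|| row.clone());
    }
    latest
}
```

If rows from different workers must stay separate, the map key in this sketch could be the pair `(node, route)` instead of `route` alone.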

```text
multi-worker mode:
  worker local state
    -> OrbitMetricPublisher<T>
    -> aggregator collector
    -> merged output

standalone mode:
  local state
    -> local output
```

The point is not to make metrics durable. The point is to expose current
runtime state without requiring external systems to scrape every worker
process.
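
Because aggregation is left to the caller, the merge step might look like the sketch below. `WorkerView`, `Merged`, and `merge` are hypothetical names, and the input map stands in for whatever a collector returns as the newest sample per node.

```rust
use std::collections::BTreeMap;

/// Newest known values for one worker (hypothetical type).
pub struct WorkerView {
    pub captured_at: u64,
    pub requests: u64,
}

/// Single merged view exposed by the aggregator (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct Merged {
    pub workers: usize,
    pub total_requests: u64,
    /// Oldest capture time among contributing workers, if any.
    pub oldest_capture: Option<u64>,
}

/// Fold the newest per-node samples into one view.
pub fn merge(latest: &BTreeMap<u16, WorkerView>) -> Merged {
    Merged {
        workers: latest.len(),
        total_requests: latest.values().map(|w| w.requests).sum(),
        oldest_capture: latest.values().map(|w| w.captured_at).min(),
    }
}
```

In standalone mode the same function can be fed a map with a single entry built from local state, so the output path does not need a separate code branch.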

## Example Shape

```rust
use orbit_metrics::{OrbitMetricSnapshot, OrbitTyped};

struct WorkerSnapshot {
    node: u16,
    captured_at: u64,
    requests: u64,
}

impl OrbitTyped for WorkerSnapshot {
    const KIND: u8 = 42;
}

impl OrbitMetricSnapshot for WorkerSnapshot {
    const FAMILY: &'static str = "worker";

    fn node_id(&self) -> u16 {
        self.node
    }

    fn captured_at_unix_secs(&self) -> u64 {
        self.captured_at
    }

    // Wire layout, little-endian, 18 bytes total:
    //   [0..2]   node        (u16)
    //   [2..10]  captured_at (u64)
    //   [10..18] requests    (u64)
    fn encode(&self) -> Result<Vec<u8>, String> {
        let mut out = Vec::with_capacity(18);
        out.extend_from_slice(&self.node.to_le_bytes());
        out.extend_from_slice(&self.captured_at.to_le_bytes());
        out.extend_from_slice(&self.requests.to_le_bytes());
        Ok(out)
    }

    fn decode(bytes: &[u8]) -> Result<Self, String> {
        // Reject anything that is not exactly one record, so the
        // fixed-size conversions below cannot panic.
        if bytes.len() != 18 {
            return Err("invalid worker snapshot length".into());
        }

        Ok(Self {
            node: u16::from_le_bytes(bytes[0..2].try_into().unwrap()),
            captured_at: u64::from_le_bytes(bytes[2..10].try_into().unwrap()),
            requests: u64::from_le_bytes(bytes[10..18].try_into().unwrap()),
        })
    }
}
```

## Use Cases

- worker health snapshots;
- runtime gauges;
- aggregation inputs where stale data should be ignored;
- dashboards that need the latest row per node or key;
- compact operational metrics shared across sibling processes.