Struct Metrics

pub struct Metrics {
    pub registry: Registry,
    pub store_total: IntCounterVec,
    pub recall_total: IntCounterVec,
    pub recall_latency_seconds: HistogramVec,
    pub autonomy_hook_total: IntCounterVec,
    pub contradiction_detected_total: IntCounter,
    pub webhook_dispatched_total: IntCounter,
    pub webhook_failed_total: IntCounter,
    pub memories_gauge: IntGauge,
    pub hnsw_size_gauge: IntGauge,
    pub subscriptions_active_gauge: IntGauge,
    pub curator_cycles_total: IntCounter,
    pub curator_operations_total: IntCounterVec,
    pub curator_cycle_duration_seconds: HistogramVec,
    pub federation_fanout_dropped_total: IntCounterVec,
    pub federation_fanout_retry_total: IntCounterVec,
    pub federation_partial_quorum_total: IntCounter,
    pub corrupt_provenance_rows_total: IntCounterVec,
    pub auto_export_spawn_failed_total: IntCounter,
    pub federation_push_dlq_depth: IntGauge,
    pub federation_push_dlq_quarantined: IntCounter,
    pub hnsw_evictions_total: IntCounter,
    pub hnsw_last_eviction_at_nanos: IntGauge,
    pub subscription_dlq_overflow_total: IntCounter,
    pub federation_cred_verify_total: IntCounterVec,
    pub federation_inbound_cred_total: IntCounterVec,
    pub federation_cred_max_age_seconds: IntGauge,
    pub federation_renewal_lag_seconds: IntGauge,
}

Handles to the registered metric families. Built once on first access via registry().

Fields are public so call sites in handlers.rs, future subscriptions.rs, and the test module can .inc() / .observe() / .set() directly. #[allow(dead_code)] covers the handles that aren’t wired to a caller yet — they surface in /metrics output (see the render_includes_registered_names test) and will be instrumented as sibling features land (hnsw gauge via the HNSW module, subscriptions gauge via the webhook PR, webhook counters via the dispatch path, etc.).

Fields§

§registry: Registry

§store_total: IntCounterVec

§recall_total: IntCounterVec

§recall_latency_seconds: HistogramVec

§autonomy_hook_total: IntCounterVec

§contradiction_detected_total: IntCounter

§webhook_dispatched_total: IntCounter

§webhook_failed_total: IntCounter

§memories_gauge: IntGauge

§hnsw_size_gauge: IntGauge

§subscriptions_active_gauge: IntGauge

§curator_cycles_total: IntCounter

§curator_operations_total: IntCounterVec

§curator_cycle_duration_seconds: HistogramVec

§federation_fanout_dropped_total: IntCounterVec

Ultrareview #343: count of post-quorum fanout tasks whose outcome could not be observed (shutdown, panic, or the spawned task erred). Non-zero indicates mesh divergence risk.

§federation_fanout_retry_total: IntCounterVec

S40 (v0.6.2 Patch 2): count of peer POST retries, labeled by final outcome. ok = retry recovered the row; fail = both attempts failed (peer likely truly down); id_drift = retry observed the same peer id-drift as attempt 1.
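A minimal sketch of how the three label values could be derived from the two attempts. The `Attempt` and `RetryOutcome` types and the `classify_retry` function are hypothetical names introduced for illustration, not items from the crate. Treating a plain failure followed by id-drift as `fail` is also an assumption.

```rust
/// Hypothetical result of a single peer POST attempt (not the crate's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Attempt {
    Ok,
    IdDrift,
    Failed,
}

/// Final outcome of a retry, one variant per label value described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RetryOutcome {
    Ok,
    Fail,
    IdDrift,
}

impl RetryOutcome {
    /// Label value to attach to the counter increment.
    pub fn as_label(self) -> &'static str {
        match self {
            RetryOutcome::Ok => "ok",
            RetryOutcome::Fail => "fail",
            RetryOutcome::IdDrift => "id_drift",
        }
    }
}

/// Classify a retry. Returns `None` when the first attempt succeeded,
/// because no retry took place and nothing should be counted.
pub fn classify_retry(first: Attempt, second: Attempt) -> Option<RetryOutcome> {
    match (first, second) {
        (Attempt::Ok, _) => None,
        (_, Attempt::Ok) => Some(RetryOutcome::Ok),
        // Same id-drift seen on both attempts.
        (Attempt::IdDrift, Attempt::IdDrift) => Some(RetryOutcome::IdDrift),
        // Assumption: every other combination counts as a plain failure.
        _ => Some(RetryOutcome::Fail),
    }
}
```

A call site would then increment the counter with the returned label, for example through `with_label_values` on the `IntCounterVec`.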

§federation_partial_quorum_total: IntCounter

H9 (v0.7.0 round-2): count of quorum writes that the leader returned 200 for (W met) but where at least one configured peer did NOT ack inside the deadline. Operators alert on non-zero rate to detect mesh-divergence drift early — before a follow-up catchup sync surfaces the gap.
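The condition described above can be sketched as a single predicate. `is_partial_quorum` and its parameters are hypothetical. The sketch assumes the caller already knows whether W was met and holds one ack flag per configured peer.

```rust
/// Returns true when a write should count as a partial quorum:
/// the write quorum was met (the leader answered 200), yet at least
/// one configured peer did not ack inside the deadline.
pub fn is_partial_quorum(quorum_met: bool, peer_acked: &[bool]) -> bool {
    quorum_met && peer_acked.iter().any(|acked| !acked)
}
```

A failed quorum is deliberately excluded: it is reported to the client as an error, so it is not the silent divergence this counter is meant to expose.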

§corrupt_provenance_rows_total: IntCounterVec

Cluster-A COR-3 (v0.7.0): count of memory rows whose Form 4 fact-provenance JSON columns (citations, source_span, confidence_signals, or pre-Form-4 metadata) failed to parse and were silently defaulted by row_to_memory. Non-zero indicates schema drift, writer-side corruption, or a migration that left malformed JSON in the column. Labeled by column name (citations | source_span | confidence_signals | metadata).
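A rough sketch of the "default on parse failure, but count it by column" pattern. This is not the body of `row_to_memory`: `CorruptRowCounts` and `parse_or_default` are hypothetical, a plain `HashMap` stands in for the labeled counter, and the parser is passed in as a closure so the example stays free of external crates.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a labeled counter: column name -> failure count.
#[derive(Default)]
pub struct CorruptRowCounts(HashMap<&'static str, u64>);

impl CorruptRowCounts {
    /// Number of parse failures recorded for `column` (0 if none).
    pub fn get(&self, column: &str) -> u64 {
        self.0.get(column).copied().unwrap_or(0)
    }
}

/// Parse one column value. On failure, fall back to `T::default()` and
/// record the failure under the column's name instead of dropping it silently.
pub fn parse_or_default<T, E>(
    raw: &str,
    column: &'static str,
    parse: impl FnOnce(&str) -> Result<T, E>,
    counts: &mut CorruptRowCounts,
) -> T
where
    T: Default,
{
    match parse(raw) {
        Ok(value) => value,
        Err(_) => {
            *counts.0.entry(column).or_insert(0) += 1;
            T::default()
        }
    }
}
```

In real code the closure would be a JSON deserializer for the column's type. The point of the pattern is that the row still loads, while the failure becomes visible per column.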

§auto_export_spawn_failed_total: IntCounter

v0.7-polish SEC-15 / COR-11 (issue #780): count of post_reflect.auto_export detached worker invocations whose outcome was a panic or a returned Err. Non-zero means an operator-opted-in namespace had a reflection that did NOT land on the filesystem and the failure would otherwise be silent (the worker thread is detached; the reflection itself already committed). The capabilities-v3 surface mirrors this counter so operator dashboards can alert without scraping /metrics directly.

§federation_push_dlq_depth: IntGauge

v0.7.0 Track D #933 — current depth of the federation push DLQ (federation_push_dlq table, WHERE replayed_at IS NULL). Refreshed on every tick of the replay_federation_push_dlq worker spawned alongside the catchup loop. Operators alert on non-zero sustained depth — a healthy mesh should drain back to 0 within one replay interval after the peer recovers.

§federation_push_dlq_quarantined: IntCounter

#1032 (HIGH, 2026-05-21) — monotonic counter for DLQ rows the replay worker has marked as quarantined (attempt_count >= MAX_REPLAY_ATTEMPTS). Pre-#1032 the replay loop retried poison messages forever; now rows past the ceiling are skipped + this counter increments per quarantined row per tick (the row stays in the DLQ until an operator drains it via ai-memory federation dlq drain --quarantined). Operators alert on non-zero increment rate — a healthy mesh should have zero rows reaching the quarantine threshold.
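The per-tick split between replayable and quarantined rows might look like the sketch below. The `DlqRow` shape, the `plan_replay_tick` helper, and the value `5` for `MAX_REPLAY_ATTEMPTS` are assumptions for illustration; the real ceiling is not stated here.

```rust
/// Assumed ceiling for illustration only; the crate's actual value is not stated here.
pub const MAX_REPLAY_ATTEMPTS: u32 = 5;

/// Hypothetical shape of a pending DLQ row.
pub struct DlqRow {
    pub id: u64,
    pub attempt_count: u32,
}

/// Plan one replay tick. Returns the ids to replay and the number of
/// quarantined rows, which is the amount to add to the counter for this tick.
pub fn plan_replay_tick(rows: &[DlqRow]) -> (Vec<u64>, u64) {
    let mut replay = Vec::new();
    let mut quarantined = 0u64;
    for row in rows {
        if row.attempt_count >= MAX_REPLAY_ATTEMPTS {
            // Past the ceiling: skip the row, it stays in the DLQ until drained.
            quarantined += 1;
        } else {
            replay.push(row.id);
        }
    }
    (replay, quarantined)
}
```

Because quarantined rows remain in the table, the same row is counted again on every tick until an operator drains it, which matches the "per quarantined row per tick" wording above.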

§hnsw_evictions_total: IntCounter

pm-v3.1 PR8 (issue #1174) — cumulative HNSW oldest-eviction count since process start. Replaces the prior process-global AtomicU64 INDEX_EVICTIONS_TOTAL in src/hnsw.rs. Non-zero means the in-memory vector index has hit MAX_ENTRIES and dropped older embeddings; recall quality may have degraded for evicted ids until they are re-inserted (e.g. on next access via the recall touch path). Surfaces in memory_capabilities (hnsw.evictions_total), /metrics (ai_memory_hnsw_evictions_total), and memory_stats.

§hnsw_last_eviction_at_nanos: IntGauge

pm-v3.1 PR8 (issue #1174) — wall-clock UNIX nanoseconds of the most recent HNSW eviction (0 if none have occurred). Replaces the prior process-global AtomicU64 LAST_EVICTION_AT_NANOS in src/hnsw.rs. Capabilities derives hnsw.evicted_recently from this with a 60s rolling window. Surfaced as an IntGauge so the value is also readable via Prometheus scraping.
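One way the 60s rolling window could be derived from the gauge value is sketched below. `evicted_recently` as a free function, its `i64` parameters (matching the `IntGauge` value type), and the inclusive window boundary are assumptions.

```rust
/// 60-second rolling window, expressed in nanoseconds.
const RECENT_WINDOW_NANOS: i64 = 60 * 1_000_000_000;

/// Returns true when the last eviction happened within the window.
/// A gauge value of 0 means no eviction has occurred since process start.
pub fn evicted_recently(last_eviction_at_nanos: i64, now_nanos: i64) -> bool {
    if last_eviction_at_nanos <= 0 {
        return false;
    }
    // Clamp at 0 so a clock that stepped backwards still reads as "recent".
    let elapsed = now_nanos.saturating_sub(last_eviction_at_nanos).max(0);
    elapsed <= RECENT_WINDOW_NANOS
}
```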

§subscription_dlq_overflow_total: IntCounter

#1253 (MED, 2026-05-25): monotonic counter for subscription DLQ insert attempts that were refused because the per-subscription DLQ depth had already hit crate::subscriptions::MAX_SUBSCRIPTION_DLQ_ROWS. Non-zero means a hostile (or simply broken) webhook target is failing every delivery and would otherwise fill the operator’s disk with quarantined rows. Each refused insert pairs with a tracing::warn! so operators see the subscription id + correlation id of the dropped row.
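The refuse-and-count behavior can be sketched with an in-memory stand-in. `SubscriptionDlq` and its methods are hypothetical, and the cap of `3` is chosen only to keep the example small; the real value of `MAX_SUBSCRIPTION_DLQ_ROWS` is not stated here.

```rust
/// Assumed cap for illustration only; the crate's actual value is not stated here.
pub const MAX_SUBSCRIPTION_DLQ_ROWS: usize = 3;

/// Hypothetical in-memory stand-in for one subscription's DLQ.
#[derive(Default)]
pub struct SubscriptionDlq {
    rows: Vec<String>,
    overflow_total: u64,
}

impl SubscriptionDlq {
    /// Insert a failed delivery unless the cap has already been reached.
    /// Returns false and bumps the overflow counter when the insert is refused.
    pub fn try_insert(&mut self, payload: &str) -> bool {
        if self.rows.len() >= MAX_SUBSCRIPTION_DLQ_ROWS {
            self.overflow_total += 1;
            return false;
        }
        self.rows.push(payload.to_owned());
        true
    }

    /// Current number of stored rows.
    pub fn depth(&self) -> usize {
        self.rows.len()
    }

    /// Number of inserts refused so far.
    pub fn overflow_total(&self) -> u64 {
        self.overflow_total
    }
}
```

The depth never exceeds the cap, so disk usage per subscription is bounded while the counter keeps rising for as long as the target keeps failing.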

§federation_cred_verify_total: IntCounterVec

FED-P4-e (federation-identity-at-scale §8) — federation credential-verification outcomes on the receiver path, labeled result (ok | fail). The verify-failure-rate SLO is fail / (ok + fail). A non-zero sustained fail rate means peers are presenting credentials the local trust bundle cannot verify — an expired leaf, a revoked issuer, a clock-skew window, or a chain that fails to anchor. Healthy meshes hold this at 0 once every peer’s issuer key is enrolled in the bundle.
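The SLO ratio above, written out as a small helper. `verify_failure_rate` is a hypothetical name, and the inputs are assumed to be counter deltas over the alerting window rather than lifetime totals.

```rust
/// fail / (ok + fail). Returns `None` when no verifications were observed
/// (or the sum overflows), so callers do not divide by zero.
pub fn verify_failure_rate(ok: u64, fail: u64) -> Option<f64> {
    let total = ok.checked_add(fail)?;
    if total == 0 {
        None
    } else {
        Some(fail as f64 / total as f64)
    }
}
```

Returning `None` for an idle window keeps "no traffic" distinct from "no failures", which matters when the alert fires on a sustained non-zero rate.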

§federation_inbound_cred_total: IntCounterVec

FED-P4-e (federation-identity-at-scale §8) — inbound federation requests bucketed by whether they presented a signed credential at all, labeled presence (signed | unsigned). The signed-vs-unsigned-ratio SLO is signed / (signed + unsigned). During a rollout this climbs from 0 toward 1 as peers upgrade to credential-presenting builds; operators gate the flip of AI_MEMORY_FED_REQUIRE_PEER_ENROLLMENT to the secure default on this ratio reaching 1.0 across the fleet.
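A sketch of the fleet-wide gate described above. `PeerCredCounts` and `fleet_ready_for_enrollment` are hypothetical, and the counts are assumed to be per-node deltas over a recent window. The ratio signed / (signed + unsigned) equals 1.0 exactly when `unsigned` is 0 and `signed` is non-zero, so the check uses integers rather than comparing floats.

```rust
/// Hypothetical per-node snapshot of the two `presence` label values.
pub struct PeerCredCounts {
    pub signed: u64,
    pub unsigned: u64,
}

/// True when every node in the fleet saw only signed inbound requests.
/// An empty fleet, or a node with no traffic at all, is treated as
/// "not ready" because there is no evidence either way.
pub fn fleet_ready_for_enrollment(fleet: &[PeerCredCounts]) -> bool {
    !fleet.is_empty()
        && fleet
            .iter()
            .all(|node| node.signed > 0 && node.unsigned == 0)
}
```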

§federation_cred_max_age_seconds: IntGauge

FED-P4-e (federation-identity-at-scale §8) — age in seconds of the local outbound leaf credential (now − issued_at), refreshed on every renewal tick. The max-cred-age SLO alerts when this approaches the leaf TTL (crate::federation::identity::issuer::DEFAULT_CREDENTIAL_TTL_SECS) — a credential that ages past its TTL without a renewal means the refresh worker has stalled and outbound sync will start failing peer verification.

§federation_renewal_lag_seconds: IntGauge

FED-P4-e (federation-identity-at-scale §8) — seconds since the last successful outbound-credential renewal (now − last-renew wall clock), refreshed on every renewal tick. The renewal-lag SLO alerts when this exceeds the configured refresh interval by a safety margin: a healthy worker re-renews well inside the leaf TTL, so a lag larger than the interval means renewals are silently failing (bad CA reachability, key-load fault) even though the worker thread is still alive.
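Read together, the credential-age and renewal-lag gauges distinguish a worker that is merely behind from a credential that has already expired. The sketch below is one possible classification. `RenewalHealth`, `renewal_health`, and the `margin_secs` parameter are hypothetical, and the TTL and interval are passed in rather than read from the crate's constants.

```rust
/// Hypothetical health states derived from the two gauges.
#[derive(Debug, PartialEq, Eq)]
pub enum RenewalHealth {
    /// Renewals are landing on schedule.
    Healthy,
    /// Lag exceeds the refresh interval plus the safety margin.
    RenewalLagging,
    /// The leaf has aged past its TTL; peers will reject it.
    CredentialPastTtl,
}

/// Classify renewal health from the gauge values and the configured limits.
pub fn renewal_health(
    cred_age_secs: u64,
    ttl_secs: u64,
    renewal_lag_secs: u64,
    refresh_interval_secs: u64,
    margin_secs: u64,
) -> RenewalHealth {
    if cred_age_secs >= ttl_secs {
        RenewalHealth::CredentialPastTtl
    } else if renewal_lag_secs > refresh_interval_secs.saturating_add(margin_secs) {
        RenewalHealth::RenewalLagging
    } else {
        RenewalHealth::Healthy
    }
}
```

The lag check is ordered second on purpose: it is the early warning, and it should fire well before the age check does.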

Auto Trait Implementations§


impl !RefUnwindSafe for Metrics

impl !UnwindSafe for Metrics

impl Freeze for Metrics

impl Send for Metrics

impl Sync for Metrics

impl Unpin for Metrics

impl UnsafeUnpin for Metrics

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> ErasedDestructor for T where T: 'static,

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self> where F: FnOnce(&Self) -> bool,

impl<T> Pointable for T

const ALIGN: usize

type Init = T

unsafe fn init(init: <T as Pointable>::Init) -> usize

unsafe fn deref<'a>(ptr: usize) -> &'a T

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

unsafe fn drop(ptr: usize)

impl<T> PolicyExt for T where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P> where T: Sized + Policy<B, E>, P: Policy<B, E>,

fn or<P, B, E>(self, other: P) -> Or<T, P> where T: Sized + Policy<B, E>, P: Policy<B, E>,

impl<T> Same for T

type Output = T

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<V, T> VZip<V> for T where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self> where S: Into<Dispatch>,

fn with_current_subscriber(self) -> WithDispatch<Self>
