pub struct Metrics {
pub registry: Registry,
pub store_total: IntCounterVec,
pub recall_total: IntCounterVec,
pub recall_latency_seconds: HistogramVec,
pub autonomy_hook_total: IntCounterVec,
pub contradiction_detected_total: IntCounter,
pub webhook_dispatched_total: IntCounter,
pub webhook_failed_total: IntCounter,
pub memories_gauge: IntGauge,
pub hnsw_size_gauge: IntGauge,
pub subscriptions_active_gauge: IntGauge,
pub curator_cycles_total: IntCounter,
pub curator_operations_total: IntCounterVec,
pub curator_cycle_duration_seconds: HistogramVec,
pub federation_fanout_dropped_total: IntCounterVec,
pub federation_fanout_retry_total: IntCounterVec,
pub federation_partial_quorum_total: IntCounter,
pub corrupt_provenance_rows_total: IntCounterVec,
pub auto_export_spawn_failed_total: IntCounter,
pub federation_push_dlq_depth: IntGauge,
pub federation_push_dlq_quarantined: IntCounter,
pub hnsw_evictions_total: IntCounter,
pub hnsw_last_eviction_at_nanos: IntGauge,
pub subscription_dlq_overflow_total: IntCounter,
pub federation_cred_verify_total: IntCounterVec,
pub federation_inbound_cred_total: IntCounterVec,
pub federation_cred_max_age_seconds: IntGauge,
pub federation_renewal_lag_seconds: IntGauge,
}
Handles to the registered metric families. Built once on first access
via registry().
Fields are public so call sites in handlers.rs, future
subscriptions.rs, and the test module can .inc() / .observe() /
.set() directly. #[allow(dead_code)] covers the handles that
aren’t wired to a caller yet — they surface in /metrics output
(see the render_includes_registered_names test) and will be
instrumented as sibling features land (hnsw gauge via the HNSW
module, subscriptions gauge via the webhook PR, webhook counters
via the dispatch path, etc.).
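The "built once on first access" pattern with public handles can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: SketchMetrics, sketch_metrics, and record_store are hypothetical names, and plain atomics stand in for the Prometheus handle types.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

// Hypothetical stand-in for the metrics struct: the real fields are
// Prometheus handles, modelled here as plain atomics.
pub struct SketchMetrics {
    pub store_total: AtomicU64,
    pub memories_gauge: AtomicU64,
}

// Process-wide cell; the closure given to get_or_init runs at most once.
static METRICS: OnceLock<SketchMetrics> = OnceLock::new();

// Returns the shared handles, building them on the first call only.
pub fn sketch_metrics() -> &'static SketchMetrics {
    METRICS.get_or_init(|| SketchMetrics {
        store_total: AtomicU64::new(0),
        memories_gauge: AtomicU64::new(0),
    })
}

// A call site bumps a counter and sets a gauge through the public fields.
pub fn record_store(current_memories: u64) {
    let m = sketch_metrics();
    m.store_total.fetch_add(1, Ordering::Relaxed);
    m.memories_gauge.store(current_memories, Ordering::Relaxed);
}
```

Because every caller receives the same 'static reference, handlers and tests observe the same counters without passing the struct around.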
Fields
registry: Registry
store_total: IntCounterVec
recall_total: IntCounterVec
recall_latency_seconds: HistogramVec
autonomy_hook_total: IntCounterVec
contradiction_detected_total: IntCounter
webhook_dispatched_total: IntCounter
webhook_failed_total: IntCounter
memories_gauge: IntGauge
hnsw_size_gauge: IntGauge
subscriptions_active_gauge: IntGauge
curator_cycles_total: IntCounter
curator_operations_total: IntCounterVec
curator_cycle_duration_seconds: HistogramVec
federation_fanout_dropped_total: IntCounterVec
Ultrareview #343: count of post-quorum fanout tasks whose outcome
could not be observed (shutdown, panic, or the spawned task erred).
Non-zero indicates mesh divergence risk.
federation_fanout_retry_total: IntCounterVec
S40 (v0.6.2 Patch 2): count of peer POST retries, labeled by
final outcome. ok = retry recovered the row; fail = both
attempts failed (peer likely truly down); id_drift = retry
observed the same peer id-drift as attempt 1.
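One way to read the three label values is as a function of the first attempt and the retry. The sketch below is an assumption-heavy illustration: the Attempt enum and retry_label are hypothetical, and mapping any other mix of failures to fail is a guess the text does not confirm.

```rust
// Hypothetical result of a single peer POST attempt.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Attempt {
    Acked,
    Failed,
    IdDrift,
}

// Maps (attempt 1, retry) to the label recorded on the retry counter.
// Returns None when attempt 1 succeeded, since no retry takes place.
pub fn retry_label(first: Attempt, retry: Attempt) -> Option<&'static str> {
    match (first, retry) {
        (Attempt::Acked, _) => None,
        // The retry recovered the row.
        (_, Attempt::Acked) => Some("ok"),
        // The retry saw the same id-drift as attempt 1.
        (Attempt::IdDrift, Attempt::IdDrift) => Some("id_drift"),
        // Assumption: every other failing combination counts as "fail".
        _ => Some("fail"),
    }
}
```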
federation_partial_quorum_total: IntCounter
H9 (v0.7.0 round-2): count of quorum writes that the leader
returned 200 for (W met) but where at least one configured
peer did NOT ack inside the deadline. Operators alert on
non-zero rate to detect mesh-divergence drift early — before a
follow-up catchup sync surfaces the gap.
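The condition that bumps this counter reduces to a small predicate. In this sketch, is_partial_quorum and its two inputs are hypothetical; how the leader actually tracks acks is not described here.

```rust
// A write is "partial" when the leader met W and answered 200, yet at
// least one configured peer had not acked by the deadline.
// `quorum_met` and `peer_acked` are hypothetical inputs for illustration.
pub fn is_partial_quorum(quorum_met: bool, peer_acked: &[bool]) -> bool {
    quorum_met && peer_acked.iter().any(|acked| !*acked)
}
```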
corrupt_provenance_rows_total: IntCounterVec
Cluster-A COR-3 (v0.7.0): count of memory rows whose Form 4
fact-provenance JSON columns (citations, source_span,
confidence_signals, or pre-Form-4 metadata) failed to parse
and were silently defaulted by row_to_memory. Non-zero
indicates schema drift, writer-side corruption, or a
migration that left malformed JSON in the column. Labeled by
column name (citations | source_span | confidence_signals
| metadata).
auto_export_spawn_failed_total: IntCounter
v0.7-polish SEC-15 / COR-11 (issue #780): count of
post_reflect.auto_export detached worker invocations whose
outcome was a panic or a returned Err. Non-zero means an
operator-opted-in namespace had a reflection that did NOT
land on the filesystem and the failure would otherwise be
silent (the worker thread is detached; the reflection itself
already committed). The capabilities-v3 surface mirrors this
counter so operator dashboards can alert without scraping
/metrics directly.
federation_push_dlq_depth: IntGauge
v0.7.0 Track D #933: current depth of the federation push
DLQ (federation_push_dlq table, WHERE replayed_at IS NULL).
Refreshed on every tick of the replay_federation_push_dlq
worker spawned alongside the catchup loop. Operators alert on
non-zero sustained depth — a healthy mesh should drain back
to 0 within one replay interval after the peer recovers.
federation_push_dlq_quarantined: IntCounter
#1032 (HIGH, 2026-05-21): monotonic counter for DLQ rows the
replay worker has marked as quarantined (attempt_count >=
MAX_REPLAY_ATTEMPTS). Pre-#1032 the replay loop retried
poison messages forever; now rows at or past the ceiling are
skipped and this counter increments per quarantined row per
tick (the row stays in the DLQ until an operator drains it
via ai-memory federation dlq drain --quarantined). Operators
alert on non-zero increment rate — a healthy mesh should have
zero rows reaching the quarantine threshold.
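The quarantine rule can be sketched as a single pass over the pending rows. The DlqRow type, plan_replay_tick, and the ceiling value of 5 are assumptions for illustration; the real MAX_REPLAY_ATTEMPTS value is not stated here.

```rust
// Hypothetical ceiling; the real MAX_REPLAY_ATTEMPTS value is not given.
pub const MAX_REPLAY_ATTEMPTS: u32 = 5;

// Hypothetical shape of one pending DLQ row.
pub struct DlqRow {
    pub id: u64,
    pub attempt_count: u32,
}

// Plans one replay tick: returns the ids to replay and the number of
// quarantined rows skipped. Skipped rows are left in place.
pub fn plan_replay_tick(rows: &[DlqRow]) -> (Vec<u64>, u64) {
    let mut replay = Vec::new();
    let mut quarantined = 0u64;
    for row in rows {
        if row.attempt_count >= MAX_REPLAY_ATTEMPTS {
            // One counter increment per quarantined row per tick.
            quarantined += 1;
        } else {
            replay.push(row.id);
        }
    }
    (replay, quarantined)
}
```

Since a quarantined row is counted again on every tick, the counter's rate falls to zero only after the row is drained.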
hnsw_evictions_total: IntCounter
pm-v3.1 PR8 (issue #1174): cumulative HNSW oldest-eviction
count since process start. Replaces the prior process-global
AtomicU64 INDEX_EVICTIONS_TOTAL in src/hnsw.rs.
Non-zero means the in-memory vector index has hit
MAX_ENTRIES and dropped older embeddings; recall quality
may have degraded for evicted ids until they are re-inserted
(e.g. on next access via the recall touch path). Surfaces in
memory_capabilities (hnsw.evictions_total), /metrics
(ai_memory_hnsw_evictions_total), and memory_stats.
hnsw_last_eviction_at_nanos: IntGauge
pm-v3.1 PR8 (issue #1174): wall-clock UNIX nanoseconds of the
most recent HNSW eviction (0 if none have occurred). Replaces
the prior process-global AtomicU64 LAST_EVICTION_AT_NANOS
in src/hnsw.rs. Capabilities derives hnsw.evicted_recently
from this with a 60s rolling window. Surfaced as an IntGauge
so the value is also readable via Prometheus scraping.
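Deriving a "recently evicted" flag from this gauge could look like the sketch below. The function name and the inclusive window boundary are assumptions; only the 60s window and the 0 sentinel come from the description above.

```rust
// 60-second rolling window expressed in nanoseconds.
const RECENT_WINDOW_NANOS: i64 = 60 * 1_000_000_000;

// Returns true when the last eviction falls inside the rolling window.
// A gauge value of 0 means no eviction has occurred since process start.
pub fn evicted_recently(last_eviction_at_nanos: i64, now_nanos: i64) -> bool {
    if last_eviction_at_nanos == 0 {
        return false;
    }
    // saturating_sub avoids overflow on extreme inputs; a clock that
    // stepped backwards yields a negative age and counts as recent.
    now_nanos.saturating_sub(last_eviction_at_nanos) <= RECENT_WINDOW_NANOS
}
```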
subscription_dlq_overflow_total: IntCounter
#1253 (MED, 2026-05-25): monotonic counter for subscription
DLQ insert attempts that were refused because the per-
subscription DLQ depth had already hit
crate::subscriptions::MAX_SUBSCRIPTION_DLQ_ROWS. Non-zero
means a hostile (or simply-broken) webhook target is failing
every delivery and would otherwise fill the operator’s disk
with quarantined rows. Each refused insert pairs with a
tracing::warn! so operators see the subscription id + correlation
id of the dropped row.
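A bounded per-subscription DLQ that refuses inserts at the cap might be sketched as follows. SubscriptionDlq, try_insert, and the cap value of 3 are hypothetical; the real limit lives in crate::subscriptions::MAX_SUBSCRIPTION_DLQ_ROWS, and its value is not given here.

```rust
// Hypothetical cap chosen small for the example.
pub const MAX_SUBSCRIPTION_DLQ_ROWS: usize = 3;

// In-memory stand-in for one subscription's DLQ plus its overflow count.
pub struct SubscriptionDlq {
    rows: Vec<String>,
    pub overflow_total: u64,
}

impl SubscriptionDlq {
    pub fn new() -> Self {
        SubscriptionDlq { rows: Vec::new(), overflow_total: 0 }
    }

    // Inserts a failed delivery, or refuses it once the cap is reached.
    // Returns true when the row was stored.
    pub fn try_insert(&mut self, payload: &str) -> bool {
        if self.rows.len() >= MAX_SUBSCRIPTION_DLQ_ROWS {
            // A refused insert bumps the counter instead of using disk.
            self.overflow_total += 1;
            return false;
        }
        self.rows.push(payload.to_string());
        true
    }

    pub fn depth(&self) -> usize {
        self.rows.len()
    }
}
```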
federation_cred_verify_total: IntCounterVec
FED-P4-e (federation-identity-at-scale §8): federation
credential-verification outcomes on the receiver path, labeled
result (ok | fail). The verify-failure-rate SLO is
fail / (ok + fail). A non-zero sustained fail rate means peers
are presenting credentials the local trust bundle cannot verify
— an expired leaf, a revoked issuer, a clock-skew window, or a
chain that fails to anchor. Healthy meshes hold this at 0 once
every peer’s issuer key is enrolled in the bundle.
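The SLO ratio fail / (ok + fail) is undefined before any verification has run, so a sketch of the computation has to handle the empty case. The function name is hypothetical, and the inputs are assumed to be the two label values read from the counter.

```rust
// Verify-failure rate: fail / (ok + fail).
// Returns None when no verifications were recorded (or the sum overflows),
// so callers can tell "no data" apart from a healthy 0.0.
pub fn verify_failure_rate(ok: u64, fail: u64) -> Option<f64> {
    let total = ok.checked_add(fail)?;
    if total == 0 {
        return None;
    }
    Some(fail as f64 / total as f64)
}
```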
federation_inbound_cred_total: IntCounterVec
FED-P4-e (federation-identity-at-scale §8): inbound federation
requests bucketed by whether they presented a signed credential
at all, labeled presence (signed | unsigned). The
signed-vs-unsigned-ratio SLO is signed / (signed + unsigned).
During a rollout this climbs from 0 toward 1 as peers upgrade to
credential-presenting builds; operators gate the flip of
AI_MEMORY_FED_REQUIRE_PEER_ENROLLMENT to the secure default on
this ratio reaching 1.0 across the fleet.
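The rollout gate can be phrased without floating point: the ratio equals 1.0 exactly when a node saw signed traffic and no unsigned traffic. The sketch below assumes each tuple holds one node's (signed, unsigned) counts over a recent observation window rather than since process start; fleet_ready is a hypothetical name.

```rust
// Fleet gate for flipping the enrollment requirement.
// Each entry is one node's (signed, unsigned) request counts over the
// observation window. Every node must have seen traffic, all of it signed.
pub fn fleet_ready(windows: &[(u64, u64)]) -> bool {
    !windows.is_empty()
        && windows
            .iter()
            .all(|&(signed, unsigned)| signed > 0 && unsigned == 0)
}
```

Treating a node with no traffic as not ready is a conservative choice: an idle node gives no evidence about its peers.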
federation_cred_max_age_seconds: IntGauge
FED-P4-e (federation-identity-at-scale §8): age in seconds of
the local outbound leaf credential (now − issued_at),
refreshed on every renewal tick. The max-cred-age SLO alerts
when this approaches the leaf TTL
(crate::federation::identity::issuer::DEFAULT_CREDENTIAL_TTL_SECS)
— a credential that ages past its TTL without a renewal means
the refresh worker has stalled and outbound sync will start
failing peer verification.
federation_renewal_lag_seconds: IntGauge
FED-P4-e (federation-identity-at-scale §8): seconds since the
last successful outbound-credential renewal (now − last-renew
wall clock), refreshed on every renewal tick. The renewal-lag
SLO alerts when this exceeds the configured refresh interval by
a safety margin: a healthy worker re-renews well inside the
leaf TTL, so a lag larger than the interval means renewals are
silently failing (bad CA reachability, key-load fault) even
though the worker thread is still alive.
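The two credential gauges and their alert conditions can be sketched as plain arithmetic on UNIX seconds. All four function names, the warn_percent threshold, and the margin parameter are assumptions for illustration; the actual TTL and refresh interval are configuration values not given here.

```rust
// Age of the outbound leaf credential, clamped at zero if clocks disagree.
pub fn credential_age_secs(now_unix: i64, issued_at_unix: i64) -> i64 {
    now_unix.saturating_sub(issued_at_unix).max(0)
}

// Seconds since the last successful renewal, clamped the same way.
pub fn renewal_lag_secs(now_unix: i64, last_renew_unix: i64) -> i64 {
    now_unix.saturating_sub(last_renew_unix).max(0)
}

// Max-cred-age check: fires once the age reaches `warn_percent` of the TTL.
// Integer math avoids floating-point rounding at the threshold.
pub fn age_near_ttl(age_secs: i64, ttl_secs: i64, warn_percent: i64) -> bool {
    age_secs.saturating_mul(100) >= ttl_secs.saturating_mul(warn_percent)
}

// Renewal-lag check: fires when the lag exceeds interval plus margin.
pub fn renewal_lagging(lag_secs: i64, interval_secs: i64, margin_secs: i64) -> bool {
    lag_secs > interval_secs.saturating_add(margin_secs)
}
```

The two checks complement each other: the lag check can fire while the credential is still well inside its TTL, which is the earlier warning.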
Auto Trait Implementations
impl !RefUnwindSafe for Metrics
impl !UnwindSafe for Metrics
impl Freeze for Metrics
impl Send for Metrics
impl Sync for Metrics
impl Unpin for Metrics
impl UnsafeUnpin for Metrics
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> ErasedDestructor for T
where
    T: 'static,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.