pub struct MetricsSnapshot {
    pub active_sessions: usize,
    pub total_sessions: u64,
    pub total_steps: u64,
    pub total_tool_calls: u64,
    pub failed_tool_calls: u64,
    pub backpressure_shed_count: u64,
    pub memory_recall_count: u64,
    pub checkpoint_errors: u64,
    pub per_tool_calls: HashMap<String, u64>,
    pub per_tool_failures: HashMap<String, u64>,
    pub step_latency_buckets: Vec<(u64, u64)>,
    pub step_latency_mean_ms: f64,
    pub per_agent_tool_calls: HashMap<String, HashMap<String, u64>>,
    pub per_agent_tool_failures: HashMap<String, HashMap<String, u64>>,
}
A point-in-time snapshot of all runtime counters.
Obtained by calling RuntimeMetrics::snapshot. All fields are plain owned
values (integers, one float, and standard collections), so the snapshot can
be logged, serialised, or diffed without holding any locks.
See also snapshot for a richer snapshot including per-tool and histogram data.
§Example
use llm_agent_runtime::metrics::RuntimeMetrics;
let m = RuntimeMetrics::new();
let snap = m.snapshot();
assert_eq!(snap.active_sessions, 0);
assert_eq!(snap.total_sessions, 0);
Fields§
active_sessions: usize
Number of agent sessions currently in progress.
total_sessions: u64
Total number of sessions started since the runtime was created.
total_steps: u64
Total number of ReAct steps executed across all sessions.
total_tool_calls: u64
Total number of tool calls dispatched (across all tool names).
failed_tool_calls: u64
Total number of tool calls that returned an error observation.
backpressure_shed_count: u64
Total number of requests shed due to backpressure.
memory_recall_count: u64
Total number of memory recall operations.
checkpoint_errors: u64
Total number of checkpoint failures encountered during run_agent.
per_tool_calls: HashMap<String, u64>
Per-tool call counts: tool_name → total_calls.
per_tool_failures: HashMap<String, u64>
Per-tool failure counts: tool_name → failed_calls.
step_latency_buckets: Vec<(u64, u64)>
Step latency histogram bucket counts as (upper_bound_ms_inclusive, count).
step_latency_mean_ms: f64
Mean step latency in milliseconds.
per_agent_tool_calls: HashMap<String, HashMap<String, u64>>
Per-agent, per-tool call counts: agent_id → tool_name → count.
per_agent_tool_failures: HashMap<String, HashMap<String, u64>>
Per-agent, per-tool failure counts: agent_id → tool_name → count.
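The (upper_bound_ms_inclusive, count) pairs in step_latency_buckets can be used to estimate a latency quantile. The following is only a sketch: latency_quantile_upper_bound is a hypothetical helper, not part of this crate, and it assumes the buckets are sorted by ascending upper bound and hold per-bucket (non-cumulative) counts, neither of which is confirmed by the documentation above.

```rust
/// Hypothetical helper: smallest bucket upper bound (in ms) that covers at
/// least `quantile` of all recorded samples.
///
/// Assumes `buckets` is sorted by ascending upper bound and that each count
/// is per-bucket rather than cumulative.
fn latency_quantile_upper_bound(buckets: &[(u64, u64)], quantile: f64) -> Option<u64> {
    let total: u64 = buckets.iter().map(|&(_, count)| count).sum();
    if total == 0 {
        // No samples recorded, so no quantile can be reported.
        return None;
    }
    // Number of samples that must be covered to reach the quantile.
    let target = (quantile.clamp(0.0, 1.0) * total as f64).ceil() as u64;
    let mut seen = 0u64;
    for &(upper_ms, count) in buckets {
        seen += count;
        if seen >= target {
            return Some(upper_ms);
        }
    }
    // Fallback for rounding edge cases: report the last bucket.
    buckets.last().map(|&(upper_ms, _)| upper_ms)
}
```

The result is an upper bound on the quantile, not an exact value, because a histogram only records which bucket each sample fell into.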
Implementations§
impl MetricsSnapshot
pub fn delta(after: &Self, before: &Self) -> Self
Compute the difference between after and before (i.e., after - before).
Useful for per-request instrumentation:
let before = metrics.snapshot();
// ... run one agent invocation ...
let after = metrics.snapshot();
let delta = MetricsSnapshot::delta(&after, &before);
println!("steps this run: {}", delta.total_steps);
Saturating subtraction is used so callers don’t need to guard against races where a counter is read before the full increment propagates.
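The effect of saturating subtraction can be illustrated on a reduced stand-in type. Counters and delta_counters below are hypothetical names used only for this sketch; the real method covers every field of MetricsSnapshot and its internals are not shown here.

```rust
/// Hypothetical stand-in carrying two of the snapshot's counters.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Counters {
    total_steps: u64,
    total_tool_calls: u64,
}

/// Field-wise `after - before`, clamped at zero instead of underflowing.
fn delta_counters(after: &Counters, before: &Counters) -> Counters {
    Counters {
        total_steps: after.total_steps.saturating_sub(before.total_steps),
        total_tool_calls: after
            .total_tool_calls
            .saturating_sub(before.total_tool_calls),
    }
}
```

If a counter in after happens to be lower than in before, the corresponding delta field is 0 rather than a panic (debug builds) or a wrapped value (release builds).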
pub fn to_json(&self) -> Value
Serialize the snapshot to a serde_json::Value for logging or export.
pub fn tool_call_count(&self, name: &str) -> u64
Return the number of calls recorded for the named tool.
Returns 0 if no calls have been recorded for that tool name.
pub fn summary_line(&self) -> String
Return a concise single-line summary of this snapshot.
Format: "sessions={n}, steps={n}, tool_calls={n}, failures={n}, latency_mean={n}ms".
Intended for logging and debugging — not a stable serialization format.
pub fn tool_failure_count(&self, name: &str) -> u64
Return the number of failures recorded for the named tool.
Returns 0 if no failures have been recorded for that tool name.
pub fn tool_names(&self) -> Vec<&str>
Return a sorted list of tool names that have at least one recorded call.
pub fn failure_rate(&self) -> f64
Return the overall tool-call failure rate as a value in [0.0, 1.0].
Returns 0.0 if no tool calls have been recorded.
pub fn success_rate(&self) -> f64
Return the overall tool-call success rate as a value in [0.0, 1.0].
Returns 1.0 if no tool calls have been recorded (vacuously all succeeded).
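The zero-call conventions of failure_rate and success_rate can be sketched as free functions over the two relevant counters. This is an illustration under the assumption that failed never exceeds total; it is not the crate's actual implementation.

```rust
/// Failure rate in [0.0, 1.0]; 0.0 when no calls have been recorded.
/// Assumes `failed <= total`.
fn failure_rate(failed: u64, total: u64) -> f64 {
    if total == 0 {
        0.0
    } else {
        failed as f64 / total as f64
    }
}

/// Complement of the failure rate, so an empty snapshot reports 1.0.
fn success_rate(failed: u64, total: u64) -> f64 {
    1.0 - failure_rate(failed, total)
}
```

Defining the success rate as the complement keeps the two values summing to 1.0 in every case, including the empty one.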
pub fn tool_success_count(&self, name: &str) -> u64
Return the number of successful calls for the named tool.
Computed as tool_call_count(name) - tool_failure_count(name).
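A sketch of the same computation over the two per-tool maps follows. The free function and its parameters are hypothetical, and the use of saturating_sub is an assumption made here for safety; the documentation above only states a plain subtraction.

```rust
use std::collections::HashMap;

/// Successful calls for `name`: recorded calls minus recorded failures.
/// Unknown tools count as zero calls and zero failures.
fn tool_success_count(
    per_tool_calls: &HashMap<String, u64>,
    per_tool_failures: &HashMap<String, u64>,
    name: &str,
) -> u64 {
    let called = per_tool_calls.get(name).copied().unwrap_or(0);
    let failed = per_tool_failures.get(name).copied().unwrap_or(0);
    // Assumption: clamp at zero if failures ever exceed calls.
    called.saturating_sub(failed)
}
```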
pub fn tool_failure_rate(&self, name: &str) -> f64
Return the per-tool failure rate for the named tool.
Returns 0.0 if no calls have been recorded for that tool.
pub fn total_successful_tool_calls(&self) -> u64
Return the total number of successful tool calls (total minus failed).
Uses saturating subtraction so a race between total_tool_calls
and failed_tool_calls cannot produce an underflow.
pub fn is_zero(&self) -> bool
Return true if all counters are zero (no activity has been recorded).
pub fn avg_steps_per_session(&self) -> f64
Return the average number of ReAct steps per completed session.
Returns 0.0 when no sessions have been recorded, to avoid
division by zero.
pub fn error_rate(&self) -> f64
Return the overall tool error rate: failed_tool_calls / total_tool_calls.
Returns 0.0 when no tool calls have been recorded.
pub fn memory_recall_rate(&self) -> f64
Return memory recalls per completed session.
Returns 0.0 when no sessions have been recorded.
pub fn steps_per_session(&self) -> f64
Return the average number of ReAct steps per session.
Alias for avg_steps_per_session on the snapshot type; returns 0.0
when no sessions have been recorded.
pub fn has_errors(&self) -> bool
Return true if the snapshot contains any error indicators.
Specifically, true when failed_tool_calls > 0 or
checkpoint_errors > 0. This is narrower than !is_healthy(), which
also treats backpressure sheds as unhealthy.
pub fn is_healthy(&self) -> bool
Return true if the snapshot shows no error indicators.
A “healthy” snapshot has zero failed tool calls, zero backpressure sheds, and zero checkpoint errors. Useful for quick health checks in tests and monitoring.
pub fn is_healthy_with_latency(&self, max_latency_ms: f64) -> bool
Return true if this snapshot passes a parameterised health check.
The check passes when all of the following hold:
failed_tool_calls == 0
backpressure_shed_count == 0
checkpoint_errors == 0
step_latency_mean_ms <= max_latency_ms
Use this variant instead of is_healthy when you need to enforce an
explicit latency SLO — for example in an alerting callback.
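The four conditions above can be written as a single boolean expression. HealthView and passes_health_check are hypothetical names for this sketch, which mirrors the documented conditions on a reduced struct rather than the real MetricsSnapshot.

```rust
/// Hypothetical stand-in holding only the fields the health check reads.
#[derive(Debug, Clone, Copy)]
struct HealthView {
    failed_tool_calls: u64,
    backpressure_shed_count: u64,
    checkpoint_errors: u64,
    step_latency_mean_ms: f64,
}

/// True only when every error counter is zero and the mean step latency
/// is within the caller-supplied bound (inclusive).
fn passes_health_check(view: &HealthView, max_latency_ms: f64) -> bool {
    view.failed_tool_calls == 0
        && view.backpressure_shed_count == 0
        && view.checkpoint_errors == 0
        && view.step_latency_mean_ms <= max_latency_ms
}
```

Note that a NaN mean latency fails the comparison, so such a snapshot would be reported as unhealthy under this sketch.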
pub fn is_empty(&self) -> bool
Return true if no tool calls have been recorded yet.
A fresh snapshot (e.g. right after construction or after RuntimeMetrics::reset)
has all counters at zero. This predicate makes that condition explicit at call sites.
pub fn is_degraded(&self, threshold: f64) -> bool
Return true if the tool failure rate exceeds threshold.
threshold should be in [0.0, 1.0] (e.g. 0.1 for 10%). Returns
false when no tool calls have been recorded (failure_rate is 0.0 in
that case).
This is a softer signal than is_healthy, which requires zero failures,
zero backpressure sheds, and zero checkpoint errors. Use is_degraded in
alerting logic that needs a configurable SLO threshold.
pub fn tool_call_rate(&self) -> f64
Return the average number of tool calls per session.
Returns 0.0 when no sessions have been recorded.
pub fn backpressure_rate(&self) -> f64
Return the average number of backpressure shed events per session.
Returns 0.0 when no sessions have been recorded.
pub fn memory_efficiency(&self) -> f64
Return the ratio of memory recalls to total steps.
Returns 0.0 when no steps have been taken.
pub fn active_session_ratio(&self) -> f64
Return the fraction of sessions that are currently active.
Returns 0.0 when no sessions have been started.
pub fn step_to_tool_ratio(&self) -> f64
Return the average number of tool calls per step.
Returns 0.0 when no steps have been taken.
pub fn has_failures(&self) -> bool
Return true if any tool-call failures have been recorded.
pub fn tool_diversity(&self) -> usize
Return the number of distinct tool names that have been called at least once.
pub fn avg_failures_per_session(&self) -> f64
Return the average number of tool-call failures per completed session.
Returns 0.0 when no sessions have been recorded.
pub fn most_called_tool(&self) -> Option<String>
Return the name of the tool with the most recorded calls.
Returns None if no tool calls have been recorded.
pub fn tool_names_with_failures(&self) -> Vec<String>
Return a sorted list of tool names that have at least one recorded failure.
pub fn has_any_tool_failures(&self) -> bool
Return true if at least one tool has a recorded failure.
pub fn tools_with_zero_failures(&self) -> Vec<String>
Return sorted names of all tracked tools that have zero recorded failures.
A tool that has never been called is included if it appears in the
per_tool_calls map with a count of zero.
pub fn total_tool_calls_count(&self) -> u64
Return the sum of call counts across all tracked tools.
This is the per-tool sum, which may differ from total_tool_calls if
the snapshot was produced from multiple sources.
pub fn tool_call_imbalance(&self) -> f64
Return the ratio of the most-called tool’s count to the least-called tool’s count.
Returns 1.0 when fewer than two tools are tracked (no imbalance
measurable) or when the minimum is zero. A high ratio indicates that
load is concentrated on a single tool.
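The documented edge cases (fewer than two tools, or a minimum of zero) can be made concrete with a sketch over a per-tool call map. The free function below is an illustration of the described behaviour, not the crate's source.

```rust
use std::collections::HashMap;

/// Ratio of the largest per-tool call count to the smallest.
/// Returns 1.0 when fewer than two tools are tracked or the minimum is zero.
fn tool_call_imbalance(per_tool_calls: &HashMap<String, u64>) -> f64 {
    if per_tool_calls.len() < 2 {
        return 1.0;
    }
    let max = per_tool_calls.values().copied().max().unwrap_or(0);
    let min = per_tool_calls.values().copied().min().unwrap_or(0);
    if min == 0 {
        // Avoid dividing by zero; treat the imbalance as not measurable.
        1.0
    } else {
        max as f64 / min as f64
    }
}
```

For example, counts of 8 and 2 give a ratio of 4.0, meaning the busiest tool received four times the calls of the quietest one.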
pub fn failed_tool_ratio_for(&self, name: &str) -> f64
Return the failure rate for a specific tool (failures / calls).
Returns 0.0 if the tool has no recorded calls.
pub fn backpressure_shed_rate(&self) -> f64
Return the ratio of backpressure-shed events to total tool calls.
Returns 0.0 if no tool calls have been recorded.
pub fn total_agent_count(&self) -> usize
Return the number of distinct agents that have recorded tool-call data.
pub fn steps_per_tool_call(&self) -> f64
Return the ratio of total steps to total tool calls.
Returns 0.0 if no tool calls have been recorded.
pub fn agent_with_most_calls(&self) -> Option<String>
Return the agent id with the most total tool calls across all tools.
Returns None if no per-agent tool-call data has been recorded.
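One way to compute this from a nested agent_id → tool_name → count map is sketched below. The free function is hypothetical, and the tie-breaking rule (the lexicographically smallest agent id wins) is an assumption added for deterministic output; the documentation above does not specify how ties are resolved.

```rust
use std::collections::HashMap;

/// Agent id whose tool-call counts sum to the largest total.
/// Returns `None` when the map is empty.
fn agent_with_most_calls(
    per_agent_tool_calls: &HashMap<String, HashMap<String, u64>>,
) -> Option<String> {
    per_agent_tool_calls
        .iter()
        // Sum each agent's calls across all of its tools.
        .map(|(agent, tools)| (agent, tools.values().sum::<u64>()))
        // Assumption: on equal totals, prefer the smaller agent id.
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(agent, _)| agent.clone())
}
```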
pub fn total_tool_failures(&self) -> u64
Return the total number of tool failures summed across all tools.
This is the sum of per_tool_failures values and equals
failed_tool_calls when per-tool tracking is complete. Useful for
verifying that failure tracking is consistent with overall counters.
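The consistency check mentioned above amounts to comparing the per-tool sum with the overall counter. A minimal sketch, using a hypothetical helper name and taking the two values as plain arguments:

```rust
use std::collections::HashMap;

/// True when the per-tool failure counts add up to the overall counter.
fn per_tool_failures_consistent(
    per_tool_failures: &HashMap<String, u64>,
    failed_tool_calls: u64,
) -> bool {
    per_tool_failures.values().sum::<u64>() == failed_tool_calls
}
```

Such a check is probably most useful in tests, where a mismatch would point to a code path that bumps one counter but not the other.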
pub fn least_called_tool(&self) -> Option<String>
Return the name of the tool with the fewest recorded calls.
Returns None if no tool-call data has been recorded. When multiple
tools share the minimum call count, any one of them may be returned.
pub fn avg_tool_calls_per_name(&self) -> f64
Return the mean number of calls per distinct tool name.
Returns 0.0 when no tool-call data has been recorded.
pub fn tool_call_count_above(&self, n: u64) -> usize
Return the number of distinct tool names that have more than n recorded calls.
Returns 0 when no tool-call data has been recorded.
pub fn top_n_tools_by_calls(&self, n: usize) -> Vec<(&str, u64)>
Return the top n tool names sorted by call count (descending).
Returns fewer than n entries if fewer tools have been called.
Ties are broken alphabetically (ascending) for deterministic output.
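The ordering rule (count descending, then name ascending) can be expressed with a two-level comparator. The free function below is a sketch of that rule over a plain map, not the crate's implementation.

```rust
use std::collections::HashMap;

/// Up to `n` (tool_name, count) pairs, highest count first.
/// Equal counts are ordered by ascending tool name.
fn top_n_tools_by_calls(per_tool_calls: &HashMap<String, u64>, n: usize) -> Vec<(&str, u64)> {
    let mut entries: Vec<(&str, u64)> = per_tool_calls
        .iter()
        .map(|(name, count)| (name.as_str(), *count))
        .collect();
    // Compare counts in reverse for descending order, then names ascending.
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    entries.truncate(n);
    entries
}
```

Because HashMap iteration order is unspecified, the explicit name tie-break is what makes the output reproducible between runs.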
pub fn tool_call_ratio(&self, name: &str) -> f64
Return the fraction of total tool calls accounted for by name.
Returns 0.0 if total_tool_calls is zero or name has no recorded
calls. Returns a value in [0.0, 1.0].
pub fn per_tool_calls_sorted(&self) -> Vec<(String, u64)>
Return all per-tool call counts sorted by count descending.
Returns a Vec of (tool_name, count) pairs where the first entry is
the most-called tool. Returns an empty Vec when no calls have been
recorded. Ties are broken alphabetically (ascending).
pub fn has_tool(&self, name: &str) -> bool
Return true if name appears in the per-tool call map (i.e., was
called at least once), false otherwise.
pub fn distinct_tool_count(&self) -> usize
Return the number of distinct tool names that have at least one recorded call.
Returns 0 when no tool calls have been recorded.
pub fn has_any_tool_calls(&self) -> bool
Return true if at least one tool call has been recorded.
Equivalent to self.total_tool_calls > 0, provided as a convenience
predicate for guard clauses.
pub fn tool_names_alphabetical(&self) -> Vec<String>
Return tool names sorted alphabetically.
Only names that appear in the per_tool_calls map are included.
Returns an empty Vec when no tool calls have been recorded.
pub fn avg_failures_per_tool(&self) -> f64
Return the average number of failures per distinct tool.
Computed as total recorded failures divided by the number of distinct
tool names in per_tool_calls. Returns 0.0 when no tool calls have
been recorded.
pub fn tools_above_failure_ratio(&self, threshold: f64) -> Vec<String>
Return the names of tools whose failure ratio (failures / calls) exceeds
threshold, sorted alphabetically.
Returns an empty Vec when no tool exceeds the threshold or when no
tool calls have been recorded.
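A sketch of the filtering step over the two per-tool maps follows. The free function is hypothetical; it treats a tool with zero recorded calls as having a ratio of 0.0, in line with the per-tool ratio methods documented on this page.

```rust
use std::collections::HashMap;

/// Names of tools whose failures / calls ratio is strictly above `threshold`,
/// sorted alphabetically. Tools with zero calls are never included.
fn tools_above_failure_ratio(
    per_tool_calls: &HashMap<String, u64>,
    per_tool_failures: &HashMap<String, u64>,
    threshold: f64,
) -> Vec<String> {
    let mut names: Vec<String> = per_tool_calls
        .iter()
        .filter(|(name, count)| {
            let calls = **count;
            let failed = per_tool_failures.get(name.as_str()).copied().unwrap_or(0);
            calls > 0 && (failed as f64 / calls as f64) > threshold
        })
        .map(|(name, _)| name.clone())
        .collect();
    names.sort();
    names
}
```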
pub fn failure_ratio_for_tool(&self, name: &str) -> f64
Return the failure ratio for a specific tool: failures / calls.
Returns 0.0 if the tool has never been called or is unknown, avoiding
division-by-zero. A ratio of 1.0 means every invocation failed.
pub fn any_tool_exceeds_calls(&self, threshold: u64) -> bool
Return true if any registered tool has a call count strictly above
threshold.
Useful for detecting hotspot tools that may be responsible for disproportionate load.
pub fn total_unique_tools(&self) -> usize
Return the number of distinct tools that have been tracked in this snapshot (i.e. tools with at least one call recorded).
Equivalent to per_tool_calls.len() but exposed as a named method for
readability.
pub fn tool_call_ratio_for(&self, name: &str) -> f64
Return the fraction of all tool calls that were made to the named tool.
Returns 0.0 when the tool is unknown or there have been no tool calls
at all. A value of 1.0 means this tool accounts for every call.
pub fn total_failures_across_all_tools(&self) -> u64
Return the sum of all per-tool failure counts across every tracked tool.
This is the total number of error observations emitted by tool handlers,
regardless of which tool generated them. Returns 0 when no failures
have been recorded.
Trait Implementations§
impl Clone for MetricsSnapshot
fn clone(&self) -> MetricsSnapshot
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.