Struct MetricsSnapshot

pub struct MetricsSnapshot {
    pub active_sessions: usize,
    pub total_sessions: u64,
    pub total_steps: u64,
    pub total_tool_calls: u64,
    pub failed_tool_calls: u64,
    pub backpressure_shed_count: u64,
    pub memory_recall_count: u64,
    pub checkpoint_errors: u64,
    pub per_tool_calls: HashMap<String, u64>,
    pub per_tool_failures: HashMap<String, u64>,
    pub step_latency_buckets: Vec<(u64, u64)>,
    pub step_latency_mean_ms: f64,
    pub per_agent_tool_calls: HashMap<String, HashMap<String, u64>>,
    pub per_agent_tool_failures: HashMap<String, HashMap<String, u64>>,
}

A point-in-time snapshot of all runtime counters.

Obtained by calling RuntimeMetrics::snapshot. All fields are plain owned values (integers, a float, and standard collections), so the snapshot can be logged, serialised, or diffed without holding any locks.

In addition to the scalar counters, the snapshot carries per-tool and per-agent maps and the step-latency histogram.

§Example

use llm_agent_runtime::metrics::RuntimeMetrics;

let m = RuntimeMetrics::new();
let snap = m.snapshot();
assert_eq!(snap.active_sessions, 0);
assert_eq!(snap.total_sessions, 0);

Fields§

§active_sessions: usize

Number of agent sessions currently in progress.

§total_sessions: u64

Total number of sessions started since the runtime was created.

§total_steps: u64

Total number of ReAct steps executed across all sessions.

§total_tool_calls: u64

Total number of tool calls dispatched (across all tool names).

§failed_tool_calls: u64

Total number of tool calls that returned an error observation.

§backpressure_shed_count: u64

Total number of requests shed due to backpressure.

§memory_recall_count: u64

Total number of memory recall operations.

§checkpoint_errors: u64

Total number of checkpoint failures encountered during run_agent.

§per_tool_calls: HashMap<String, u64>

Per-tool call counts: tool_name → total_calls.

§per_tool_failures: HashMap<String, u64>

Per-tool failure counts: tool_name → failed_calls.

§step_latency_buckets: Vec<(u64, u64)>

Step latency histogram bucket counts as (upper_bound_ms_inclusive, count).
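
As a rough illustration of how these buckets might be consumed, the sketch below estimates a latency quantile from the (upper_bound_ms_inclusive, count) pairs. It assumes the counts are per-bucket (not cumulative) and that buckets are sorted by ascending upper bound; neither is stated above. The helper name approx_quantile_ms is hypothetical and not part of the crate.

```rust
/// Estimate the latency at quantile `q` (0.0..=1.0) from histogram buckets.
///
/// Assumption: each entry is `(upper_bound_ms_inclusive, count)` with
/// per-bucket (non-cumulative) counts, sorted by ascending upper bound.
/// Returns the upper bound of the bucket that contains the target rank,
/// or `None` when the histogram holds no observations.
fn approx_quantile_ms(buckets: &[(u64, u64)], q: f64) -> Option<u64> {
    let total: u64 = buckets.iter().map(|&(_, count)| count).sum();
    if total == 0 {
        return None;
    }
    // Rank of the target observation; at least 1 so q = 0.0 maps to the first one.
    let target = ((q.clamp(0.0, 1.0) * total as f64).ceil() as u64).max(1);
    let mut seen = 0u64;
    for &(upper_bound_ms, count) in buckets {
        seen += count;
        if seen >= target {
            return Some(upper_bound_ms);
        }
    }
    // Not reached when the counts sum to `total`; kept as a safe fallback.
    buckets.last().map(|&(upper_bound_ms, _)| upper_bound_ms)
}
```

Because only bucket bounds are known, the result is an upper estimate: the true value lies somewhere at or below the returned bound.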

§step_latency_mean_ms: f64

Mean step latency in milliseconds.

§per_agent_tool_calls: HashMap<String, HashMap<String, u64>>

Per-agent, per-tool call counts: agent_id → tool_name → count.

§per_agent_tool_failures: HashMap<String, HashMap<String, u64>>

Per-agent, per-tool failure counts: agent_id → tool_name → count.

Implementations§

impl MetricsSnapshot

pub fn delta(after: &Self, before: &Self) -> Self

Compute the difference between after and before (i.e., after - before).

Useful for per-request instrumentation:

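// Assumes MetricsSnapshot is exported from the same module as RuntimeMetrics.
use llm_agent_runtime::metrics::{MetricsSnapshot, RuntimeMetrics};

let metrics = RuntimeMetrics::new();
let before = metrics.snapshot();
// ... run one agent invocation ...
let after = metrics.snapshot();
let delta = MetricsSnapshot::delta(&after, &before);
println!("steps this run: {}", delta.total_steps);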

Saturating subtraction is used so callers don’t need to guard against races where a counter is read before the full increment propagates.
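
To make the saturating behaviour concrete, here is a minimal sketch over a stand-in type. Counters is a hypothetical struct that mirrors three of the documented fields; it is not the crate's type. How the real delta treats map entries, active_sessions, or the latency fields is not specified above, so the map handling shown here is an assumption.

```rust
use std::collections::HashMap;

/// Stand-in for a few `MetricsSnapshot` fields (illustrative only).
#[derive(Debug, Clone, Default, PartialEq)]
struct Counters {
    total_steps: u64,
    total_tool_calls: u64,
    per_tool_calls: HashMap<String, u64>,
}

/// Compute `after - before` field by field using saturating subtraction,
/// so a counter that appears to go backwards yields 0 instead of wrapping.
fn delta(after: &Counters, before: &Counters) -> Counters {
    // Assumption: tools missing from `before` are treated as having 0 calls.
    let per_tool_calls = after
        .per_tool_calls
        .iter()
        .map(|(name, &count)| {
            let earlier = before.per_tool_calls.get(name).copied().unwrap_or(0);
            (name.clone(), count.saturating_sub(earlier))
        })
        .collect();
    Counters {
        total_steps: after.total_steps.saturating_sub(before.total_steps),
        total_tool_calls: after.total_tool_calls.saturating_sub(before.total_tool_calls),
        per_tool_calls,
    }
}
```

Swapping the arguments by mistake produces zeros rather than a panic or a wrapped value, which is the property the description above relies on.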

pub fn to_json(&self) -> Value

Serialize the snapshot to a serde_json::Value for logging or export.

pub fn tool_call_count(&self, name: &str) -> u64

Return the number of calls recorded for the named tool.

Returns 0 if no calls have been recorded for that tool name.

pub fn summary_line(&self) -> String

Return a concise single-line summary of this snapshot.

Format: "sessions={n}, steps={n}, tool_calls={n}, failures={n}, latency_mean={n}ms". Intended for logging and debugging — not a stable serialization format.

pub fn tool_failure_count(&self, name: &str) -> u64

Return the number of failures recorded for the named tool.

Returns 0 if no failures have been recorded for that tool name.

pub fn tool_names(&self) -> Vec<&str>

Return a sorted list of tool names that have at least one recorded call.

pub fn failure_rate(&self) -> f64

Return the overall tool-call failure rate as a value in [0.0, 1.0].

Returns 0.0 if no tool calls have been recorded.

pub fn success_rate(&self) -> f64

Return the overall tool-call success rate as a value in [0.0, 1.0].

Returns 1.0 if no tool calls have been recorded (vacuously all succeeded).
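
The zero-call conventions for the two rates can be sketched as free functions over the raw counters. These are illustrative helpers, not the crate's code; the clamp to 1.0 is an added assumption to keep the result in range if the two counters were read at slightly different moments.

```rust
/// Failure rate in [0.0, 1.0]; 0.0 when no tool calls have been recorded.
fn failure_rate(failed_tool_calls: u64, total_tool_calls: u64) -> f64 {
    if total_tool_calls == 0 {
        return 0.0;
    }
    // Assumption: clamp in case a racy read saw more failures than calls.
    (failed_tool_calls as f64 / total_tool_calls as f64).min(1.0)
}

/// Success rate in [0.0, 1.0]; 1.0 when no tool calls have been recorded,
/// because the complement of a 0.0 failure rate is 1.0.
fn success_rate(failed_tool_calls: u64, total_tool_calls: u64) -> f64 {
    1.0 - failure_rate(failed_tool_calls, total_tool_calls)
}
```

Defining success as the complement of failure keeps the two values summing to 1.0 in every case, including the empty one.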

pub fn tool_success_count(&self, name: &str) -> u64

Return the number of successful calls for the named tool.

Computed as tool_call_count(name) - tool_failure_count(name).

pub fn tool_failure_rate(&self, name: &str) -> f64

Return the per-tool failure rate for the named tool.

Returns 0.0 if no calls have been recorded for that tool.

pub fn total_successful_tool_calls(&self) -> u64

Return the total number of successful tool calls (total minus failed).

Uses saturating subtraction so a race between total_tool_calls and failed_tool_calls cannot produce an underflow.

pub fn is_zero(&self) -> bool

Return true if all counters are zero (no activity has been recorded).

pub fn avg_steps_per_session(&self) -> f64

Return the average number of ReAct steps per completed session.

Returns 0.0 when no sessions have been recorded, to avoid division by zero.

pub fn error_rate(&self) -> f64

Return the overall tool error rate: failed_tool_calls / total_tool_calls.

Returns 0.0 when no tool calls have been recorded.

pub fn memory_recall_rate(&self) -> f64

Return memory recalls per completed session.

Returns 0.0 when no sessions have been recorded.

pub fn steps_per_session(&self) -> f64

Return the average number of ReAct steps per session.

Alias for avg_steps_per_session on the snapshot type; returns 0.0 when no sessions have been recorded.

pub fn has_errors(&self) -> bool

Return true if the snapshot contains any error indicators.

Specifically, returns true when failed_tool_calls > 0 or checkpoint_errors > 0. This is not equivalent to !is_healthy(), which also counts backpressure sheds as unhealthy.

pub fn is_healthy(&self) -> bool

Return true if the snapshot shows no error indicators.

A “healthy” snapshot has zero failed tool calls, zero backpressure sheds, and zero checkpoint errors. Useful for quick health checks in tests and monitoring.

pub fn is_healthy_with_latency(&self, max_latency_ms: f64) -> bool

Return true if this snapshot passes a parameterised health check.

The check passes when all of the following hold:

failed_tool_calls == 0
backpressure_shed_count == 0
checkpoint_errors == 0
step_latency_mean_ms <= max_latency_ms

Use this variant instead of is_healthy when you need to enforce an explicit latency SLO — for example in an alerting callback.
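
The relationship between has_errors, is_healthy, and is_healthy_with_latency can be sketched from the conditions listed in their descriptions. Health is a hypothetical stand-in holding only the fields those predicates read; the bodies below follow the documented conditions and are not copied from the crate.

```rust
/// Stand-in for the fields the health predicates read (illustrative only).
#[derive(Debug, Clone, Copy, Default)]
struct Health {
    failed_tool_calls: u64,
    backpressure_shed_count: u64,
    checkpoint_errors: u64,
    step_latency_mean_ms: f64,
}

impl Health {
    /// True when a tool call or a checkpoint has failed; sheds are ignored.
    fn has_errors(&self) -> bool {
        self.failed_tool_calls > 0 || self.checkpoint_errors > 0
    }

    /// True when there are no errors and no backpressure sheds.
    fn is_healthy(&self) -> bool {
        !self.has_errors() && self.backpressure_shed_count == 0
    }

    /// `is_healthy` plus an upper bound on the mean step latency.
    fn is_healthy_with_latency(&self, max_latency_ms: f64) -> bool {
        self.is_healthy() && self.step_latency_mean_ms <= max_latency_ms
    }
}
```

A snapshot with only backpressure sheds shows why the predicates differ: has_errors is false, yet is_healthy is also false.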

pub fn is_empty(&self) -> bool

Return true if no tool calls have been recorded yet.

A fresh snapshot (e.g. right after construction or after RuntimeMetrics::reset) has all counters at zero. This predicate makes that condition explicit at call sites.

pub fn is_degraded(&self, threshold: f64) -> bool

Return true if the tool failure rate exceeds threshold.

threshold should be in [0.0, 1.0] (e.g. 0.1 for 10%). Returns false when no tool calls have been recorded (failure_rate is 0.0 in that case).

This is a softer signal than is_healthy, which requires zero failed tool calls, zero backpressure sheds, and zero checkpoint errors. Use is_degraded in alerting logic that needs a configurable SLO threshold.

pub fn tool_call_rate(&self) -> f64

Return the average number of tool calls per session.

Returns 0.0 when no sessions have been recorded.

pub fn backpressure_rate(&self) -> f64

Return the average number of backpressure shed events per session.

Returns 0.0 when no sessions have been recorded.

pub fn memory_efficiency(&self) -> f64

Return the ratio of memory recalls to total steps.

Returns 0.0 when no steps have been taken.

pub fn active_session_ratio(&self) -> f64

Return the fraction of sessions that are currently active.

Returns 0.0 when no sessions have been started.

pub fn step_to_tool_ratio(&self) -> f64

Return the average number of tool calls per step.

Returns 0.0 when no steps have been taken.

pub fn has_failures(&self) -> bool

Return true if any tool-call failures have been recorded.

pub fn tool_diversity(&self) -> usize

Return the number of distinct tool names that have been called at least once.

pub fn avg_failures_per_session(&self) -> f64

Return the average number of tool-call failures per completed session.

Returns 0.0 when no sessions have been recorded.

pub fn most_called_tool(&self) -> Option<String>

Return the name of the tool with the most recorded calls.

Returns None if no tool calls have been recorded.

pub fn tool_names_with_failures(&self) -> Vec<String>

Return a sorted list of tool names that have at least one recorded failure.

pub fn has_any_tool_failures(&self) -> bool

Return true if at least one tool has a recorded failure.

pub fn tools_with_zero_failures(&self) -> Vec<String>

Return sorted names of all tracked tools that have zero recorded failures.

A tool that has never been called is included if it appears in the per_tool_calls map with a count of zero.

pub fn total_tool_calls_count(&self) -> u64

Return the sum of call counts across all tracked tools.

This is the per-tool sum, which may differ from total_tool_calls if the snapshot was produced from multiple sources.

pub fn tool_call_imbalance(&self) -> f64

Return the ratio of the most-called tool’s count to the least-called tool’s count.

Returns 1.0 when fewer than two tools are tracked (no imbalance measurable) or when the minimum is zero. A high ratio indicates that load is concentrated on a single tool.
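
The fallback rules for the imbalance ratio are easy to get wrong, so here is a small sketch of them over a plain map. It is written from the description above as a free function and is not the crate's implementation.

```rust
use std::collections::HashMap;

/// Ratio of the largest per-tool call count to the smallest.
/// Returns 1.0 when fewer than two tools are tracked or the minimum is zero.
fn tool_call_imbalance(per_tool_calls: &HashMap<String, u64>) -> f64 {
    if per_tool_calls.len() < 2 {
        return 1.0;
    }
    let max = per_tool_calls.values().copied().max().unwrap_or(0);
    let min = per_tool_calls.values().copied().min().unwrap_or(0);
    if min == 0 {
        // Avoid dividing by zero; treated as "no measurable imbalance".
        return 1.0;
    }
    max as f64 / min as f64
}
```

Note that a zero-count tool hides the imbalance entirely under these rules, so a result of 1.0 does not by itself prove the load is even.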

pub fn failed_tool_ratio_for(&self, name: &str) -> f64

Return the failure rate for a specific tool (failures / calls).

Returns 0.0 if the tool has no recorded calls.

pub fn backpressure_shed_rate(&self) -> f64

Return the ratio of backpressure-shed events to total tool calls.

Returns 0.0 if no tool calls have been recorded.

pub fn total_agent_count(&self) -> usize

Return the number of distinct agents that have recorded tool-call data.

pub fn steps_per_tool_call(&self) -> f64

Return the ratio of total steps to total tool calls.

Returns 0.0 if no tool calls have been recorded.

pub fn agent_with_most_calls(&self) -> Option<String>

Return the agent id with the most total tool calls across all tools.

Returns None if no per-agent tool-call data has been recorded.
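
One way to read this over the nested agent_id → tool_name → count map is to sum each agent's counts and take the maximum. The sketch below does that as a free function. The tie-break (alphabetically smallest agent id wins) is an assumption added for determinism; the description above does not say how ties are resolved.

```rust
use std::collections::HashMap;

/// Return the agent id whose tool-call counts sum to the largest total.
/// Returns `None` when the map is empty.
fn agent_with_most_calls(
    per_agent_tool_calls: &HashMap<String, HashMap<String, u64>>,
) -> Option<String> {
    per_agent_tool_calls
        .iter()
        .map(|(agent_id, tools)| (agent_id, tools.values().sum::<u64>()))
        // Highest total wins; on a tie the alphabetically smaller id is
        // treated as greater (assumption, for deterministic output).
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(agent_id, _)| agent_id.clone())
}
```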

pub fn total_tool_failures(&self) -> u64

Return the total number of tool failures summed across all tools.

This is the sum of per_tool_failures values and equals failed_tool_calls when per-tool tracking is complete. Useful for verifying that failure tracking is consistent with overall counters.
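
A consistency check of the kind mentioned above might look like the following sketch. It takes the raw fields as arguments so that it stays self-contained; per_tool_sums_match is a hypothetical helper name, not a method on the snapshot.

```rust
use std::collections::HashMap;

/// Check that the per-tool maps add up to the overall counters.
fn per_tool_sums_match(
    total_tool_calls: u64,
    failed_tool_calls: u64,
    per_tool_calls: &HashMap<String, u64>,
    per_tool_failures: &HashMap<String, u64>,
) -> bool {
    let summed_calls: u64 = per_tool_calls.values().sum();
    let summed_failures: u64 = per_tool_failures.values().sum();
    summed_calls == total_tool_calls && summed_failures == failed_tool_calls
}
```

In a test this could be asserted after a run, passing the snapshot's total_tool_calls, failed_tool_calls, per_tool_calls, and per_tool_failures fields.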

pub fn least_called_tool(&self) -> Option<String>

Return the name of the tool with the fewest recorded calls.

Returns None if no tool-call data has been recorded. When multiple tools share the minimum call count, any one of them may be returned.

pub fn avg_tool_calls_per_name(&self) -> f64

Return the mean number of calls per distinct tool name.

Returns 0.0 when no tool-call data has been recorded.

pub fn tool_call_count_above(&self, n: u64) -> usize

Return the number of distinct tool names that have more than n recorded calls.

Returns 0 when no tool-call data has been recorded.

pub fn top_n_tools_by_calls(&self, n: usize) -> Vec<(&str, u64)>

Return the top n tool names sorted by call count (descending).

Returns fewer than n entries if fewer tools have been called. Ties are broken alphabetically (ascending) for deterministic output.
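
The ordering described here (count descending, then name ascending) can be expressed with a two-key comparator. The sketch below is a free function over a plain map that borrows the names, matching the documented return type; it is an illustration, not the crate's code.

```rust
use std::collections::HashMap;

/// Return up to `n` `(tool_name, count)` pairs, most-called first.
/// Ties on count are ordered alphabetically so the output is deterministic.
fn top_n_tools_by_calls(per_tool_calls: &HashMap<String, u64>, n: usize) -> Vec<(&str, u64)> {
    let mut entries: Vec<(&str, u64)> = per_tool_calls
        .iter()
        .map(|(name, &count)| (name.as_str(), count))
        .collect();
    // Primary key: count descending. Secondary key: name ascending.
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    entries.truncate(n);
    entries
}
```

Without the secondary key, the result for tied counts would depend on HashMap iteration order, which varies between runs.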

pub fn tool_call_ratio(&self, name: &str) -> f64

Return the fraction of total tool calls accounted for by name.

Returns 0.0 if total_tool_calls is zero or name has no recorded calls. Returns a value in [0.0, 1.0].

pub fn per_tool_calls_sorted(&self) -> Vec<(String, u64)>

Return all per-tool call counts sorted by count descending.

Returns a Vec of (tool_name, count) pairs where the first entry is the most-called tool. Returns an empty Vec when no calls have been recorded. Ties are broken alphabetically (ascending).

pub fn has_tool(&self, name: &str) -> bool

Return true if name appears in the per-tool call map (i.e., was called at least once), false otherwise.

pub fn tool_call_share(&self, name: &str) -> f64

Return the fraction of total tool calls attributable to name.

Returns 0.0 when total_tool_calls is zero or when name has no recorded calls. The result is in [0.0, 1.0].

pub fn distinct_tool_count(&self) -> usize

Return the number of distinct tool names that have at least one recorded call.

Returns 0 when no tool calls have been recorded.

pub fn has_any_tool_calls(&self) -> bool

Return true if at least one tool call has been recorded.

Equivalent to self.total_tool_calls > 0, provided as a convenience predicate for guard clauses.

pub fn tool_names_alphabetical(&self) -> Vec<String>

Return tool names sorted alphabetically.

Only names that appear in the per_tool_calls map are included. Returns an empty Vec when no tool calls have been recorded.

pub fn avg_failures_per_tool(&self) -> f64

Return the average number of failures per distinct tool.

Computed as total recorded failures divided by the number of distinct tool names in per_tool_calls. Returns 0.0 when no tool calls have been recorded.

pub fn tools_above_failure_ratio(&self, threshold: f64) -> Vec<String>

Return the names of tools whose failure ratio (failures / calls) exceeds threshold, sorted alphabetically.

Returns an empty Vec when no tool exceeds the threshold or when no tool calls have been recorded.
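
A sketch of this filter over the two per-tool maps follows. It assumes a tool absent from the failures map has zero failures and skips tools with zero calls to avoid dividing by zero; both choices are assumptions consistent with, but not stated in, the description above.

```rust
use std::collections::HashMap;

/// Names of tools whose failures / calls ratio is strictly above `threshold`,
/// sorted alphabetically.
fn tools_above_failure_ratio(
    per_tool_calls: &HashMap<String, u64>,
    per_tool_failures: &HashMap<String, u64>,
    threshold: f64,
) -> Vec<String> {
    let mut names: Vec<String> = per_tool_calls
        .iter()
        .filter(|&(name, &calls)| {
            // Assumption: a tool missing from the failures map has 0 failures.
            let failed = per_tool_failures.get(name).copied().unwrap_or(0);
            calls > 0 && (failed as f64 / calls as f64) > threshold
        })
        .map(|(name, _)| name.clone())
        .collect();
    names.sort();
    names
}
```

Because the comparison is strict, a threshold of 1.0 never matches any tool, even one whose every call failed.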

pub fn failure_ratio_for_tool(&self, name: &str) -> f64

Return the failure ratio for a specific tool: failures / calls.

Returns 0.0 if the tool has never been called or is unknown, avoiding division-by-zero. A ratio of 1.0 means every invocation failed.

pub fn any_tool_exceeds_calls(&self, threshold: u64) -> bool

Return true if any registered tool has a call count strictly above threshold.

Useful for detecting hotspot tools that may be responsible for disproportionate load.

pub fn total_unique_tools(&self) -> usize

Return the number of distinct tools that have been tracked in this snapshot (i.e. tools with at least one call recorded).

Equivalent to per_tool_calls.len() but exposed as a named method for readability.

pub fn tool_call_ratio_for(&self, name: &str) -> f64

Return the fraction of all tool calls that were made to the named tool.

Returns 0.0 when the tool is unknown or there have been no tool calls at all. A value of 1.0 means this tool accounts for every call.

pub fn total_failures_across_all_tools(&self) -> u64

Return the sum of all per-tool failure counts across every tracked tool.

This is the total number of error observations emitted by tool handlers, regardless of which tool generated them. Returns 0 when no failures have been recorded.