pub struct RuntimeMetrics {
    pub active_sessions: AtomicUsize,
    pub total_sessions: AtomicU64,
    pub total_steps: AtomicU64,
    pub total_tool_calls: AtomicU64,
    pub failed_tool_calls: AtomicU64,
    pub backpressure_shed_count: AtomicU64,
    pub memory_recall_count: AtomicU64,
    pub checkpoint_errors: AtomicU64,
    pub step_latency: LatencyHistogram,
    /* private fields */
}
Shared runtime metrics. Clone the Arc to share across threads.
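A minimal sketch of the sharing pattern described above, written against a stand-in struct rather than the crate's own type. `SketchMetrics` and `run_workers` are hypothetical names introduced for illustration; only the atomic field types mirror the declaration.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Stand-in for RuntimeMetrics (hypothetical; only two fields are modelled).
#[derive(Default)]
struct SketchMetrics {
    active_sessions: AtomicUsize,
    total_steps: AtomicU64,
}

// Spawn `workers` threads that each record `steps_each` steps on one shared handle.
fn run_workers(metrics: &Arc<SketchMetrics>, workers: usize, steps_each: u64) {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc copies the pointer, not the counters.
            let m = Arc::clone(metrics);
            thread::spawn(move || {
                m.active_sessions.fetch_add(1, Ordering::Relaxed);
                for _ in 0..steps_each {
                    m.total_steps.fetch_add(1, Ordering::Relaxed);
                }
                m.active_sessions.fetch_sub(1, Ordering::Relaxed);
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
}
```

Because every field is an atomic, the workers need only a shared reference; no lock is taken around the counter updates.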
Fields
active_sessions: AtomicUsize
Number of agent sessions currently in progress.
total_sessions: AtomicU64
Total number of sessions started since the runtime was created.
total_steps: AtomicU64
Total number of ReAct steps executed across all sessions.
total_tool_calls: AtomicU64
Total number of tool calls dispatched (across all tool names).
failed_tool_calls: AtomicU64
Total number of tool calls that returned an error observation.
backpressure_shed_count: AtomicU64
Total number of requests shed due to backpressure.
memory_recall_count: AtomicU64
Total number of memory recall operations.
checkpoint_errors: AtomicU64
Total number of checkpoint failures encountered during run_agent.
step_latency: LatencyHistogram
Per-step latency histogram.
Implementations
impl RuntimeMetrics
pub fn active_sessions(&self) -> usize
Return the number of agent sessions currently in progress.
pub fn total_sessions(&self) -> u64
Return the total number of sessions started since the runtime was created.
pub fn avg_tool_calls_per_session(&self) -> f64
Return the average number of tool calls per completed session.
Returns 0.0 when no sessions have been recorded.
pub fn total_steps(&self) -> u64
Return the total number of ReAct steps executed across all sessions.
pub fn avg_steps_per_session(&self) -> f64
Return the average number of ReAct steps per completed session.
Returns 0.0 when no sessions have been recorded.
pub fn total_tool_calls(&self) -> u64
Return the total number of tool calls dispatched.
pub fn failed_tool_calls(&self) -> u64
Return the total number of tool calls that returned an error observation.
pub fn tool_success_rate(&self) -> f64
Return the fraction of tool calls that succeeded (i.e. did not fail).
Returns 1.0 if no tool calls have been recorded yet (vacuously all
succeeded) and a value in [0.0, 1.0] once calls have been made.
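The zero-call convention above can be sketched as a free function. `success_rate` is a hypothetical helper that takes plain integers; it is not the crate's implementation, and the saturating subtraction is an added assumption to keep the result inside [0.0, 1.0] if the two counters are read at slightly different moments.

```rust
// Hypothetical helper mirroring the documented contract of tool_success_rate.
fn success_rate(total_calls: u64, failed_calls: u64) -> f64 {
    if total_calls == 0 {
        // No calls yet: vacuously all succeeded.
        return 1.0;
    }
    // Assumption: clamp at zero if failures momentarily exceed the total.
    let succeeded = total_calls.saturating_sub(failed_calls);
    succeeded as f64 / total_calls as f64
}
```

For example, `success_rate(4, 1)` evaluates to 0.75, and `success_rate(0, 0)` evaluates to 1.0.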
pub fn backpressure_shed_count(&self) -> u64
Return the total number of requests shed due to backpressure.
pub fn memory_recall_count(&self) -> u64
Return the total number of memory recall operations performed.
pub fn checkpoint_errors(&self) -> u64
Return the total number of checkpoint failures encountered during run_agent.
pub fn checkpoint_error_rate(&self) -> f64
Return the ratio of checkpoint errors to total completed sessions.
Returns 0.0 when no sessions have been recorded.
pub fn p50_latency_ms(&self) -> u64
Return the median (50th-percentile) step latency in milliseconds.
Convenience shorthand for self.step_latency.p50(). Returns 0
when no step latencies have been recorded.
pub fn record_tool_call(&self, tool_name: &str)
Increment the call counter for tool_name by 1.
Called automatically by the agent loop when with_metrics is configured.
pub fn record_tool_failure(&self, tool_name: &str)
Increment the failure counter for tool_name by 1.
Called automatically by the agent loop when a tool returns an error.
pub fn per_tool_calls_snapshot(&self) -> HashMap<String, u64>
Return a snapshot of per-tool call counts as a HashMap<tool_name, count>.
pub fn per_tool_failures_snapshot(&self) -> HashMap<String, u64>
Return a snapshot of per-tool failure counts as a HashMap<tool_name, count>.
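One way the record and snapshot pair could behave is sketched below. The private fields of RuntimeMetrics are not shown in this page, so the `Mutex<HashMap<..>>` storage and the `PerToolCounters` type are assumptions made purely for illustration.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-in for the per-tool call counter.
#[derive(Default)]
struct PerToolCounters {
    calls: Mutex<HashMap<String, u64>>,
}

impl PerToolCounters {
    // Increment the call counter for `tool_name` by 1.
    fn record_tool_call(&self, tool_name: &str) {
        let mut map = self.calls.lock().unwrap();
        *map.entry(tool_name.to_string()).or_insert(0) += 1;
    }

    // Return an owned copy of the map; later records do not change it.
    fn per_tool_calls_snapshot(&self) -> HashMap<String, u64> {
        self.calls.lock().unwrap().clone()
    }
}
```

Returning an owned `HashMap` means the caller can inspect the counts without holding any lock, at the cost of one clone per snapshot.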
pub fn record_agent_tool_call(&self, agent_id: &str, tool_name: &str)
Increment the call counter for the (agent_id, tool_name) pair.
pub fn record_agent_tool_failure(&self, agent_id: &str, tool_name: &str)
Increment the failure counter for the (agent_id, tool_name) pair.
pub fn per_agent_tool_calls_snapshot(&self) -> HashMap<String, HashMap<String, u64>>
Snapshot of per-agent, per-tool call counts.
pub fn per_agent_tool_failures_snapshot(&self) -> HashMap<String, HashMap<String, u64>>
Snapshot of per-agent, per-tool failure counts.
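The nested return type can be worked with as in the sketch below. It assumes the outer key is the agent ID and the inner key is the tool name, which the page does not state explicitly; `AgentToolCounts`, `record`, and `calls_for_agent` are hypothetical names.

```rust
use std::collections::HashMap;

// Assumed layout: agent ID -> (tool name -> count).
type AgentToolCounts = HashMap<String, HashMap<String, u64>>;

// Increment the counter for one (agent_id, tool_name) pair.
fn record(counts: &mut AgentToolCounts, agent_id: &str, tool_name: &str) {
    *counts
        .entry(agent_id.to_string())
        .or_default()
        .entry(tool_name.to_string())
        .or_insert(0) += 1;
}

// Sum every tool counter recorded for one agent; 0 if the agent is unknown.
fn calls_for_agent(counts: &AgentToolCounts, agent_id: &str) -> u64 {
    counts
        .get(agent_id)
        .map_or(0, |tools| tools.values().sum::<u64>())
}
```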
pub fn snapshot(&self) -> MetricsSnapshot
Capture a complete snapshot of all counters, including per-tool breakdowns.
This is the preferred alternative to to_snapshot: it returns a
named MetricsSnapshot struct instead of an opaque tuple.
pub fn record_step_latency(&self, ms: u64)
Record a step latency sample.
pub fn reset(&self)
Reset all counters to zero.
Intended for testing. In production, counters are monotonically increasing.
pub fn failure_rate(&self) -> f64
Return the fraction of tool calls that failed: failed / total.
Returns 0.0 if no tool calls have been recorded.
pub fn success_rate(&self) -> f64
Return the fraction of tool calls that succeeded: 1.0 - failure_rate().
Returns 1.0 if no tool calls have been recorded (vacuously all succeeded).
pub fn is_active(&self) -> bool
Return true if there is at least one active (in-progress) session.
pub fn step_latency_p50(&self) -> u64
Return the 50th-percentile (median) step latency in milliseconds.
Delegates to LatencyHistogram::p50 on the histogram tracked by
this RuntimeMetrics instance. Returns 0 if no steps have been recorded.
pub fn step_latency_p99(&self) -> u64
Return the 99th-percentile step latency in milliseconds.
Delegates to LatencyHistogram::p99. Returns 0 if no steps have
been recorded.
pub fn step_latency_p95(&self) -> u64
Return the 95th-percentile step latency in milliseconds.
Delegates to LatencyHistogram::p95. Returns 0 if no steps have
been recorded.
pub fn step_latency_p75(&self) -> u64
Return the 75th-percentile step latency in milliseconds.
Delegates to LatencyHistogram::p75. Returns 0 if no steps have
been recorded.
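For intuition about what the percentile accessors report, here is a nearest-rank percentile over raw samples. This is only an illustration: the page does not describe how LatencyHistogram stores its data (it may well use buckets and return approximations), and `percentile_ms` is a hypothetical function.

```rust
// Nearest-rank percentile over raw millisecond samples (illustrative only).
fn percentile_ms(samples: &[u64], pct: f64) -> u64 {
    if samples.is_empty() {
        // Mirrors the documented "returns 0 when nothing was recorded" behaviour.
        return 0;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Smallest rank such that at least pct percent of samples are at or below it.
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}
```

With ten samples 10, 20, ..., 100, this gives 50 for the 50th percentile and 80 for the 75th.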
pub fn step_latency_std_dev_ms(&self) -> f64
Return the standard deviation of recorded step latencies in milliseconds.
Delegates to LatencyHistogram::std_dev_ms. Returns 0.0 when fewer
than two samples have been recorded.
pub fn most_used_tool(&self) -> Option<String>
Return the name of the tool with the highest call count, or None if no
tools have been called yet.
When multiple tools share the maximum call count, the one that sorts earliest alphabetically is returned for deterministic output.
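The tie-breaking rule above can be sketched over a plain per-tool call map, such as the one returned by per_tool_calls_snapshot. The free function below is a hypothetical re-implementation for illustration, not the crate's code.

```rust
use std::collections::HashMap;

// Pick the tool with the highest count; on ties, the alphabetically earliest name.
fn most_used_tool(calls: &HashMap<String, u64>) -> Option<String> {
    calls
        .iter()
        // max_by keeps the "greatest" element, so the name comparison is
        // reversed to make the earlier name rank higher on equal counts.
        .max_by(|(name_a, count_a), (name_b, count_b)| {
            count_a.cmp(count_b).then_with(|| name_b.cmp(name_a))
        })
        .map(|(name, _)| name.clone())
}
```

Without the explicit tie-break the result would depend on `HashMap` iteration order, which is not stable between runs.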
pub fn tool_call_to_failure_ratio(&self) -> f64
Return the ratio of failed tool calls to total tool calls.
Returns 0.0 when no tool calls have been recorded. Unlike the
per-tool tool_failure_rate on MetricsSnapshot, this operates on
the live atomic counters for the current process without snapshotting.
pub fn active_session_rate(&self) -> f64
Return the fraction of all sessions that are currently active.
Computed as active_sessions / total_sessions. Returns 0.0 when no
sessions have been started.
pub fn memory_recall_per_session(&self) -> f64
Return the average number of memory recall operations per session.
Computed as memory_recall_count / total_sessions. Returns 0.0
when no sessions have been started.
pub fn step_error_rate(&self) -> f64
Return the fraction of all ReAct steps that resulted in a tool failure.
Computed as failed_tool_calls / total_steps. Returns 0.0 when
no steps have been executed.
pub fn total_errors(&self) -> u64
Return the combined count of all error events: failed tool calls plus checkpoint errors.
Useful as a single “total errors” gauge for alerting.
pub fn tool_names_containing(&self, substr: &str) -> Vec<String>
Return all tool names recorded in the call counter that contain
substr as a substring (case-sensitive).
Returns an empty Vec when no matching tool names are found.
pub fn has_failed_tools(&self) -> bool
Return true if any tool has recorded at least one failure.
A convenience shorthand for failed_tool_calls() > 0.
pub fn tool_names_by_call_count(&self) -> Vec<String>
Return tool names sorted by total call count in descending order.
The highest-called tool appears first. Ties are broken alphabetically.
Returns an empty Vec when no tools have been called.
pub fn avg_memory_recalls_per_step(&self) -> f64
Return the average number of memory recalls per recorded step.
Computed as memory_recall_count / total_steps. Returns 0.0
when no steps have been recorded to avoid division by zero.
pub fn avg_tool_failures_per_session(&self) -> f64
Return the average number of tool failures per completed session.
Computed as failed_tool_calls / total_sessions. Returns 0.0
when no sessions have been recorded to avoid division by zero.
pub fn tool_calls_per_memory_recall(&self) -> f64
Return the ratio of total tool calls to total memory recalls.
Returns 0.0 when no memory recalls have been recorded to avoid
division by zero.
pub fn memory_recalls_per_tool_call(&self) -> f64
Return the ratio of memory recalls to total tool calls.
Returns 0.0 when no tool calls have been recorded to avoid division
by zero.
pub fn step_failure_rate(&self) -> f64
Return the number of tool failures per completed step, computed as
failed_tool_calls / total_steps. This equals the fraction of steps with a
failure only if each step records at most one failure.
Returns 0.0 when no steps have been recorded.
pub fn total_backpressure_shed_pct(&self) -> f64
Return the fraction of total tool calls that were shed due to
backpressure. Computed as backpressure_shed_count / total_tool_calls.
Returns 0.0 when no tool calls have been made.
pub fn tool_with_highest_failure_rate(&self) -> Option<String>
Return the name of the tool with the highest failure rate
(failures / calls), or None when no tool has been called.
Tools with zero calls are excluded.
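A sketch of the selection described above, operating on two plain maps like those returned by per_tool_calls_snapshot and per_tool_failures_snapshot. The function is a hypothetical re-implementation; the alphabetical tie-break is an assumption, since the page does not say how equal rates are resolved.

```rust
use std::collections::HashMap;

fn tool_with_highest_failure_rate(
    calls: &HashMap<String, u64>,
    failures: &HashMap<String, u64>,
) -> Option<String> {
    calls
        .iter()
        // Tools with zero calls are excluded, which also avoids dividing by zero.
        .filter(|(_, count)| **count > 0)
        .map(|(name, count)| {
            let failed = failures.get(name).copied().unwrap_or(0);
            (name, failed as f64 / *count as f64)
        })
        // Assumption: equal rates resolve to the alphabetically earliest name.
        .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(name, _)| name.clone())
}
```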
pub fn tool_call_count_for(&self, name: &str) -> u64
Return the total number of times name has been called.
Returns 0 when the tool has never been called.
pub fn top_called_tool(&self) -> Option<String>
Return the name of the most-called tool, or None if no tools have
been called yet.
pub fn avg_step_latency_ms(&self) -> f64
Return the average step latency in milliseconds.
Returns 0.0 when no step latencies have been recorded.
pub fn distinct_tools_called(&self) -> usize
Return the number of distinct tool names that have been called at least once.
pub fn failure_rate_for(&self, name: &str) -> f64
Return the failure rate (failed / total) for the given tool name.
Returns 0.0 when the tool has never been called or doesn’t exist.
pub fn checkpoint_errors_count(&self) -> u64
Return the total number of checkpoint errors recorded since the runtime started.
pub fn agents_with_failures(&self) -> Vec<String>
Return the IDs of agents that have at least one per-agent tool failure recorded.
pub fn total_agent_failures(&self) -> u64
Return the total number of per-agent tool failures recorded across all agents and all tools.
pub fn per_step_tool_call_rate(&self) -> f64
Return the average number of tool calls per recorded step, or 0.0
when no steps have been recorded.
pub fn agents_with_no_failures(&self) -> Vec<String>
Return agent IDs that have recorded tool calls but zero failures.
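The filter described above can be sketched over two nested maps shaped like the per-agent snapshots. Everything here is hypothetical: the `AgentToolCounts` alias, the free function, and the sorted output order, which the page does not promise.

```rust
use std::collections::HashMap;

// Assumed layout: agent ID -> (tool name -> count).
type AgentToolCounts = HashMap<String, HashMap<String, u64>>;

fn agents_with_no_failures(
    calls: &AgentToolCounts,
    failures: &AgentToolCounts,
) -> Vec<String> {
    let mut ids: Vec<String> = calls
        .iter()
        // Keep agents that have recorded at least one tool call...
        .filter(|(_, tools)| tools.values().sum::<u64>() > 0)
        // ...and whose failure entry is absent or contains only zeros.
        .filter(|(id, _)| {
            failures
                .get(*id)
                .map_or(true, |tools| tools.values().all(|&n| n == 0))
        })
        .map(|(id, _)| id.clone())
        .collect();
    // Assumption: sort so the output does not depend on HashMap iteration order.
    ids.sort();
    ids
}
```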
pub fn tools_with_calls_above(&self, threshold: u64) -> Vec<String>
Return a sorted list of tool names whose total call count exceeds
threshold.
Useful for identifying heavily-exercised tools above a given activity
level. Returns an empty Vec when no tool meets the criterion.
pub fn agent_tool_call_count(&self, agent_id: &str) -> u64
Return the total number of tool calls recorded for the given agent_id.
Returns 0 when the agent has never called a tool.
pub fn tool_calls_per_session(&self) -> f64
Return the average number of tool calls per session, computed over all sessions started.
Returns 0.0 when no sessions have been started.
pub fn failure_free_tools(&self) -> Vec<String>
Return the names of all tools that have been called at least once but have recorded zero failures.
pub fn top_tools_by_calls(&self, n: usize) -> Vec<(String, u64)>
Return the top n tools by total call count, sorted descending.
Returns fewer than n entries if fewer tools have been called.
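A sketch of a top-n selection over a per-tool call map. This is a hypothetical re-implementation for illustration; the alphabetical ordering among equal counts is an assumption borrowed from tool_names_by_call_count and is not stated for this method.

```rust
use std::collections::HashMap;

fn top_tools_by_calls(calls: &HashMap<String, u64>, n: usize) -> Vec<(String, u64)> {
    let mut entries: Vec<(String, u64)> = calls
        .iter()
        .map(|(name, count)| (name.clone(), *count))
        .collect();
    // Descending by count; assumed alphabetical order among equal counts.
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    // truncate is a no-op when fewer than n tools exist.
    entries.truncate(n);
    entries
}
```

Sorting the whole list is simple and adequate for a handful of tools; a bounded heap would be a reasonable alternative if the number of distinct tools were large.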
pub fn top_tools_by_failures(&self, n: usize) -> Vec<(String, u64)>
Return the top n tools by total failure count, sorted descending.
Analogous to top_tools_by_calls; returns fewer than n entries if
fewer tools have recorded failures.
pub fn total_step_latency_ms(&self) -> u64
Return the sum of all recorded step latencies in milliseconds.
pub fn avg_calls_per_step(&self) -> f64
Return the average number of tool calls per recorded step.
Returns 0.0 when no steps have been recorded to avoid division by
zero.
pub fn memory_pressure_ratio(&self) -> f64
Return the ratio of memory recall events to total steps recorded.
Indicates how memory-intensive the agent’s operation is. Returns 0.0
when no steps have been recorded to avoid division by zero.
pub fn backpressure_ratio(&self) -> f64
Return the ratio of backpressure-shed events to total steps recorded.
Higher values indicate significant load shedding. Returns 0.0 when no
steps have been recorded to avoid division by zero.
pub fn sessions_per_step(&self) -> f64
Return the ratio of total sessions to total steps recorded.
Higher values indicate shorter average sessions. Returns 0.0 when no
steps have been recorded to avoid division by zero.
pub fn has_latency_data(&self) -> bool
Return true if any step-latency samples have been recorded.
Useful for guard-checking before using latency percentile methods.
pub fn global_failure_rate(&self) -> f64
Return the ratio of failed_tool_calls to total_tool_calls.
Returns 0.0 when no tool calls have been recorded (avoids
division-by-zero).
pub fn total_agent_tool_calls(&self) -> u64
Return the total number of tool calls recorded across all agents in the per-agent breakdown.
This sums the per-agent, per-tool call counters and is independent of
the global total_tool_calls counter, which is incremented by a
different code path.
pub fn agent_tool_count(&self) -> usize
Return the number of distinct agents recorded in the per-agent tool call tracking.
Returns 0 when no per-agent calls have been recorded.
pub fn has_recorded_agent_calls(&self) -> bool
Return true if any per-agent tool call has been recorded.
A lighter alternative to checking agent_tool_count() > 0; avoids
building the full per-agent snapshot map when a boolean answer suffices.
pub fn active_session_count(&self) -> usize
Return the current count of active (in-progress) sessions.
pub fn memory_to_session_ratio(&self) -> f64
Return the ratio of memory_recall_count to total_sessions.
Returns 0.0 when no sessions have been recorded (avoids
division-by-zero).
pub fn total_latency_per_session(&self) -> f64
Return the total accumulated step latency in milliseconds divided by
total_sessions.
Returns 0.0 when no sessions have been recorded.
pub fn to_snapshot(&self) -> (usize, u64, u64, u64, u64, u64, u64)
Deprecated since 1.0.3: use snapshot() which returns the named MetricsSnapshot struct
Capture a snapshot of global counters as plain integers.
Returns (active_sessions, total_sessions, total_steps, total_tool_calls, failed_tool_calls, backpressure_shed_count, memory_recall_count).
For per-tool breakdowns use per_tool_calls_snapshot and
per_tool_failures_snapshot.
Deprecation
Prefer snapshot which returns the named MetricsSnapshot struct
and includes per-tool, per-agent, and histogram data. This method
returns an anonymous tuple whose field order is easy to misread.
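To illustrate the misreading risk, the sketch below unpacks the tuple once, in the documented order, into a struct with named fields. `SketchSnapshot` and `from_tuple` are hypothetical; the field names of the real MetricsSnapshot are not shown on this page and may differ.

```rust
// Hypothetical, reduced stand-in for a named snapshot type.
#[derive(Debug, Clone, PartialEq)]
struct SketchSnapshot {
    active_sessions: usize,
    total_sessions: u64,
    total_steps: u64,
    total_tool_calls: u64,
    failed_tool_calls: u64,
    backpressure_shed_count: u64,
    memory_recall_count: u64,
}

// Convert the 7-tuple layout documented for to_snapshot into named fields.
fn from_tuple(t: (usize, u64, u64, u64, u64, u64, u64)) -> SketchSnapshot {
    let (
        active_sessions,
        total_sessions,
        total_steps,
        total_tool_calls,
        failed_tool_calls,
        backpressure_shed_count,
        memory_recall_count,
    ) = t;
    SketchSnapshot {
        active_sessions,
        total_sessions,
        total_steps,
        total_tool_calls,
        failed_tool_calls,
        backpressure_shed_count,
        memory_recall_count,
    }
}
```

Six of the seven tuple elements share the type `u64`, so swapping two of them at a call site compiles without complaint; named fields remove that class of mistake.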