Struct RuntimeMetrics

Source

pub struct RuntimeMetrics {
    pub active_sessions: AtomicUsize,
    pub total_sessions: AtomicU64,
    pub total_steps: AtomicU64,
    pub total_tool_calls: AtomicU64,
    pub failed_tool_calls: AtomicU64,
    pub backpressure_shed_count: AtomicU64,
    pub memory_recall_count: AtomicU64,
    pub checkpoint_errors: AtomicU64,
    pub step_latency: LatencyHistogram,
    /* private fields */
}

Shared runtime metrics. Clone the Arc returned by new to share one set of counters across threads.
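
A minimal sketch of the sharing pattern described above, using a stand-in type rather than the real RuntimeMetrics. DemoMetrics and run_workers are hypothetical names; only the Arc-plus-atomics pattern is taken from the description.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a metrics struct holding atomic counters.
struct DemoMetrics {
    total_steps: AtomicU64,
}

/// Spawn `workers` threads that each record `steps_each` steps on a shared handle.
fn run_workers(workers: u64, steps_each: u64) -> u64 {
    let metrics = Arc::new(DemoMetrics {
        total_steps: AtomicU64::new(0),
    });

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc copies the pointer, not the counters.
            let m = Arc::clone(&metrics);
            thread::spawn(move || {
                for _ in 0..steps_each {
                    m.total_steps.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().expect("worker thread panicked");
    }
    // All workers have been joined, so every increment is visible here.
    metrics.total_steps.load(Ordering::Relaxed)
}
```

Because every clone points at the same atomics, no lock is needed for the plain counters.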

Fields§

§active_sessions: AtomicUsize

Number of agent sessions currently in progress.

§total_sessions: AtomicU64

Total number of sessions started since the runtime was created.

§total_steps: AtomicU64

Total number of ReAct steps executed across all sessions.

§total_tool_calls: AtomicU64

Total number of tool calls dispatched (across all tool names).

§failed_tool_calls: AtomicU64

Total number of tool calls that returned an error observation.

§backpressure_shed_count: AtomicU64

Total number of requests shed due to backpressure.

§memory_recall_count: AtomicU64

Total number of memory recall operations.

§checkpoint_errors: AtomicU64

Total number of checkpoint failures encountered during run_agent.

§step_latency: LatencyHistogram

Per-step latency histogram.

Implementations§

Source §

impl RuntimeMetrics

Source

pub fn new() -> Arc<Self>

Allocate a new RuntimeMetrics instance wrapped in an Arc.

Source

pub fn active_sessions(&self) -> usize

Return the number of agent sessions currently in progress.

Source

pub fn total_sessions(&self) -> u64

Return the total number of sessions started since the runtime was created.

Source

pub fn avg_tool_calls_per_session(&self) -> f64

Return the average number of tool calls per completed session.

Returns 0.0 when no sessions have been recorded.

Source

pub fn total_steps(&self) -> u64

Return the total number of ReAct steps executed across all sessions.

Source

pub fn avg_steps_per_session(&self) -> f64

Return the average number of ReAct steps per completed session.

Returns 0.0 when no sessions have been recorded.

Source

pub fn total_tool_calls(&self) -> u64

Return the total number of tool calls dispatched.

Source

pub fn failed_tool_calls(&self) -> u64

Return the total number of tool calls that returned an error observation.

Source

pub fn tool_success_rate(&self) -> f64

Return the fraction of tool calls that succeeded (i.e. did not fail).

Returns 1.0 if no tool calls have been recorded yet (vacuously all succeeded) and a value in [0.0, 1.0] once calls have been made.

Source

pub fn backpressure_shed_count(&self) -> u64

Return the total number of requests shed due to backpressure.

Source

pub fn memory_recall_count(&self) -> u64

Return the total number of memory recall operations performed.

Source

pub fn checkpoint_errors(&self) -> u64

Return the total number of checkpoint failures encountered during run_agent.

Source

pub fn checkpoint_error_rate(&self) -> f64

Return the ratio of checkpoint errors to total completed sessions.

Returns 0.0 when no sessions have been recorded.

Source

pub fn p50_latency_ms(&self) -> u64

Return the median (50th-percentile) step latency in milliseconds.

Convenience shorthand for self.step_latency.p50(). Returns 0 when no step latencies have been recorded.

Source

pub fn record_tool_call(&self, tool_name: &str)

Increment the call counter for tool_name by 1.

Called automatically by the agent loop when with_metrics is configured.

Source

pub fn record_tool_failure(&self, tool_name: &str)

Increment the failure counter for tool_name by 1.

Called automatically by the agent loop when a tool returns an error.

Source

pub fn per_tool_calls_snapshot(&self) -> HashMap<String, u64>

Return a snapshot of per-tool call counts as a HashMap<tool_name, count>.

Source

pub fn per_tool_failures_snapshot(&self) -> HashMap<String, u64>

Return a snapshot of per-tool failure counts as a HashMap<tool_name, count>.
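
The record/snapshot pairing above could be modeled roughly as follows. This is a sketch under assumptions: the real per-tool storage is a private field, and ToolCounters, record, and demo are hypothetical names. It shows that a snapshot is a detached copy that later recordings do not change.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical per-tool counter map guarded by a mutex.
#[derive(Default)]
struct ToolCounters {
    calls: Mutex<HashMap<String, u64>>,
}

impl ToolCounters {
    /// Increment the counter for `tool_name` by 1.
    fn record(&self, tool_name: &str) {
        let mut map = self.calls.lock().unwrap();
        *map.entry(tool_name.to_string()).or_insert(0) += 1;
    }

    /// Clone the current counts so the caller holds no lock afterwards.
    fn snapshot(&self) -> HashMap<String, u64> {
        self.calls.lock().unwrap().clone()
    }
}

/// Returns (count seen in an earlier snapshot, count seen in a later one).
fn demo() -> (u64, u64) {
    let counters = ToolCounters::default();
    counters.record("search");
    let earlier = counters.snapshot();
    counters.record("search");
    // The earlier snapshot is unaffected by the second record call.
    (earlier["search"], counters.snapshot()["search"])
}
```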

Source

pub fn record_agent_tool_call(&self, agent_id: &str, tool_name: &str)

Increment call counter for (agent_id, tool_name).

Source

pub fn record_agent_tool_failure(&self, agent_id: &str, tool_name: &str)

Increment failure counter for (agent_id, tool_name).

Source

pub fn per_agent_tool_calls_snapshot(&self) -> HashMap<String, HashMap<String, u64>>

Snapshot of per-agent, per-tool call counts.

Source

pub fn per_agent_tool_failures_snapshot(&self) -> HashMap<String, HashMap<String, u64>>

Snapshot of per-agent, per-tool failure counts.
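
A short sketch of reading the nested map shape returned by the per-agent snapshots (outer key agent ID, inner key tool name). calls_for_agent and sample_snapshot are hypothetical helpers, and the sample data is made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical: sum every tool's count for one agent in a nested snapshot.
fn calls_for_agent(snapshot: &HashMap<String, HashMap<String, u64>>, agent_id: &str) -> u64 {
    snapshot
        .get(agent_id)
        .map(|tools| tools.values().sum::<u64>())
        .unwrap_or(0) // unknown agent: no calls recorded
}

/// Build an illustrative snapshot with two agents.
fn sample_snapshot() -> HashMap<String, HashMap<String, u64>> {
    let mut snap: HashMap<String, HashMap<String, u64>> = HashMap::new();
    snap.entry("agent-a".to_string())
        .or_default()
        .insert("search".to_string(), 3);
    snap.entry("agent-a".to_string())
        .or_default()
        .insert("calc".to_string(), 1);
    snap.entry("agent-b".to_string())
        .or_default()
        .insert("search".to_string(), 2);
    snap
}
```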

Source

pub fn snapshot(&self) -> MetricsSnapshot

Capture a complete snapshot of all counters, including per-tool breakdowns.

This is the preferred alternative to the deprecated to_snapshot: it returns a named MetricsSnapshot struct instead of an opaque tuple.

Source

pub fn record_step_latency(&self, ms: u64)

Record a step latency sample.

Source

pub fn reset(&self)

Reset all counters to zero.

Intended for testing. In production, counters are monotonically increasing.

Source

pub fn failure_rate(&self) -> f64

Return the fraction of tool calls that failed: failed / total.

Returns 0.0 if no tool calls have been recorded.

Source

pub fn success_rate(&self) -> f64

Return the fraction of tool calls that succeeded: 1.0 - failure_rate().

Returns 1.0 if no tool calls have been recorded (vacuously all succeeded).
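
The two rates and their empty-state values can be sketched on raw counter values as below. The free functions here are hypothetical stand-ins for the methods, written only to show the documented arithmetic.

```rust
/// Hypothetical: failed / total, or 0.0 when nothing has been recorded.
fn failure_rate(total_calls: u64, failed_calls: u64) -> f64 {
    if total_calls == 0 {
        0.0
    } else {
        failed_calls as f64 / total_calls as f64
    }
}

/// Hypothetical: 1.0 - failure_rate, which yields 1.0 when no calls exist.
fn success_rate(total_calls: u64, failed_calls: u64) -> f64 {
    1.0 - failure_rate(total_calls, failed_calls)
}
```

With no calls recorded the failure rate is 0.0, so the success rate comes out as 1.0 without a separate branch.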

Source

pub fn is_active(&self) -> bool

Return true if there is at least one active (in-progress) session.

Source

pub fn step_latency_p50(&self) -> u64

Return the 50th-percentile (median) step latency in milliseconds.

Delegates to LatencyHistogram::p50 on the histogram tracked by this RuntimeMetrics instance. Returns 0 if no steps have been recorded.

Source

pub fn step_latency_p99(&self) -> u64

Return the 99th-percentile step latency in milliseconds.

Delegates to LatencyHistogram::p99. Returns 0 if no steps have been recorded.

Source

pub fn step_latency_p95(&self) -> u64

Return the 95th-percentile step latency in milliseconds.

Delegates to LatencyHistogram::p95. Returns 0 if no steps have been recorded.

Source

pub fn step_latency_p75(&self) -> u64

Return the 75th-percentile step latency in milliseconds.

Delegates to LatencyHistogram::p75. Returns 0 if no steps have been recorded.
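
For intuition about what the percentile accessors report, here is a nearest-rank percentile over raw samples. This is an assumption for illustration only: the page does not say how LatencyHistogram computes percentiles (it may well use buckets), and percentile_ms is a hypothetical helper.

```rust
/// Hypothetical nearest-rank percentile over raw latency samples in ms.
/// Returns 0 for an empty input, mirroring the documented empty-state value.
fn percentile_ms(samples: &[u64], pct: u32) -> u64 {
    if samples.is_empty() {
        return 0;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();

    let n = sorted.len() as u64;
    // Integer ceil(pct * n / 100): the 1-based rank of the wanted sample.
    let rank = (pct as u64 * n + 99) / 100;
    let idx = rank.clamp(1, n) - 1;
    sorted[idx as usize]
}
```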

Source

pub fn step_latency_std_dev_ms(&self) -> f64

Return the standard deviation of recorded step latencies in milliseconds.

Delegates to LatencyHistogram::std_dev_ms. Returns 0.0 when fewer than two samples have been recorded.
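
A sketch of a standard deviation that returns 0.0 for fewer than two samples. It assumes the sample form (n - 1 denominator); the page does not state whether LatencyHistogram uses the sample or population form, and std_dev_ms here is a hypothetical free function.

```rust
/// Hypothetical: sample standard deviation (n - 1 denominator) of latencies in ms.
fn std_dev_ms(samples: &[u64]) -> f64 {
    if samples.len() < 2 {
        // Spread is undefined for zero or one sample; report 0.0.
        return 0.0;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().map(|&s| s as f64).sum::<f64>() / n;
    let sum_sq = samples
        .iter()
        .map(|&s| (s as f64 - mean).powi(2))
        .sum::<f64>();
    (sum_sq / (n - 1.0)).sqrt()
}
```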

Source

pub fn most_used_tool(&self) -> Option<String>

Return the name of the tool with the highest call count, or None if no tools have been called yet.

When multiple tools share the maximum call count, the one that sorts earliest alphabetically is returned for deterministic output.
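
The tie-breaking rule above can be sketched over a per-tool count map. most_used is a hypothetical helper operating on a plain HashMap, not the method itself.

```rust
use std::collections::HashMap;

/// Hypothetical: pick the max-count tool, breaking ties alphabetically.
fn most_used(counts: &HashMap<String, u64>) -> Option<String> {
    counts
        .iter()
        // Higher count wins; on equal counts the alphabetically earlier
        // name compares as greater, so it is the one selected.
        .max_by(|(name_a, count_a), (name_b, count_b)| {
            count_a.cmp(count_b).then_with(|| name_b.cmp(name_a))
        })
        .map(|(name, _)| name.clone())
}
```

Without the secondary comparison the result would depend on HashMap iteration order, which is not stable between runs.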

Source

pub fn tool_call_to_failure_ratio(&self) -> f64

Return the ratio of failed tool calls to total tool calls.

Returns 0.0 when no tool calls have been recorded. Unlike the per-tool tool_failure_rate on MetricsSnapshot, this operates on the live atomic counters for the current process without snapshotting.

Source

pub fn active_session_rate(&self) -> f64

Return the fraction of all sessions that are currently active.

Computed as active_sessions / total_sessions. Returns 0.0 when no sessions have been started.
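
Many of the derived metrics on this page follow the same shape: load two counters, divide, and return 0.0 on a zero denominator. A sketch of that shape, with safe_ratio as a hypothetical helper and Relaxed ordering as an assumption about how the counters are read:

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Hypothetical helper: num / den as f64, or 0.0 when den is zero.
fn safe_ratio(num: u64, den: u64) -> f64 {
    if den == 0 {
        0.0
    } else {
        num as f64 / den as f64
    }
}

/// Sketch of active_sessions / total_sessions on live atomic counters.
fn active_session_rate(active: &AtomicUsize, total: &AtomicU64) -> f64 {
    safe_ratio(
        active.load(Ordering::Relaxed) as u64,
        total.load(Ordering::Relaxed),
    )
}
```

The two loads are independent, so under concurrent updates the result is a close approximation rather than an exact point-in-time value.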

Source

pub fn memory_recall_per_session(&self) -> f64

Return the average number of memory recall operations per session.

Computed as memory_recall_count / total_sessions. Returns 0.0 when no sessions have been started.

Source

pub fn step_error_rate(&self) -> f64

Return the ratio of failed tool calls to ReAct steps executed.

Computed as failed_tool_calls / total_steps. Returns 0.0 when no steps have been executed.

Source

pub fn total_errors(&self) -> u64

Return the combined count of all error events: failed tool calls plus checkpoint errors.

Useful as a single “total errors” gauge for alerting.

Source

pub fn tool_names_containing(&self, substr: &str) -> Vec<String>

Return all tool names recorded in the call counter that contain substr as a substring (case-sensitive).

Returns an empty Vec when no matching tool names are found.
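
A sketch of the case-sensitive substring filter over a per-tool count map. names_containing is a hypothetical helper; sorting the result is an assumption made here for stable output, since the page does not specify an order.

```rust
use std::collections::HashMap;

/// Hypothetical: tool names containing `substr`, matched case-sensitively.
fn names_containing(counts: &HashMap<String, u64>, substr: &str) -> Vec<String> {
    let mut names: Vec<String> = counts
        .keys()
        .filter(|name| name.contains(substr))
        .cloned()
        .collect();
    // Sorting is an assumption for determinism, not documented behavior.
    names.sort();
    names
}
```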

Source

pub fn has_failed_tools(&self) -> bool

Return true if any tool has recorded at least one failure.

A convenience shorthand for failed_tool_calls() > 0.

Source

pub fn tool_names_by_call_count(&self) -> Vec<String>

Return tool names sorted by total call count in descending order.

The highest-called tool appears first. Ties are broken alphabetically. Returns an empty Vec when no tools have been called.

Source

pub fn avg_memory_recalls_per_step(&self) -> f64

Return the average number of memory recalls per recorded step.

Computed as memory_recall_count / total_steps. Returns 0.0 when no steps have been recorded to avoid division by zero.

Source

pub fn avg_tool_failures_per_session(&self) -> f64

Return the average number of tool failures per session started.

Computed as failed_tool_calls / total_sessions. Returns 0.0 when no sessions have been recorded to avoid division by zero.

Source

pub fn tool_calls_per_memory_recall(&self) -> f64

Return the ratio of total tool calls to total memory recalls.

Returns 0.0 when no memory recalls have been recorded to avoid division by zero.

Source

pub fn memory_recalls_per_tool_call(&self) -> f64

Return the ratio of memory recalls to total tool calls.

Returns 0.0 when no tool calls have been recorded to avoid division by zero.

Source

pub fn step_failure_rate(&self) -> f64

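Return the number of failed tool calls per completed step, computed as failed_tool_calls / total_steps. This equals the fraction of steps with at least one tool failure only when no step records more than one failed call; otherwise it can exceed 1.0.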

Returns 0.0 when no steps have been recorded.
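
The formula above divides a count of failed calls by a count of steps. If a step can dispatch more than one tool call (an assumption; the page does not say), that quotient differs from the share of steps that saw a failure. The sketch below contrasts the two on hypothetical per-step failure counts; both function names are illustrative.

```rust
/// failed_tool_calls / total_steps, given failures recorded in each step.
fn failed_calls_per_step(failed_per_step: &[u64]) -> f64 {
    if failed_per_step.is_empty() {
        return 0.0;
    }
    let failed: u64 = failed_per_step.iter().sum();
    failed as f64 / failed_per_step.len() as f64
}

/// Share of steps that recorded at least one failed tool call.
fn fraction_of_steps_with_failure(failed_per_step: &[u64]) -> f64 {
    if failed_per_step.is_empty() {
        return 0.0;
    }
    let hit = failed_per_step.iter().filter(|&&f| f > 0).count();
    hit as f64 / failed_per_step.len() as f64
}
```

The two agree whenever every step records at most one failed call.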

Source

pub fn total_backpressure_shed_pct(&self) -> f64

Return the fraction of total tool calls that were shed due to backpressure. Computed as backpressure_shed_count / total_tool_calls.

Returns 0.0 when no tool calls have been made.

Source

pub fn tool_with_highest_failure_rate(&self) -> Option<String>

Return the name of the tool with the highest failure rate (failures / calls), or None when no tool has been called.

Tools with zero calls are excluded.
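
A sketch of the selection described above, working from two plain maps of per-tool calls and failures. highest_failure_rate is a hypothetical helper, and the alphabetical tie-break is an assumption added for deterministic output.

```rust
use std::collections::HashMap;

/// Hypothetical: tool with the highest failures / calls, skipping zero-call tools.
fn highest_failure_rate(
    calls: &HashMap<String, u64>,
    failures: &HashMap<String, u64>,
) -> Option<String> {
    calls
        .iter()
        // Exclude tools with zero calls so the division is always defined.
        .filter(|(_, count)| **count > 0)
        .map(|(name, &count)| {
            let failed = failures.get(name).copied().unwrap_or(0);
            (name, failed as f64 / count as f64)
        })
        // Assumed tie-break: the alphabetically earlier name wins.
        .max_by(|(name_a, rate_a), (name_b, rate_b)| {
            rate_a.total_cmp(rate_b).then_with(|| name_b.cmp(name_a))
        })
        .map(|(name, _)| name.clone())
}
```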

Source

pub fn tool_call_count_for(&self, name: &str) -> u64

Return the total number of times name has been called.

Returns 0 when the tool has never been called.

Source

pub fn top_called_tool(&self) -> Option<String>

Return the name of the most-called tool, or None if no tools have been called yet.

Source

pub fn avg_step_latency_ms(&self) -> f64

Return the average step latency in milliseconds.

Returns 0.0 when no step latencies have been recorded.

Source

pub fn distinct_tools_called(&self) -> usize

Return the number of distinct tool names that have been called at least once.

Source

pub fn failure_rate_for(&self, name: &str) -> f64

Return the failure rate (failed / total) for the given tool name.

Returns 0.0 when the tool has never been called or doesn’t exist.

Source

pub fn checkpoint_errors_count(&self) -> u64

Return the total number of checkpoint errors recorded since the runtime started.

Source

pub fn agents_with_failures(&self) -> Vec<String>

Return the IDs of agents that have at least one per-agent tool failure recorded.

Source

pub fn total_agent_failures(&self) -> u64

Return the total number of per-agent tool failures recorded across all agents and all tools.

Source

pub fn per_step_tool_call_rate(&self) -> f64

Return the average number of tool calls per recorded step, or 0.0 when no steps have been recorded.

Source

pub fn agents_with_no_failures(&self) -> Vec<String>

Return agent IDs that have recorded tool calls but zero failures.

Source

pub fn tools_with_calls_above(&self, threshold: u64) -> Vec<String>

Return a sorted list of tool names whose total call count exceeds threshold.

Useful for identifying heavily-exercised tools above a given activity level. Returns an empty Vec when no tool meets the criterion.

Source

pub fn agent_tool_call_count(&self, agent_id: &str) -> u64

Return the total number of tool calls recorded for the given agent_id.

Returns 0 when the agent has never called a tool.

Source

pub fn tool_calls_per_session(&self) -> f64

Return the average number of tool calls per session, counting every session started.

Returns 0.0 when no sessions have been started.

Source

pub fn failure_free_tools(&self) -> Vec<String>

Return the names of all tools that have been called at least once but have recorded zero failures.

Source

pub fn top_tools_by_calls(&self, n: usize) -> Vec<(String, u64)>

Return the top n tools by total call count, sorted descending.

Returns fewer than n entries if fewer tools have been called.
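
A sketch of a top-n selection over a per-tool count map. top_n is a hypothetical helper; the alphabetical tie-break is an assumption, since the page only says the result is sorted descending.

```rust
use std::collections::HashMap;

/// Hypothetical: the `n` highest-count tools as (name, count), descending.
fn top_n(counts: &HashMap<String, u64>, n: usize) -> Vec<(String, u64)> {
    let mut entries: Vec<(String, u64)> = counts
        .iter()
        .map(|(name, &count)| (name.clone(), count))
        .collect();
    // Descending by count; alphabetical on ties (assumed, for determinism).
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    // truncate is a no-op when fewer than n tools exist.
    entries.truncate(n);
    entries
}
```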

Source

pub fn top_tools_by_failures(&self, n: usize) -> Vec<(String, u64)>

Return the top n tools by total failure count, sorted descending.

Analogous to top_tools_by_calls; returns fewer than n entries if fewer tools have recorded failures.

Source

pub fn total_step_latency_ms(&self) -> u64

Return the sum of all recorded step latencies in milliseconds.

Source

pub fn avg_calls_per_step(&self) -> f64

Return the average number of tool calls per recorded step.

Returns 0.0 when no steps have been recorded to avoid division by zero.

Source

pub fn memory_pressure_ratio(&self) -> f64

Return the ratio of memory recall events to total steps recorded.

Indicates how memory-intensive the agent’s operation is. Returns 0.0 when no steps have been recorded to avoid division by zero.

Source

pub fn backpressure_ratio(&self) -> f64

Return the ratio of backpressure-shed events to total steps recorded.

Higher values indicate significant load shedding. Returns 0.0 when no steps have been recorded to avoid division by zero.

Source

pub fn sessions_per_step(&self) -> f64

Return the ratio of total sessions to total steps recorded.

Higher values indicate shorter average sessions. Returns 0.0 when no steps have been recorded to avoid division by zero.

Source

pub fn has_latency_data(&self) -> bool

Return true if any step-latency samples have been recorded.

Useful as a guard before calling the latency percentile methods, which return 0 when no samples have been recorded.

Source

pub fn global_failure_rate(&self) -> f64

Return the ratio of failed_tool_calls to total_tool_calls.

Returns 0.0 when no tool calls have been recorded (avoids division-by-zero).

Source

pub fn total_agent_tool_calls(&self) -> u64

Return the total number of tool calls recorded across all agents in the per-agent breakdown.

This sums the per-agent, per-tool call counters and is independent of the global total_tool_calls counter, which is incremented by a different code path.

Source

pub fn agent_tool_count(&self) -> usize

Return the number of distinct agents recorded in the per-agent tool call tracking.

Returns 0 when no per-agent calls have been recorded.

Source

pub fn has_recorded_agent_calls(&self) -> bool

Return true if any per-agent tool call has been recorded.

A lighter alternative to checking agent_tool_count() > 0; avoids building the full per-agent snapshot map when a boolean answer suffices.

Source

pub fn active_session_count(&self) -> usize

Return the current count of active (in-progress) sessions.

Source

pub fn memory_to_session_ratio(&self) -> f64

Return the ratio of memory_recall_count to total_sessions.

Returns 0.0 when no sessions have been recorded (avoids division-by-zero).

Source

pub fn total_latency_per_session(&self) -> f64

Return the total accumulated step latency in milliseconds divided by total_sessions.

Returns 0.0 when no sessions have been recorded.

Source

pub fn to_snapshot(&self) -> (usize, u64, u64, u64, u64, u64, u64)

Deprecated since 1.0.3: use snapshot(), which returns the named MetricsSnapshot struct

Capture a snapshot of global counters as plain integers.

Returns (active_sessions, total_sessions, total_steps, total_tool_calls, failed_tool_calls, backpressure_shed_count, memory_recall_count). For per-tool breakdowns use per_tool_calls_snapshot and per_tool_failures_snapshot.

§Deprecation

Prefer snapshot, which returns the named MetricsSnapshot struct and includes per-tool, per-agent, and histogram data. This method returns an anonymous tuple whose field order is easy to misread.
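
To illustrate why the tuple is easy to misread, the sketch below unpacks it into a named struct using the field order documented above. DemoSnapshot and from_tuple are hypothetical; the fields of the real MetricsSnapshot are not listed on this page and may differ.

```rust
/// Hypothetical named view over the seven tuple positions.
#[derive(Debug, Clone, PartialEq)]
struct DemoSnapshot {
    active_sessions: usize,
    total_sessions: u64,
    total_steps: u64,
    total_tool_calls: u64,
    failed_tool_calls: u64,
    backpressure_shed_count: u64,
    memory_recall_count: u64,
}

/// Destructure the tuple in its documented order and name each position.
fn from_tuple(t: (usize, u64, u64, u64, u64, u64, u64)) -> DemoSnapshot {
    let (
        active_sessions,
        total_sessions,
        total_steps,
        total_tool_calls,
        failed_tool_calls,
        backpressure_shed_count,
        memory_recall_count,
    ) = t;
    DemoSnapshot {
        active_sessions,
        total_sessions,
        total_steps,
        total_tool_calls,
        failed_tool_calls,
        backpressure_shed_count,
        memory_recall_count,
    }
}
```

Six of the seven positions share the type u64, so the compiler cannot catch two of them being swapped at a call site; named fields remove that risk.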