pub struct Metrics { /* private fields */ }
Implementations§
impl Metrics
pub fn new() -> Self
Create Metrics with the standard prefix defined by name_prefix::FRONTEND, or specify a custom prefix via the following environment variable:
- DYN_METRICS_PREFIX: Override the default metrics prefix
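A minimal sketch of how the prefix resolution above could work, using only std::env. The fallback constant is a hypothetical placeholder for name_prefix::FRONTEND (its real value is not stated here), and treating an empty DYN_METRICS_PREFIX as unset is an assumption.

```rust
use std::env;

/// Hypothetical stand-in for `name_prefix::FRONTEND`; the real constant may differ.
const DEFAULT_PREFIX: &str = "dynamo_frontend";

/// Resolve the metrics prefix from an optional override value.
/// Assumption: an empty override falls back to the default prefix.
fn resolve_prefix(override_value: Option<String>) -> String {
    match override_value {
        Some(v) if !v.is_empty() => v,
        _ => DEFAULT_PREFIX.to_string(),
    }
}

/// Read `DYN_METRICS_PREFIX` from the process environment.
fn prefix_from_env() -> String {
    resolve_prefix(env::var("DYN_METRICS_PREFIX").ok())
}
```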
The following metrics will be created with the configured prefix:
- {prefix}_requests_total: IntCounterVec for the total number of requests processed
- {prefix}_inflight_requests: IntGaugeVec for the number of inflight requests
- {prefix}_request_duration_seconds: HistogramVec for the duration of requests
- {prefix}_input_sequence_tokens: HistogramVec for input sequence length in tokens
- {prefix}_output_sequence_tokens: HistogramVec for output sequence length in tokens
- {prefix}_time_to_first_token_seconds: HistogramVec for time to first token in seconds
- {prefix}_inter_token_latency_seconds: HistogramVec for inter-token latency in seconds
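As an illustration of the naming scheme, the following sketch expands a prefix into the seven request-level names listed above. It only builds strings; creating and registering the actual IntCounterVec, IntGaugeVec, and HistogramVec collectors is not shown, and REQUEST_METRICS is a hypothetical helper table.

```rust
/// Suffixes of the request-level metrics listed above, paired with their metric kind.
const REQUEST_METRICS: [(&str, &str); 7] = [
    ("requests_total", "IntCounterVec"),
    ("inflight_requests", "IntGaugeVec"),
    ("request_duration_seconds", "HistogramVec"),
    ("input_sequence_tokens", "HistogramVec"),
    ("output_sequence_tokens", "HistogramVec"),
    ("time_to_first_token_seconds", "HistogramVec"),
    ("inter_token_latency_seconds", "HistogramVec"),
];

/// Expand a prefix into fully qualified metric names, e.g. `{prefix}_requests_total`.
fn request_metric_names(prefix: &str) -> Vec<String> {
    REQUEST_METRICS
        .iter()
        .map(|(suffix, _kind)| format!("{prefix}_{suffix}"))
        .collect()
}
```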
§Model Configuration Metrics
Runtime config metrics (from ModelRuntimeConfig):
- {prefix}_model_total_kv_blocks: IntGaugeVec for total KV cache blocks available for a worker serving the model
- {prefix}_model_max_num_seqs: IntGaugeVec for maximum sequences for a worker serving the model
- {prefix}_model_max_num_batched_tokens: IntGaugeVec for maximum batched tokens for a worker serving the model
MDC metrics (from ModelDeploymentCard):
- {prefix}_model_context_length: IntGaugeVec for maximum context length for a worker serving the model
- {prefix}_model_kv_cache_block_size: IntGaugeVec for KV cache block size for a worker serving the model
- {prefix}_model_migration_limit: IntGaugeVec for request migration limit for a worker serving the model
§Runtime Config Polling Configuration
The polling behavior can be configured via the following environment variable:
- DYN_HTTP_SVC_CONFIG_METRICS_POLL_INTERVAL_SECS: Poll interval in seconds (must be > 0, supports fractional seconds, defaults to 8)
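The documented rules for the poll interval (positive, fractional seconds allowed, default 8) can be sketched as a small parser. What the real code does with an invalid value is not stated above; this sketch assumes it falls back to the default, and parse_poll_interval is a hypothetical helper.

```rust
use std::time::Duration;

/// Default poll interval documented above: 8 seconds.
const DEFAULT_POLL_SECS: f64 = 8.0;

/// Parse a raw poll-interval value into a `Duration`.
/// Assumption: missing, unparsable, non-finite, or non-positive values fall back to the default.
fn parse_poll_interval(raw: Option<&str>) -> Duration {
    let secs = raw
        .and_then(|s| s.trim().parse::<f64>().ok())
        .filter(|v| v.is_finite() && *v > 0.0)
        .unwrap_or(DEFAULT_POLL_SECS);
    Duration::from_secs_f64(secs)
}

/// Read `DYN_HTTP_SVC_CONFIG_METRICS_POLL_INTERVAL_SECS` from the environment.
fn poll_interval_from_env() -> Duration {
    let raw = std::env::var("DYN_HTTP_SVC_CONFIG_METRICS_POLL_INTERVAL_SECS").ok();
    parse_poll_interval(raw.as_deref())
}
```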
Metrics are never removed to preserve historical data. Runtime config and MDC metrics are updated when models are discovered and their configurations are available.
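Because these gauges are labelled by model name and are only set, never removed, each one behaves roughly like a map from model name to value. The sketch below models the three MDC gauges written by update_mdc_metrics with a HashMap in place of IntGaugeVec (an assumption for illustration); ModelGauges is hypothetical. In this sketch, a repeated call for the same model overwrites that model's values and leaves other models untouched.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a set of IntGaugeVec metrics labelled by model.
/// Key: (metric suffix, model name). Entries are inserted or overwritten, never removed.
#[derive(Default)]
struct ModelGauges {
    values: HashMap<(String, String), i64>,
}

impl ModelGauges {
    fn set(&mut self, metric: &str, model: &str, value: i64) {
        self.values.insert((metric.to_string(), model.to_string()), value);
    }

    fn get(&self, metric: &str, model: &str) -> Option<i64> {
        self.values.get(&(metric.to_string(), model.to_string())).copied()
    }

    /// Mirrors the parameters of `update_mdc_metrics`; the u32 inputs widen losslessly to i64.
    fn update_mdc(&mut self, model: &str, context_length: u32, kv_cache_block_size: u32, migration_limit: u32) {
        self.set("model_context_length", model, i64::from(context_length));
        self.set("model_kv_cache_block_size", model, i64::from(kv_cache_block_size));
        self.set("model_migration_limit", model, i64::from(migration_limit));
    }
}
```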
pub fn get_request_counter(
&self,
model: &str,
endpoint: &Endpoint,
request_type: &RequestType,
status: &Status,
) -> u64
Get the number of requests for the given dimensions:
- model
- endpoint (completions/chat_completions)
- request type (unary/stream)
- status (success/error)
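Conceptually, the request counter is one monotonically increasing count per label combination. A std-only sketch, assuming simplified mirrors of Endpoint, RequestType, and Status (the real types may carry more variants) and a HashMap standing in for the IntCounterVec; RequestCounter is hypothetical:

```rust
use std::collections::HashMap;

// Hypothetical mirrors of the Endpoint, RequestType, and Status types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Endpoint { Completions, ChatCompletions }
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum RequestType { Unary, Stream }
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Status { Success, Error }

/// One count per (model, endpoint, request type, status) label combination.
#[derive(Default)]
struct RequestCounter {
    counts: HashMap<(String, Endpoint, RequestType, Status), u64>,
}

impl RequestCounter {
    fn inc(&mut self, model: &str, endpoint: Endpoint, request_type: RequestType, status: Status) {
        *self.counts.entry((model.to_string(), endpoint, request_type, status)).or_insert(0) += 1;
    }

    /// Same argument shape as `get_request_counter`; unseen combinations read as 0 here.
    fn get(&self, model: &str, endpoint: &Endpoint, request_type: &RequestType, status: &Status) -> u64 {
        self.counts
            .get(&(model.to_string(), *endpoint, *request_type, *status))
            .copied()
            .unwrap_or(0)
    }
}
```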
pub fn get_inflight_count(&self, model: &str) -> i64
Get the number of inflight requests for the given model.
pub fn inc_client_disconnect(&self)
Increment the gauge for client disconnections
pub fn get_client_disconnect_count(&self) -> i64
Get the count of client disconnections
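Since inc_client_disconnect and get_client_disconnect_count take only &self and no model label, the value acts as one process-wide gauge that concurrent handlers can bump. A std-only sketch with an AtomicI64 standing in for the underlying Prometheus gauge (assumption); DisconnectGauge and simulate_disconnects are hypothetical:

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

/// Process-wide disconnect gauge; an AtomicI64 stands in for the real gauge type.
#[derive(Default)]
struct DisconnectGauge(AtomicI64);

impl DisconnectGauge {
    fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Increment from several threads at once, as concurrent request handlers would.
fn simulate_disconnects(threads: usize, per_thread: usize) -> i64 {
    let gauge = Arc::new(DisconnectGauge::default());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let g = Arc::clone(&gauge);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    g.inc();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    gauge.get()
}
```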
pub fn register(&self, registry: &Registry) -> Result<(), Error>
pub fn update_runtime_config_metrics(
&self,
model_name: &str,
runtime_config: &ModelRuntimeConfig,
)
Update runtime configuration metrics for a model. This should be called when the model's runtime configuration becomes available or is updated.
pub fn update_mdc_metrics(
&self,
model_name: &str,
context_length: u32,
kv_cache_block_size: u32,
migration_limit: u32,
)
Update model deployment card metrics for a model. This should be called when model deployment card information is available.
pub fn update_metrics_from_model_entry(&self, model_entry: &ModelEntry)
Update metrics from a ModelEntry. This is a convenience method that extracts the runtime config from a ModelEntry and updates the appropriate metrics.
pub async fn update_metrics_from_model_entry_with_mdc(
&self,
model_entry: &ModelEntry,
etcd_client: &Client,
) -> Result<()>
Update metrics from a ModelEntry and its ModelDeploymentCard. This updates both runtime config metrics and MDC-specific metrics.
pub fn start_runtime_config_polling_task(
metrics: Arc<Self>,
manager: Arc<ModelManager>,
etcd_client: Option<Client>,
poll_interval: Duration,
cancel_token: CancellationToken,
) -> JoinHandle<()>
Start a background task that periodically updates runtime config metrics
§Why Polling is Required
Polling is necessary because new models may come online at any time through the distributed discovery system. The ModelManager is continuously updated as workers register/deregister with etcd, and we need to periodically check for these changes to expose their metrics.
§Behavior
- Polls the ModelManager for current models and updates metrics accordingly
- Models are never removed from metrics to preserve historical data
- If multiple model instances have the same name, only the first instance’s metrics are used
- Subsequent instances with duplicate names will be skipped
§MDC (Model Deployment Card) Behavior
Currently, we don’t overwrite an MDC. The first worker to start wins, and we assume that all other workers claiming to serve that model really are using the same configuration. Later, every worker will have its own MDC, and the frontend will validate that they checksum the same. For right now, you can assume they have the same MDC, because they aren’t allowed to change it.
The task will run until the provided cancellation token is cancelled.
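The polling behavior described above (periodic refresh, first instance with a given name wins, nothing removed, stop on cancellation) can be approximated with std threads. This is a sketch under stated assumptions: the real function takes a CancellationToken and returns a JoinHandle<()>, while this version uses a std::thread plus an mpsc channel as the cancellation signal and tracks only a hypothetical context-length value per model name. Instance, apply_poll, and start_polling are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical discovered model instance: (model name, context length).
type Instance = (String, u32);

/// Apply one poll: record each model name the first time it is seen.
/// Later instances with the same name are skipped, and nothing is ever removed.
fn apply_poll(metrics: &mut HashMap<String, u32>, instances: &[Instance]) {
    for (name, context_length) in instances {
        metrics.entry(name.clone()).or_insert(*context_length);
    }
}

/// Poll `list_models` every `interval` until a message (or disconnect) arrives on `cancel`.
fn start_polling<F>(
    mut list_models: F,
    interval: Duration,
    cancel: mpsc::Receiver<()>,
) -> JoinHandle<HashMap<String, u32>>
where
    F: FnMut() -> Vec<Instance> + Send + 'static,
{
    thread::spawn(move || {
        let mut metrics = HashMap::new();
        loop {
            apply_poll(&mut metrics, &list_models());
            match cancel.recv_timeout(interval) {
                Err(RecvTimeoutError::Timeout) => continue,
                // A cancel message or a dropped sender both stop the task.
                _ => break,
            }
        }
        metrics
    })
}
```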
pub fn create_inflight_guard(
self: Arc<Self>,
model: &str,
endpoint: Endpoint,
streaming: bool,
) -> InflightGuard
Create a new InflightGuard for the given model, annotated with whether it is a streaming request and which kind of endpoint was hit.
The InflightGuard is an RAII object that handles incrementing the inflight gauge and request counters.
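The RAII pattern above can be sketched in std-only Rust: the guard increments the inflight gauge when it is created and, in Drop, decrements it and records the request outcome, so every exit path, including early returns, is counted. The State and Guard types and the mark_ok method are hypothetical; the real InflightGuard API and label set may differ.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Shared state: per-model inflight gauge and a (model, status) request counter.
#[derive(Default)]
struct State {
    inflight: HashMap<String, i64>,
    requests: HashMap<(String, &'static str), u64>,
}

/// Hypothetical RAII guard: increments the inflight gauge on creation and,
/// when dropped, decrements it and counts the request as "success" or "error".
struct Guard {
    state: Arc<Mutex<State>>,
    model: String,
    ok: bool,
}

impl Guard {
    fn new(state: Arc<Mutex<State>>, model: &str) -> Self {
        *state.lock().unwrap().inflight.entry(model.to_string()).or_insert(0) += 1;
        Guard { state, model: model.to_string(), ok: false }
    }

    /// Hypothetical: call once the response has completed successfully.
    fn mark_ok(&mut self) {
        self.ok = true;
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        let mut s = self.state.lock().unwrap();
        *s.inflight.entry(self.model.clone()).or_insert(0) -= 1;
        let status = if self.ok { "success" } else { "error" };
        *s.requests.entry((self.model.clone(), status)).or_insert(0) += 1;
    }
}
```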
§Metrics Distinction
This method creates an inflight guard that tracks requests actively being processed by the LLM engine.
This is distinct from HttpQueueGuard, which tracks requests from HTTP handler start until first token generation (including prefill time). The separation allows monitoring both HTTP processing queue time and actual LLM processing time.
pub fn create_response_collector(
self: Arc<Self>,
model: &str,
) -> ResponseMetricCollector
Create a new ResponseMetricCollector for collecting per-response metrics (i.e., time to first token (TTFT) and inter-token latency (ITL)).
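TTFT and ITL can both be derived from token arrival times. A minimal sketch, assuming ITL is observed as the gap between consecutive tokens (the exact aggregation used by ResponseMetricCollector is not described here); arrival times are given as offsets from request start, and ResponseLatencies and collect_latencies are hypothetical:

```rust
use std::time::Duration;

/// Per-response latency samples derived from token arrival times.
#[derive(Debug, PartialEq)]
struct ResponseLatencies {
    ttft: Option<Duration>,
    itl: Vec<Duration>,
}

/// `arrivals` are token arrival times measured from request start, in ascending order.
/// TTFT is the first arrival; each ITL sample is the gap to the previous token.
fn collect_latencies(arrivals: &[Duration]) -> ResponseLatencies {
    let ttft = arrivals.first().copied();
    // Subtraction assumes ascending order; Duration subtraction panics on underflow.
    let itl = arrivals.windows(2).map(|w| w[1] - w[0]).collect();
    ResponseLatencies { ttft, itl }
}
```

In a live server the offsets would typically come from calling std::time::Instant::elapsed on the request's start instant as each token arrives.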
pub fn create_http_queue_guard(self: Arc<Self>, model: &str) -> HttpQueueGuard
Create a new HttpQueueGuard for tracking HTTP processing queue time.
This guard tracks requests from HTTP handler start until first token generation, providing visibility into HTTP processing queue time before actual LLM processing begins.
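A std-only sketch of the queue-time guard described above: it raises a queued-requests gauge when the handler starts and, when dropped at first token, lowers the gauge and records the elapsed time. QueueMetrics and QueueGuard are hypothetical, and a Vec<Duration> stands in for a histogram.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical queue metrics: a gauge of requests waiting for their first token
/// and the observed queue times (a Vec standing in for a histogram).
#[derive(Default)]
struct QueueMetrics {
    queued: AtomicI64,
    queue_times: Mutex<Vec<Duration>>,
}

/// Created when the HTTP handler starts; dropped when the first token is produced.
struct QueueGuard {
    metrics: Arc<QueueMetrics>,
    started: Instant,
}

impl QueueGuard {
    fn new(metrics: Arc<QueueMetrics>) -> Self {
        metrics.queued.fetch_add(1, Ordering::Relaxed);
        QueueGuard { metrics, started: Instant::now() }
    }
}

impl Drop for QueueGuard {
    fn drop(&mut self) {
        self.metrics.queued.fetch_sub(1, Ordering::Relaxed);
        self.metrics.queue_times.lock().unwrap().push(self.started.elapsed());
    }
}
```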
Trait Implementations§
Auto Trait Implementations§
impl Freeze for Metrics
impl !RefUnwindSafe for Metrics
impl Send for Metrics
impl Sync for Metrics
impl Unpin for Metrics
impl !UnwindSafe for Metrics
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> IntoRequest<T> for T
fn into_request(self) -> Request<T>
Wrap the input message T in a tonic::Request
impl<T, U> OverflowingInto<U> for T
where
    U: OverflowingFrom<T>,
fn overflowing_into(self) -> (U, bool)
impl<T> Paint for T
where
    T: ?Sized,
fn fg(&self, value: Color) -> Painted<&T>
Returns a styled value derived from self with the foreground set to value.
This method should be used rarely. Instead, prefer to use color-specific builder methods like red() and green(), which have the same functionality but are pithier.
§Example
Set foreground color to white using fg():
use yansi::{Paint, Color};
painted.fg(Color::White);
Set foreground color to white using white().
use yansi::Paint;
painted.white();
fn bright_black(&self) -> Painted<&T>
fn bright_red(&self) -> Painted<&T>
fn bright_green(&self) -> Painted<&T>
fn bright_yellow(&self) -> Painted<&T>
fn bright_blue(&self) -> Painted<&T>
fn bright_magenta(&self) -> Painted<&T>
fn bright_cyan(&self) -> Painted<&T>
fn bright_white(&self) -> Painted<&T>
fn bg(&self, value: Color) -> Painted<&T>
Returns a styled value derived from self with the background set to value.
This method should be used rarely. Instead, prefer to use color-specific builder methods like on_red() and on_green(), which have the same functionality but are pithier.
§Example
Set background color to red using bg():
use yansi::{Paint, Color};
painted.bg(Color::Red);
Set background color to red using on_red().
use yansi::Paint;
painted.on_red();
fn on_primary(&self) -> Painted<&T>
fn on_magenta(&self) -> Painted<&T>
fn on_bright_black(&self) -> Painted<&T>
fn on_bright_red(&self) -> Painted<&T>
fn on_bright_green(&self) -> Painted<&T>
fn on_bright_yellow(&self) -> Painted<&T>
fn on_bright_blue(&self) -> Painted<&T>
fn on_bright_magenta(&self) -> Painted<&T>
fn on_bright_cyan(&self) -> Painted<&T>
fn on_bright_white(&self) -> Painted<&T>
fn attr(&self, value: Attribute) -> Painted<&T>
Enables the styling Attribute value.
This method should be used rarely. Instead, prefer to use attribute-specific builder methods like bold() and underline(), which have the same functionality but are pithier.
§Example
Make text bold using attr():
use yansi::{Paint, Attribute};
painted.attr(Attribute::Bold);
Make text bold using bold().
use yansi::Paint;
painted.bold();
fn rapid_blink(&self) -> Painted<&T>
fn quirk(&self, value: Quirk) -> Painted<&T>
Enables the yansi Quirk value.
This method should be used rarely. Instead, prefer to use quirk-specific builder methods like mask() and wrap(), which have the same functionality but are pithier.
§Example
Enable wrapping using .quirk():
use yansi::{Paint, Quirk};
painted.quirk(Quirk::Wrap);
Enable wrapping using wrap().
use yansi::Paint;
painted.wrap();
fn clear(&self) -> Painted<&T>
👎Deprecated since 1.0.1: renamed to resetting() due to conflicts with Vec::clear(). The clear() method will be removed in a future release.
fn whenever(&self, value: Condition) -> Painted<&T>
Conditionally enable styling based on whether the Condition value applies. Replaces any previous condition.
See the crate level docs for more details.
§Example
Enable styling painted only when both stdout and stderr are TTYs:
use yansi::{Paint, Condition};
painted.red().on_yellow().whenever(Condition::STDOUTERR_ARE_TTY);