pub struct LatencyStats {
    pub model: Option<String>,
    pub count: usize,
    pub avg_ms: f64,
    pub p50_ms: i64,
    pub p95_ms: i64,
    pub p99_ms: i64,
    pub ttft_count: usize,
    pub ttft_p50_ms: Option<i64>,
    pub ttft_p95_ms: Option<i64>,
    pub ttft_p99_ms: Option<i64>,
    pub derived_tokens_per_sec_p50: Option<f64>,
    pub derived_tokens_per_sec_p95: Option<f64>,
    pub derived_tokens_per_sec_p99: Option<f64>,
    pub input_tokens_p50: Option<i64>,
    pub input_tokens_p95: Option<i64>,
    pub input_tokens_p99: Option<i64>,
    pub output_input_ratio_p50: Option<f64>,
    pub output_input_ratio_p95: Option<f64>,
    pub output_input_ratio_p99: Option<f64>,
}
Latency and time-to-first-token (TTFT) percentile statistics for LLM spans, grouped by model.
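The field documentation for this struct describes the derived throughput as output_tokens / span_duration_sec, set only when both values are positive, and exposes the TTFT percentiles as optional values. A minimal sketch of how such values could be computed is given here. The `Span` struct, its field names, and the nearest-rank percentile method are assumptions made for illustration; this page does not show the crate's actual span type or percentile algorithm.

```rust
use std::cmp::Ordering;

/// Hypothetical per-span record (not the crate's real type).
struct Span {
    duration_ms: i64,
    output_tokens: i64,
    ttft_ms: Option<i64>,
}

/// Output tokens per second for one span, following the documented rule:
/// output_tokens / span_duration_sec, only when both values are positive.
/// The duration covers the whole span, so network and queue time are included.
fn derived_tokens_per_sec(span: &Span) -> Option<f64> {
    if span.output_tokens > 0 && span.duration_ms > 0 {
        let duration_sec = span.duration_ms as f64 / 1000.0;
        Some(span.output_tokens as f64 / duration_sec)
    } else {
        None
    }
}

/// Nearest-rank percentile (an assumed method, not confirmed by the source).
/// Returns `None` for an empty sample, which mirrors the `Option` fields.
fn percentile<T: Copy + PartialOrd>(values: &[T], p: f64) -> Option<T> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    // Incomparable pairs (for example NaN) are treated as equal.
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap_or(Ordering::Equal));
    // Nearest rank: ceil(p / 100 * n), clamped to 1..=n, then zero-based.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}
```

Under this sketch, collecting `ttft_ms` with `filter_map` gives a sample whose length plays the role of `ttft_count`, and an empty sample makes every TTFT percentile `None`. The same `percentile` helper works for the `i64` latency samples and the `f64` throughput samples.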
Fields§
model: Option<String>
count: usize
avg_ms: f64
p50_ms: i64
p95_ms: i64
p99_ms: i64
ttft_count: usize
    TTFT is reported only when any span in the group carried a ttft attribute.
ttft_p50_ms: Option<i64>
ttft_p95_ms: Option<i64>
ttft_p99_ms: Option<i64>
derived_tokens_per_sec_p50: Option<f64>
    Derived output token throughput (output_tokens / span_duration_sec). Span duration includes network + queue time, NOT pure generation time. Only set for spans where both output_tokens > 0 and duration > 0.
derived_tokens_per_sec_p95: Option<f64>
derived_tokens_per_sec_p99: Option<f64>
input_tokens_p50: Option<i64>
    Distribution of input token counts (context / prompt size).
input_tokens_p95: Option<i64>
input_tokens_p99: Option<i64>
output_input_ratio_p50: Option<f64>
    Distribution of output/input token ratio (generation verbosity).
output_input_ratio_p95: Option<f64>
output_input_ratio_p99: Option<f64>
Trait Implementations§
impl Clone for LatencyStats
    fn clone(&self) -> LatencyStats
        Returns a duplicate of the value.
    1.0.0 (const: unstable) · fn clone_from(&mut self, source: &Self)
        Performs copy-assignment from source.
impl Debug for LatencyStats
impl<'de> Deserialize<'de> for LatencyStats
    fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error> where __D: Deserializer<'de>
        Deserialize this value from the given Serde deserializer.
Auto Trait Implementations§
impl Freeze for LatencyStats
impl RefUnwindSafe for LatencyStats
impl Send for LatencyStats
impl Sync for LatencyStats
impl Unpin for LatencyStats
impl UnsafeUnpin for LatencyStats
impl UnwindSafe for LatencyStats
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized
    fn borrow_mut(&mut self) -> &mut T
        Mutably borrows from an owned value.