Crate llm_latency_lens_metrics

Metrics collection and aggregation for LLM Latency Lens

This crate provides production-ready metrics collection and statistical aggregation for LLM performance measurement. Percentiles are computed with an HDR Histogram, and the collectors are thread-safe.

§Features

  • High-precision percentile tracking using HDR Histogram
  • Thread-safe collectors with Arc<Mutex<>> for concurrent access
  • Efficient memory usage with configurable histogram parameters
  • Comprehensive metrics:
    • TTFT (Time to First Token)
    • Inter-token latency
    • Total request latency
    • Token throughput
    • Cost tracking
  • Statistical aggregation with p50, p90, p95, p99, p99.9 percentiles
  • Serde serialization for all types
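
To make the percentile list above concrete, here is a minimal sketch of what p50 through p99.9 mean for a set of latency samples. It uses an exact nearest-rank calculation over a sorted copy, not the HDR Histogram this crate relies on, and the names `percentile` and `summarize` are hypothetical helpers, not part of the crate's API.

```rust
use std::time::Duration;

/// Nearest-rank percentile over an already sorted slice.
/// `pct` must be in (0, 100]; returns `None` for an empty slice
/// or an out-of-range percentile.
fn percentile(sorted: &[Duration], pct: f64) -> Option<Duration> {
    if sorted.is_empty() || !(pct > 0.0 && pct <= 100.0) {
        return None;
    }
    // Nearest-rank: rank = ceil(pct / 100 * n), counted from 1.
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Sorts a copy of the samples and reports the percentiles listed above.
fn summarize(samples: &[Duration]) -> Vec<(f64, Option<Duration>)> {
    let mut sorted = samples.to_vec();
    sorted.sort();
    [50.0, 90.0, 95.0, 99.0, 99.9]
        .iter()
        .map(|&p| (p, percentile(&sorted, p)))
        .collect()
}
```

Sorting every sample is exact but costs O(n log n) time and O(n) memory per aggregation. A histogram-based approach trades a small, bounded error for constant memory, which is presumably why the crate uses one.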

§Example Usage

use llm_latency_lens_metrics::{
    MetricsCollector, MetricsAggregator, CollectorConfig, RequestMetrics
};
use llm_latency_lens_core::{SessionId, RequestId, Provider};
use chrono::Utc;
use std::time::Duration;

// Create a metrics collector
let session_id = SessionId::new();
let config = CollectorConfig::new()
    .with_max_value_seconds(60)
    .with_significant_digits(3);

let collector = MetricsCollector::new(session_id, config).unwrap();

// Record metrics from requests
let metrics = RequestMetrics {
    request_id: RequestId::new(),
    session_id,
    provider: Provider::OpenAI,
    model: "gpt-4".to_string(),
    timestamp: Utc::now(),
    ttft: Duration::from_millis(150),
    total_latency: Duration::from_millis(2000),
    inter_token_latencies: vec![
        Duration::from_millis(10),
        Duration::from_millis(15),
        Duration::from_millis(12),
    ],
    input_tokens: 100,
    output_tokens: 50,
    thinking_tokens: None,
    tokens_per_second: 25.0,
    cost_usd: Some(0.05),
    success: true,
    error: None,
};

collector.record(metrics).unwrap();

// Aggregate metrics
let aggregated = MetricsAggregator::aggregate(&collector).unwrap();

// Access statistical distributions
println!("TTFT p50: {:?}", aggregated.ttft_distribution.p50);
println!("TTFT p99: {:?}", aggregated.ttft_distribution.p99);
println!("Success rate: {:.2}%", aggregated.success_rate());
println!("Mean throughput: {:.2} tokens/sec",
         aggregated.throughput.mean_tokens_per_second);
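
The figures in the example above are mutually consistent: 50 output tokens over a 2000 ms total latency gives 25 tokens per second. The sketch below shows that arithmetic, assuming throughput is defined as output tokens divided by total latency; the crate may define it differently (for example, excluding TTFT). Both helper functions are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical helper: output tokens divided by total latency in seconds.
/// Returns `None` when the latency is zero.
fn tokens_per_second(output_tokens: u64, total_latency: Duration) -> Option<f64> {
    let secs = total_latency.as_secs_f64();
    if secs == 0.0 {
        None
    } else {
        Some(output_tokens as f64 / secs)
    }
}

/// Hypothetical helper: mean gap between consecutive tokens.
/// Returns `None` when no gaps were recorded.
fn mean_inter_token_latency(gaps: &[Duration]) -> Option<Duration> {
    if gaps.is_empty() {
        return None;
    }
    let total: Duration = gaps.iter().sum();
    Some(total / gaps.len() as u32)
}
```

With the example values, `tokens_per_second(50, Duration::from_millis(2000))` yields `Some(25.0)`, matching the `tokens_per_second: 25.0` field in the `RequestMetrics` literal.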

§Thread Safety

The MetricsCollector is thread-safe and can be shared across multiple threads using Arc:

use llm_latency_lens_metrics::MetricsCollector;
use llm_latency_lens_core::SessionId;
use std::sync::Arc;
use std::thread;

let collector = Arc::new(
    MetricsCollector::with_defaults(SessionId::new()).unwrap()
);

let mut handles = vec![];
for _ in 0..10 {
    let collector_clone = Arc::clone(&collector);
    let handle = thread::spawn(move || {
        // Record metrics from this thread
        // collector_clone.record(...).unwrap();
    });
    handles.push(handle);
}

for handle in handles {
    handle.join().unwrap();
}
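
The sharing pattern above can be tried without the crate. The following sketch uses a stand-in type, `ToyCollector`, which is not the crate's `MetricsCollector`; it only assumes the same shape, namely a `record` method that takes `&self` and guards its state with a `Mutex`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Stand-in for a collector: a mutex-guarded list of TTFT samples.
/// This is NOT the crate's `MetricsCollector`, only the same sharing pattern.
#[derive(Default)]
struct ToyCollector {
    ttft_samples: Mutex<Vec<Duration>>,
}

impl ToyCollector {
    /// Records one sample; takes `&self` so it can be called through an `Arc`.
    fn record(&self, ttft: Duration) {
        self.ttft_samples
            .lock()
            .expect("collector mutex poisoned")
            .push(ttft);
    }

    /// Number of samples recorded so far.
    fn len(&self) -> usize {
        self.ttft_samples
            .lock()
            .expect("collector mutex poisoned")
            .len()
    }
}

/// Spawns `threads` workers that each record `per_thread` samples,
/// then returns the total number of samples held by the collector.
fn record_from_threads(threads: usize, per_thread: usize) -> usize {
    let collector = Arc::new(ToyCollector::default());
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let collector = Arc::clone(&collector);
            thread::spawn(move || {
                for i in 0..per_thread {
                    collector.record(Duration::from_millis((t * per_thread + i) as u64));
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    collector.len()
}
```

Because every worker holds its own `Arc` clone and the lock is taken only for the duration of a single `push`, no samples are lost: `record_from_threads(10, 100)` returns 1000.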

§Performance Characteristics

  • Recording overhead: ~1-2μs per metric
  • Memory usage: ~100KB per 10,000 samples (with default config)
  • Aggregation time: ~100μs for 10,000 samples
  • Percentile accuracy: within 0.1% of the recorded value (with 3 significant digits)
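
The 0.1% figure follows from keeping three significant decimal digits, that is, a resolution of about one part in 10^3. As a rough worked example, assuming values are stored in microseconds (the crate's internal unit is not stated here), the hypothetical helper below gives the worst-case absolute error for a recorded value.

```rust
/// Approximate worst-case absolute error, in microseconds, when a value is
/// kept to `significant_digits` decimal digits: value / 10^digits, rounded up.
/// Hypothetical helper for illustration; not part of the crate.
fn max_abs_error_us(value_us: u64, significant_digits: u32) -> u64 {
    let scale = 10u64.pow(significant_digits);
    (value_us + scale - 1) / scale
}
```

For a 150 ms TTFT (150,000 μs) at 3 significant digits this gives 150 μs, so the reported percentile would fall within roughly 0.15 ms of the true value.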

§Configuration

The collector can be configured with:

  • Maximum value: The highest value that can be tracked (default: 60 seconds)
  • Significant digits: Precision of percentile calculations (1-5, default: 3)
  • Per-provider tracking: Enable/disable separate histograms per provider
  • Per-model tracking: Enable/disable separate histograms per model

Raising the number of significant digits or the maximum value increases memory usage.
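
To see how the two parameters drive memory, the sketch below estimates how many counter slots a single histogram needs. It follows the sizing scheme of the reference HdrHistogram implementation as an assumption; the crate's actual footprint also depends on the counter width, the unit it records in, and how many per-provider or per-model histograms are enabled. `hdr_counts_len` is a hypothetical helper.

```rust
/// Number of counter slots one histogram would need, following the sizing
/// scheme of the reference HdrHistogram implementation (an assumption here),
/// with a lowest discernible value of 1 unit.
fn hdr_counts_len(highest_trackable: u64, significant_digits: u32) -> Option<usize> {
    // 1-5 digits is the range documented for this crate's collector.
    if !(1..=5).contains(&significant_digits) {
        return None;
    }
    // Guard against degenerate ranges and against overflow in the loop below.
    if highest_trackable < 2 || highest_trackable > u64::MAX / 2 {
        return None;
    }
    // Values up to 2 * 10^digits are tracked at single-unit resolution.
    let single_unit_limit = 2 * 10u64.pow(significant_digits);
    let sub_bucket_count = single_unit_limit.next_power_of_two();
    let sub_bucket_half_count = sub_bucket_count / 2;

    // Each additional bucket doubles the covered range.
    let mut smallest_untrackable = sub_bucket_count;
    let mut buckets = 1u64;
    while smallest_untrackable <= highest_trackable {
        smallest_untrackable <<= 1;
        buckets += 1;
    }
    Some(((buckets + 1) * sub_bucket_half_count) as usize)
}
```

Assuming microsecond units, a 60-second maximum (60,000,000) with 3 significant digits needs 17,408 slots, or about 136 KiB at 8 bytes per counter. Moving to 4 digits raises that to 212,992 slots, while a tenfold larger maximum adds only a few buckets, so precision is the more expensive knob.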

§Re-exports

pub use aggregator::DistributionChange;
pub use aggregator::MetricsAggregator;
pub use aggregator::MetricsComparison;
pub use collector::CollectorConfig;
pub use collector::MetricsCollector;
pub use collector::MetricsError;
pub use types::AggregatedMetrics;
pub use types::LatencyDistribution;
pub use types::RequestMetrics;
pub use types::ThroughputStats;

§Modules

aggregator
Statistical aggregation of collected metrics
collector
Metrics collector using HDR Histogram for accurate percentile calculation
types
Metrics data structures for LLM Latency Lens

§Structs

RequestId
Unique identifier for a single request
SessionId
Unique identifier for a profiling session

§Enums

Provider
LLM provider type