Metrics collection and aggregation for LLM Latency Lens
This crate provides production-ready metrics collection and statistical aggregation for LLM performance measurement. It uses HDR Histogram for accurate percentile calculations and provides thread-safe collectors.
§Features
- High-precision percentile tracking using HDR Histogram
- Thread-safe collectors with Arc<Mutex<>> for concurrent access
- Efficient memory usage with configurable histogram parameters
- Comprehensive metrics:
  - TTFT (Time to First Token)
  - Inter-token latency
  - Total request latency
  - Token throughput
  - Cost tracking
- Statistical aggregation with p50, p90, p95, p99, p99.9 percentiles
- Serde serialization for all types
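To make the thread-safety and percentile features above concrete, here is a minimal standard-library sketch. `SketchCollector`, `record_ttft`, `percentile`, and `collect_concurrently` are hypothetical names invented for illustration and are not part of this crate's API. The sketch also swaps in a sorted `Vec` with nearest-rank percentiles in place of HDR Histogram, so it shows the idea rather than the crate's implementation.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a collector: a mutex-guarded list of samples.
/// (Assumption: the real crate records into an HDR Histogram, not a Vec.)
#[derive(Default)]
struct SketchCollector {
    ttft_samples: Mutex<Vec<Duration>>,
}

impl SketchCollector {
    /// Record one TTFT sample; callable from any thread via a shared reference.
    fn record_ttft(&self, sample: Duration) {
        self.ttft_samples.lock().unwrap().push(sample);
    }

    /// Nearest-rank percentile over a sorted copy of the samples.
    /// Returns None when nothing has been recorded.
    fn percentile(&self, pct: f64) -> Option<Duration> {
        let mut sorted = self.ttft_samples.lock().unwrap().clone();
        if sorted.is_empty() {
            return None;
        }
        sorted.sort();
        // Nearest rank: ceil(pct / 100 * n), clamped to 1..=n, then 0-indexed.
        let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
        Some(sorted[rank.clamp(1, sorted.len()) - 1])
    }
}

/// Spawn `threads` workers that each record `per_thread` distinct samples.
fn collect_concurrently(threads: u64, per_thread: u64) -> Arc<SketchCollector> {
    let collector = Arc::new(SketchCollector::default());
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let collector = Arc::clone(&collector);
            thread::spawn(move || {
                for i in 0..per_thread {
                    // Across all threads the samples cover 1 ms, 2 ms, 3 ms, ...
                    collector.record_ttft(Duration::from_millis(t * per_thread + i + 1));
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    collector
}

fn main() {
    let collector = collect_concurrently(4, 25); // samples 1..=100 ms
    for pct in [50.0, 90.0, 95.0, 99.0, 99.9] {
        println!("p{pct}: {:?}", collector.percentile(pct));
    }
}
```

Sorting a full copy on every query is fine for a sketch but scales poorly. Avoiding that cost is one reason to use a fixed-size histogram for percentile tracking.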
§Example Usage
use llm_latency_lens_metrics::{
    MetricsCollector, MetricsAggregator, CollectorConfig, RequestMetrics
};
use llm_latency_lens_core::{SessionId, RequestId, Provider};
use chrono::Utc;
use std::time::Duration;

// Create a metrics collector
let session_id = SessionId::new();
let config = CollectorConfig::new()
    .with_max_value_seconds(60)
    .with_significant_digits(3);
let collector = MetricsCollector::new(session_id, config).unwrap();

// Record metrics from requests
let metrics = RequestMetrics {
    request_id: RequestId::new(),
    session_id,
    provider: Provider::OpenAI,
    model: "gpt-4".to_string(),
    timestamp: Utc::now(),
    ttft: Duration::from_millis(150),
    total_latency: Duration::from_millis(2000),
    inter_token_latencies: vec![
        Duration::from_millis(10),
        Duration::from_millis(15),
        Duration::from_millis(12),
    ],
    input_tokens: 100,
    output_tokens: 50,
    thinking_tokens: None,
    tokens_per_second: 25.0,
    cost_usd: Some(0.05),
    success: true,
    error: None,
};
collector.record(metrics).unwrap();

// Aggregate metrics
let aggregated = MetricsAggregator::aggregate(&collector).unwrap();

// Access statistical distributions
println!("TTFT p50: {:?}", aggregated.ttft_distribution.p50);
println!("TTFT p99: {:?}", aggregated.ttft_distribution.p99);
println!("Success rate: {:.2}%", aggregated.success_rate());
println!("Mean throughput: {:.2} tokens/sec",
    aggregated.throughput.mean_tokens_per_second);
§Thread Safety
The MetricsCollector is thread-safe and can be shared across multiple
threads using Arc:
use llm_latency_lens_metrics::MetricsCollector;
use llm_latency_lens_core::SessionId;
use std::sync::Arc;
use std::thread;

let collector = Arc::new(
    MetricsCollector::with_defaults(SessionId::new()).unwrap()
);
let mut handles = vec![];
for _ in 0..10 {
    let collector_clone = Arc::clone(&collector);
    let handle = thread::spawn(move || {
        // Record metrics from this thread
        // collector_clone.record(...).unwrap();
    });
    handles.push(handle);
}
for handle in handles {
    handle.join().unwrap();
}
§Performance Characteristics
- Recording overhead: ~1-2μs per metric
- Memory usage: ~100KB per 10,000 samples (with default config)
- Aggregation time: ~100μs for 10,000 samples
- Percentile accuracy: reported values are within 0.1% of the recorded value (with 3 significant digits)
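The 0.1% figure follows from how an HDR-style histogram quantizes values. The sketch below illustrates that arithmetic under two assumptions: it mirrors the bucketing scheme of the reference HdrHistogram design rather than this crate's code, and it takes the recording unit to be microseconds. `quantize` and `relative_error` are hypothetical helper names.

```rust
/// Round `value` down to the nearest value an HDR-style histogram with
/// `significant_digits` of precision can distinguish (lowest unit = 1).
fn quantize(value: u64, significant_digits: u32) -> u64 {
    // Sub-bucket count: smallest power of two >= 2 * 10^digits (2048 for 3 digits).
    let sub_bucket_count = (2 * 10u64.pow(significant_digits)).next_power_of_two();
    let sub_bucket_bits = sub_bucket_count.trailing_zeros(); // 11 for 3 digits
    let value_bits = 64 - value.leading_zeros();
    // Values that fit in the first bucket are kept exactly; larger values
    // lose their lowest `shift` bits.
    let shift = value_bits.saturating_sub(sub_bucket_bits);
    (value >> shift) << shift
}

/// Fraction of `value` lost to quantization (value must be non-zero).
fn relative_error(value: u64, significant_digits: u32) -> f64 {
    (value - quantize(value, significant_digits)) as f64 / value as f64
}

fn main() {
    let ttft_us: u64 = 150_000; // 150 ms in microseconds (assumed unit)
    let stored = quantize(ttft_us, 3);
    println!("recorded {ttft_us} us, resolved as {stored} us");
    println!("relative error: {:.4}%", 100.0 * relative_error(ttft_us, 3));
}
```

With 3 significant digits, every value keeps at least its top 11 bits, so the loss is always below 1/1024, just under 0.1%. For the 150 ms sample, the sketch resolves to 149,888 µs, an error of roughly 0.075%.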
§Configuration
The collector can be configured with:
- Maximum value: The highest value that can be tracked (default: 60 seconds)
- Significant digits: Precision of percentile calculations (1-5, default: 3)
- Per-provider tracking: Enable/disable separate histograms per provider
- Per-model tracking: Enable/disable separate histograms per model
Higher precision and longer tracking ranges increase memory usage.
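To see why both settings affect memory, the following sketch estimates how many counter slots an HDR-style histogram allocates. It follows the sizing arithmetic of the reference HdrHistogram design as an assumption about what sits underneath this crate, and it assumes values are recorded in microseconds. `counts_len` is a hypothetical helper, not a function exported here.

```rust
/// Estimate the number of counter slots an HDR-style histogram allocates
/// for values in 1..=highest at the given number of significant digits.
fn counts_len(highest: u64, significant_digits: u32) -> u64 {
    // Sub-bucket count: smallest power of two >= 2 * 10^digits.
    let sub_bucket_count = (2 * 10u64.pow(significant_digits)).next_power_of_two();
    // The first bucket covers 0..sub_bucket_count; each further bucket
    // doubles the trackable range.
    let mut bucket_count = 1;
    let mut smallest_untrackable = sub_bucket_count;
    while smallest_untrackable <= highest {
        bucket_count += 1;
        match smallest_untrackable.checked_mul(2) {
            Some(next) => smallest_untrackable = next,
            None => break, // the range already spans all of u64
        }
    }
    // Every bucket after the first only needs the upper half of its slots.
    (bucket_count + 1) * (sub_bucket_count / 2)
}

fn main() {
    let sixty_seconds_us = 60_000_000; // assumed unit: microseconds
    for digits in [2, 3, 4] {
        println!(
            "{digits} significant digits, 60 s range: {} counters",
            counts_len(sixty_seconds_us, digits)
        );
    }
    println!(
        "3 significant digits, 600 s range: {} counters",
        counts_len(600_000_000, 3)
    );
}
```

Under these assumptions, a 60-second range needs 17,408 counters at 3 significant digits, 2,560 at 2 digits, and 212,992 at 4 digits. Extending the range tenfold at 3 digits adds only a few buckets (21,504 counters), so precision is the more expensive of the two settings.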
Re-exports§
pub use aggregator::DistributionChange;
pub use aggregator::MetricsAggregator;
pub use aggregator::MetricsComparison;
pub use collector::CollectorConfig;
pub use collector::MetricsCollector;
pub use collector::MetricsError;
pub use types::AggregatedMetrics;
pub use types::LatencyDistribution;
pub use types::RequestMetrics;
pub use types::ThroughputStats;
Modules§
- aggregator: Statistical aggregation of collected metrics
- collector: Metrics collector using HDR Histogram for accurate percentile calculation
- types: Metrics data structures for LLM Latency Lens
Structs§
Enums§
- Provider: LLM provider type