LLM Evaluation Metrics Module (#71)
Direct observation of LLM behavior through comprehensive metrics tracking.
Toyota Way: 現地現物 (Genchi Genbutsu)
“Go and see” - Direct observation of LLM behavior through metrics enables data-driven decisions about prompt engineering and model selection.
Example
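```rust
// `LLMEvaluator` is imported in case these methods are provided by the trait.
use entrenar::monitor::llm::{InMemoryLLMEvaluator, LLMError, LLMEvaluator, LLMMetrics, PromptVersion};

// Assumes the evaluator methods return `Result<_, LLMError>`, so `?` can propagate errors.
fn main() -> Result<(), LLMError> {
    let mut evaluator = InMemoryLLMEvaluator::new();
    // Track prompt version
    let prompt = PromptVersion::new("Summarize: {text}", vec!["text".to_string()]);
    evaluator.track_prompt("run-1", &prompt)?;
    // Log LLM call metrics (token counts and latency)
    let metrics = LLMMetrics::new("gpt-4")
        .with_tokens(100, 50)
        .with_latency(1500.0);
    evaluator.log_llm_call("run-1", metrics)?;
    // Evaluate response quality; the third argument is an optional reference answer
    let _result = evaluator.evaluate_response("What is 2+2?", "4", Some("4"))?;
    Ok(())
}
```
Re-exports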
pub use stats::LLMStats;
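`LLMStats` is documented here only as aggregate LLM statistics. As a rough illustration of the kind of aggregation involved, the following is a minimal standalone sketch that summarizes per-call token counts and latencies. `CallRecord`, `Stats`, `aggregate`, and the nearest-rank `percentile` helper are hypothetical names chosen for this sketch; they are not the fields or methods of `LLMStats`.

```rust
// Hypothetical sketch: these names are illustrative and are NOT the
// fields or methods of `entrenar::monitor::llm::LLMStats`.

/// One logged LLM call (hypothetical minimal record).
#[derive(Debug, Clone)]
struct CallRecord {
    input_tokens: u32,
    output_tokens: u32,
    latency_ms: f64,
}

/// Aggregate statistics over a set of calls (hypothetical).
struct Stats {
    calls: usize,
    total_tokens: u64,
    mean_latency_ms: f64,
    p50_latency_ms: f64,
    max_latency_ms: f64,
}

/// Nearest-rank percentile over a sorted, non-empty slice.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.saturating_sub(1).min(sorted.len() - 1)]
}

/// Returns `None` for an empty input, since averages are undefined there.
fn aggregate(records: &[CallRecord]) -> Option<Stats> {
    if records.is_empty() {
        return None;
    }
    let total_tokens: u64 = records
        .iter()
        .map(|r| u64::from(r.input_tokens) + u64::from(r.output_tokens))
        .sum();
    let mut latencies: Vec<f64> = records.iter().map(|r| r.latency_ms).collect();
    latencies.sort_by(|a, b| a.total_cmp(b));
    let mean = latencies.iter().sum::<f64>() / latencies.len() as f64;
    Some(Stats {
        calls: records.len(),
        total_tokens,
        mean_latency_ms: mean,
        p50_latency_ms: percentile(&latencies, 50.0),
        max_latency_ms: *latencies.last().unwrap(),
    })
}

fn main() {
    let records = vec![
        CallRecord { input_tokens: 100, output_tokens: 50, latency_ms: 1500.0 },
        CallRecord { input_tokens: 80, output_tokens: 40, latency_ms: 900.0 },
        CallRecord { input_tokens: 120, output_tokens: 60, latency_ms: 2100.0 },
    ];
    let s = aggregate(&records).expect("records is non-empty");
    println!(
        "calls={} tokens={} mean={}ms p50={}ms max={}ms",
        s.calls, s.total_tokens, s.mean_latency_ms, s.p50_latency_ms, s.max_latency_ms
    );
}
```

Returning `Option` rather than zeros keeps "no calls logged yet" distinguishable from "calls with zero latency".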
Modules
- heuristics - Heuristic evaluation functions for LLM responses.
- stats - Aggregate LLM statistics.
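The functions in `heuristics` are not listed on this page. As a hedged illustration of what heuristic response scoring can look like, the sketch below implements two common reference-based checks, normalized exact match and bag-of-words token F1, of the kind a call such as `evaluate_response("What is 2+2?", "4", Some("4"))` could apply when a reference answer is supplied. `tokenize`, `exact_match`, and `token_f1` are hypothetical and may not correspond to the crate's actual heuristics.

```rust
use std::collections::HashMap;

/// Hypothetical normalization: lowercase, split on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// 1.0 if the normalized token sequences are identical, else 0.0.
fn exact_match(response: &str, reference: &str) -> f64 {
    if tokenize(response) == tokenize(reference) { 1.0 } else { 0.0 }
}

/// Bag-of-words F1; each reference token can be matched at most once.
fn token_f1(response: &str, reference: &str) -> f64 {
    let pred = tokenize(response);
    let gold = tokenize(reference);
    if pred.is_empty() || gold.is_empty() {
        return if pred == gold { 1.0 } else { 0.0 };
    }
    // Count reference tokens so repeated words are clipped by multiplicity.
    let mut gold_counts: HashMap<&str, usize> = HashMap::new();
    for t in &gold {
        *gold_counts.entry(t.as_str()).or_insert(0) += 1;
    }
    let mut overlap = 0usize;
    for t in &pred {
        if let Some(c) = gold_counts.get_mut(t.as_str()) {
            if *c > 0 {
                *c -= 1;
                overlap += 1;
            }
        }
    }
    if overlap == 0 {
        return 0.0;
    }
    let precision = overlap as f64 / pred.len() as f64;
    let recall = overlap as f64 / gold.len() as f64;
    2.0 * precision * recall / (precision + recall)
}

fn main() {
    println!("exact_match = {}", exact_match("4", "4"));
    println!("token_f1 = {:.3}", token_f1("The answer is 4", "4"));
}
```

For example, `token_f1("The answer is 4", "4")` scores 0.4: precision is 1/4, recall is 1/1, so F1 = 2(0.25)(1)/(1.25). Exact match would score the same response 0.0, which is why the two are often reported together.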
Structs
- EvalResult - Evaluation result scores
- InMemoryLLMEvaluator - In-memory LLM evaluator for testing
- LLMMetrics - LLM call metrics
- PromptVersion - Prompt version with content-addressable ID
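`PromptVersion` is described as having a content-addressable ID: the ID is derived from the prompt's content, so identical prompts share an ID and any edit yields a new one. The sketch below assumes the ID hashes the template plus its variable names; `PromptSketch` is a hypothetical stand-in, and FNV-1a is used only because it fits in a few lines of std-only Rust. The crate may well use a different hash (content-addressed systems commonly use a cryptographic hash such as SHA-256).

```rust
/// Hypothetical stand-in for a prompt version; not the crate's `PromptVersion`.
struct PromptSketch {
    template: String,
    variables: Vec<String>,
    id: String,
}

// FNV-1a 64-bit parameters (offset basis and prime).
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a: deterministic across runs and platforms, but not cryptographic.
fn fnv1a(bytes: &[u8], mut hash: u64) -> u64 {
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

impl PromptSketch {
    fn new(template: &str, variables: Vec<String>) -> Self {
        // Hash the template, then each variable name behind a 0x00 separator,
        // so ("ab", ["c"]) and ("a", ["bc"]) hash different byte strings.
        let mut h = fnv1a(template.as_bytes(), FNV_OFFSET);
        for v in &variables {
            h = fnv1a(&[0u8], h);
            h = fnv1a(v.as_bytes(), h);
        }
        Self { template: template.to_string(), variables, id: format!("{h:016x}") }
    }
}

fn main() {
    let a = PromptSketch::new("Summarize: {text}", vec!["text".to_string()]);
    let b = PromptSketch::new("Summarize: {text}", vec!["text".to_string()]);
    let c = PromptSketch::new("Summarize briefly: {text}", vec!["text".to_string()]);
    assert_eq!(a.id, b.id); // same content, same ID
    assert_ne!(a.id, c.id); // edited template, new ID
    println!("{} {:?} -> {}", a.template, a.variables, a.id);
}
```

Because the ID depends only on content, re-registering an unchanged prompt in a later run maps to the same version instead of creating a duplicate.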
Enums
- LLMError - LLM evaluation errors
Traits
- LLMEvaluator - Trait for LLM evaluation systems
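Pairing the `LLMEvaluator` trait with `InMemoryLLMEvaluator` suggests a pattern in which evaluation storage sits behind a trait and an in-memory backend serves tests. The following is a minimal sketch of that pattern that assumes nothing about the real trait's methods: `Evaluator`, `InMemoryEvaluator`, `CallMetrics`, and `EvalError` are hypothetical. The builder calls mirror `with_tokens(100, 50)` and `with_latency(1500.0)` from the example, treating the first token count as input tokens and latency as milliseconds (both assumptions).

```rust
use std::collections::HashMap;

/// Hypothetical error type; the real `LLMError` variants are not documented here.
#[derive(Debug, PartialEq)]
enum EvalError {
    EmptyRunId,
}

/// Hypothetical per-call metrics built with consuming `with_*` methods.
#[derive(Debug, Clone, Default)]
struct CallMetrics {
    model: String,
    input_tokens: u32,
    output_tokens: u32,
    latency_ms: f64,
}

impl CallMetrics {
    fn new(model: &str) -> Self {
        Self { model: model.to_string(), ..Default::default() }
    }
    fn with_tokens(mut self, input: u32, output: u32) -> Self {
        self.input_tokens = input;
        self.output_tokens = output;
        self
    }
    fn with_latency(mut self, ms: f64) -> Self {
        self.latency_ms = ms;
        self
    }
    /// Output tokens per second; 0.0 when no latency was recorded.
    fn tokens_per_second(&self) -> f64 {
        if self.latency_ms > 0.0 {
            f64::from(self.output_tokens) / (self.latency_ms / 1000.0)
        } else {
            0.0
        }
    }
}

/// Storage behind a trait so backends can be swapped (hypothetical shape).
trait Evaluator {
    fn log_call(&mut self, run_id: &str, metrics: CallMetrics) -> Result<(), EvalError>;
    fn calls(&self, run_id: &str) -> &[CallMetrics];
}

/// In-memory backend: convenient for tests, nothing is persisted.
#[derive(Default)]
struct InMemoryEvaluator {
    calls: HashMap<String, Vec<CallMetrics>>,
}

impl Evaluator for InMemoryEvaluator {
    fn log_call(&mut self, run_id: &str, metrics: CallMetrics) -> Result<(), EvalError> {
        if run_id.is_empty() {
            return Err(EvalError::EmptyRunId);
        }
        self.calls.entry(run_id.to_string()).or_default().push(metrics);
        Ok(())
    }
    fn calls(&self, run_id: &str) -> &[CallMetrics] {
        self.calls.get(run_id).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() -> Result<(), EvalError> {
    let mut evaluator = InMemoryEvaluator::default();
    let metrics = CallMetrics::new("gpt-4").with_tokens(100, 50).with_latency(1500.0);
    evaluator.log_call("run-1", metrics)?;
    for call in evaluator.calls("run-1") {
        println!(
            "{}: {} in / {} out, {:.1} tok/s",
            call.model, call.input_tokens, call.output_tokens, call.tokens_per_second()
        );
    }
    Ok(())
}
```

With the example's numbers, 50 output tokens in 1500 ms works out to about 33.3 tokens per second. Code written against the trait can then run unchanged on a persistent backend.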