Module llm

LLM Evaluation Metrics Module (#71)

Tracks metrics for LLM calls (token counts, latency, prompt versions, and response quality) so that model behavior can be observed directly.

§Toyota Way: 現地現物 (Genchi Genbutsu)

“Go and see”: observing LLM behavior directly through metrics supports data-driven decisions about prompt engineering and model selection.

§Example

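use entrenar::monitor::llm::{LLMMetrics, PromptVersion, EvalResult, InMemoryLLMEvaluator};

// The `?` operator needs an enclosing function that returns `Result`.
// Assumption: the module's error type implements `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut evaluator = InMemoryLLMEvaluator::new();

    // Track the prompt version used by this run
    let prompt = PromptVersion::new("Summarize: {text}", vec!["text".to_string()]);
    evaluator.track_prompt("run-1", &prompt)?;

    // Log metrics for a single LLM call
    let metrics = LLMMetrics::new("gpt-4")
        .with_tokens(100, 50)
        .with_latency(1500.0);
    evaluator.log_llm_call("run-1", metrics)?;

    // Evaluate response quality against an optional reference answer
    let result = evaluator.evaluate_response("What is 2+2?", "4", Some("4"))?;

    Ok(())
}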

Re-exports§

pub use stats::LLMStats;
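
The fields of `LLMStats` are not listed on this page. As a rough illustration of what aggregating per-call metrics can look like, here is a minimal standalone sketch. `CallSample`, `AggregateSketch`, and `aggregate` are hypothetical names invented for this example and are not part of entrenar; the sketch also assumes latency is recorded in milliseconds.

```rust
/// Hypothetical per-call record; the field names are assumptions, not entrenar's API.
#[derive(Debug, Clone, Copy)]
pub struct CallSample {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub latency_ms: f64,
}

/// Hypothetical aggregate computed over a set of calls.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AggregateSketch {
    pub calls: usize,
    pub total_tokens: u64,
    pub mean_latency_ms: f64,
    pub completion_tokens_per_sec: f64,
}

/// Aggregates call samples; returns `None` for an empty slice so the
/// mean latency never divides by zero.
pub fn aggregate(samples: &[CallSample]) -> Option<AggregateSketch> {
    if samples.is_empty() {
        return None;
    }
    let calls = samples.len();
    let total_tokens: u64 = samples
        .iter()
        .map(|s| u64::from(s.prompt_tokens) + u64::from(s.completion_tokens))
        .sum();
    let completion_tokens: u64 = samples
        .iter()
        .map(|s| u64::from(s.completion_tokens))
        .sum();
    let total_latency_ms: f64 = samples.iter().map(|s| s.latency_ms).sum();
    // Throughput is completion tokens divided by total latency in seconds.
    let completion_tokens_per_sec = if total_latency_ms > 0.0 {
        completion_tokens as f64 / (total_latency_ms / 1000.0)
    } else {
        0.0
    };
    Some(AggregateSketch {
        calls,
        total_tokens,
        mean_latency_ms: total_latency_ms / calls as f64,
        completion_tokens_per_sec,
    })
}
```

Returning `Option` rather than a zeroed struct makes "no calls logged yet" distinguishable from "calls logged with zero latency".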

Modules§

heuristics
Heuristic evaluation functions for LLM responses.
stats
Aggregate LLM statistics.
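
The functions inside `heuristics` are not shown on this page. To give a sense of what a heuristic response check can look like, the sketch below implements two common ones: a normalized exact match and a token-recall score against a reference answer. Both function names are hypothetical and are not taken from entrenar.

```rust
use std::collections::HashSet;

/// Returns 1.0 if the trimmed, lowercased strings are equal, else 0.0.
pub fn exact_match(response: &str, reference: &str) -> f64 {
    if response.trim().to_lowercase() == reference.trim().to_lowercase() {
        1.0
    } else {
        0.0
    }
}

/// Fraction of distinct whitespace-separated reference tokens that also
/// appear in the response (case-insensitive). Returns 0.0 when the
/// reference has no tokens.
pub fn token_recall(response: &str, reference: &str) -> f64 {
    let response_lower = response.to_lowercase();
    let reference_lower = reference.to_lowercase();
    let response_tokens: HashSet<&str> = response_lower.split_whitespace().collect();
    let reference_tokens: HashSet<&str> = reference_lower.split_whitespace().collect();
    if reference_tokens.is_empty() {
        return 0.0;
    }
    let hits = reference_tokens
        .iter()
        .filter(|token| response_tokens.contains(*token))
        .count();
    hits as f64 / reference_tokens.len() as f64
}
```

Heuristics like these are cheap and deterministic, which suits an in-memory evaluator used in tests, but they do not measure semantic correctness.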

Structs§

EvalResult
Evaluation result scores
InMemoryLLMEvaluator
In-memory LLM evaluator for testing
LLMMetrics
LLM call metrics
PromptVersion
Prompt version with content-addressable ID
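
"Content-addressable" means the ID is derived from the prompt's content, so identical prompts always map to the same ID and any edit produces a new one. The page does not say which hash `PromptVersion` uses. The sketch below swaps in 64-bit FNV-1a purely because it is short and stable across platforms; it is not cryptographic, and `prompt_id` is a hypothetical helper, not entrenar's function.

```rust
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: XOR each byte into the state, then multiply by the prime.
fn fnv1a(bytes: impl IntoIterator<Item = u8>) -> u64 {
    bytes
        .into_iter()
        .fold(FNV_OFFSET, |hash, byte| (hash ^ u64::from(byte)).wrapping_mul(FNV_PRIME))
}

/// Hypothetical helper: derives a 16-character hex ID from a template and
/// its variable names.
pub fn prompt_id(template: &str, variables: &[&str]) -> String {
    // A 0x00 byte precedes each variable name, so the hashed byte stream
    // for ("ab", ["c"]) differs from the one for ("a", ["bc"]).
    let bytes = template.bytes().chain(
        variables
            .iter()
            .flat_map(|name| std::iter::once(0u8).chain(name.bytes())),
    );
    format!("{:016x}", fnv1a(bytes))
}
```

A production system would more likely use a cryptographic digest such as SHA-256, where collisions are not a practical concern.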

Enums§

LLMError
LLM evaluation errors

Traits§

LLMEvaluator
Trait for LLM evaluation systems

Type Aliases§

PromptId
Prompt identifier (content-addressable)
Result
Result type for LLM operations
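
A module-level `Result` alias paired with an error enum is a common Rust idiom: it lets every function in the module write a short return type while sharing a single error type. The variants of `LLMError` are not listed on this page, so the sketch below uses made-up ones. `SketchError`, `SketchResult`, and `check_latency` are all hypothetical.

```rust
use std::fmt;

/// Hypothetical error enum; the variants are illustrative only.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    RunNotFound(String),
    InvalidMetric(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::RunNotFound(run) => write!(f, "run not found: {run}"),
            SketchError::InvalidMetric(msg) => write!(f, "invalid metric: {msg}"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Alias that fixes the error type, so signatures read `SketchResult<T>`.
pub type SketchResult<T> = std::result::Result<T, SketchError>;

/// Accepts only finite, non-negative latencies (in milliseconds).
pub fn check_latency(latency_ms: f64) -> SketchResult<f64> {
    if latency_ms.is_finite() && latency_ms >= 0.0 {
        Ok(latency_ms)
    } else {
        Err(SketchError::InvalidMetric(format!("latency_ms = {latency_ms}")))
    }
}
```

Implementing `std::error::Error` is what allows such an error to be propagated with `?` into a `Box<dyn std::error::Error>`.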