Module llm

LLM Evaluation Metrics Module (#71)

Tracks prompt versions, per-call metrics such as token counts and latency, and response-quality scores, so that LLM behavior can be observed directly.

§Toyota Way: 現地現物 (Genchi Genbutsu)

“Go and see”: observing LLM behavior directly through metrics supports data-driven decisions about prompt engineering and model selection.

§Example

use entrenar::monitor::llm::{LLMMetrics, PromptVersion, EvalResult, InMemoryLLMEvaluator};

let mut evaluator = InMemoryLLMEvaluator::new();

// Track prompt version
let prompt = PromptVersion::new("Summarize: {text}", vec!["text".to_string()]);
evaluator.track_prompt("run-1", &prompt)?;

// Log LLM call metrics
let metrics = LLMMetrics::new("gpt-4")
    .with_tokens(100, 50)
    .with_latency(1500.0);
evaluator.log_llm_call("run-1", metrics)?;

// Evaluate response quality
let result = evaluator.evaluate_response("What is 2+2?", "4", Some("4"))?;

Re-exports§

pub use stats::LLMStats;

Modules§

heuristics: Heuristic evaluation functions for LLM responses.
stats: Aggregate LLM statistics.
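
To make the idea of a heuristic evaluation function concrete, here is a minimal sketch of two reference-based checks. The names `exact_match` and `word_recall` are hypothetical and chosen for illustration; they are not taken from the `heuristics` module, whose actual functions and signatures are not listed on this page.

```rust
use std::collections::HashSet;

/// Hypothetical heuristic (not the crate's API): returns 1.0 when the
/// response equals the reference after trimming whitespace and ignoring
/// ASCII case, and 0.0 otherwise.
fn exact_match(response: &str, reference: &str) -> f64 {
    if response.trim().eq_ignore_ascii_case(reference.trim()) {
        1.0
    } else {
        0.0
    }
}

/// Hypothetical heuristic (not the crate's API): fraction of distinct
/// reference words that also appear in the response, case-insensitively.
/// Returns 0.0 when the reference contains no words.
fn word_recall(response: &str, reference: &str) -> f64 {
    let response_words: HashSet<String> = response
        .split_whitespace()
        .map(|w| w.to_lowercase())
        .collect();
    let reference_words: HashSet<String> = reference
        .split_whitespace()
        .map(|w| w.to_lowercase())
        .collect();

    if reference_words.is_empty() {
        return 0.0;
    }

    let hits = reference_words
        .iter()
        .filter(|w| response_words.contains(*w))
        .count();
    hits as f64 / reference_words.len() as f64
}
```

Checks like these are cheap and deterministic, which suits quick regression tests, but they only measure surface overlap and say nothing about whether a differently worded answer is correct.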

Structs§

EvalResult: Evaluation result scores
InMemoryLLMEvaluator: In-memory LLM evaluator for testing
LLMMetrics: LLM call metrics
PromptVersion: Prompt version with content-addressable ID
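
A content-addressable ID is derived from the content itself, so two prompts with the same template and variables get the same ID, and any edit produces a different one. The sketch below illustrates the idea with a hypothetical `SketchPrompt` type and the standard library's `DefaultHasher`. It is an assumption for illustration only: this page does not say how `PromptVersion` computes its `PromptId`, and a production system would more likely use a cryptographic hash such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a prompt version; not the crate's `PromptVersion`.
#[derive(Debug, Clone)]
struct SketchPrompt {
    template: String,
    variables: Vec<String>,
}

impl SketchPrompt {
    fn new(template: &str, variables: Vec<String>) -> Self {
        Self {
            template: template.to_string(),
            variables,
        }
    }

    /// Derives an ID from the content alone, so equal content yields equal IDs.
    /// `DefaultHasher` is not guaranteed to be stable across Rust releases,
    /// so this is only suitable as an in-process illustration.
    fn content_id(&self) -> String {
        let mut hasher = DefaultHasher::new();
        self.template.hash(&mut hasher);
        self.variables.hash(&mut hasher);
        // Render the 64-bit hash as 16 zero-padded hex digits.
        format!("{:016x}", hasher.finish())
    }
}
```

With this design, tracking the same prompt twice is idempotent, and a changed template is automatically recorded as a new version without a manual version counter.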

Enums§

LLMError: LLM evaluation errors

Traits§

LLMEvaluator: Trait for LLM evaluation systems

Type Aliases§

PromptId: Prompt identifier (content-addressable)
Result: Result type for LLM operations
