Module eval

Model Evaluation Framework (APR-073)

Comprehensive evaluation module implementing the Model Evaluation Framework Specification. Provides standardized metrics, model comparison, and drift detection with Jidoka principles.

§Architecture

  • classification: Multi-class classification metrics, confusion matrix, reports
  • evaluator: ModelEvaluator for running evaluations and comparisons
  • drift: Statistical drift detection (Kolmogorov-Smirnov, chi-square, Population Stability Index)
  • generative: Generative AI evaluation metrics
  • retrain: Auto-retraining with Andon pattern
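
To make the PSI test named in the drift entry concrete, here is a minimal standalone sketch of a Population Stability Index computation. It does not use or reproduce the entrenar API: the function name `psi_sketch`, the equal-width binning over the baseline range, and the `1e-6` floor for empty bins are all assumptions chosen for illustration.

```rust
/// Population Stability Index between a baseline sample and a current sample.
/// Hypothetical helper, not part of entrenar. Both samples are binned into
/// `bins` equal-width buckets spanning the baseline's range.
fn psi_sketch(baseline: &[f64], current: &[f64], bins: usize) -> f64 {
    assert!(bins > 0 && !baseline.is_empty() && !current.is_empty());
    let min = baseline.iter().copied().fold(f64::INFINITY, f64::min);
    let max = baseline.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Guard against a zero-width range when all baseline values are equal.
    let width = ((max - min) / bins as f64).max(f64::EPSILON);

    // Fraction of values per bin; out-of-range values are clamped to the edge bins.
    let proportions = |data: &[f64]| -> Vec<f64> {
        let mut counts = vec![0usize; bins];
        for &v in data {
            let idx = (((v - min) / width).floor().max(0.0) as usize).min(bins - 1);
            counts[idx] += 1;
        }
        counts.iter().map(|&c| c as f64 / data.len() as f64).collect()
    };

    // Assumed smoothing floor so that empty bins do not produce ln(0).
    let eps = 1e-6;
    let expected = proportions(baseline);
    let actual = proportions(current);
    expected
        .iter()
        .zip(&actual)
        .map(|(&e, &a)| {
            let (e, a) = (e.max(eps), a.max(eps));
            (a - e) * (a / e).ln()
        })
        .sum()
}
```

Identical samples give a PSI of zero, and the value grows as the current distribution moves away from the baseline. Where to set an alert threshold is a policy decision that this sketch leaves open.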

§Example

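use entrenar::eval::{ModelEvaluator, EvalConfig, Metric, Average};

// Request accuracy and weighted F1; all other settings keep their defaults.
let evaluator = ModelEvaluator::new(EvalConfig {
    metrics: vec![Metric::Accuracy, Metric::F1(Average::Weighted)],
    cv_folds: 5,
    ..Default::default()
});

// `model`, `x_test`, and `y_test` are assumed to be defined by the caller;
// `?` requires an enclosing function that returns a `Result`.
let result = evaluator.evaluate(&model, &x_test, &y_test)?;
println!("Accuracy: {:.2}%", result.get_score(Metric::Accuracy) * 100.0);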
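
For readers who want to see what the two requested metrics measure, the following standalone sketch computes accuracy and support-weighted F1 from label slices. It is an illustration under stated assumptions, not the entrenar implementation: the names `accuracy_sketch` and `weighted_f1_sketch` are hypothetical, and labels are assumed to be integers in `0..n_classes`.

```rust
/// Fraction of predictions that equal the true label.
/// Hypothetical helper, not part of entrenar.
fn accuracy_sketch(y_true: &[usize], y_pred: &[usize]) -> f64 {
    assert_eq!(y_true.len(), y_pred.len());
    if y_true.is_empty() {
        return 0.0;
    }
    let correct = y_true.iter().zip(y_pred).filter(|(t, p)| t == p).count();
    correct as f64 / y_true.len() as f64
}

/// Per-class F1 averaged with weights proportional to each class's support.
/// Assumes every label is in `0..n_classes`.
fn weighted_f1_sketch(y_true: &[usize], y_pred: &[usize], n_classes: usize) -> f64 {
    assert_eq!(y_true.len(), y_pred.len());
    // confusion[t][p] counts samples with true class t predicted as class p.
    let mut confusion = vec![vec![0usize; n_classes]; n_classes];
    for (&t, &p) in y_true.iter().zip(y_pred) {
        confusion[t][p] += 1;
    }

    let total = y_true.len() as f64;
    let mut weighted_sum = 0.0;
    for c in 0..n_classes {
        let tp = confusion[c][c] as f64;
        // Row sum: number of samples whose true class is c (TP + FN).
        let support: usize = confusion[c].iter().sum();
        // Column sum: number of samples predicted as c (TP + FP).
        let predicted: usize = confusion.iter().map(|row| row[c]).sum();
        // F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (support + predicted).
        let denom = (support + predicted) as f64;
        let f1 = if denom > 0.0 { 2.0 * tp / denom } else { 0.0 };
        weighted_sum += f1 * support as f64;
    }
    if total > 0.0 { weighted_sum / total } else { 0.0 }
}
```

Weighting by support means frequent classes dominate the score, which is the usual reason to pick weighted over macro averaging on imbalanced data.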

§Re-exports

pub use crate::monitor::drift::AnomalySeverity;
pub use crate::monitor::drift::DriftStatus;
pub use crate::monitor::drift::SlidingWindowBaseline;
pub use drift::DriftCallback;
pub use drift::DriftDetector;
pub use drift::DriftResult;
pub use drift::DriftSummary;
pub use drift::DriftTest;
pub use drift::Severity;
pub use retrain::Action;
pub use retrain::AutoRetrainer;
pub use retrain::RetrainCallback;
pub use retrain::RetrainConfig;
pub use retrain::RetrainPolicy;
pub use retrain::RetrainerStats;
pub use classification::classification_report;
pub use classification::confusion_matrix;
pub use classification::Average;
pub use classification::ConfusionMatrix;
pub use classification::MultiClassMetrics;
pub use evaluator::EvalConfig;
pub use evaluator::EvalResult;
pub use evaluator::KFold;
pub use evaluator::Leaderboard;
pub use evaluator::Metric;
pub use evaluator::ModelEvaluator;
pub use evaluator::RougeVariant;
pub use generative::bleu_score;
pub use generative::ndcg_at_k;
pub use generative::pass_at_k;
pub use generative::perplexity;
pub use generative::real_time_factor_inverse;
pub use generative::rouge_l;
pub use generative::rouge_n;
pub use generative::word_error_rate;

§Modules

  • classification: Classification metrics for model evaluation
  • drift: Drift Detection Module
  • evaluator: Model Evaluator for standardized evaluation and comparison
  • generative: Generative AI evaluation metrics
  • retrain: Auto-Retraining Module (APR-073-5)
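
The generative module re-exports a `pass_at_k` function whose signature is not shown on this page. As background only, the widely used unbiased pass@k estimator can be sketched as below. The name `pass_at_k_sketch` and its `(n, c, k)` parameters are assumptions for illustration and may not match the crate's function.

```rust
/// Unbiased pass@k estimate: the probability that at least one of `k` samples,
/// drawn without replacement from `n` generated samples of which `c` are
/// correct, passes. Hypothetical helper, not the entrenar function.
fn pass_at_k_sketch(n: usize, c: usize, k: usize) -> f64 {
    assert!(c <= n && k >= 1 && k <= n);
    // Fewer than k incorrect samples: every draw of k contains a correct one.
    if n - c < k {
        return 1.0;
    }
    // 1 - C(n-c, k) / C(n, k), evaluated as a running product to avoid
    // overflowing on large binomial coefficients.
    let all_fail: f64 = ((n - c + 1)..=n)
        .map(|i| 1.0 - k as f64 / i as f64)
        .product();
    1.0 - all_fail
}
```

With `c = 0` the product is empty and the estimate is 0. With `k = 1` it reduces to the plain success rate `c / n`.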