Model Evaluation Framework (APR-073)
Comprehensive evaluation module implementing the Model Evaluation Framework Specification. Provides standardized metrics, model comparison, and drift detection with Jidoka principles.
§Architecture
- classification: Multi-class classification metrics, confusion matrix, reports
- evaluator: ModelEvaluator for running evaluations and comparisons
- drift: Statistical drift detection (KS, Chi-sq, PSI)
- retrain: Auto-retraining with Andon pattern
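To make the classification metrics concrete, here is a minimal standalone sketch of how accuracy and a support-weighted F1 score can be derived from a confusion matrix. It does not use or mirror this crate's implementation: the function names `confusion_counts`, `accuracy`, and `weighted_f1` are hypothetical, and labels are assumed to be class indices in `0..n_classes`.

```rust
/// Build a confusion matrix where rows are true labels and columns are
/// predicted labels. Hypothetical helper, not the crate's `confusion_matrix`.
fn confusion_counts(y_true: &[usize], y_pred: &[usize], n_classes: usize) -> Vec<Vec<usize>> {
    assert_eq!(y_true.len(), y_pred.len(), "label slices must have equal length");
    let mut m = vec![vec![0usize; n_classes]; n_classes];
    for (&t, &p) in y_true.iter().zip(y_pred) {
        m[t][p] += 1;
    }
    m
}

/// Fraction of samples on the diagonal (correct predictions).
fn accuracy(m: &[Vec<usize>]) -> f64 {
    let total: usize = m.iter().flatten().sum();
    if total == 0 {
        return 0.0;
    }
    let correct: usize = (0..m.len()).map(|i| m[i][i]).sum();
    correct as f64 / total as f64
}

/// Per-class F1 averaged with weights proportional to each class's support.
fn weighted_f1(m: &[Vec<usize>]) -> f64 {
    let total: usize = m.iter().flatten().sum();
    if total == 0 {
        return 0.0;
    }
    let mut sum = 0.0;
    for c in 0..m.len() {
        let tp = m[c][c] as f64;
        // Support = TP + FN: all samples whose true label is `c`.
        let support: usize = m[c].iter().sum();
        // Predicted = TP + FP: all samples predicted as `c`.
        let predicted: usize = m.iter().map(|row| row[c]).sum();
        // F1 = 2TP / (2TP + FP + FN), and the denominator equals support + predicted.
        let denom = (support + predicted) as f64;
        let f1 = if denom == 0.0 { 0.0 } else { 2.0 * tp / denom };
        sum += f1 * support as f64;
    }
    sum / total as f64
}
```

Weighting by support means frequent classes dominate the score. A macro average would instead give every class equal weight regardless of its size.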
§Example
use entrenar::eval::{ModelEvaluator, EvalConfig, Metric, Average};
let evaluator = ModelEvaluator::new(EvalConfig {
    metrics: vec![Metric::Accuracy, Metric::F1(Average::Weighted)],
    cv_folds: 5,
    ..Default::default()
});
// `model`, `x_test`, and `y_test` are assumed to be defined by the caller.
let result = evaluator.evaluate(&model, &x_test, &y_test)?;
println!("Accuracy: {:.2}%", result.get_score(Metric::Accuracy) * 100.0);
Re-exports§
pub use crate::monitor::drift::AnomalySeverity;
pub use crate::monitor::drift::DriftStatus;
pub use crate::monitor::drift::SlidingWindowBaseline;
pub use drift::DriftCallback;
pub use drift::DriftDetector;
pub use drift::DriftResult;
pub use drift::DriftSummary;
pub use drift::DriftTest;
pub use drift::Severity;
pub use retrain::Action;
pub use retrain::AutoRetrainer;
pub use retrain::RetrainCallback;
pub use retrain::RetrainConfig;
pub use retrain::RetrainPolicy;
pub use retrain::RetrainerStats;
pub use classification::classification_report;
pub use classification::confusion_matrix;
pub use classification::Average;
pub use classification::ConfusionMatrix;
pub use classification::MultiClassMetrics;
pub use evaluator::EvalConfig;
pub use evaluator::EvalResult;
pub use evaluator::KFold;
pub use evaluator::Leaderboard;
pub use evaluator::Metric;
pub use evaluator::ModelEvaluator;
pub use evaluator::RougeVariant;
pub use generative::bleu_score;
pub use generative::ndcg_at_k;
pub use generative::pass_at_k;
pub use generative::perplexity;
pub use generative::real_time_factor_inverse;
pub use generative::rouge_l;
pub use generative::rouge_n;
pub use generative::word_error_rate;
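Among the re-exported generative metrics, pass@k is the one whose definition is least obvious from its name. The sketch below shows the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), computed as a running product to avoid large binomial coefficients. It is a standalone illustration: the name `pass_at_k_estimate` and its signature are assumptions and may differ from this crate's `pass_at_k`.

```rust
/// Estimate pass@k from `n` generated samples of which `c` passed.
/// Hypothetical helper; not the signature of the crate's `pass_at_k`.
fn pass_at_k_estimate(n: u64, c: u64, k: u64) -> f64 {
    assert!(k >= 1 && k <= n, "k must be in 1..=n");
    assert!(c <= n, "passing samples cannot exceed total samples");
    // With fewer than k failing samples, every k-subset contains a pass.
    if n - c < k {
        return 1.0;
    }
    // C(n-c, k) / C(n, k) expressed as a product over i in (n-c+1)..=n.
    let mut all_fail = 1.0;
    for i in (n - c + 1)..=n {
        all_fail *= 1.0 - k as f64 / i as f64;
    }
    1.0 - all_fail
}
```

For `k = 1` this reduces to the plain pass rate `c / n`, which is a convenient sanity check.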
Modules§
- classification: Classification metrics for model evaluation
- drift: Drift Detection Module
- evaluator: Model Evaluator for standardized evaluation and comparison
- generative: Generative AI evaluation metrics
- retrain: Auto-Retraining Module (APR-073-5)
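As an illustration of the kind of statistic the drift module covers, the following sketch computes the Population Stability Index (PSI) over pre-binned counts. It is a standalone example under stated assumptions, not the crate's `DriftDetector`: the function name `population_stability_index`, the count-based input, and the `EPS` floor for empty bins are all hypothetical choices.

```rust
/// PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are
/// the baseline and current bin proportions. Hypothetical helper.
fn population_stability_index(expected_counts: &[u64], actual_counts: &[u64]) -> f64 {
    assert_eq!(
        expected_counts.len(),
        actual_counts.len(),
        "both histograms must use the same bins"
    );
    // Floor for empty bins so the logarithm stays finite (an assumed choice).
    const EPS: f64 = 1e-6;
    let e_total: u64 = expected_counts.iter().sum();
    let a_total: u64 = actual_counts.iter().sum();
    assert!(e_total > 0 && a_total > 0, "histograms must not be empty");

    expected_counts
        .iter()
        .zip(actual_counts)
        .map(|(&ec, &ac)| {
            let e = (ec as f64 / e_total as f64).max(EPS);
            let a = (ac as f64 / a_total as f64).max(EPS);
            (a - e) * (a / e).ln()
        })
        .sum()
}
```

A commonly cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift. These thresholds are a convention rather than something this page specifies, so they are worth tuning per feature.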