Module bench

Model evaluation and benchmarking framework (`aprender::bench`, spec §7.10)

Provides multi-model comparison for evaluating .apr models on custom tasks. Unlike QA (single-model validation), this module compares multiple models to find the smallest model that meets a performance threshold.
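The "smallest model that meets a threshold" selection can be sketched as follows. This is a minimal, self-contained illustration; the `Candidate` struct and `smallest_viable` function are hypothetical stand-ins, not part of the aprender API.

```rust
// Hypothetical types for illustration only; the real module uses
// EvalResult / ModelComparison instead.
#[derive(Debug, Clone)]
struct Candidate {
    name: &'static str,
    size_mb: u64, // model size (smaller is better)
    score: f64,   // task metric in [0.0, 1.0] (higher is better)
}

/// Return the smallest candidate whose score meets `threshold`.
fn smallest_viable(candidates: &[Candidate], threshold: f64) -> Option<&Candidate> {
    candidates
        .iter()
        .filter(|c| c.score >= threshold)
        .min_by_key(|c| c.size_mb)
}

fn main() {
    let models = vec![
        Candidate { name: "tiny", size_mb: 50, score: 0.62 },
        Candidate { name: "small", size_mb: 120, score: 0.81 },
        Candidate { name: "large", size_mb: 900, score: 0.93 },
    ];
    let pick = smallest_viable(&models, 0.8).expect("no model meets threshold");
    println!("{}", pick.name); // prints "small": smallest model scoring >= 0.8
}
```

This avoids overprovisioning (Muda elimination): "large" also passes the threshold but is never selected because a smaller passing model exists.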

Toyota Way Alignment

  • Pull Systems (P3): Pareto frontier pulls smallest viable model
  • Muda Elimination: Avoid overprovisioning with right-sized models

References

  • Deb et al. (2002) “NSGA-II” for Pareto optimization

Example

use aprender::bench::{EvalResult, ModelComparison};

let comparison = ModelComparison::new("python-to-rust");
assert!(comparison.results.is_empty());

Modules

pareto
Pareto Frontier Computation
py2rs
Python to Rust Single-Shot Compile Benchmark (10 Levels)
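The `pareto` module's frontier computation rests on two-objective dominance (minimize size, maximize score). A hedged sketch of that idea, with illustrative names that are not the actual `aprender::bench::pareto` API:

```rust
// a and b are (size_mb, score) pairs: minimize size, maximize score.
// a dominates b if it is no worse on both objectives and strictly
// better on at least one (the standard NSGA-II dominance relation).
fn dominates(a: (u64, f64), b: (u64, f64)) -> bool {
    let no_worse = a.0 <= b.0 && a.1 >= b.1;
    let strictly_better = a.0 < b.0 || a.1 > b.1;
    no_worse && strictly_better
}

/// Keep only the points that no other point dominates.
fn pareto_frontier(points: &[(u64, f64)]) -> Vec<(u64, f64)> {
    points
        .iter()
        .copied()
        .filter(|p| !points.iter().any(|q| dominates(*q, *p)))
        .collect()
}

fn main() {
    let frontier = pareto_frontier(&[(50, 0.6), (120, 0.8), (900, 0.7)]);
    // (900, 0.7) is dominated by (120, 0.8): smaller and higher-scoring.
    println!("{frontier:?}"); // prints [(50, 0.6), (120, 0.8)]
}
```

Every point on the frontier represents a defensible size/quality trade-off; the threshold-based recommendation then picks one point from it.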

Structs

EvalResult
Result of evaluating a single model on a single task
EvalSuiteConfig
Evaluation suite configuration
Example
Example input for evaluation
ExampleResult
Result for a single example
ModelComparison
Compare multiple models on the same task
ParetoPoint
Point on the Pareto frontier
Recommendation
Recommendation for a specific scenario

Enums

Difficulty
Difficulty tier for stratified analysis
ExampleStatus
Status of an example evaluation

Traits

EvalTask
Custom evaluation task trait