Model evaluation and benchmarking framework (spec §7.10)
Model Evaluation and Benchmarking Framework (aprender::bench)
Provides multi-model comparison for evaluating .apr models on custom tasks.
Unlike QA (single-model validation), this module compares multiple models
to find the smallest model that meets a performance threshold.
§Toyota Way Alignment
- Pull Systems (P3): Pareto frontier pulls smallest viable model
- Muda Elimination: Avoid overprovisioning with right-sized models
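The "pull the smallest viable model" idea can be sketched in plain Rust. The types below (`Candidate`, `smallest_viable`) are illustrative stand-ins, not the aprender API: given candidates with a size and an accuracy, select the smallest one that clears a threshold.

```rust
// Illustrative sketch (hypothetical types, not the aprender::bench API):
// pick the smallest model whose accuracy meets a performance threshold.

#[derive(Debug)]
struct Candidate {
    name: &'static str,
    params_m: u64, // model size in millions of parameters
    accuracy: f64, // task accuracy in [0, 1]
}

/// Return the smallest candidate with accuracy >= threshold, if any.
fn smallest_viable(candidates: &[Candidate], threshold: f64) -> Option<&Candidate> {
    candidates
        .iter()
        .filter(|c| c.accuracy >= threshold)
        .min_by_key(|c| c.params_m)
}

fn main() {
    let models = [
        Candidate { name: "tiny", params_m: 125, accuracy: 0.71 },
        Candidate { name: "small", params_m: 350, accuracy: 0.84 },
        Candidate { name: "large", params_m: 1300, accuracy: 0.91 },
    ];
    // With a 0.80 threshold, "small" wins: it is the smallest model that qualifies.
    let pick = smallest_viable(&models, 0.80).expect("no model meets threshold");
    println!("{}", pick.name); // prints "small"
}
```

Overprovisioning (muda) is avoided by filtering on the threshold first, then minimizing size, rather than ranking by accuracy alone.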
§References
- Deb et al. (2002) “NSGA-II” for Pareto optimization
§Example
use aprender::bench::{EvalResult, ModelComparison};
let comparison = ModelComparison::new("python-to-rust");
assert!(comparison.results.is_empty());
Structs§
- EvalResult - Result of evaluating a single model on a single task
- EvalSuiteConfig - Evaluation suite configuration
- Example - Example input for evaluation
- ExampleResult - Result for a single example
- ModelComparison - Compare multiple models on the same task
- ParetoPoint - Point on the Pareto frontier
- Recommendation - Recommendation for a specific scenario
Enums§
- Difficulty - Difficulty tier for stratified analysis
- ExampleStatus - Status of an example evaluation
Traits§
- EvalTask - Custom evaluation task trait
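A Pareto frontier over (size, error) keeps only models not dominated on both axes, which is what the smallest-viable search operates over. The sketch below is illustrative only: `Point` is a hypothetical stand-in for the crate's `ParetoPoint`, and the axes are assumed, not taken from the spec.

```rust
// Hypothetical sketch of Pareto-frontier filtering over (size, error) points;
// `Point` is an illustrative stand-in, not the aprender `ParetoPoint` struct.
#[derive(Clone, Debug, PartialEq)]
struct Point {
    size_mb: f64, // model size on disk
    error: f64,   // task error rate (lower is better)
}

/// `a` dominates `b` if it is no worse on both axes and strictly better on at least one.
fn dominates(a: &Point, b: &Point) -> bool {
    a.size_mb <= b.size_mb
        && a.error <= b.error
        && (a.size_mb < b.size_mb || a.error < b.error)
}

/// Keep only non-dominated points (the Pareto frontier).
fn pareto_frontier(points: &[Point]) -> Vec<Point> {
    points
        .iter()
        .filter(|p| !points.iter().any(|q| dominates(q, p)))
        .cloned()
        .collect()
}

fn main() {
    let points = vec![
        Point { size_mb: 100.0, error: 0.10 }, // accurate but big
        Point { size_mb: 50.0, error: 0.20 },  // small but less accurate
        Point { size_mb: 120.0, error: 0.20 }, // dominated by the first point
    ];
    let frontier = pareto_frontier(&points);
    println!("{} points on the frontier", frontier.len()); // prints "2 points on the frontier"
}
```

Note that `dominates(p, p)` is false (the strict-inequality clause fails), so each point comparing against itself in the inner loop is harmless. NSGA-II (Deb et al., 2002, cited above) uses this same dominance relation with faster sorting for large populations.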