pub struct ThroughputBenchmark {
    pub warmup_runs: usize,
    pub measurement_runs: usize,
    pub prompt: String,
    pub max_tokens: usize,
}
Builder for throughput benchmark runs.
Collects timing data from caller-supplied closures rather than running the model directly, keeping this crate decoupled from the inference engine.
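One way a caller might gather timings from a closure is sketched below. The `collect_timings` helper is hypothetical and not part of this crate; the closure stands in for whatever inference engine the caller uses and is assumed to return one `(prefill_ms, decode_ms, tokens_generated)` tuple per run.

```rust
use std::time::Instant;

/// Hypothetical helper (not part of this crate): calls `run` for
/// `warmup + measured` iterations and keeps only the measured timings.
/// `run` returns (prefill_ms, decode_ms, tokens_generated) for one run.
fn collect_timings<F>(warmup: usize, measured: usize, mut run: F) -> Vec<(f32, f32, usize)>
where
    F: FnMut() -> (f32, f32, usize),
{
    for _ in 0..warmup {
        let _ = run(); // warm-up result is discarded
    }
    (0..measured).map(|_| run()).collect()
}

fn main() {
    let mut calls = 0usize;
    // Stand-in for an inference engine: times a trivial workload with Instant.
    let timings = collect_timings(3, 10, || {
        calls += 1;
        let start = Instant::now();
        let tokens = 64usize; // pretend 64 tokens were generated
        let decode_ms = start.elapsed().as_secs_f32() * 1000.0;
        (0.0, decode_ms, tokens)
    });
    println!("{} measured runs, {} total calls", timings.len(), calls);
}
```

Because the engine is reached only through the closure, the timing code needs no dependency on any particular backend.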
Fields
warmup_runs: usize
Number of warm-up runs (results discarded).
measurement_runs: usize
Number of measurement runs (results aggregated).
prompt: String
The prompt used for benchmarking.
max_tokens: usize
Maximum tokens to generate per run.
Implementations
impl ThroughputBenchmark
pub fn new(prompt: &str, max_tokens: usize) -> Self
Create a benchmark with 3 warm-up runs and 10 measurement runs.
pub fn with_warmup(self, warmup: usize) -> Self
Override the number of warm-up runs.
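The builder behaviour described above can be illustrated with a local stand-in type. This is a minimal sketch that mirrors only the documented fields, signatures, and defaults (3 warm-up runs, 10 measurement runs); the method bodies are assumptions, not the crate's actual source.

```rust
/// Local stand-in mirroring the documented fields; the real type lives in the crate.
#[derive(Debug, Clone, PartialEq)]
pub struct ThroughputBenchmark {
    pub warmup_runs: usize,
    pub measurement_runs: usize,
    pub prompt: String,
    pub max_tokens: usize,
}

impl ThroughputBenchmark {
    /// Documented defaults: 3 warm-up runs and 10 measurement runs.
    pub fn new(prompt: &str, max_tokens: usize) -> Self {
        Self {
            warmup_runs: 3,
            measurement_runs: 10,
            prompt: prompt.to_string(),
            max_tokens,
        }
    }

    /// Consumes the builder and returns it with a new warm-up count.
    pub fn with_warmup(mut self, warmup: usize) -> Self {
        self.warmup_runs = warmup;
        self
    }
}

fn main() {
    let bench = ThroughputBenchmark::new("Hello", 128).with_warmup(5);
    println!("{:?}", bench);
}
```

Taking `self` by value lets calls chain directly off `new` without an intermediate mutable binding.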
pub fn from_timings(&self, run_timings: &[(f32, f32, usize)]) -> ThroughputResult
Aggregate caller-supplied timing data into a ThroughputResult.
run_timings is a slice of (prefill_ms, decode_ms, tokens_generated) tuples, one per measurement run. The caller must exclude warm-up timings before passing the slice.
This method only computes aggregate statistics from the provided data. It never calls the inference engine, so it works with any backend that can report these three values per run.
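The fields of ThroughputResult are not shown on this page, so the sketch below uses a hypothetical `Summary` struct and a hypothetical `summarize` function to show the kind of aggregation such timing tuples allow. The chosen statistics (mean prefill time, mean decode time, decode tokens per second) are assumptions for illustration, not the crate's actual output.

```rust
/// Hypothetical aggregate; the real ThroughputResult may expose different fields.
#[derive(Debug, Clone, PartialEq)]
struct Summary {
    mean_prefill_ms: f32,
    mean_decode_ms: f32,
    decode_tokens_per_sec: f32,
}

/// Aggregates (prefill_ms, decode_ms, tokens_generated) tuples.
/// Returns None when there are no runs or no decode time was recorded.
fn summarize(run_timings: &[(f32, f32, usize)]) -> Option<Summary> {
    if run_timings.is_empty() {
        return None;
    }
    let n = run_timings.len() as f32;
    let total_prefill: f32 = run_timings.iter().map(|t| t.0).sum();
    let total_decode: f32 = run_timings.iter().map(|t| t.1).sum();
    let total_tokens: usize = run_timings.iter().map(|t| t.2).sum();
    if total_decode <= 0.0 {
        return None; // avoid dividing by zero
    }
    Some(Summary {
        mean_prefill_ms: total_prefill / n,
        mean_decode_ms: total_decode / n,
        // tokens per millisecond scaled to tokens per second
        decode_tokens_per_sec: total_tokens as f32 * 1000.0 / total_decode,
    })
}

fn main() {
    // Two measurement runs; warm-up timings are assumed to be excluded already.
    let timings = [(10.0, 100.0, 50), (20.0, 100.0, 50)];
    match summarize(&timings) {
        Some(s) => println!("{:?}", s),
        None => println!("no usable timings"),
    }
}
```

Dividing total tokens by total decode time weights each run by its length, which is one reasonable choice; averaging per-run rates would give a different figure when runs vary in size.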