Calibration tool for the Keleusma cost model.
Measures pipelined-cycle cost per opcode on a host CPU and emits
a generated op_cycles function that the runtime can use for
WCET analysis on that host. See the crate-level README for usage
and methodology.
Architecture extensibility lives in counter. Opcode coverage
lives in the OPCODE_SPECS table here. Source emission lives in
emit_cost_model_source.
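As a rough illustration of what an architecture abstraction in counter might involve, the sketch below assumes a hypothetical CycleCounter trait, an elapsed_ticks helper, and a FakeCounter test double. None of these names come from the crate. It shows one detail that any such abstraction has to handle: a 32-bit counter such as Cortex-M DWT_CYCCNT wraps, so elapsed ticks are computed with wrapping subtraction masked to the counter width.

```rust
/// Hypothetical counter abstraction; the real `counter` module may differ.
trait CycleCounter {
    /// Read the raw hardware counter value.
    fn read(&mut self) -> u64;
    /// Width of the hardware counter in bits (e.g. 32 for a DWT_CYCCNT-style counter).
    fn width_bits(&self) -> u32;
}

/// Ticks elapsed between two raw readings, tolerating one wrap of a
/// `width_bits`-wide counter.
fn elapsed_ticks(start: u64, end: u64, width_bits: u32) -> u64 {
    let mask = if width_bits >= 64 { u64::MAX } else { (1u64 << width_bits) - 1 };
    end.wrapping_sub(start) & mask
}

/// Time a closure in raw counter ticks.
fn time_ticks<C: CycleCounter, F: FnOnce()>(counter: &mut C, work: F) -> u64 {
    let start = counter.read();
    work();
    let end = counter.read();
    elapsed_ticks(start, end, counter.width_bits())
}

/// Test double that replays canned readings instead of touching hardware.
struct FakeCounter {
    readings: Vec<u64>,
    next: usize,
    width: u32,
}

impl CycleCounter for FakeCounter {
    fn read(&mut self) -> u64 {
        let value = self.readings[self.next];
        self.next += 1;
        value
    }

    fn width_bits(&self) -> u32 {
        self.width
    }
}
```

The mask handles at most one wrap. The timed region must therefore be shorter than one full counter period, which for a 32-bit counter running at CPU clock is 2^32 cycles.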
The crate is no_std + alloc-compatible when the std feature is
disabled. Under no_std, the env-variable override for the CPU
clock assumption is unavailable; the host runner that calls this
crate from std code retains the override. The embedded path
consumes only the measurement primitives (OPCODE_SPECS and
benchmark_spec) and reports raw measurements through the
host’s chosen transport (defmt RTT for Cortex-M).
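The CPU clock assumption matters because ticks from a counter slower than the CPU, such as AArch64's CNTVCT_EL0, have to be scaled to CPU cycles. The following is a minimal sketch of gating the override on the std feature and converting ticks to cycles. It assumes a hypothetical environment variable KELEUSMA_CPU_HZ and an illustrative default clock, and neither is taken from the crate.

```rust
/// Illustrative default CPU clock; the crate's actual default is not stated here.
const DEFAULT_CPU_HZ: u64 = 1_000_000_000;

/// Parse an optional override, falling back to the default when it is
/// absent, malformed, or zero.
fn parse_cpu_hz(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&hz| hz > 0)
        .unwrap_or(DEFAULT_CPU_HZ)
}

/// With `std`, read the (hypothetical) KELEUSMA_CPU_HZ variable.
/// In a `#![no_std]` crate this branch would also need `extern crate std;`.
#[cfg(feature = "std")]
fn cpu_hz() -> u64 {
    parse_cpu_hz(std::env::var("KELEUSMA_CPU_HZ").ok().as_deref())
}

/// Without `std` there is no environment to read, so the default is used.
#[cfg(not(feature = "std"))]
fn cpu_hz() -> u64 {
    parse_cpu_hz(None)
}

/// Scale raw counter ticks to CPU cycles. Multiplying before dividing avoids
/// rounding the ratio cpu_hz / counter_hz on its own.
fn ticks_to_cycles(ticks: u64, counter_hz: u64, cpu_hz: u64) -> f64 {
    (ticks as f64 * cpu_hz as f64) / counter_hz as f64
}
```

With a 24 MHz counter and an assumed 1 GHz clock, one tick corresponds to about 41.7 CPU cycles. This is why the host benchmark needs long runs to resolve per-op costs of only a few cycles.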
Modules
- counter - Architecture-specific cycle-counter abstractions.
Structs
- BenchConfig - Runtime configuration for the bench harness. The host CLI uses BenchConfig::host_default; embedded callers construct a smaller-repetition variant via BenchConfig::embedded_default so the constructed chunk fits in device RAM.
- Measurement - Result of measuring a single opcode spec. Carries both the floating-point pattern measurement (for diagnostic precision) and the rounded per-op count used in the generated cost model. A minimum reported value of 1 ensures the cost model never reports zero cycles, which would be unsound for use in WCET analysis.
- OpcodeSpec - Specification for benchmarking a single opcode. The benchmark engine constructs a Func chunk that inlines the spec’s pattern many times, runs the chunk in tight repetition, and computes the per-pattern cycle cost.
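The following sketches the two value types and the rounding rule described for Measurement: round the per-pattern estimate, then clamp it to at least 1. It assumes hypothetical field names (pattern_repetitions, pattern_cycles, cycles), a hypothetical constructor from_pattern_cycles, and placeholder repetition counts. The crate's real fields and defaults are not documented here.

```rust
/// Hypothetical shape of BenchConfig; repetition counts are placeholders.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BenchConfig {
    pattern_repetitions: usize,
}

impl BenchConfig {
    /// Large chunk for hosts whose counter ticks slower than the CPU.
    fn host_default() -> Self {
        BenchConfig { pattern_repetitions: 10_000 }
    }

    /// Small chunk so the constructed code fits in device RAM.
    fn embedded_default() -> Self {
        BenchConfig { pattern_repetitions: 64 }
    }
}

/// Hypothetical shape of Measurement.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Measurement {
    /// Raw per-pattern estimate, kept for diagnostics.
    pattern_cycles: f64,
    /// Rounded count used in the generated cost model; never zero.
    cycles: u32,
}

impl Measurement {
    fn from_pattern_cycles(pattern_cycles: f64) -> Self {
        // `as u32` saturates: negatives and NaN become 0, which the clamp
        // then lifts to 1 so no opcode is ever modeled as free.
        let cycles = (pattern_cycles.round() as u32).max(1);
        Measurement { pattern_cycles, cycles }
    }
}
```

Round-to-nearest can under-report by up to half a cycle, and ceil would be the more conservative choice for WCET. This sketch does not know which rule the crate uses. f64::round is provided by std rather than core, so a no_std build would need a crate such as libm or a manual rounding routine.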
Enums
- ConstValueDescriptor - Helper enum describing constants the spec wants in the constant pool. The benchmark engine converts these to ConstValue at chunk-construction time.
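The following is a minimal sketch of chunk construction that makes the OpcodeSpec and ConstValueDescriptor descriptions concrete. It uses a hypothetical three-instruction Op set, hypothetical field names (consts, pattern), and a hypothetical Chunk and build_chunk. The real Func chunk and opcode types are not shown in this documentation. The sketch converts descriptors to constant-pool values, rejects a pattern whose net stack effect is nonzero, and inlines the pattern `repetitions` times.

```rust
/// Hypothetical opcode subset, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Const(u16), // push constant-pool entry
    Add,        // pop two, push one
    Pop,        // pop one
}

#[derive(Debug, Clone, Copy)]
enum ConstValueDescriptor {
    Int(i64),
    Float(f64),
}

#[derive(Debug, Clone, PartialEq)]
enum ConstValue {
    Int(i64),
    Float(f64),
}

impl From<ConstValueDescriptor> for ConstValue {
    fn from(d: ConstValueDescriptor) -> Self {
        match d {
            ConstValueDescriptor::Int(i) => ConstValue::Int(i),
            ConstValueDescriptor::Float(f) => ConstValue::Float(f),
        }
    }
}

struct OpcodeSpec {
    consts: &'static [ConstValueDescriptor],
    pattern: &'static [Op],
}

struct Chunk {
    consts: Vec<ConstValue>,
    code: Vec<Op>,
}

/// Net change in operand-stack depth caused by one instruction.
fn stack_effect(op: Op) -> i32 {
    match op {
        Op::Const(_) => 1,
        Op::Add => -1,
        Op::Pop => -1,
    }
}

/// Build a chunk, or return the pattern's nonzero net stack effect.
fn build_chunk(spec: &OpcodeSpec, repetitions: usize) -> Result<Chunk, i32> {
    let net: i32 = spec.pattern.iter().map(|&op| stack_effect(op)).sum();
    if net != 0 {
        return Err(net); // stack would grow or shrink with every repetition
    }
    let consts = spec.consts.iter().map(|&d| ConstValue::from(d)).collect();
    let mut code = Vec::with_capacity(spec.pattern.len() * repetitions);
    for _ in 0..repetitions {
        code.extend_from_slice(spec.pattern);
    }
    Ok(Chunk { consts, code })
}

/// Example spec for Add: setup pushes two operands, cleanup pops the result.
const ADD_SPEC: OpcodeSpec = OpcodeSpec {
    consts: &[ConstValueDescriptor::Int(2), ConstValueDescriptor::Int(3)],
    pattern: &[Op::Const(0), Op::Const(1), Op::Add, Op::Pop],
};
```

Checking the net stack effect up front catches an unbalanced OPCODE_SPECS entry before it can overflow or underflow the operand stack partway through a long run.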
Constants
- MEASUREMENT_PASSES - Number of measurement passes. The minimum across passes is taken as the pipelined-cycle estimate, on the rationale that the minimum corresponds to the run with warmest caches and best branch prediction.
- OPCODE_SPECS - Master table of opcode benchmark specs. Each entry tells the benchmark engine how to exercise one opcode. To add coverage for a new opcode, append an entry here with appropriate setup and cleanup ops to keep the operand stack balanced across pattern repetitions.
- PATTERN_REPETITIONS - Default number of times the opcode pattern is inlined into the benchmark chunk on the host. Large values are required because architectural counters such as AArch64’s CNTVCT_EL0 run at rates well below the CPU clock (typically 24 MHz to 100 MHz); short runs may not span enough counter ticks to produce useful resolution. Embedded targets whose counter runs at CPU clock (Cortex-M DWT_CYCCNT) can use a much smaller value to keep the constructed chunk inside the device’s RAM budget. See BenchConfig.
- WARMUP_PASSES - Number of warmup passes before measurement. Warms instruction and data caches and stabilizes the branch predictor before measurement begins.
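The interplay of WARMUP_PASSES and MEASUREMENT_PASSES can be sketched as follows. The pass counts are placeholders rather than the crate's values. run_once is a hypothetical stand-in for one timed run of the constructed chunk that returns total cycles, and min_across_passes and per_pattern_cycles are illustrative helpers. Dividing the best pass by the repetition count gives the per-pattern estimate.

```rust
/// Placeholder values; the crate's actual constants may differ.
const WARMUP_PASSES: usize = 3;
const MEASUREMENT_PASSES: usize = 5;

/// Run `run_once` for warmup (results discarded), then return the minimum
/// cycle count observed across the measurement passes.
fn min_across_passes<F: FnMut() -> u64>(mut run_once: F) -> u64 {
    for _ in 0..WARMUP_PASSES {
        let _ = run_once();
    }
    (0..MEASUREMENT_PASSES)
        .map(|_| run_once())
        .min()
        .expect("MEASUREMENT_PASSES must be nonzero")
}

/// Per-pattern cost from the best pass over `repetitions` inlined patterns.
fn per_pattern_cycles(best_total_cycles: u64, repetitions: usize) -> f64 {
    best_total_cycles as f64 / repetitions as f64
}
```

Warmup results never reach the minimum, so a slow first pass with cold caches cannot affect the estimate.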
Functions
- benchmark_spec - Run the benchmark for a single opcode spec and return the estimated pipelined-cycle cost per pattern repetition.
- benchmark_spec_with_config - Same as benchmark_spec but with explicit configuration. Used by embedded callers that need a smaller pattern-repetition count so the constructed chunk fits in device RAM.
- emit_cost_model_source - Aggregate measurements by category and emit a Rust source fragment implementing measured_op_cycles. The fragment is written to a file the host includes in its build.
- measure_all - Run the full benchmark suite and return per-opcode measurements.
- measure_one - Measure a single spec and return a Measurement. Used by the embedded path, which emits each measurement through defmt rather than collecting the full table. The host CLI collects through measure_all instead.
- measure_one_with_config - Same as measure_one but with explicit configuration. The embedded bench_n6 binary calls this with BenchConfig::embedded_default so the constructed chunk fits in device RAM.
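The following is a minimal sketch of the aggregation-and-emission step behind emit_cost_model_source. Several parts are assumptions made for illustration: the function name emit_cost_model_source_sketch, the (category, cycles) input shape, the OpCategory type named in the generated code, and the choice of the maximum as the per-category aggregate. Taking the maximum is the conservative choice for WCET, but the crate's actual aggregation rule is not stated here. Only the generated function name measured_op_cycles comes from the documentation.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Aggregate (category, per-op cycles) pairs and emit a Rust source fragment.
fn emit_cost_model_source_sketch(measurements: &[(&str, u32)]) -> String {
    // Keep the worst (largest) per-op count seen in each category.
    let mut by_category: BTreeMap<&str, u32> = BTreeMap::new();
    for &(category, cycles) in measurements {
        let slot = by_category.entry(category).or_insert(cycles);
        *slot = (*slot).max(cycles);
    }

    let mut out = String::new();
    out.push_str("// Generated by the calibration tool; do not edit by hand.\n");
    out.push_str("pub fn measured_op_cycles(category: OpCategory) -> u32 {\n");
    out.push_str("    match category {\n");
    for (category, cycles) in &by_category {
        writeln!(out, "        OpCategory::{} => {},", category, cycles)
            .expect("writing to a String cannot fail");
    }
    out.push_str("    }\n}\n");
    out
}
```

The BTreeMap keeps the match arms in a stable, sorted order, so re-running calibration with unchanged results produces a byte-identical file for the host build to include.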