Crate keleusma_bench

Calibration tool for the Keleusma cost model.

Measures pipelined-cycle cost per opcode on a host CPU and emits a generated op_cycles function that the runtime can use for WCET analysis on that host. See the crate-level README for usage and methodology.

Architecture extensibility lives in counter. Opcode coverage lives in the OPCODE_SPECS table here. Source emission lives in emit_cost_model_source.

The crate is no_std + alloc compatible when the std feature is disabled. Under no_std, the environment-variable override for the CPU clock assumption is unavailable; the host runner that calls this crate from std code retains the override. The embedded path uses only the measurement primitives (OPCODE_SPECS and benchmark_spec) and reports raw measurements through the host’s chosen transport (defmt RTT for Cortex-M).

Modules

counter
Architecture-specific cycle-counter abstractions.

Structs

BenchConfig
Runtime configuration for the bench harness. The host CLI uses BenchConfig::host_default; embedded callers construct a smaller-repetition variant via BenchConfig::embedded_default so the constructed chunk fits in device RAM.
Measurement
Result of measuring a single opcode spec. Carries both the floating-point pattern measurement (for diagnostic precision) and the rounded per-op count used in the generated cost model. A minimum reported value of 1 ensures the cost model never reports zero cycles, which would be unsound for use in WCET analysis.
OpcodeSpec
Specification for benchmarking a single opcode. The benchmark engine constructs a Func chunk that inlines the spec’s pattern many times, runs the chunk in tight repetition, and computes the per-pattern cycle cost.
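
The rounding rule described under Measurement can be sketched as follows. This is a minimal illustration, not the crate's code: the function name rounded_cycles and the choice of u32 for the rounded count are assumptions made for the example.

```rust
/// Hypothetical helper (not part of keleusma_bench): turn a floating-point
/// per-op cycle measurement into the whole-cycle count a cost model would
/// store, never reporting fewer than 1 cycle.
pub fn rounded_cycles(measured: f64) -> u32 {
    // `f64::max` returns the non-NaN operand, so a NaN measurement also
    // collapses to the floor of 1. The `as` cast saturates at `u32::MAX`
    // for very large or infinite inputs rather than wrapping.
    measured.round().max(1.0) as u32
}
```

With this sketch, a measurement of 0.2 cycles is reported as 1 and a measurement of 3.6 cycles as 4. The floor matters because an opcode costed at zero would let a WCET bound ignore any number of executions of that opcode.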

Enums

ConstValueDescriptor
Helper enum describing constants the spec wants in the constant pool. The benchmark engine converts these to ConstValue at chunk-construction time.

Constants

MEASUREMENT_PASSES
Number of measurement passes. The minimum across passes is taken as the pipelined-cycle estimate, on the rationale that the minimum corresponds to the run with warmest caches and best branch prediction.
OPCODE_SPECS
Master table of opcode benchmark specs. Each entry tells the benchmark engine how to exercise one opcode. To add coverage for a new opcode, append an entry here with appropriate setup and cleanup ops to keep the operand stack balanced across pattern repetitions.
PATTERN_REPETITIONS
Default number of times the opcode pattern is inlined into the benchmark chunk on the host. Large values are required because architectural cycle counters such as AArch64’s CNTVCT_EL0 run at rates well below the CPU clock (typically 24 MHz to 100 MHz); short runs may not span enough counter ticks to produce useful resolution. Embedded targets whose counter runs at CPU clock (Cortex-M DWT_CYCCNT) can use a much smaller value to keep the constructed chunk inside the device’s RAM budget. See BenchConfig.
WARMUP_PASSES
Number of warmup passes before measurement. Warms instruction and data caches and stabilizes the branch predictor before measurement begins.
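
The warmup-then-minimum scheme described by WARMUP_PASSES and MEASUREMENT_PASSES can be sketched as below. The function min_over_passes and its closure parameter are hypothetical names chosen for illustration; the crate's actual measurement loop may be structured differently.

```rust
/// Hypothetical sketch (not the crate's API): run `warmup` untimed passes,
/// then `passes` timed passes, and keep the smallest timed result.
///
/// `timed_pass` stands in for "execute the benchmark chunk once and return
/// the elapsed counter ticks". Returns `None` when `passes` is zero.
pub fn min_over_passes<F>(warmup: usize, passes: usize, mut timed_pass: F) -> Option<u64>
where
    F: FnMut() -> u64,
{
    // Warmup results are discarded; these runs exist only to fill caches
    // and train the branch predictor.
    for _ in 0..warmup {
        let _ = timed_pass();
    }
    // The minimum is taken as the best-case (pipelined) estimate.
    (0..passes).map(|_| timed_pass()).min()
}
```

Taking the minimum rather than the mean discards passes that were disturbed by interrupts or cold caches, which is consistent with the stated goal of estimating pipelined cost rather than average cost.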

Functions

benchmark_spec
Run the benchmark for a single opcode spec and return the estimated pipelined-cycle cost per pattern repetition.
benchmark_spec_with_config
Same as benchmark_spec but with explicit configuration. Used by embedded callers that need a smaller pattern-repetition count so the constructed chunk fits in device RAM.
emit_cost_model_source
Aggregate measurements by category and emit a Rust source fragment implementing measured_op_cycles. The fragment is written to a file the host includes into its build.
measure_all
Run the full benchmark suite and return per-opcode measurements.
measure_one
Measure a single spec and return a Measurement. Used by the embedded path that emits each measurement through defmt rather than collecting the full table. The host CLI collects through measure_all instead.
measure_one_with_config
Same as measure_one but with explicit configuration. The embedded bench_n6 binary calls this with BenchConfig::embedded_default so the constructed chunk fits in device RAM.
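
The per-pattern cost returned by the benchmark functions depends on relating counter ticks to CPU cycles when the counter runs slower than the CPU clock, as noted under PATTERN_REPETITIONS. The arithmetic can be sketched as below. The function cycles_per_pattern and all of its parameters are assumptions for illustration; the source does not show how the crate performs this conversion.

```rust
/// Hypothetical sketch (not the crate's API): convert an elapsed tick count
/// from a fixed-rate counter into estimated CPU cycles per pattern
/// repetition.
///
/// * `ticks`       - elapsed counter ticks for one pass over the chunk
/// * `counter_hz`  - counter frequency, e.g. 24_000_000 for a 24 MHz counter
/// * `cpu_hz`      - assumed CPU clock frequency
/// * `repetitions` - number of times the pattern is inlined in the chunk
pub fn cycles_per_pattern(ticks: u64, counter_hz: u64, cpu_hz: u64, repetitions: u64) -> f64 {
    // Each counter tick spans `cpu_hz / counter_hz` CPU cycles.
    let cycles = ticks as f64 * (cpu_hz as f64 / counter_hz as f64);
    cycles / repetitions as f64
}
```

For example, with a 24 MHz counter and an assumed 3 GHz CPU clock, each tick covers 125 CPU cycles, so 800 ticks over 10,000 repetitions gives 10 cycles per pattern. The same numbers show why short runs are a problem: a chunk with only 10 repetitions would finish in less than one tick, and the counter could not resolve it.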