# dev-bench — Project Specification (REPS)
> Rust Engineering Project Specification.
> Normative language follows RFC 2119.
## 1. Purpose
`dev-bench` MUST measure code performance and detect regressions
against a stored baseline. Output MUST be a `dev_report::CheckResult`
or `dev_report::Report`, never plain stdout.
## 2. Scope
This crate MUST provide:
- A `Benchmark` runner with sample collection.
- A `BenchmarkResult` with at least mean, p50, and p99 statistics.
- Threshold types for regression detection (percent and absolute).
- A comparison API that returns a `dev_report::CheckResult`.

This crate SHOULD provide (in future versions):
- Throughput measurement (ops/sec).
- Allocation tracking (feature-gated).
- Baseline storage (JSON keyed by git SHA or commit ref).
- Comparison helpers that read baselines from disk.

This crate MUST NOT:
- Run interactive profilers (use `criterion` or `divan`).
- Replace `criterion`. The two coexist for different audiences.
- Produce HTML reports. Output is JSON-via-dev-report only.
## 3. Sample collection
Each `iter` call MUST capture a single duration sample. The runner
MUST NOT re-order or batch iterations transparently. If batching is
needed (e.g. for sub-microsecond ops), it MUST be opt-in via a
distinct API.
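
The one-sample-per-`iter` rule can be illustrated with a minimal sketch. The struct layout and the `iter_batched` method name are assumptions made for illustration; this specification only requires that batching be opt-in through a distinct API, and does not prescribe these signatures.

```rust
use std::time::{Duration, Instant};

/// Hypothetical runner shape: every `iter` call records exactly one sample.
pub struct Benchmark {
    name: String,
    samples: Vec<Duration>,
}

impl Benchmark {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string(), samples: Vec::new() }
    }

    /// Times a single invocation of `f` and pushes a single duration sample.
    /// Nothing is re-ordered or batched behind the caller's back.
    pub fn iter<T>(&mut self, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.samples.push(start.elapsed());
        out
    }

    /// Opt-in batching via a distinct method (the name is an assumption).
    /// Runs `f` `batch` times and records the per-call average as one sample.
    pub fn iter_batched(&mut self, batch: u32, mut f: impl FnMut()) {
        assert!(batch > 0, "batch must be non-zero");
        let start = Instant::now();
        for _ in 0..batch {
            f();
        }
        self.samples.push(start.elapsed() / batch);
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn samples(&self) -> &[Duration] {
        &self.samples
    }
}
```

Calling `iter` five times leaves five entries in `samples()`; a later `iter_batched(10, ..)` call adds exactly one more, so the caller always knows which samples were averaged.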
## 4. Statistics
- `mean` MUST be the arithmetic mean of all samples.
- `p50` MUST be the median sample.
- `p99` MUST be the 99th percentile sample.
- `cv` MUST be `stddev / mean`, computed across all samples.
- `ops_per_sec` MUST be `iterations_recorded / total_elapsed_seconds`.
- All statistics MUST be computed losslessly (no precomputed bins).

The first three (`mean`, `p50`, `p99`) are the **immutable contract**;
their definitions MUST NOT change in any version. `cv` and
`ops_per_sec` are additive (non-breaking) extensions, introduced in
`0.5.x` and `0.2.x` respectively.
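
A minimal sketch of the statistics above, computed directly from the raw samples with no binning. Two details are assumptions, because this section does not fix them: percentiles use the nearest-rank method, and `stddev` is the population standard deviation (divide by `n`). The `Stats` struct and function names are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical summary type; field names mirror the evidence labels.
#[derive(Debug, Clone, PartialEq)]
pub struct Stats {
    pub mean_ns: f64,
    pub p50_ns: u64,
    pub p99_ns: u64,
    pub cv: f64,
}

/// Nearest-rank percentile over an ascending-sorted, non-empty slice.
/// ASSUMPTION: the spec does not name a percentile method.
fn percentile(sorted: &[u64], pct: usize) -> u64 {
    // Integer ceiling of pct * n / 100, clamped to a valid 1-based rank.
    let rank = (pct * sorted.len() + 99) / 100;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Computes all statistics from every sample; returns `None` when empty.
pub fn summarize(samples: &[Duration]) -> Option<Stats> {
    if samples.is_empty() {
        return None;
    }
    let mut ns: Vec<u64> = samples.iter().map(|d| d.as_nanos() as u64).collect();
    ns.sort_unstable();

    let n = ns.len() as f64;
    let mean = ns.iter().map(|&v| v as f64).sum::<f64>() / n;

    // ASSUMPTION: population variance (divide by n, not n - 1).
    let variance = ns
        .iter()
        .map(|&v| {
            let d = v as f64 - mean;
            d * d
        })
        .sum::<f64>()
        / n;
    let cv = if mean > 0.0 { variance.sqrt() / mean } else { 0.0 };

    Some(Stats {
        mean_ns: mean,
        p50_ns: percentile(&ns, 50),
        p99_ns: percentile(&ns, 99),
        cv,
    })
}

/// `iterations_recorded / total_elapsed_seconds`, as defined above.
pub fn ops_per_sec(iterations_recorded: u64, total_elapsed: Duration) -> f64 {
    iterations_recorded as f64 / total_elapsed.as_secs_f64()
}
```

Under these assumptions, samples of 1 ns through 100 ns give a mean of 50.5 ns, a `p50` of 50 ns, and a `p99` of 99 ns. A different percentile convention (for example, linear interpolation) would produce slightly different `p50` and `p99` values.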
## 5. Regression detection
A regression is detected when the current measurement exceeds the
baseline by more than the configured threshold.
- `RegressionPct(pct)`: `current_mean > baseline_mean * (1 + pct/100)`
- `RegressionAbsoluteNs(ns)`: `(current_mean - baseline_mean) > ns`
- `ThroughputDropPct(pct)`: `current_ops_per_sec < baseline_ops_per_sec * (1 - pct/100)`,
where `baseline_ops_per_sec = 1.0 / baseline_mean_secs`.
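
The three predicates above translate almost directly into code. The sketch below assumes an enum named `Threshold` with `f64` payloads and a single `is_regression` method; those shapes are illustrative and are not mandated by this specification.

```rust
/// Hypothetical threshold enum; variant names follow this section.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Threshold {
    RegressionPct(f64),
    RegressionAbsoluteNs(f64),
    ThroughputDropPct(f64),
}

impl Threshold {
    /// Returns true when the current measurement exceeds the baseline by
    /// more than the configured threshold. Means are in nanoseconds.
    pub fn is_regression(
        &self,
        current_mean_ns: f64,
        baseline_mean_ns: f64,
        current_ops_per_sec: f64,
    ) -> bool {
        match *self {
            // current_mean > baseline_mean * (1 + pct/100)
            Threshold::RegressionPct(pct) => {
                current_mean_ns > baseline_mean_ns * (1.0 + pct / 100.0)
            }
            // (current_mean - baseline_mean) > ns
            Threshold::RegressionAbsoluteNs(ns) => {
                (current_mean_ns - baseline_mean_ns) > ns
            }
            // current_ops_per_sec < baseline_ops_per_sec * (1 - pct/100),
            // with baseline_ops_per_sec = 1.0 / baseline_mean_secs
            Threshold::ThroughputDropPct(pct) => {
                let baseline_mean_secs = baseline_mean_ns / 1e9;
                let baseline_ops_per_sec = 1.0 / baseline_mean_secs;
                current_ops_per_sec < baseline_ops_per_sec * (1.0 - pct / 100.0)
            }
        }
    }
}
```

For example, with a 1000 ns baseline, `RegressionPct(10.0)` flags a 1200 ns mean but not a 1050 ns mean. All comparisons are strict, so a measurement sitting exactly on the threshold is not a regression.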
### 5.1 Verdict semantics
- A regression outside the CV noise band MUST emit a `CheckResult`
with verdict `Fail` and severity `Warning`.
- When `allow_cv_noise_band` is true, a regression inside the CV noise
band (i.e. the duration delta is no greater than
`baseline_mean * cv`) MUST emit verdict `Warn` rather than `Fail`.
- A comparison with no baseline MUST emit verdict `Skip`.
- A comparison with fewer samples than `min_samples` MUST emit
verdict `Skip` with a `min_samples` detail.
- Severity escalation to `Error` is the consumer's choice via report
aggregation rules.
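
The verdict rules above can be read as a small decision function. This is a sketch under stated assumptions: the `Verdict` enum, the `CompareInput` struct, and the `Pass` variant are local stand-ins rather than the `dev_report` types; the order of the two `Skip` checks is arbitrary; only a percent threshold is modelled; and `cv` is taken from the current measurement.

```rust
/// Local stand-in for the report verdict; `Pass` is an assumed variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
    Skip,
}

/// Hypothetical bundle of everything the decision needs.
#[derive(Debug, Clone, Copy)]
pub struct CompareInput {
    pub current_mean_ns: f64,
    pub baseline_mean_ns: Option<f64>,
    pub cv: f64,
    pub samples: usize,
    pub min_samples: usize,
    pub allow_cv_noise_band: bool,
    pub regression_pct: f64,
}

pub fn verdict(i: &CompareInput) -> Verdict {
    // No baseline: nothing to compare against.
    let Some(baseline) = i.baseline_mean_ns else {
        return Verdict::Skip;
    };
    // Too few samples: the comparison would not be meaningful.
    if i.samples < i.min_samples {
        return Verdict::Skip;
    }
    let regressed = i.current_mean_ns > baseline * (1.0 + i.regression_pct / 100.0);
    if !regressed {
        return Verdict::Pass;
    }
    // Inside the noise band means delta <= baseline_mean * cv.
    let delta = i.current_mean_ns - baseline;
    if i.allow_cv_noise_band && delta <= baseline * i.cv {
        Verdict::Warn
    } else {
        Verdict::Fail
    }
}
```

With a 1000 ns baseline, a 1200 ns mean, and a 10% threshold, a `cv` of 0.05 gives a 50 ns band and the result is `Fail`; a `cv` of 0.3 gives a 300 ns band and the result is `Warn`, provided `allow_cv_noise_band` is true.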
### 5.2 Required evidence
Every non-`Skip` `CheckResult` emitted by `compare_*` MUST carry the
following numeric `Evidence` entries (created with
`dev_report::Evidence::numeric`):
- `mean_ns`
- `baseline_ns` (when a baseline was provided)
- `p50_ns`
- `p99_ns`
- `cv`
- `ops_per_sec`
- `samples`
- `iterations_recorded`

The `CheckResult` MUST carry the tag `bench`. Regression-flagged
checks MUST additionally carry the tag `regression`.
## 6. Baselines
`0.1.x` accepted baselines as an inline `Option<Duration>`. `0.4.x`
and later add a `BaselineStore` trait with a JSON file backend
(`JsonFileBaselineStore`). The trait MUST treat `(scope, name)` as
the identity of a baseline. Implementations MUST:
- Treat `load` as tolerant of missing data (return `Ok(None)`).
- Treat `save` as atomic (write-temp-rename or equivalent).

The default JSON backend keys baselines as
`<root>/<scope>/<name>.json`. `scope` and `name` are sanitized to
prevent path traversal.

The backend MUST NOT be required for basic use; passing
`Option<Duration>` directly into `compare_*` remains supported.
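
The store requirements (tolerant `load`, atomic `save`, sanitized `<root>/<scope>/<name>.json` paths) can be sketched with the standard library alone. The trait and type names come from this section, but everything else here is an assumption: the method signatures, the `io::Result` error type, the `{"mean_ns":N}` file layout, and the sanitization rule (anything outside `[A-Za-z0-9_-]` becomes `_`).

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::time::Duration;

/// Assumed trait shape; a baseline is identified by `(scope, name)`.
pub trait BaselineStore {
    fn load(&self, scope: &str, name: &str) -> io::Result<Option<Duration>>;
    fn save(&self, scope: &str, name: &str, mean: Duration) -> io::Result<()>;
}

pub struct JsonFileBaselineStore {
    root: PathBuf,
}

/// Replaces every character outside `[A-Za-z0-9_-]` with `_`, so `..`
/// and path separators cannot escape `root`. ASSUMED rule.
fn sanitize(part: &str) -> String {
    let cleaned: String = part
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '-' || c == '_' { c } else { '_' })
        .collect();
    if cleaned.is_empty() { "_".to_string() } else { cleaned }
}

impl JsonFileBaselineStore {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }

    fn path_for(&self, scope: &str, name: &str) -> PathBuf {
        self.root
            .join(sanitize(scope))
            .join(format!("{}.json", sanitize(name)))
    }
}

impl BaselineStore for JsonFileBaselineStore {
    fn load(&self, scope: &str, name: &str) -> io::Result<Option<Duration>> {
        let text = match fs::read_to_string(self.path_for(scope, name)) {
            Ok(t) => t,
            // Missing data is not an error.
            Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
            Err(e) => return Err(e),
        };
        // Hand-rolled parse of the assumed `{"mean_ns":N}` layout; a real
        // backend would more likely use a JSON library.
        let digits: String = text.chars().filter(|c| c.is_ascii_digit()).collect();
        let ns: u64 = digits
            .parse()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        Ok(Some(Duration::from_nanos(ns)))
    }

    fn save(&self, scope: &str, name: &str, mean: Duration) -> io::Result<()> {
        let path = self.path_for(scope, name);
        if let Some(dir) = path.parent() {
            fs::create_dir_all(dir)?;
        }
        // Write to a sibling temp file, then rename it over the target so
        // readers never observe a partially written baseline.
        let tmp = path.with_extension("json.tmp");
        fs::write(&tmp, format!("{{\"mean_ns\":{}}}", mean.as_nanos()))?;
        fs::rename(&tmp, &path)
    }
}
```

The temp file lives in the same directory as the target so that the rename stays on one filesystem. Under the assumed sanitization rule, a scope such as `../escape` is stored under `___escape` inside `root` rather than outside it.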
## 7. Allocation tracking
Allocation tracking is opt-in via the `alloc-tracking` feature flag.
When enabled:
- The crate exposes `dev_bench::alloc::AllocationStats`.
- `dhat` is added as a dependency.
- The caller is responsible for installing `dhat::Alloc` as the
global allocator and instantiating `dhat::Profiler` around the
measured scope.
- Allocation stats MUST NOT be combined with timing thresholds in a
single comparison invocation; the dhat allocator changes timing
characteristics enough to invalidate the comparison.
## 8. Producer integration
This crate MUST provide a way to satisfy `dev_report::Producer`. The
provided implementation (`BenchProducer`) wraps a closure that
returns a `BenchmarkResult` and emits a single-check `Report` with
`producer = "dev-bench"`.