benchmark 0.7.1

Nanosecond-precision benchmarking for dev, testing, and production. Zero-overhead core timing when disabled; optional std-powered collectors and zero-dependency metrics (Watch/Timer) for real service observability.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
<h1 align="center">
    <img width="90px" height="auto" src="https://raw.githubusercontent.com/jamesgober/jamesgober/main/media/icons/hexagon-3.svg" alt="Triple Hexagon">
    <br>
    <b>Benchmark</b>
    <br>
    <sub><sup>RUST LIBRARY</sup></sub>
</h1>
<div align="center">
    <a href="https://crates.io/crates/benchmark"><img alt="Crates.io" src="https://img.shields.io/crates/v/benchmark"></a>
    <a href="https://crates.io/crates/benchmark" alt="Download benchmark"><img alt="Crates.io Downloads" src="https://img.shields.io/crates/d/benchmark?color=%230099ff"></a>
    <a href="https://docs.rs/benchmark" title="Benchmark Documentation"><img alt="docs.rs" src="https://img.shields.io/docsrs/benchmark"></a>
    <a href="https://github.com/jamesgober/rust-benchmark/actions"><img alt="GitHub CI" src="https://github.com/jamesgober/rust-benchmark/actions/workflows/ci.yml/badge.svg"></a>
    <a href="https://github.com/jamesgober/rust-benchmark/actions/workflows/bench.yml" title="Benchmarks Workflow"><img alt="Benchmarks" src="https://github.com/jamesgober/rust-benchmark/actions/workflows/bench.yml/badge.svg"></a>
    <a href="https://github.com/rust-lang/rfcs/blob/master/text/2495-min-rust-version.md" title="MSRV"><img alt="MSRV" src="https://img.shields.io/badge/MSRV-1.70%2B-blue"></a>
</div>
<br>
<p>
    Nanosecond-precision benchmarking for <b>development</b>, <b>testing</b>, and <b>production</b>.
    The <b>core timing path</b> is <b>zero-overhead</b> when disabled, while optional, std-powered
    <b>collectors</b> and <b>zero-dependency metrics</b> (<code>Watch</code>/<code>Timer</code> and the
    <code>stopwatch!</code> macro) provide real <b>service observability</b> with percentiles in production.
    Designed to be embedded in performance-critical code without bloat or footguns.
  </p>


<br>

<h2>What is Benchmark?</h2>
<p>
<strong>Benchmark</strong> is a comprehensive benchmarking suite with an advanced performance metrics module that functions in two distinct contexts:

<h3>1: As a Development Tool:</h3>
<p>
  The <strong>Benchmarking Suite</strong> is a <b>statistical benchmarking framework</b> for performance testing and optimization during development. This suite provides the tools for <b>statistical microbenchmarking and comparative analysis</b>, allowing you to <b>detect performance regressions in CI</b> and <b>make data-driven implementation choices during development</b>.
</p>

<h4>Benchmarking Suite Features</h4>
<p>A lightweight, ergonomic alternative to Criterion for statistical performance testing:</p>
<ul>
    <li><a href="./docs/BENCHMARK.md#micro-benchmarking"><b>Micro-Benchmarking</b></a>: Test isolated code blocks with the <code>benchmark_block!</code> macro.</li>
    <li><a href="./docs/BENCHMARK.md#macro-benchmarking"><b>Macro-Benchmarking</b></a>: Test entire functions or modules with the <code>benchmark!</code> macro.</li>
    <li><a href="./docs/BENCHMARK.md#comparative-analysis"><b>Comparative Analysis</b></a>: A/B test multiple implementations to make data-driven optimization decisions.</li>
    <li><a href="./docs/BENCHMARK.md#statistical-sampling"><b>Statistical Sampling</b></a>: Runs code repeatedly to generate robust performance statistics.</li>
    <li><a href="./docs/BENCHMARK.md#ci-cd-integration"><b>CI/CD Integration</b></a>: Detect performance regressions in your deployment pipeline.</li>
    <li><a href="./docs/BENCHMARK.md#load-testing"><b>Load Testing</b></a>: Simulate realistic traffic patterns and stress test your code.</li>
</ul>
<p>
  &mdash; See <a href="./docs/BENCHMARK.md"><b>Benchmark Documentation</b></a> for more information.
</p>

<br>

<h3>2: As a Production Tool:</h3>
<p>
  The <strong>performance metrics</strong> module serves as an advanced, lightweight <b>application performance monitoring (APM)</b> tool that provides seemless production observability. It allows you to instrument your application to capture <b>real-time performance metrics</b> for critical operations like database queries and API response times. Each timing measurement can be thought of as a span, helping you identify bottlenecks in a live system.
</p>
<h4>Performance Metrics Features</h4>
<p>Low-overhead instrumentation for live application monitoring:</p>
<ul>
    <li><a href="./docs/METRICS.md#code-instrumentation"><b>Code Instrumentation</b></a>: Add performance timers to production code with minimal overhead!</li>
    <li><a href="./docs/METRICS.md#distributed-tracing"><b>Distributed Tracing</b></a>: Break down complex operations into spans (database queries, API calls, etc.).</li>
    <li><a href="./docs/METRICS.md#real-time-metrics"><b>Real-time Metrics</b></a>: Capture every operation's performance data, not statistical averages.</li>  
    <li><a href="./docs/METRICS.md#health-check-metrics"><b>Health Check Metrics</b></a>: Monitor TTFB, response times, and system health.</li>
    <li><a href="./docs/METRICS.md#apm-integration"><b>APM Integration</b></a>: Core observability tooling for application performance monitoring.</li>
</ul>
<p>
   &mdash; See <a href="./docs/METRICS.md"><b>Metrics Documentation</b></a> for more information.
</p>

<hr>
<br>

<h2>Features</h2>
<ul>
    <li><b>True Zero-Overhead:</b> When disabled via <code>default-features = false</code>, all benchmarking code compiles away completely, adding zero bytes to your binary and zero nanoseconds to execution time.</li>
    <li><b>No Dependencies:</b> Built using only the <b>Rust standard library</b>, eliminating dependency conflicts and keeping compilation times fast.</li>
    <li><b>Thread-Safe by Design:</b> Core measurement functions are pure and inherently thread-safe, with an optional thread-safe Collector for aggregating measurements across threads.</li>
    <li><b>Async Compatible:</b> Works seamlessly with any async runtime (<code>Tokio</code>, <code>async-std</code>, etc.) without special support or additional features - just time any expression, sync or async.</li>
    <li><b>Nanosecond Precision:</b> Uses platform-specific high-resolution timers through <code>std::time::Instant</code>, providing nanosecond-precision measurements with monotonic guarantees.</li>
    <li><b>Simple Statistics:</b> Provides essential statistics (<code>count</code>, <code>total</code>, <code>min</code>, <code>max</code>, <code>mean</code>) without complex algorithms or memory allocation, keeping the library focused and efficient.</li>
    <li><b>Production Metrics (optional):</b> Enable the <code>metrics</code> feature for a thread-safe <code>Watch</code>, <code>Timer</code> (auto record on drop), and <code>stopwatch!</code> macro using a <b>built-in zero-dependency histogram</b> for percentiles.</li>
    <li><b>Minimal API Surface:</b> Just four functions and two macros - easy to learn, hard to misuse, and unlikely to ever need breaking changes.</li>
    <li><b>Cross-Platform:</b> Consistent behavior across <b>Linux</b>, <b>macOS</b>, <b>Windows</b>, and other platforms supported by Rust's standard library.</li>
</ul>

<hr>
<br>

## Feature Flags
- `none`: no features.
- `std` (*default*): uses Rust standard library; disables `no_std`
- `benchmark` (*default*): enables default benchmark measurement.
- `metrics` (*optional*): production/live metrics (`Watch`, `Timer`, `stopwatch!`).
- `default`: convenience feature equal to `std + benchmark`
- `standard`: convenience feature equal to `std + benchmark + metrics`
- `minimal`: minimal build with core timing only (*no default features*)
- `all`: Activates all features (*includes: `std + benchmark + metrics`*)

&mdash; See [**`FEATURES DOCUMENTATION`**](./docs/features/README.md) for more information.

<br>
<hr>
<br>

<h2>Usage:</h2>

<br>

### Installation
Add this to your `Cargo.toml`:

```toml
[dependencies]
benchmark = "0.7.1"
```

<br>


### Standard Features
> Enables all standard benchmark features.
```toml
[dependencies]

# Enables Production & Development.
benchmark = { version = "0.7.1", features = ["standard"] }
```

<br>

### Production metrics (std + metrics)
Enable production observability using `Watch`/`Timer` or the `stopwatch!` macro.

Cargo features:
```toml
[dependencies]
benchmark = { version = "0.7.1", features = ["std", "metrics"] }
```

Record with `Timer` (auto-record on drop):
```rust
use benchmark::{Watch, Timer};

let watch = Watch::new();
{
    let _t = Timer::new(watch.clone(), "db.query");
    // ... do the work to be measured ...
} // recorded once on drop

let s = &watch.snapshot()["db.query"];
assert!(s.count >= 1);
```

Or use the `stopwatch!` macro:
```rust
use benchmark::{Watch, stopwatch};

let watch = Watch::new();
stopwatch!(watch, "render", {
    // ... work to measure ...
});
assert!(watch.snapshot()["render"].count >= 1);
```

<br>

### Disable Default Features
> True zero-overhead core timing only.
```toml
[dependencies]
# Disable default features for true zero-overhead
benchmark = { version = "0.7.1", default-features = false }
```
<br>

&mdash; See [**`FEATURES DOCUMENTATION`**](./docs/features/README.md) for more information.

<hr>
<br>

## Quick Start


<small>
See also: <a href="./docs/API.md#async-usage"><b>Async Usage</b></a> · <a href="./docs/API.md#disabled-mode-behavior"><b>Disabled Mode Behavior</b></a> · <a href="./docs/API.md#production-metrics-feature-metrics"><b>Production Metrics</b></a>
</small>

### Measure a closure
Use `measure()` to time any closure and get back `(result, Duration)`.
```rust
use benchmark::measure;

let (value, duration) = measure(|| 2 + 2);
assert_eq!(value, 4);
println!("took {} ns", duration.as_nanos());
```

<br>

### Time an expression with the macro
`time!` works with any expression and supports async contexts.
```rust
use benchmark::time;

let (value, duration) = time!({
    let mut sum = 0;
    for i in 0..10_000 { sum += i; }
    sum
});
assert!(duration.as_nanos() > 0);
```

### Micro-benchmark a code block
Use `benchmark_block!` to run a block many times and get raw per-iteration durations.
```rust
use benchmark::benchmark_block;

// Default 10_000 iterations
let samples = benchmark_block!({
    // hot path
    std::hint::black_box(1 + 1);
});
assert_eq!(samples.len(), 10_000);

// Explicit iterations
let samples = benchmark_block!(1_000usize, {
    std::hint::black_box(2 * 3);
});
```

<br>

### Macro-benchmark a named expression
Use `benchmark!` to run a named expression repeatedly and get `(last, Vec<Measurement>)`.
```rust
use benchmark::benchmark;

// Default 10_000 iterations
let (last, ms) = benchmark!("add", { 2 + 3 });
assert_eq!(last, Some(5));
assert_eq!(ms[0].name, "add");

// Explicit iterations
let (_last, ms) = benchmark!("mul", 77usize, { 6 * 7 });
assert_eq!(ms.len(), 77);
```

<small>
Disabled mode (`default-features = false`): `benchmark_block!` runs once and returns `vec![]`; `benchmark!` runs once and returns `(Some(result), vec![])`.
</small>

<br>

### Named timing + Collector aggregation (std + benchmark)
Record a named measurement and aggregate stats with `Collector`.
```rust
use benchmark::{time_named, Collector};

fn work() {
    std::thread::sleep(std::time::Duration::from_millis(1));
}

let collector = Collector::new();
let (_, m) = time_named!("work", work());
collector.record(&m);

let stats = collector.stats("work").unwrap();
println!(
    "count={} total={}ns min={}ns max={}ns mean={}ns",
    stats.count,
    stats.total.as_nanos(),
    stats.min.as_nanos(),
    stats.max.as_nanos(),
    stats.mean.as_nanos()
);
```

<br>

### Async timing with `await`
The macros inline `Instant` timing when `features = ["std", "benchmark"]`, so awaiting inside works seamlessly.
```rust
use benchmark::time;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let sleep_dur = std::time::Duration::from_millis(10);
    let ((), d) = time!(tokio::time::sleep(sleep_dur).await);
    println!("slept ~{} ms", d.as_millis());
}
```

<hr>

<h2>Benchmarks</h2>
<p>
This repository includes Criterion benchmarks that measure the overhead of the public API compared to a direct <code>Instant::now()</code> baseline.
</p>

<hr>

<h2>Safety &amp; Edge Cases</h2>
<ul>
  <li><b>Zero durations</b>: Some operations can complete so fast that <code>elapsed()</code> may be 0ns on some platforms. The API preserves this and returns <code>Duration::ZERO</code>. If you need a floor for visualization, clamp in your presentation layer.</li>
  <li><b>Saturating conversions</b>: <code>Watch::record_instant()</code> converts <code>as_nanos()</code> to <code>u64</code> using saturating semantics, avoiding panics on extremely large values.</li>
  <li><b>Range clamping</b>: <code>Watch::record()</code> clamps input to histogram bounds, ensuring valid recording and protecting against out-of-range values.</li>
  <li><b>Percentile input clamping</b>: Percentile queries (e.g., <code>Histogram::percentile(q)</code>) clamp <code>q</code> to <b>[0.0, 1.0]</b>. Out-of-range inputs map to min/max (e.g., <code>-0.1 → 0.0</code>, <code>1.2 → 1.0</code>).</li>
  <li><b>Drop safety</b>: <code>Timer</code> records exactly once. It guards against double-record by storing <code>Option&lt;Instant&gt;</code> and recording on <code>Drop</code> even during unwinding.</li>
  <li><b>Empty datasets</b>: <code>Watch::snapshot()</code> and <code>Collector</code> handle empty sets defensively. Snapshots for empty histograms return zeros; <code>Collector::stats()</code> returns <code>None</code> for missing keys.</li>
  <li><b>Overflow protection</b>: <code>Collector</code> uses <code>saturating_add</code> for total accumulation and 128-bit nanosecond storage in <code>Duration</code> to provide ample headroom.</li>
  <li><b>Thread safety</b>: All shared structures use <code>RwLock</code> with short hold-times: clone under read lock, compute outside the lock. Methods will panic only if a lock is poisoned by a prior panic.</li>
  <li><b>Feature gating</b>: Production metrics are gated behind <code>features=["std","metrics"]</code>. Disable default features to make all timing a no-op for zero-overhead builds.</li>
</ul>

<h3>How to run</h3>
<pre><code>cargo bench
</code></pre>

<hr>

<h2>Zero-overhead Proof</h2>
<p>
This project includes automated checks that verify zero bytes and zero time when disabled, and provide assembly and runtime comparison artifacts.
</p>

<ul>
  <li><b>Size comparison (CI)</b>: Workflow <code>.github/workflows/ci.yml</code>, job <b>Zero Overhead</b> builds the crate twice and compares <code>libbenchmark.rlib</code> sizes with and without <code>benchmark</code>. Any unexpected growth fails the job.</li>
  <li><b>Assembly artifacts (CI)</b>: Job <b>Assembly Inspection</b> emits <code>objdump</code>/<code>llvm-objdump</code> symbol tables and disassembly for enabled vs disabled. Download the <b>assembly-artifacts</b> artifact from the run to inspect.</li>
  <li><b>Runtime comparison (CI)</b>: Bench workflow <code>.github/workflows/bench.yml</code> runs the example <code>examples/overhead_compare.rs</code> in both modes and uploads <b>overhead-compare</b> artifacts with raw output.</li>
  <li><b>Compile tests (trybuild)</b>: <code>tests/trybuild_disabled.rs</code> ensures disabled mode compiles with the macro and function APIs used in a minimal program.</li>
  <li><b>Local reproduction</b>:
    <ul>
      <li>Disabled: <code>cargo run --release --example overhead_compare --no-default-features</code></li>
      <li>Benchmark: <code>cargo run --release --example overhead_compare --no-default-features --features "std benchmark"</code></li>
    </ul>
  </li>
</ul>

<p>
When features are disabled (<code>default-features = false</code>), timing returns <code>Duration::ZERO</code> and produces no runtime cost (e.g., <code>time_macro_ns=0</code> on CI-hosted runners), validating the zero-overhead guarantee.
</p>

<h3>Sample results (illustrative)</h3>
<p><i>Results below are from a recent run on GitHub-hosted Linux runners; your numbers will vary by hardware and load.</i></p>
<pre><code>Overhead
--------
instant_now_elapsed           ~ 81–89 ns
measure_closure_add           ~ 79–81 ns
time_macro_add                ~ 81–82 ns

Collector Statistics
--------------------
stats::single/1000            ~ 2.44–2.61 µs
stats::single/10000           ~ 26.7–29.1 µs
stats::all/k10_n1000          ~ 29.4–33.6 µs
stats::all/k50_n1000          ~ 148.6–157.1 µs

Array Baseline (no locks)
-------------------------
stats::array/k1_n10000        ~ 17.15–17.90 µs
stats::array/k10_n1000        ~ 15.03–16.25 µs
</code></pre>

<h4>Interpretation</h4>
<ul>
  <li><b>Macro/function overhead</b> is on par with direct <code>Instant</code> usage for trivial work, as expected.</li>
  <li><b>Collector stats</b> scale roughly linearly with the number of samples; costs are dominated by iteration and min/max/accumulate.</li>
  <li><b>No-lock array baseline</b> provides a lower bound for aggregation cost; the difference vs Collector indicates lock and map overhead.</li>
  <li>Use <code>--features std,benchmark</code> to ensure the benchmark timing path is used.</li>
</ul>

<h3>Benchmarks included</h3>
<ul>
  <li><b>overhead::instant</b>: <code>Instant::now().elapsed()</code> baseline.</li>
  <li><b>overhead::measure</b>: <code>measure(|| expr)</code>.</li>
  <li><b>overhead::time_macro</b>: <code>time!(expr)</code>.</li>
  <li>Bench source: <code>benches/overhead.rs</code>.</li>
  <li>Criterion version: <code>0.5</code>.</li>
  <li>Note: when features are disabled (<code>default-features = false</code>), measurement returns <code>Duration::ZERO</code>.</li>
  <li>Tip: use <code>--features std,benchmark</code> to ensure the benchmark path for benches.</li>
  <li>Environment affects results; run on a quiet system with performance governor if possible.</li>
</ul>

<h3>Sample output (placeholder)</h3>
<pre><code>overhead::instant/instant_now_elapsed
                        time:   [X ns  ..  Y ns  ..  Z ns]

overhead::measure/measure_closure_add
                        time:   [X ns  ..  Y ns  ..  Z ns]

overhead::time_macro/time_macro_add
                        time:   [X ns  ..  Y ns  ..  Z ns]
</code></pre>


<!-- API REFERENCE
############################################# -->
<hr>

<h3>Documentation:</h3>
<ul>
    <li><a href="./docs/API.md"><b>API Reference</b></a> Complete documentation and examples.</li>
    <li><a href="./docs/API.md#production-metrics-feature-metrics"><b>Production Metrics</b></a> <code>Watch</code>, <code>Timer</code>, and <code>stopwatch!</code>.</li>
    <li><a href="./docs/PRINCIPLES.md"><b>Code Principles</b></a> guidelines for contribution &amp; development.</li>
</ul>

<hr>

<h2>MSRV &amp; SemVer Policy</h2>
<ul>
  <li><b>MSRV</b>: 1.70+ (as indicated by the badge). We bump MSRV only in <b>minor</b> releases and document it in the changelog.</li>
  <li><b>SemVer</b>:
    <ul>
      <li>Patch (x.y.<b>z</b>): bug fixes, internal improvements, and documentation.</li>
      <li>Minor (x.<b>y</b>.z): new features and non-breaking API additions.</li>
      <li>Major (<b>x</b>.y.z): breaking changes only, announced in the changelog with migration notes.</li>
    </ul>
  </li>
  <li><b>Feature flags</b>: additive and stable. Disabled paths remain zero-cost.</li>
  <li><b>No unsafe</b> in hot paths; any future "unsafe" will be justified and tested.</li>
  <li><b>Docs.rs</b>: examples note required feature flags to avoid confusion.</li>
  </ul>

<hr>

<h2 align="center">
    DEVELOPMENT &amp; CONTRIBUTION
</h2>

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

<hr>
<br>

<!-- LICENSE
############################################# -->
<div id="license">
    <h2>⚖️ License</h2>
    <p>Licensed under the <b>Apache License</b>, version 2.0 (the <b>"License"</b>); you may not use this software, including, but not limited to the source code, media files, ideas, techniques, or any other associated property or concept belonging to, associated with, or otherwise packaged with this software except in compliance with the <b>License</b>.</p>
    <p>You may obtain a copy of the <b>License</b> at: <a href="http://www.apache.org/licenses/LICENSE-2.0" title="Apache-2.0 License" target="_blank">http://www.apache.org/licenses/LICENSE-2.0</a>.</p>
    <p>Unless required by applicable law or agreed to in writing, software distributed under the <b>License</b> is distributed on an "<b>AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND</b>, either express or implied.</p>
    <p>See the <a href="./LICENSE" title="Software License file">LICENSE</a> file included with this project for the specific language governing permissions and limitations under the <b>License</b>.</p>
</div>

<br>

<!-- COPYRIGHT
############################################# -->
<div align="center">
  <h2></h2>
  <sup>COPYRIGHT <small>&copy;</small> 2025 <strong>JAMES GOBER.</strong></sup>
</div>