quick_perf_event/lib.rs
1//! This crate provides a lightweight framework for measuring and reporting performance
2//! counters across labeled workloads or benchmarks.
3//! The output format can be controlled using the `QPE_FORMAT` environment variable:
4//!
5//! - **`QPE_FORMAT=live`** (default) - Outputs a **live table**, designed for development and
6//! debugging. Each result is printed as soon as it is available, with compact,
7//! fixed-width, line-wrapped cells to fit many columns in narrow terminals.
8//! If the requested columns still do not fit, table rows are line wrapped as well.
9//! - **`QPE_FORMAT=md`** - Generates a **Markdown table** after all runs have completed,
10//! choosing column widths automatically for clean, publication-ready output.
11//! - **`QPE_FORMAT=csv`** - Streams results as **CSV** records to stdout, suitable for
12//! further processing.
13//!
14//! # Example
15//! This benchmark measures computing the sum of an iterator.
16//! ```
17#![doc = include_str!("../examples/short.rs")]
18//! ```
19//! The results show that it takes about 1 CPU cycle to process each number.
20//! Practically no branch or cache misses are encountered.
21//! In total, the benchmark took around 0.2 seconds to run.
22//! ```text
23//!failed to create counter "kcycle": Permission denied (os error 13)
24//!┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
25//!│ label │ time │ scale │ cycle │ l1-miss │llc-miss │ br-miss │
26//!├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
27//!│ sum │ 0.194 │ 1.0 G │ 1.007 │ 364.0 n │ 166.0 n │ 345.0 n │
28//!└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
29//! ```
30//!
31//! # System Configuration
32//! Note that the example above was unable to record the number of CPU cycles spent in the kernel (`kcycle`).
33//! If you run into similar issues, you probably need to configure `perf_event_paranoid`.
34//! You can change this value until the next reboot using `sudo sysctl -w kernel.perf_event_paranoid=0` or permanently by adding `kernel.perf_event_paranoid = 0` to `/etc/sysctl.conf`.
35//! Lower values mean more permissive handling.
36//! See [`man 2 perf_event_open`](https://www.man7.org/linux/man-pages/man2/perf_event_open.2.html) for what the different restriction levels mean.
37//!
38//! # Usage
39//! To start benchmarking, you first need a [`QuickPerfEvent`] object.
40//! [`QuickPerfEvent`] manages both recording and reporting of benchmarks.
41//! You may configure the set of performance counters using either the environment variable `QPE_EVENTS` or [`with_counters`](QuickPerfEvent::with_counters).
42//! For basic usage, you should prefer `QPE_EVENTS`.
43//! For example, to count CPU cycles and branch misses, set it to `cycles,br-miss`.
44//! For an up-to-date list of supported values see the implementation of [`with_counter_names`](PerfCounters::with_counter_names).
45//! If your program is multi-threaded, construct [`QuickPerfEvent`] **before spawning threads** to ensure counts include other threads.
46//!
47//! Now that you have a [`QuickPerfEvent`] object, you may start taking measurements using its [`run`](QuickPerfEvent::run) method.
48//! After each run, you **must** call [`record`](PerfReading::record) on the returned value to log the measurement.
49//! The [`record`](PerfReading::record) method takes two parameters:
50//!
51//! - **`scale`** - a normalization factor (e.g. number of iterations).
52//! All performance counters are divided by this value, producing results
53//! such as *branch misses per operation* or *cycles per iteration*.
54//! Note that the time column is not normalized.
55//! It reports the absolute amount of time elapsed over the measurement.
56//! Dividing this by scale would be misleading when multiple threads are involved.
57//! If you want a measure of time spent per operation, consider using the task clock counter `t-clock`.
58//!
59//! - **`labels`** - metadata describing the measurement.
60//! This can be:
61//! - the unit type `()` (no labels),
62//! - a string `&str` (single label),
63//! - or a user-defined struct implementing [`Labels`].
64//!
65//! # Environment Variables
66//! Quick Perf Event can be configured using various environment variables.
67//! - **`QPE_FORMAT`** - set the output format, see above.
68//! - **`QPE_EVENTS`** - set the counters recorded by a default [PerfCounters] instance.
69//! - **`QPE_LINE_LEN`** - override the line length used for line wrapping live tables. If not set, terminal size is detected automatically.
70//!
71//! # Acknowledgements
72//! This crate is heavily inspired by [the C++ header only library](https://github.com/viktorleis/perfevent).
73
74mod labels;
75mod live_table;
76mod perf_counters;
77mod tabled_float;
78
79pub use labels::Labels;
80pub use live_table::LiveTable;
81pub use perf_counters::{PerfCounters, PerfCountersReading};
82pub use tabled_float::TabledFloat;
83
84use perf_event::CounterData;
85use std::{
86 borrow::Borrow,
87 env,
88 error::Error,
89 io::{Write, stdout},
90 iter,
91 marker::PhantomData,
92 mem,
93 sync::Once,
94 time::SystemTime,
95};
96use tabled::settings::Style;
97
98/// Main entry point for performance measurement.
99///
100/// `QuickPerfEvent` encapsulates a collection of hardware performance counters (`PerfCounters`) and configuration for reporting results.
101/// See the crate level documentation for more information.
102///
103/// The generic parameter `L` must implement [`Labels`], providing a fixed schema
104/// of label names and values for each recorded sample.
105/// [`Labels`] is implemented for `()` and `str`.
106/// You can define a label struct conveniently using the
107/// [`struct_labels!`](crate::struct_labels) macro.
108pub struct QuickPerfEvent<L: ?Sized> {
109 inner: PerfEventInner,
110 _p: PhantomData<L>,
111}
112
113/// See [`QuickPerfEvent::run`] and the crate level docs.
114#[must_use]
115pub struct PerfReading<'a, L: ?Sized, T> {
116 pe: &'a mut QuickPerfEvent<L>,
117 start_time: SystemTime,
118 ret: T,
119}
120
121impl<L: Labels + ?Sized> Default for QuickPerfEvent<L> {
122 fn default() -> Self {
123 Self::new()
124 }
125}
126
127impl<L: Labels + ?Sized> QuickPerfEvent<L> {
128 /// Create a `QuickPerfEvent` configured from environment variables.
129 pub fn new() -> Self {
130 Self::with_counters(PerfCounters::new())
131 }
132
133 /// Create a `QuickPerfEvent` with a custom set of performance counters.
134 pub fn with_counters(counters: PerfCounters) -> Self {
135 QuickPerfEvent {
136 inner: PerfEventInner::new(counters, L::names()),
137 _p: PhantomData,
138 }
139 }
140
141 /// Perform a measurement.
142 ///
143 /// You **must** call [`record`](PerfReading::record) on the returned value.
144 pub fn run<R>(&mut self, f: impl FnOnce() -> R) -> PerfReading<'_, L, R> {
145 let start_time = SystemTime::now();
146 self.inner.counters.reset();
147 self.inner.counters.enable();
148 let ret = f();
149 self.inner.counters.disable();
150 PerfReading {
151 start_time,
152 ret,
153 pe: self,
154 }
155 }
156}
157
158impl<L: Labels + ?Sized, T> PerfReading<'_, L, T> {
159 /// Records the measured result.
160 ///
161 /// The `scale` argument normalizes counter values (e.g. per iteration count).
162 /// The given `labels` instance supplies the labels for this sample.
163 pub fn record(self, scale: usize, labels: impl Borrow<L>) -> T {
164 PerfEventInner::report_error(self.pe.inner.push(scale, self.start_time, &mut |dst| {
165 labels.borrow().values(dst)
166 }));
167 self.ret
168 }
169}
170
171struct PerfEventInner {
172 counters: PerfCounters,
173 state: OutputState,
174 label_names: &'static [&'static str],
175}
176
177impl Drop for PerfEventInner {
178 fn drop(&mut self) {
179 self.dump_and_reset();
180 }
181}
182
183enum OutputState {
184 Tabled {
185 readings: Vec<PerfReadingExtra>,
186 markdown: bool,
187 },
188 Live {
189 table: LiveTable,
190 },
191 Csv {
192 header_written: bool,
193 writer: csv::Writer<Box<dyn Write>>,
194 },
195}
196
197impl PerfEventInner {
198 fn new(counters: PerfCounters, label_names: &'static [&'static str]) -> Self {
199 PerfEventInner {
200 state: match std::env::var("QPE_FORMAT").as_deref() {
201 Ok("csv") => OutputState::Csv {
202 header_written: false,
203 writer: csv::Writer::from_writer(Box::new(stdout())),
204 },
205 Ok("md") => OutputState::Tabled {
206 readings: Vec::new(),
207 markdown: true,
208 },
209 x => {
210 match x {
211 Ok(requested) => {
212 eprintln!(
213 "unrecognized value for QPE_FORMAT: {requested:?}.\nSupported values: csv, md"
214 );
215 }
216 Err(_) => {}
217 }
218 OutputState::Live {
219 table: LiveTable::new(
220 label_names.len() + 2 + counters.names().count(),
221 9,
222 env::var("QPE_LINE_LEN")
223 .ok()
224 .and_then(|x| {
225 x.parse()
226 .map_err(|_| {
227 eprintln!("failed to parse line len: {x:?}");
228 })
229 .ok()
230 })
231 .or_else(|| terminal_size::terminal_size().map(|x| x.0.0 as usize))
232 .unwrap_or(160),
233 ),
234 }
235 }
236 },
237 counters,
238 label_names,
239 }
240 }
241 fn push(
242 &mut self,
243 scale: usize,
244 _start_time: SystemTime,
245 labels: &mut dyn FnMut(&mut dyn FnMut(&str)),
246 ) -> Result<(), Box<dyn Error>> {
247 match &mut self.state {
248 OutputState::Tabled {
249 readings,
250 markdown: _,
251 } => {
252 let mut label_vec = Vec::new();
253 labels(&mut |l: &str| label_vec.push(l.to_string()));
254 readings.push(PerfReadingExtra {
255 scale,
256 labels: label_vec,
257 counters: self.counters.read_counters(),
258 });
259 }
260 OutputState::Live { table } => {
261 if !table.table_started() {
262 for name in self.label_names {
263 table.push(name.to_string())?;
264 }
265 for val in Self::value_names(&self.counters) {
266 table.push(val.to_string())?;
267 }
268 }
269 let mut err = Ok(());
270 labels(&mut |label| {
271 if err.is_ok() {
272 err = table.push(label.to_string());
273 }
274 });
275 err?;
276 let mut discard_multiplexed = false;
277 for val in Self::values(
278 &self.counters.read_counters(),
279 scale,
280 &mut discard_multiplexed,
281 ) {
282 table.push(TabledFloat(val).to_string())?;
283 }
284 }
285 OutputState::Csv {
286 header_written,
287 writer,
288 } => {
289 macro_rules! write_field {
290 ($x:expr) => {
291 writer.write_field($x)?;
292 };
293 }
294 if !*header_written {
295 *header_written = true;
296 for l in self.label_names {
297 write_field!(l);
298 }
299 for name in Self::value_names(&self.counters) {
300 write_field!(name);
301 }
302 write_field!("multiplexed");
303 writer.write_record(iter::empty::<&[u8]>())?;
304 }
305 let mut err = Ok(());
306 labels(&mut |l| {
307 if err.is_ok() {
308 err = writer.write_field(l);
309 }
310 });
311 err?;
312 let reading = self.counters.read_counters();
313 let mut any_multiplexed = false;
314 for value in Self::values(&reading, scale, &mut any_multiplexed) {
315 write_field!(&format!("{value}"));
316 }
317 write_field!(&format!("{any_multiplexed}"));
318 writer.write_record(iter::empty::<&[u8]>())?;
319 writer.flush()?;
320 }
321 }
322 Ok(())
323 }
324
325 fn process_counter(counter: &CounterData, scale: usize) -> (f64, bool) {
326 let multiplexed = counter.time_enabled() != counter.time_running();
327 let scaled = counter.count() as f64 * counter.time_running().unwrap().as_secs_f64()
328 / counter.time_enabled().unwrap().as_secs_f64()
329 / scale as f64;
330 (scaled, multiplexed)
331 }
332
333 fn report_error(err: Result<(), Box<dyn Error>>) {
334 static ONCE: Once = Once::new();
335 if let Err(err) = err {
336 ONCE.call_once(|| {
337 eprintln!("QPE failed to write. Future errors will not be reported. {err}");
338 });
339 }
340 }
341
342 fn dump_and_reset(&mut self) {
343 match &mut self.state {
344 OutputState::Tabled { readings, markdown } => {
345 let mut table = tabled::builder::Builder::new();
346 table.push_record(self.label_names.iter().copied());
347 for reading in &mut *readings {
348 table.push_record(mem::take(&mut reading.labels));
349 }
350 let multiplexed = readings
351 .iter()
352 .flat_map(|x| x.counters.counters.iter())
353 .any(|x| x.time_enabled() != x.time_running());
354 for (i, name) in self.counters.names().enumerate() {
355 let readings = || {
356 readings.iter().map(|x| {
357 let c = &x.counters.counters[i];
358 c.count() as f64 * c.time_enabled().unwrap().as_secs_f64()
359 / c.time_running().unwrap().as_secs_f64()
360 / x.scale as f64
361 })
362 };
363 // let max = readings().max_by(f64::total_cmp).unwrap();
364 // let min = readings().min_by(f64::total_cmp).unwrap();
365 // let max_scale = max.log10().floor() as isize;
366 // let min_scale = min.log10().floor() as isize;
367 table.push_column(
368 iter::once(name.to_string()).chain(readings().map(|x| format!("{x:3.3}"))),
369 );
370 }
371 let multiplex_warning = if multiplexed {
372 "⚠️ Some counters were multiplexed.\n"
373 } else {
374 "\n"
375 };
376 let mut table = table.build();
377 if *markdown {
378 table.with(Style::markdown());
379 }
380 println!("{multiplex_warning}{table}");
381 }
382 OutputState::Live { table } => {
383 Self::report_error(table.end_table().map_err(Into::into));
384 }
385 OutputState::Csv {
386 header_written,
387 writer: _,
388 } => {
389 *header_written = false;
390 }
391 }
392 }
393
394 fn value_names(counters: &PerfCounters) -> impl Iterator<Item = &str> {
395 ["time", "scale"].into_iter().chain(counters.names())
396 }
397
398 fn values(
399 counters: &PerfCountersReading,
400 scale: usize,
401 any_multiplexed: &mut bool,
402 ) -> impl Iterator<Item = f64> {
403 let time = counters.duration.as_secs_f64();
404 let counters = counters.counters.iter();
405 let counters = counters.map(move |x| {
406 let x = Self::process_counter(x, scale);
407 *any_multiplexed |= x.1;
408 x.0
409 });
410 [time, scale as f64].into_iter().chain(counters)
411 }
412}
413
414struct PerfReadingExtra {
415 scale: usize,
416 labels: Vec<String>,
417 counters: PerfCountersReading,
418}