quick_perf_event/
lib.rs

1//! This crate provides a lightweight framework for measuring and reporting performance
2//! counters across labeled workloads or benchmarks.
3//! The output format can be controlled using the `QPE_FORMAT` environment variable:
4//!
5//! - **`QPE_FORMAT=live`** (default) - Outputs a **live table**, designed for development and
6//!   debugging. Each result is printed as soon as it is available, with compact,
7//!   fixed-width, line-wrapped cells to fit many columns in narrow terminals.
8//!   If the requested columns still do not fit, table rows are line wrapped as well.
9//! - **`QPE_FORMAT=md`** - Generates a **Markdown table** after all runs have completed,
10//!   choosing column widths automatically for clean, publication-ready output.
11//! - **`QPE_FORMAT=csv`** - Streams results as **CSV** records to stdout, suitable for
12//!   further processing.
13//!
14//! # Example
15//! This benchmark measures computing the sum of an iterator.
16//! ```
17#![doc = include_str!("../examples/short.rs")]
18//! ```
19//! The results show that it takes about 1 CPU cycle to process each number.
20//! Practically no branch or cache misses are encountered.
21//! In total, the benchmark took around 0.2 seconds to run.
22//! ```text
23//!failed to create counter "kcycle": Permission denied (os error 13)
24//!┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
25//!│  label  │  time   │  scale  │  cycle  │ l1-miss │llc-miss │ br-miss │
26//!├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
27//!│   sum   │   0.194 │   1.0 G │   1.007 │ 364.0 n │ 166.0 n │ 345.0 n │
28//!└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
29//! ```
30//!
31//! # System Configuration
32//! Note that the example above was unable to record the number of CPU cycles spent in the kernel (`kcycle`).
33//! If you run into similar issues, you probably need to configure `perf_event_paranoid`.
34//! You can change this value until the next reboot using `sudo sysctl -w kernel.perf_event_paranoid=0` or permanently by adding `kernel.perf_event_paranoid = 0` to `/etc/sysctl.conf`.
35//! Lower values mean more permissive handling.
36//! See [`man 2 perf_event_open`](https://www.man7.org/linux/man-pages/man2/perf_event_open.2.html) for what the different restriction levels mean.
37//!
38//! # Usage
39//! To start benchmarking, you first need a [`QuickPerfEvent`] object.
40//! [`QuickPerfEvent`] manages both recording and reporting of benchmarks.
41//! You may configure the set of performance counters using either the environment variable `QPE_EVENTS` or [`with_counters`](QuickPerfEvent::with_counters).
42//! For basic usage, you should prefer `QPE_EVENTS`.
43//! For example, to count CPU cycles and branch misses, set it to `cycles,br-miss`.
44//! For an up-to-date list of supported values see the implementation of [`with_counter_names`](PerfCounters::with_counter_names).
45//! If your program is multi-threaded, construct [`QuickPerfEvent`] **before spawning threads** to ensure counts include other threads.
46//!
47//! Now that you have a [`QuickPerfEvent`] object, you may start taking measurements using its [`run`](QuickPerfEvent::run) method.
48//! After each run, you **must** call [`record`](PerfReading::record) on the returned value to log the measurement.
49//! The [`record`](PerfReading::record) method takes two parameters:
50//!
51//! - **`scale`** - a normalization factor (e.g. number of iterations).  
52//!   All performance counters are divided by this value, producing results
53//!   such as *branch misses per operation* or *cycles per iteration*.
54//!   Note that the time column is not normalized.
55//!   It reports the absolute amount of time elapsed over the measurement.
56//!   Dividing this by scale would be misleading when multiple threads are involved.
57//!   If you want a measure of time spent per operation, consider using the task clock counter `t-clock`.
58//!
59//! - **`labels`** - metadata describing the measurement.  
60//!   This can be:
61//!   - the unit type `()` (no labels),
62//!   - a string `&str` (single label),
63//!   - or a user-defined struct implementing [`Labels`].
64//!
65//! # Environment Variables
66//! Quick Perf Event can be configured using various environment variables.
67//! - **`QPE_FORMAT`** - set the output format, see above.
68//! - **`QPE_EVENTS`** - set the counters recorded by a default [PerfCounters] instance.
69//! - **`QPE_LINE_LEN`** - override the line length used for line wrapping live tables. If not set, terminal size is detected automatically.
70//!
71//! # Acknowledgements
72//! This crate is heavily inspired by [the C++ header only library](https://github.com/viktorleis/perfevent).
73
74mod labels;
75mod live_table;
76mod perf_counters;
77mod tabled_float;
78
79pub use labels::Labels;
80pub use live_table::LiveTable;
81pub use perf_counters::{PerfCounters, PerfCountersReading};
82pub use tabled_float::TabledFloat;
83
84use perf_event::CounterData;
85use std::{
86    borrow::Borrow,
87    env,
88    error::Error,
89    io::{Write, stdout},
90    iter,
91    marker::PhantomData,
92    mem,
93    sync::Once,
94    time::SystemTime,
95};
96use tabled::settings::Style;
97
98/// Main entry point for performance measurement.
99///
100/// `QuickPerfEvent` encapsulates a collection of hardware performance counters (`PerfCounters`) and configuration for reporting results.
101/// See the crate level documentation for more information.
102///
103/// The generic parameter `L` must implement [`Labels`], providing a fixed schema
104/// of label names and values for each recorded sample.
105/// [`Labels`] is implemented for `()` and `str`.
106/// You can define a label struct conveniently using the
107/// [`struct_labels!`](crate::struct_labels) macro.
108pub struct QuickPerfEvent<L: ?Sized> {
109    inner: PerfEventInner,
110    _p: PhantomData<L>,
111}
112
113/// See [`QuickPerfEvent::run`] and the crate level docs.
114#[must_use]
115pub struct PerfReading<'a, L: ?Sized, T> {
116    pe: &'a mut QuickPerfEvent<L>,
117    start_time: SystemTime,
118    ret: T,
119}
120
121impl<L: Labels + ?Sized> Default for QuickPerfEvent<L> {
122    fn default() -> Self {
123        Self::new()
124    }
125}
126
127impl<L: Labels + ?Sized> QuickPerfEvent<L> {
128    /// Create a `QuickPerfEvent` configured from environment variables.
129    pub fn new() -> Self {
130        Self::with_counters(PerfCounters::new())
131    }
132
133    /// Create a `QuickPerfEvent` with a custom set of performance counters.
134    pub fn with_counters(counters: PerfCounters) -> Self {
135        QuickPerfEvent {
136            inner: PerfEventInner::new(counters, L::names()),
137            _p: PhantomData,
138        }
139    }
140
141    /// Perform a measurement.
142    ///
143    /// You **must** call [`record`](PerfReading::record) on the returned value.
144    pub fn run<R>(&mut self, f: impl FnOnce() -> R) -> PerfReading<'_, L, R> {
145        let start_time = SystemTime::now();
146        self.inner.counters.reset();
147        self.inner.counters.enable();
148        let ret = f();
149        self.inner.counters.disable();
150        PerfReading {
151            start_time,
152            ret,
153            pe: self,
154        }
155    }
156}
157
158impl<L: Labels + ?Sized, T> PerfReading<'_, L, T> {
159    /// Records the measured result.
160    ///
161    /// The `scale` argument normalizes counter values (e.g. per iteration count).
162    /// The given `labels` instance supplies the labels for this sample.
163    pub fn record(self, scale: usize, labels: impl Borrow<L>) -> T {
164        PerfEventInner::report_error(self.pe.inner.push(scale, self.start_time, &mut |dst| {
165            labels.borrow().values(dst)
166        }));
167        self.ret
168    }
169}
170
171struct PerfEventInner {
172    counters: PerfCounters,
173    state: OutputState,
174    label_names: &'static [&'static str],
175}
176
177impl Drop for PerfEventInner {
178    fn drop(&mut self) {
179        self.dump_and_reset();
180    }
181}
182
183enum OutputState {
184    Tabled {
185        readings: Vec<PerfReadingExtra>,
186        markdown: bool,
187    },
188    Live {
189        table: LiveTable,
190    },
191    Csv {
192        header_written: bool,
193        writer: csv::Writer<Box<dyn Write>>,
194    },
195}
196
197impl PerfEventInner {
198    fn new(counters: PerfCounters, label_names: &'static [&'static str]) -> Self {
199        PerfEventInner {
200            state: match std::env::var("QPE_FORMAT").as_deref() {
201                Ok("csv") => OutputState::Csv {
202                    header_written: false,
203                    writer: csv::Writer::from_writer(Box::new(stdout())),
204                },
205                Ok("md") => OutputState::Tabled {
206                    readings: Vec::new(),
207                    markdown: true,
208                },
209                x => {
210                    match x {
211                        Ok(requested) => {
212                            eprintln!(
213                                "unrecognized value for QPE_FORMAT: {requested:?}.\nSupported values: csv, md"
214                            );
215                        }
216                        Err(_) => {}
217                    }
218                    OutputState::Live {
219                        table: LiveTable::new(
220                            label_names.len() + 2 + counters.names().count(),
221                            9,
222                            env::var("QPE_LINE_LEN")
223                                .ok()
224                                .and_then(|x| {
225                                    x.parse()
226                                        .map_err(|_| {
227                                            eprintln!("failed to parse line len: {x:?}");
228                                        })
229                                        .ok()
230                                })
231                                .or_else(|| terminal_size::terminal_size().map(|x| x.0.0 as usize))
232                                .unwrap_or(160),
233                        ),
234                    }
235                }
236            },
237            counters,
238            label_names,
239        }
240    }
241    fn push(
242        &mut self,
243        scale: usize,
244        _start_time: SystemTime,
245        labels: &mut dyn FnMut(&mut dyn FnMut(&str)),
246    ) -> Result<(), Box<dyn Error>> {
247        match &mut self.state {
248            OutputState::Tabled {
249                readings,
250                markdown: _,
251            } => {
252                let mut label_vec = Vec::new();
253                labels(&mut |l: &str| label_vec.push(l.to_string()));
254                readings.push(PerfReadingExtra {
255                    scale,
256                    labels: label_vec,
257                    counters: self.counters.read_counters(),
258                });
259            }
260            OutputState::Live { table } => {
261                if !table.table_started() {
262                    for name in self.label_names {
263                        table.push(name.to_string())?;
264                    }
265                    for val in Self::value_names(&self.counters) {
266                        table.push(val.to_string())?;
267                    }
268                }
269                let mut err = Ok(());
270                labels(&mut |label| {
271                    if err.is_ok() {
272                        err = table.push(label.to_string());
273                    }
274                });
275                err?;
276                let mut discard_multiplexed = false;
277                for val in Self::values(
278                    &self.counters.read_counters(),
279                    scale,
280                    &mut discard_multiplexed,
281                ) {
282                    table.push(TabledFloat(val).to_string())?;
283                }
284            }
285            OutputState::Csv {
286                header_written,
287                writer,
288            } => {
289                macro_rules! write_field {
290                    ($x:expr) => {
291                        writer.write_field($x)?;
292                    };
293                }
294                if !*header_written {
295                    *header_written = true;
296                    for l in self.label_names {
297                        write_field!(l);
298                    }
299                    for name in Self::value_names(&self.counters) {
300                        write_field!(name);
301                    }
302                    write_field!("multiplexed");
303                    writer.write_record(iter::empty::<&[u8]>())?;
304                }
305                let mut err = Ok(());
306                labels(&mut |l| {
307                    if err.is_ok() {
308                        err = writer.write_field(l);
309                    }
310                });
311                err?;
312                let reading = self.counters.read_counters();
313                let mut any_multiplexed = false;
314                for value in Self::values(&reading, scale, &mut any_multiplexed) {
315                    write_field!(&format!("{value}"));
316                }
317                write_field!(&format!("{any_multiplexed}"));
318                writer.write_record(iter::empty::<&[u8]>())?;
319                writer.flush()?;
320            }
321        }
322        Ok(())
323    }
324
325    fn process_counter(counter: &CounterData, scale: usize) -> (f64, bool) {
326        let multiplexed = counter.time_enabled() != counter.time_running();
327        let scaled = counter.count() as f64 * counter.time_running().unwrap().as_secs_f64()
328            / counter.time_enabled().unwrap().as_secs_f64()
329            / scale as f64;
330        (scaled, multiplexed)
331    }
332
333    fn report_error(err: Result<(), Box<dyn Error>>) {
334        static ONCE: Once = Once::new();
335        if let Err(err) = err {
336            ONCE.call_once(|| {
337                eprintln!("QPE failed to write. Future errors will not be reported. {err}");
338            });
339        }
340    }
341
342    fn dump_and_reset(&mut self) {
343        match &mut self.state {
344            OutputState::Tabled { readings, markdown } => {
345                let mut table = tabled::builder::Builder::new();
346                table.push_record(self.label_names.iter().copied());
347                for reading in &mut *readings {
348                    table.push_record(mem::take(&mut reading.labels));
349                }
350                let multiplexed = readings
351                    .iter()
352                    .flat_map(|x| x.counters.counters.iter())
353                    .any(|x| x.time_enabled() != x.time_running());
354                for (i, name) in self.counters.names().enumerate() {
355                    let readings = || {
356                        readings.iter().map(|x| {
357                            let c = &x.counters.counters[i];
358                            c.count() as f64 * c.time_enabled().unwrap().as_secs_f64()
359                                / c.time_running().unwrap().as_secs_f64()
360                                / x.scale as f64
361                        })
362                    };
363                    // let max = readings().max_by(f64::total_cmp).unwrap();
364                    // let min = readings().min_by(f64::total_cmp).unwrap();
365                    // let max_scale = max.log10().floor() as isize;
366                    // let min_scale = min.log10().floor() as isize;
367                    table.push_column(
368                        iter::once(name.to_string()).chain(readings().map(|x| format!("{x:3.3}"))),
369                    );
370                }
371                let multiplex_warning = if multiplexed {
372                    "⚠️ Some counters were multiplexed.\n"
373                } else {
374                    "\n"
375                };
376                let mut table = table.build();
377                if *markdown {
378                    table.with(Style::markdown());
379                }
380                println!("{multiplex_warning}{table}");
381            }
382            OutputState::Live { table } => {
383                Self::report_error(table.end_table().map_err(Into::into));
384            }
385            OutputState::Csv {
386                header_written,
387                writer: _,
388            } => {
389                *header_written = false;
390            }
391        }
392    }
393
394    fn value_names(counters: &PerfCounters) -> impl Iterator<Item = &str> {
395        ["time", "scale"].into_iter().chain(counters.names())
396    }
397
398    fn values(
399        counters: &PerfCountersReading,
400        scale: usize,
401        any_multiplexed: &mut bool,
402    ) -> impl Iterator<Item = f64> {
403        let time = counters.duration.as_secs_f64();
404        let counters = counters.counters.iter();
405        let counters = counters.map(move |x| {
406            let x = Self::process_counter(x, scale);
407            *any_multiplexed |= x.1;
408            x.0
409        });
410        [time, scale as f64].into_iter().chain(counters)
411    }
412}
413
414struct PerfReadingExtra {
415    scale: usize,
416    labels: Vec<String>,
417    counters: PerfCountersReading,
418}