pprof 0.3.8

An internal perf tools for rust programs.
Documentation
# pprof

`pprof` is a cpu profiler that can be easily integrated into a rust program.

[![Crates.io](https://img.shields.io/crates/v/pprof.svg)](https://crates.io/crates/pprof)

## Usage

First, get a guard to start profiling. Profiling will continue until this guard was dropped.

```rust
let guard = pprof::ProfilerGuard::new(100).unwrap();
```

During the profiling time, you can get a report with the guard.

```rust
if let Ok(report) = guard.report().build() {
    println!("report: {}", &report);
};
```

`Display` was implemented for `Report`. It will print a human-readable stack counter report. Here is an example:

```
FRAME: backtrace::backtrace::trace::h3e91a3123a3049a5 -> FRAME: pprof::profiler::perf_signal_handler::h7b995c4ab2e66493 -> FRAME: Unknown -> FRAME: prime_number::is_prime_number::h70653a2633b88023 -> FRAME: prime_number::main::h47f1058543990c8b -> FRAME: std::rt::lang_start::{{closure}}::h4262e250f8024b06 -> FRAME: std::rt::lang_start_internal::{{closure}}::h812f70926ebbddd0 -> std::panicking::try::do_call::h3210e2ce6a68897b -> FRAME: __rust_maybe_catch_panic -> FRAME: std::panicking::try::h28c2e2ec1c3871ce -> std::panic::catch_unwind::h05e542185e35aabf -> std::rt::lang_start_internal::hd7efcfd33686f472 -> FRAME: main -> FRAME: __libc_start_main -> FRAME: _start -> FRAME: Unknown -> THREAD: prime_number 1217
FRAME: backtrace::backtrace::trace::h3e91a3123a3049a5 -> FRAME: pprof::profiler::perf_signal_handler::h7b995c4ab2e66493 -> FRAME: Unknown -> FRAME: alloc::alloc::box_free::h82cea48ed688e081 -> FRAME: prime_number::main::h47f1058543990c8b -> FRAME: std::rt::lang_start::{{closure}}::h4262e250f8024b06 -> FRAME: std::rt::lang_start_internal::{{closure}}::h812f70926ebbddd0 -> std::panicking::try::do_call::h3210e2ce6a68897b -> FRAME: __rust_maybe_catch_panic -> FRAME: std::panicking::try::h28c2e2ec1c3871ce -> std::panic::catch_unwind::h05e542185e35aabf -> std::rt::lang_start_internal::hd7efcfd33686f472 -> FRAME: main -> FRAME: __libc_start_main -> FRAME: _start -> FRAME: Unknown -> THREAD: prime_number 1
FRAME: backtrace::backtrace::trace::h3e91a3123a3049a5 -> FRAME: pprof::profiler::perf_signal_handler::h7b995c4ab2e66493 -> FRAME: Unknown -> FRAME: prime_number::main::h47f1058543990c8b -> FRAME: std::rt::lang_start::{{closure}}::h4262e250f8024b06 -> FRAME: std::rt::lang_start_internal::{{closure}}::h812f70926ebbddd0 -> std::panicking::try::do_call::h3210e2ce6a68897b -> FRAME: __rust_maybe_catch_panic -> FRAME: std::panicking::try::h28c2e2ec1c3871ce -> std::panic::catch_unwind::h05e542185e35aabf -> std::rt::lang_start_internal::hd7efcfd33686f472 -> FRAME: main -> FRAME: __libc_start_main -> FRAME: _start -> FRAME: Unknown -> THREAD: prime_number 1
```

## Flamegraph

```toml
pprof = { version = "0.3", features = ["flamegraph"] } 
```

If `flamegraph` feature is enabled, you can generate flamegraph from the report. `Report` struct has a method `flamegraph` which can generate flamegraph and write it into a `Write`.

```rust
if let Ok(report) = guard.report().build() {
    let file = File::create("flamegraph.svg").unwrap();
    report.flamegraph(file).unwrap();
};
```

Here is an example of generated flamegraph:

![flamegraph](https://user-images.githubusercontent.com/5244316/68021936-c1265e80-fcdd-11e9-8fa5-62b548bc751d.png)

## Frame Post Processor

Before the report was generated, `frame_post_processor` was provided as an interface to modify raw statistic data. If you want to group several symbols/thread or demangle for some symbols, this feature will benefit you.

For example: 

```rust
fn frames_post_processor() -> impl Fn(&mut pprof::Frames) {
    let thread_rename = [
        (Regex::new(r"^grpc-server-\d*$").unwrap(), "grpc-server"),
        (Regex::new(r"^cop-high\d*$").unwrap(), "cop-high"),
        (Regex::new(r"^cop-normal\d*$").unwrap(), "cop-normal"),
        (Regex::new(r"^cop-low\d*$").unwrap(), "cop-low"),
        (Regex::new(r"^raftstore-\d*$").unwrap(), "raftstore"),
        (Regex::new(r"^raftstore-\d*-\d*$").unwrap(), "raftstore"),
        (Regex::new(r"^sst-importer\d*$").unwrap(), "sst-importer"),
        (
            Regex::new(r"^store-read-low\d*$").unwrap(),
            "store-read-low",
        ),
        (Regex::new(r"^rocksdb:bg\d*$").unwrap(), "rocksdb:bg"),
        (Regex::new(r"^rocksdb:low\d*$").unwrap(), "rocksdb:low"),
        (Regex::new(r"^rocksdb:high\d*$").unwrap(), "rocksdb:high"),
        (Regex::new(r"^snap sender\d*$").unwrap(), "snap-sender"),
        (Regex::new(r"^snap-sender\d*$").unwrap(), "snap-sender"),
        (Regex::new(r"^apply-\d*$").unwrap(), "apply"),
        (Regex::new(r"^future-poller-\d*$").unwrap(), "future-poller"),
    ];

    move |frames| {
        for (regex, name) in thread_rename.iter() {
            if regex.is_match(&frames.thread_name) {
                frames.thread_name = name.to_string();
            }
        }
    }
}
```

```rust
if let Ok(report) = guard.frames_post_processor(frames_post_processor()).report().build() {
    let file = File::create("flamegraph.svg").unwrap();
    report.flamegraph(file).unwrap();
}
```

## Why not ...

There have been tons of profilers, why we create a new one? Here we make a comparison between `pprof-rs` and other popular profilers to help you choose the best fit one.

### gperftools

`gperftools` is also an integrated profiler. There is also a wrapper for `gperftools` in rust called [`cpuprofiler`](https://crates.io/crates/cpuprofiler) which makes it programmable for a rust program.

#### Pros

1. `pprof-rs` has a modern build system and can be integrated into a rust program easily while compiling `gperftools` statically is buggy.
2. `pprof-rs` has a native rust interface while `gperftools`'s wrapper is **just** a wrapper.
3. Programming with rust guarantees thread safety natively.

#### Cons

1. `gperftools` is a collection of performance analysis tools which contains cpu profiler, heap profiler... `pprof-rs` focuses on cpu profiler now.

### perf

`perf` is a performance analyzing tool in Linux.

#### Pros

1. You don't need to start another process to perf with `pprof-rs`.
2. `pprof-rs` can be easily integrated with rust program which means you don't need to install any other programs.
3. `pprof-rs` has a modern programmable interface to hack with
4. `pprof-rs` theoretically supports all POSIX systems and can easily support more systems in the future.

#### Cons

1. `perf` is much more feature-rich than `pprof-rs`.
2. `perf` is highly integrated with Linux.

## Implementation

When profiling was started, `setitimer` system call was used to set up a timer which will send a SIGPROF to this program every constant interval.

When receiving a SIGPROF signal, the signal handler will capture a backtrace and increase the count of it. After a while, the profiler can get every possible backtrace and their count. Finally, we can generate a report with profiler data.

However, the real world is full of thorns. There are many worths of note parts in the implementation.

### Backtrace

Unfortunately, there is no 100% robust stack tracing method. [Some related researches](https://github.com/gperftools/gperftools/wiki/gperftools%27-stacktrace-capturing-methods-and-their-issues) have been done by gperftools. `pprof-rs` uses [`backtrace-rs`](https://github.com/rust-lang/backtrace-rs) which finally uses libunwind provided by `glibc`

**WARN:** as described in former gperftools documents, libunwind provided by `glibc` is not signal safe. 

> libgcc's unwind method is not safe to use from signal handlers. One particular cause of deadlock is when profiling tick happens when program is propagating thrown exception.

### Signal Safety

Signal safety is hard to guarantee. But it's not *that* hard.

First, we have to avoid deadlock. When profiler samples or reports, it will get a global lock on the profiler. Particularly, deadlock happenswhen the running program is getting a report from the profiler (which will hold the lock), at the same time, a SIGPROF signal is triggered and the profiler wants to sample (which will also hold the lock). So we don't wait for the lock in signal handler, instead we `try_lock` in the signal handler. If the global lock cannot be gotten, the profiler will give up directly.

Then, signal safety POSIX function is quite limited as [listed here](http://man7.org/linux/man-pages/man7/signal-safety.7.html). The most bothering issue is that we cannot use `malloc` in signal handler. So we can only use pre-allocated memory in profiler. The simplest way is `write` every sample serially into a file. We optimized it with a fix-sized hashmap that has a fixed number of buckets and every bucket is an array with a fixed number of items. If the hashmap is full, we pop out the item with minimum count and write it into a temporary file.

Unit tests have been added to guarantee there is no `malloc` in sample functions.

`futex` is also not safe to use in signal handler. So we use a spin lock to avoid usage of `futex`.

## TODO

1. Restore the original SIGPROF handler after stopping the profiler.
2. Support generating [pprof format results](https://github.com/google/pprof/blob/master/proto/profile.proto)