# rbspy architecture

rbspy is a little complicated. I want other people to be able to contribute to it easily, so here is
an architecture document to help you understand how it works.

Here’s what happens when you run `rbspy snapshot --pid $PID`. This is the simplest subcommand (it takes a
PID and gets you the current stack trace from that PID), and if you understand how `snapshot` works
you can relatively easily understand how the rest of the `rbspy` subcommands work as well.

The implementation of the `snapshot` function in `main.rs` is really simple: just 6 lines of code.
The goal of this document is to explain how that code works behind the scenes.

```
fn snapshot(pid: pid_t) -> Result<(), Error> {
    let getter = initialize::initialize(pid)?;
    let trace = getter.get_trace()?;
    for x in trace.iter().rev() {
        println!("{}", x);
    }
    Ok(())
}
```

## Phase 1: Initialize. (`initialize.rs` + `address_finder.rs`)

Our first goal is to create a struct (`StackTraceGetter`) that we can call `.get_trace()` on to get a
stack trace. This struct contains a PID, a version-specific stack trace function, and the address of
the current-thread global variable in the target process. The initialization code is somewhat
complicated but has a simple interface: you give it a PID, and it returns the struct:

```
let getter = initialize::initialize(pid)?;
let trace = getter.get_trace()?;
```

Here's what happens when you call `initialize(pid)`.

**Step 1**: **Find the Ruby version of the process**. The code to do this is in a function called
`get_ruby_version`.

**Step 2**: **Find the address of the `ruby_current_thread` global variable**. This address is the
starting point for getting a stack trace from our Ruby process -- we start there every time. How we do
this depends on 2 things -- whether the Ruby process we’re profiling has symbols, and the Ruby
version (in 2.5.0+ there are some small differences).

If there are symbols, we find the address of the current thread using the symbol table (the
`current_thread_address_location_symbol_table` function). This is pretty straightforward: we look
up `ruby_current_thread` or `ruby_current_execution_context_ptr`, depending on the Ruby version.
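
As a rough illustration of that version check, here is a minimal sketch. `parse_version` and `current_thread_symbol` are hypothetical names, not rbspy functions. The sketch assumes the version is a plain `major.minor.patch` string and that the 2.5.0 boundary mentioned above is where the symbol name changes.

```rust
/// Hypothetical helper: parse "major.minor.patch" into a tuple we can compare.
fn parse_version(version: &str) -> Option<(u32, u32, u32)> {
    let mut parts = version.split('.').map(|part| part.parse::<u32>());
    let major = parts.next()?.ok()?;
    let minor = parts.next()?.ok()?;
    let patch = parts.next()?.ok()?;
    Some((major, minor, patch))
}

/// Pick the global symbol to look up in the symbol table.
/// Assumption: 2.5.0 and later use `ruby_current_execution_context_ptr`.
fn current_thread_symbol(version: &str) -> Option<&'static str> {
    let parsed = parse_version(version)?;
    if parsed >= (2, 5, 0) {
        Some("ruby_current_execution_context_ptr")
    } else {
        Some("ruby_current_thread")
    }
}
```

Tuples compare field by field, so `(2, 4, 9) < (2, 5, 0)` holds without any custom ordering code.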

If there **aren’t** symbols, instead we use a heuristic
(`current_thread_address_location_search_bss`) where we search through the `.bss` section of our
binary’s memory for something that plausibly looks like the address of the current thread. This
assumes that the address we want is somewhere in the `.bss` section. Here's how it works:

* Find the address of the `.bss` section and read it from memory.
* Cast the `.bss` section to an array of `usize` (so an array of addresses).
* Iterate through that array and for every address run the `is_maybe_thread` function on that
  address. `is_maybe_thread` is a Ruby-version-specific function (we compile a different version of
  this function for every Ruby version). We'll explain this later.
* Return an address if `is_maybe_thread` returns true for any of them. Otherwise abort.
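
The steps above can be sketched roughly like this. The sketch works on a local copy of the `.bss` bytes and takes the plausibility check as a closure, so `search_bss` and `bytes_to_words` are hypothetical stand-ins. The real `is_maybe_thread` also has to read memory from the target process, which is left out here. Returning the location of the matching slot (rather than the value stored in it) is an assumption of this sketch.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// Hypothetical helper: reinterpret raw `.bss` bytes as native-endian `usize` words.
/// Trailing bytes that do not fill a whole word are ignored.
fn bytes_to_words(bss: &[u8]) -> Vec<usize> {
    bss.chunks_exact(size_of::<usize>())
        .map(|chunk| usize::from_ne_bytes(chunk.try_into().expect("chunk is word-sized")))
        .collect()
}

/// Return the address (in the target process) of the first `.bss` slot whose
/// value passes `is_maybe_thread`, or `None` if nothing looks plausible.
fn search_bss<F>(bss_start: usize, bss: &[u8], is_maybe_thread: F) -> Option<usize>
where
    F: Fn(usize) -> bool,
{
    bytes_to_words(bss)
        .iter()
        .position(|&candidate| is_maybe_thread(candidate))
        .map(|index| bss_start + index * size_of::<usize>())
}
```

Passing the check in as a closure keeps the scan itself independent of the Ruby version, which is the part that varies.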

**Step 3**: **Get the right `stack_trace` function**. We compile 30+ different functions to get
stack traces (we'll explain why later). The code to decide which function to use is basically a huge
`match` expression on the Ruby version.

```
  "1.9.1" => self::ruby_1_9_1_0::get_stack_trace,
  "1.9.2" => self::ruby_1_9_2_0::get_stack_trace,
  "1.9.3" => self::ruby_1_9_3_0::get_stack_trace,
```
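
To show how such a `match` can hand back a function to store in a struct, here is a small self-contained sketch. The module bodies are stubs, `String` stands in for the real stack frame type, `i32` stands in for `pid_t`, and returning `Option` for unknown versions is an assumption of this sketch rather than what rbspy necessarily does.

```rust
// Every per-version function has the same signature, so they all fit one
// function pointer type. (Stand-in types: see the note above.)
type StackTraceFn = fn(usize, i32) -> Vec<String>;

mod ruby_1_9_3_0 {
    // Stub: the real function would read the target process's memory.
    pub fn get_stack_trace(_current_thread_addr_location: usize, _pid: i32) -> Vec<String> {
        vec![String::from("frame read using 1.9.3 struct layouts")]
    }
}

mod ruby_2_5_0 {
    // Stub: same signature, different struct layouts behind it.
    pub fn get_stack_trace(_current_thread_addr_location: usize, _pid: i32) -> Vec<String> {
        vec![String::from("frame read using 2.5.0 struct layouts")]
    }
}

/// Map a version string to the function compiled for that version.
fn get_stack_trace_function(version: &str) -> Option<StackTraceFn> {
    let function: StackTraceFn = match version {
        "1.9.3" => ruby_1_9_3_0::get_stack_trace,
        "2.5.0" => ruby_2_5_0::get_stack_trace,
        _ => return None,
    };
    Some(function)
}
```

Because the choice is made once during initialization, getting a trace later is just a call through the stored function pointer.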

**Step 4**: **Return the `getter` struct**.

Now we're done! We return our `StackTraceGetter` struct.

```
pub fn initialize(pid: pid_t) -> Result<StackTraceGetter, Error> {
    let version = get_ruby_version_retry(pid).context("Couldn't determine Ruby version")?;
    debug!("version: {}", version);
    Ok(StackTraceGetter {
        pid: pid,
        current_thread_addr_location: os_impl::current_thread_address(pid, &version)?,
        stack_trace_function: stack_trace::get_stack_trace_function(&version),
    })
}

impl StackTraceGetter {
    pub fn get_trace(&self) -> Result<Vec<StackFrame>, MemoryCopyError> {
        let stack_trace_function = &self.stack_trace_function;
        stack_trace_function(self.current_thread_addr_location, self.pid)
    }
}
```

## Phase 2: Get stack traces (`ruby_version.rs`, `ruby-bindings/` crate, `bindgen.sh`)

Once we've initialized, all that remains is calling the `get_trace` function. How does that function
work?

Like we said before -- we compile a different version of the code to get stack traces for every Ruby
version. This is because every Ruby version has slightly different struct layouts.
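
Here is a small sketch of why the layout matters. The two structs are made-up layouts, not real Ruby structs, and `read_struct` is a hypothetical helper that reinterprets bytes copied from another process. The same bytes give a different `cfp` depending on which layout we assume.

```rust
use std::mem::size_of;

// Made-up layout for an "older" version: `cfp` is the first field.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
#[allow(dead_code)]
struct ThreadOld {
    cfp: usize,
    stack_size: usize,
}

// Made-up layout for a "newer" version: a field was added before `cfp`.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
#[allow(dead_code)]
struct ThreadNew {
    vm: usize,
    cfp: usize,
    stack_size: usize,
}

/// Reinterpret the start of `bytes` as a `T`, or return `None` if there are
/// not enough bytes.
///
/// # Safety
/// Every bit pattern must be a valid `T`. That holds for the all-`usize`
/// structs above, but not for arbitrary types.
unsafe fn read_struct<T: Copy>(bytes: &[u8]) -> Option<T> {
    if bytes.len() < size_of::<T>() {
        return None;
    }
    // `read_unaligned` because a byte buffer has no alignment guarantee.
    Some(unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const T) })
}
```

Reading three words `[10, 20, 30]` as `ThreadOld` gives `cfp == 10`, while reading them as `ThreadNew` gives `cfp == 20`, so using the wrong version's structs silently produces garbage.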

The Ruby structs are defined in a `ruby-bindings` crate. All the code in that crate is autogenerated
by bindgen, using a hacky script called `bindgen.sh`. 

The `get_stack_trace` functions are defined through a bunch of macros (4 different macros, for
different ranges of Ruby versions) which implement `get_stack_trace` for every Ruby version. Each
one uses the struct definitions for the right Ruby version.

There's a lot of code in `ruby_version.rs` but this is the core of how it works. First, it defines a
`$ruby_version` module and inside that module uses `bindings::$ruby_version` which includes all the
required struct definitions for that Ruby version.

Then it includes **more** macros which together make up the body of that module. This is because
some functions are the same across all Ruby versions (like `get_ruby_string`) and some are different
(like `get_stack_frame` which changes frequently because the way Ruby organizes that code changes a
lot).

```
macro_rules! ruby_version_v_2_0_to_2_2(
    ($ruby_version:ident) => (
        pub mod $ruby_version {
            use bindings::$ruby_version::*;
            ...
            get_stack_trace!(rb_thread_struct);
            get_ruby_string!();
            get_cfps!();
            get_lineno_2_0_0!();
            get_stack_frame_2_0_0!();
            is_stack_base_1_9_0!();
        }
    )
);
```
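
To make the pattern concrete, here is a tiny compilable version of the same idea. Everything in it is a stand-in: the `bindings` module is written by hand instead of generated by bindgen, the struct layouts are made up, and `get_cfp!`, `thread_struct_size!`, and `ruby_version!` are hypothetical macros, not the ones in rbspy.

```rust
// Stand-in for the bindgen output: one module per Ruby version, each with its
// own (made-up) layout for `rb_thread_struct`.
#[allow(non_camel_case_types, dead_code)]
mod bindings {
    pub mod ruby_2_1_0 {
        #[repr(C)]
        pub struct rb_thread_struct { pub vm: usize, pub cfp: usize }
    }
    pub mod ruby_2_2_0 {
        #[repr(C)]
        pub struct rb_thread_struct { pub vm: usize, pub extra: usize, pub cfp: usize }
    }
}

// Shared piece: the source text is identical for every version, but it is
// compiled against whichever `rb_thread_struct` is in scope where it expands.
macro_rules! get_cfp(
    () => (
        pub fn get_cfp(thread: &rb_thread_struct) -> usize {
            thread.cfp
        }
    )
);

macro_rules! thread_struct_size(
    () => (
        pub fn thread_struct_size() -> usize {
            ::std::mem::size_of::<rb_thread_struct>()
        }
    )
);

// Per-version module: import that version's bindings, then paste in the pieces.
macro_rules! ruby_version(
    ($ruby_version:ident) => (
        pub mod $ruby_version {
            use super::bindings::$ruby_version::*;
            get_cfp!();
            thread_struct_size!();
        }
    )
);

ruby_version!(ruby_2_1_0);
ruby_version!(ruby_2_2_0);
```

After expansion, `ruby_2_1_0::get_cfp` and `ruby_2_2_0::get_cfp` are two separate functions that read `cfp` from different offsets, even though they were written only once.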