pyroscope-rbspy-oncpu 0.19.1

Sampling CPU profiler for Ruby
# rbspy architecture

rbspy is a little complicated. I want other people to be able to contribute to it easily, so here is
an architecture document to help you understand how it works.

Here’s what happens when you run `rbspy snapshot --pid $PID`. This is the simplest subcommand (it takes a
PID and gets you the current stack trace from that PID), and if you understand how `snapshot` works
you can relatively easily understand how the rest of the `rbspy` subcommands work as well.

The implementation of the `snapshot` function in `main.rs` is really simple. The goal of this
document is to explain how that code works behind the scenes.

```rust
let snap = recorder::snapshot(pid, lock_process, force_version)?;
println!("{}", snap);
```

## Phase 1: Initialize (`ruby_spy.rs` + `address_finder.rs`)

Our first goal is to create a struct (`RubySpy`) which we can call `.get_stack_trace()` on to get a
stack trace. This struct contains a PID, a function, and the address in the target process of the
current thread. The initialization code is somewhat complicated but has a simple interface: you give
it a PID, and it returns a struct that you can call `.get_stack_trace()` on:

```rust
let spy = RubySpy::new(pid, None)?;
let lock_process = false;
spy.get_stack_trace(lock_process)
```
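
To make the shape of that struct concrete, here is a minimal sketch. It is not rbspy's actual definition: `RubySpySketch`, `StackFrame`, `GetStackTraceFn`, and `fake_get_stack_trace` are hypothetical names, the `lock_process` argument is left out, and the fake function returns a fixed frame instead of reading the target process's memory.

```rust
/// Hypothetical stand-in for one frame of a Ruby stack trace.
#[derive(Debug, Clone, PartialEq)]
pub struct StackFrame {
    pub name: String,
    pub lineno: u32,
}

/// Assumed signature for a version-specific stack trace function:
/// it receives the current-thread address and the PID.
pub type GetStackTraceFn = fn(usize, u32) -> Result<Vec<StackFrame>, String>;

/// Simplified model of the three things the struct holds:
/// a PID, an address in the target process, and a function.
pub struct RubySpySketch {
    pid: u32,
    current_thread_addr: usize,
    get_stack_trace_fn: GetStackTraceFn,
}

impl RubySpySketch {
    pub fn new(pid: u32, current_thread_addr: usize, f: GetStackTraceFn) -> Self {
        RubySpySketch {
            pid: pid,
            current_thread_addr: current_thread_addr,
            get_stack_trace_fn: f,
        }
    }

    /// Every call starts from the same stored address and delegates to the
    /// function that was chosen during initialization.
    pub fn get_stack_trace(&self) -> Result<Vec<StackFrame>, String> {
        (self.get_stack_trace_fn)(self.current_thread_addr, self.pid)
    }
}

/// Fake implementation for illustration only; it never touches process memory.
pub fn fake_get_stack_trace(addr: usize, _pid: u32) -> Result<Vec<StackFrame>, String> {
    if addr == 0 {
        return Err("null current thread address".to_string());
    }
    Ok(vec![StackFrame { name: "main".to_string(), lineno: 1 }])
}
```

Calling `RubySpySketch::new(1234, 0x1000, fake_get_stack_trace).get_stack_trace()` returns the single fake frame. The point is only that the expensive decisions (which address, which function) are made once, up front.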

Here's what happens when you call `RubySpy::new(pid, None)`.

**Step 1**: **Find the Ruby version of the process**. The code to do this is in a function called
`get_ruby_version`.

**Step 2**: **Find the address of the `ruby_current_thread` global variable**. This address is the
starting point for getting a stack trace from our Ruby process -- we start there every time. How we do
this depends on 2 things -- whether the Ruby process we’re profiling has symbols, and the Ruby
version (in 2.5.0+ there are some small differences).

If there are symbols, we find the address of the current thread using the symbol table (see the
`current_thread_address_location_symbol_table` function). This is pretty straightforward: we look
up `ruby_current_thread` or `ruby_current_execution_context_ptr`, depending on the Ruby version.
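
As a rough sketch of that lookup, the snippet below models the symbol table as a plain `HashMap` from symbol name to address. The function names, the tuple representation of the version, and the 2.5.0 cutoff are assumptions made for illustration; the real code parses the binary's symbol table and may treat newer Ruby versions differently.

```rust
use std::collections::HashMap;

/// Pick the global symbol to look up for a (major, minor, patch) version.
/// Assumption for this sketch: 2.5.0 and later use the execution context
/// pointer, earlier versions use `ruby_current_thread`.
pub fn current_thread_symbol(version: (u32, u32, u32)) -> &'static str {
    if version >= (2, 5, 0) {
        "ruby_current_execution_context_ptr"
    } else {
        "ruby_current_thread"
    }
}

/// Look the chosen symbol up in a symbol table, modeled here as a
/// name -> address map. Returns `None` if the symbol is missing, which is
/// the case where the `.bss` heuristic would be needed instead.
pub fn current_thread_address(
    symbols: &HashMap<String, usize>,
    version: (u32, u32, u32),
) -> Option<usize> {
    symbols.get(current_thread_symbol(version)).copied()
}
```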

If there **aren’t** symbols, instead we use a heuristic
(`current_thread_address_location_search_bss`) where we search through the `.bss` section of our
binary’s memory for something that plausibly looks like the address of the current thread. This
assumes that the address we want is in the `.bss` section somewhere. Here is how it works:

* Find the address of the `.bss` section and read it from memory.
* Cast the `.bss` section to an array of `usize` (so an array of addresses).
* Iterate through that array and for every address run the `is_maybe_thread` function on that
  address. `is_maybe_thread` is a Ruby-version-specific function (we compile a different version of
  this function for every Ruby version). We'll explain this later.
* Return an address if `is_maybe_thread` returns true for any of them. Otherwise abort.
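
The steps above can be sketched roughly as follows. This is a simplified illustration, not rbspy's code: `search_bss` is a hypothetical name, the `.bss` contents are passed in as an already-read byte slice, and `is_maybe_thread` is a plain closure, whereas the real check is version-specific and reads memory from the target process. The sketch also assumes that the value returned is the address of the matching slot inside `.bss`.

```rust
use std::mem::size_of;

/// Scan a copy of the `.bss` section, one machine word at a time, for a
/// value that passes `is_maybe_thread`.
///
/// `bss` is the section's contents and `bss_addr` is its address in the
/// target process. Returns the address of the first matching word.
pub fn search_bss<F>(bss: &[u8], bss_addr: usize, is_maybe_thread: F) -> Option<usize>
where
    F: Fn(usize) -> bool,
{
    const WORD: usize = size_of::<usize>();

    // Treat the section as an array of `usize`; trailing bytes that do not
    // fill a whole word are ignored by `chunks_exact`.
    for (i, chunk) in bss.chunks_exact(WORD).enumerate() {
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(chunk);
        let candidate = usize::from_ne_bytes(buf);

        if is_maybe_thread(candidate) {
            return Some(bss_addr + i * WORD);
        }
    }

    // No word looked like a thread pointer.
    None
}
```

Because the predicate is only a plausibility check, a heuristic like this can produce false positives; the quality of the result depends entirely on how strict `is_maybe_thread` is.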

**Step 3**: **Get the right `stack_trace` function**. We compile 30+ different functions to get
stack traces (Phase 2 explains why). The code that decides which function to use is basically a
huge `match` expression (see `supported_ruby_versions.rs`) keyed on the Ruby version.

```rust
pub fn get(v: &str) -> Result<RubyVersion> {
    match v {
        ...
        "3.3.0" => Ok(RubyVersion {
            semver_version: Version::new(3, 3, 0),
            get_execution_context_fn: super::ruby_version::ruby_3_3_0::get_execution_context,
            get_stack_trace_fn: super::ruby_version::ruby_3_3_0::get_stack_trace,
            is_maybe_thread_fn: super::ruby_version::ruby_3_3_0::is_maybe_thread,
        }),
        ...
    }
}
```

**Step 4**: **Return the `RubySpy` struct**.

Now we're done! We return our `RubySpy` struct.

## Phase 2: Get stack traces (`ruby_version.rs`, `ruby-bindings/` crate, `xtask/src/bindgen.rs`)

Once we've initialized, all that remains is calling the `get_stack_trace` function. How does that function
work?

Like we said before -- we compile a different version of the code to get stack traces for every Ruby
version. This is because every Ruby version has slightly different struct layouts.

The Ruby structs are defined in a `ruby-bindings` crate. All the code in that crate is autogenerated
by bindgen in `xtask/src/bindgen.rs`.

These functions are defined through a bunch of macros (4 different macros, for different ranges of
Ruby versions) which implement `get_stack_trace` for every Ruby version. Each one uses the struct
definitions for the right Ruby version.

There's a lot of code in `ruby_version.rs` but this is the core of how it works. First, it defines a
`$ruby_version` module and inside that module uses `bindings::$ruby_version` which includes all the
required struct definitions for that Ruby version.

Then it includes **more** macros which together make up the body of that module. This is because
some functions are the same across all Ruby versions (like `get_cfps`) and some are different
(like `get_stack_frame` which changes frequently because the way Ruby organizes that code changes a
lot).

```rust
macro_rules! ruby_version_v_2_0_to_2_2(
    ($ruby_version:ident) => (
        pub mod $ruby_version {
            use bindings::$ruby_version::*;
            ...
            get_stack_trace!(rb_thread_struct);
            get_execution_context_from_thread!(rb_thread_struct);
            rstring_as_array_1_9_1!();
            get_ruby_string_1_9_1!();
            get_cfps!();
            get_pos!(rb_iseq_struct);
            get_lineno_2_0_0!();
            get_stack_frame_2_0_0!();
            stack_field_1_9_0!();
            get_thread_id_1_9_0!();
            get_cfunc_name_unsupported!();
        }
    )
);
```
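
To see why this pattern works, here is a small self-contained sketch. Everything in it is hypothetical (the `ruby_a`/`ruby_b` modules, their struct layouts, and the macro names); it only demonstrates the mechanism. Type names inside a `macro_rules!` body are resolved where the macro is expanded, so the same shared macro compiles against a different `rb_thread_struct` in each version module.

```rust
/// Stand-ins for the autogenerated bindings: the same struct name has a
/// different layout in each (made-up) version module.
pub mod bindings {
    pub mod ruby_a {
        #[allow(non_camel_case_types)]
        #[repr(C)]
        pub struct rb_thread_struct {
            pub cfp: usize,
        }
    }
    pub mod ruby_b {
        #[allow(non_camel_case_types)]
        #[repr(C)]
        pub struct rb_thread_struct {
            pub padding: usize,
            pub cfp: usize,
        }
    }
}

/// A piece of code shared by every version. `rb_thread_struct` is looked up
/// at the expansion site, so each module gets its own layout.
macro_rules! define_thread_struct_size(
    () => (
        pub fn thread_struct_size() -> usize {
            ::std::mem::size_of::<rb_thread_struct>()
        }
    )
);

/// Generates one module per version: import that version's bindings, then
/// expand the shared pieces inside the module.
macro_rules! ruby_version_sketch(
    ($ruby_version:ident) => (
        pub mod $ruby_version {
            use super::bindings::$ruby_version::*;
            define_thread_struct_size!();
        }
    )
);

ruby_version_sketch!(ruby_a);
ruby_version_sketch!(ruby_b);
```

With these definitions, `ruby_b::thread_struct_size()` is twice `ruby_a::thread_struct_size()`, even though both functions come from the same macro body.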

Several of rbspy's core functions, such as interpreting Ruby strings and identifying C functions,
were ported directly from gdb scripts in the official Ruby repository or other community
repositories.