profile-bee 0.3.1

eBPF-based CPU profiler with flamegraph generation, DWARF unwinding, and interactive TUI

profile-bee

An eBPF-based CPU profiler for Linux, written in Rust. Single binary, no BCC/libbpf dependencies.

[Image: Architecture]

[Image: TUI Screenshot]

About

Profile Bee is an eBPF-based CPU profiler that ships as a single binary — no BCC, libbpf, or perf tooling needed on the target host. Built with Rust and aya.

  • Just cargo install, sudo probee --tui, and you're looking at a live flamegraph — no package manager dance, no Python dependencies, no separate visualization step
  • Walks stacks directly in the kernel via frame pointers (fast, the default) or DWARF unwind tables (for those -O2 binaries everyone ships without frame pointers)
  • Attaches to perf events, kprobes, uprobes, or tracepoints — auto-discovers uprobe targets with glob and regex matching
  • Demangles Rust and C++ symbols out of the box
  • Outputs to interactive TUI, SVG, HTML, JSON, stackcollapse, or a real-time web server — whatever fits your workflow

Install

cargo install profile-bee

Installs two binaries: probee and pbee (a short alias). Nightly Rust is not required because a prebuilt eBPF binary is bundled. Running the profiler requires root, since it loads eBPF programs.

Quick Start

# Interactive TUI flamegraph (live, system-wide)
sudo probee --tui

# Profile a specific command
sudo probee --tui --cmd "my-application"

# Generate an SVG flamegraph
sudo probee --svg flamegraph.svg --time 5000

# Profile a command with args
sudo probee --svg output.svg -- ./my-binary arg1 arg2

# Real-time flamegraphs via web server
sudo probee --serve --skip-idle

# Trace function calls with uprobe
sudo probee --uprobe malloc --time 1000 --svg malloc.svg

Run probee with no arguments or probee --help for the full list of options and examples.

Features

  • Interactive TUI — real-time flamegraph viewer with vim-style navigation, search, and zoom
  • Multiple output formats — SVG, HTML, JSON (d3), and stackcollapse format
  • Frame pointer unwinding (default) — fast eBPF-based stack walking via bpf_get_stackid
  • DWARF unwinding (--dwarf) — profiles -O2/-O3 binaries without frame pointers using .eh_frame tables loaded into eBPF maps
  • Smart uprobes — GDB-style symbol resolution with glob, regex, demangled name matching, and multi-attach
  • kprobe & tracepoint support — profile kernel functions and tracepoint events
  • Real-time web server (--serve) — live flamegraph updates over HTTP
  • Automatic termination — stops when --pid target or --cmd process exits
  • Rust & C++ demangling — via gimli/blazesym
  • BPF-based aggregation — stack counting in kernel to reduce userspace data transfer
  • Group by CPU — per-core flamegraph breakdown

Detailed Usage

Output Formats

# SVG flamegraph
sudo probee --svg profile.svg --frequency 999 --time 5000

# HTML flamegraph
sudo probee --time 5000 --html flamegraphs.html

# Stackcollapse format (compatible with speedscope, flamegraph.pl)
sudo probee --collapse profile.txt --frequency 999 --time 10000

# All output formats at once
sudo probee --time 5000 --html out.html --json out.json --collapse out.txt --svg out.svg

# Grouped by CPU
sudo probee --svg profile.svg --frequency 999 --time 2000 --group-by-cpu
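
The stackcollapse file can be post-processed with a few lines of code. The sketch below assumes the conventional folded layout (semicolon-separated frames, root first, followed by a space and a sample count); the exact output of probee has not been checked against it, and the function names are illustrative only.

```rust
use std::collections::HashMap;

/// Parse one collapsed-stack line such as `main;foo;bar 42`.
/// Returns the frames (root first) and the sample count, or None if malformed.
fn parse_collapsed_line(line: &str) -> Option<(Vec<&str>, u64)> {
    let line = line.trim();
    // The count is the last space-separated field; frame names may contain spaces.
    let (stack, count) = line.rsplit_once(' ')?;
    let count = count.parse::<u64>().ok()?;
    let frames: Vec<&str> = stack.split(';').collect();
    Some((frames, count))
}

/// Sum samples per leaf frame, i.e. the function that was on-CPU when sampled.
/// Lines that do not parse are skipped.
fn self_time_by_leaf(input: &str) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for line in input.lines() {
        if let Some((frames, count)) = parse_collapsed_line(line) {
            if let Some(leaf) = frames.last() {
                *totals.entry(leaf.to_string()).or_insert(0) += count;
            }
        }
    }
    totals
}
```

Passing the contents of profile.txt to self_time_by_leaf gives a quick "hottest leaf functions" table without opening a flamegraph.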

Targeting

# Profile specific PID (auto-stops when process exits)
sudo probee --pid <pid> --svg output.svg --time 10000

# Profile specific CPU core
sudo probee --cpu 0 --svg output.svg --time 5000

# Profile a command
sudo probee --svg output.svg -- ./my-binary arg1 arg2

# Real-time flamegraphs via web server
sudo probee --time 5000 --serve --skip-idle --stream-mode 1
# Then open http://localhost:8000/ and click "realtime-updates"

Kprobe & Tracepoint

# Profile kernel function calls
sudo probee --kprobe vfs_write --time 200 --svg kprobe.svg

# Profile tracepoint events
sudo probee --tracepoint tcp:tcp_probe --time 200 --svg tracepoint.svg

Smart Uprobe Targeting

Profile-bee supports GDB-style symbol resolution for uprobes. Instead of manually specifying which library a function lives in, you provide a probe spec and the tool auto-discovers matching symbols across all loaded ELF binaries.

# Auto-discover library
sudo probee --uprobe malloc --time 1000 --svg malloc.svg

# Multiple probes at once
sudo probee --uprobe malloc --uprobe 'ret:free' --time 1000 --svg alloc.svg

# Glob matching — trace all pthread functions
sudo probee --uprobe 'pthread_*' --time 1000 --svg pthread.svg

# Regex matching
sudo probee --uprobe '/^sql_.*query/' --pid 1234 --time 2000 --svg sql.svg

# Demangled C++/Rust name matching
sudo probee --uprobe 'std::vector::push_back' --pid 1234 --time 1000 --svg vec.svg

# Source file and line number (requires DWARF debug info)
sudo probee --uprobe 'main.c:42' --pid 1234 --time 1000 --svg source.svg

# Explicit library prefix
sudo probee --uprobe libc:malloc --time 1000 --svg malloc.svg

# Absolute path to binary
sudo probee --uprobe '/usr/lib/libc.so.6:malloc' --time 1000 --svg malloc.svg

# Return probe (uretprobe)
sudo probee --uprobe ret:malloc --time 1000 --svg malloc_ret.svg

# Function with offset
sudo probee --uprobe malloc+0x10 --time 1000 --svg malloc_offset.svg

# Scope to a specific PID
sudo probee --uprobe malloc --uprobe-pid 12345 --time 1000 --svg malloc_pid.svg

# Discovery mode — list matching symbols without attaching
sudo probee --list-probes 'pthread_*' --pid 1234

Probe spec syntax:

Syntax            Example                      Description
function          malloc                       Exact match, auto-discover library
lib:function      libc:malloc                  Explicit library name prefix
/path:function    /usr/lib/libc.so.6:malloc    Absolute path prefix
ret:function      ret:malloc                   Return probe (uretprobe)
function+offset   malloc+0x10                  Function with byte offset
glob_pattern      pthread_*                    Glob matching (*, ?, [...])
/regex/           /^sql_.*query/               Regex matching
Namespace::func   std::vector::push_back       Demangled C++/Rust name match
file.c:line       main.c:42                    Source location (requires DWARF)
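
To make the syntax forms above concrete, here is a minimal sketch of how a spec string could be sorted into those categories. It is not profile-bee's parser: the ProbeKind enum, the classify function, and the order of the checks are all assumptions chosen so that the nine examples above come out as described.

```rust
/// Hypothetical categories mirroring the probe spec syntax forms.
#[derive(Debug, PartialEq)]
enum ProbeKind {
    Exact,
    LibFunction,
    PathFunction,
    Return,
    Offset,
    Glob,
    Regex,
    Demangled,
    SourceLine,
}

/// Classify a spec with simple string heuristics (order matters).
fn classify(spec: &str) -> ProbeKind {
    if spec.starts_with("ret:") {
        return ProbeKind::Return;
    }
    // `/.../` is a regex; check it before the absolute-path form.
    if spec.len() >= 2 && spec.starts_with('/') && spec.ends_with('/') {
        return ProbeKind::Regex;
    }
    if spec.starts_with('/') && spec.contains(':') {
        return ProbeKind::PathFunction;
    }
    // `::` marks a demangled C++/Rust path, not a lib:function pair.
    if spec.contains("::") {
        return ProbeKind::Demangled;
    }
    if let Some((_, rhs)) = spec.split_once(':') {
        let is_line = !rhs.is_empty() && rhs.chars().all(|c| c.is_ascii_digit());
        return if is_line {
            ProbeKind::SourceLine
        } else {
            ProbeKind::LibFunction
        };
    }
    if spec.contains('+') {
        return ProbeKind::Offset;
    }
    if spec.contains(|c: char| matches!(c, '*' | '?' | '[')) {
        return ProbeKind::Glob;
    }
    ProbeKind::Exact
}
```

A real parser would also have to handle combinations such as a return probe on a library-qualified function, which this sketch deliberately ignores.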

Resolution order:

  1. If --pid or --uprobe-pid is set, scans /proc/<pid>/maps for all mapped executables
  2. Otherwise, scans system libraries via ldconfig cache and standard paths
  3. For each candidate ELF, reads .symtab and .dynsym symbol tables
  4. Demangled matching uses both Rust and C++ demanglers
  5. Source locations are resolved via gimli .debug_line parsing
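
Step 1 above can be sketched as follows. The code parses the text of a /proc/<pid>/maps listing and keeps only executable, file-backed mappings. The ExecMapping struct and function names are hypothetical, and the sketch simplifies on purpose: it skips bracketed pseudo-paths such as [vdso] and does not handle paths containing spaces.

```rust
/// One executable, file-backed mapping from a /proc/<pid>/maps listing.
#[derive(Debug, PartialEq)]
struct ExecMapping {
    start: u64,
    end: u64,
    offset: u64,
    path: String,
}

/// Parse a single maps line; returns None for non-executable,
/// anonymous, or pseudo mappings and for malformed lines.
fn parse_maps_line(line: &str) -> Option<ExecMapping> {
    // Fields: address-range perms offset dev inode [pathname]
    let mut fields = line.split_whitespace();
    let range = fields.next()?;
    let perms = fields.next()?;
    let offset = fields.next()?;
    let _dev = fields.next()?;
    let _inode = fields.next()?;
    let path = fields.next()?; // anonymous mappings have no pathname
    if !perms.contains('x') || !path.starts_with('/') {
        return None;
    }
    let (start, end) = range.split_once('-')?;
    Some(ExecMapping {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        offset: u64::from_str_radix(offset, 16).ok()?,
        path: path.to_string(),
    })
}

/// Collect every candidate ELF mapping from the full maps text.
fn executable_mappings(maps: &str) -> Vec<ExecMapping> {
    maps.lines().filter_map(parse_maps_line).collect()
}
```

The maps text itself would come from something like std::fs::read_to_string on the /proc/<pid>/maps path of the target process.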

Multi-attach: If a spec matches multiple symbols (e.g. pthread_* matching 20 functions), uprobes are attached to all of them.
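
As an illustration of multi-attach, the sketch below matches a glob against a list of symbol names and returns every hit. It is a simplified stand-in, not the matcher profile-bee uses: it supports only * and ? (no [...] classes), and glob_match and matching_symbols are hypothetical names.

```rust
/// Match `name` against a glob `pattern` supporting `*` and `?` only.
fn glob_match(pattern: &str, name: &str) -> bool {
    let (p, n) = (pattern.as_bytes(), name.as_bytes());
    let (mut pi, mut ni) = (0usize, 0usize);
    // Pattern index of the last `*` and the name index it was tried at.
    let mut backtrack: Option<(usize, usize)> = None;
    while ni < n.len() {
        if pi < p.len() && p[pi] == b'*' {
            // First try letting `*` match the empty string.
            backtrack = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == b'?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((star_pi, star_ni)) = backtrack {
            // Mismatch: let the last `*` absorb one more byte and retry.
            pi = star_pi + 1;
            ni = star_ni + 1;
            backtrack = Some((star_pi, star_ni + 1));
        } else {
            return false;
        }
    }
    // Only trailing `*`s may remain, since they can match nothing.
    p[pi..].iter().all(|&c| c == b'*')
}

/// Return every symbol the pattern matches; each would get its own uprobe.
fn matching_symbols<'a>(pattern: &str, symbols: &[&'a str]) -> Vec<&'a str> {
    symbols
        .iter()
        .copied()
        .filter(|s| glob_match(pattern, s))
        .collect()
}
```

With a symbol list containing pthread_create, pthread_join and malloc, the pattern pthread_* selects the first two and leaves malloc untouched.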


TUI Mode

The interactive terminal flamegraph viewer is included by default (forked and adapted from flamelens).

# Interactive TUI with a command
sudo probee --tui --cmd "your-command"

# Live profiling of a running process
sudo probee --tui --pid <pid> --time 30000

# With DWARF unwinding for optimized binaries
sudo probee --tui --dwarf --cmd "./optimized-binary"

# Build without TUI support
cargo build --release --no-default-features

Key Bindings:

Key             Action
hjkl / arrows   Navigate cursor
Enter           Zoom into selected frame
Esc             Reset zoom
/               Search frames with regex
#               Highlight selected frame
n / N           Next / previous match
z               Freeze / unfreeze live updates
q or Ctrl+C     Quit

Stack Unwinding

Profile Bee supports two methods for stack unwinding. Both run the actual stack walking in eBPF (kernel space) for performance. Symbolization always happens in userspace.

Frame Pointer Method (default)

Uses the kernel's bpf_get_stackid to walk the frame pointer chain. Works out of the box for binaries compiled with frame pointers:

  • Rust: RUSTFLAGS="-Cforce-frame-pointers=yes"
  • C/C++: -fno-omit-frame-pointer flag
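
For intuition about what walking the frame pointer chain means, here is a userspace simulation. It is not how bpf_get_stackid is implemented; it only mirrors the x86_64 convention that a frame pointer addresses the caller's saved frame pointer, with the return address stored 8 bytes above it. Stack memory is modelled as a HashMap, and walk_frame_pointers is a hypothetical name.

```rust
use std::collections::HashMap;

/// Walk a frame-pointer chain over simulated stack memory (x86_64 layout):
/// `[fp]` holds the caller's saved frame pointer, `[fp + 8]` the return address.
fn walk_frame_pointers(
    memory: &HashMap<u64, u64>,
    pc: u64,
    mut fp: u64,
    max_depth: usize,
) -> Vec<u64> {
    // The sampled instruction pointer is always the first frame.
    let mut frames = vec![pc];
    while frames.len() < max_depth {
        let slots = (memory.get(&fp), memory.get(&fp.wrapping_add(8)));
        let (saved_fp, ret_addr) = match slots {
            (Some(&s), Some(&r)) => (s, r),
            _ => break, // unreadable memory ends the walk
        };
        if ret_addr == 0 {
            break;
        }
        frames.push(ret_addr);
        // The stack grows down, so a caller's frame must sit at a higher address.
        if saved_fp <= fp {
            break;
        }
        fp = saved_fp;
    }
    frames
}
```

If a function in the chain was compiled without frame pointers, the register holds arbitrary data and the walk ends early or yields garbage, which is the case the DWARF method addresses.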

DWARF Method (--dwarf)

Handles binaries compiled without frame pointers (the default for most -O2/-O3 builds). Use --dwarf to enable DWARF-based stack unwinding.

How it works:

  1. At startup, userspace parses /proc/<pid>/maps and the .eh_frame section of each executable mapping
  2. Pre-evaluates DWARF CFI rules into a flat UnwindEntry table (PC → CFA rule + RA rule)
  3. Loads the table into eBPF maps before profiling begins
  4. At sample time, the eBPF program binary-searches the table and walks the stack using CFA computation + bpf_probe_read_user
  5. A background thread polls for newly loaded libraries (e.g. via dlopen) and updates the unwind tables at runtime

This is the same approach used by parca-agent and other production eBPF profilers.
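
Steps 2 to 4 can be illustrated with a small userspace model. This is a sketch under stated assumptions, not profile-bee's data layout: the fields of UnwindEntry, the Regs struct, and the helper names are invented for the example, the return address is assumed to sit at CFA - 8 (the usual x86_64 rule), and memory reads that the eBPF program would do with bpf_probe_read_user are replaced by a HashMap lookup.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy)]
enum CfaBase {
    Rsp,
    Rbp,
}

/// Hypothetical flat entry: applies to PCs from `pc` up to the next entry.
#[derive(Debug, Clone, Copy)]
struct UnwindEntry {
    pc: u64,
    cfa_base: CfaBase,
    cfa_offset: u64,           // CFA = base register + cfa_offset
    rbp_saved_at: Option<u64>, // caller's rbp stored at CFA - value, if saved
}

#[derive(Debug, Clone, Copy)]
struct Regs {
    pc: u64,
    rsp: u64,
    rbp: u64,
}

/// Binary-search the table (sorted by `pc`) for the entry covering `pc`.
fn lookup(table: &[UnwindEntry], pc: u64) -> Option<&UnwindEntry> {
    // Index of the first entry starting after `pc`; the one before covers it.
    let idx = table.partition_point(|e| e.pc <= pc);
    idx.checked_sub(1).map(|i| &table[i])
}

/// Recover the caller's registers from the current frame.
fn step(table: &[UnwindEntry], mem: &HashMap<u64, u64>, regs: Regs) -> Option<Regs> {
    let entry = lookup(table, regs.pc)?;
    let base = match entry.cfa_base {
        CfaBase::Rsp => regs.rsp,
        CfaBase::Rbp => regs.rbp,
    };
    let cfa = base.checked_add(entry.cfa_offset)?;
    // Assumption: return address lives at CFA - 8.
    let ra = *mem.get(&cfa.checked_sub(8)?)?;
    let rbp = match entry.rbp_saved_at {
        Some(off) => *mem.get(&cfa.checked_sub(off)?)?,
        None => regs.rbp,
    };
    Some(Regs { pc: ra, rsp: cfa, rbp })
}

/// Collect return addresses until the walk fails or `max_depth` is reached.
fn unwind(
    table: &[UnwindEntry],
    mem: &HashMap<u64, u64>,
    start: Regs,
    max_depth: usize,
) -> Vec<u64> {
    let mut frames = vec![start.pc];
    let mut regs = start;
    while frames.len() < max_depth {
        match step(table, mem, regs) {
            Some(next) if next.pc != 0 => {
                frames.push(next.pc);
                regs = next;
            }
            _ => break,
        }
    }
    frames
}
```

Pre-evaluating the CFI rules into fixed-size entries is what keeps the per-sample work down to a bounded binary search plus a few memory reads. A production table would also need range ends so that addresses outside any function do not match the last entry.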

# Enable DWARF unwinding for a no-frame-pointer binary
sudo probee --dwarf --svg output.svg --time 5000 -- ./my-optimized-binary

# Frame pointer unwinding (the default)
sudo probee --svg output.svg --time 5000 -- ./my-fp-binary

Note: For symbol resolution, you still need debug information:

  • Rust: Compile with -g (equivalent to -C debuginfo=2), or set debug = true in the Cargo profile
  • C/C++: Compile with debug symbols (-g flag)

Limitations: At most 16 executable mappings per process, 500K unwind table entries in total, and a maximum stack depth of 32 frames. x86_64 only. Libraries loaded via dlopen are detected within ~1 second.

See docs/dwarf_unwinding_design.md for architecture details, and Polar Signals' article on profiling without frame pointers for background.


Limitations

  • Linux only (requires eBPF support)
  • DWARF unwinding: x86_64 only, see limits above
  • Stack traces for interpreted and JIT-compiled code are not yet supported
  • The vDSO's .eh_frame is parsed for DWARF unwinding, but vDSO symbolization is not yet supported

Development

Prerequisites

  1. Install stable and nightly Rust: rustup install stable nightly
  2. Install bpf-linker: cargo install bpf-linker

Build

# Build eBPF program (requires nightly)
cargo xtask build-ebpf

# Build userspace (uses fresh eBPF build if available, otherwise prebuilt)
cargo build --release

# Run
cargo xtask run

To perform a release build of the eBPF program, use cargo xtask build-ebpf --release. You may also change the target architecture with the --target flag.

More documentation in the docs directory.

Alternatives