mrrc 0.7.6 - Docs.rs

# Profiling Guide

This guide documents how to profile mrrc performance using various tools available in the Rust ecosystem.

## Prerequisites

### cargo-flamegraph

Install flamegraph for CPU profiling:

```bash
cargo install flamegraph
```

**macOS Requirements:**
- Full Xcode (not just Command Line Tools) - required for `xctrace`
- If you only have Command Line Tools, flamegraph will fail to sample

**Linux Requirements:**
- `perf` package (usually included with build tools)
- `perl` for flamegraph script processing

## CPU Profiling with Flamegraph

### Single-threaded Benchmark

Profile the `profiling_harness` benchmark (single-threaded file reading):

```bash
cargo flamegraph --bench profiling_harness -o flamegraphs/profiling_harness.svg
```

This produces an interactive SVG showing:
- Time spent in each function
- Call stack depth
- Percentage of total runtime per function

### Concurrent Benchmark

Profile the `parallel_benchmarks` benchmark (rayon-based concurrent reading):

```bash
cargo flamegraph --bench parallel_benchmarks -o flamegraphs/parallel_benchmarks.svg
```

### Detailed Profiling

Profile the `detailed_profiling` benchmark with memory and phase analysis:

```bash
cargo flamegraph --bench detailed_profiling -o flamegraphs/detailed_profiling.svg
```

### Custom Flamegraph Options

Generate flamegraph with full call stacks and all events:

```bash
cargo flamegraph --bench profiling_harness -- --verbose \
  -o flamegraphs/profiling_harness_verbose.svg
```

Frequency (sampling rate in Hz, default 99):

```bash
cargo flamegraph --bench profiling_harness -- -F 1000 \
  -o flamegraphs/profiling_harness_1khz.svg
```

## Interpreting Flamegraphs

1. **Width** = time spent in function (wider = more time)
2. **Height** = call stack depth (taller = deeper nesting)
3. **Color** = random (for visual distinction)
4. **X-axis** = execution order
5. **Y-axis** = call stack

### Common Patterns to Look For

- **Flat top** = pure computation or tight loop
- **Sawtooth pattern** = recursive calls or repeated function sequences
- **Thin slices** = low-overhead functions called many times
- **Wide base** = time spent in main execution path

## Quick Profiling Checklist

For performance optimization work:

```bash
# Build debug symbols (required for flamegraph on release builds)
# Already configured in Cargo.toml [profile.bench] section

# Profile single-threaded baseline
cargo flamegraph --bench profiling_harness -o /tmp/single.svg

# Profile concurrent implementation
cargo flamegraph --bench parallel_benchmarks -o /tmp/concurrent.svg

# Compare outputs in browser:
# - Wider functions = optimization opportunities
# - Identify bottlenecks not visible in benchmarks
```

## Integration with CI/CD

Flamegraphs can be:
- Committed to git (`flamegraphs/` directory)
- Uploaded to artifact storage
- Compared across releases for regression detection

## Related Tasks

- **mrrc-u33.2.1**: Setup flamegraph and perf profiling tools (this document)
- **mrrc-u33.1**: Review pure Rust file read performance and concurrency optimization
- **mrrc-u33.3**: Profile pure Rust mrrc concurrent performance (rayon)
- **mrrc-u33.4**: Profile Python wrapper (pymrrc) single-threaded performance
- **mrrc-u33.5**: Profile Python wrapper (pymrrc) concurrent performance

## Troubleshooting

### macOS: "tool 'xctrace' requires Xcode"
- Install full Xcode: `xcode-select --install` (installs Command Line Tools)
- Full Xcode required: Download from App Store or developer.apple.com

### Linux: "perf not found"
```bash
sudo apt-get install linux-tools  # Debian/Ubuntu
sudo yum install perf              # RHEL/Fedora
```

### Flamegraph output empty
- Ensure binary is running long enough to collect samples
- Increase bench duration or iterations
- Use `-F` flag to increase sampling frequency

## Future Enhancements

- [ ] Integrate with criterion.rs flamegraph output
- [ ] Automated flamegraph generation in CI
- [ ] Historical comparison tracking
- [ ] Multi-threaded flamegraph analysis tools