# Profile Analyzer Tool Implementation Plan v2
## Overview
Build a Rust CLI tool (`profile-inspect`) that converts V8 .cpuprofile and .heapprofile files to human-readable formats and identifies hot paths for optimization.
## User Requirements
- **Project location**: `/Users/qing/p/github/profile-inspecting/`
- **Node internals filtering**: Filter out by default, show with `--include-internals` flag
- **Source map support**: Full support - parse .map files to resolve minified names
- **Output format**: Structured markdown with clear sections (tables, headers, bullet points)
- **Multiple outputs**: Support generating multiple formats simultaneously
## Architecture
```
profile-inspect/
├── Cargo.toml
└── src/
    ├── main.rs                  # CLI entry point (clap)
    ├── lib.rs                   # Library root
    │
    ├── types/                   # Raw V8 profile types
    │   ├── mod.rs
    │   ├── cpu.rs               # V8 CPU profile JSON types
    │   └── heap.rs              # V8 heap profile JSON types
    │
    ├── ir/                      # Normalized intermediate representation
    │   ├── mod.rs
    │   ├── frame.rs             # Frame: name, file, line, col, kind, category
    │   ├── stack.rs             # Stack: Vec<FrameId>
    │   └── sample.rs            # Sample: timestamp, stack_id, weight
    │
    ├── parser/                  # V8 profile → IR conversion
    │   ├── mod.rs
    │   ├── cpu_profile.rs       # CPU profile parser
    │   └── heap_profile.rs      # Heap profile parser
    │
    ├── classify/                # Frame classification
    │   ├── mod.rs
    │   └── frame_classifier.rs  # Categorize: App, Deps, NodeInternal, V8, Native
    │
    ├── sourcemap/               # Source map resolution
    │   ├── mod.rs
    │   └── resolver.rs          # Minified → original location mapping
    │
    ├── analysis/                # Analysis algorithms
    │   ├── mod.rs
    │   ├── cpu_analysis.rs      # Self-time, total-time, hot paths
    │   ├── heap_analysis.rs     # Allocation analysis
    │   ├── diff.rs              # Profile comparison (A vs B)
    │   └── caller_callee.rs     # Call graph attribution
    │
    └── output/                  # Output formatters
        ├── mod.rs               # OutputFormat trait
        ├── markdown.rs          # Markdown (default, LLM-optimized)
        ├── text.rs              # Plain text
        ├── json.rs              # JSON summary (for CI)
        ├── speedscope.rs        # Speedscope JSON
        └── collapsed.rs         # Collapsed stacks (flamegraph)
```
## Intermediate Representation (IR)
The IR normalizes both CPU and heap profiles into a common structure that all outputs consume:
```rust
// src/ir/frame.rs
pub struct Frame {
    pub id: FrameId,
    pub name: String,                      // Function name (resolved via source map if available)
    pub file: Option<String>,              // Source file path
    pub line: Option<u32>,
    pub col: Option<u32>,
    pub kind: FrameKind,                   // Function, Native, GC, Eval, Wasm, etc.
    pub category: FrameCategory,           // App, Deps, NodeInternal, V8Internal, Native
    pub minified_name: Option<String>,     // Minified name, kept when resolved to an original
    pub minified_location: Option<String>,
}

pub enum FrameKind { Function, Native, GC, Eval, Wasm, Builtin, RegExp, Idle, Unknown }
pub enum FrameCategory { App, Deps, NodeInternal, V8Internal, Native }

// src/ir/sample.rs
pub struct Sample {
    pub timestamp_us: u64,
    pub stack_id: StackId,
    pub weight: u64, // Time delta in µs (CPU) or bytes (heap)
}

// src/ir/stack.rs
pub struct Stack {
    pub id: StackId,
    pub frames: Vec<FrameId>, // Root-to-leaf order
}
```
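Parsers will produce many duplicate frames, so the IR implies an interning step: identical call frames should map to one `FrameId`. A minimal std-only sketch of such an interner; the `FrameKey` shape and method names here are assumptions for illustration, not the final API:

```rust
use std::collections::HashMap;

/// Key identifying a unique call frame before classification.
/// (Hypothetical sketch; the real Frame carries more fields.)
#[derive(Clone, PartialEq, Eq, Hash)]
struct FrameKey {
    name: String,
    file: Option<String>,
    line: Option<u32>,
    col: Option<u32>,
}

#[derive(Default)]
struct FrameInterner {
    by_key: HashMap<FrameKey, u32>, // FrameId represented as a plain u32 here
    frames: Vec<FrameKey>,
}

impl FrameInterner {
    /// Return the existing id for this frame, or allocate a new one.
    fn intern(&mut self, key: FrameKey) -> u32 {
        if let Some(&id) = self.by_key.get(&key) {
            return id;
        }
        let id = self.frames.len() as u32;
        self.frames.push(key.clone());
        self.by_key.insert(key, id);
        id
    }
}
```

Interning keeps `Sample` and `Stack` small (ids instead of strings) and makes the aggregation maps in the analysis phase cheap to key.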
## CLI Interface
```bash
# Basic usage (markdown output by default)
profile-inspect cpu input.cpuprofile
profile-inspect heap input.heapprofile
# Multiple output formats simultaneously
profile-inspect cpu input.cpuprofile -f markdown -f speedscope -f json -o ./output/
# Filtering and grouping
profile-inspect cpu input.cpuprofile --group-by function # (default)
profile-inspect cpu input.cpuprofile --group-by file
profile-inspect cpu input.cpuprofile --group-by module
profile-inspect cpu input.cpuprofile --filter app # Only app code
profile-inspect cpu input.cpuprofile --filter deps # Only node_modules
profile-inspect cpu input.cpuprofile --include-internals # Include node/v8 internals
# Noise control
profile-inspect cpu input.cpuprofile --min-percent 1.0 # Drop <1% functions
profile-inspect cpu input.cpuprofile --top 30 # Show top 30
# Source map resolution
profile-inspect cpu input.cpuprofile --sourcemap-dir ./dist/
# Profile comparison (diff)
profile-inspect diff before.cpuprofile after.cpuprofile
# Explain a specific function (show callers/callees)
profile-inspect explain input.cpuprofile --function "getClassOrder"
# Output files generated:
# ./output/profile-analysis.md (markdown)
# ./output/profile-analysis.json (json summary)
# ./output/profile.speedscope.json (speedscope)
# ./output/profile.collapsed.txt (collapsed stacks)
```
## Output Formats
### 1. Markdown (Default)
- GitHub-flavored markdown with tables
- Time breakdown by category
- Collapsible sections for details
- Hot paths with call trees
- Caller/callee attribution
- **File**: `profile-analysis.md`
### 2. JSON Summary
- Machine-readable for CI/automation
- Performance regression checks
- Includes all metrics and hot paths
- **File**: `profile-analysis.json`
### 3. Speedscope JSON
- Compatible with https://www.speedscope.app/
- Interactive flame chart visualization
- **File**: `profile.speedscope.json`
### 4. Collapsed Stacks
- Brendan Gregg format for flamegraph tools
- Compatible with `flamegraph.pl`, `inferno`
- **File**: `profile.collapsed.txt`
### 5. Plain Text
- ASCII tables for terminal viewing
- **File**: `profile-analysis.txt`
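All five formatters can hang off one small trait in `output/mod.rs`. A sketch with the collapsed-stacks formatter as the simplest implementation; the `Analysis` struct here is a simplified stand-in (string stacks instead of `FrameId`s), not the real analysis result:

```rust
use std::collections::HashMap;

/// Stand-in for the real analysis result: stacks as frame-name
/// vectors (root to leaf) with aggregated sample weights.
struct Analysis {
    stack_weights: HashMap<Vec<String>, u64>,
}

trait OutputFormat {
    fn file_name(&self) -> &'static str;
    fn render(&self, analysis: &Analysis) -> String;
}

/// Brendan Gregg collapsed format: one "root;child;leaf weight" line per stack.
struct Collapsed;

impl OutputFormat for Collapsed {
    fn file_name(&self) -> &'static str {
        "profile.collapsed.txt"
    }

    fn render(&self, analysis: &Analysis) -> String {
        let mut lines: Vec<String> = analysis
            .stack_weights
            .iter()
            .map(|(stack, w)| format!("{} {}", stack.join(";"), w))
            .collect();
        lines.sort(); // Deterministic output for diffing and snapshot tests
        lines.join("\n")
    }
}
```

With a `Vec<Box<dyn OutputFormat>>` registry, the `-f` flags become a simple loop: render each requested format and write it to `output_dir.join(fmt.file_name())`.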
## Key Algorithms
### CPU Analysis (Stack-Based Aggregation)
**Important**: Reconstruct stacks from samples, don't rely on DFS over node children.
```rust
use std::collections::{HashMap, HashSet};

pub fn analyze_cpu_profile(profile: &CpuProfile) -> CpuAnalysis {
    // 1. Build parent map: node_id -> parent_id
    let parent_map = build_parent_map(&profile.nodes);

    // 2. For each sample, reconstruct the full stack
    let mut self_times: HashMap<FrameId, u64> = HashMap::new();
    let mut total_times: HashMap<FrameId, u64> = HashMap::new();
    let mut stack_weights: HashMap<Vec<FrameId>, u64> = HashMap::new();

    for (i, &sample_node) in profile.samples.iter().enumerate() {
        // time_deltas can contain negative values; clamp before casting
        let weight = profile.time_deltas.get(i).copied().unwrap_or(0).max(0) as u64;

        // Reconstruct stack: leaf -> root, then reverse
        let stack = reconstruct_stack(sample_node, &parent_map);

        // Leaf gets self-time
        if let Some(&leaf) = stack.last() {
            *self_times.entry(leaf).or_default() += weight;
        }

        // Every frame on the stack gets total-time (inclusive);
        // dedupe so recursive frames aren't double-counted per sample
        let unique: HashSet<FrameId> = stack.iter().copied().collect();
        for frame in unique {
            *total_times.entry(frame).or_default() += weight;
        }

        // Aggregate by unique stack
        *stack_weights.entry(stack).or_default() += weight;
    }
    // ...
}
```
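The `reconstruct_stack` helper used above is a walk up the parent map followed by a reverse. A self-contained sketch, using plain `u32` node ids and assuming the node graph is an acyclic tree (which V8 CPU profiles guarantee):

```rust
use std::collections::HashMap;

/// Walk from a sample's leaf node up to the root via the parent map,
/// then reverse so the stack reads root -> leaf.
/// The root is the one node with no parent entry.
fn reconstruct_stack(leaf: u32, parent_map: &HashMap<u32, u32>) -> Vec<u32> {
    let mut stack = Vec::new();
    let mut current = Some(leaf);
    while let Some(node) = current {
        stack.push(node);
        current = parent_map.get(&node).copied();
    }
    stack.reverse();
    stack
}
```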
### Frame Classification
```rust
impl FrameClassifier {
    pub fn classify(&self, url: &str, name: &str) -> FrameCategory {
        // Node internals
        if url.starts_with("node:") || url.contains("internal/") {
            return FrameCategory::NodeInternal;
        }
        // V8 internals
        if matches!(name, "(garbage collector)" | "(idle)" | "(program)") {
            return FrameCategory::V8Internal;
        }
        // Native/Builtin
        if url.is_empty() || name.contains("Builtin:") || name == "(native)" {
            return FrameCategory::Native;
        }
        // Dependencies
        if url.contains("node_modules") {
            return FrameCategory::Deps;
        }
        // App code (default)
        FrameCategory::App
    }
}
```
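`heap_analysis.rs` follows the same aggregation idea over the allocation tree instead of samples. A sketch, assuming a simplified parsed `.heapprofile` node shape (field names here are assumptions about the parser output, not the raw JSON):

```rust
use std::collections::HashMap;

/// Minimal stand-in for a parsed V8 sampling heap profile node.
struct HeapNode {
    function_name: String,
    self_size: u64,          // Bytes attributed directly to this node
    children: Vec<HeapNode>,
}

/// Walk the allocation tree, accumulating self-allocated bytes per function.
fn allocation_by_function(root: &HeapNode) -> HashMap<String, u64> {
    let mut totals: HashMap<String, u64> = HashMap::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        *totals.entry(node.function_name.clone()).or_default() += node.self_size;
        stack.extend(node.children.iter());
    }
    totals
}
```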
### Profile Diff
```rust
pub struct ProfileDiff {
    pub before_total: u64,
    pub after_total: u64,
    pub regressions: Vec<FunctionDelta>,   // Got slower
    pub improvements: Vec<FunctionDelta>,  // Got faster
    pub new_hotspots: Vec<FunctionStats>,  // Didn't exist before
}

pub struct FunctionDelta {
    pub name: String,
    pub location: String,
    pub before_time: u64,
    pub after_time: u64,
    pub delta_percent: f64, // Positive = regression
}
```
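The diff itself reduces to joining per-function totals from both profiles. A sketch, under the assumption that each side has already been aggregated into a `HashMap<String, u64>` keyed by name and location:

```rust
use std::collections::HashMap;

/// Per-function change between two profiles, keyed by e.g. "name@file:line".
struct Delta {
    key: String,
    before: u64,
    after: u64,
    delta_percent: f64, // Positive = regression (slower)
}

fn diff_totals(before: &HashMap<String, u64>, after: &HashMap<String, u64>) -> Vec<Delta> {
    let mut out = Vec::new();
    for (key, &a) in after {
        let b = before.get(key).copied().unwrap_or(0);
        let pct = if b > 0 {
            (a as f64 - b as f64) / b as f64 * 100.0
        } else {
            f64::INFINITY // New hotspot: no baseline to compare against
        };
        out.push(Delta { key: key.clone(), before: b, after: a, delta_percent: pct });
    }
    // Largest regressions first
    out.sort_by(|x, y| y.delta_percent.partial_cmp(&x.delta_percent).unwrap());
    out
}
```

Functions present only in `before` would be collected in a second pass as improvements; they are omitted here for brevity.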
## Dependencies
```toml
[dependencies]
clap = { version = "4", features = ["derive"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
thiserror = "2"
sourcemap = "9" # Source map parsing
regex = "1" # Pattern matching for filters
indexmap = { version = "2", features = ["serde"] } # Ordered maps

[dev-dependencies]
insta = "1" # Snapshot testing
```
## Files to Create
### Core Types & IR (10 files)
1. `Cargo.toml` - Project manifest
2. `src/main.rs` - CLI entry point
3. `src/lib.rs` - Library exports
4. `src/types/mod.rs` - V8 types exports
5. `src/types/cpu.rs` - Raw V8 CPU profile types
6. `src/types/heap.rs` - Raw V8 heap profile types
7. `src/ir/mod.rs` - IR exports
8. `src/ir/frame.rs` - Frame IR (FrameKind, FrameCategory)
9. `src/ir/stack.rs` - Stack IR
10. `src/ir/sample.rs` - Sample IR
### Parsing & Classification (7 files)
11. `src/parser/mod.rs` - Parser exports
12. `src/parser/cpu_profile.rs` - CPU profile → IR
13. `src/parser/heap_profile.rs` - Heap profile → IR
14. `src/classify/mod.rs` - Classifier exports
15. `src/classify/frame_classifier.rs` - Frame categorization rules
16. `src/sourcemap/mod.rs` - Source map exports
17. `src/sourcemap/resolver.rs` - Location resolution + caching
### Analysis (5 files)
18. `src/analysis/mod.rs` - Analysis exports
19. `src/analysis/cpu_analysis.rs` - CPU hot path detection
20. `src/analysis/heap_analysis.rs` - Heap allocation analysis
21. `src/analysis/diff.rs` - Profile comparison
22. `src/analysis/caller_callee.rs` - Call graph attribution
### Output (6 files)
23. `src/output/mod.rs` - OutputFormat trait + registry
24. `src/output/markdown.rs` - Markdown formatter
25. `src/output/text.rs` - Plain text formatter
26. `src/output/json.rs` - JSON summary formatter
27. `src/output/speedscope.rs` - Speedscope formatter
28. `src/output/collapsed.rs` - Collapsed stacks formatter
**Total: 28 files**
## Source Map Support
### Resolution Strategy
1. Extract URL from profile (e.g., `dist/dist--xku3Vyh.js:141153`)
2. Normalize URL schemes (`file://`, `webpack://`, `vite://`, `node:`)
3. Check for adjacent `.map` file
4. Parse inline sourcemaps (base64 data URLs)
5. Handle missing column info (line-only fallback with warning)
6. Cache resolved maps by URL + mtime
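Steps 3–4 both start by locating the `sourceMappingURL` comment near the end of the generated file. A std-only sketch of that detection (the actual map parsing would then go through the `sourcemap` crate):

```rust
/// Find the sourceMappingURL from a generated JS file's trailing lines.
/// Returns the raw URL: either a relative .map path (step 3) or a
/// base64 data URL for an inline map (step 4).
fn find_source_mapping_url(js_source: &str) -> Option<String> {
    js_source
        .lines()
        .rev()
        .take(10) // The comment is conventionally on the last line(s)
        .find_map(|line| {
            let trimmed = line.trim();
            let rest = trimmed
                .strip_prefix("//# sourceMappingURL=")
                .or_else(|| trimmed.strip_prefix("//@ sourceMappingURL="))?; // legacy prefix
            Some(rest.trim().to_string())
        })
}
```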
### Output Integration
- Display original names when resolved
- Show minified mapping in collapsible section
- Group frames by original source file
## Verification
```bash
# 1. Basic parsing
cargo run -- cpu test-runner-profile/CPU.*.cpuprofile
cargo run -- heap test-runner-profile/Heap.*.heapprofile
# 2. Multiple outputs
cargo run -- cpu test-runner-profile/CPU.*.cpuprofile -f markdown -f speedscope -f json -o ./out/
# 3. Filtering
cargo run -- cpu test-runner-profile/CPU.*.cpuprofile --filter app --min-percent 1.0
# 4. Source maps
cargo run -- cpu test-runner-profile/CPU.*.cpuprofile \
--sourcemap-dir /Users/qing/p/github/oxc_formatter/apps/oxfmt/dist/
# 5. Validate speedscope output at https://www.speedscope.app/
```
## Expected Markdown Output
```markdown
# CPU Profile Analysis
**File:** `CPU.20260124.005938.62230.0.001.cpuprofile`
## Time by Category
| Category | Time | Share |
|---|---|---|
| App code | 1,150 ms | 42.1% |
| Dependencies (node_modules) | 846 ms | 31.0% |
| Node internals | 491 ms | 18.0% |
| V8/Native | 244 ms | 8.9% |
## Top Functions by Self Time
| # | Self Time | Self % | Total Time | Function | Location |
|---|---|---|---|---|---|
| 1 | 450.23 ms | 16.5% | 1,234.56 ms | `getClassOrder` | `tailwind.ts:245:12` |
| 2 | 312.45 ms | 11.4% | 892.34 ms | `parseExpression` | `parser.ts:1892:5` |
<details>
<summary>Minified name mappings</summary>
| Resolved Name | Minified Name | Minified Location |
|---|---|---|
| `getClassOrder` | `Bae` | `dist--xku3Vyh.js:141153` |
</details>
## Hot Paths
### Path #1 (12.3% of total, 336.12 ms)
    sortTailwindClasses (prettier-plugin.ts:42)
    └─ map (native)
       └─ processClass (tailwind.ts:189)
          └─ getClassOrder (tailwind.ts:245) ← HOTSPOT
## Caller/Callee Analysis for `getClassOrder`
**Top Callers:**
| Caller | Samples | Time |
|---|---|---|
| `processClass` | 1,234 | 380 ms |
| `sortClasses` | 456 | 70 ms |
**Top Callees:**
| Callee | Time |
|---|---|
| `compileCss` | 180 ms |
| `lookupRule` | 95 ms |
## Recommendations
### Critical (>20% impact)
- **`getClassOrder`** at `tailwind.ts:245:12`
- 16.5% of total CPU time (450.23ms)
- Called from `processClass` and `sortClasses`
- **Suggestion:** Cache results by class name
### High (10-20% impact)
- **`parseExpression`** at `parser.ts:1892:5`
- 11.4% of total CPU time (312.45ms)
- **Suggestion:** Consider if re-parsing can be avoided
## Agent Summary
**Key Bottlenecks:**
1. `getClassOrder` (Tailwind CSS class ordering) - 16.5%
2. `parseExpression` (Babel parsing) - 11.4%
**Time Distribution:**
- 42% in app code - good target for optimization
- 31% in dependencies - consider upgrading or replacing slow deps
- 27% in internals/native - likely unavoidable
**Suggested Optimizations:**
- Cache `getClassOrder` results - classes repeat frequently
- Investigate if Babel parsing can be cached or avoided
```