profile-inspect 0.1.3

Analyze V8 CPU and heap profiles from Node.js/Chrome DevTools
# Profile Analyzer Tool Implementation Plan v2

## Overview
Build a Rust CLI tool (`profile-inspect`) that converts V8 .cpuprofile and .heapprofile files to human-readable formats and identifies hot paths for optimization.

## User Requirements
- **Project location**: `/Users/qing/p/github/profile-inspecting/`
- **Node internals filtering**: Filter out by default, show with `--include-internals` flag
- **Source map support**: Full support - parse .map files to resolve minified names
- **Output format**: Structured markdown with clear sections (tables, headers, bullet points)
- **Multiple outputs**: Support generating multiple formats simultaneously

## Architecture

```
profile-inspect/
├── Cargo.toml
├── src/
│   ├── main.rs                 # CLI entry point (clap)
│   ├── lib.rs                  # Library root
│   │
│   ├── types/                  # Raw V8 profile types
│   │   ├── mod.rs
│   │   ├── cpu.rs              # V8 CPU profile JSON types
│   │   └── heap.rs             # V8 heap profile JSON types
│   │
│   ├── ir/                     # Normalized Intermediate Representation
│   │   ├── mod.rs
│   │   ├── frame.rs            # Frame: name, file, line, col, kind, category
│   │   ├── stack.rs            # Stack: Vec<FrameId>
│   │   └── sample.rs           # Sample: timestamp, stack_id, weight
│   │
│   ├── parser/                 # V8 profile → IR conversion
│   │   ├── mod.rs
│   │   ├── cpu_profile.rs      # CPU profile parser
│   │   └── heap_profile.rs     # Heap profile parser
│   │
│   ├── classify/               # Frame classification
│   │   ├── mod.rs
│   │   └── frame_classifier.rs # Categorize: App, Deps, NodeInternal, V8, Native
│   │
│   ├── sourcemap/              # Source map resolution
│   │   ├── mod.rs
│   │   └── resolver.rs         # Minified → original location mapping
│   │
│   ├── analysis/               # Analysis algorithms
│   │   ├── mod.rs
│   │   ├── cpu_analysis.rs     # Self-time, total-time, hot paths
│   │   ├── heap_analysis.rs    # Allocation analysis
│   │   ├── diff.rs             # Profile comparison (A vs B)
│   │   └── caller_callee.rs    # Call graph attribution
│   │
│   └── output/                 # Output formatters
│       ├── mod.rs              # OutputFormat trait
│       ├── markdown.rs         # Markdown (default, LLM-optimized)
│       ├── text.rs             # Plain text
│       ├── json.rs             # JSON summary (for CI)
│       ├── speedscope.rs       # Speedscope JSON
│       └── collapsed.rs        # Collapsed stacks (flamegraph)
```

## Intermediate Representation (IR)

The IR normalizes both CPU and heap profiles into a common structure that all outputs consume:

```rust
// src/ir/frame.rs
pub struct Frame {
    pub id: FrameId,
    pub name: String,                    // Function name (resolved via sourcemap if available)
    pub file: Option<String>,            // Source file path
    pub line: Option<u32>,
    pub col: Option<u32>,
    pub kind: FrameKind,                 // Function, Native, GC, Eval, Wasm, etc.
    pub category: FrameCategory,         // App, Deps, NodeInternal, V8Internal, Native
    pub minified_name: Option<String>,   // Pre-resolution minified name, if a source map resolved it
    pub minified_location: Option<String>,
}

pub enum FrameKind { Function, Native, GC, Eval, Wasm, Builtin, RegExp, Idle, Unknown }
pub enum FrameCategory { App, Deps, NodeInternal, V8Internal, Native }

// src/ir/sample.rs
pub struct Sample {
    pub timestamp_us: u64,
    pub stack_id: StackId,
    pub weight: u64,  // Time delta (CPU) or bytes (Heap)
}

// src/ir/stack.rs
pub struct Stack {
    pub id: StackId,
    pub frames: Vec<FrameId>,  // Root to leaf order
}
```
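`FrameId` and `StackId` are left abstract above. A minimal interning sketch (the `FrameInterner` name and its string key are illustrative, not part of the plan) shows how frames could be deduplicated so samples and stacks reference them by id:

```rust
use std::collections::HashMap;

// Hypothetical newtype id; the plan leaves FrameId abstract.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct FrameId(pub u32);

// Deduplicates frames so stacks can store compact ids instead of strings.
pub struct FrameInterner {
    by_key: HashMap<String, FrameId>,
    keys: Vec<String>,
}

impl FrameInterner {
    pub fn new() -> Self {
        Self { by_key: HashMap::new(), keys: Vec::new() }
    }

    // Returns the existing id for `key`, or allocates the next one.
    pub fn intern(&mut self, key: &str) -> FrameId {
        if let Some(&id) = self.by_key.get(key) {
            return id;
        }
        let id = FrameId(self.keys.len() as u32);
        self.keys.push(key.to_string());
        self.by_key.insert(key.to_string(), id);
        id
    }

    pub fn resolve(&self, id: FrameId) -> &str {
        &self.keys[id.0 as usize]
    }
}
```

In practice the key would combine name, file, line, and column so that distinct frames with the same name don't collapse.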

## CLI Interface

```bash
# Basic usage (markdown output by default)
profile-inspect cpu input.cpuprofile
profile-inspect heap input.heapprofile

# Multiple output formats simultaneously
profile-inspect cpu input.cpuprofile -f markdown -f speedscope -f json -o ./output/

# Filtering and grouping
profile-inspect cpu input.cpuprofile --group-by function   # (default)
profile-inspect cpu input.cpuprofile --group-by file
profile-inspect cpu input.cpuprofile --group-by module
profile-inspect cpu input.cpuprofile --filter app          # Only app code
profile-inspect cpu input.cpuprofile --filter deps         # Only node_modules
profile-inspect cpu input.cpuprofile --include-internals   # Include node/v8 internals

# Focus and exclude patterns
profile-inspect cpu input.cpuprofile --focus "tailwind|prettier"
profile-inspect cpu input.cpuprofile --exclude "node_modules"

# Noise control
profile-inspect cpu input.cpuprofile --min-percent 1.0     # Drop <1% functions
profile-inspect cpu input.cpuprofile --top 30              # Show top 30

# Source map resolution
profile-inspect cpu input.cpuprofile --sourcemap-dir ./dist/

# Profile comparison (diff)
profile-inspect diff before.cpuprofile after.cpuprofile

# Explain a specific function (show callers/callees)
profile-inspect explain input.cpuprofile --function "getClassOrder"

# Output files generated:
#   ./output/profile-analysis.md        (markdown)
#   ./output/profile-analysis.json      (json summary)
#   ./output/profile.speedscope.json    (speedscope)
#   ./output/profile.collapsed.txt      (collapsed stacks)
```

## Output Formats

### 1. Markdown (Default)
- GitHub-flavored markdown with tables
- Time breakdown by category
- Collapsible sections for details
- Hot paths with call trees
- Caller/callee attribution
- **File**: `profile-analysis.md`

### 2. JSON Summary
- Machine-readable for CI/automation
- Performance regression checks
- Includes all metrics and hot paths
- **File**: `profile-analysis.json`

### 3. Speedscope JSON
- Compatible with https://www.speedscope.app/
- Interactive flame chart visualization
- **File**: `profile.speedscope.json`

### 4. Collapsed Stacks
- Brendan Gregg format for flamegraph tools
- Compatible with `flamegraph.pl`, `inferno`
- **File**: `profile.collapsed.txt`
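The collapsed format is one line per unique stack: frame names joined by `;` in root-to-leaf order, followed by the stack's weight. A std-only sketch of the formatter (the `to_collapsed` name is illustrative):

```rust
use std::collections::HashMap;

// Renders aggregated stacks (root-to-leaf frame names) and their weights
// as Brendan Gregg's collapsed format: "root;child;leaf weight".
pub fn to_collapsed(stack_weights: &HashMap<Vec<String>, u64>) -> String {
    let mut lines: Vec<String> = stack_weights
        .iter()
        .map(|(stack, weight)| format!("{} {}", stack.join(";"), weight))
        .collect();
    lines.sort(); // deterministic output for diffing and snapshot tests
    lines.join("\n")
}
```

The real formatter would take the IR's `stack_weights` keyed by `Vec<FrameId>` and resolve names first; strings are used here to keep the sketch self-contained.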

### 5. Plain Text
- ASCII tables for terminal viewing
- **File**: `profile-analysis.txt`

## Key Algorithms

### CPU Analysis (Stack-Based Aggregation)

**Important**: Reconstruct each sample's stack via a parent map rather than relying on DFS over node children — per-node `hitCount` aggregation loses the timeline and cannot attribute `timeDeltas` weight to unique stacks.

```rust
pub fn analyze_cpu_profile(profile: &CpuProfile) -> CpuAnalysis {
    // 1. Build parent map: node_id -> parent_id
    let parent_map = build_parent_map(&profile.nodes);

    // 2. For each sample, reconstruct full stack
    let mut self_times: HashMap<FrameId, u64> = HashMap::new();
    let mut total_times: HashMap<FrameId, u64> = HashMap::new();
    let mut stack_weights: HashMap<Vec<FrameId>, u64> = HashMap::new();

    for (i, &sample_node) in profile.samples.iter().enumerate() {
        let weight = profile.time_deltas.get(i).copied().unwrap_or(0).max(0) as u64;

        // Reconstruct stack: leaf -> root, then reverse
        let stack = reconstruct_stack(sample_node, &parent_map);

        // Leaf gets self-time
        if let Some(&leaf) = stack.last() {
            *self_times.entry(leaf).or_default() += weight;
        }

        // All frames on stack get total-time (inclusive)
        for &frame in &stack {
            *total_times.entry(frame).or_default() += weight;
        }

        // Aggregate by unique stack
        *stack_weights.entry(stack).or_default() += weight;
    }
    // ...
}
```
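The two helpers referenced above are not shown. A std-only sketch, assuming node ids are `u32` and the root has no parent; unlike the plan's version (which would walk `profile.nodes` directly), this takes a prebuilt children map for brevity:

```rust
use std::collections::HashMap;

// Inverts the profile's `nodes[].children` arrays into child -> parent.
// The root node gets no entry, which terminates the walk below.
pub fn build_parent_map(children: &HashMap<u32, Vec<u32>>) -> HashMap<u32, u32> {
    let mut parents = HashMap::new();
    for (&parent, kids) in children {
        for &kid in kids {
            parents.insert(kid, parent);
        }
    }
    parents
}

// Walks leaf -> root via the parent map, then reverses so the returned
// stack is in root-to-leaf order, as the aggregation loop expects.
pub fn reconstruct_stack(leaf: u32, parents: &HashMap<u32, u32>) -> Vec<u32> {
    let mut stack = vec![leaf];
    let mut cur = leaf;
    while let Some(&p) = parents.get(&cur) {
        stack.push(p);
        cur = p;
    }
    stack.reverse();
    stack
}
```

Since many samples share a leaf node, caching reconstructed stacks by leaf id would avoid re-walking the parent chain per sample.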

### Frame Classification

```rust
impl FrameClassifier {
    pub fn classify(&self, url: &str, name: &str) -> FrameCategory {
        // Node internals
        if url.starts_with("node:") || url.contains("internal/") {
            return FrameCategory::NodeInternal;
        }

        // V8 internals
        if name.starts_with("(") && name.ends_with(")") {
            match name {
                "(garbage collector)" | "(idle)" | "(program)" => {
                    return FrameCategory::V8Internal;
                }
                _ => {}
            }
        }

        // Native/Builtin
        if url.is_empty() || name.contains("Builtin:") || name == "(native)" {
            return FrameCategory::Native;
        }

        // Dependencies
        if url.contains("node_modules") {
            return FrameCategory::Deps;
        }

        // App code (default)
        FrameCategory::App
    }
}
```

### Profile Diff

```rust
pub struct ProfileDiff {
    pub before_total: u64,
    pub after_total: u64,
    pub regressions: Vec<FunctionDelta>,  // Got slower
    pub improvements: Vec<FunctionDelta>, // Got faster
    pub new_hotspots: Vec<FunctionStats>, // Didn't exist before
}

pub struct FunctionDelta {
    pub name: String,
    pub location: String,
    pub before_time: u64,
    pub after_time: u64,
    pub delta_percent: f64,  // Positive = regression
}
```
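Given per-function self-time maps from the two runs, the deltas could be computed as below. This is a sketch under simplifying assumptions: functions are keyed by name alone (real code would key by name + location), and new hotspots are reported as +100% here even though the plan tracks them in a separate `new_hotspots` list:

```rust
use std::collections::HashMap;

// One function's before/after self time; positive delta_percent = regression.
#[derive(Debug)]
pub struct Delta {
    pub name: String,
    pub before_us: u64,
    pub after_us: u64,
    pub delta_percent: f64,
}

// Compares two self-time maps; a function absent from one side counts as 0.
pub fn diff_self_times(
    before: &HashMap<String, u64>,
    after: &HashMap<String, u64>,
) -> Vec<Delta> {
    let mut names: Vec<&String> = before.keys().chain(after.keys()).collect();
    names.sort();
    names.dedup();
    let mut out = Vec::new();
    for name in names {
        let b = *before.get(name).unwrap_or(&0);
        let a = *after.get(name).unwrap_or(&0);
        if a == b {
            continue; // unchanged functions are noise in a diff report
        }
        // Percent change relative to `before`.
        let pct = if b == 0 {
            100.0
        } else {
            (a as f64 - b as f64) / b as f64 * 100.0
        };
        out.push(Delta { name: name.clone(), before_us: b, after_us: a, delta_percent: pct });
    }
    out
}
```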

## Dependencies

```toml
[dependencies]
clap = { version = "4", features = ["derive"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
thiserror = "2"
sourcemap = "9"              # Source map parsing
regex = "1"                  # Pattern matching for filters
indexmap = { version = "2", features = ["serde"] }  # Ordered maps

[dev-dependencies]
insta = "1"                  # Snapshot testing
```

## Files to Create

### Core Types & IR (10 files)
1. `Cargo.toml` - Project manifest
2. `src/main.rs` - CLI entry point
3. `src/lib.rs` - Library exports
4. `src/types/mod.rs` - V8 types exports
5. `src/types/cpu.rs` - Raw V8 CPU profile types
6. `src/types/heap.rs` - Raw V8 heap profile types
7. `src/ir/mod.rs` - IR exports
8. `src/ir/frame.rs` - Frame IR (FrameKind, FrameCategory)
9. `src/ir/stack.rs` - Stack IR
10. `src/ir/sample.rs` - Sample IR

### Parsing & Classification (7 files)
11. `src/parser/mod.rs` - Parser exports
12. `src/parser/cpu_profile.rs` - CPU profile → IR
13. `src/parser/heap_profile.rs` - Heap profile → IR
14. `src/classify/mod.rs` - Classifier exports
15. `src/classify/frame_classifier.rs` - Frame categorization rules
16. `src/sourcemap/mod.rs` - Source map exports
17. `src/sourcemap/resolver.rs` - Location resolution + caching

### Analysis (5 files)
18. `src/analysis/mod.rs` - Analysis exports
19. `src/analysis/cpu_analysis.rs` - CPU hot path detection
20. `src/analysis/heap_analysis.rs` - Heap allocation analysis
21. `src/analysis/diff.rs` - Profile comparison
22. `src/analysis/caller_callee.rs` - Call graph attribution

### Output (6 files)
23. `src/output/mod.rs` - OutputFormat trait + registry
24. `src/output/markdown.rs` - Markdown formatter
25. `src/output/text.rs` - Plain text formatter
26. `src/output/json.rs` - JSON summary formatter
27. `src/output/speedscope.rs` - Speedscope formatter
28. `src/output/collapsed.rs` - Collapsed stacks formatter

**Total: 28 files**

## Source Map Support

### Resolution Strategy
1. Extract URL from profile (e.g., `dist/dist--xku3Vyh.js:141153`)
2. Normalize URL schemes (`file://`, `webpack://`, `vite://`, `node:`)
3. Check for adjacent `.map` file
4. Parse inline sourcemaps (base64 data URLs)
5. Handle missing column info (line-only fallback with warning)
6. Cache resolved maps by URL + mtime
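Step 2 might look like the sketch below in std-only Rust. The schemes handled mirror the list above; the exact stripping rules are an assumption (real `webpack://` URLs can also carry a namespace segment that needs dropping):

```rust
// Strips common bundler/runtime URL schemes so a profile URL can be
// matched against files under --sourcemap-dir. Returns None when there
// is nothing on disk to map.
pub fn normalize_url(url: &str) -> Option<String> {
    // node: URLs are runtime internals; no source map applies.
    if url.starts_with("node:") {
        return None;
    }
    for scheme in ["file://", "webpack://", "vite://"] {
        if let Some(rest) = url.strip_prefix(scheme) {
            return Some(rest.trim_start_matches('/').to_string());
        }
    }
    // Bare relative paths pass through unchanged.
    Some(url.to_string())
}
```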

### Output Integration
- Display original names when resolved
- Show minified mapping in collapsible section
- Group frames by original source file

## Verification

```bash
# 1. Basic parsing
cargo run -- cpu test-runner-profile/CPU.*.cpuprofile
cargo run -- heap test-runner-profile/Heap.*.heapprofile

# 2. Multiple outputs
cargo run -- cpu test-runner-profile/CPU.*.cpuprofile -f markdown -f speedscope -f json -o ./out/

# 3. Filtering
cargo run -- cpu test-runner-profile/CPU.*.cpuprofile --filter app --min-percent 1.0

# 4. Source maps
cargo run -- cpu test-runner-profile/CPU.*.cpuprofile \
  --sourcemap-dir /Users/qing/p/github/oxc_formatter/apps/oxfmt/dist/

# 5. Validate speedscope output at https://www.speedscope.app/
```

## Expected Markdown Output

```markdown
# CPU Profile Analysis

**File:** `CPU.20260124.005938.62230.0.001.cpuprofile`
**Duration:** 2731.38 ms | **Samples:** 2,198 | **Interval:** ~1.24 ms

## Time Breakdown by Category

| Category | Time | % |
|----------|------|---|
| App code | 1,150 ms | 42.1% |
| Dependencies (node_modules) | 846 ms | 31.0% |
| Node internals | 491 ms | 18.0% |
| V8/Native | 244 ms | 8.9% |

## Top Functions by Self Time

| # | Self Time | % | Total Time | Function | Location |
|---|-----------|---|------------|----------|----------|
| 1 | 450.23 ms | 16.5% | 1,234.56 ms | `getClassOrder` | `tailwind.ts:245:12` |
| 2 | 312.45 ms | 11.4% | 892.34 ms | `parseExpression` | `parser.ts:1892:5` |

<details>
<summary>Minified name mappings</summary>

| Original | Minified | Minified Location |
|----------|----------|-------------------|
| `getClassOrder` | `Bae` | `dist--xku3Vyh.js:141153` |

</details>

## Hot Paths

### Path #1 (12.3% of total, 336.12 ms)

    sortTailwindClasses (prettier-plugin.ts:42)
    └─ map (native)
       └─ processClass (tailwind.ts:189)
          └─ getClassOrder (tailwind.ts:245) ← HOTSPOT

## Caller/Callee Analysis for `getClassOrder`

**Top Callers:**
| Caller | Calls | Time Attributed |
|--------|-------|-----------------|
| `processClass` | 1,234 | 380 ms |
| `sortClasses` | 456 | 70 ms |

**Top Callees:**
| Callee | Self Time |
|--------|-----------|
| `compileCss` | 180 ms |
| `lookupRule` | 95 ms |

## Recommendations

### Critical (>20% impact)

- **`getClassOrder`** at `tailwind.ts:245:12`
  - 16.5% of total CPU time (450.23ms)
  - Called from `processClass` and `sortClasses`
  - **Suggestion:** Cache results by class name

### High (10-20% impact)

- **`parseExpression`** at `parser.ts:1892:5`
  - 11.4% of total CPU time (312.45ms)
  - **Suggestion:** Consider if re-parsing can be avoided

## Agent Summary

**Key Bottlenecks:**
1. `getClassOrder` (Tailwind CSS class ordering) - 16.5%
2. `parseExpression` (Babel parsing) - 11.4%

**Time Distribution:**
- 42% in app code - good target for optimization
- 31% in dependencies - consider upgrading or replacing slow deps
- 27% in internals/native - likely unavoidable

**Suggested Optimizations:**
- Cache `getClassOrder` results - classes repeat frequently
- Investigate if Babel parsing can be cached or avoided
```