scxtop 1.0.21

sched_ext scheduler tool for observability
# Wprof Perfetto Trace Compatibility Guide

## Overview

This guide explains how to use Perfetto traces generated by
[wprof](https://github.com/facebookexperimental/wprof) with scxtop's perfetto
trace analyzers.

**Wprof** is a BPF-based profiler that captures kernel scheduler events and
outputs them in Perfetto format. Scxtop can parse and analyze these traces to
provide insights into scheduling behavior, task relationships, and sched_ext
metadata.

## Quick Start

### Generate a Compatible Trace

```bash
# Option 1: Use ftrace view mode (currently recommended)
cd /path/to/wprof
sudo ./wprof --emit-sched-view -o trace.proto

# Option 2: Default TrackEvent mode (requires scxtop Phase 9+)
sudo ./wprof -o trace.proto
```

### Analyze with Scxtop MCP

Load the trace and run analyzers:

```json
// 1. Load trace
{
  "tool": "load_perfetto_trace",
  "arguments": {
    "file_path": "/path/to/trace.proto",
    "trace_id": "my_trace"
  }
}

// 2. Get trace summary
{
  "tool": "get_trace_summary",
  "arguments": { "trace_id": "my_trace" }
}

// 3. Run all applicable analyzers
{
  "tool": "run_all_analyzers",
  "arguments": { "trace_id": "my_trace" }
}
```

---

## Wprof Output Modes

Wprof can generate traces in two different formats. Understanding these modes
is crucial for compatibility with scxtop.

### Mode 1: TrackEvent Format (Default)

**Command**: `sudo ./wprof -o trace.proto`

**Description**:
- Uses Perfetto TrackEvent messages with custom categories
- Rich timeline view with detailed metadata
- **Includes sched_ext metadata** (layer_id, dsq_id)
- Includes perf counter deltas
- Includes compound delay tracking
- Includes stack traces as PerfSample packets

**Scxtop Status**: ⚠️ **Partial Support (Phase 9 in progress)**
- TrackEvent parsing infrastructure added in scxtop
- Full analyzer support coming soon
- Can access sched_ext metadata once parsing is complete

**Events Generated**:
- `ONCPU` slices - Task on-CPU periods
- `WAKER`/`WAKEE` instants - Wakeup relationships
- `PREEMPTOR`/`PREEMPTEE` instants - Preemption tracking
- `TIMER` events - Perf timer ticks
- `FORK`/`EXEC`/`EXIT`/`FREE` - Process lifecycle
- `HARDIRQ`/`SOFTIRQ`/`WQ` - Interrupt tracking
- `IPI` - Inter-processor interrupts
- `REQUEST` - USDT-based request tracking

### Mode 2: FtraceEvent Format (--emit-sched-view)

**Command**: `sudo ./wprof --emit-sched-view -o trace.proto`

**Description**:
- Uses standard Perfetto FtraceEvent bundles
- Compatible with Perfetto UI's ftrace view
- **Loses sched_ext metadata** (layer_id, dsq_id not in standard schema)
- Loses perf counter data
- Loses compound delay information

**Scxtop Status**: ✅ **Full Support**
- All core analyzers work
- CPU utilization, process runtime, wakeup latency, migrations
- Scheduling bottleneck detection
- Outlier detection
- Query framework

**Events Generated**:
- `sched_switch` - Context switch events
- `sched_waking` - Task waking events
- `sched_wakeup_new` - New task wakeups

---

## Compatibility Matrix

| Feature | TrackEvent Mode | FtraceEvent Mode (--emit-sched-view) | Scxtop Support |
|---------|----------------|--------------------------------------|----------------|
| **Scheduler Events** | | | |
| sched_switch | ✅ (as ONCPU slices) | ✅ | ✅ FtraceEvent mode |
| sched_waking | ✅ (as WAKER/WAKEE) | ✅ | ✅ FtraceEvent mode |
| sched_wakeup_new | ✅ (as WAKER/WAKEE) | ✅ | ✅ FtraceEvent mode |
| **Metadata** | | | |
| Sched-ext layer_id | ✅ (in annotations) | ❌ | ⏳ Phase 9 |
| Sched-ext dsq_id | ✅ (in annotations) | ❌ | ⏳ Phase 9 |
| CPU ID | ✅ (in annotations) | | |
| NUMA node | ✅ (in annotations) | | |
| **Performance Data** | | | |
| Perf counter deltas | ✅ | ❌ | ⏳ |
| Compound delay | ✅ | ❌ | ⏳ |
| Waking delay | | | |
| Stack traces | ✅ (as PerfSample) | | ⏳ |
| **Analysis Capabilities** | | | |
| CPU utilization | | ✅ | ✅ FtraceEvent mode |
| Process runtime | | ✅ | ✅ FtraceEvent mode |
| Wakeup latency | | ✅ | ✅ FtraceEvent mode |
| Migration patterns | | ✅ | ✅ FtraceEvent mode |
| Task states | | | |
| Preemption analysis | ✅ (as PREEMPTOR/PREEMPTEE) | | |
| Scheduling bottlenecks | | ✅ | ✅ FtraceEvent mode |
| Outlier detection | | ✅ | ✅ FtraceEvent mode |
| DSQ analysis | ✅ (with metadata) | ⚠️ (limited) | ⏳ Phase 9 needed |

**Legend**: ✅ Supported | ❌ Not available | ⏳ Coming soon | ⚠️ Partial

---

## Current Recommendations

### For General Scheduling Analysis (Now)

Use **FtraceEvent mode** with `--emit-sched-view`:

```bash
sudo ./wprof --emit-sched-view -o trace.proto
```

**Why**: All scxtop analyzers currently work with FtraceEvent format. You get:
- CPU utilization analysis
- Process runtime statistics
- Wakeup latency measurements
- Migration pattern detection
- Scheduling bottleneck identification
- Statistical outlier detection
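
To give a feel for what CPU utilization analysis works from, here is a minimal sketch, not scxtop's implementation, that derives each CPU's busy fraction from `sched_switch` events alone. It assumes a hypothetical `Switch` record carrying the CPU, a nanosecond timestamp, and `next_pid`, and counts PID 0 (the standard Perfetto idle thread) and wprof's negative idle TIDs as idle time:

```rust
use std::collections::HashMap;

/// Hypothetical sched_switch record: at `ts` (ns), `cpu` starts running `next_pid`.
struct Switch {
    cpu: u32,
    ts: u64,
    next_pid: i32,
}

/// Fraction of [start, end) each CPU spent running a non-idle task.
/// Time before a CPU's first switch is ignored, since its occupant is unknown.
fn cpu_utilization(switches: &[Switch], start: u64, end: u64) -> HashMap<u32, f64> {
    // Per CPU: (timestamp of last switch, pid running since then, busy ns so far).
    let mut state: HashMap<u32, (u64, i32, u64)> = HashMap::new();
    for s in switches {
        let entry = state.entry(s.cpu).or_insert((s.ts, s.next_pid, 0));
        let (last_ts, running, busy) = *entry;
        // Only positive PIDs are real tasks; 0 and negative TIDs are idle.
        let extra = if running > 0 { s.ts.saturating_sub(last_ts) } else { 0 };
        *entry = (s.ts, s.next_pid, busy + extra);
    }
    let window = end.saturating_sub(start).max(1) as f64;
    state
        .into_iter()
        .map(|(cpu, (last_ts, running, busy))| {
            // Account for the task still running at the end of the window.
            let tail = if running > 0 { end.saturating_sub(last_ts) } else { 0 };
            (cpu, (busy + tail) as f64 / window)
        })
        .collect()
}
```

Events are assumed to be sorted by timestamp within each CPU.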

**Example Analysis Session**:

```json
// Load trace
{
  "tool": "load_perfetto_trace",
  "arguments": {
    "file_path": "/home/user/trace.proto",
    "trace_id": "workload"
  }
}

// Analyze CPU utilization
{
  "tool": "analyze_cpu_utilization",
  "arguments": {
    "trace_id": "workload",
    "use_parallel": true
  }
}

// Analyze process runtime (top 20)
{
  "tool": "analyze_process_runtime",
  "arguments": {
    "trace_id": "workload",
    "limit": 20
  }
}

// Find scheduling bottlenecks
{
  "tool": "find_scheduling_bottlenecks",
  "arguments": {
    "trace_id": "workload",
    "limit": 10
  }
}

// Detect outliers
{
  "tool": "detect_outliers",
  "arguments": {
    "trace_id": "workload",
    "method": "IQR",
    "category": "all"
  }
}
```

---

## Technical Details

### Idle Thread Handling

Wprof uses a non-standard PID mapping for idle threads:
- **Standard Perfetto**: swapper/N has TID 0
- **Wprof**: swapper/N has TID -(N+1)

**Impact**: The scxtop parser must treat negative TIDs as idle threads.

**Status**: ⏳ Will be handled in Phase 9 TrackEvent parser.
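
A minimal sketch of that mapping, using a hypothetical `TaskId` type rather than anything from scxtop: a negative TID `t` decodes to idle CPU `N = -t - 1`, and a consumer expecting standard Perfetto numbering would report it as TID 0.

```rust
/// How a raw wprof TID should be interpreted (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
enum TaskId {
    /// Per-CPU idle thread swapper/N (TID 0 in standard Perfetto).
    Idle { cpu: u32 },
    /// Regular task with a non-negative TID.
    Task(i32),
}

/// Decode wprof's idle encoding, where swapper/N is emitted as TID -(N+1).
fn classify_wprof_tid(tid: i32) -> TaskId {
    if tid < 0 {
        // N = -tid - 1; widen to i64 so i32::MIN cannot overflow.
        let cpu = (-(tid as i64) - 1) as u32;
        TaskId::Idle { cpu }
    } else {
        TaskId::Task(tid)
    }
}

/// TID in standard Perfetto terms: every idle thread collapses to 0.
fn standard_tid(tid: i32) -> i32 {
    match classify_wprof_tid(tid) {
        TaskId::Idle { .. } => 0,
        TaskId::Task(t) => t,
    }
}
```

Keeping the decoded CPU number, rather than only collapsing to TID 0, preserves which CPU's idle thread was running.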

### Timestamp Format

Wprof timestamps are **relative to session start**:
- All timestamps: `event_time - session_start_time`
- Absolute timestamps not preserved

**Impact**: Timestamps from different traces cannot be correlated with each other.

**Workaround**: Use trace-relative time for all analysis.

### Sched-ext Metadata Format

In TrackEvent mode, sched-ext metadata is in debug_annotations:

```
debug_annotations: [
  { name: "scx_layer_id", value: UintValue(3) },
  { name: "scx_dsq_id", value: UintValue(10) },
  { name: "cpu", value: UintValue(5) },
  { name: "numa_node", value: UintValue(0) },
  ...
]
```

**Access** (Phase 9+):

```rust
use scxtop::mcp::get_annotation_uint;

let layer_id = get_annotation_uint(&event.annotations, "scx_layer_id");
let dsq_id = get_annotation_uint(&event.annotations, "scx_dsq_id");
```
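
The annotation types scxtop uses are not shown here. As a rough, self-contained sketch with hypothetical `AnnotationValue` and `Annotation` types standing in for the real ones, a lookup like the one above can be modeled as a scan by name. The function is deliberately named `lookup_uint` so it is not mistaken for scxtop's `get_annotation_uint`:

```rust
/// Hypothetical stand-in for a Perfetto debug annotation value.
#[derive(Debug, Clone, PartialEq)]
enum AnnotationValue {
    Uint(u64),
    Int(i64),
    Str(String),
}

/// Hypothetical stand-in for a named debug annotation.
#[derive(Debug, Clone)]
struct Annotation {
    name: String,
    value: AnnotationValue,
}

/// Return the first annotation named `key` as a u64, if present.
/// Non-negative Int values are accepted; any other type yields None.
fn lookup_uint(annotations: &[Annotation], key: &str) -> Option<u64> {
    annotations
        .iter()
        .find(|a| a.name == key)
        .and_then(|a| match &a.value {
            AnnotationValue::Uint(v) => Some(*v),
            AnnotationValue::Int(v) if *v >= 0 => Some(*v as u64),
            _ => None,
        })
}
```

Returning `Option` lets callers distinguish a missing `scx_layer_id` (for example, a task not managed by the sched_ext scheduler) from a layer ID of 0.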

### Perf Counter Deltas

Perf counters are emitted as annotation deltas:

```
debug_annotations: [
  { name: "instructions", value: IntValue(1500000) },
  { name: "cycles", value: IntValue(3200000) },
  { name: "cache_misses", value: IntValue(450) },
  ...
]
```

**Interpretation**: Delta since last context switch for this task.
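
Because each value is a delta, per-task totals come from summing the deltas over all of a task's slices rather than taking the last value. A minimal sketch, assuming a hypothetical `CounterDelta` record already extracted from the `instructions` and `cycles` annotations, that aggregates per task and derives instructions per cycle (IPC):

```rust
use std::collections::HashMap;

/// Hypothetical per-slice counter deltas read from one event's annotations.
struct CounterDelta {
    tid: i32,
    instructions: i64,
    cycles: i64,
}

/// Running totals for one task.
#[derive(Default, Debug)]
struct TaskCounters {
    instructions: i64,
    cycles: i64,
}

impl TaskCounters {
    /// Instructions per cycle, or None if no cycles were recorded.
    fn ipc(&self) -> Option<f64> {
        if self.cycles > 0 {
            Some(self.instructions as f64 / self.cycles as f64)
        } else {
            None
        }
    }
}

/// Sum deltas per task; each delta covers one on-CPU interval.
fn aggregate(deltas: &[CounterDelta]) -> HashMap<i32, TaskCounters> {
    let mut totals: HashMap<i32, TaskCounters> = HashMap::new();
    for d in deltas {
        let t = totals.entry(d.tid).or_default();
        t.instructions += d.instructions;
        t.cycles += d.cycles;
    }
    totals
}
```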

---

## Examples

### Example 1: Basic Workload Analysis

```bash
# Capture trace
sudo ./wprof --emit-sched-view -o workload.proto sleep 10

# Analyze with scxtop MCP
```

```json
{
  "tool": "load_perfetto_trace",
  "arguments": {
    "file_path": "/path/to/workload.proto",
    "trace_id": "workload"
  }
}

{
  "tool": "run_all_analyzers",
  "arguments": { "trace_id": "workload" }
}
```

**Output**: Comprehensive analysis across all 17 analyzers.

### Example 2: Finding High-Latency Tasks

```json
{
  "tool": "analyze_wakeup_latency",
  "arguments": { "trace_id": "workload" }
}

{
  "tool": "detect_outliers",
  "arguments": {
    "trace_id": "workload",
    "method": "IQR",
    "category": "latency"
  }
}
```

**Output**: Wakeup latency statistics + outlier tasks with high latency.
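
The `IQR` method is presumably the standard interquartile-range rule (Tukey's fences): values below Q1 - k·IQR or above Q3 + k·IQR are flagged. scxtop's exact quartile interpolation and multiplier are not stated in this guide, so the sketch below assumes linearly interpolated quartiles and the conventional k = 1.5:

```rust
/// Linearly interpolated quantile of an ascending-sorted, non-empty slice.
fn quantile(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64)
}

/// Indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
fn iqr_outliers(values: &[f64], k: f64) -> Vec<usize> {
    if values.len() < 4 {
        return Vec::new(); // too few points for meaningful quartiles
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("NaN in input"));
    let q1 = quantile(&sorted, 0.25);
    let q3 = quantile(&sorted, 0.75);
    let iqr = q3 - q1;
    let (low, high) = (q1 - k * iqr, q3 + k * iqr);
    values
        .iter()
        .enumerate()
        .filter(|&(_, &v)| v < low || v > high)
        .map(|(i, _)| i)
        .collect()
}
```

Returning indices rather than values makes it easy to map each flagged latency back to the task that produced it.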

### Example 3: CPU Utilization Hotspots

```json
{
  "tool": "analyze_cpu_utilization",
  "arguments": {
    "trace_id": "workload",
    "use_parallel": true
  }
}

{
  "tool": "detect_outliers",
  "arguments": {
    "trace_id": "workload",
    "category": "cpu"
  }
}
```

**Output**: Per-CPU utilization + identification of over/underutilized CPUs.

### Example 4: Sched-ext Layer Analysis (Phase 9+)

```bash
# Capture with TrackEvent mode (preserves layer_id)
sudo ./wprof -o scx_trace.proto sleep 10
```

```json
{
  "tool": "load_perfetto_trace",
  "arguments": {
    "file_path": "/path/to/scx_trace.proto",
    "trace_id": "scx"
  }
}

{
  "tool": "analyze_oncpu_slices",
  "arguments": {
    "trace_id": "scx",
    "extract_scx_metadata": true
  }
}

{
  "tool": "analyze_scx_layer_behavior",
  "arguments": {
    "trace_id": "scx",
    "layer_id": 3
  }
}
```

**Output** (Phase 9+): Layer-specific runtime, migrations, and dispatch queue behavior.

---

## Troubleshooting

### Issue: "Trace not found" error

**Cause**: Trace not loaded or wrong trace_id.

**Solution**:
```json
{
  "tool": "load_perfetto_trace",
  "arguments": {
    "file_path": "/full/path/to/trace.proto",
    "trace_id": "my_trace"
  }
}
```

Use the exact `trace_id` from the load call in all subsequent tool calls.

### Issue: "DSQ analysis shows no data"

**Cause**: Using `--emit-sched-view` mode loses sched_ext metadata.

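**Solution**: For sched_ext analysis, capture in the default TrackEvent mode, which preserves `layer_id` and `dsq_id`. Analyzing that metadata requires scxtop's TrackEvent support (Phase 9+), which is still in progress.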

### Issue: Analyzers report "No applicable events"

**Cause**: Trace may not contain required event types.

**Solution**:
```json
{
  "tool": "get_trace_summary",
  "arguments": { "trace_id": "my_trace" }
}
```

Check `capabilities.available_events` to see what event types are present.

---

## Performance

Based on testing with 900K+ event traces:

| Operation | Performance |
|-----------|-------------|
| Load trace | ~10s |
| Discover analyzers | < 1ms |
| Individual analyzer | 1-150ms |
| Batch analysis (7 analyzers) | ~33s |
| Outlier detection | ~1.3s |
| Query (100K events) | 1-15ms |

**Recommendations**:
- For interactive analysis: Use dedicated analyzer tools
- For batch processing: Use `run_all_analyzers`
- For large traces: Enable parallel processing where available

---

## Future Work

### Short Term

- Complete TrackEvent parsing
- ONCPU slice analyzer with sched_ext metadata
- Wakeup relationship analyzer using WAKER/WAKEE events
- Perf counter analysis
- Enhanced DSQ analyzer with layer information

### Medium Term

- Stack trace correlation and flame graphs
- Compound delay chain visualization
- Process lifecycle tracking
- IRQ/Softirq TrackEvent analysis
- REQUEST tracking for application-level analysis

### Long Term

- Real-time streaming analysis
- Trace comparison tools
- Advanced visualization support
- Custom analyzer plugins

---

## Resources

- **Scxtop Documentation**: See `PERFETTO_ANALYZER_GUIDE.md`
- **Wprof Repository**: https://github.com/facebookexperimental/wprof
- **Perfetto Documentation**: https://perfetto.dev/docs/
- **Sched-ext**: https://github.com/sched-ext/scx

---

## Contributing

Found a bug or have a feature request? Please file an issue on the scx repository.

Want to add a new analyzer? See the [Developer Guide](./PERFETTO_ANALYZER_GUIDE.md#developer-guide).