scxtop 1.0.21

sched_ext scheduler tool for observability
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
# Using scxtop MCP Server with Claude

## Configuration

### Claude Desktop

Add the scxtop MCP server to your Claude Desktop configuration file:

**macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
**Linux:** `~/.config/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "scxtop": {
      "command": "/path/to/scxtop",
      "args": ["mcp", "--daemon"],
      "env": {}
    }
  }
}
```

After adding this configuration, restart Claude Desktop.

### Claude Code (CLI)

Configure the MCP server for Claude Code:

**Configuration file:** `~/.config/claude/config.json`

```json
{
  "mcpServers": {
    "scxtop": {
      "command": "sudo",
      "args": ["/path/to/scxtop", "mcp", "--daemon"],
      "env": {}
    }
  }
}
```

**Or use the CLI command:**

```bash
# Add the MCP server
claude mcp add scxtop --command "sudo /path/to/scxtop mcp --daemon"

# Verify it's configured
claude mcp list

# Test the connection
claude --mcp scxtop "Show me the current scheduler"
```

**Note:** Claude Code requires `sudo` since scxtop needs root privileges for BPF operations.

## Example Usage

### 1. System Overview

**You:** "Can you give me a summary of my system's scheduler?"

**Claude:** I'll use the `summarize_system` prompt to get a comprehensive overview.

*Claude invokes the prompt and follows the workflow, reading resources:*
- `topology://info` - Gets hardware layout
- `scheduler://current` - Identifies active scheduler
- `stats://aggregated/cpu` - CPU utilization
- `stats://aggregated/process` - Top processes

**Response:**
```
System Summary:
- Hardware: 16 CPUs (8 cores), 2 LLCs, 1 NUMA node, SMT enabled
- Active Scheduler: scx_rusty (sched_ext)
- CPU Utilization: Average 45%, range 32-67%
- Top Process: chrome (PID 1234) using 18% CPU time
- Context Switch Rate: 12,450/sec
- Load: Well balanced across CPUs (variance: 8%)
```

### 2. Latency Investigation

**You:** "Process 5678 is experiencing high scheduling latency. Can you investigate?"

**Claude:** I'll use the `debug_high_latency` prompt for PID 5678.

*Claude follows the debugging workflow:*
1. Reads `scheduler://current` to confirm sched_ext is active
2. Queries `stats://aggregated/process` filtered for PID 5678
3. Checks `stats://aggregated/cpu` for bottleneck CPUs
4. Uses `get_topology` tool to understand CPU layout

**Response:**
```
Latency Analysis for PID 5678:

Problem Found:
- Average scheduling latency: 2.3ms (high!)
- Process primarily runs on CPUs 4-7 (LLC 0)
- LLC 0 is heavily loaded (85% utilization vs 45% on LLC 1)
- Migration count: 47 migrations in last minute (excessive)

Root Cause:
The process is experiencing contention on LLC 0. Other CPU-intensive
processes on the same cache domain are causing:
- Long runqueue wait times
- Frequent migrations when seeking idle CPUs
- Cache thrashing from cross-LLC migrations

Recommendations:
1. Consider pinning task to specific CPUs in less loaded LLC 1
2. Adjust scheduler layer weights if using scx_layered
3. Check if workload can be split across cache domains
```

### 3. Real-time Monitoring

**You:** "Monitor scheduler events in real-time and alert me if you see any issues"

**Claude:** I'll subscribe to the event stream to watch for anomalies.

*Claude subscribes to `events://stream` resource in daemon mode*

**Claude observes and reports:**
```
Real-time Monitoring Active...

[10:30:42] Normal: ~450 sched_switch events/sec, latencies < 100μs

[10:31:15] ⚠️ ALERT: Latency spike detected!
- CPU 5: DSQ latency jumped to 3.2ms
- Process: mysqld (PID 2345)
- 15 consecutive high-latency sched_switch events
- Appears to be blocking on lock contention

[10:31:22] Pattern identified:
- MySQL is migrating rapidly (8 migrations in 7 seconds)
- Moving between CPUs 4-7 on same LLC
- Suggest checking MySQL connection pool settings

[10:31:45] Latencies returning to normal (< 150μs)
```

### 4. Performance Tuning

**You:** "My workload is throughput-focused. How should I optimize?"

**Claude:** I'll use the `analyze_scheduler_performance` prompt with focus on throughput.

*Claude uses the tools to gather data:*
- `query_stats` to discover available metrics
- Reads `stats://aggregated/llc` for cache efficiency
- Reads `stats://scheduler/raw` for scheduler-specific config
- Uses `get_topology` with full detail for NUMA layout

**Response:**
```
Throughput Optimization Analysis:

Current State:
- Context switch rate: 8,200/sec (moderate)
- CPU utilization: 78% average, but uneven (60-95% range)
- LLC hit rate: Good on LLC 0 (92%), poor on LLC 1 (76%)
- Migration rate: 120/sec (high for throughput workload)

Optimizations Recommended:

1. Reduce Migration Rate:
   - Current scheduler (scx_rusty) is optimizing for latency
   - Consider tuning migration threshold parameters
   - Increase task_load_weight to make tasks "stickier"

2. Improve Cache Locality:
   - LLC 1 shows poor cache efficiency
   - Use topology to identify which processes run there
   - Consider CPU pinning for cache-intensive workloads

3. Balance CPU Load:
   - CPUs 8-11 are underutilized (60% vs 95% on CPUs 0-3)
   - May indicate poor load balancing or affinity constraints
   - Check for IRQ affinity issues with `list_events` tool

Expected Gains:
- Reducing migrations should improve throughput by 5-10%
- Better cache locality could improve by another 8-12%
- Balancing load prevents hotspot bottlenecks
```

### 5. Comparative Analysis

**You:** "Compare CPU usage across NUMA nodes"

**Claude:** Let me analyze the NUMA distribution.

*Claude queries:*
- `stats://aggregated/node` for per-node metrics
- `get_topology` for NUMA topology details
- `stats://aggregated/process` to see process placement

**Response:**
```
NUMA Node Analysis:

Node 0:
- CPUs: 0-7 (8 CPUs)
- Utilization: 82% average
- Process count: 47 processes
- Memory locality: 94% local accesses
- Top consumers: postgres (28%), nginx (15%)

Node 1:
- CPUs: 8-15 (8 CPUs)
- Utilization: 38% average
- Process count: 23 processes
- Memory locality: 87% local accesses
- Top consumers: backup-daemon (12%), monitoring (8%)

Imbalance Detected:
- Node 0 is heavily loaded while Node 1 is underutilized
- 2.2x difference in utilization
- No apparent affinity constraints preventing better balance

Recommendations:
- Check if scheduler is NUMA-aware (should prefer local memory)
- Consider manually distributing some Node 0 workloads
- Investigate why backup-daemon isn't utilizing available capacity
```

### 6. Stack Trace Profiling

**You:** "Profile the system for 10 seconds and show me the hottest functions"

**Claude:** I'll start perf profiling to collect stack traces.

*Claude uses the tools:*
- `start_perf_profiling` with freq=99, duration_secs=10
- Waits for collection to complete
- Uses `get_perf_results` to retrieve symbolized stack traces

**Response:**
```
Perf Profiling Results (99 Hz for 10 seconds):

Top Functions by CPU Time:
1. schedule (18.3%, kernel)
   - Core scheduler function
   - Stack trace shows calls from do_nanosleep, futex_wait
   - Normal overhead for workload with many blocking syscalls

2. copy_user_enhanced_fast_string (12.1%, kernel)
   - Memory copy operations
   - Heavy userspace ↔ kernel data transfers
   - Top callers: sys_read (45%), sys_write (38%)

3. mysql_execute_command (8.7%, mysqld)
   - User: /usr/bin/mysqld
   - File: sql/sql_parse.cc:4523
   - Most time in query parsing and execution

4. __GI___poll (5.2%, libc)
   - Blocking I/O operations
   - Stack: nginx worker → epoll_wait → poll
   - Shows nginx is I/O bound, not CPU bound

5. scx_bpf_dispatch (3.8%, [scx_rusty])
   - Scheduler BPF program overhead
   - P50: 245ns, P99: 1.2μs per call
   - Call rate: 8,200 calls/sec

Analysis:
- System is I/O and synchronization bound (schedule + poll = 23.5%)
- High memory copy overhead suggests optimization opportunity
- MySQL query parsing is consuming significant CPU
- Scheduler overhead is reasonable (3.8%)

Recommendations:
1. Investigate copy_user patterns - may benefit from batching
2. MySQL: Consider prepared statements to reduce parsing
3. No abnormal kernel spinlock or contention detected
4. Scheduler overhead is healthy for this workload
```

**You:** "Show me the kernel stack trace for the schedule function"

**Claude:** Here's the detailed stack trace:
```
schedule (kernel)
├─ do_nanosleep (42% of samples)
│  ├─ hrtimer_nanosleep
│  ├─ common_nsleep
│  └─ __x64_sys_nanosleep
│     └─ do_syscall_64
│        └─ entry_SYSCALL_64_after_hwframe
│
├─ futex_wait_queue (31% of samples)
│  ├─ futex_wait
│  ├─ do_futex
│  └─ __x64_sys_futex
│     └─ do_syscall_64
│        └─ entry_SYSCALL_64_after_hwframe
│
└─ io_schedule (27% of samples)
   ├─ wait_on_page_bit
   ├─ filemap_fault
   └─ __do_fault
      └─ do_fault
         └─ handle_pf_instruction
            └─ exc_page_fault

Most common paths show normal blocking patterns - no issues detected.
```

## Available Prompts

You can ask Claude to:
- **"Analyze my scheduler's performance"** → General performance analysis
- **"Debug high latency for process X"** → Latency investigation
- **"Analyze CPU load imbalance"** → Load balancing analysis
- **"Investigate scheduler behavior"** → Deep dive into scheduling decisions
- **"Summarize my system"** → Quick system overview
- **"Profile the system and identify bottlenecks"** → Stack trace profiling

## Profiling Events

The MCP server exposes available perf events and kprobe functions:

**Query available profiling events:**
```
"What perf events are available on this system?"
→ Claude reads: events://perf

"Show me available kprobe functions for profiling"
→ Claude reads: events://kprobe

"What profiling events would you recommend for analyzing scheduler overhead?"
→ Claude combines events://perf and events://kprobe data with recommendations

"Show me all loaded BPF programs and their overhead"
→ Claude reads: bpf://programs
→ Returns program IDs, names, types, runtime, and execution counts
```

## Advanced Queries

**Query specific statistics:**
```
"Show me per-CPU dispatch queue latencies"
→ Claude reads: stats://aggregated/dsq

"What's the current scheduler and its configuration?"
→ Claude reads: scheduler://current and stats://scheduler/raw

"List all scheduler-related perf events"
→ Claude calls: list_events tool with subsystem="sched"
```

**Combine multiple data sources:**
```
"Find processes with high scheduling latency and correlate with CPU placement"
→ Claude reads: stats://aggregated/process, stats://aggregated/cpu
→ Claude calls: get_topology to understand CPU relationships
```

**Time-based analysis (daemon mode):**
```
"Monitor for the next 60 seconds and tell me if you see any anomalies"
→ Claude subscribes: events://stream
→ Processes real-time sched_switch, wakeup, migration events
→ Reports patterns and outliers
```

## Benefits

1. **Natural Language Interface**: Ask questions in plain English
2. **Guided Workflows**: Claude follows structured analysis patterns
3. **Contextual Understanding**: Claude correlates multiple data sources
4. **Proactive Analysis**: In daemon mode, Claude can spot issues you didn't ask about
5. **Actionable Recommendations**: Get specific tuning suggestions based on your workload

## Claude Code Command Line Examples

Claude Code provides a CLI for direct interaction with the MCP server:

**Quick queries:**
```bash
# System overview
claude --mcp scxtop "Summarize my system's scheduler and hardware"

# Check specific process
claude --mcp scxtop "Analyze scheduling behavior for process 1234"

# Performance analysis
claude --mcp scxtop "Are there any CPU load imbalances?"

# Latency investigation
claude --mcp scxtop "Debug high scheduling latency issues"
```

**Interactive session:**
```bash
# Start an interactive session with scxtop MCP available
claude --mcp scxtop

# Now you can ask questions naturally:
> What scheduler is currently running?
> Show me per-CPU utilization
> Which processes have the highest scheduling latency?
> Compare load across NUMA nodes
```

**Using prompts directly:**
```bash
# Invoke a specific workflow prompt
claude --mcp scxtop --prompt summarize_system

# Prompt with arguments
claude --mcp scxtop --prompt debug_high_latency --arg pid=5678
claude --mcp scxtop --prompt analyze_scheduler_performance --arg focus_area=latency
```

**Combining with code tasks:**
```bash
# Analyze scheduler and suggest kernel config changes
claude --mcp scxtop "Analyze my scheduler performance and suggest \
  appropriate kernel configuration parameters for my workload"

# Generate a report
claude --mcp scxtop "Create a markdown report of my system's scheduler \
  performance including hardware topology, current utilization, and \
  recommendations" > scheduler-report.md
```

## Requirements

- scxtop built with MCP support
- Claude Desktop or Claude Code CLI
- Scheduler running (ideally sched_ext for full features)
- Root/sudo access for scxtop BPF operations