# Using scxtop MCP Server with Claude
## Configuration
### Claude Desktop
Add the scxtop MCP server to your Claude Desktop configuration file:
**macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
**Linux:** `~/.config/Claude/claude_desktop_config.json`
```json
{
  "mcpServers": {
    "scxtop": {
      "command": "/path/to/scxtop",
      "args": ["mcp", "--daemon"],
      "env": {}
    }
  }
}
```
After adding this configuration, restart Claude Desktop.
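To verify the binary before involving Claude at all, you can probe it by hand. A minimal sketch, assuming the server uses the standard MCP stdio transport (newline-delimited JSON-RPC 2.0 on stdin/stdout):
```bash
# Pipe an MCP initialize request into the server and read the first reply line;
# a JSON-RPC "result" object means the handshake works. Run with sudo since
# scxtop needs root for BPF.
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"probe","version":"0.0.1"}}}' \
  | sudo /path/to/scxtop mcp --daemon | head -n 1
```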
### Claude Code (CLI)
Configure the MCP server for Claude Code:
**Configuration file:** `.mcp.json` in your project root (project scope); to make the server available in every project instead, register it with `claude mcp add --scope user`
```json
{
  "mcpServers": {
    "scxtop": {
      "command": "sudo",
      "args": ["/path/to/scxtop", "mcp", "--daemon"],
      "env": {}
    }
  }
}
```
**Or register it with the CLI:**
```bash
# Add the MCP server (everything after -- is the server command and its args)
claude mcp add scxtop -- sudo /path/to/scxtop mcp --daemon
# Verify it's configured
claude mcp list
# Test it with a one-shot query in print mode
claude -p "Show me the current scheduler"
```
**Note:** scxtop needs root privileges for its BPF operations, so the server command is wrapped in `sudo`. Because MCP servers are launched without an interactive terminal, `sudo` must succeed without prompting for a password.
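One way to arrange that is a scoped sudoers rule. A sketch, assuming your username is `youruser` and the path matches your install (adjust both):
```bash
# Use visudo so a syntax error cannot lock you out of sudo entirely.
sudo visudo -f /etc/sudoers.d/scxtop
# Add this single line in the editor; it grants passwordless root for the
# scxtop binary only:
#   youruser ALL=(root) NOPASSWD: /path/to/scxtop
```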
## Example Usage
### 1. System Overview
**You:** "Can you give me a summary of my system's scheduler?"
**Claude:** I'll use the `summarize_system` prompt to get a comprehensive overview.
*Claude invokes the prompt and follows the workflow, reading resources:*
- `topology://info` - Gets hardware layout
- `scheduler://current` - Identifies active scheduler
- `stats://aggregated/cpu` - CPU utilization
- `stats://aggregated/process` - Top processes
**Response:**
```
System Summary:
- Hardware: 16 CPUs (8 cores), 2 LLCs, 1 NUMA node, SMT enabled
- Active Scheduler: scx_rusty (sched_ext)
- CPU Utilization: Average 45%, range 32-67%
- Top Process: chrome (PID 1234) using 18% CPU time
- Context Switch Rate: 12,450/sec
- Load: Well balanced across CPUs (variance: 8%)
```
### 2. Latency Investigation
**You:** "Process 5678 is experiencing high scheduling latency. Can you investigate?"
**Claude:** I'll use the `debug_high_latency` prompt for PID 5678.
*Claude follows the debugging workflow:*
1. Reads `scheduler://current` to confirm sched_ext is active
2. Queries `stats://aggregated/process` filtered for PID 5678
3. Checks `stats://aggregated/cpu` for bottleneck CPUs
4. Uses `get_topology` tool to understand CPU layout
**Response:**
```
Latency Analysis for PID 5678:
Problem Found:
- Average scheduling latency: 2.3ms (high!)
- Process primarily runs on CPUs 4-7 (LLC 0)
- LLC 0 is heavily loaded (85% utilization vs 45% on LLC 1)
- Migration count: 47 migrations in last minute (excessive)
Root Cause:
The process is experiencing contention on LLC 0. Other CPU-intensive
processes on the same cache domain are causing:
- Long runqueue wait times
- Frequent migrations when seeking idle CPUs
- Cache thrashing from cross-LLC migrations
Recommendations:
1. Consider pinning task to specific CPUs in less loaded LLC 1
2. Adjust scheduler layer weights if using scx_layered
3. Check if workload can be split across cache domains
```
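If you want to try recommendation 1 by hand, `taskset` can repin the task. A sketch, assuming LLC 1 spans CPUs 8-15 in this example topology (confirm your own layout with `lscpu -e` first):
```bash
# Pin PID 5678 to LLC 1's CPUs so it stops contending on the loaded LLC 0.
sudo taskset -cp 8-15 5678
# Verify the new affinity took effect.
taskset -cp 5678
```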
### 3. Real-time Monitoring
**You:** "Monitor scheduler events in real-time and alert me if you see any issues"
**Claude:** I'll subscribe to the event stream to watch for anomalies.
*Claude subscribes to the `events://stream` resource (available in daemon mode)*
**Claude observes and reports:**
```
Real-time Monitoring Active...
[10:30:42] Normal: ~450 sched_switch events/sec, latencies < 100μs
[10:31:15] ⚠️ ALERT: Latency spike detected!
- CPU 5: DSQ latency jumped to 3.2ms
- Process: mysqld (PID 2345)
- 15 consecutive high-latency sched_switch events
- Appears to be blocking on lock contention
[10:31:22] Pattern identified:
- MySQL is migrating rapidly (8 migrations in 7 seconds)
- Moving between CPUs 4-7 on same LLC
- Suggest checking MySQL connection pool settings
[10:31:45] Latencies returning to normal (< 150μs)
```
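Event rates like these can be sanity-checked against stock tooling. For example, counting `sched_switch` events directly with `perf stat`:
```bash
# Count sched_switch events system-wide for 5 seconds; divide the total by 5
# to compare against the events/sec figures reported in the stream.
sudo perf stat -e sched:sched_switch -a -- sleep 5
```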
### 4. Performance Tuning
**You:** "My workload is throughput-focused. How should I optimize?"
**Claude:** I'll use the `analyze_scheduler_performance` prompt with focus on throughput.
*Claude uses the tools to gather data:*
- `query_stats` to discover available metrics
- Reads `stats://aggregated/llc` for cache efficiency
- Reads `stats://scheduler/raw` for scheduler-specific config
- Uses `get_topology` with full detail for NUMA layout
**Response:**
```
Throughput Optimization Analysis:
Current State:
- Context switch rate: 8,200/sec (moderate)
- CPU utilization: 78% average, but uneven (60-95% range)
- LLC hit rate: Good on LLC 0 (92%), poor on LLC 1 (76%)
- Migration rate: 120/sec (high for throughput workload)
Optimizations Recommended:
1. Reduce Migration Rate:
- Current scheduler (scx_rusty) is optimizing for latency
- Consider tuning migration threshold parameters
- Increase task_load_weight to make tasks "stickier"
2. Improve Cache Locality:
- LLC 1 shows poor cache efficiency
- Use topology to identify which processes run there
- Consider CPU pinning for cache-intensive workloads
3. Balance CPU Load:
- CPUs 8-11 are underutilized (60% vs 95% on CPUs 0-3)
- May indicate poor load balancing or affinity constraints
- Check for IRQ affinity issues with `list_events` tool
Expected Gains:
- Reducing migrations should improve throughput by 5-10%
- Better cache locality could improve by another 8-12%
- Balancing load prevents hotspot bottlenecks
```
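Recommendation 3 mentions IRQ affinity; that can also be inspected directly from procfs. A quick sketch (the IRQ number below is a placeholder; pick a real one from the first command's output):
```bash
# Per-CPU interrupt counts; one column dominating suggests skewed IRQ affinity.
cat /proc/interrupts
# CPUs a given IRQ may fire on (replace 24 with an IRQ number from above).
cat /proc/irq/24/smp_affinity_list
```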
### 5. Comparative Analysis
**You:** "Compare CPU usage across NUMA nodes"
**Claude:** Let me analyze the NUMA distribution.
*Claude queries:*
- `stats://aggregated/node` for per-node metrics
- `get_topology` for NUMA topology details
- `stats://aggregated/process` to see process placement
**Response:**
```
NUMA Node Analysis:
Node 0:
- CPUs: 0-7 (8 CPUs)
- Utilization: 82% average
- Process count: 47 processes
- Memory locality: 94% local accesses
- Top consumers: postgres (28%), nginx (15%)
Node 1:
- CPUs: 8-15 (8 CPUs)
- Utilization: 38% average
- Process count: 23 processes
- Memory locality: 87% local accesses
- Top consumers: backup-daemon (12%), monitoring (8%)
Imbalance Detected:
- Node 0 is heavily loaded while Node 1 is underutilized
- 2.2x difference in utilization
- No apparent affinity constraints preventing better balance
Recommendations:
- Check if scheduler is NUMA-aware (should prefer local memory)
- Consider manually distributing some Node 0 workloads
- Investigate why backup-daemon isn't utilizing available capacity
```
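The same imbalance can be cross-checked with standard NUMA tooling, if installed:
```bash
# Node, CPU, and memory layout as the kernel sees it.
numactl --hardware
# Per-node allocation counters; high numa_miss/numa_foreign indicates
# remote-memory traffic worth investigating.
numastat
```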
### 6. Stack Trace Profiling
**You:** "Profile the system for 10 seconds and show me the hottest functions"
**Claude:** I'll start perf profiling to collect stack traces.
*Claude uses the tools:*
- `start_perf_profiling` with freq=99, duration_secs=10
- Waits for collection to complete
- Uses `get_perf_results` to retrieve symbolized stack traces
**Response:**
```
Perf Profiling Results (99 Hz for 10 seconds):
Top Functions by CPU Time:
1. schedule (18.3%, kernel)
- Core scheduler function
- Stack trace shows calls from do_nanosleep, futex_wait
- Normal overhead for workload with many blocking syscalls
2. copy_user_enhanced_fast_string (12.1%, kernel)
- Memory copy operations
- Heavy userspace ↔ kernel data transfers
- Top callers: sys_read (45%), sys_write (38%)
3. mysql_execute_command (8.7%, mysqld)
- User: /usr/bin/mysqld
- File: sql/sql_parse.cc:4523
- Most time in query parsing and execution
4. __GI___poll (5.2%, libc)
- Blocking I/O operations
- Stack: nginx worker event loop → poll
- Shows nginx is I/O bound, not CPU bound
5. scx_bpf_dispatch (3.8%, [scx_rusty])
- Scheduler BPF program overhead
- P50: 245ns, P99: 1.2μs per call
- Call rate: 8,200 calls/sec
Analysis:
- System is I/O and synchronization bound (schedule + poll = 23.5%)
- High memory copy overhead suggests optimization opportunity
- MySQL query parsing is consuming significant CPU
- Scheduler overhead is reasonable (3.8%)
Recommendations:
1. Investigate copy_user patterns - may benefit from batching
2. MySQL: Consider prepared statements to reduce parsing
3. No abnormal kernel spinlock or contention detected
4. Scheduler overhead is healthy for this workload
```
**You:** "Show me the kernel stack trace for the schedule function"
**Claude:** Here's the detailed stack trace:
```
schedule (kernel)
├─ do_nanosleep (42% of samples)
│  └─ hrtimer_nanosleep
│     └─ common_nsleep
│        └─ __x64_sys_nanosleep
│           └─ do_syscall_64
│              └─ entry_SYSCALL_64_after_hwframe
│
├─ futex_wait_queue (31% of samples)
│  └─ futex_wait
│     └─ do_futex
│        └─ __x64_sys_futex
│           └─ do_syscall_64
│              └─ entry_SYSCALL_64_after_hwframe
│
└─ io_schedule (27% of samples)
   └─ wait_on_page_bit
      └─ filemap_fault
         └─ __do_fault
            └─ do_fault
               └─ handle_mm_fault
                  └─ exc_page_fault
Most common paths show normal blocking patterns - no issues detected.
```
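For comparison, a rough stock-`perf` equivalent of this profiling workflow (not necessarily how scxtop implements it internally):
```bash
# Sample all CPUs at 99 Hz for 10 seconds with call graphs, then print the
# hottest symbols; run as root so kernel frames symbolize.
sudo perf record -F 99 -a -g -- sleep 10
sudo perf report --stdio | head -n 40
```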
## Available Prompts
You can ask Claude to:
- **"Analyze my scheduler's performance"** → General performance analysis
- **"Debug high latency for process X"** → Latency investigation
- **"Analyze CPU load imbalance"** → Load balancing analysis
- **"Investigate scheduler behavior"** → Deep dive into scheduling decisions
- **"Summarize my system"** → Quick system overview
- **"Profile the system and identify bottlenecks"** → Stack trace profiling
## Profiling Events
The MCP server exposes available perf events and kprobe functions:
**Query available profiling events:**
```
"What perf events are available on this system?"
→ Claude reads: events://perf
"Show me available kprobe functions for profiling"
→ Claude reads: events://kprobe
"What profiling events would you recommend for analyzing scheduler overhead?"
→ Claude combines events://perf and events://kprobe data with recommendations
"Show me all loaded BPF programs and their overhead"
→ Claude reads: bpf://programs
→ Returns program IDs, names, types, runtime, and execution counts
```
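Outside an MCP session, stock tools expose similar inventories for cross-checking:
```bash
# Scheduler tracepoints perf can attach to.
sudo perf list 'sched:*'
# Kernel functions available as kprobe attach points.
sudo head /sys/kernel/debug/tracing/available_filter_functions
# Loaded BPF programs; run_time_ns/run_cnt columns appear once bpf_stats is on.
sudo sysctl kernel.bpf_stats_enabled=1
sudo bpftool prog list
```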
## Advanced Queries
**Query specific statistics:**
```
"Show me per-CPU dispatch queue latencies"
→ Claude reads: stats://aggregated/dsq
"What's the current scheduler and its configuration?"
→ Claude reads: scheduler://current and stats://scheduler/raw
"List all scheduler-related perf events"
→ Claude calls: list_events tool with subsystem="sched"
```
**Combine multiple data sources:**
```
"Find processes with high scheduling latency and correlate with CPU placement"
→ Claude reads: stats://aggregated/process, stats://aggregated/cpu
→ Claude calls: get_topology to understand CPU relationships
```
**Time-based analysis (daemon mode):**
```
"Monitor for the next 60 seconds and tell me if you see any anomalies"
→ Claude subscribes: events://stream
→ Processes real-time sched_switch, wakeup, migration events
→ Reports patterns and outliers
```
## Benefits
1. **Natural Language Interface**: Ask questions in plain English
2. **Guided Workflows**: Claude follows structured analysis patterns
3. **Contextual Understanding**: Claude correlates multiple data sources
4. **Proactive Analysis**: In daemon mode, Claude can spot issues you didn't ask about
5. **Actionable Recommendations**: Get specific tuning suggestions based on your workload
## Claude Code Command Line Examples
Once the server is registered, every Claude Code session can use it. For quick one-shot queries, use print mode (`claude -p`):
**Quick queries:**
```bash
# System overview
claude -p "Summarize my system's scheduler and hardware"
# Check specific process
claude -p "Analyze scheduling behavior for process 1234"
# Performance analysis
claude -p "Are there any CPU load imbalances?"
# Latency investigation
claude -p "Debug high scheduling latency issues"
```
**Interactive session:**
```bash
# Start an interactive session; registered MCP servers load automatically
claude
# Now you can ask questions naturally:
> What scheduler is currently running?
> Show me per-CPU utilization
> Which processes have the highest scheduling latency?
> Compare load across NUMA nodes
```
**Using prompts directly:**
```bash
# In an interactive session, MCP prompts are exposed as slash commands
# named /mcp__<server>__<prompt>:
> /mcp__scxtop__summarize_system
# Prompt arguments are passed after the command:
> /mcp__scxtop__debug_high_latency 5678
> /mcp__scxtop__analyze_scheduler_performance latency
```
**Combining with code tasks:**
```bash
# Analyze scheduler and suggest kernel config changes
claude -p "Analyze my scheduler performance and suggest \
appropriate kernel configuration parameters for my workload"
# Generate a report
claude -p "Create a markdown report of my system's scheduler \
performance including hardware topology, current utilization, and \
recommendations" > scheduler-report.md
```
## Requirements
- scxtop built with MCP support
- Claude Desktop or Claude Code CLI
- Scheduler running (ideally sched_ext for full features)
- Root/sudo access for scxtop BPF operations
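A minimal build sketch, assuming scxtop lives in the sched_ext `scx` workspace and MCP support is part of the default build (check the crate's `Cargo.toml` for feature flags before relying on this):
```bash
# Build scxtop from source; Rust toolchain plus clang/libbpf prerequisites apply.
git clone https://github.com/sched-ext/scx.git
cd scx
cargo build --release -p scxtop
# This is the binary path to reference in the MCP configuration above.
ls target/release/scxtop
```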