mecha10-diagnostics
Topic-based diagnostics and performance monitoring service for Mecha10.
Overview
The diagnostics service provides comprehensive performance monitoring through the framework's pub/sub system. All diagnostics are published to topics under the /diagnostics namespace, allowing real-time monitoring via CLI, dashboard, or custom subscribers.
Features
- Topic-based architecture: Seamlessly integrates with Mecha10's pub/sub system
- Streaming pipeline metrics: Frame pipeline, latency, encoding performance, bandwidth
- WebRTC metrics: Connection stats, quality metrics (RTT, packet loss, jitter)
- WebSocket metrics: Connection tracking, message rates
- Redis metrics: Connection pool, operation latency and throughput
- Docker metrics: Container CPU, memory, network, and I/O stats
- System metrics: Host CPU, memory, disk, and network usage
- Low overhead: Atomic counters on hot paths, background aggregation
Diagnostic Topics
All diagnostics use the /diagnostics namespace:
/diagnostics/streaming/pipeline # Frame pipeline metrics
/diagnostics/streaming/latency # Latency measurements
/diagnostics/streaming/encoding # Encoding performance
/diagnostics/streaming/bandwidth # Bandwidth usage
/diagnostics/webrtc/connections # WebRTC peer connections
/diagnostics/webrtc/quality # RTT, packet loss, jitter
/diagnostics/websocket/connections # WebSocket connections
/diagnostics/websocket/messages # WebSocket message stats
/diagnostics/redis/pool # Redis connection pool
/diagnostics/redis/operations # Redis operation metrics
/diagnostics/docker/containers # Docker container stats
/diagnostics/godot/performance # Godot FPS, frame time, physics
/diagnostics/godot/scene # Godot node count, memory
/diagnostics/godot/connection # Godot WebSocket health
/diagnostics/system/resources # System-wide resources
/diagnostics/node/{id}/health # Per-node health
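Because every topic shares the /diagnostics root and a category segment, subscribers can build and filter topic names with plain string handling. The sketch below is illustrative only: `node_health_topic` and `topic_category` are hypothetical helpers, not functions exported by this crate.

```rust
/// Root namespace shared by every diagnostic topic.
const DIAGNOSTICS_ROOT: &str = "/diagnostics";

/// Build the per-node health topic for a node id (hypothetical helper).
fn node_health_topic(node_id: &str) -> String {
    format!("{}/node/{}/health", DIAGNOSTICS_ROOT, node_id)
}

/// Return the category segment of a diagnostics topic, e.g. "streaming"
/// for "/diagnostics/streaming/latency". Returns None for topics outside
/// the namespace or with an empty category.
fn topic_category(topic: &str) -> Option<&str> {
    let rest = topic.strip_prefix(DIAGNOSTICS_ROOT)?.strip_prefix('/')?;
    rest.split('/').next().filter(|segment| !segment.is_empty())
}
```

A CLI or dashboard filter could compare `topic_category(topic)` against the category the user asked for, such as `Some("webrtc")`.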
Usage
Publishing Streaming Diagnostics
use *;
use *;
// Create collector
let collector = new;
// Record metrics on hot path (minimal overhead)
collector.record_frame_received;
collector.record_frame_encoded; // 5ms encode time
collector.record_frame_sent; // 10KB frame
// Publish aggregated metrics periodically (every 1-5 seconds)
collector.publish_all.await?;
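The split between cheap hot-path recording and periodic publishing can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the crate's implementation: `SketchStreamingCounters`, `Snapshot`, and `simulate` are hypothetical names, and the real collector publishes to topics rather than returning a struct.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hot-path state: one atomic per metric, shared between threads via Arc.
#[derive(Debug, Default)]
struct SketchStreamingCounters {
    frames_received: AtomicU64,
    frames_sent: AtomicU64,
    bytes_sent: AtomicU64,
}

/// Plain-value copy of the counters, taken off the hot path.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Snapshot {
    frames_received: u64,
    frames_sent: u64,
    bytes_sent: u64,
}

impl SketchStreamingCounters {
    fn record_frame_received(&self) {
        self.frames_received.fetch_add(1, Ordering::Relaxed);
    }

    fn record_frame_sent(&self, bytes: u64) {
        self.frames_sent.fetch_add(1, Ordering::Relaxed);
        self.bytes_sent.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Read all counters; a periodic publisher would serialize this value.
    fn snapshot(&self) -> Snapshot {
        Snapshot {
            frames_received: self.frames_received.load(Ordering::Relaxed),
            frames_sent: self.frames_sent.load(Ordering::Relaxed),
            bytes_sent: self.bytes_sent.load(Ordering::Relaxed),
        }
    }
}

/// Drive the counters from several worker threads, then snapshot once.
fn simulate(workers: usize, frames_per_worker: u64, frame_bytes: u64) -> Snapshot {
    let counters = Arc::new(SketchStreamingCounters::default());
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let shared = Arc::clone(&counters);
            thread::spawn(move || {
                for _ in 0..frames_per_worker {
                    shared.record_frame_received();
                    shared.record_frame_sent(frame_bytes);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    counters.snapshot()
}
```

Relaxed ordering is enough here because each counter is independent and the snapshot only needs eventually consistent totals.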
Subscribing to Diagnostics
use *;
use *;
// Subscribe to streaming pipeline metrics
let mut rx = ctx.subscribe.await?;
while let Some = rx.recv.await
Docker Metrics
// Create Docker collector
let docker = new.await;
// Collect metrics for all containers
docker.collect_all_containers.await?;
System Metrics
// Create system collector
let system = new;
// Collect and publish system metrics
system.collect_metrics.await?;
Godot Metrics
// Create Godot collector
let godot = new;
// Track connection events
godot.set_control_connected;
godot.set_camera_connected;
// Record messages
godot.record_control_message;
godot.record_camera_frame;
// Publish connection health
godot.publish_connection_metrics.await?;
// Publish performance metrics (data from Godot)
godot.publish_performance_metrics.await?;
CLI Integration
Monitor diagnostics in real-time:
# View all diagnostics
# Filter by category
# Live streaming metrics only
# Historical query
# Export to file
Dashboard Integration
The diagnostics service integrates with the dashboard via WebSocket:
- Real-time charts and visualizations
- Bottleneck detection
- Alerting on anomalies
- Historical analysis
Access at: http://localhost:3000/dashboard/diagnostics
Metric Types
Counter
Monotonically increasing value (e.g., frames received)
let counter = new;
counter.inc;
counter.add;
let total = counter.get;
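For readers who want to see what a counter of this kind looks like internally, here is a minimal sketch built on `AtomicU64`. The type name `SketchCounter` is an assumption for illustration and is not the crate's actual type.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Monotonic counter backed by a single atomic integer (illustrative only).
#[derive(Debug, Default)]
struct SketchCounter {
    value: AtomicU64,
}

impl SketchCounter {
    fn new() -> Self {
        Self::default()
    }

    /// Increment by one.
    fn inc(&self) {
        self.add(1);
    }

    /// Increment by an arbitrary amount; the value never decreases.
    fn add(&self, n: u64) {
        self.value.fetch_add(n, Ordering::Relaxed);
    }

    /// Read the current total.
    fn get(&self) -> u64 {
        self.value.load(Ordering::Relaxed)
    }
}
```

All methods take `&self`, so the counter can be shared across threads behind an `Arc` without a lock.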
Gauge
Current value that can go up or down (e.g., queue depth)
let gauge = new;
gauge.set;
gauge.inc;
gauge.dec;
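A gauge differs from a counter only in that it may be set directly and may decrease. The following sketch assumes a signed 64-bit value; `SketchGauge` is a hypothetical name, not the crate's type.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Gauge holding a current value that may rise or fall (illustrative only).
#[derive(Debug, Default)]
struct SketchGauge {
    value: AtomicI64,
}

impl SketchGauge {
    fn new() -> Self {
        Self::default()
    }

    /// Overwrite the current value.
    fn set(&self, v: i64) {
        self.value.store(v, Ordering::Relaxed);
    }

    /// Raise the value by one.
    fn inc(&self) {
        self.value.fetch_add(1, Ordering::Relaxed);
    }

    /// Lower the value by one.
    fn dec(&self) {
        self.value.fetch_sub(1, Ordering::Relaxed);
    }

    /// Read the current value.
    fn get(&self) -> i64 {
        self.value.load(Ordering::Relaxed)
    }
}
```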
Histogram
Distribution tracking with percentiles (e.g., encoding latency)
let mut hist = for_latency?;
hist.record; // Record 5ms
let p95 = hist.p95;
let p99 = hist.p99;
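To make the percentile readings concrete, the sketch below computes them with the nearest-rank method over raw samples. It is an assumption-laden illustration: `SketchHistogram` is a hypothetical type, values are taken to be microseconds, and a production histogram would more likely use bucketed storage than keep every sample.

```rust
/// Histogram that keeps raw samples (in microseconds) and answers
/// percentile queries with the nearest-rank method (illustrative only).
#[derive(Debug, Default)]
struct SketchHistogram {
    samples_us: Vec<u64>,
}

impl SketchHistogram {
    fn new() -> Self {
        Self::default()
    }

    /// Record one observation, e.g. 5_000 for a 5 ms encode.
    fn record(&mut self, value_us: u64) {
        self.samples_us.push(value_us);
    }

    /// Nearest-rank percentile for p in 1..=100; None when there are no
    /// samples or p is out of range.
    fn percentile(&self, p: usize) -> Option<u64> {
        if self.samples_us.is_empty() || p == 0 || p > 100 {
            return None;
        }
        let mut sorted = self.samples_us.clone();
        sorted.sort_unstable();
        // rank = ceil(p * n / 100), computed with integer arithmetic.
        let rank = (p * sorted.len() + 99) / 100;
        Some(sorted[rank - 1])
    }

    fn p95(&self) -> Option<u64> {
        self.percentile(95)
    }

    fn p99(&self) -> Option<u64> {
        self.percentile(99)
    }
}
```

Sorting on every query is the expensive step, which is why percentile work of this kind belongs in background aggregation rather than on the hot path.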
Performance
The diagnostics system is designed for minimal overhead:
- Atomic counters: Lock-free increments on hot paths
- Background aggregation: Expensive operations (percentiles) run off hot path
- Rate limiting: Automatic 1-second intervals for publishing
- Zero-copy: Efficient Arc-based data sharing
Overhead on streaming pipeline: < 1% CPU, < 1 MB memory
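The rate-limiting idea can be sketched as a small gate that allows at most one publish per interval. This is a hedged illustration rather than the crate's mechanism: `PublishGate` and `try_acquire` are hypothetical names, and the current time is passed in explicitly so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

/// Allows at most one publish per `interval` (illustrative only).
#[derive(Debug)]
struct PublishGate {
    interval: Duration,
    last_publish: Option<Instant>,
}

impl PublishGate {
    fn new(interval: Duration) -> Self {
        Self {
            interval,
            last_publish: None,
        }
    }

    /// Returns true and records `now` if enough time has passed since the
    /// last allowed publish; returns false otherwise.
    fn try_acquire(&mut self, now: Instant) -> bool {
        match self.last_publish {
            Some(prev) if now.saturating_duration_since(prev) < self.interval => false,
            _ => {
                self.last_publish = Some(now);
                true
            }
        }
    }
}
```

A publisher loop would call `try_acquire(Instant::now())` before serializing a snapshot and skip the publish when it returns false.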
Architecture
┌─────────────────────────────────────────┐
│          Diagnostic Publishers          │
│  (Simulation Bridge, Nodes, Services)   │
└────────────────┬────────────────────────┘
                 │
                 ▼
         ┌───────────────┐
         │ Redis Pub/Sub │
         └───────────────┘
                 │
        ┌────────┴────────┐
        ▼                 ▼
    ┌────────┐       ┌──────────┐
    │  CLI   │       │Dashboard │
    └────────┘       └──────────┘
        │                 │
        ▼                 ▼
  ┌──────────────────────────────┐
  │      Telemetry Service       │
  │     (Historical Archive)     │
  └──────────────────────────────┘
Future Enhancements
- Redis instrumentation: Add detailed operation tracking to mecha10-core::Context
- WebRTC stats: Extract metrics from WebRTC peer connections
- WebSocket tracking: Instrument WebSocket connections
- Custom metrics: Allow nodes to publish custom diagnostic metrics
- Alerting: Automatic alerts on threshold violations
- Profiling integration: Hook into performance profiling tools
License
MIT