absurder-sql 0.1.23

# AbsurderSQL Production Monitoring Stack

Complete observability setup for production deployments using AbsurderSQL with `--features telemetry`.

## Overview

The monitoring stack provides:
- **4 Grafana Dashboards** with 28 panels for real-time visibility
- **18 Alert Rules** with severity-based routing
- **26 Recording Rules** for pre-aggregated metrics
- **Complete Runbooks** for every alert type
- **Browser DevTools Extension** for WASM telemetry debugging

## Quick Start

### 1. Enable Telemetry

Build AbsurderSQL with telemetry support:

```bash
# For native applications
cargo build --features telemetry

# For WASM/browser
wasm-pack build --target web --features telemetry
```

### 2. Expose Metrics Endpoint

Add a `/metrics` endpoint to your application. See examples in the main README for axum and actix-web.

### 3. Configure Prometheus

Add this job to your `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'absurdersql'
    static_configs:
      - targets: ['localhost:9090']
    scrape_interval: 15s
```

### 4. Import Grafana Dashboards

Import the JSON files from `grafana/` into your Grafana instance:
- `query_performance.json` - Query metrics and slow query detection
- `storage_operations.json` - Block I/O and cache performance
- `system_health.json` - Error rates and system status
- `multi_tab_coordination.json` - Multi-tab sync debugging

### 5. Load Alert Rules

Add `prometheus/alert_rules.yml` to your Prometheus configuration:

```yaml
rule_files:
  - /path/to/absurdersql/monitoring/prometheus/alert_rules.yml
```

### 6. Install Browser DevTools Extension (Optional)

For WASM/browser deployments:
1. Open Chrome: `chrome://extensions/`
2. Enable "Developer mode"
3. Click "Load unpacked"
4. Select `../browser-extension/` directory

See `../browser-extension/INSTALLATION.md` for complete instructions.

## Dashboards

### Query Performance Dashboard
Monitors SQL query execution and identifies performance bottlenecks:
- Query latency percentiles (p50, p90, p99)
- Query rate and error rate
- Slow query detection (>100ms)
- Query type breakdown (SELECT/INSERT/UPDATE/DELETE)

### Storage Operations Dashboard
Tracks block storage performance and cache efficiency:
- Block read/write rates
- Cache hit ratio and effectiveness
- Storage layer latency
- Block allocation metrics

### System Health Dashboard
Overall system health and error tracking:
- Total error rate by type
- Transaction success rate
- Active connections
- Resource utilization

### Multi-Tab Coordination Dashboard
Debugging multi-tab synchronization:
- Leader election status
- Write queue depth
- Broadcast channel activity
- Sync operation latency

## Alert Rules

### Critical Alerts (Page Database Team)
- **HighErrorRate** - >5% errors over 5 minutes
- **ExtremeQueryLatency** - p99 query time >1 second
- **NoQueryThroughput** - Zero queries for 5 minutes (potential deadlock)
- **StorageFailures** - >3 storage failures per minute

### Warning Alerts (Notify Database Team)
- **ElevatedErrorRate** - >2% errors over 5 minutes
- **SlowQueryDetected** - Queries consistently >100ms
- **LowCacheHitRate** - Cache hit rate <60%
- **MultiTabSyncDelayed** - Sync operations >500ms

### Info Alerts (Log Only)
- **LeaderElected** - New tab became leader
- **FirstQueryExecuted** - Database initialization

See `RUNBOOK.md` for detailed remediation procedures for each alert.

## Recording Rules

Pre-aggregated metrics for faster dashboard queries:

- `absurdersql:query_rate` - Queries per second
- `absurdersql:error_rate` - Errors per second
- `absurdersql:error_ratio` - Error percentage
- `absurdersql:query_latency_p99` - 99th percentile latency
- `absurdersql:cache_hit_ratio` - Cache effectiveness
- And 21 more...

## Browser DevTools Extension

For debugging WASM telemetry in the browser:

**Features:**
- Real-time span list with search/filtering
- Export statistics visualization
- Buffer inspection
- Manual flush trigger
- OTLP endpoint configuration

**Installation:** See `../browser-extension/INSTALLATION.md`

## Architecture

```
Application Code
       ↓
  [AbsurderSQL]
       ↓
  Observability Layer (--features telemetry)
       ↓
   ┌───┴───┐
   ↓       ↓
Prometheus  OpenTelemetry
   ↓       ↓
Grafana  DevTools Extension
   ↓
 Alerts
   ↓
Runbook Procedures
```

## Files

```
monitoring/
├── README.md                      # This file
├── RUNBOOK.md                     # Alert remediation procedures
├── grafana/                       # Grafana dashboards
│   ├── query_performance.json     # 7 panels
│   ├── storage_operations.json    # 6 panels
│   ├── system_health.json         # 8 panels
│   └── multi_tab_coordination.json # 7 panels
└── prometheus/                    # Prometheus configuration
    └── alert_rules.yml            # 18 alerts + 26 recording rules
```

## Next Steps

1. **Test Setup** - Use `../examples/devtools_demo.html` to generate test telemetry
2. **Configure Alertmanager** - Set up alert routing and notification channels
3. **Tune Thresholds** - Adjust alert thresholds based on your workload
4. **Monitor Dashboard** - Regularly review dashboards for anomalies
5. **Follow Runbooks** - Use `RUNBOOK.md` when alerts fire

## Troubleshooting

### Metrics Not Appearing
- Verify `--features telemetry` was enabled during build
- Check `/metrics` endpoint is accessible
- Confirm Prometheus is scraping (check Prometheus UI targets page)

### Dashboards Showing No Data
- Verify Prometheus datasource is configured in Grafana
- Check time range selector (default: last 1 hour)
- Ensure application is generating queries

### Alerts Not Firing
- Check Prometheus Rules page to see if rules are loaded
- Verify alert expressions evaluate to true (test in Prometheus UI)
- Confirm Alertmanager is configured

### DevTools Extension Not Receiving Data
- Verify extension is loaded (check `chrome://extensions/`)
- Open browser console and look for `[Content]`, `[DevTools]`, `[Panel]` logs
- Ensure demo page is using `window.postMessage` to send telemetry

## Support

For issues or questions:
1. Check `RUNBOOK.md` for common problems
2. Review dashboard panels for debugging clues
3. Check Prometheus logs for scraping errors
4. Inspect DevTools extension console for message flow