# sync metrics proposal
## current
Sync activity is only visible via `writestead_mcp_tool_calls_by_tool_total{tool="wiki_sync"}` and `writestead_mcp_tool_errors_by_tool_total{tool="wiki_sync"}`. These count explicit MCP `wiki_sync` invocations.
Sync backend runs triggered by other paths are uncounted:
- background periodic sync (if any)
- sync-on-write triggered by `wiki_write` / `wiki_edit`
- CLI `writestead sync`
Effect: Grafana's "Sync Activity" panel stays flat unless someone manually calls the `wiki_sync` MCP tool — misleading, since syncs are happening on writes.
## proposed
Expose dedicated sync counters that cover every code path that runs the sync backend.
```
writestead_sync_runs_total{trigger="..."} # counter
writestead_sync_errors_total{trigger="..."} # counter
writestead_sync_duration_seconds_sum
writestead_sync_duration_seconds_count
```
Error label could carry `reason` too if useful (e.g., `auth`, `network`, `conflict`), but start without — the counter alone unblocks the dashboard.
## implementation sketch
1. Define the counters alongside existing raw/MCP counters in `server.rs` / wherever `McpState` lives.
2. In `syncer::sync_once` (or the single chokepoint that runs `ob sync`), take a `trigger: &'static str` arg, time the call, bump:
- `runs_total{trigger}` always
- `errors_total{trigger}` on error path
- `duration_seconds_{sum,count}` unconditionally
3. Every caller of `sync_once` passes its trigger label:
- MCP `wiki_sync` tool handler → `"mcp"`
- Post-write sync in `wiki_write` → `"write"`
- Post-edit sync in `wiki_edit` → `"edit"`
- CLI `writestead sync` → `"cli"`
- Any timer/background path → `"background"`
4. Expose the new series in `/metrics` (sorted label output, same style as `raw_reads_by_format`).
## dashboard follow-up
Grafana's Sync Activity panel switches to:
```
success/min: sum(rate(writestead_sync_runs_total[5m])) * 60
- sum(rate(writestead_sync_errors_total[5m])) * 60
errors/min: sum(rate(writestead_sync_errors_total[5m])) * 60
```
Optional: add per-trigger breakdown panel.
## notes
- Don't remove the existing `wiki_sync` MCP tool counter. It's still useful for "manual sync" observability.
- `sync_once` is already the natural chokepoint — no need to sprinkle metric bumps across callers, just thread the trigger label.
- This is pure observability work. No behavior change.
## priority
Low. Incident-safety already landed. Dashboard is slightly misleading, nothing is broken.