Metrics for observability.
Exports Prometheus-compatible metrics for:
- Peer connection status
- Stream tailing performance
- Replication lag
- Deduplication stats
- Circuit breaker state
- Batch processing stats
§Metric Naming Convention
All metrics are prefixed with replication_ and follow Prometheus conventions:
- Counters end in _total
- Gauges represent current state
- Histograms track distributions (duration, size)
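The convention above can be expressed as a small check. This is a minimal sketch, not part of this module: `MetricKind`, `follows_convention`, and the metric names in the usage note are hypothetical and exist only to illustrate the prefix and suffix rules. It assumes gauges and histograms need only the `replication_` prefix, since no suffix rule is stated for them.

```rust
/// Kinds of Prometheus metric named in the convention.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetricKind {
    Counter,
    Gauge,
    Histogram,
}

/// Hypothetical helper (not provided by this module): returns true when
/// `name` starts with `replication_` and, for counters, ends in `_total`
/// with a non-empty stem in between.
pub fn follows_convention(name: &str, kind: MetricKind) -> bool {
    // Every metric must carry the shared prefix.
    let Some(rest) = name.strip_prefix("replication_") else {
        return false;
    };
    match kind {
        // Counters additionally need the `_total` suffix after a real stem.
        MetricKind::Counter => rest
            .strip_suffix("_total")
            .map_or(false, |stem| !stem.is_empty()),
        // Assumption: no suffix rule for gauges or histograms.
        MetricKind::Gauge | MetricKind::Histogram => !rest.is_empty(),
    }
}
```

Under these assumptions, `follows_convention("replication_cdc_events_read_total", MetricKind::Counter)` holds, while the same name without `_total` does not.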
§Usage
use replication_engine::metrics;
use std::time::Duration;
// In hot_path after reading events
metrics::record_cdc_events_read("peer-1", 42);
// In batch processor after flush
metrics::record_batch_flush("peer-1", 100, 85, 5, 8, 2, Duration::from_millis(50));
Functions§
- cursor_retries_total - Record cursor SQLite retry (for SQLITE_BUSY/SQLITE_LOCKED).
- record_adaptive_batch_size - Record current adaptive batch size for a peer.
- record_backpressure_pause - Record backpressure pause (sync-engine under load).
- record_batch_dedup - Record batch dedup stats (for monitoring dedup efficiency).
- record_batch_flush - Record batch flush with detailed stats.
- record_cdc_events_applied - Record CDC events applied (not deduplicated).
- record_cdc_events_deduped - Record CDC events deduplicated (skipped).
- record_cdc_events_read - Record CDC events read from a peer.
- record_circuit_call - Record circuit breaker call outcome.
- record_circuit_rejection - Record circuit breaker rejection (circuit was open).
- record_cursor_flush - Record cursor flush batch (debounced writes).
- record_cursor_persist - Record cursor persistence.
- record_error - Record errors by type.
- record_event_processing_latency - Record event processing latency.
- record_merkle_divergence - Record divergent peer detected during repair.
- record_peer_circuit_state - Record peer circuit breaker state change.
- record_peer_connection - Record a peer connection event.
- record_peer_operation_latency - Record peer Redis operation latency by operation type. Useful for tracking Merkle queries, item fetches, etc.
- record_peer_ping - Record peer ping result.
- record_peer_ping_latency - Record peer ping latency (for idle peer health checks).
- record_peer_state - Record peer connection state.
- record_repair_cycle - Record cold path repair cycle.
- record_repair_cycle_complete - Record cold path repair cycle completion.
- record_repair_skipped - Record cold path repair cycle skipped.
- record_replication_lag - Record replication lag (time since last successful sync).
- record_replication_lag_events - Record replication lag in events (how many events behind stream head).
- record_replication_lag_ms - Record replication lag in milliseconds (based on stream ID timestamps).
- record_slo_violation - Record an SLO violation (latency threshold exceeded).
- record_stream_read - Record stream read result.
- record_stream_read_latency - Record stream read (XREAD) latency.
- record_stream_trimmed - Record stream trimmed event (potential data gap).
- set_circuit_state - Set circuit breaker state gauge (0=closed, 1=half_open, 2=open).
- set_connected_peers - Gauge for number of connected peers.
- set_engine_state - Gauge for engine state.
- set_replication_lag_slo - Set current replication lag gauge (for SLO monitoring).
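The numeric encoding documented for set_circuit_state (0=closed, 1=half_open, 2=open) could be kept in one place with a small enum. This is a sketch under stated assumptions: `CircuitState`, `gauge_value`, and `from_gauge_value` are hypothetical names, not this crate's types, and the actual signature of set_circuit_state is not shown here.

```rust
/// Hypothetical circuit breaker state, mirroring the documented gauge values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed,
    HalfOpen,
    Open,
}

impl CircuitState {
    /// Encode the state as the documented gauge value:
    /// 0 = closed, 1 = half_open, 2 = open.
    pub fn gauge_value(self) -> f64 {
        match self {
            CircuitState::Closed => 0.0,
            CircuitState::HalfOpen => 1.0,
            CircuitState::Open => 2.0,
        }
    }

    /// Decode a gauge reading back into a state; any value other than
    /// exactly 0, 1, or 2 yields `None`.
    pub fn from_gauge_value(value: f64) -> Option<Self> {
        if value == 0.0 {
            Some(CircuitState::Closed)
        } else if value == 1.0 {
            Some(CircuitState::HalfOpen)
        } else if value == 2.0 {
            Some(CircuitState::Open)
        } else {
            None
        }
    }
}
```

Because open maps to the highest value, an alert on "gauge value of 2" would match only fully open circuits, while "greater than 0" would also catch half-open ones.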