oxigdal-observability
OpenTelemetry-based observability, monitoring, and alerting for OxiGDAL.
Features
- OpenTelemetry Integration: Full support for distributed tracing, metrics, and structured logging
- Custom Geospatial Metrics: Specialized metrics for raster, vector, I/O, cache, query, GPU, and cluster operations
- Distributed Tracing: W3C Trace Context propagation across distributed services
- Real-time Dashboards: Pre-built Grafana dashboards and Prometheus recording rules
- Anomaly Detection: Statistical (Z-score, IQR) and ML-based (Isolation Forest, Autoencoder) anomaly detection
- SLO/SLA Monitoring: Service level objectives with error budget tracking and burn rate calculation
- Alert Management: Rule-based alerting with routing, deduplication, and escalation policies
- Metric Exporters: Support for Prometheus, StatsD, InfluxDB, and AWS CloudWatch
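W3C Trace Context propagation relies on a `traceparent` header that carries the trace id, parent span id, and flags between services. The sketch below is a standard-library-only illustration of parsing and re-emitting that header for the version `00` layout. `TraceParent`, `parse_traceparent`, and `format_traceparent` are hypothetical names chosen for this example and are not part of the oxigdal-observability API.

```rust
/// Fields of a W3C `traceparent` header, version `00` layout only.
/// (Illustrative type; not taken from oxigdal-observability.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: u128, // 16 bytes, must be non-zero
    pub parent_id: u64, // 8 bytes, must be non-zero
    pub sampled: bool,  // bit 0 of the trace-flags byte
}

/// True if `s` consists of exactly `len` lowercase hex digits.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parse `00-<trace-id>-<parent-id>-<flags>`; returns `None` if malformed.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let mut parts = header.trim().split('-');
    let version = parts.next()?;
    let trace = parts.next()?;
    let parent = parts.next()?;
    let flags = parts.next()?;
    // Version 00 defines exactly four fields.
    if version != "00" || parts.next().is_some() {
        return None;
    }
    if !(is_lower_hex(trace, 32) && is_lower_hex(parent, 16) && is_lower_hex(flags, 2)) {
        return None;
    }
    let trace_id = u128::from_str_radix(trace, 16).ok()?;
    let parent_id = u64::from_str_radix(parent, 16).ok()?;
    if trace_id == 0 || parent_id == 0 {
        return None; // all-zero ids are invalid
    }
    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id,
        parent_id,
        sampled: (flags & 0x01) == 0x01,
    })
}

/// Re-emit the header for an outgoing request.
/// Only the sampled bit is preserved; other flag bits are dropped.
pub fn format_traceparent(tp: &TraceParent) -> String {
    format!(
        "00-{:032x}-{:016x}-{:02x}",
        tp.trace_id,
        tp.parent_id,
        u8::from(tp.sampled)
    )
}
```

A service would typically call `parse_traceparent` on the incoming header, create a child span with a new parent id, and attach the output of `format_traceparent` to outgoing requests. In practice the OpenTelemetry SDK's propagator handles this; the sketch only shows the wire format.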
Installation
Add to your Cargo.toml:
[dependencies]
oxigdal-observability = "0.1.3"
Quick Start
Initialize Telemetry
use ;
async
Collect Custom Metrics
use global;
use GeoMetrics;
let meter = meter;
let metrics = new?;
// Record raster operations
metrics.raster.record_read;
metrics.raster.record_write;
// Record cache operations
metrics.cache.record_hit;
metrics.cache.record_miss;
// Record query operations
metrics.query.record_query;
Set Up SLO Monitoring
use ;
let mut monitor = new;
monitor.add_slo;
monitor.add_slo;
let tracker = new;
for slo in monitor.slos
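To make the error budget and burn rate terms concrete, here is a minimal standalone sketch of the arithmetic, assuming the common definitions: error budget = 1 - target, and burn rate = observed error rate / error budget. `SloWindow` and its methods are hypothetical and do not reflect the crate's actual types.

```rust
/// Request counts observed for one SLO over a fixed window.
/// (Illustrative type; not taken from oxigdal-observability.)
#[derive(Debug, Clone, Copy)]
pub struct SloWindow {
    pub target: f64, // e.g. 0.999 for "99.9% of requests succeed"
    pub total: u64,  // requests seen in the window
    pub failed: u64, // requests that violated the objective
}

impl SloWindow {
    /// Fraction of requests allowed to fail: 1 - target.
    pub fn error_budget(&self) -> f64 {
        (1.0 - self.target).max(0.0)
    }

    /// Observed failure fraction; 0 when no traffic was seen.
    pub fn error_rate(&self) -> f64 {
        if self.total == 0 {
            0.0
        } else {
            self.failed as f64 / self.total as f64
        }
    }

    /// How fast the budget is being consumed: 1.0 means exactly on budget,
    /// above 1.0 means the budget runs out before the window ends.
    pub fn burn_rate(&self) -> f64 {
        let budget = self.error_budget();
        let rate = self.error_rate();
        if budget == 0.0 {
            // A 100% target leaves no budget; any failure burns infinitely fast.
            if rate > 0.0 { f64::INFINITY } else { 0.0 }
        } else {
            rate / budget
        }
    }

    /// Fraction of the budget still unspent (negative once overspent).
    pub fn budget_remaining(&self) -> f64 {
        1.0 - self.burn_rate()
    }
}
```

For example, a 99.9% target with 50 failures out of 100,000 requests has an error rate of 0.05% against a 0.1% budget, so the burn rate is about 0.5 and roughly half the budget remains.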
Anomaly Detection
use ;
use Utc;
let mut detector = new;
// Establish baseline
let baseline = vec!;
detector.update_baseline?;
// Detect anomalies
let test_data = vec!;
let anomalies = detector.detect?;
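The two statistical methods named in the feature list, Z-score and IQR, can be sketched with the standard library alone. This is an illustration of the techniques, not the crate's detector: `mean_std`, `zscore_anomalies`, and `iqr_bounds` are hypothetical helpers, the standard deviation is the population form, and quartiles use linear interpolation (other conventions exist and give slightly different fences).

```rust
/// Population mean and standard deviation; `None` for an empty slice.
pub fn mean_std(data: &[f64]) -> Option<(f64, f64)> {
    if data.is_empty() {
        return None;
    }
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    let var = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some((mean, var.sqrt()))
}

/// Indices of `samples` whose |z-score| against the baseline exceeds `threshold`.
/// Returns an empty list if the baseline is empty or has zero spread.
pub fn zscore_anomalies(baseline: &[f64], samples: &[f64], threshold: f64) -> Vec<usize> {
    let (mean, std) = match mean_std(baseline) {
        Some((m, s)) if s > 0.0 => (m, s),
        _ => return Vec::new(),
    };
    samples
        .iter()
        .enumerate()
        .filter(|&(_, &x)| ((x - mean) / std).abs() > threshold)
        .map(|(i, _)| i)
        .collect()
}

/// Linearly interpolated quantile of an already sorted, non-empty slice.
fn quantile(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (pos - lo as f64) * (sorted[hi] - sorted[lo])
}

/// Tukey fences: (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR). Values outside are outliers.
pub fn iqr_bounds(baseline: &[f64]) -> Option<(f64, f64)> {
    if baseline.is_empty() {
        return None;
    }
    let mut sorted = baseline.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let q1 = quantile(&sorted, 0.25);
    let q3 = quantile(&sorted, 0.75);
    let iqr = q3 - q1;
    Some((q1 - 1.5 * iqr, q3 + 1.5 * iqr))
}
```

Z-score works well when the baseline is roughly normal; the IQR fences are less sensitive to a few extreme baseline values, which is why both are often offered side by side.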
Alert Management
use ;
let mut manager = new;
// Add alert rule
let rule = AlertRule ;
manager.add_rule;
// Add routing
let route = Route ;
manager.add_route;
// Evaluate and route alerts
let alerts = manager.evaluate_rules.await?;
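Deduplication usually means suppressing repeat notifications for the same alert within a time window. The following is a minimal sketch of that idea using only the standard library. `Deduplicator`, the fingerprint string, and the use of plain second counters for time are assumptions made for illustration; the crate's own manager may work differently.

```rust
use std::collections::HashMap;

/// Suppresses repeat notifications for the same alert fingerprint
/// within a fixed window. (Illustrative type; not the crate's API.)
pub struct Deduplicator {
    window_secs: u64,
    // Fingerprint -> timestamp (seconds) of the last notification sent.
    last_sent: HashMap<String, u64>,
}

impl Deduplicator {
    pub fn new(window_secs: u64) -> Self {
        Self {
            window_secs,
            last_sent: HashMap::new(),
        }
    }

    /// Returns true if the alert should be delivered now, and records the
    /// delivery time. Returns false if an identical alert was delivered
    /// less than `window_secs` ago.
    pub fn should_send(&mut self, fingerprint: &str, now_secs: u64) -> bool {
        let window = self.window_secs;
        let suppressed = self
            .last_sent
            .get(fingerprint)
            .map_or(false, |&last| now_secs.saturating_sub(last) < window);
        if !suppressed {
            self.last_sent.insert(fingerprint.to_string(), now_secs);
        }
        !suppressed
    }
}
```

A fingerprint would normally be derived from the rule name plus its label set, so that the same rule firing for two different hosts is treated as two distinct alerts.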
Performance
- Telemetry Overhead: < 1% in production configurations
- Metric Collection: < 100μs per metric operation
- Trace Sampling: Configurable from 0.01% to 100%
- Anomaly Detection: Real-time processing with minimal latency
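A configurable sampling rate is commonly implemented as a deterministic decision on the trace id, so that every service reaches the same verdict for a given trace. The sketch below shows one such scheme, comparing the low 64 bits of the trace id against a threshold derived from the ratio. It is an assumption-level illustration (`should_sample` is a hypothetical function) and is not guaranteed to match the exact algorithm used by the OpenTelemetry SDK or by this crate.

```rust
/// Deterministic ratio-based sampling decision.
/// `ratio` is the fraction of traces to keep, e.g. 0.0001 for 0.01%.
pub fn should_sample(trace_id: u128, ratio: f64) -> bool {
    if ratio >= 1.0 {
        return true; // keep everything
    }
    if ratio <= 0.0 {
        return false; // keep nothing
    }
    // Map the ratio onto the u64 range. The float-to-int cast saturates,
    // so this cannot overflow. A NaN ratio casts to 0 and samples nothing.
    let threshold = (ratio * (u64::MAX as f64)) as u64;
    // Use the low 64 bits of the trace id as a uniformly distributed value.
    (trace_id as u64) < threshold
}
```

Because the decision depends only on the trace id and the ratio, two services configured with the same ratio keep or drop the same traces without coordinating. This assumes trace ids are generated randomly.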
Architecture
Telemetry Stack
┌─────────────────────────────────────────┐
│            Application Code             │
└─────────────┬───────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│          oxigdal-observability          │
│  ┌──────────┐  ┌──────────┐             │
│  │ Metrics  │  │  Traces  │             │
│  └────┬─────┘  └────┬─────┘             │
│       │             │                   │
│  ┌────▼─────────────▼─────┐             │
│  │   OpenTelemetry SDK    │             │
│  └────┬───────────────────┘             │
└───────┼─────────────────────────────────┘
        │
        ▼
┌───────────────────────────────────────┐
│  Exporters (Jaeger, Prometheus, etc)  │
└───────────────────────────────────────┘
Metrics Categories
- Raster Metrics: Read/write operations, compression, tiles, overviews
- Vector Metrics: Feature operations, spatial queries, geometry processing
- I/O Metrics: File/network I/O, cloud storage, throughput, latency
- Cache Metrics: Hit/miss rates, evictions, prefetch efficiency
- Query Metrics: Query execution, planning, complexity, results
- GPU Metrics: Utilization, memory, kernel execution, transfers
- Cluster Metrics: Node health, data distribution, consensus, replication
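As an illustration of what one of these categories boils down to, the sketch below keeps cache hit and miss counters, derives a hit ratio, and renders the counters in the Prometheus text exposition format. It is a standalone stand-in built on the standard library: `CacheCounters` and the metric names `oxigdal_cache_hits_total` and `oxigdal_cache_misses_total` are hypothetical and are not taken from the crate.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Thread-safe hit/miss counters for a cache.
/// (Illustrative type; not the crate's cache metrics.)
#[derive(Default)]
pub struct CacheCounters {
    hits: AtomicU64,
    misses: AtomicU64,
}

impl CacheCounters {
    pub fn record_hit(&self) {
        self.hits.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_miss(&self) {
        self.misses.fetch_add(1, Ordering::Relaxed);
    }

    /// hits / (hits + misses), or `None` before any lookup was recorded.
    pub fn hit_ratio(&self) -> Option<f64> {
        let hits = self.hits.load(Ordering::Relaxed);
        let misses = self.misses.load(Ordering::Relaxed);
        let total = hits + misses;
        if total == 0 {
            None
        } else {
            Some(hits as f64 / total as f64)
        }
    }

    /// Render both counters in the Prometheus text exposition format.
    /// The metric names here are hypothetical.
    pub fn render_prometheus(&self) -> String {
        format!(
            "# TYPE oxigdal_cache_hits_total counter\n\
             oxigdal_cache_hits_total {}\n\
             # TYPE oxigdal_cache_misses_total counter\n\
             oxigdal_cache_misses_total {}\n",
            self.hits.load(Ordering::Relaxed),
            self.misses.load(Ordering::Relaxed),
        )
    }
}
```

Exporting raw counters and computing the ratio at query time (for example in a Prometheus recording rule) is usually preferable to exporting the ratio itself, since counters aggregate correctly across instances.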
Examples
See the examples/ directory for complete working examples:
- basic_telemetry.rs: Initialize telemetry with OpenTelemetry
- metrics_collection.rs: Collect custom geospatial metrics
- slo_monitoring.rs: Set up SLO monitoring with error budgets
- anomaly_detection.rs: Detect anomalies in metric streams
- alerting.rs: Configure alerts with routing and escalation
License
Licensed under the Apache License, Version 2.0.
Contributing
Contributions are welcome! Please see the main OxiGDAL repository for contribution guidelines.