oxigdal-observability 0.1.3

OpenTelemetry-based observability, monitoring, and alerting for OxiGDAL

Features

  • OpenTelemetry Integration: Full support for distributed tracing, metrics, and structured logging
  • Custom Geospatial Metrics: Specialized metrics for raster, vector, I/O, cache, query, GPU, and cluster operations
  • Distributed Tracing: W3C Trace Context propagation across distributed services
  • Real-time Dashboards: Pre-built Grafana dashboards and Prometheus recording rules
  • Anomaly Detection: Statistical (Z-score, IQR) and ML-based (Isolation Forest, Autoencoder) anomaly detection
  • SLO/SLA Monitoring: Service level objectives with error budget tracking and burn rate calculation
  • Alert Management: Rule-based alerting with routing, deduplication, and escalation policies
  • Metric Exporters: Support for Prometheus, StatsD, InfluxDB, and AWS CloudWatch
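To make the W3C Trace Context item above more concrete, here is a minimal, std-only sketch of parsing the `traceparent` header that carries trace identity between services. It handles only version `00` of the header. `TraceParent` and `parse_traceparent` are hypothetical names used for illustration and are not part of the oxigdal-observability API; in practice the OpenTelemetry SDK performs this step.

```rust
/// Fields of a W3C `traceparent` header (version 00).
/// Hypothetical type for illustration; not part of this crate's API.
#[derive(Debug, PartialEq)]
pub struct TraceParent {
    pub trace_id: String,
    pub parent_id: String,
    pub sampled: bool,
}

/// True if `s` is exactly `len` lowercase hexadecimal characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parse `00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>`.
/// Returns None for malformed headers or all-zero identifiers.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero trace or parent IDs are invalid.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        // The lowest flag bit marks the trace as sampled.
        sampled: (flags & 0x01) == 0x01,
    })
}
```

A downstream service that receives this header can continue the same trace by reusing `trace_id` and recording `parent_id` as the parent of its own span.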

Installation

Add to your Cargo.toml:

[dependencies]
oxigdal-observability = "0.1.3"

Quick Start

Initialize Telemetry

use oxigdal_observability::telemetry::{TelemetryConfig, init_with_config};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = TelemetryConfig::new("oxigdal")
        .with_service_version("1.0.0")
        .with_jaeger_endpoint("localhost:6831")
        .with_prometheus_endpoint("localhost:9090")
        .with_sampling_rate(0.1);
    
    let provider = init_with_config(config).await?;
    
    // Your application logic
    
    provider.shutdown().await?;
    Ok(())
}
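
The `with_sampling_rate(0.1)` call above asks for roughly one trace in ten to be kept. How the crate implements this is not documented here. The sketch below shows one common approach: a deterministic decision derived from the trace ID, so that every service reaches the same verdict for the same trace. `should_sample` is a hypothetical helper, not part of the crate.

```rust
/// Decide whether to keep a trace, given its 128-bit trace ID and a
/// sampling rate in [0.0, 1.0]. Hypothetical helper for illustration.
pub fn should_sample(trace_id: u128, rate: f64) -> bool {
    if rate >= 1.0 {
        return true;
    }
    if rate <= 0.0 || rate.is_nan() {
        return false;
    }
    // Treat the low 64 bits of the trace ID as a uniformly distributed value.
    let value = trace_id as u64;
    // Map the rate onto the u64 range; float-to-int casts saturate in Rust.
    let threshold = (rate * u64::MAX as f64) as u64;
    value < threshold
}
```

Because the decision depends only on the trace ID and the rate, no coordination between services is needed as long as they share the same rate.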

Collect Custom Metrics

use opentelemetry::global;
use oxigdal_observability::metrics::GeoMetrics;

let meter = global::meter("oxigdal");
let metrics = GeoMetrics::new(meter)?;

// Record raster operations
metrics.raster.record_read(125.5, 1048576, "GeoTIFF", true);
metrics.raster.record_write(89.3, 524288, "GeoTIFF", true);

// Record cache operations
metrics.cache.record_hit("tile_cache", 8192);
metrics.cache.record_miss("tile_cache");

// Record query operations
metrics.query.record_query(45.2, "spatial", 150, true);

Set Up SLO Monitoring

use oxigdal_observability::slo::{
    budgets::BudgetTracker,
    objectives::{AvailabilitySlo, LatencySlo},
    SloMonitor,
};

let mut monitor = SloMonitor::new();
monitor.add_slo(AvailabilitySlo::three_nines());
monitor.add_slo(LatencySlo::p95_100ms());

let tracker = BudgetTracker::new();
for slo in monitor.slos() {
    tracker.track(slo.name.clone(), slo.error_budget.clone());
}
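
For readers new to the terms, the arithmetic behind error budgets and burn rates can be sketched in plain Rust. This uses the usual SRE definitions and is not the crate's `BudgetTracker` implementation; all function names are hypothetical, and `target` is assumed to be a fraction below 1.0 (for example 0.999 for three nines).

```rust
/// Fraction of requests allowed to fail for an availability target.
/// A 0.999 target leaves a budget of about 0.001.
pub fn error_budget(target: f64) -> f64 {
    1.0 - target
}

/// Burn rate: observed error rate divided by the budgeted error rate.
/// 1.0 spends the budget exactly over the SLO window; higher values
/// exhaust it early. Assumes `target < 1.0`.
pub fn burn_rate(failed: u64, total: u64, target: f64) -> f64 {
    if total == 0 {
        return 0.0; // no traffic, nothing burned
    }
    let observed = failed as f64 / total as f64;
    observed / error_budget(target)
}

/// Fraction of the error budget still unspent for the requests seen so far.
/// Goes negative once the budget is overspent. Assumes `target < 1.0`.
pub fn budget_remaining(failed: u64, total: u64, target: f64) -> f64 {
    if total == 0 {
        return 1.0;
    }
    let allowed_failures = total as f64 * error_budget(target);
    1.0 - failed as f64 / allowed_failures
}
```

For example, 5 failures in 1,000 requests against a 99.9% target gives a burn rate of about 5, meaning the budget is being consumed five times faster than the target allows.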

Anomaly Detection

use oxigdal_observability::anomaly::{
    statistical::ZScoreDetector,
    AnomalyDetector,
    DataPoint,
};
use chrono::Utc;

let mut detector = ZScoreDetector::new(3.0);

// Establish baseline
let baseline = vec![/* your historical data */];
detector.update_baseline(&baseline)?;

// Detect anomalies
let test_data = vec![DataPoint::new(Utc::now(), 50.0)];
let anomalies = detector.detect(&test_data)?;
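
Conceptually, a Z-score detector with a threshold of 3.0 flags any value that lies more than three standard deviations from the baseline mean. The std-only sketch below illustrates that idea. It is not the crate's `ZScoreDetector` implementation, and `Baseline` and its methods are hypothetical names.

```rust
/// Mean and standard deviation of a baseline sample.
/// Hypothetical type for illustration; not part of this crate's API.
pub struct Baseline {
    mean: f64,
    std_dev: f64,
}

impl Baseline {
    /// Build a baseline from historical values. Returns None when there
    /// are fewer than two values or the values show no variation.
    pub fn from_values(values: &[f64]) -> Option<Self> {
        if values.len() < 2 {
            return None;
        }
        let n = values.len() as f64;
        let mean = values.iter().sum::<f64>() / n;
        // Sample variance (n - 1 in the denominator).
        let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
        let std_dev = variance.sqrt();
        if std_dev == 0.0 {
            return None;
        }
        Some(Self { mean, std_dev })
    }

    /// Number of standard deviations between `value` and the baseline mean.
    pub fn z_score(&self, value: f64) -> f64 {
        (value - self.mean) / self.std_dev
    }

    /// True when the absolute Z-score exceeds `threshold`.
    pub fn is_anomaly(&self, value: f64, threshold: f64) -> bool {
        self.z_score(value).abs() > threshold
    }
}
```

With a baseline of values clustered around 11, a reading of 50.0 is flagged at a threshold of 3.0, while 12.0 is not.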

Alert Management

use std::sync::Arc;

use oxigdal_observability::alerting::{
    Alert, AlertManager, AlertSeverity,
    rules::AlertRule,
    routing::{Destination, Route},
};

let mut manager = AlertManager::new();

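// Add alert rule. `check_error_rate()` is a placeholder for your own function
// that returns the current error rate as a fraction.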
let rule = AlertRule {
    name: "high_error_rate".to_string(),
    condition: Arc::new(|| check_error_rate() > 0.05),
    severity: AlertSeverity::High,
    message: "Error rate exceeded 5%".to_string(),
};
manager.add_rule(rule);

// Add routing
let route = Route {
    matcher: Box::new(|alert| alert.severity == AlertSeverity::High),
    destinations: vec![
        Destination::Slack {
            webhook_url: "https://hooks.slack.com/...".to_string(),
        },
    ],
};
manager.add_route(route);

// Evaluate and route alerts
let alerts = manager.evaluate_rules().await?;
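
Deduplication is listed among the alert-management features but is not shown in the snippet above. As a rough sketch of the idea, assuming alerts are identified by their rule name, repeated firings inside a suppression window can be dropped as follows. `Deduplicator` is a hypothetical type written for illustration, not the crate's API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Suppresses repeated alerts with the same key inside a time window.
/// Hypothetical type for illustration; not part of this crate's API.
pub struct Deduplicator {
    window: Duration,
    last_sent: HashMap<String, Instant>,
}

impl Deduplicator {
    pub fn new(window: Duration) -> Self {
        Self {
            window,
            last_sent: HashMap::new(),
        }
    }

    /// Returns true if an alert with this key should be delivered at `now`.
    /// The window is measured from the last delivered alert, so suppressed
    /// repeats do not extend it.
    pub fn should_send(&mut self, key: &str, now: Instant) -> bool {
        let window = self.window;
        let suppressed = self
            .last_sent
            .get(key)
            .map_or(false, |&prev| now.saturating_duration_since(prev) < window);
        if suppressed {
            return false;
        }
        self.last_sent.insert(key.to_string(), now);
        true
    }
}
```

Passing the timestamp in explicitly, rather than calling `Instant::now()` inside the method, keeps the logic easy to test.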

Performance

  • Telemetry Overhead: < 1% in production configurations
  • Metric Collection: < 100μs per metric operation
  • Trace Sampling: Configurable from 0.01% to 100%
  • Anomaly Detection: Real-time processing with minimal latency
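
Figures like these depend on hardware and configuration, so it can be worth measuring on your own setup. A rough, std-only way to estimate per-operation cost is to time many calls and take the average. `mean_micros_per_op` is a hypothetical helper; the closure is where a call such as `metrics.cache.record_miss("tile_cache")` would go. For rigorous numbers, a dedicated benchmarking harness is the better choice.

```rust
use std::time::Instant;

/// Run `op` `iterations` times and return the mean wall-clock cost of
/// one call in microseconds. Hypothetical helper for illustration.
pub fn mean_micros_per_op<F: FnMut()>(iterations: u32, mut op: F) -> f64 {
    assert!(iterations > 0, "iterations must be non-zero");
    let start = Instant::now();
    for _ in 0..iterations {
        op();
    }
    // Total elapsed seconds, converted to microseconds, divided per call.
    start.elapsed().as_secs_f64() * 1e6 / f64::from(iterations)
}
```

Run the measurement in a release build with a large iteration count, since very short timings are noisy.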

Architecture

Telemetry Stack

┌─────────────────────────────────────────┐
│         Application Code                │
└─────────────┬───────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│    oxigdal-observability                │
│  ┌──────────┐  ┌──────────┐             │
│  │ Metrics  │  │  Traces  │             │
│  └────┬─────┘  └────┬─────┘             │
│       │             │                   │
│  ┌────▼─────────────▼─────┐             │
│  │   OpenTelemetry SDK    │             │
│  └────┬───────────────────┘             │
└───────┼─────────────────────────────────┘
        │
        ▼
┌───────────────────────────────────────┐
│  Exporters (Jaeger, Prometheus, etc.) │
└───────────────────────────────────────┘

Metrics Categories

  • Raster Metrics: Read/write operations, compression, tiles, overviews
  • Vector Metrics: Feature operations, spatial queries, geometry processing
  • I/O Metrics: File/network I/O, cloud storage, throughput, latency
  • Cache Metrics: Hit/miss rates, evictions, prefetch efficiency
  • Query Metrics: Query execution, planning, complexity, results
  • GPU Metrics: Utilization, memory, kernel execution, transfers
  • Cluster Metrics: Node health, data distribution, consensus, replication

Examples

See the examples/ directory for complete working examples:

  • basic_telemetry.rs: Initialize telemetry with OpenTelemetry
  • metrics_collection.rs: Collect custom geospatial metrics
  • slo_monitoring.rs: Set up SLO monitoring with error budgets
  • anomaly_detection.rs: Detect anomalies in metric streams
  • alerting.rs: Configure alerts with routing and escalation

License

Licensed under the Apache License, Version 2.0.

Contributing

Contributions are welcome! Please see the main OxiGDAL repository for contribution guidelines.