OxiGDAL Cluster - Distributed geospatial processing at scale.
This crate provides a comprehensive distributed computing framework for geospatial data processing, including:
- Distributed Scheduler: Work-stealing scheduler with priority queues and dynamic load balancing
- Task Graph Engine: DAG-based task dependencies with parallel execution planning (see the sketch after this list)
- Worker Pool Management: Worker registration, health monitoring, and automatic failover
- Data Locality Optimization: Minimize data transfer through intelligent task placement
- Fault Tolerance: Task retry, speculative execution, and checkpointing
- Distributed Cache: Cache coherency with compression and distributed LRU
- Data Replication: Quorum-based reads/writes with automatic re-replication
- Cluster Coordinator: Raft-based consensus and leader election
- Advanced Scheduling: Gang scheduling, fair-share, deadline-based, and multi-queue scheduling
- Resource Management: Quota management, resource reservation, and usage accounting
- Network Optimization: Topology-aware scheduling, bandwidth tracking, and congestion control
- Workflow Engine: Complex workflow orchestration with conditional execution and loops
- Autoscaling: Dynamic cluster sizing based on load metrics and predictive scaling
- Monitoring & Alerting: Real-time metrics, custom alerts, and anomaly detection
- Security & Access Control: Authentication, RBAC, audit logging, and secret management
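
As an illustration of the task-graph engine, the sketch below wires two independent ingest tasks into a downstream merge. TaskGraph and Task are re-exported by this crate (see Re-exports below), but the constructor and method names used here (TaskGraph::new, Task::new, add_task, add_dependency) are illustrative assumptions rather than the confirmed API.

use oxigdal_cluster::{Task, TaskGraph};

fn build_mosaic_graph() -> TaskGraph {
    // Assumed constructors and methods throughout; for illustration only.
    let mut graph = TaskGraph::new();

    // Two independent ingest tasks that can run in parallel.
    let ingest_a = graph.add_task(Task::new("ingest-tile-a"));
    let ingest_b = graph.add_task(Task::new("ingest-tile-b"));

    // A merge task that consumes both ingest outputs.
    let merge = graph.add_task(Task::new("merge-mosaic"));

    // The scheduler may only start the merge once both ingests finish.
    graph.add_dependency(merge, ingest_a);
    graph.add_dependency(merge, ingest_b);

    graph
}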
§Examples
§Basic Cluster Setup
use oxigdal_cluster::Cluster;

#[tokio::main]
async fn main() -> oxigdal_cluster::Result<()> {
    // Create a cluster with default configuration
    let cluster = Cluster::new();

    // Start the cluster
    cluster.start().await?;

    // Get cluster statistics
    let stats = cluster.get_statistics();
    println!("Active workers: {}", stats.worker_pool.total_workers);

    // Stop the cluster
    cluster.stop().await?;

    Ok(())
}

§Custom Cluster Configuration
use oxigdal_cluster::{ClusterBuilder, SchedulerConfig, LoadBalanceStrategy};
use oxigdal_cluster::worker_pool::WorkerPoolConfig;
use std::time::Duration;

#[tokio::main]
async fn main() -> oxigdal_cluster::Result<()> {
    // Configure the scheduler
    let scheduler_config = SchedulerConfig {
        max_queue_size: 10000,
        work_steal_threshold: 10,
        scheduling_interval: Duration::from_millis(100),
        load_balance_strategy: LoadBalanceStrategy::LeastLoaded,
        task_timeout: Duration::from_secs(300),
        enable_work_stealing: true,
        enable_backpressure: true,
        max_concurrent_tasks_per_worker: 1000,
    };

    // Configure the worker pool
    let worker_config = WorkerPoolConfig {
        heartbeat_timeout: Duration::from_secs(30),
        health_check_interval: Duration::from_secs(10),
        max_unhealthy_duration: Duration::from_secs(120),
        min_workers: 1,
        max_workers: 100,
    };

    // Build the cluster with custom configuration
    let cluster = ClusterBuilder::new()
        .with_scheduler_config(scheduler_config)
        .with_worker_pool_config(worker_config)
        .build();

    cluster.start().await?;

    // Your cluster operations here

    cluster.stop().await?;

    Ok(())
}
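§Guarding Worker Calls with a Circuit Breaker

The fault_tolerance module re-exports CircuitBreaker and CircuitBreakerConfig. The sketch below shows the usual circuit-breaker pattern around a remote worker call; the constructor and method names used here (CircuitBreaker::new, allow_request, record_success, record_failure) and the Default construction of the config are assumptions for illustration, not confirmed signatures from this crate.

use oxigdal_cluster::{CircuitBreaker, CircuitBreakerConfig};

// Hypothetical helper: run a remote call under a circuit breaker.
async fn guarded_call(breaker: &CircuitBreaker) {
    // Assumed method: ask the breaker whether the circuit is closed.
    if !breaker.allow_request() {
        // Circuit is open: fail fast instead of piling more load onto
        // a worker that is already failing.
        eprintln!("circuit open, skipping call");
        return;
    }

    // Assumed methods: report the outcome so the breaker can trip or reset.
    match remote_call().await {
        Ok(()) => breaker.record_success(),
        Err(_) => breaker.record_failure(),
    }
}

// Stand-in for a real RPC to a worker.
async fn remote_call() -> Result<(), std::io::Error> {
    Ok(())
}

#[tokio::main]
async fn main() {
    // Assumed constructor; Default for the config is also an assumption.
    let breaker = CircuitBreaker::new(CircuitBreakerConfig::default());
    guarded_call(&breaker).await;
}
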
Re-exports§
pub use autoscale::AutoscaleConfig;
pub use autoscale::AutoscaleStats;
pub use autoscale::Autoscaler;
pub use autoscale::MetricsSnapshot as AutoscaleMetrics;
pub use autoscale::ScaleDecision;
pub use cache_coherency::CacheConfig;
pub use cache_coherency::CacheKey;
pub use cache_coherency::DistributedCache;
pub use coordinator::ClusterCoordinator;
pub use coordinator::CoordinatorConfig;
pub use coordinator::NodeId;
pub use coordinator::NodeRole;
pub use data_locality::DataLocalityOptimizer;
pub use data_locality::LocalityConfig;
pub use data_locality::PlacementRecommendation;
pub use error::ClusterError;
pub use error::Result;
pub use fault_tolerance::Bulkhead;
pub use fault_tolerance::BulkheadConfig;
pub use fault_tolerance::BulkheadRegistry;
pub use fault_tolerance::BulkheadStats;
pub use fault_tolerance::CircuitBreaker;
pub use fault_tolerance::CircuitBreakerConfig;
pub use fault_tolerance::CircuitBreakerRegistry;
pub use fault_tolerance::CircuitBreakerStats;
pub use fault_tolerance::CircuitState;
pub use fault_tolerance::Deadline;
pub use fault_tolerance::DegradationConfig;
pub use fault_tolerance::DegradationLevel;
pub use fault_tolerance::DegradationManager;
pub use fault_tolerance::DegradationStats;
pub use fault_tolerance::FaultToleranceConfig;
pub use fault_tolerance::FaultToleranceManager;
pub use fault_tolerance::FaultToleranceStatistics;
pub use fault_tolerance::HealthCheck;
pub use fault_tolerance::HealthCheckConfig;
pub use fault_tolerance::HealthCheckManager;
pub use fault_tolerance::HealthCheckResult;
pub use fault_tolerance::HealthCheckStats;
pub use fault_tolerance::HealthStatus;
pub use fault_tolerance::RequestPriority;
pub use fault_tolerance::RetryDecision;
pub use fault_tolerance::TimeoutBudget;
pub use fault_tolerance::TimeoutConfig;
pub use fault_tolerance::TimeoutManager;
pub use fault_tolerance::TimeoutRegistry;
pub use fault_tolerance::TimeoutStats;
pub use metrics::ClusterMetrics;
pub use metrics::MetricsSnapshot;
pub use metrics::WorkerMetrics;
pub use monitoring::Alert;
pub use monitoring::AlertRule;
pub use monitoring::AlertSeverity;
pub use monitoring::MonitoringManager;
pub use monitoring::MonitoringStats;
pub use network::BandwidthTracker;
pub use network::CompressionManager;
pub use network::CongestionController;
pub use network::TopologyManager;
pub use replication::ReplicaSet;
pub use replication::ReplicationConfig;
pub use replication::ReplicationManager;
pub use resources::AccountingManager;
pub use resources::QuotaManager;
pub use resources::ReservationManager;
pub use scheduler::LoadBalanceStrategy;
pub use scheduler::Scheduler;
pub use scheduler::SchedulerConfig;
pub use scheduler::SchedulerStats;
pub use security::Role;
pub use security::SecurityManager;
pub use security::SecurityStats;
pub use security::User;
pub use task_graph::ExecutionPlan;
pub use task_graph::ResourceRequirements;
pub use task_graph::Task;
pub use task_graph::TaskGraph;
pub use task_graph::TaskId;
pub use task_graph::TaskStatus;
pub use worker_pool::SelectionStrategy;
pub use worker_pool::Worker;
pub use worker_pool::WorkerCapabilities;
pub use worker_pool::WorkerCapacity;
pub use worker_pool::WorkerId;
pub use worker_pool::WorkerPool;
pub use worker_pool::WorkerStatus;
pub use worker_pool::WorkerUsage;
pub use workflow::Workflow;
pub use workflow::WorkflowEngine;
pub use workflow::WorkflowExecution;
pub use workflow::WorkflowStatus;
Modules§
- autoscale - Cluster autoscaling for dynamic resource management.
- cache_coherency - Distributed cache with coherency protocol.
- coordinator - Cluster coordinator with leader election and membership management.
- data_locality - Data locality optimization for minimizing data transfer.
- error - Error types for the oxigdal-cluster crate.
- fault_tolerance - Advanced fault tolerance system for handling failures and recovery.
- metrics - Metrics collection and monitoring for the cluster.
- monitoring - Advanced monitoring and alerting for cluster management.
- network - Network optimization for distributed computing.
- replication - Data replication for reliability and availability.
- resources - Resource management modules for quota, reservation, and accounting.
- scheduler - Distributed task scheduler modules.
- security - Security and access control for cluster operations.
- task_graph - Task graph engine for managing task dependencies and execution order.
- worker_pool - Worker pool management for the cluster.
- workflow - Workflow orchestration engine for complex distributed tasks.
Structs§
- Cluster - Complete cluster instance.
- ClusterBuilder - Cluster builder for easy setup.
- ClusterStatistics - Cluster-wide statistics.
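
A minimal way to watch a running cluster is to poll Cluster::get_statistics() on a timer. The sketch below uses only the API shown in the examples above plus plain tokio; worker_pool.total_workers is the one ClusterStatistics field taken from the first example, and no other fields are assumed.

use std::time::Duration;
use oxigdal_cluster::Cluster;

#[tokio::main]
async fn main() -> oxigdal_cluster::Result<()> {
    let cluster = Cluster::new();
    cluster.start().await?;

    // Sample cluster-wide statistics every 10 seconds.
    let mut ticker = tokio::time::interval(Duration::from_secs(10));
    for _ in 0..3 {
        ticker.tick().await;
        let stats = cluster.get_statistics();
        println!("active workers: {}", stats.worker_pool.total_workers);
    }

    cluster.stop().await?;
    Ok(())
}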