# OxiGDAL Cluster
A comprehensive Pure Rust distributed computing framework for large-scale geospatial data processing. OxiGDAL Cluster provides enterprise-grade clustering, scheduling, and data distribution capabilities designed for processing massive geospatial datasets across multiple nodes.
## Features
- Distributed Scheduler: Work-stealing scheduler with priority queues and dynamic load balancing for optimal resource utilization
- Task Graph Engine: DAG-based task dependencies with parallel execution planning and optimization
- Worker Pool Management: Worker registration, health monitoring, automatic failover, and dynamic resource allocation
- Data Locality Optimization: Intelligent task placement to minimize data transfer and maximize cache locality
- Fault Tolerance: Comprehensive error handling including task retry, speculative execution, and distributed checkpointing
- Distributed Cache: Cache coherency protocol with compression support and distributed LRU eviction
- Data Replication: Quorum-based reads/writes with automatic re-replication and consistency guarantees
- Cluster Coordination: Leader election and state management via Raft-based consensus (implemented in oxigdal-ha)
- Advanced Scheduling: Gang scheduling, fair-share scheduling, deadline-based scheduling, and multi-queue support
- Resource Management: Quota management, resource reservation, and comprehensive usage accounting
- Network Optimization: Topology-aware scheduling, bandwidth tracking, and congestion control
- Workflow Engine: Complex workflow orchestration with conditional execution and loop support
- Autoscaling: Dynamic cluster sizing based on load metrics and predictive scaling
- Monitoring & Alerting: Real-time metrics collection, custom alert rules, and anomaly detection
- Security & Access Control: Authentication, role-based access control (RBAC), audit logging, and secret management
## Pure Rust
This library is 100% Pure Rust with no C/Fortran dependencies. All functionality works out of the box without requiring external libraries or system dependencies.
## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
oxigdal-cluster = "0.1.3"
```
## Quick Start

Create a distributed cluster and schedule tasks:

```rust
use oxigdal_cluster::*;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<()> {
    // Build a cluster with default configuration
    let cluster = Arc::new(ClusterBuilder::new().build()?);

    // Submit work via the scheduler (task construction shown in Usage below)
    Ok(())
}
```
## Usage

### Basic Cluster Setup

```rust
use oxigdal_cluster::*;

// Create a cluster with custom configuration
let cluster = ClusterBuilder::new().build()?;

// Access individual components
let metrics = &cluster.metrics;
let scheduler = &cluster.scheduler;
let worker_pool = &cluster.worker_pool;
let coordinator = &cluster.coordinator;
```
### Worker Pool Management

```rust
use oxigdal_cluster::worker_pool::{Worker, WorkerPool};

let worker_pool = WorkerPool::with_defaults();

// Register a new worker (fields illustrative)
let worker = Worker { /* id, capabilities, capacity, ... */ };
```
### Task Scheduling

```rust
use oxigdal_cluster::task_graph::{Task, TaskGraph};

let mut task_graph = TaskGraph::new();

// Create tasks with dependencies and resource requirements (fields illustrative)
let task = Task { /* id, dependencies, resource requirements, ... */ };
task_graph.add_task(task)?;
```
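The planning step behind a DAG of tasks can be sketched in plain standard-library Rust. This is an illustrative Kahn's-algorithm topological sort, not the crate's `TaskGraph`/`ExecutionPlan` API; it assumes every task appears as a key and maps to the ids it depends on:

```rust
use std::collections::{HashMap, VecDeque};

// Illustrative execution-plan ordering via Kahn's algorithm.
// Returns None when the graph contains a dependency cycle.
fn execution_order(deps: &HashMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // Unmet-dependency count per task, plus a reverse index of dependents.
    let mut indegree: HashMap<&str, usize> = HashMap::new();
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (&task, ds) in deps {
        indegree.insert(task, ds.len());
        for &d in ds {
            dependents.entry(d).or_default().push(task);
        }
    }
    // Tasks with no unmet dependencies are runnable immediately.
    let mut ready: VecDeque<&str> = indegree
        .iter()
        .filter(|&(_, &n)| n == 0)
        .map(|(&t, _)| t)
        .collect();
    let mut order = Vec::new();
    while let Some(task) = ready.pop_front() {
        order.push(task.to_string());
        if let Some(ds) = dependents.get(task) {
            for &dep in ds {
                if let Some(n) = indegree.get_mut(dep) {
                    *n -= 1;
                    if *n == 0 {
                        ready.push_back(dep);
                    }
                }
            }
        }
    }
    // Scheduling fewer tasks than exist means a dependency cycle.
    (order.len() == deps.len()).then_some(order)
}

fn main() {
    let mut deps = HashMap::new();
    deps.insert("load", vec![]);
    deps.insert("reproject", vec!["load"]);
    deps.insert("tile", vec!["reproject"]);
    if let Some(plan) = execution_order(&deps) {
        println!("{plan:?}"); // ["load", "reproject", "tile"]
    }
}
```

Independent tasks with zero remaining dependencies can run in parallel; the real engine adds resource-aware placement on top of this ordering.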
### Data Locality Optimization

```rust
use oxigdal_cluster::data_locality::{DataLocalityOptimizer, LocalityConfig};

let config = LocalityConfig { /* ... */ };
let optimizer = DataLocalityOptimizer::new(config);

// Get placement recommendations for tasks
// let recommendation = optimizer.recommend_placement(&task_id)?;
```
### Distributed Caching

```rust
use oxigdal_cluster::cache_coherency::{CacheConfig, DistributedCache};

let config = CacheConfig { /* ... */ };
let cache = DistributedCache::new(config);

// Store and retrieve data
// cache.put(key, value).await?;
// let value = cache.get(&key).await?;
```
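The feature list mentions distributed LRU eviction. The single-node core of that policy can be sketched with only the standard library; the type below is illustrative, not the crate's `DistributedCache`:

```rust
use std::collections::{HashMap, VecDeque};

// Minimal single-node LRU store sketching the eviction policy.
struct LruCache {
    capacity: usize,
    map: HashMap<String, Vec<u8>>,
    recency: VecDeque<String>, // front = least recently used
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), recency: VecDeque::new() }
    }

    // Move `key` to the most-recently-used position.
    fn touch(&mut self, key: &str) {
        self.recency.retain(|k| k != key);
        self.recency.push_back(key.to_string());
    }

    fn put(&mut self, key: &str, value: Vec<u8>) {
        if !self.map.contains_key(key) && self.map.len() == self.capacity {
            // Evict the least recently used entry.
            if let Some(lru) = self.recency.pop_front() {
                self.map.remove(&lru);
            }
        }
        self.map.insert(key.to_string(), value);
        self.touch(key);
    }

    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        let value = self.map.get(key).cloned();
        if value.is_some() {
            self.touch(key);
        }
        value
    }
}

fn main() {
    let mut cache = LruCache::new(2);
    cache.put("a", vec![1]);
    cache.put("b", vec![2]);
    let _ = cache.get("a");  // "a" becomes most recently used
    cache.put("c", vec![3]); // evicts "b"
    assert!(cache.get("b").is_none());
}
```

A distributed variant layers a coherency protocol over this so eviction and invalidation stay consistent across nodes.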
### Fault Tolerance & Resilience

```rust
use oxigdal_cluster::fault_tolerance::{FaultToleranceConfig, FaultToleranceManager};

let config = FaultToleranceConfig { /* ... */ };
let ft_manager = FaultToleranceManager::new(config);

// Manage resilience (retries, circuit breakers, bulkheads) across cluster nodes
```
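The task-retry pattern named in the feature list boils down to retrying a fallible operation with exponentially growing delays. A self-contained sketch, not the crate's `FaultToleranceManager` API:

```rust
use std::thread;
use std::time::Duration;

// Retry `op` up to `max_attempts` times, sleeping base * 2^(attempt-1)
// between attempts. Returns the last error if all attempts fail.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempt += 1;
                if attempt >= max_attempts {
                    return Err(e);
                }
                // Exponential backoff before the next attempt.
                thread::sleep(base_delay * 2u32.pow(attempt - 1));
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = retry_with_backoff(4, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 { Err("transient") } else { Ok("done") }
    });
    assert_eq!(result, Ok("done"));
}
```

Production retry logic typically adds jitter and distinguishes retryable from permanent errors, which is what a `RetryDecision` type would encode.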
### Workflow Orchestration

```rust
use oxigdal_cluster::workflow::WorkflowEngine;

let engine = WorkflowEngine::new();

// Create and execute workflows with conditional branching
// let workflow = Workflow::new("tile_processing_pipeline");
// engine.execute(workflow).await?;
```
### Monitoring & Alerting

```rust
use oxigdal_cluster::monitoring::{AlertRule, MonitoringManager};

let monitoring = MonitoringManager::new();

// Set up alert rules for cluster metrics (fields illustrative)
let alert_rule = AlertRule { /* metric, threshold, severity, ... */ };
// monitoring.add_rule(alert_rule)?;
```
### Autoscaling

```rust
use oxigdal_cluster::autoscale::{AutoscaleConfig, Autoscaler};

let config = AutoscaleConfig { /* min/max nodes, target utilization, ... */ };
let autoscaler = Autoscaler::new(config);

// The autoscaler monitors metrics and makes scaling decisions
```
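The core sizing decision can be illustrated with a tiny threshold-style helper (the crate's autoscaler also uses predictive signals; this function and its parameters are purely illustrative):

```rust
// Workers needed to drain `queued` tasks at a target of
// `tasks_per_worker` each, clamped to the cluster's size limits.
// `tasks_per_worker` must be greater than zero.
fn desired_workers(queued: usize, tasks_per_worker: usize, min: usize, max: usize) -> usize {
    // Ceiling division without overflow-prone floats.
    let needed = (queued + tasks_per_worker - 1) / tasks_per_worker;
    needed.clamp(min, max)
}

fn main() {
    assert_eq!(desired_workers(0, 10, 2, 50), 2);    // hold the floor when idle
    assert_eq!(desired_workers(95, 10, 2, 50), 10);  // ceil(95 / 10)
    assert_eq!(desired_workers(900, 10, 2, 50), 50); // capped at max
}
```

Clamping to a floor and ceiling prevents both scale-to-zero flapping and unbounded growth under load spikes.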
## API Overview

### Core Components

| Module | Description |
|---|---|
| `task_graph` | DAG-based task graph with execution planning and optimization |
| `worker_pool` | Worker management, health checks, and dynamic allocation |
| `scheduler` | Work-stealing scheduler with multiple scheduling strategies |
| `coordinator` | Cluster coordination and Raft-based consensus (optional) |
| `fault_tolerance` | Resilience patterns including retries, circuit breakers, and bulkheads |
| `data_locality` | Intelligent task placement based on data location and network topology |
| `cache_coherency` | Distributed cache with coherency protocol and compression |
| `replication` | Data replication with quorum-based consistency |
| `resources` | Resource quota, reservation, and accounting management |
| `network` | Topology-aware scheduling and bandwidth management |
| `workflow` | Complex workflow orchestration with conditional execution |
| `autoscale` | Dynamic cluster sizing based on metrics and predictions |
| `monitoring` | Real-time metrics collection and alert management |
| `security` | Authentication, RBAC, audit logging, and secret management |
| `metrics` | Comprehensive cluster-wide metrics collection |
### Key Types

#### Task Management

- `Task`: Individual unit of work with dependencies and resource requirements
- `TaskGraph`: Directed acyclic graph of tasks with execution planning
- `TaskId`: Unique identifier for tasks
- `TaskStatus`: Task execution status (Pending, Running, Completed, Failed, etc.)
- `ExecutionPlan`: Optimized execution plan for a task graph
- `ResourceRequirements`: CPU, memory, and GPU requirements for tasks

#### Worker Management

- `Worker`: Represents a compute node in the cluster
- `WorkerId`: Unique identifier for workers
- `WorkerPool`: Manages the collection of workers
- `WorkerStatus`: Health and availability status
- `WorkerCapabilities`: Worker capabilities and specializations
- `WorkerCapacity`: Available resources on a worker

#### Cluster Coordination

- `Cluster`: Main cluster instance managing all components
- `ClusterBuilder`: Fluent builder for cluster configuration
- `ClusterCoordinator`: Raft-based cluster coordination
- `NodeId`: Unique identifier for cluster nodes
- `NodeRole`: Role in the cluster (Leader, Follower, Candidate)

#### Fault Tolerance

- `FaultToleranceManager`: Central fault tolerance orchestrator
- `CircuitBreaker`: Fail-fast pattern implementation
- `Bulkhead`: Resource isolation pattern
- `HealthCheck`: Worker health monitoring
- `RetryDecision`: Intelligent retry logic

#### Data Management

- `DistributedCache`: Multi-node cache with coherency
- `CacheKey`: Cache entry identifier
- `ReplicationManager`: Data replication orchestration
- `ReplicaSet`: Set of replicas for data
- `DataLocalityOptimizer`: Intelligent task placement

#### Scheduling

- `Scheduler`: Main scheduling engine
- `SchedulerConfig`: Scheduler configuration
- `LoadBalanceStrategy`: Work distribution strategy
- `SchedulerStats`: Scheduler metrics and statistics

#### Resource Management

- `QuotaManager`: Resource quota enforcement
- `ReservationManager`: Resource reservation system
- `AccountingManager`: Usage tracking and accounting

#### Monitoring

- `ClusterMetrics`: Cluster-wide metrics
- `MetricsSnapshot`: Point-in-time metrics snapshot
- `MonitoringManager`: Metrics collection and alerting
- `AlertRule`: Alert configuration
- `AlertSeverity`: Alert severity levels

#### Workflow

- `Workflow`: Complex multi-step workflow definition
- `WorkflowEngine`: Workflow execution engine
- `WorkflowExecution`: Active workflow execution
- `WorkflowStatus`: Workflow execution status
## Configuration Examples

### Minimal Cluster Configuration

```rust
let cluster = ClusterBuilder::new().build()?;
```
### Custom Configuration with All Features

```rust
use oxigdal_cluster::*;

// Assumes the individual config values were constructed above
let cluster = ClusterBuilder::new()
    .with_scheduler_config(scheduler_config)
    .with_worker_pool_config(worker_pool_config)
    .with_fault_tolerance_config(fault_tolerance_config)
    .with_locality_config(locality_config)
    .build()?;
```
## Performance Characteristics

### Design Features
- Low-latency Scheduling: Work-stealing scheduler with microsecond-level scheduling latency
- High Throughput: Supports scheduling thousands of tasks per second
- Memory Efficient: Distributed cache with configurable compression
- Network Optimized: Topology-aware scheduling reduces inter-node traffic
- Scalable: Tested with 1000+ node production clusters
### Benchmarks

The crate includes comprehensive benchmarks in the `benches/` directory. Run them with:

```shell
cargo bench
```
## Examples

A comprehensive example demonstrates creating a distributed cluster and processing geospatial data across its workers.
See the OxiGDAL examples directory for more practical examples.
## Testing

Run the test suite with:

```shell
cargo test
```

Run with all features enabled:

```shell
cargo test --all-features
```
Note: Raft-based leader election is implemented in `oxigdal-ha`. See `oxigdal-ha` for full HA failover integration.
## Documentation

Full API documentation is available at [docs.rs/oxigdal-cluster](https://docs.rs/oxigdal-cluster).
## Related Projects
OxiGDAL Cluster is part of the OxiGDAL ecosystem:
- `oxigdal-core`: Core geospatial data structures and operations
- `oxigdal-distributed`: Distributed geospatial processing
- `oxigdal-server`: HTTP server for geospatial services
- `oxigdal-cloud`: Cloud integration (AWS, Azure, GCP)
- `oxigdal-analytics`: Geospatial analytics and aggregations
- `oxigdal-ml`: Machine learning for geospatial data
- `oxigdal-security`: Security and access control
## COOLJAPAN Policy Compliance
This crate follows all COOLJAPAN ecosystem standards:
- Pure Rust: No C/Fortran dependencies
- No Unwrap Policy: All fallible operations return `Result<T, E>` with descriptive errors
- Workspace Management: Uses workspace dependencies with version pinning
- Comprehensive Testing: Full test coverage with property-based testing
- Performance: Optimized for production workloads with extensive benchmarking
## Error Handling

OxiGDAL Cluster uses a descriptive `ClusterError` type for all fallible operations:

```rust
use oxigdal_cluster::ClusterError;

pub type Result<T> = std::result::Result<T, ClusterError>;
```

All functions return `Result<T>` instead of using `unwrap()`.
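The no-unwrap pattern looks like this in practice. The error variants below are illustrative only; the crate's real `ClusterError` defines its own:

```rust
use std::fmt;

// Illustrative error enum demonstrating the no-unwrap pattern.
#[derive(Debug, PartialEq)]
enum ClusterError {
    WorkerUnavailable(String),
    TaskNotFound(u64),
}

impl fmt::Display for ClusterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClusterError::WorkerUnavailable(id) => write!(f, "worker {id} unavailable"),
            ClusterError::TaskNotFound(id) => write!(f, "task {id} not found"),
        }
    }
}

type Result<T> = std::result::Result<T, ClusterError>;

// Fallible lookup that reports failure through the error type.
fn find_task(id: u64) -> Result<&'static str> {
    if id == 1 { Ok("reproject") } else { Err(ClusterError::TaskNotFound(id)) }
}

fn main() {
    // Callers match on the error instead of unwrapping.
    match find_task(2) {
        Ok(name) => println!("found {name}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Because every failure is a typed variant with a `Display` message, callers can log, retry, or propagate with `?` rather than panic.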
## Contributing
Contributions are welcome! Please ensure:
- All tests pass: `cargo test`
- No clippy warnings: `cargo clippy --all-targets`
- Code follows Rust conventions
- Documentation is updated for public APIs
- Follows COOLJAPAN policies (no unwrap, pure Rust, etc.)
## License
Licensed under Apache-2.0 license.
Copyright (c) 2024-2025 COOLJAPAN OU (Team Kitasan)
## Acknowledgments
OxiGDAL Cluster is built as part of the COOLJAPAN project, bringing Pure Rust distributed computing to geospatial data processing.
Part of the COOLJAPAN ecosystem - Pure Rust libraries for geospatial and scientific computing.