OxiGDAL Distributed
Distributed processing capabilities for large-scale geospatial workflows using Apache Arrow Flight.
Features
- Apache Arrow Flight RPC: Zero-copy data transfer between nodes using gRPC
- Multi-node Processing: Distribute workloads across multiple worker nodes
- Fault Tolerance: Automatic task retry and failure recovery
- Dynamic Scaling: Add or remove workers at runtime
- Progress Monitoring: Real-time tracking of distributed execution
- Resource Management: Memory and CPU limits per worker
- Data Partitioning: Multiple strategies (spatial tiles, strips, hash, range, load-balanced)
- Shuffle Operations: Efficient data redistribution for group-by, sort, and joins
- Pure Rust: No C/C++ dependencies
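To make the fault-tolerance bullet concrete, here is a minimal, self-contained sketch of bounded task retry. The function name `run_with_retry` and its signature are assumptions made for illustration and are not taken from the crate's API; a real scheduler would likely also add backoff and reassign the task to another worker.

```rust
/// Run `task` until it succeeds or `max_attempts` attempts have been made.
/// The closure receives the 1-based attempt number.
/// Hypothetical helper for illustration; not part of the crate's API.
/// The task always runs at least once, even if `max_attempts` is 0.
pub fn run_with_retry<T, E>(
    max_attempts: u32,
    mut task: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match task(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise try again.
            Err(_) => attempt += 1,
        }
    }
}
```

The last error is returned unchanged, so the caller can decide whether to fail the whole job or reschedule the task elsewhere.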
Architecture
┌─────────────┐
│ Coordinator │ ──── Schedules tasks and manages workers
└──────┬──────┘
       │
  ┌────┴────┐
  │ Flight  │
  │ Server  │ ──── Zero-copy data transfer
  └────┬────┘
       │
  ┌────┴────────────────┐
  │                     │
┌─▼──────┐          ┌───▼─────┐
│ Worker │          │ Worker  │ ──── Execute tasks
│ Node 1 │          │ Node 2  │
└────────┘          └─────────┘
Example: Distributed NDVI Calculation
use *;
async
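For reference, the per-pixel computation that each worker would apply to its tile is the standard NDVI formula, (NIR - Red) / (NIR + Red). The sketch below shows only that kernel in plain Rust. The function name `ndvi` and the choice to emit 0.0 where both bands are zero are assumptions for illustration; many pipelines write a NoData value instead.

```rust
/// Compute NDVI = (NIR - Red) / (NIR + Red) for two equally sized bands.
/// Returns `None` if the band lengths differ.
/// Hypothetical helper for illustration; not part of the crate's API.
pub fn ndvi(nir: &[f32], red: &[f32]) -> Option<Vec<f32>> {
    if nir.len() != red.len() {
        return None;
    }
    let out = nir
        .iter()
        .zip(red)
        .map(|(&n, &r)| {
            let sum = n + r;
            // Assumption: treat pixels with a zero denominator as 0.0.
            if sum == 0.0 {
                0.0
            } else {
                (n - r) / sum
            }
        })
        .collect();
    Some(out)
}
```

Because each output pixel depends only on the matching input pixels, tiles can be processed independently, which is what makes NDVI a natural fit for distribution across workers.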
Example: Worker Node
use *;
async
Example: Data Shuffle
use *;
use ;
use ;
use RecordBatch;
use Arc;
Partitioning Strategies
Tile Partitioning
Divide spatial data into regular tiles:
let extent = new?;
let partitioner = new?; // 4x4 grid
let partitions = partitioner.partition;
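The idea behind tile partitioning can be sketched without the crate at all. The `Extent` struct and `tile_partitions` function below are hypothetical stand-ins written for this example, not the crate's types; they show how a 4x4 grid is derived from a bounding box.

```rust
/// Axis-aligned bounding box (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct Extent {
    pub min_x: f64,
    pub min_y: f64,
    pub max_x: f64,
    pub max_y: f64,
}

/// Split `extent` into a `cols` x `rows` grid of equally sized tiles,
/// returned in row-major order. Returns `None` for an empty grid or an
/// extent with non-positive width or height.
pub fn tile_partitions(extent: Extent, cols: u32, rows: u32) -> Option<Vec<Extent>> {
    if cols == 0
        || rows == 0
        || extent.max_x <= extent.min_x
        || extent.max_y <= extent.min_y
    {
        return None;
    }
    let tile_w = (extent.max_x - extent.min_x) / f64::from(cols);
    let tile_h = (extent.max_y - extent.min_y) / f64::from(rows);
    let mut tiles = Vec::with_capacity(cols as usize * rows as usize);
    for row in 0..rows {
        for col in 0..cols {
            let min_x = extent.min_x + f64::from(col) * tile_w;
            let min_y = extent.min_y + f64::from(row) * tile_h;
            tiles.push(Extent {
                min_x,
                min_y,
                max_x: min_x + tile_w,
                max_y: min_y + tile_h,
            });
        }
    }
    Some(tiles)
}
```

Regular tiles keep spatially adjacent pixels together, which tends to suit neighborhood operations better than strips do.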
Strip Partitioning
Divide data into horizontal strips:
let extent = new?;
let partitioner = new?; // 8 strips
let partitions = partitioner.partition;
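As a rough illustration of strip partitioning, the sketch below divides a raster's rows into contiguous ranges. The function `strip_partitions` is a hypothetical helper that works on row indices rather than on the crate's extent type.

```rust
use std::ops::Range;

/// Split `height` raster rows into `num_strips` contiguous row ranges.
/// The first `height % num_strips` strips receive one extra row, so strip
/// sizes differ by at most one. If `height < num_strips`, the trailing
/// strips are empty. Hypothetical helper; not part of the crate's API.
pub fn strip_partitions(height: usize, num_strips: usize) -> Vec<Range<usize>> {
    if num_strips == 0 {
        return Vec::new();
    }
    let base = height / num_strips;
    let extra = height % num_strips;
    let mut strips = Vec::with_capacity(num_strips);
    let mut start = 0;
    for i in 0..num_strips {
        let len = base + usize::from(i < extra);
        strips.push(start..start + len);
        start += len;
    }
    strips
}
```

Strips match the row-major layout of many raster formats, so each worker can read its share as one contiguous block.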
Hash Partitioning
Distribute data based on hash of a key:
let partitioner = new?; // 16 partitions
let partition_id = partitioner.partition_for_key;
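The key-to-partition mapping can be sketched with the standard library alone. The function below reuses the name `partition_for_key` from the snippet above, but its signature is an assumption; it is an independent illustration, not the crate's implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map `key` to a partition index in `0..num_partitions`.
/// Returns `None` when there are no partitions.
/// Illustrative sketch only; the signature is assumed.
pub fn partition_for_key<K: Hash + ?Sized>(key: &K, num_partitions: usize) -> Option<usize> {
    if num_partitions == 0 {
        return None;
    }
    // `DefaultHasher::new()` uses fixed keys, so equal keys hash equally
    // within one build. Its algorithm is not guaranteed to stay the same
    // across Rust releases, so a cluster with mixed builds would need an
    // explicitly specified hash function instead.
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    Some((hasher.finish() % num_partitions as u64) as usize)
}
```

Equal keys always land in the same partition, which is the property that group-by and join operations rely on.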
Range Partitioning
Partition based on value ranges:
let boundaries = vec!;
let partitioner = new?;
let partition_id = partitioner.partition_for_value;
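Range partitioning reduces to a binary search over sorted boundaries. The sketch below borrows the name `partition_for_value` from the snippet above, with an assumed signature. It also assumes that N ascending boundaries define N + 1 partitions and that a value equal to a boundary belongs to the upper partition; the crate may use a different convention.

```rust
/// Return the partition index for `value`, given ascending `boundaries`.
/// N boundaries define N + 1 partitions:
///   partition 0 holds values below boundaries[0],
///   partition i holds values in [boundaries[i - 1], boundaries[i]),
///   partition N holds values at or above boundaries[N - 1].
/// Illustrative sketch only; `boundaries` must already be sorted.
pub fn partition_for_value(boundaries: &[f64], value: f64) -> usize {
    // Binary search for the first boundary that is greater than `value`.
    boundaries.partition_point(|&b| b <= value)
}
```

Unlike hash partitioning, this keeps partitions ordered relative to each other, so partition i never holds a value larger than any value in partition i + 1.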
Load Balanced Partitioning
Balance load based on data size:
let total_size = 1000 * 1024 * 1024; // 1 GB
let num_workers = 8;
let partitioner = new?;
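One common way to balance load by data size is a greedy "largest first" assignment: sort chunks by size, then repeatedly give the next chunk to the least-loaded worker. The sketch below illustrates that heuristic. It is not necessarily the strategy the crate uses, and `balance_chunks` is a hypothetical name.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Assign chunks (given by size in bytes) to workers so that the total
/// bytes per worker stay close. Returns, for each worker, the indices of
/// the chunks assigned to it. Hypothetical helper for illustration.
pub fn balance_chunks(chunk_sizes: &[u64], num_workers: usize) -> Vec<Vec<usize>> {
    let mut assignment = vec![Vec::new(); num_workers];
    if num_workers == 0 {
        return assignment;
    }
    // Visit chunks from largest to smallest.
    let mut order: Vec<usize> = (0..chunk_sizes.len()).collect();
    order.sort_by_key(|&i| Reverse(chunk_sizes[i]));
    // Min-heap of (current load, worker index).
    let mut heap: BinaryHeap<Reverse<(u64, usize)>> =
        (0..num_workers).map(|w| Reverse((0, w))).collect();
    for i in order {
        if let Some(Reverse((load, worker))) = heap.pop() {
            assignment[worker].push(i);
            heap.push(Reverse((load + chunk_sizes[i], worker)));
        }
    }
    assignment
}
```

The heuristic is not optimal in general, but it is cheap and usually keeps the most loaded worker close to the average.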
Shuffle Operations
Hash Shuffle
Group data by key:
let shuffle = new?;
let partitions = shuffle.shuffle?;
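Conceptually, a hash shuffle routes every row to the partition chosen by hashing its key. The sketch below does this for in-memory `(key, value)` pairs rather than Arrow record batches; `hash_shuffle` is a hypothetical function written for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Redistribute `(key, value)` rows into `num_partitions` buckets so that
/// all rows sharing a key end up in the same bucket. With zero partitions
/// the result is empty and the rows are dropped.
/// Hypothetical helper for illustration; not part of the crate's API.
pub fn hash_shuffle<K: Hash, V>(rows: Vec<(K, V)>, num_partitions: usize) -> Vec<Vec<(K, V)>> {
    let mut partitions: Vec<Vec<(K, V)>> = (0..num_partitions).map(|_| Vec::new()).collect();
    if num_partitions == 0 {
        return partitions;
    }
    for (key, value) in rows {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let idx = (hasher.finish() % num_partitions as u64) as usize;
        partitions[idx].push((key, value));
    }
    partitions
}
```

After the shuffle, each partition can aggregate its keys locally without contacting other workers.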
Range Shuffle
Sort data by value ranges:
let boundaries = vec!;
let shuffle = new?;
let partitions = shuffle.shuffle?;
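A range shuffle supports distributed sorting because partitions are ordered relative to each other: once every partition is sorted locally, concatenating them yields a globally sorted result. The sketch below shows this on plain `f64` values. Both functions are hypothetical, and the convention that a value equal to a boundary goes to the upper partition is an assumption.

```rust
/// Route each value to the partition whose range contains it.
/// `boundaries` must be sorted ascending; N boundaries give N + 1 partitions.
/// Hypothetical helper for illustration; not part of the crate's API.
pub fn range_shuffle(values: &[f64], boundaries: &[f64]) -> Vec<Vec<f64>> {
    let mut partitions = vec![Vec::new(); boundaries.len() + 1];
    for &v in values {
        let idx = boundaries.partition_point(|&b| b <= v);
        partitions[idx].push(v);
    }
    partitions
}

/// Sort by shuffling into ranges, sorting each partition on its own
/// (the step a worker would perform), and concatenating in partition order.
pub fn sort_via_range_shuffle(values: &[f64], boundaries: &[f64]) -> Vec<f64> {
    let mut sorted = Vec::with_capacity(values.len());
    for mut part in range_shuffle(values, boundaries) {
        part.sort_by(|a, b| a.total_cmp(b));
        sorted.extend(part);
    }
    sorted
}
```

Choosing boundaries that split the data evenly matters in practice, since a skewed choice leaves one worker with most of the rows.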
Broadcast Shuffle
Replicate data to all partitions:
let shuffle = new?;
let partitions = shuffle.shuffle;
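Within a single process, broadcasting can be as cheap as handing every partition a reference-counted pointer to the same data. The sketch below illustrates that idea with `Arc`; `broadcast_shuffle` is a hypothetical function, and sending the data to remote workers would still require transferring it over the network.

```rust
use std::sync::Arc;

/// Give every partition a handle to the same batch without copying it.
/// Hypothetical helper for illustration; not part of the crate's API.
pub fn broadcast_shuffle<T>(batch: T, num_partitions: usize) -> Vec<Arc<T>> {
    let shared = Arc::new(batch);
    // Cloning an `Arc` only bumps a reference count; the data is not copied.
    (0..num_partitions).map(|_| Arc::clone(&shared)).collect()
}
```

Broadcasting is typically used for the small side of a join, where replicating a lookup table is cheaper than shuffling the large side.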
Performance
Approximate benchmark timings for representative operations:
- Tile Partitioning: 16x16 grid in <1ms
- Hash Shuffle: 100K rows in ~50ms
- Range Shuffle: 100K rows in ~60ms
- Task Scheduling: 10K tasks in ~100ms
Run the benchmarks with Cargo's standard bench command (cargo bench).
Safety
This crate follows OxiGDAL's strict safety policies:
- No unwrap() usage
- No panic!() calls
- Comprehensive error handling
- Pure Rust implementation
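As an illustration of what code written under these rules tends to look like, the sketch below validates user input with a typed error and the `?` operator instead of `unwrap()` or `panic!()`. The `PartitionError` enum and `parse_partition_count` function are hypothetical and are not taken from the crate.

```rust
use std::fmt;

/// Errors that can occur while validating a partition count
/// (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum PartitionError {
    ZeroPartitions,
    InvalidCount(String),
}

impl fmt::Display for PartitionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ZeroPartitions => write!(f, "partition count must be greater than zero"),
            Self::InvalidCount(raw) => write!(f, "invalid partition count: {:?}", raw),
        }
    }
}

impl std::error::Error for PartitionError {}

/// Parse a partition count from text, reporting failures as values
/// instead of panicking.
pub fn parse_partition_count(raw: &str) -> Result<usize, PartitionError> {
    let count: usize = raw
        .trim()
        .parse()
        .map_err(|_| PartitionError::InvalidCount(raw.to_string()))?;
    if count == 0 {
        return Err(PartitionError::ZeroPartitions);
    }
    Ok(count)
}
```

Every failure path is part of the function's return type, so callers are forced by the compiler to handle it.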
License
Apache-2.0
Authors
COOLJAPAN OU (Team Kitasan)