Distributed I/O processing
Provides infrastructure for distributed processing of large datasets:
- Distributed file reading with partitioning strategies
- Parallel writing with merge capabilities
- Distributed array operations
- Load balancing and fault tolerance
- Progress tracking for distributed operations
Distributed I/O processing capabilities
This module provides infrastructure for distributed processing of large datasets across multiple nodes or processes, enabling scalable I/O operations for terabyte-scale data processing.
§Features
- Distributed file reading: Split large files across multiple workers
- Parallel writing: Coordinate writes from multiple processes
- Data partitioning: Automatic partitioning strategies for various formats
- Load balancing: Dynamic work distribution based on node capabilities
- Fault tolerance: Handle node failures and data recovery
- Progress tracking: Monitor distributed operations
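The first and third features (splitting a file across workers along row boundaries) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the crate's implementation: `partition_by_lines` and `count_rows_parallel` are hypothetical helpers, the data is an in-memory byte slice rather than a file, and each "worker" is a scoped thread.

```rust
use std::thread;

/// Split `data` into at most `num_parts` byte ranges that end on line boundaries.
/// Hypothetical helper for illustration; not part of scirs2_io.
fn partition_by_lines(data: &[u8], num_parts: usize) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    if data.is_empty() || num_parts == 0 {
        return ranges;
    }
    // Target partition size in bytes, rounded up.
    let target = (data.len() + num_parts - 1) / num_parts;
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        // Extend the range until it ends just after a newline,
        // so no row is split between two workers.
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        ranges.push((start, end));
        start = end;
    }
    ranges
}

/// Count newline-terminated rows in each partition on its own thread.
/// Returns the per-partition counts in partition order.
fn count_rows_parallel(data: &[u8], num_workers: usize) -> Vec<usize> {
    let ranges = partition_by_lines(data, num_workers);
    thread::scope(|scope| {
        let handles: Vec<_> = ranges
            .iter()
            .map(|&(start, end)| {
                let chunk = &data[start..end];
                scope.spawn(move || chunk.iter().filter(|&&b| b == b'\n').count())
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    let csv = b"a,b\n1,2\n3,4\n5,6\n7,8\n";
    let counts = count_rows_parallel(csv, 2);
    let total: usize = counts.iter().sum();
    println!("{total} rows in {} partitions", counts.len());
}
```

Aligning partition ends to newlines is what lets each worker parse its chunk independently. A byte-based split at arbitrary offsets would need a second pass to stitch together rows that straddle two chunks.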
§Examples
use scirs2_io::distributed::{DistributedReader, PartitionStrategy};

// Create a distributed reader for a large CSV file
let reader = DistributedReader::new("large_dataset.csv")
    .partition_strategy(PartitionStrategy::RowBased { chunk_size: 1_000_000 })
    .num_workers(4);

// Process chunks in parallel
let results: Vec<i32> = reader.process_parallel(|chunk| {
    // Process each chunk (calculate some statistic from the bytes)
    // This is a simplified example - real implementation would parse CSV data
    if chunk.is_empty() {
        return Ok(0); // Avoid dividing by zero on an empty chunk
    }
    // Accumulate in u64 so that large chunks cannot overflow the sum
    let sum: u64 = chunk.iter().map(|&b| u64::from(b)).sum();
    Ok((sum / chunk.len() as u64) as i32) // Return average byte value
})?;
Structs§
- DistributedArray - Distributed array operations
- DistributedReader - Distributed reader for parallel file processing
- DistributedWriter - Distributed writer for parallel file writing
- FileMetadata - File metadata
- LocalFileSystem - Local file system implementation
- WorkerInfo - Worker information
Enums§
- Distribution - Distribution strategy for arrays
- MergeStrategy - Strategy for merging distributed write outputs
- PartitionStrategy - Partition strategy for distributed processing
- WorkerStatus - Worker status
Traits§
- DistributedFileSystem - Distributed file system abstraction