Module distributed

Distributed I/O processing

Provides infrastructure for distributed processing of large datasets:

  • Distributed file reading with partitioning strategies
  • Parallel writing with merge capabilities
  • Distributed array operations
  • Load balancing and fault tolerance
  • Progress tracking for distributed operations

Distributed I/O processing capabilities

This module provides infrastructure for distributed processing of large datasets across multiple nodes or processes, enabling scalable I/O operations for terabyte-scale data processing.

§Features

  • Distributed file reading: Split large files across multiple workers
  • Parallel writing: Coordinate writes from multiple processes
  • Data partitioning: Automatic partitioning strategies for various formats
  • Load balancing: Dynamic work distribution based on node capabilities
  • Fault tolerance: Handle node failures and recover data
  • Progress tracking: Monitor distributed operations
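
The partitioning and work-distribution ideas listed above can be sketched in plain Rust. This is a minimal illustration, not the crate's implementation: `row_partitions` and `assign_round_robin` are hypothetical helpers, and round-robin assignment is an assumed stand-in for the dynamic load balancing the module describes.

```rust
use std::ops::Range;

/// Split `total_rows` rows into contiguous ranges of at most `chunk_size` rows.
/// Hypothetical helper for illustration; not part of scirs2_io.
fn row_partitions(total_rows: usize, chunk_size: usize) -> Vec<Range<usize>> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    (0..total_rows)
        .step_by(chunk_size)
        // The last range is clipped so it never runs past `total_rows`.
        .map(|start| start..usize::min(start.saturating_add(chunk_size), total_rows))
        .collect()
}

/// Assign partition indices to workers in round-robin order.
/// Hypothetical helper; a real scheduler might weight workers by capacity.
fn assign_round_robin(num_partitions: usize, num_workers: usize) -> Vec<Vec<usize>> {
    assert!(num_workers > 0, "num_workers must be positive");
    let mut plan = vec![Vec::new(); num_workers];
    for partition in 0..num_partitions {
        plan[partition % num_workers].push(partition);
    }
    plan
}
```

With 10 rows and a chunk size of 4, `row_partitions` yields three ranges, the last one shorter than the rest. Static round-robin is the simplest possible policy; it ignores differences in node capability.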

§Examples

use scirs2_io::distributed::{DistributedReader, PartitionStrategy};

// Create a distributed reader for a large CSV file
let reader = DistributedReader::new("large_dataset.csv")
    .partition_strategy(PartitionStrategy::RowBased { chunk_size: 1_000_000 })
    .num_workers(4);

// Process chunks in parallel
let results: Vec<i32> = reader.process_parallel(|chunk| {
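    // Compute a simple statistic from the raw bytes of each chunk.
    // This is a simplified example; a real implementation would parse the CSV data.
    if chunk.len() == 0 {
        return Ok(0); // Avoid dividing by zero on an empty chunk
    }
    // Accumulate in u64: the byte sum of a large chunk can overflow u32
    let sum: u64 = chunk.iter().map(|&b| b as u64).sum();
    Ok((sum / chunk.len() as u64) as i32) // Return average byte value
})?;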
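
For readers who want to see the same pattern without the crate, the following standard-library sketch processes chunks of an in-memory byte buffer on scoped threads. It assumes nothing about how `DistributedReader` works internally: `average_byte` and `process_chunks_parallel` are hypothetical names, and spawning one thread per chunk is a simplification that a real worker pool would replace with a bounded number of workers.

```rust
use std::thread;

/// Average byte value of a chunk, or `None` if the chunk is empty.
/// Hypothetical helper for illustration; not part of scirs2_io.
fn average_byte(chunk: &[u8]) -> Option<u8> {
    if chunk.is_empty() {
        return None;
    }
    // Sum in u64 so that large chunks cannot overflow the accumulator.
    let sum: u64 = chunk.iter().map(|&b| u64::from(b)).sum();
    // The mean of u8 values always fits in a u8.
    Some((sum / chunk.len() as u64) as u8)
}

/// Process fixed-size chunks of `data` in parallel and return the
/// per-chunk results in chunk order.
fn process_chunks_parallel(data: &[u8], chunk_size: usize) -> Vec<Option<u8>> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    thread::scope(|s| {
        // Scoped threads may borrow `data` because they are joined
        // before `thread::scope` returns.
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || average_byte(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Joining the handles in spawn order keeps the results aligned with the chunk order, which matters when the per-chunk outputs are later merged.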

Structs§

DistributedArray
Distributed array operations
DistributedReader
Distributed reader for parallel file processing
DistributedWriter
Distributed writer for parallel file writing
FileMetadata
File metadata
LocalFileSystem
Local file system implementation
WorkerInfo
Worker information

Enums§

Distribution
Distribution strategy for arrays
MergeStrategy
Strategy for merging distributed write outputs
PartitionStrategy
Partition strategy for distributed processing
WorkerStatus
Worker status

Traits§

DistributedFileSystem
Distributed file system abstraction