Out-of-core processing for terabyte-scale datasets

This module provides infrastructure for processing datasets that are too large to fit in memory, enabling work with terabyte-scale scientific data through efficient memory management and disk-based algorithms.
§Features
- Memory-mapped arrays: Efficient access to large arrays on disk
- Chunked processing: Process data in manageable chunks of configurable size
- Virtual memory management: Smart caching and paging
- Disk-based algorithms: Sorting, grouping, and aggregation
- Virtual arrays: Combine multiple data sources into a single logical array
- Sliding windows: Streaming iteration over out-of-core data
- HDF5 integration: Leverage HDF5 for structured storage
- Compression support: On-the-fly compression/decompression
§Examples
use scirs2_io::out_of_core::{OutOfCoreArray, ChunkProcessor};
use scirs2_core::ndarray::Array2;
// Create an out-of-core array
let array = OutOfCoreArray::<f64>::create("large_array.ooc", &[1_000_000, 100_000])?;
// Process in chunks
array.process_chunks(1000, |chunk| {
    // Process each chunk
    let mean = chunk.mean().unwrap();
    Ok(mean)
})?;
// Virtual array view
let view = array.view_window(&[0, 0], &[1000, 1000])?;
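To make the chunking idea concrete independently of this crate's API, the following standalone sketch uses only the standard library: a large binary file of little-endian f64 values is reduced to a mean while only one fixed-size chunk is held in memory at a time. The file layout and the `streaming_mean` helper are illustrative assumptions, not part of this module.

use std::convert::TryInto;
use std::fs::File;
use std::io::{BufReader, Read};

// Streaming mean over a large binary file of little-endian f64 values.
// Only one chunk of `chunk_len` elements is resident in memory at a time.
fn streaming_mean(path: &str, chunk_len: usize) -> std::io::Result<f64> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut buf = vec![0u8; chunk_len * 8]; // reusable chunk buffer
    let (mut sum, mut count) = (0.0_f64, 0u64);
    loop {
        // Fill the buffer as far as possible; a short read means end of file.
        let mut filled = 0;
        while filled < buf.len() {
            let n = reader.read(&mut buf[filled..])?;
            if n == 0 {
                break;
            }
            filled += n;
        }
        if filled == 0 {
            break;
        }
        // Accumulate over the complete 8-byte values in this chunk.
        for bytes in buf[..filled - filled % 8].chunks_exact(8) {
            sum += f64::from_le_bytes(bytes.try_into().unwrap());
            count += 1;
        }
        if filled < buf.len() {
            break;
        }
    }
    Ok(if count == 0 { 0.0 } else { sum / count as f64 })
}

In the same spirit, the per-chunk values produced by the closure in the example above can be combined into a single global statistic once all chunks have been visited.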
§Structs
- OutOfCoreArray - Out-of-core array for processing large datasets
- OutOfCoreConfig - Out-of-core array configuration
- OutOfCoreSorter - Out-of-core sorting for large datasets
- SlidingWindow - Sliding window iterator for out-of-core processing
- VirtualArray - Virtual array that combines multiple arrays
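As a rough mental model for VirtualArray (the type and names below are illustrative assumptions, not this module's API), several independent segments can be exposed as one logical array by translating a global index into a segment plus local offset, without copying any data:

// Conceptual sketch only: presents several in-memory slices as one logical
// array. The real VirtualArray works over out-of-core sources.
struct VirtualConcat<'a, T> {
    segments: Vec<&'a [T]>,
}

impl<'a, T> VirtualConcat<'a, T> {
    fn new(segments: Vec<&'a [T]>) -> Self {
        VirtualConcat { segments }
    }

    // Total logical length across all segments.
    fn len(&self) -> usize {
        self.segments.iter().map(|s| s.len()).sum()
    }

    // Map a logical index to (segment, local offset) and return the element.
    fn get(&self, mut index: usize) -> Option<&T> {
        for seg in &self.segments {
            if index < seg.len() {
                return seg.get(index);
            }
            index -= seg.len();
        }
        None
    }
}

fn main() {
    let (a, b) = (vec![1.0, 2.0], vec![3.0, 4.0, 5.0]);
    let virt = VirtualConcat::new(vec![a.as_slice(), b.as_slice()]);
    assert_eq!(virt.len(), 5);
    assert_eq!(virt.get(3), Some(&4.0)); // logical index 3 lives in the second segment
}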
§Traits
- ArraySource - Source for virtual array components
- ChunkProcessor - Chunk processor for streaming operations
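SlidingWindow and ChunkProcessor both build on streaming iteration. Purely as an illustration of that idea (the Windows type below is hypothetical and works on in-memory iterators, unlike the out-of-core SlidingWindow), overlapping windows can be produced from any element stream by buffering one window and sliding it forward by a fixed step:

// Hypothetical sketch: overlapping windows of `size` elements over any
// iterator, advancing by `step` elements per call to `next`.
struct Windows<I: Iterator> {
    inner: I,
    size: usize,
    step: usize,
    buf: Vec<I::Item>,
}

impl<I: Iterator> Windows<I> {
    fn new(inner: I, size: usize, step: usize) -> Self {
        Windows { inner, size, step, buf: Vec::with_capacity(size) }
    }
}

impl<I: Iterator> Iterator for Windows<I>
where
    I::Item: Clone,
{
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        // Top up the buffer until it holds one full window.
        while self.buf.len() < self.size {
            self.buf.push(self.inner.next()?);
        }
        let window = self.buf.clone();
        // Slide forward by `step` for the next window.
        self.buf.drain(..self.step.min(self.buf.len()));
        Some(window)
    }
}

fn main() {
    // Windows of 3 advancing by 1 over 1..=5: [1, 2, 3], [2, 3, 4], [3, 4, 5]
    for w in Windows::new(1..=5, 3, 1) {
        println!("{:?}", w);
    }
}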