Module out_of_core

Out-of-core processing for terabyte-scale datasets

Provides infrastructure for processing datasets too large for memory:

  • Memory-mapped arrays with virtual memory management
  • Chunked processing with configurable chunk sizes
  • Disk-based algorithms for sorting and aggregation
  • Virtual arrays combining multiple data sources
  • Sliding window iterators for streaming operations

This module provides infrastructure for processing datasets that are too large to fit in memory, enabling work with terabyte-scale scientific data through efficient memory management and disk-based algorithms.

§Features

  • Memory-mapped arrays: Efficient access to large arrays on disk
  • Chunked processing: Process data in manageable chunks
  • Virtual memory management: Smart caching and paging
  • Disk-based algorithms: Sorting, grouping, and aggregation
  • HDF5 integration: Leverage HDF5 for structured storage
  • Compression support: On-the-fly compression/decompression
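
To make the chunked-processing idea above concrete, here is a minimal sketch that uses only the Rust standard library and does not call into this crate. It streams a file of raw little-endian `f64` values through a fixed-size buffer, so memory use depends on the chunk size rather than on the file size. The helper names `write_f64s` and `chunked_mean` and the on-disk layout are assumptions made for illustration; they are not part of `scirs2_io`.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Hypothetical helper: writes `values` to `path` as raw little-endian f64s.
fn write_f64s(path: &Path, values: &[f64]) -> io::Result<()> {
    let mut out = BufWriter::new(File::create(path)?);
    for v in values {
        out.write_all(&v.to_le_bytes())?;
    }
    out.flush()
}

/// Hypothetical helper: reads the file in chunks of at most `chunk_len`
/// values and returns their mean, or `None` if the file holds no values.
/// Trailing bytes that do not form a full f64 are ignored.
fn chunked_mean(path: &Path, chunk_len: usize) -> io::Result<Option<f64>> {
    assert!(chunk_len > 0, "chunk_len must be positive");
    let mut reader = BufReader::new(File::open(path)?);
    let mut buf = vec![0u8; chunk_len * 8];
    let mut sum = 0.0f64;
    let mut count = 0u64;
    loop {
        // Fill the buffer as far as possible; a short fill means end of file.
        let mut filled = 0;
        while filled < buf.len() {
            let n = reader.read(&mut buf[filled..])?;
            if n == 0 {
                break;
            }
            filled += n;
        }
        if filled == 0 {
            break;
        }
        // Decode and accumulate every complete 8-byte value in this chunk.
        for bytes in buf[..filled].chunks_exact(8) {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(bytes);
            sum += f64::from_le_bytes(raw);
            count += 1;
        }
        if filled < buf.len() {
            break;
        }
    }
    Ok(if count == 0 { None } else { Some(sum / count as f64) })
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("chunked_mean_demo.bin");
    let values: Vec<f64> = (1..=10u32).map(|i| i as f64).collect();
    write_f64s(&path, &values)?;
    // Only three values (24 bytes) are held in the chunk buffer at a time.
    println!("mean = {:?}", chunked_mean(&path, 3)?);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

The result does not depend on the chunk size here because values are accumulated in file order; a real implementation would likely also want a numerically more robust summation for very long inputs.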

§Examples

use scirs2_io::out_of_core::{OutOfCoreArray, ChunkProcessor};
use scirs2_core::ndarray::Array2;

// Create an out-of-core array of 1,000,000 x 100,000 f64 values (about 800 GB of raw data)
let array = OutOfCoreArray::<f64>::create("large_array.ooc", &[1_000_000, 100_000])?;

// Process the array in chunks of size 1000
array.process_chunks(1000, |chunk| {
    // Compute the mean of the current chunk
    let mean = chunk.mean().unwrap();
    Ok(mean)
})?;

// View a 1000 x 1000 window of the array, starting at the origin
let view = array.view_window(&[0, 0], &[1000, 1000])?;
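
As a complement to the example above, the following sketch shows the general idea behind a sliding window over streamed data, using only the standard library. It is not the `SlidingWindow` type from this module, whose API is not shown here. `WindowMeans` is a hypothetical name: it wraps any iterator of `f64` values and yields the mean of each full window, keeping only `size` values in memory at a time.

```rust
use std::collections::VecDeque;

/// Hypothetical adapter (not part of scirs2_io): yields the mean of every
/// full window of `size` consecutive values from `source`.
struct WindowMeans<I: Iterator<Item = f64>> {
    source: I,
    window: VecDeque<f64>,
    size: usize,
    sum: f64,
}

impl<I: Iterator<Item = f64>> WindowMeans<I> {
    fn new(source: I, size: usize) -> Self {
        assert!(size > 0, "window size must be positive");
        WindowMeans {
            source,
            window: VecDeque::with_capacity(size),
            size,
            sum: 0.0,
        }
    }
}

impl<I: Iterator<Item = f64>> Iterator for WindowMeans<I> {
    type Item = f64;

    fn next(&mut self) -> Option<f64> {
        // Slide by one: drop the oldest value once the window is full.
        if self.window.len() == self.size {
            if let Some(old) = self.window.pop_front() {
                self.sum -= old;
            }
        }
        // Pull values until the window is full or the source runs dry.
        while self.window.len() < self.size {
            let value = self.source.next()?;
            self.sum += value;
            self.window.push_back(value);
        }
        Some(self.sum / self.size as f64)
    }
}

fn main() {
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    for mean in WindowMeans::new(data.into_iter(), 3) {
        println!("{}", mean);
    }
}
```

The running sum keeps each step O(1), at the cost of accumulating floating-point rounding error over very long streams; recomputing the sum per window is slower but avoids that drift.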

Structs§

OutOfCoreArray
Out-of-core array for processing large datasets
OutOfCoreConfig
Out-of-core array configuration
OutOfCoreSorter
Out-of-core sorting for large datasets
SlidingWindow
Sliding window iterator for out-of-core processing
VirtualArray
Virtual array that combines multiple arrays
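
For readers unfamiliar with disk-based sorting, the sketch below illustrates external merge sort, a common technique for this problem. Whether `OutOfCoreSorter` uses this approach internally is not stated in this documentation, so treat it as an assumption. The function `external_sort`, its parameters, and the one-integer-per-line run files are all hypothetical and use only the standard library.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::{Path, PathBuf};

/// Parses one line of a run file back into an integer.
fn parse_line(line: &str) -> io::Result<i64> {
    line.trim()
        .parse::<i64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Hypothetical sketch: sorts `input` while buffering at most `run_len`
/// values at once, spilling sorted runs to `dir` and passing the merged
/// result to `sink` in ascending order.
fn external_sort<I, F>(input: I, run_len: usize, dir: &Path, mut sink: F) -> io::Result<()>
where
    I: IntoIterator<Item = i64>,
    F: FnMut(i64),
{
    assert!(run_len > 0, "run_len must be positive");
    let mut input = input.into_iter().fuse();

    // Phase 1: cut the input into sorted runs and write each run to disk.
    let mut run_paths: Vec<PathBuf> = Vec::new();
    let mut buf: Vec<i64> = Vec::with_capacity(run_len);
    loop {
        buf.clear();
        buf.extend(input.by_ref().take(run_len));
        if buf.is_empty() {
            break;
        }
        buf.sort_unstable();
        let path = dir.join(format!("run_{}.txt", run_paths.len()));
        let mut out = BufWriter::new(File::create(&path)?);
        for v in &buf {
            writeln!(out, "{}", v)?;
        }
        out.flush()?;
        run_paths.push(path);
    }

    // Phase 2: k-way merge. The min-heap holds one pending value per run.
    let mut readers = Vec::with_capacity(run_paths.len());
    for path in &run_paths {
        readers.push(BufReader::new(File::open(path)?).lines());
    }
    let mut heap = BinaryHeap::new();
    for (i, reader) in readers.iter_mut().enumerate() {
        if let Some(line) = reader.next() {
            heap.push(Reverse((parse_line(&line?)?, i)));
        }
    }
    while let Some(Reverse((value, i))) = heap.pop() {
        sink(value);
        // Refill the heap from the run that just supplied the minimum.
        if let Some(line) = readers[i].next() {
            heap.push(Reverse((parse_line(&line?)?, i)));
        }
    }

    // Remove the temporary run files.
    for path in &run_paths {
        fs::remove_file(path)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("external_sort_demo");
    fs::create_dir_all(&dir)?;
    let mut sorted = Vec::new();
    external_sort(vec![5, -1, 9, 3, 3, 0, 7], 3, &dir, |v| sorted.push(v))?;
    println!("{:?}", sorted);
    fs::remove_dir(&dir)?;
    Ok(())
}
```

The demo collects the output into a `Vec` only for display. In a real out-of-core setting the sink would write to disk, and a binary run format would be more compact than text.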

Traits§

ArraySource
Source for virtual array components
ChunkProcessor
Chunk processor for streaming operations