scirs2-core 0.4.2

# Memory-Efficient Operations in SciRS2

This document describes the memory-efficient operations module in SciRS2, which provides tools for working with large datasets that might not fit entirely in memory.

## Overview

Scientific computing often involves operations on large arrays or matrices that exceed available RAM. The `memory_efficient` module provides several approaches to handle such cases:

1. **Chunked Processing**: Operating on data in manageable chunks to reduce peak memory usage
2. **Lazy Evaluation**: Deferring computation until results are needed
3. **Operation Fusion**: Combining multiple operations to avoid intermediate results
4. **Memory-Efficient Views**: Accessing data without copying
5. **Out-of-Core Processing**: Using disk storage for data that doesn't fit in RAM

## Features

### Chunked Arrays and Processing

The `ChunkedArray` type and related functions provide tools for processing large arrays in smaller, memory-friendly chunks.

```rust
// Process a large array in chunks
let result = chunk_wise_op(
    &large_array,
    |chunk| chunk.map(|x| x * x),
    ChunkingStrategy::Auto,
)?;
```

Chunking strategies include:
- `ChunkingStrategy::Auto`: Automatically determine optimal chunk size
- `ChunkingStrategy::Fixed(size)`: Use a fixed chunk size (in elements)
- `ChunkingStrategy::FixedBytes(bytes)`: Use a fixed chunk size (in bytes)
- `ChunkingStrategy::NumChunks(n)`: Divide the array into a specific number of chunks

### Lazy Arrays and Evaluation

Lazy arrays defer computation until results are needed, allowing for optimizations:

```rust
// Create a lazy array and define operations
let lazy_array = LazyArray::new(data.clone());
let lazy_result = lazy_array.map(|x| x * x);

// Only evaluate when needed
let result = evaluate(&lazy_result)?;
```

### Operation Fusion

Operation fusion combines multiple operations to avoid intermediate results:

```rust
// Create a fusion of operations
let mut fusion = OpFusion::new();
fusion.add_op(square_op)?
      .add_op(root_op)?
      .optimize()?;  // Optimize the operation chain

// Apply the fused operations
let result = fusion.apply(data)?;
```

### Memory-Efficient Views

The module provides efficient views that avoid copying data:

```rust
// Create views of an array
let transposed = transpose_view(&data)?;
let diagonal = diagonal_view(&data)?;

// Safely reinterpret memory without copying (when types are compatible)
let f32_view = unsafe { view_as::<f64, f32, _, _>(&data)? };
```

### Out-of-Core Processing

Out-of-core arrays store data on disk and load only what's needed:

```rust
// Create a disk-backed array
let disk_array = create_disk_array(
    &data,
    path,
    ChunkingStrategy::Fixed(1000),
    false,  // read-only
)?;

// Load the entire array or specific chunks
let loaded_data = disk_array.load()?;
let chunks = load_chunks(&disk_array, &[0, 2, 5])?;
```

## Performance Considerations

- **Chunk Size**: Larger chunks reduce overhead but increase peak memory usage
- **Parallelization**: Chunk-wise processing can be parallelized for better performance
- **Disk I/O**: Out-of-core processing is I/O-bound; use SSDs when possible
- **Operation Complexity**: Fused operations are more efficient for complex chains

## Integration with Other SciRS2 Features

The memory-efficient module integrates with:

- **Array Module**: Works with MaskedArrays and RecordArrays
- **Parallel Processing**: Chunked operations can use parallel processing when available
- **SIMD Acceleration**: Operations can leverage SIMD instructions for better performance
- **Memory Metrics**: Tracking memory usage in chunked operations

## Examples

See the following examples for practical applications:

- `memory_efficient_example.rs`: Basic usage of memory-efficient operations
- `large_dataset_processing.rs`: Processing a dataset larger than available RAM

## Benchmarks

When using memory-efficient operations, performance tradeoffs include:

1. **Processing Time**: Slightly slower than in-memory operations
2. **Memory Usage**: Significantly reduced peak memory usage
3. **Disk I/O**: Added overhead for out-of-core processing

Benchmark results show that chunked processing typically uses 1/N of the memory (where N is the number of chunks) with a 10-20% processing time overhead.

## Best Practices

1. Start with `ChunkingStrategy::Auto` and adjust as needed
2. For maximum performance, align chunk sizes with CPU cache sizes
3. Use lazy evaluation for complex operation chains
4. Prefer views over copying when possible
5. For out-of-core processing, use memory-mapped files when available

## Future Improvements

Planned enhancements include:

1. Memory-mapped file support for better out-of-core performance
2. More advanced operation fusion strategies
3. GPU-accelerated chunk processing
4. Streaming operations for continuous data processing
5. Distributed processing across multiple machines