scirs2-io 0.2.0

Input/Output utilities module for SciRS2 (scirs2-io)
Documentation
# SciRS2 IO

[![crates.io](https://img.shields.io/crates/v/scirs2-io.svg)](https://crates.io/crates/scirs2-io)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](../LICENSE)
[![Documentation](https://img.shields.io/docsrs/scirs2-io)](https://docs.rs/scirs2-io)

**Production-ready Input/Output module for the SciRS2 scientific computing library (v0.1.5).** Following the [SciRS2 POLICY](../SCIRS2_POLICY.md), this module provides comprehensive functionality for reading and writing various scientific and numerical data formats with high performance, reliability, and enhanced real-time streaming capabilities through scirs2-core abstractions.

## Features

### Core File Format Support
- **MATLAB Support**: Complete `.mat` file format support with all data types
- **WAV File Support**: Professional-grade WAV audio file processing  
- **ARFF Support**: Full Weka ARFF (Attribute-Relation File Format) implementation
- **CSV Support**: Advanced CSV processing with flexible configuration and type inference
- **NetCDF Support**: Complete NetCDF3 and NetCDF4/HDF5 integration with hierarchical data management
- **HDF5 Support**: Comprehensive Hierarchical Data Format with groups, datasets, compression, and chunking
- **Matrix Market**: High-performance sparse and dense matrix format support
- **Harwell-Boeing**: Complete sparse matrix format implementation

### Advanced Data Processing
- **Image Support**: Professional image processing (PNG, JPEG, BMP, TIFF) with EXIF metadata
- **Data Serialization**: Multi-format serialization (Binary, JSON, MessagePack) with metadata preservation
- **Data Compression**: Production-grade compression with parallel processing support
  - Multiple algorithms: GZIP, Zstandard, LZ4, BZIP2
  - **Up to 2.5x performance improvement** with parallel processing
  - Configurable compression levels and threading
- **Data Validation**: Enterprise-grade validation with comprehensive error reporting
  - Multiple checksum algorithms (CRC32, SHA-256, BLAKE3)
  - JSON Schema-compatible validation engine
  - Format-specific validators
- **Sparse Matrix Operations**: Optimized sparse matrix handling (COO, CSR, CSC formats)

### High-Performance Features
- **Parallel Processing**: Multi-threaded operations with automatic optimization
- **Streaming Interfaces**: Memory-efficient processing for large datasets
- **Async I/O**: Non-blocking operations with tokio integration
- **Memory Mapping**: Efficient handling of large arrays without memory overhead
- **Network I/O**: HTTP/HTTPS client with progress tracking and retry logic
- **Cloud Integration**: Framework for AWS S3, Google Cloud Storage, Azure Blob Storage

### Production Quality
- **Zero Warnings**: Clean compilation with comprehensive error handling
- **114 Unit Tests**: Extensive test coverage with edge case validation
- **Cross-Platform**: Linux, macOS, Windows support
- **API Stability**: Stable APIs with semantic versioning
- **Performance Benchmarks**: Validated performance improvements

## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
scirs2-io = "0.1.5"
```

Enable specific features as needed:

```toml
[dependencies]
scirs2-io = { version = "0.1.5", features = ["hdf5", "async", "compression"] }
```

### Available Features
- `default`: CSV, compression, and validation (recommended for most use cases)
- `hdf5`: HDF5 file format support
- `async`: Asynchronous I/O with tokio
- `reqwest`: Network operations and HTTP client
- `all`: All features enabled

## Quick Start

### Basic File Operations

```rust
use scirs2_io::{matlab, csv, image, compression};
use scirs2_core::error::CoreResult;
use ndarray::Array2;

// Read MATLAB file
let data = matlab::loadmat("data.mat")?;
let array = data.get_array::<f64>("matrix")?;

// Process CSV with automatic type detection
let (headers, data) = csv::read_csv_numeric("dataset.csv", None)?;
println!("Dataset shape: {:?}", data.shape());

// Handle images with metadata
let (image_data, metadata) = image::read_image("photo.jpg")?;
println!("Image: {}x{} pixels", metadata.width, metadata.height);

// High-performance compression
let compressed = compression::compress_data(&large_dataset, 
    compression::CompressionAlgorithm::Zstd, Some(6))?;
```

### Advanced Parallel Processing

```rust
use scirs2_io::compression::{
    compress_data_parallel, ParallelCompressionConfig, CompressionAlgorithm
};

// Configure high-performance parallel compression
let config = ParallelCompressionConfig {
    num_threads: 8,
    chunk_size: 1024 * 1024,  // 1MB chunks
    buffer_size: 64 * 1024,   // 64KB buffer
    enable_memory_mapping: true,
};

// Process large dataset (10MB example)
let large_data = vec![0u8; 10_000_000];
let (compressed, stats) = compress_data_parallel(
    &large_data, 
    CompressionAlgorithm::Zstd, 
    Some(6), 
    config
)?;

println!("Compressed to {:.1}% in {:.2}ms", 
    100.0 * stats.bytes_output as f64 / stats.bytes_processed as f64,
    stats.operation_time_ms);
println!("Throughput: {:.2} MB/s", stats.throughput_bps / 1_000_000.0);
```

### Schema-Based Data Validation

```rust
use scirs2_io::validation::{SchemaValidator, schema_helpers, SchemaConstraint};
use serde_json::json;

let validator = SchemaValidator::new();

// Define validation schema
let user_schema = schema_helpers::object([
    ("name", schema_helpers::string()
        .with_constraint(SchemaConstraint::MinLength(1))
        .required()),
    ("age", schema_helpers::integer()
        .with_constraint(SchemaConstraint::MinValue(0.0))
        .with_constraint(SchemaConstraint::MaxValue(150.0))
        .required()),
    ("email", schema_helpers::email().required()),
].into_iter().collect());

// Validate data
let user_data = json!({
    "name": "Alice Johnson",
    "age": 30,
    "email": "alice@example.com"
});

let result = validator.validate(&user_data, &user_schema);
if result.valid {
    println!("Data validation passed!");
} else {
    for error in &result.errors {
        println!("Validation error in {}: {}", error.path, error.message);
    }
}
```

### Streaming Large Files

```rust
use scirs2_io::streaming::{StreamingConfig, process_file_chunked};

// Process large files efficiently
let config = StreamingConfig::default()
    .chunk_size(64 * 1024)
    .enable_progress_reporting(true);

let (result, stats) = process_file_chunked("large_dataset.bin", config, 
    |chunk_data, chunk_id| {
        // Process each chunk
        println!("Processing chunk {}: {} bytes", chunk_id, chunk_data.len());
        // Your processing logic here
        Ok(())
    })?;

println!("Processed {} chunks, {} total bytes", 
    stats.total_chunks, stats.total_bytes_processed);
```

## API Reference

### File Format Modules

#### MATLAB Files
```rust
use scirs2_io::matlab::{loadmat, savemat, MatFile, MatVar};
```

#### Scientific Data Formats
```rust
use scirs2_io::{
    netcdf::{NetCDFFile, NetCDFOptions, NetCDFFormat},
    hdf5::{HDF5File, CompressionOptions, DatasetOptions},
    matrix_market::{read_matrix_market, write_matrix_market},
};
```

#### Image Processing
```rust
use scirs2_io::image::{
    read_image, write_image, convert_image, get_grayscale,
    ImageFormat, ColorMode, ImageMetadata
};
```

### Data Processing

#### Compression
```rust
use scirs2_io::compression::{
    // Basic compression
    compress_data, decompress_data,
    // Parallel processing
    compress_data_parallel, decompress_data_parallel,
    // Configuration
    CompressionAlgorithm, ParallelCompressionConfig,
    // Array-specific
    ndarray::{compress_array, decompress_array},
};
```

#### Validation
```rust
use scirs2_io::validation::{
    // Integrity checking
    calculate_checksum, verify_checksum,
    // Schema validation
    SchemaValidator, schema_helpers, SchemaConstraint,
    // Format validation
    formats::{validate_format, detect_file_format},
};
```

#### Serialization
```rust
use scirs2_io::serialize::{
    serialize_array, deserialize_array,
    serialize_sparse_matrix, deserialize_sparse_matrix,
    SerializationFormat, SparseMatrixCOO,
};
```

## Performance Characteristics

- **Parallel Compression**: Up to 2.5x faster than single-threaded operations
- **Memory Efficiency**: Streaming interfaces for datasets larger than RAM
- **Network I/O**: Optimized for scientific data transfer with retry logic
- **Zero-Copy**: Memory mapping for large file operations
- **SIMD-Ready**: Architecture prepared for vectorized operations

## Format Support Details

### MATLAB (.mat)
- All MATLAB data types (double, single, integers, logical, char)
- Multidimensional arrays, structures, and cell arrays
- Both compressed and uncompressed formats
- Full metadata preservation

### NetCDF/HDF5
- NetCDF3 Classic and NetCDF4/HDF5 formats
- Unlimited dimensions and compression
- Group hierarchies and attributes
- Chunked storage for large datasets

### Image Formats
- PNG, JPEG, BMP, TIFF with full metadata
- Color space handling (RGB, RGBA, Grayscale)
- EXIF metadata extraction and manipulation
- Format conversion with quality control

### Sparse Matrices
- COO, CSR, CSC format support
- Matrix Market and Harwell-Boeing formats
- Efficient format conversion with caching
- Integration with numerical operations

## System Requirements

- **Rust**: Edition 2021, MSRV 1.70+
- **Platforms**: Linux, macOS, Windows (64-bit)
- **Optional**: HDF5 system library for HDF5 features
- **Memory**: Configurable memory usage for large datasets

## Production Deployment

This library is production-ready with:
- **Comprehensive testing**: 114 unit tests with edge case coverage
- **Memory safety**: Zero unsafe code in core paths
- **Error handling**: Detailed error messages with recovery suggestions
- **Performance monitoring**: Built-in statistics and benchmarking
- **Backwards compatibility**: Semantic versioning with stable APIs

## Contributing

We welcome contributions! Please see our [Contributing Guidelines](../CONTRIBUTING.md) for details.

For production releases:
- All features require comprehensive tests
- Performance changes need benchmarks  
- API changes require documentation updates
- Security considerations for all I/O operations

## License

Licensed under either:

Choose the license that works best for your project.

---

**Ready for Production**: scirs2-io v0.1.5 provides enterprise-grade I/O capabilities for scientific computing applications.