# SciRS2 IO
[](https://crates.io/crates/scirs2-io)
[](../LICENSE)
[](https://docs.rs/scirs2-io)
**Production-ready Input/Output module for the SciRS2 scientific computing library (v0.1.5).** Following the [SciRS2 POLICY](../SCIRS2_POLICY.md), this module provides comprehensive functionality for reading and writing various scientific and numerical data formats with high performance, reliability, and enhanced real-time streaming capabilities through scirs2-core abstractions.
## Features
### Core File Format Support
- **MATLAB Support**: Complete `.mat` file format support with all data types
- **WAV File Support**: Professional-grade WAV audio file processing
- **ARFF Support**: Full Weka ARFF (Attribute-Relation File Format) implementation
- **CSV Support**: Advanced CSV processing with flexible configuration and type inference
- **NetCDF Support**: Complete NetCDF3 and NetCDF4/HDF5 integration with hierarchical data management
- **HDF5 Support**: Comprehensive Hierarchical Data Format with groups, datasets, compression, and chunking
- **Matrix Market**: High-performance sparse and dense matrix format support
- **Harwell-Boeing**: Complete sparse matrix format implementation
### Advanced Data Processing
- **Image Support**: Professional image processing (PNG, JPEG, BMP, TIFF) with EXIF metadata
- **Data Serialization**: Multi-format serialization (Binary, JSON, MessagePack) with metadata preservation
- **Data Compression**: Production-grade compression with parallel processing support
- Multiple algorithms: GZIP, Zstandard, LZ4, BZIP2
- **Up to 2.5x performance improvement** with parallel processing
- Configurable compression levels and threading
- **Data Validation**: Enterprise-grade validation with comprehensive error reporting
- Multiple checksum algorithms (CRC32, SHA-256, BLAKE3)
- JSON Schema-compatible validation engine
- Format-specific validators
- **Sparse Matrix Operations**: Optimized sparse matrix handling (COO, CSR, CSC formats)
### High-Performance Features
- **Parallel Processing**: Multi-threaded operations with automatic optimization
- **Streaming Interfaces**: Memory-efficient processing for large datasets
- **Async I/O**: Non-blocking operations with tokio integration
- **Memory Mapping**: Efficient handling of large arrays without memory overhead
- **Network I/O**: HTTP/HTTPS client with progress tracking and retry logic
- **Cloud Integration**: Framework for AWS S3, Google Cloud Storage, Azure Blob Storage
### Production Quality
- **Zero Warnings**: Clean compilation with comprehensive error handling
- **114 Unit Tests**: Extensive test coverage with edge case validation
- **Cross-Platform**: Linux, macOS, Windows support
- **API Stability**: Stable APIs with semantic versioning
- **Performance Benchmarks**: Validated performance improvements
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
scirs2-io = "0.1.5"
```
Enable specific features as needed:
```toml
[dependencies]
scirs2-io = { version = "0.1.5", features = ["hdf5", "async", "compression"] }
```
### Available Features
- `default`: CSV, compression, and validation (recommended for most use cases)
- `hdf5`: HDF5 file format support
- `async`: Asynchronous I/O with tokio
- `reqwest`: Network operations and HTTP client
- `all`: All features enabled
## Quick Start
### Basic File Operations
```rust
use scirs2_io::{matlab, csv, image, compression};
use scirs2_core::error::CoreResult;
use ndarray::Array2;
// Read MATLAB file
let data = matlab::loadmat("data.mat")?;
let array = data.get_array::<f64>("matrix")?;
// Process CSV with automatic type detection
let (headers, data) = csv::read_csv_numeric("dataset.csv", None)?;
println!("Dataset shape: {:?}", data.shape());
// Handle images with metadata
let (image_data, metadata) = image::read_image("photo.jpg")?;
println!("Image: {}x{} pixels", metadata.width, metadata.height);
// High-performance compression
let compressed = compression::compress_data(&large_dataset,
compression::CompressionAlgorithm::Zstd, Some(6))?;
```
### Advanced Parallel Processing
```rust
use scirs2_io::compression::{
compress_data_parallel, ParallelCompressionConfig, CompressionAlgorithm
};
// Configure high-performance parallel compression
let config = ParallelCompressionConfig {
num_threads: 8,
chunk_size: 1024 * 1024, // 1MB chunks
buffer_size: 64 * 1024, // 64KB buffer
enable_memory_mapping: true,
};
// Process large dataset (10MB example)
let large_data = vec![0u8; 10_000_000];
let (compressed, stats) = compress_data_parallel(
&large_data,
CompressionAlgorithm::Zstd,
Some(6),
config
)?;
println!("Compressed to {:.1}% in {:.2}ms",
100.0 * stats.bytes_output as f64 / stats.bytes_processed as f64,
stats.operation_time_ms);
println!("Throughput: {:.2} MB/s", stats.throughput_bps / 1_000_000.0);
```
### Schema-Based Data Validation
```rust
use scirs2_io::validation::{SchemaValidator, schema_helpers, SchemaConstraint};
use serde_json::json;
let validator = SchemaValidator::new();
// Define validation schema
let user_schema = schema_helpers::object([
("name", schema_helpers::string()
.with_constraint(SchemaConstraint::MinLength(1))
.required()),
("age", schema_helpers::integer()
.with_constraint(SchemaConstraint::MinValue(0.0))
.with_constraint(SchemaConstraint::MaxValue(150.0))
.required()),
("email", schema_helpers::email().required()),
].into_iter().collect());
// Validate data
let user_data = json!({
"name": "Alice Johnson",
"age": 30,
"email": "alice@example.com"
});
let result = validator.validate(&user_data, &user_schema);
if result.valid {
println!("Data validation passed!");
} else {
for error in &result.errors {
println!("Validation error in {}: {}", error.path, error.message);
}
}
```
### Streaming Large Files
```rust
use scirs2_io::streaming::{StreamingConfig, process_file_chunked};
// Process large files efficiently
let config = StreamingConfig::default()
.chunk_size(64 * 1024)
.enable_progress_reporting(true);
let (result, stats) = process_file_chunked("large_dataset.bin", config,
|chunk_data, chunk_id| {
// Process each chunk
println!("Processing chunk {}: {} bytes", chunk_id, chunk_data.len());
// Your processing logic here
Ok(())
})?;
println!("Processed {} chunks, {} total bytes",
stats.total_chunks, stats.total_bytes_processed);
```
## API Reference
### File Format Modules
#### MATLAB Files
```rust
use scirs2_io::matlab::{loadmat, savemat, MatFile, MatVar};
```
#### Scientific Data Formats
```rust
use scirs2_io::{
netcdf::{NetCDFFile, NetCDFOptions, NetCDFFormat},
hdf5::{HDF5File, CompressionOptions, DatasetOptions},
matrix_market::{read_matrix_market, write_matrix_market},
};
```
#### Image Processing
```rust
use scirs2_io::image::{
read_image, write_image, convert_image, get_grayscale,
ImageFormat, ColorMode, ImageMetadata
};
```
### Data Processing
#### Compression
```rust
use scirs2_io::compression::{
// Basic compression
compress_data, decompress_data,
// Parallel processing
compress_data_parallel, decompress_data_parallel,
// Configuration
CompressionAlgorithm, ParallelCompressionConfig,
// Array-specific
ndarray::{compress_array, decompress_array},
};
```
#### Validation
```rust
use scirs2_io::validation::{
// Integrity checking
calculate_checksum, verify_checksum,
// Schema validation
SchemaValidator, schema_helpers, SchemaConstraint,
// Format validation
formats::{validate_format, detect_file_format},
};
```
#### Serialization
```rust
use scirs2_io::serialize::{
serialize_array, deserialize_array,
serialize_sparse_matrix, deserialize_sparse_matrix,
SerializationFormat, SparseMatrixCOO,
};
```
## Performance Characteristics
- **Parallel Compression**: Up to 2.5x faster than single-threaded operations
- **Memory Efficiency**: Streaming interfaces for datasets larger than RAM
- **Network I/O**: Optimized for scientific data transfer with retry logic
- **Zero-Copy**: Memory mapping for large file operations
- **SIMD-Ready**: Architecture prepared for vectorized operations
## Format Support Details
### MATLAB (.mat)
- All MATLAB data types (double, single, integers, logical, char)
- Multidimensional arrays, structures, and cell arrays
- Both compressed and uncompressed formats
- Full metadata preservation
### NetCDF/HDF5
- NetCDF3 Classic and NetCDF4/HDF5 formats
- Unlimited dimensions and compression
- Group hierarchies and attributes
- Chunked storage for large datasets
### Image Formats
- PNG, JPEG, BMP, TIFF with full metadata
- Color space handling (RGB, RGBA, Grayscale)
- EXIF metadata extraction and manipulation
- Format conversion with quality control
### Sparse Matrices
- COO, CSR, CSC format support
- Matrix Market and Harwell-Boeing formats
- Efficient format conversion with caching
- Integration with numerical operations
## System Requirements
- **Rust**: Edition 2021, MSRV 1.70+
- **Platforms**: Linux, macOS, Windows (64-bit)
- **Optional**: HDF5 system library for HDF5 features
- **Memory**: Configurable memory usage for large datasets
## Production Deployment
This library is production-ready with:
- **Comprehensive testing**: 114 unit tests with edge case coverage
- **Memory safety**: Zero unsafe code in core paths
- **Error handling**: Detailed error messages with recovery suggestions
- **Performance monitoring**: Built-in statistics and benchmarking
- **Backwards compatibility**: Semantic versioning with stable APIs
## Contributing
We welcome contributions! Please see our [Contributing Guidelines](../CONTRIBUTING.md) for details.
For production releases:
- All features require comprehensive tests
- Performance changes need benchmarks
- API changes require documentation updates
- Security considerations for all I/O operations
## License
Licensed under either:
Choose the license that works best for your project.
---
**Ready for Production**: scirs2-io v0.1.5 provides enterprise-grade I/O capabilities for scientific computing applications.