SciRS2 IO
Production-ready input/output module for the SciRS2 scientific computing library (v0.1.0-rc.2). In line with the SciRS2 POLICY, it is built on scirs2-core abstractions and reads and writes a wide range of scientific and numerical data formats, with an emphasis on performance, reliability, and real-time streaming.
Features
Core File Format Support
- MATLAB Support: Complete .mat file format support with all data types
- WAV File Support: Professional-grade WAV audio file processing
- ARFF Support: Full Weka ARFF (Attribute-Relation File Format) implementation
- CSV Support: Advanced CSV processing with flexible configuration and type inference
- NetCDF Support: Complete NetCDF3 and NetCDF4/HDF5 integration with hierarchical data management
- HDF5 Support: Comprehensive Hierarchical Data Format with groups, datasets, compression, and chunking
- Matrix Market: High-performance sparse and dense matrix format support
- Harwell-Boeing: Complete sparse matrix format implementation
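To make the Matrix Market entry above more concrete, here is a minimal, std-only sketch of reading the "coordinate real general" variant of that text format. It is not the scirs2-io API: `Entry`, `field`, and `parse_matrix_market` are hypothetical names chosen for illustration, and the sketch ignores the symmetric, complex, pattern, and array variants that a full reader would need to handle.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// One nonzero entry, stored with zero-based indices.
#[derive(Debug, PartialEq)]
struct Entry {
    row: usize,
    col: usize,
    value: f64,
}

/// Parse one whitespace-separated token, reporting which field was bad.
fn field<T>(tok: Option<&str>, what: &str) -> Result<T, String>
where
    T: FromStr,
    T::Err: Display,
{
    let tok = tok.ok_or_else(|| format!("missing {what}"))?;
    tok.parse::<T>()
        .map_err(|e| format!("bad {what} '{tok}': {e}"))
}

/// Parse a "coordinate real general" Matrix Market document.
/// Returns (rows, cols, entries).
fn parse_matrix_market(text: &str) -> Result<(usize, usize, Vec<Entry>), String> {
    let mut lines = text.lines();

    // The banner is case-insensitive; only one variant is accepted here.
    let banner = lines.next().ok_or("empty input")?.to_ascii_lowercase();
    let expected = ["%%matrixmarket", "matrix", "coordinate", "real", "general"];
    if !banner.split_whitespace().eq(expected.iter().copied()) {
        return Err(format!("unsupported banner: {banner}"));
    }

    // Skip comment lines (starting with '%') and blank lines.
    let mut data = lines
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('%'));

    // Size line: rows, columns, number of stored entries.
    let mut size = data.next().ok_or("missing size line")?.split_whitespace();
    let rows: usize = field(size.next(), "row count")?;
    let cols: usize = field(size.next(), "column count")?;
    let nnz: usize = field(size.next(), "nonzero count")?;

    let mut entries = Vec::with_capacity(nnz);
    for line in data {
        let mut tok = line.split_whitespace();
        let row: usize = field(tok.next(), "row index")?;
        let col: usize = field(tok.next(), "column index")?;
        let value: f64 = field(tok.next(), "value")?;
        // Indices in the file are one-based.
        if row == 0 || row > rows || col == 0 || col > cols {
            return Err(format!("index ({row}, {col}) outside {rows}x{cols}"));
        }
        entries.push(Entry { row: row - 1, col: col - 1, value });
    }
    if entries.len() != nnz {
        return Err(format!("expected {nnz} entries, found {}", entries.len()));
    }
    Ok((rows, cols, entries))
}

fn main() {
    let src = "%%MatrixMarket matrix coordinate real general\n% demo\n3 3 2\n1 1 2.5\n3 2 -1.0\n";
    match parse_matrix_market(src) {
        Ok((rows, cols, entries)) => {
            println!("{rows}x{cols} matrix, {} nonzeros", entries.len());
            for e in &entries {
                println!("({}, {}) = {}", e.row, e.col, e.value);
            }
        }
        Err(e) => eprintln!("parse error: {e}"),
    }
}
```

The entries come back in file order as zero-based triplets, which is the COO layout mentioned later in this README.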
Advanced Data Processing
- Image Support: Professional image processing (PNG, JPEG, BMP, TIFF) with EXIF metadata
- Data Serialization: Multi-format serialization (Binary, JSON, MessagePack) with metadata preservation
- Data Compression: Production-grade compression with parallel processing support
  - Multiple algorithms: GZIP, Zstandard, LZ4, BZIP2
  - Up to 2.5x performance improvement with parallel processing
  - Configurable compression levels and threading
- Data Validation: Enterprise-grade validation with comprehensive error reporting
  - Multiple checksum algorithms (CRC32, SHA-256, BLAKE3)
  - JSON Schema-compatible validation engine
  - Format-specific validators
- Sparse Matrix Operations: Optimized sparse matrix handling (COO, CSR, CSC formats)
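As an illustration of the checksum-based validation listed above, the following std-only sketch computes a CRC32 (the reflected IEEE polynomial 0xEDB88320 used by gzip and PNG) and compares it against an expected value. The function names `crc32` and `verify_crc32` are hypothetical and are not taken from scirs2-io; a production implementation would normally use a table-driven or hardware-accelerated variant instead of this bit-by-bit loop.

```rust
/// Bitwise CRC32 with the reflected IEEE polynomial 0xEDB88320.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Return true when `data` matches the checksum recorded alongside it.
fn verify_crc32(data: &[u8], expected: u32) -> bool {
    crc32(data) == expected
}

fn main() {
    let payload = b"123456789";
    let checksum = crc32(payload);
    println!("crc32 = {checksum:#010x}");
    println!("intact: {}", verify_crc32(payload, checksum));
    println!("corrupted: {}", verify_crc32(b"123456780", checksum));
}
```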
High-Performance Features
- Parallel Processing: Multi-threaded operations with automatic optimization
- Streaming Interfaces: Memory-efficient processing for large datasets
- Async I/O: Non-blocking operations with tokio integration
- Memory Mapping: Efficient handling of large arrays without memory overhead
- Network I/O: HTTP/HTTPS client with progress tracking and retry logic
- Cloud Integration: Framework for AWS S3, Google Cloud Storage, Azure Blob Storage
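The retry logic mentioned for network I/O usually follows a retry-with-exponential-backoff pattern. The sketch below shows that pattern using only the standard library; `retry_with_backoff` is a hypothetical helper, not a scirs2-io function, and the simulated operation stands in for a real HTTP request.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Call `op` until it succeeds or `max_attempts` is reached, doubling the
/// delay after every failure. `op` receives the 1-based attempt number.
fn retry_with_backoff<T, E, F>(max_attempts: u32, base_delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulated transfer that fails twice before succeeding.
    let result: Result<&str, String> = retry_with_backoff(5, Duration::from_millis(10), |attempt| {
        if attempt < 3 {
            Err(format!("attempt {attempt}: connection reset"))
        } else {
            Ok("payload")
        }
    });
    println!("{result:?}");
}
```

A real client would typically also cap the maximum delay, add jitter, and retry only on errors that are known to be transient.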
Production Quality
- Zero Warnings: Clean compilation with comprehensive error handling
- 114 Unit Tests: Extensive test coverage with edge case validation
- Cross-Platform: Linux, macOS, Windows support
- API Stability: Stable APIs with semantic versioning
- Performance Benchmarks: Validated performance improvements
Installation
Add to your Cargo.toml:
[dependencies]
scirs2-io = "0.1.0-rc.2"
Enable specific features as needed:
[dependencies]
scirs2-io = { version = "0.1.0-rc.2", features = ["hdf5", "async", "compression"] }
Available Features
- default: CSV, compression, and validation (recommended for most use cases)
- hdf5: HDF5 file format support
- async: Asynchronous I/O with tokio
- reqwest: Network operations and HTTP client
- all: All features enabled
Quick Start
Basic File Operations
use ;
use CoreResult;
use Array2;
// Read MATLAB file
let data = loadmat?;
let array = data.?;
// Process CSV with automatic type detection
let = read_csv_numeric?;
println!;
// Handle images with metadata
let = read_image?;
println!;
// High-performance compression
let compressed = compress_data?;
Advanced Parallel Processing
use ;
// Configure high-performance parallel compression
let config = ParallelCompressionConfig ;
// Process large dataset (10MB example)
let large_data = vec!;
let = compress_data_parallel?;
println!;
println!;
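The general idea behind parallel compression is to split the input into independent chunks and compress each chunk on its own thread. The std-only sketch below illustrates that structure. It is an assumption-laden stand-in, not the scirs2-io implementation: `compress_chunks_parallel`, `rle_encode`, and `rle_decode` are hypothetical names, a toy run-length encoder replaces GZIP/Zstandard/LZ4/BZIP2, and one thread is spawned per chunk where a real implementation would use a bounded thread pool.

```rust
use std::iter;
use std::thread;

/// Toy run-length encoder: emits (run_length, byte) pairs, runs capped at 255.
fn rle_encode(chunk: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < chunk.len() {
        let byte = chunk[i];
        let mut run = 1usize;
        while i + run < chunk.len() && chunk[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Inverse of `rle_encode`.
fn rle_decode(encoded: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in encoded.chunks_exact(2) {
        out.extend(iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Compress each chunk on its own scoped thread; results keep input order.
fn compress_chunks_parallel(data: &[u8], chunk_size: usize) -> Vec<Vec<u8>> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    thread::scope(|s| {
        // Spawn all workers first, then join them in order.
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || rle_encode(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let data = vec![7u8; 1000];
    let blocks = compress_chunks_parallel(&data, 300);
    let compressed: usize = blocks.iter().map(Vec::len).sum();
    println!("{} chunks, {} -> {} bytes", blocks.len(), data.len(), compressed);

    // Decompress chunk by chunk and check the round trip.
    let restored: Vec<u8> = blocks.iter().flat_map(|b| rle_decode(b)).collect();
    println!("round trip ok: {}", restored == data);
}
```

Because chunks are compressed independently, the achievable speedup depends on chunk size and core count, and the compression ratio is usually slightly worse than compressing the whole buffer at once.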
Schema-Based Data Validation
use ;
use json;
let validator = new;
// Define validation schema
let user_schema = object;
// Validate data
let user_data = json!;
let result = validator.validate;
if result.valid else
Streaming Large Files
use ;
// Process large files efficiently
let config = default
.chunk_size
.enable_progress_reporting;
let = process_file_chunked?;
println!;
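Chunked processing keeps memory usage bounded by reading a fixed-size buffer at a time instead of loading the whole file. The following std-only sketch shows the pattern for any `Read` source; `process_chunked` and its callback are hypothetical names used for illustration, not the scirs2-io streaming API.

```rust
use std::io::{self, BufReader, Cursor, Read};

/// Read `reader` in chunks of at most `chunk_size` bytes, passing each chunk
/// to `on_chunk`. Returns (total bytes read, number of chunks).
fn process_chunked<R, F>(reader: R, chunk_size: usize, mut on_chunk: F) -> io::Result<(u64, usize)>
where
    R: Read,
    F: FnMut(&[u8]),
{
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut reader = BufReader::new(reader);
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0u64;
    let mut chunks = 0usize;

    loop {
        // A single read may return fewer bytes than requested, so keep
        // reading until the buffer is full or the source is exhausted.
        let mut filled = 0;
        while filled < chunk_size {
            match reader.read(&mut buf[filled..]) {
                Ok(0) => break,
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        if filled == 0 {
            break;
        }
        on_chunk(&buf[..filled]);
        total += filled as u64;
        chunks += 1;
        if filled < chunk_size {
            break; // short chunk means end of input
        }
    }
    Ok((total, chunks))
}

fn main() -> io::Result<()> {
    // An in-memory source stands in for a large file here;
    // std::fs::File::open(path)? would be used the same way.
    let source = Cursor::new(vec![1u8; 10_000]);
    let mut sum = 0u64;
    let (bytes, chunks) = process_chunked(source, 4096, |chunk| {
        sum += chunk.iter().map(|&b| b as u64).sum::<u64>();
    })?;
    println!("processed {bytes} bytes in {chunks} chunks, sum = {sum}");
    Ok(())
}
```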
API Reference
File Format Modules
MATLAB Files
use ;
Scientific Data Formats
use ;
Image Processing
use ;
Data Processing
Compression
use ;
Validation
use ;
Serialization
use ;
Performance Characteristics
- Parallel Compression: Up to 2.5x faster than single-threaded operations
- Memory Efficiency: Streaming interfaces for datasets larger than RAM
- Network I/O: Optimized for scientific data transfer with retry logic
- Zero-Copy: Memory mapping for large file operations
- SIMD-Ready: Architecture prepared for vectorized operations
Format Support Details
MATLAB (.mat)
- All MATLAB data types (double, single, integers, logical, char)
- Multidimensional arrays, structures, and cell arrays
- Both compressed and uncompressed formats
- Full metadata preservation
NetCDF/HDF5
- NetCDF3 Classic and NetCDF4/HDF5 formats
- Unlimited dimensions and compression
- Group hierarchies and attributes
- Chunked storage for large datasets
Image Formats
- PNG, JPEG, BMP, TIFF with full metadata
- Color space handling (RGB, RGBA, Grayscale)
- EXIF metadata extraction and manipulation
- Format conversion with quality control
Sparse Matrices
- COO, CSR, CSC format support
- Matrix Market and Harwell-Boeing formats
- Efficient format conversion with caching
- Integration with numerical operations
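For readers unfamiliar with the formats named above, this std-only sketch converts COO triplets to CSR with a counting pass followed by a prefix sum, then uses the result for a matrix-vector product. `Csr`, `coo_to_csr`, and `matvec` are hypothetical names, not scirs2-io types. The sketch keeps duplicate entries as separate stored values and does not sort column indices within a row.

```rust
/// Compressed Sparse Row storage.
#[derive(Debug)]
struct Csr {
    rows: usize,
    cols: usize,
    indptr: Vec<usize>,  // row i occupies indptr[i]..indptr[i + 1]
    indices: Vec<usize>, // column index of each stored value
    data: Vec<f64>,      // stored values
}

/// Build a CSR matrix from zero-based (row, col, value) triplets.
fn coo_to_csr(rows: usize, cols: usize, triplets: &[(usize, usize, f64)]) -> Csr {
    // Pass 1: count the entries in each row.
    let mut indptr = vec![0usize; rows + 1];
    for &(r, c, _) in triplets {
        assert!(r < rows && c < cols, "entry ({r}, {c}) out of bounds");
        indptr[r + 1] += 1;
    }
    // Prefix sum turns counts into row start offsets.
    for i in 0..rows {
        indptr[i + 1] += indptr[i];
    }
    // Pass 2: scatter each entry into the next free slot of its row.
    let mut next = indptr.clone();
    let mut indices = vec![0usize; triplets.len()];
    let mut data = vec![0.0f64; triplets.len()];
    for &(r, c, v) in triplets {
        let slot = next[r];
        indices[slot] = c;
        data[slot] = v;
        next[r] += 1;
    }
    Csr { rows, cols, indptr, indices, data }
}

impl Csr {
    /// Compute y = A * x.
    fn matvec(&self, x: &[f64]) -> Vec<f64> {
        assert_eq!(x.len(), self.cols, "vector length must equal column count");
        (0..self.rows)
            .map(|i| {
                (self.indptr[i]..self.indptr[i + 1])
                    .map(|k| self.data[k] * x[self.indices[k]])
                    .sum()
            })
            .collect()
    }
}

fn main() {
    let triplets = [(2, 1, 5.0), (0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)];
    let csr = coo_to_csr(3, 3, &triplets);
    println!("{csr:?}");
    println!("A * [1, 1, 1] = {:?}", csr.matvec(&[1.0, 1.0, 1.0]));
}
```

CSR makes row-wise access and matrix-vector products cheap, which is why it is a common target when converting from the insertion-friendly COO layout.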
System Requirements
- Rust: Edition 2021, MSRV 1.70+
- Platforms: Linux, macOS, Windows (64-bit)
- Optional: HDF5 system library for HDF5 features
- Memory: Configurable memory usage for large datasets
Production Deployment
This library is production-ready with:
- Comprehensive testing: 114 unit tests with edge case coverage
- Memory safety: Zero unsafe code in core paths
- Error handling: Detailed error messages with recovery suggestions
- Performance monitoring: Built-in statistics and benchmarking
- Backwards compatibility: Semantic versioning with stable APIs
Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
For production releases:
- All features require comprehensive tests
- Performance changes need benchmarks
- API changes require documentation updates
- All I/O operations must address security considerations
License
Licensed under either:
Choose the license that works best for your project.
Ready for Production: scirs2-io v0.1.0-rc.2 provides enterprise-grade I/O capabilities for scientific computing applications.