//! # SciRS2 IO - Scientific Data Input/Output
//!
//! **scirs2-io** provides comprehensive file I/O capabilities for scientific computing,
//! supporting MATLAB, NetCDF, HDF5, CSV, WAV, image formats, and more, with streaming,
//! compression, async support, and database connectivity.
//!
//! ## 🎯 Key Features
//!
//! - **SciPy Compatibility**: Similar to `scipy.io` for MATLAB, WAV, ARFF files
//! - **Multiple Formats**: MATLAB (.mat), NetCDF, HDF5, CSV, WAV, images (PNG, JPEG, TIFF)
//! - **Matrix Market**: Sparse matrix exchange format
//! - **Streaming I/O**: Memory-efficient reading/writing of large datasets
//! - **Compression**: GZIP, ZSTD, LZ4, BZIP2 for data compression
//! - **Async I/O**: Non-blocking operations with tokio
//! - **Database**: SQL/NoSQL connectivity (PostgreSQL, MongoDB, InfluxDB)
//!
//! ## 📦 Module Overview
//!
//! | SciRS2 Module | SciPy Equivalent | Description |
//! |---------------|------------------|-------------|
//! | `matlab` | `scipy.io.loadmat`, `savemat` | MATLAB .mat file I/O |
//! | `wavfile` | `scipy.io.wavfile` | WAV audio file I/O |
//! | `netcdf` | `scipy.io.netcdf` | NetCDF scientific data format |
//! | `matrix_market` | `scipy.io.mmread`, `mmwrite` | Matrix Market sparse format |
//! | `csv` | - | CSV with type conversion |
//! | `image` | - | PNG, JPEG, BMP, TIFF image I/O |
//!
//! ## 🚀 Quick Start
//!
//! ```toml
//! [dependencies]
//! scirs2-io = "0.4.2"
//! ```
//!
//! ```rust,no_run
//! use scirs2_io::csv::{read_csv, CsvReaderConfig};
//!
//! // Read CSV file
//! let config = CsvReaderConfig {
//!     has_header: true,
//!     delimiter: ',',
//!     ..Default::default()
//! };
//! let (headers, data) = read_csv("data.csv", Some(config)).unwrap();
//! ```
//!
//! ## 🔒 Version: 0.4.2 (March 27, 2026)
//!
//! ## Modules
//!
//! - `arff`: Support for ARFF (Attribute-Relation File Format) files
//! - `compression`: Utilities for data compression and decompression
//! - `csv`: Support for CSV (Comma-Separated Values) files
//! - `image`: Support for image file formats (PNG, JPEG, BMP, TIFF)
//! - `matlab`: Support for MATLAB (.mat) files
//! - `matrix_market`: Support for Matrix Market sparse and dense matrix files
//! - `netcdf`: Support for NetCDF scientific data files
//! - `serialize`: Utilities for data serialization and deserialization
//! - `validation`: Utilities for data validation and integrity checking
//! - `wavfile`: Support for WAV audio files
//! - `error`: Error types for the IO module
//! - `fortran`: Support for Fortran unformatted files
// Allow specific Clippy warnings with justifications
// Manual div_ceil implementation for compatibility with Rust versions without div_ceil
// from_str methods are used consistently across modules
// Complex type is necessary for format validators
/// Advanced Mode Coordinator - Unified Intelligence for I/O Operations
///
/// Provides the highest level of intelligent I/O processing by coordinating multiple advanced systems:
/// - Neural adaptive optimization with reinforcement learning
/// - Quantum-inspired parallel processing with superposition algorithms
/// - GPU acceleration with multi-backend support
/// - Advanced memory management and resource allocation
/// - Real-time performance monitoring and self-optimization
/// - Meta-learning for cross-domain adaptation
/// - Emergent behavior detection and autonomous system improvement
/// Pure Rust BMP image file format (24-bit uncompressed)
///
/// Provides reading and writing of 24-bit uncompressed RGB BMP files
/// using a pure Rust implementation with no external image library dependencies.
/// Pure Rust columnar storage format
///
/// Provides a simplified Parquet-like columnar storage format with:
/// - Column-oriented storage for efficient analytical queries
/// - Run-length encoding (RLE) for repeated values
/// - Dictionary encoding for categorical string data
/// - Delta encoding for sorted numeric columns
/// - Support for f64, i64, String, and bool column types
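The run-length encoding mentioned above can be sketched in a few lines. This is an illustrative std-only sketch of the technique, not the crate's actual encoder API (`rle_encode`/`rle_decode` are names we chose here):

```rust
// Illustrative sketch of run-length encoding for a repeated-value column;
// function names are ours, not the crate's API.
fn rle_encode(values: &[i64]) -> Vec<(i64, u32)> {
    let mut runs: Vec<(i64, u32)> = Vec::new();
    for &v in values {
        match runs.last_mut() {
            // Extend the current run when the value repeats.
            Some((last, count)) if *last == v => *count += 1,
            // Otherwise start a new (value, count) run.
            _ => runs.push((v, 1)),
        }
    }
    runs
}

fn rle_decode(runs: &[(i64, u32)]) -> Vec<i64> {
    runs.iter()
        .flat_map(|&(v, n)| std::iter::repeat(v).take(n as usize))
        .collect()
}

fn main() {
    let column = vec![7, 7, 7, 1, 1, 9];
    let runs = rle_encode(&column);
    assert_eq!(runs, vec![(7, 3), (1, 2), (9, 1)]);
    assert_eq!(rle_decode(&runs), column); // round-trips losslessly
}
```

RLE pays off precisely on the low-cardinality, sorted, or constant columns that columnar layouts group together, which is why it appears alongside dictionary and delta encoding here.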
/// Enhanced algorithms for Advanced Mode
///
/// Provides advanced algorithmic enhancements for the Advanced coordinator:
/// - Advanced pattern recognition with deep learning capabilities
/// - Multi-scale feature extraction and analysis
/// - Emergent pattern detection and meta-pattern recognition
/// - Sophisticated optimization recommendation systems
/// - Self-improving algorithmic components with adaptive learning
/// Async I/O support for streaming capabilities
///
/// Provides asynchronous I/O interfaces for non-blocking processing of large datasets:
/// - Async file reading and writing with tokio
/// - Asynchronous stream processing with backpressure
/// - Concurrent processing with configurable concurrency levels
/// - Network I/O support for remote data access
/// - Cancellation support for long-running operations
/// - Real-time progress monitoring for async operations
/// Data compression module
///
/// Provides utilities for compressing and decompressing scientific data:
/// - Lossless compression algorithms (GZIP, ZSTD, LZ4, BZIP2)
/// - Array compression with metadata preservation
/// - Chunked compression for large datasets
/// - Compression level configuration
/// CSV (Comma-Separated Values) file format module
///
/// Provides functionality for reading and writing CSV files with various options:
/// - Basic CSV reading and writing
/// - Type conversion and automatic type detection
/// - Missing value handling with customizable options
/// - Memory-efficient processing of large files using chunked reading
/// - Support for specialized data types (date, time, complex numbers)
/// - Column-based operations with flexible configuration
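The automatic type detection described above boils down to trying progressively looser parses per field. The sketch below shows the idea with std only; the enum and function names are illustrative, not the crate's API:

```rust
// Illustrative per-field type detection: integer, then float, then text,
// with a missing-value sentinel. Names are ours, not the crate's API.
#[derive(Debug, PartialEq)]
enum Field {
    Int(i64),
    Float(f64),
    Text(String),
    Missing,
}

fn detect(field: &str) -> Field {
    let f = field.trim();
    if f.is_empty() || f == "NA" {
        return Field::Missing;
    }
    if let Ok(i) = f.parse::<i64>() {
        return Field::Int(i);
    }
    if let Ok(x) = f.parse::<f64>() {
        return Field::Float(x);
    }
    Field::Text(f.to_string())
}

fn main() {
    let row = "42,3.5,NA,hello";
    let parsed: Vec<Field> = row.split(',').map(detect).collect();
    assert_eq!(parsed[0], Field::Int(42));
    assert_eq!(parsed[1], Field::Float(3.5));
    assert_eq!(parsed[2], Field::Missing);
    assert_eq!(parsed[3], Field::Text("hello".to_string()));
}
```

Trying the strictest parse first matters: "42" would also parse as f64, so integer detection must come before float.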
/// Database connectivity
///
/// Provides interfaces for database operations:
/// - Support for SQL databases (PostgreSQL, MySQL, SQLite)
/// - NoSQL database support (MongoDB, Redis, Cassandra)
/// - Time series databases (InfluxDB)
/// - Query builder and ORM-like features
/// - Bulk loading and export capabilities
/// - Integration with scientific data formats
/// Distributed I/O processing
///
/// Provides infrastructure for distributed processing of large datasets:
/// - Distributed file reading with partitioning strategies
/// - Parallel writing with merge capabilities
/// - Distributed array operations
/// - Load balancing and fault tolerance
/// - Progress tracking for distributed operations
/// Domain-specific file formats
///
/// Provides specialized support for scientific file formats:
/// - Bioinformatics: FASTA, FASTQ, SAM/BAM, VCF
/// - Geospatial: GeoTIFF, Shapefile, GeoJSON, KML
/// - Astronomical: FITS, VOTable
/// Fortran unformatted file format module
///
/// Provides functionality for reading and writing Fortran unformatted files:
/// - Sequential, direct, and stream access modes
/// - Support for different endianness and record marker sizes
/// - Automatic format detection
/// - Arrays stored in column-major order (Fortran convention)
/// - Support for all common Fortran data types
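The record markers mentioned above follow a fixed on-disk convention: each sequential record is wrapped in a leading and trailing 4-byte length marker. This sketch reads one such record with std only; it shows the format convention, not the crate's API:

```rust
use std::io::{Cursor, Read};

// Sequential Fortran unformatted framing: len(u32) | payload | len(u32).
// Illustrative sketch of the format; not the crate's reader API.
fn read_record(r: &mut impl Read) -> std::io::Result<Vec<u8>> {
    let mut marker = [0u8; 4];
    r.read_exact(&mut marker)?;
    let len = u32::from_le_bytes(marker) as usize;
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    r.read_exact(&mut marker)?; // trailing marker must repeat the length
    assert_eq!(u32::from_le_bytes(marker) as usize, len);
    Ok(payload)
}

fn main() -> std::io::Result<()> {
    // One record holding two little-endian f32 values (8 bytes of payload).
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&8u32.to_le_bytes());
    bytes.extend_from_slice(&1.0f32.to_le_bytes());
    bytes.extend_from_slice(&2.0f32.to_le_bytes());
    bytes.extend_from_slice(&8u32.to_le_bytes());

    let payload = read_record(&mut Cursor::new(bytes))?;
    assert_eq!(payload.len(), 8);
    Ok(())
}
```

The duplicated trailing marker is what makes backward skipping possible, and the marker width and endianness are exactly the knobs the module's "different endianness and record marker sizes" support refers to.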
/// GPU-accelerated I/O operations
///
/// Provides GPU-accelerated implementations of I/O operations using the scirs2-core GPU abstraction:
/// - GPU-accelerated compression and decompression
/// - GPU-accelerated data type conversions
/// - GPU-accelerated matrix operations for file I/O
/// - GPU-accelerated checksum computation
/// - Support for multiple GPU backends (CUDA, Metal, OpenCL)
/// - Automatic fallback to CPU when GPU is not available
/// Advanced GPU-accelerated I/O operations
///
/// Provides comprehensive GPU acceleration for I/O operations including:
/// - Multi-backend GPU support (CUDA, Metal, OpenCL)
/// - GPU-accelerated compression and decompression
/// - Advanced GPU memory management with pooling
/// - Performance monitoring and optimization
/// - Intelligent backend selection and workload optimization
/// Harwell-Boeing sparse matrix format module
///
/// Provides functionality for reading and writing Harwell-Boeing sparse matrix files:
/// - Support for real and complex matrices
/// - Different matrix symmetry types (general, symmetric, hermitian, skew-symmetric)
/// - Pattern matrices (structure only, no values)
/// - Conversion to/from column-compressed sparse (CCS) format
/// - Integration with ndarray for efficient matrix operations
/// HDF5 file format module
///
/// Provides functionality for reading and writing HDF5 (Hierarchical Data Format) files:
/// - Reading and writing HDF5 groups and datasets
/// - Support for attributes on groups and datasets
/// - Multiple datatypes (integers, floats, strings, compound types)
/// - Chunking and compression options
/// - Integration with ndarray for efficient array operations
/// IDL (Interactive Data Language) save file format module
///
/// Provides functionality for reading and writing IDL save files (.sav):
/// - Support for all standard IDL data types
/// - Arrays, strings, structures, and complex numbers
/// - Automatic endianness detection and handling
/// - Compatible with IDL 8.0 format
/// - Commonly used in astronomy and remote sensing
/// Image file format module
///
/// Provides functionality for reading and writing common image formats:
/// - Reading and writing PNG, JPEG, BMP, and TIFF images
/// - Metadata extraction and manipulation
/// - Conversion between different image formats
/// - Basic image processing operations
/// Matrix Market file format module
///
/// Provides functionality for reading and writing Matrix Market files:
/// - Support for sparse matrix coordinate format (COO)
/// - Support for dense array format
/// - Real, complex, integer, and pattern data types
/// - Different matrix symmetry types (general, symmetric, hermitian, skew-symmetric)
/// - Integration with ndarray for efficient matrix operations
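The coordinate format itself is a simple text layout: a `%%MatrixMarket` header line, a size line (`rows cols nnz`), then one 1-based `i j value` triplet per entry. A minimal std-only sketch of writing it (the crate's writer handles this for you):

```rust
// Minimal sketch of the Matrix Market coordinate (COO) text layout.
// Illustrative only; the crate provides the real reader/writer.
fn write_mtx(rows: usize, cols: usize, entries: &[(usize, usize, f64)]) -> String {
    let mut out = String::from("%%MatrixMarket matrix coordinate real general\n");
    out.push_str(&format!("{} {} {}\n", rows, cols, entries.len()));
    for &(i, j, v) in entries {
        // Matrix Market indices are 1-based.
        out.push_str(&format!("{} {} {}\n", i + 1, j + 1, v));
    }
    out
}

fn main() {
    let mtx = write_mtx(2, 2, &[(0, 0, 1.5), (1, 1, 2.0)]);
    assert!(mtx.starts_with("%%MatrixMarket"));
    assert!(mtx.contains("2 2 2")); // size line: rows cols nnz
    assert!(mtx.contains("1 1 1.5")); // 1-based entry
    print!("{}", mtx);
}
```

The 1-based indexing is the most common pitfall when moving between Matrix Market files and 0-based ndarray indices.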
/// Advanced metadata management
///
/// Provides comprehensive metadata handling across different file formats:
/// - Unified metadata interface for all formats
/// - Metadata validation with schemas
/// - Processing history tracking
/// - Format conversion between JSON, YAML, TOML
/// - Format-specific extensions
/// - Standard metadata keys for scientific data
/// Mini-batch sampler with shuffle and stratified splitting.
///
/// Provides index-based batch sampling for machine learning pipelines:
/// - Configurable batch size with optional last-batch dropping
/// - Deterministic or random shuffling via seeded PRNG
/// - Stratified sampling that preserves class-label distributions across batches
/// - Train/validation/test split with optional stratification
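The core of such a sampler is a seeded shuffle of indices followed by chunking. The sketch below is std-only and uses a small LCG as a stand-in for whatever PRNG the crate actually uses; it is illustrative, not the crate's API:

```rust
// Seeded Fisher-Yates shuffle + chunking: the skeleton of a mini-batch
// sampler. The LCG constants are a stand-in PRNG, not the crate's.
fn shuffled_batches(n: usize, batch: usize, seed: u64) -> Vec<Vec<usize>> {
    let mut idx: Vec<usize> = (0..n).collect();
    let mut state = seed;
    // Fisher-Yates shuffle driven by a 64-bit LCG.
    for i in (1..n).rev() {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let j = (state >> 33) as usize % (i + 1);
        idx.swap(i, j);
    }
    idx.chunks(batch).map(|c| c.to_vec()).collect()
}

fn main() {
    let batches = shuffled_batches(10, 4, 42);
    assert_eq!(batches.len(), 3); // sizes 4 + 4 + 2 (last batch not dropped)
    let mut seen: Vec<usize> = batches.concat();
    seen.sort();
    assert_eq!(seen, (0..10).collect::<Vec<_>>()); // each index exactly once
    // Same seed, same order: deterministic shuffling.
    assert_eq!(batches, shuffled_batches(10, 4, 42));
}
```

Dropping the ragged last batch, or partitioning indices per class label before chunking (stratification), are small variations on this same skeleton.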
/// Machine learning framework compatibility
///
/// Provides conversion utilities and interfaces for ML frameworks:
/// - Support for PyTorch, TensorFlow, ONNX, SafeTensors formats
/// - Model and tensor serialization/deserialization
/// - Data type conversions between frameworks
/// - Dataset utilities for ML pipelines
/// - Seamless integration with ndarray
/// Memory-mapped array I/O
///
/// Provides memory-mapped file operations for efficient handling of large arrays:
/// - Memory-mapped arrays for minimal memory usage
/// - Read-only and read-write access modes
/// - Support for multi-dimensional arrays
/// - Cross-platform compatibility (Unix and Windows)
/// - Type-safe operations with generic numeric types
///
/// # Examples
///
/// ```rust,no_run
/// use scirs2_io::mmap::{MmapArray, create_mmap_array};
/// use scirs2_core::ndarray::Array2;
///
/// // Create a large array file
/// let data = Array2::from_shape_fn((1000, 1000), |(i, j)| (i + j) as f64);
/// let file_path = "large_array.bin";
///
/// // Write array to file
/// create_mmap_array(file_path, &data)?;
///
/// // Memory-map the array for reading
/// let mmap_array: MmapArray<f64> = MmapArray::open(file_path)?;
/// let shape = mmap_array.shape()?;
/// let array_view = mmap_array.as_array_view(&shape)?;
///
/// // Access data without loading entire file into memory
/// let slice = mmap_array.as_slice()?;
/// let value = slice[500 * 1000 + 500]; // Access element at (500, 500)
/// println!("Value at (500, 500): {}", value);
/// # Ok::<(), scirs2_io::error::IoError>(())
/// ```
/// NetCDF file format module
///
/// Provides functionality for reading and writing NetCDF files:
/// - Reading and writing NetCDF3 files
/// - Support for dimensions, variables, and attributes
/// - Conversion between NetCDF and ndarray data structures
/// - Memory-efficient access to large datasets
/// Pure Rust NetCDF Classic format reader/writer
///
/// Provides a complete pure Rust implementation of the NetCDF Classic (version 1)
/// binary format without any C dependencies:
/// - Named dimensions (including unlimited/record dimensions)
/// - Typed variables (byte, char, short, int, float, double)
/// - Global and per-variable attributes
/// - Big-endian binary format with proper 4-byte alignment
/// Network I/O and cloud storage integration
///
/// Provides functionality for reading and writing files over network protocols
/// and integrating with cloud storage services:
/// - HTTP/HTTPS file download and upload with progress tracking
/// - Cloud storage integration (AWS S3, Google Cloud Storage, Azure Blob Storage)
/// - Streaming I/O for efficient handling of large files over network
/// - Authentication and secure credential management
/// - Retry logic and error recovery for network operations
/// - Local caching for offline access and performance optimization
///
/// # Examples
///
/// ```rust,no_run
/// use scirs2_io::network::NetworkClient;
///
/// // Create a network client for downloading files
/// let client = NetworkClient::new();
/// println!("Network client created for file operations");
/// ```
/// Neural-adaptive I/O optimization with advanced-level intelligence
///
/// Provides AI-driven adaptive optimization for I/O operations:
/// - Machine learning-based performance optimization
/// - Dynamic parameter adaptation based on system metrics
/// - Neural network-driven decision making for resource allocation
/// - Real-time performance feedback and learning
/// - High-throughput processing with adaptive algorithms
/// - SIMD-accelerated neural inference for low-latency decisions
/// NumPy NPY/NPZ binary file format support
///
/// Provides reading and writing of NumPy's binary file formats:
/// - `.npy` files: single arrays with header metadata
/// - `.npz` files: ZIP archives of multiple named `.npy` arrays
/// - Support for f32, f64, i32, i64 dtypes
/// - Big-endian and little-endian byte order support
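The `.npy` header metadata mentioned above has a fixed layout in format version 1.0: the magic bytes `\x93NUMPY`, a version pair, a little-endian u16 header length, then a Python-dict header string padded with spaces so the data section starts on a 64-byte boundary. A std-only sketch of constructing it (illustrative; the crate's writer does this internally):

```rust
// Sketch of the .npy v1.0 header: magic, version, u16 LE header length,
// space-padded dict ending in '\n', total size a multiple of 64.
fn npy_header(descr: &str, shape: &[usize]) -> Vec<u8> {
    let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
    let mut dict = format!(
        "{{'descr': '{}', 'fortran_order': False, 'shape': ({},), }}",
        descr,
        dims.join(", ")
    );
    // Pad so that magic(6) + version(2) + len(2) + dict is 64-byte aligned.
    let total = ((10 + dict.len() + 1 + 63) / 64) * 64;
    while 10 + dict.len() + 1 < total {
        dict.push(' ');
    }
    dict.push('\n');

    let mut out = Vec::new();
    out.extend_from_slice(b"\x93NUMPY\x01\x00");
    out.extend_from_slice(&(dict.len() as u16).to_le_bytes());
    out.extend_from_slice(dict.as_bytes());
    out
}

fn main() {
    let header = npy_header("<f8", &[3]); // little-endian f64, shape (3,)
    assert_eq!(&header[..6], &b"\x93NUMPY"[..]);
    assert_eq!(header.len() % 64, 0); // data starts 64-byte aligned
}
```

The 64-byte alignment is what makes `.npy` files convenient to memory-map: the raw array data begins at an aligned offset immediately after the header.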
/// Out-of-core processing for terabyte-scale datasets
///
/// Provides infrastructure for processing datasets too large for memory:
/// - Memory-mapped arrays with virtual memory management
/// - Chunked processing with configurable chunk sizes
/// - Disk-based algorithms for sorting and aggregation
/// - Virtual arrays combining multiple data sources
/// - Sliding window iterators for streaming operations
/// Apache Parquet columnar file format module
///
/// Provides functionality for reading and writing Apache Parquet files:
/// - Efficient columnar storage for large datasets
/// - Multiple compression codecs (Snappy, Gzip, LZ4, ZSTD, Brotli)
/// - Schema inference and validation
/// - Column projection for selective reading
/// - Memory-efficient chunked reading for large files
/// - Integration with Apache Arrow for high-performance I/O
/// - Python interoperability (Pandas, Polars, PyArrow compatible)
///
/// # Examples
///
/// ```rust,no_run
/// use scirs2_io::parquet::{read_parquet, write_parquet, ParquetWriteOptions};
/// use scirs2_core::ndarray::Array1;
///
/// // Write data to Parquet
/// let data = Array1::from_vec(vec![1.0, 2.0, 3.0, 4.0]);
/// write_parquet("data.parquet", &data, Default::default())?;
///
/// // Read data from Parquet
/// let loaded = read_parquet("data.parquet")?;
/// println!("Loaded {} rows", loaded.num_rows());
/// # Ok::<(), scirs2_io::error::IoError>(())
/// ```
/// Data pipeline APIs
///
/// Provides a flexible framework for building data processing pipelines:
/// - Composable pipeline stages for reading, transforming, and writing data
/// - Multiple execution strategies (sequential, parallel, streaming, async)
/// - Built-in transformations (normalization, encoding, aggregation)
/// - Error handling and recovery mechanisms
/// - Progress tracking and monitoring
/// - Caching and checkpointing for long-running pipelines
/// Quantum-inspired I/O processing algorithms with advanced capabilities
///
/// Provides quantum-inspired algorithms for high-performance I/O:
/// - Quantum superposition for parallel processing paths
/// - Quantum entanglement for correlated data operations
/// - Quantum annealing for parameter optimization
/// - Quantum interference patterns for data compression
/// - Quantum tunneling for barrier-free processing
/// - Quantum measurement for adaptive decision making
/// Real-time data streaming protocols
///
/// Provides infrastructure for real-time data streaming and processing:
/// - WebSocket and Server-Sent Events support
/// - gRPC and MQTT streaming protocols
/// - Backpressure handling and flow control
/// - Stream transformations and filtering
/// - Multi-stream synchronization
/// - Time series buffering and aggregation
/// Data serialization utilities
///
/// Provides functionality for serializing and deserializing scientific data:
/// - Binary, JSON, and MessagePack serialization formats
/// - Array serialization with metadata
/// - Structured data serialization
/// - Sparse matrix serialization
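Binary array serialization with metadata reduces to a length-prefixed layout: a header describing the shape, then the raw little-endian values. A minimal std-only sketch for a 1-D f64 array (illustrative; the crate's serializers carry richer metadata):

```rust
use std::io::{Cursor, Read, Write};

// Length-prefixed binary layout for a 1-D f64 array: u64 LE element count,
// then each value as 8 LE bytes. Illustrative; not the crate's wire format.
fn serialize(data: &[f64]) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.write_all(&(data.len() as u64).to_le_bytes()).unwrap();
    for &x in data {
        buf.write_all(&x.to_le_bytes()).unwrap();
    }
    buf
}

fn deserialize(bytes: &[u8]) -> Vec<f64> {
    let mut cur = Cursor::new(bytes);
    let mut len_bytes = [0u8; 8];
    cur.read_exact(&mut len_bytes).unwrap();
    let n = u64::from_le_bytes(len_bytes) as usize;
    (0..n)
        .map(|_| {
            let mut b = [0u8; 8];
            cur.read_exact(&mut b).unwrap();
            f64::from_le_bytes(b)
        })
        .collect()
}

fn main() {
    let data = vec![1.0, 2.5, -3.0];
    let bytes = serialize(&data);
    assert_eq!(bytes.len(), 8 + 3 * 8); // header + payload
    assert_eq!(deserialize(&bytes), data); // lossless round-trip
}
```

Fixing the byte order in the format (here little-endian) is what makes such files portable across platforms.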
/// SIMD-accelerated I/O operations
///
/// Provides SIMD-optimized implementations of common I/O operations:
/// - Data type conversions with SIMD
/// - Audio normalization and processing
/// - CSV parsing acceleration
/// - Compression utilities with SIMD
/// - Checksum calculations
/// Comprehensive sparse matrix format support
///
/// Provides unified support for common sparse matrix formats:
/// - COO (Coordinate), CSR (Compressed Sparse Row), and CSC (Compressed Sparse Column) formats
/// - Efficient format conversion algorithms
/// - Matrix operations (addition, multiplication, transpose)
/// - I/O support with Matrix Market integration
/// - Performance-optimized algorithms for large sparse matrices
/// - Memory-efficient sparse data handling
///
/// # Examples
///
/// ```rust,no_run
/// use scirs2_io::sparse::SparseMatrix;
/// use scirs2_core::ndarray::Array2;
///
/// // Create a sparse matrix from a dense array
/// let dense = Array2::from_shape_vec((3, 3), vec![
///     1.0_f64, 0.0_f64, 2.0_f64,
///     0.0_f64, 3.0_f64, 0.0_f64,
///     4.0_f64, 0.0_f64, 5.0_f64,
/// ]).unwrap();
///
/// let mut sparse = SparseMatrix::from_dense_2d(&dense, 0.0_f64)?;
/// println!("Sparse matrix: {} non-zeros", sparse.nnz());
///
/// // Convert to different formats
/// let _csr = sparse.to_csr()?;
/// let _csc = sparse.to_csc()?;
///
/// // Save to file
/// sparse.save_matrix_market("matrix.mtx")?;
/// # Ok::<(), scirs2_io::error::IoError>(())
/// ```
/// Streaming and iterator interfaces for large data handling
///
/// Provides memory-efficient streaming interfaces for processing large datasets:
/// - Chunked reading for processing files in configurable chunks
/// - Iterator-based APIs for seamless integration with Rust's iterator ecosystem
/// - Streaming CSV processing with header support
/// - Memory-efficient processing without loading entire files
/// - Performance monitoring and statistics tracking
///
/// # Examples
///
/// ```rust,no_run
/// use scirs2_io::streaming::{StreamingConfig, process_file_chunked};
///
/// // Process a large file in chunks
/// let config = StreamingConfig::default().chunk_size(64 * 1024);
///
/// let (result, stats) = process_file_chunked("large_file.dat", config, |chunk_data, chunk_id| {
///     println!("Processing chunk {}: {} bytes", chunk_id, chunk_data.len());
///     Ok(())
/// })?;
/// # Ok::<(), scirs2_io::error::IoError>(())
/// ```
/// Thread pool for parallel I/O operations
///
/// Provides a high-performance thread pool optimized for I/O operations:
/// - Separate thread pools for I/O-bound and CPU-bound tasks
/// - Work stealing for load balancing
/// - Performance monitoring and statistics
/// - Configurable thread counts and queue sizes
/// - Global thread pool for convenience
/// Data validation and integrity checking module
///
/// Provides functionality for validating data integrity through checksums,
/// format validation, and other verification methods:
/// - File integrity validation with multiple checksum algorithms (CRC32, SHA256, BLAKE3)
/// - Format-specific validation for scientific data formats
/// - Directory manifests for data validation
/// - Integrity metadata for tracking data provenance
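Of the checksum algorithms listed, CRC32 is simple enough to sketch directly. This is the standard bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320), shown for illustration; the module's implementation is presumably table-driven or SIMD-accelerated:

```rust
// Bitwise CRC-32 (IEEE): init 0xFFFFFFFF, reflected polynomial 0xEDB88320,
// final complement. Illustrative reference implementation.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            if crc & 1 != 0 {
                crc = (crc >> 1) ^ 0xEDB8_8320;
            } else {
                crc >>= 1;
            }
        }
    }
    !crc
}

fn main() {
    // The standard CRC-32 check value for the ASCII string "123456789".
    assert_eq!(crc32(b"123456789"), 0xCBF4_3926);
}
```

CRC32 detects accidental corruption cheaply but is not cryptographic; that is why the module pairs it with SHA256 and BLAKE3 for integrity guarantees against tampering.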
/// Visualization tool integration
///
/// Provides interfaces for integrating with visualization libraries:
/// - Export to multiple visualization formats (Plotly, Matplotlib, Gnuplot, Vega-Lite)
/// - Fluent API for building plots
/// - Support for various plot types (line, scatter, histogram, heatmap)
/// - Quick plotting functions for common use cases
/// - Configurable styling and theming
/// Workflow automation tools
///
/// Provides framework for building automated data processing workflows:
/// - Task definition and dependency management
/// - Workflow scheduling and execution
/// - Resource management and allocation
/// - Retry policies and error handling
/// - Progress monitoring and notifications
/// - Common workflow templates (ETL, batch processing)
/// Zero-copy I/O optimizations
///
/// Provides zero-copy implementations for various I/O operations:
/// - Memory-mapped file access
/// - Zero-copy array views
/// - CSV parsing without allocation
/// - Binary data reading without copying
/// - Minimized memory allocations for large datasets
// Adaptive compression (entropy-based; OxiARC-backed codecs)
// Cloud storage abstraction: ObjectStore trait, LocalObjectStore, MemoryObjectStore,
// S3/GCS/Azure stubs, MultipartUpload, URL parsing.
// GCS resumable upload state machine and Azure Blob SAS token support included.
// AWS S3-specific multipart upload state machine (simulation + feature-gated stubs).
// Exactly-once delivery semantics for streaming pipeline sinks.
// Uses idempotency keys + write-ahead log to ensure each message is processed once.
pub use ;
// Delta Lake integration
// Lance columnar format
// MQTT broker
// Schema registry
// Table provider (DataFusion integration)
// Tensor serialization (SafeTensors, ONNX, TFRecord)
// TileDB array storage
// Apache Iceberg table format support
// DataFusion-compatible table provider interface
// Vectorized expression evaluation for filter and project operations
// Join algorithms for cross-format dataset merging
// Re-export commonly used functionality
pub use ;
pub use ;
// Re-export new format modules' key types
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;