numrs2 0.1.0

A Rust implementation inspired by NumPy for numerical computing (NumRS2)
# NumRS2 - High-Performance Numerical Computing for Rust

[![Build Status](https://github.com/cool-japan/numrs/workflows/CI/badge.svg)](https://github.com/cool-japan/numrs/actions)
[![Crates.io](https://img.shields.io/crates/v/numrs2.svg)](https://crates.io/crates/numrs2)
[![Documentation](https://docs.rs/numrs2/badge.svg)](https://docs.rs/numrs2)
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

NumRS2 is a high-performance numerical computing library for Rust, designed as a Rust-native alternative to NumPy. It provides N-dimensional arrays, linear algebra operations, and comprehensive mathematical functions with a focus on performance, safety, and ease of use.

> **🚀 Version 0.1.0-rc.3** - Release Candidate: Production-ready SIMD optimizations, 11 SciPy-equivalent modules, and complete NumPy compatibility. Features 86 AVX2-vectorized functions + 42 ARM NEON operations, comprehensive interpolation, and 647 tests passing with zero warnings.

## ✨ Architecture Highlights

### 🏗️ Enhanced Design
- **Trait-based architecture** for extensibility and generic programming
- **Hierarchical error system** with rich context and recovery suggestions  
- **Memory management** with pluggable allocators (Arena, Pool, NUMA-aware)
- **Comprehensive documentation** with migration guides and best practices

### 🔧 Core Features
- **N-dimensional arrays** with efficient memory layout and broadcasting
- **Advanced linear algebra** with BLAS/LAPACK integration and matrix decompositions
- **SIMD optimization** with automatic vectorization and CPU feature detection
- **Thread safety** with parallel processing support via Rayon
- **Python interoperability** for easy migration from NumPy
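Broadcasting follows NumPy's shape rule: shapes are aligned at their trailing axes, and each pair of sizes must either match or contain a 1. The sketch below computes a broadcast shape in plain Rust to illustrate that rule; `broadcast_shape` is a hypothetical helper, not part of the NumRS2 API.

```rust
/// Computes the broadcast shape of two array shapes using NumPy's rule.
/// Returns None when the shapes are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let ndim = a.len().max(b.len());
    let mut out = vec![0; ndim];
    for i in 0..ndim {
        // Walk from the last axis backwards; missing leading axes count as size 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[ndim - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y, // a size-1 axis stretches to match the other operand
            (x, 1) => x,
            _ => return None, // mismatched sizes, neither is 1
        };
    }
    Some(out)
}
```

For example, `[4, 1, 5]` and `[3, 1]` broadcast to `[4, 3, 5]`, while `[2, 3]` and `[4]` are incompatible.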

## Main Features

- **N-dimensional Array**: Core `Array` type with efficient memory layout and NumPy-compatible broadcasting
- **Advanced Linear Algebra**:
  - Matrix operations, decompositions, solvers through BLAS/LAPACK integration
  - Sparse matrices (COO, CSR, CSC, DIA formats) with format conversions
  - Iterative solvers (CG, GMRES, BiCGSTAB) for large systems
  - Randomized algorithms (randomized SVD, random projections, range finders)
- **Numerical Optimization**: BFGS, L-BFGS, Trust Region, Nelder-Mead, Levenberg-Marquardt, constrained optimization
- **Root-Finding**: Bisection, Brent, Newton-Raphson, Secant, Halley, fixed-point iteration
- **Numerical Differentiation**: Gradient, Jacobian, Hessian with Richardson extrapolation
- **Automatic Differentiation**: Forward and reverse mode AD with higher-order derivatives
- **Data Interoperability**:
  - Apache Arrow integration for zero-copy data exchange
  - Feather format support for fast columnar storage
  - IPC streaming for inter-process communication
  - Python bindings via PyO3 for NumPy compatibility
- **Expression Templates**: Lazy evaluation and operation fusion for performance
- **Advanced Indexing**: Fancy indexing, boolean masking, and conditional selection
- **Polynomial Functions**: Interpolation, evaluation, and arithmetic operations
- **Fast Fourier Transform**: Optimized FFT implementation with 1D/2D transforms, real FFT specialization, frequency shifting, and various windowing functions
- **SIMD Acceleration**: Enhanced vectorized operations via SciRS2-Core with AVX2/AVX512/NEON support
- **Parallel Computing**: Advanced multi-threaded execution with adaptive chunking and work-stealing
- **GPU Acceleration**: Optional GPU-accelerated array operations using WGPU
- **Mathematical Functions**: Comprehensive set of element-wise mathematical operations
- **Statistical Analysis**: Descriptive statistics, probability distributions, and more
- **Random Number Generation**: Modern interface for various distributions with fast generation and NumPy-compatible API
- **SciRS2 Integration**: Integration with SciRS2 for advanced statistical distributions and scientific computing functionality
- **Fully Type-Safe**: Leverage Rust's type system for compile-time guarantees
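To illustrate what the FFT feature computes, here is a minimal recursive radix-2 Cooley-Tukey FFT plus a NumPy-style `fftshift`, written in plain Rust. The `C64` type and both functions are hypothetical stand-ins for illustration; NumRS2's optimized implementation and its complex type are not shown here.

```rust
use std::ops::{Add, Mul, Sub};

/// Minimal complex number used only by this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct C64 {
    re: f64,
    im: f64,
}

impl C64 {
    fn new(re: f64, im: f64) -> Self {
        Self { re, im }
    }
}

impl Add for C64 {
    type Output = C64;
    fn add(self, o: C64) -> C64 {
        C64::new(self.re + o.re, self.im + o.im)
    }
}

impl Sub for C64 {
    type Output = C64;
    fn sub(self, o: C64) -> C64 {
        C64::new(self.re - o.re, self.im - o.im)
    }
}

impl Mul for C64 {
    type Output = C64;
    fn mul(self, o: C64) -> C64 {
        C64::new(self.re * o.re - self.im * o.im, self.re * o.im + self.im * o.re)
    }
}

/// Recursive radix-2 decimation-in-time FFT; the length must be a power of two.
fn fft(x: &[C64]) -> Vec<C64> {
    let n = x.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    if n == 1 {
        return vec![x[0]];
    }
    // Transform the even- and odd-indexed halves separately.
    let even: Vec<C64> = x.iter().step_by(2).copied().collect();
    let odd: Vec<C64> = x.iter().skip(1).step_by(2).copied().collect();
    let (e, o) = (fft(&even), fft(&odd));
    let mut out = vec![C64::new(0.0, 0.0); n];
    for k in 0..n / 2 {
        // Twiddle factor exp(-2*pi*i*k/n) combines the two halves (butterfly).
        let angle = -2.0 * std::f64::consts::PI * k as f64 / n as f64;
        let t = C64::new(angle.cos(), angle.sin()) * o[k];
        out[k] = e[k] + t;
        out[k + n / 2] = e[k] - t;
    }
    out
}

/// Moves the zero-frequency bin to the center, like NumPy's fftshift.
fn fftshift<T: Clone>(x: &[T]) -> Vec<T> {
    let mut v = x.to_vec();
    v.rotate_right(x.len() / 2);
    v
}
```

An impulse `[1, 0, 0, 0, 0, 0, 0, 0]` transforms to a flat spectrum of ones, and `fftshift` on `[0, 1, 2, 3, 4]` yields `[3, 4, 0, 1, 2]`.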

## Optional Features

NumRS2 includes several optional features that can be enabled in your `Cargo.toml`:

- **matrix_decomp** (enabled by default): Matrix decomposition functions (SVD, QR, LU, etc.)
- **lapack**: Enable LAPACK-dependent linear algebra operations (eigenvalues, matrix decompositions)
- **validation**: Additional runtime validation checks for array operations
- **arrow**: Apache Arrow integration for zero-copy data exchange with Python/Polars/DataFusion
- **python**: Python bindings via PyO3 for NumPy interoperability
- **gpu**: GPU acceleration for array operations using WGPU

To enable a feature:

```toml
[dependencies]
numrs2 = { version = "0.1.0-rc.3", features = ["arrow"] }
```

Or, when building:

```bash
cargo build --features scirs
```

### 🚀 Performance Optimizations

NumRS2 leverages SciRS2-Core (v0.1.0-rc.3) for cutting-edge performance optimizations:

- **Unified SIMD Operations**: All SIMD code goes through SciRS2-Core's `SimdUnifiedOps` trait
- **Adaptive Algorithm Selection**: `AutoOptimizer` automatically chooses between scalar, SIMD, or GPU implementations
- **Platform Detection**: Automatic detection of AVX2, AVX512, NEON, and GPU capabilities
- **Parallel Operations**: Optimized parallel processing with intelligent work distribution
- **Memory-Efficient Chunking**: Process large datasets without memory bottlenecks
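Platform detection and adaptive selection can be pictured as two steps: query the CPU at runtime, then choose a code path based on what was found and on the input size. This sketch uses the standard library's `is_x86_feature_detected!` and `is_aarch64_feature_detected!` macros; the `SIMD_THRESHOLD` value and the four-lane path are illustrative assumptions and do not reflect how SciRS2-Core's `AutoOptimizer` makes its decisions.

```rust
/// Illustrative cutoff below which the lane-based path is not worth it.
const SIMD_THRESHOLD: usize = 64;

/// Reports the SIMD extension this sketch checks for, detected at runtime.
fn detected_simd() -> &'static str {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return "neon";
        }
    }
    "scalar"
}

/// Chooses between a scalar loop and a four-lane loop.
fn dispatch_sum(xs: &[f64]) -> f64 {
    if xs.len() < SIMD_THRESHOLD || detected_simd() == "scalar" {
        // Small input or no SIMD detected: plain scalar loop.
        return xs.iter().sum();
    }
    // Large input: four independent lanes give the compiler room to auto-vectorize.
    let mut lanes = [0.0f64; 4];
    let chunks = xs.chunks_exact(4);
    let tail = chunks.remainder();
    for chunk in chunks {
        for (lane, x) in lanes.iter_mut().zip(chunk) {
            *lane += x;
        }
    }
    lanes.iter().sum::<f64>() + tail.iter().sum::<f64>()
}
```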

See the [optimization example](examples/scirs2_optimization.rs) for usage details.

### SciRS2 Integration

The SciRS2 integration provides additional advanced statistical distributions:

- **Noncentral Chi-square**: Extends the standard chi-square with a noncentrality parameter
- **Noncentral F**: Extends the standard F distribution with a noncentrality parameter
- **Von Mises**: Circular normal distribution for directional statistics
- **Maxwell-Boltzmann**: Used for modeling particle velocities in physics
- **Truncated Normal**: Normal distribution with bounded support
- **Multivariate Normal with Rotation**: Allows rotation of the coordinate system
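These distributions come from SciRS2 and are reached through NumRS2's random API, which is not reproduced here. As a concrete reference point, the Maxwell-Boltzmann density for speeds with scale parameter `a` can be written in plain Rust as follows; the function names are hypothetical.

```rust
use std::f64::consts::PI;

/// Maxwell-Boltzmann probability density for a speed x >= 0 and scale a > 0:
/// f(x) = sqrt(2/pi) * x^2 * exp(-x^2 / (2 a^2)) / a^3
fn maxwell_pdf(x: f64, a: f64) -> f64 {
    if x < 0.0 {
        return 0.0; // speeds are non-negative
    }
    (2.0 / PI).sqrt() * x * x * (-x * x / (2.0 * a * a)).exp() / (a * a * a)
}

/// Closed-form mean of the distribution: 2a * sqrt(2/pi).
fn maxwell_mean(a: f64) -> f64 {
    2.0 * a * (2.0 / PI).sqrt()
}
```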

For examples, see [scirs_integration_example.rs](examples/scirs_integration_example.rs).

### GPU Acceleration

The GPU acceleration feature provides:

- GPU-accelerated array operations for significant performance improvements
- Seamless CPU/GPU interoperability with the same API
- Support for various operations: arithmetic, matrix multiplication, element-wise functions, etc.
- WGPU backend for cross-platform GPU support (Vulkan, Metal, DX12, WebGPU)

For examples, see [gpu_example.rs](examples/gpu_example.rs).

### 🎯 Release Candidate 3 Highlights (v0.1.0-rc.3)

**Numerical Optimization (scipy.optimize equivalent)**
- BFGS & L-BFGS: Quasi-Newton methods for large-scale optimization
- Trust Region: Robust optimization with dogleg path
- Nelder-Mead: Derivative-free simplex method
- Levenberg-Marquardt: Nonlinear least squares
- Constrained optimization: Projected gradient, penalty methods
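As an illustration of the projected-gradient approach listed above, the sketch below minimizes a function over a box by taking a gradient step and clamping each coordinate back into bounds. It assumes a fixed step size and a caller-supplied gradient; `projected_gradient` is a hypothetical helper, not NumRS2's optimizer API.

```rust
/// Projected gradient descent over the box lo <= x <= hi:
/// step against the gradient, then project (clamp) back into the box.
fn projected_gradient<G>(grad: G, mut x: Vec<f64>, lo: &[f64], hi: &[f64], step: f64, iters: usize) -> Vec<f64>
where
    G: Fn(&[f64]) -> Vec<f64>,
{
    for _ in 0..iters {
        let g = grad(&x);
        for i in 0..x.len() {
            // Unconstrained step followed by projection onto [lo[i], hi[i]].
            x[i] = (x[i] - step * g[i]).clamp(lo[i], hi[i]);
        }
    }
    x
}
```

Minimizing `(x - 3)^2 + (y + 1)^2` over `[0, 2] x [0, 2]` from `(1, 1)` ends at the corner `(2, 0)`, the feasible point closest to the unconstrained minimum `(3, -1)`.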

**Root-Finding Algorithms (scipy.optimize.root_scalar)**
- Bracketing methods: Bisection, Brent, Ridder, Illinois
- Open methods: Newton-Raphson, Secant, Halley
- Fixed-point iteration for implicit equations
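The difference between bracketing and open methods shows up clearly in code: bisection needs a sign change on `[a, b]` and always converges, while Newton-Raphson needs a derivative and a reasonable starting guess but converges much faster near a simple root. Both functions below are hypothetical plain-Rust sketches, not the NumRS2 root-finding API.

```rust
/// Bisection: keeps a sign-changing bracket and halves it until it is narrower than tol.
fn bisect<F: Fn(f64) -> f64>(f: F, mut a: f64, mut b: f64, tol: f64) -> Option<f64> {
    let mut fa = f(a);
    let fb = f(b);
    if fa == 0.0 {
        return Some(a);
    }
    if fb == 0.0 {
        return Some(b);
    }
    if fa * fb > 0.0 {
        return None; // no sign change: root not bracketed
    }
    while (b - a).abs() > tol {
        let m = 0.5 * (a + b);
        let fm = f(m);
        if fm == 0.0 {
            return Some(m);
        }
        // Keep the half that still contains the sign change.
        if fa * fm < 0.0 {
            b = m;
        } else {
            a = m;
            fa = fm;
        }
    }
    Some(0.5 * (a + b))
}

/// Newton-Raphson: x <- x - f(x) / f'(x) until the step is smaller than tol.
fn newton<F: Fn(f64) -> f64, D: Fn(f64) -> f64>(f: F, df: D, mut x: f64, tol: f64, max_iter: usize) -> Option<f64> {
    for _ in 0..max_iter {
        let d = df(x);
        if d == 0.0 {
            return None; // flat tangent: the Newton step is undefined
        }
        let step = f(x) / d;
        x -= step;
        if step.abs() < tol {
            return Some(x);
        }
    }
    None // did not converge within max_iter
}
```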

**Numerical Differentiation**
- Gradient, Jacobian, and Hessian computation
- Forward, backward, central differences
- Richardson extrapolation for high accuracy
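Central differences have an error proportional to `h^2`. Richardson extrapolation combines the estimates at `h` and `h/2` as `(4 D(h/2) - D(h)) / 3`, which cancels that leading term and leaves an `h^4` error. A minimal sketch in plain Rust (the function names are hypothetical):

```rust
/// Central difference D(h) = (f(x + h) - f(x - h)) / (2h), error O(h^2).
fn central_diff<F: Fn(f64) -> f64>(f: &F, x: f64, h: f64) -> f64 {
    (f(x + h) - f(x - h)) / (2.0 * h)
}

/// One Richardson step: (4 D(h/2) - D(h)) / 3 cancels the h^2 term, error O(h^4).
fn richardson<F: Fn(f64) -> f64>(f: &F, x: f64, h: f64) -> f64 {
    (4.0 * central_diff(f, x, h / 2.0) - central_diff(f, x, h)) / 3.0
}
```

For `sin` at `x = 1` with `h = 0.1`, the central difference is off by roughly `1e-3`, while the extrapolated value is off by roughly `1e-7`.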

**SIMD Optimization Infrastructure**
- 86 AVX2-optimized functions with automatic threshold-based dispatch
- 4-way loop unrolling and FMA (fused multiply-add) instructions
- ARM NEON support with 42 vectorized f64 operations
- Support for both f32 and f64 numeric types
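4-way unrolling with FMA means keeping four independent accumulators, so consecutive multiply-adds do not wait on each other, and fusing each multiply and add into a single rounding step. The scalar sketch below shows the pattern with `f64::mul_add`; it is an illustration, not one of NumRS2's AVX2 kernels. `mul_add` only becomes a hardware FMA instruction when the target enables it (for example with `-C target-cpu=native` on an FMA-capable CPU); otherwise it may call a slower software routine. Splitting the sum into four accumulators also changes the rounding order compared with a straight loop.

```rust
/// Dot product with four independent accumulators and fused multiply-add.
fn dot_unrolled(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let n4 = a.len() / 4 * 4; // largest multiple of 4 that fits
    let (mut s0, mut s1, mut s2, mut s3) = (0.0f64, 0.0f64, 0.0f64, 0.0f64);
    let mut i = 0;
    while i < n4 {
        // Each accumulator forms its own dependency chain.
        s0 = a[i].mul_add(b[i], s0);
        s1 = a[i + 1].mul_add(b[i + 1], s1);
        s2 = a[i + 2].mul_add(b[i + 2], s2);
        s3 = a[i + 3].mul_add(b[i + 3], s3);
        i += 4;
    }
    // Handle the remaining 0 to 3 elements.
    let mut tail = 0.0f64;
    for j in n4..a.len() {
        tail = a[j].mul_add(b[j], tail);
    }
    (s0 + s1) + (s2 + s3) + tail
}
```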

**Production-Ready Features**
- Complete multi-array NPZ support for NumPy compatibility
- Zero clippy warnings and zero critical errors
- 1,637+ comprehensive tests (1,020 unit + 617 doc tests)
- Enhanced scheduler with critical deadlock fix (1,143x speedup)
- 122,799 lines of production Rust code

**Enhanced Modules**
- Linear algebra: Extended iterative solvers (CG, GMRES, BiCGSTAB, FGMRES, MINRES)
- Mathematical functions: 1,187 lines of enhanced operations
- Statistics: 1,397 lines of enhanced distributions and testing
- Polynomial operations: Complete NumPy polynomial compatibility
- Special functions: Spherical harmonics, Jacobi elliptic, Lambert W, and more
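For reference, the core of the conjugate gradient method (the first of the iterative solvers listed) fits in a few lines for a symmetric positive-definite matrix. This dense, unpreconditioned version is a plain-Rust sketch with hypothetical names; NumRS2's solvers and their signatures may differ.

```rust
/// Dense row-major matrix-vector product.
fn matvec(a: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    a.iter().map(|row| row.iter().zip(v).map(|(x, y)| x * y).sum::<f64>()).collect()
}

fn dot(u: &[f64], v: &[f64]) -> f64 {
    u.iter().zip(v).map(|(x, y)| x * y).sum()
}

/// Unpreconditioned conjugate gradient for a symmetric positive-definite system A x = b.
fn conjugate_gradient(a: &[Vec<f64>], b: &[f64], tol: f64, max_iter: usize) -> Vec<f64> {
    let n = b.len();
    let mut x = vec![0.0; n];
    let mut r = b.to_vec(); // residual b - A x, with x = 0 initially
    let mut p = r.clone(); // first search direction is the residual
    let mut rs_old = dot(&r, &r);
    for _ in 0..max_iter {
        if rs_old.sqrt() < tol {
            break; // residual norm small enough
        }
        let ap = matvec(a, &p);
        let alpha = rs_old / dot(&p, &ap); // step length along p
        for i in 0..n {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        let rs_new = dot(&r, &r);
        let beta = rs_new / rs_old; // keeps the new direction A-conjugate to the previous ones
        for i in 0..n {
            p[i] = r[i] + beta * p[i];
        }
        rs_old = rs_new;
    }
    x
}
```

For `A = [[4, 1], [1, 3]]` and `b = [1, 2]`, it returns approximately `[1/11, 7/11]`; in exact arithmetic, CG converges in at most `n` iterations for an `n x n` system.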

## Example

```rust
use numrs2::prelude::*;

fn main() -> Result<()> {
    // Create arrays
    let a = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0]).reshape(&[2, 2]);
    let b = Array::from_vec(vec![5.0, 6.0, 7.0, 8.0]).reshape(&[2, 2]);
    
    // Basic operations with broadcasting
    let c = a.add(&b);
    let d = a.multiply_broadcast(&b)?;
    
    // Matrix multiplication
    let e = a.matmul(&b)?;
    println!("a @ b = {}", e);
    
    // Linear algebra operations
    let (u, s, vt) = a.svd_compute()?;
    println!("SVD components: U = {}, S = {}, Vt = {}", u, s, vt);
    
    // Eigenvalues and eigenvectors
    let symmetric = Array::from_vec(vec![2.0, 1.0, 1.0, 2.0]).reshape(&[2, 2]);
    // [[2, 1], [1, 2]] has eigenvalues 1 and 3
    let (eigenvalues, _eigenvectors) = symmetric.eigh("lower")?;
    println!("Eigenvalues: {}", eigenvalues);
    
    // Polynomial interpolation
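    let x = Array::linspace(0.0, 1.0, 5)?;
    // Samples of y = 1.6 * x^2 at x = 0, 0.25, 0.5, 0.75, 1.0
    let y = Array::from_vec(vec![0.0, 0.1, 0.4, 0.9, 1.6]);
    let poly = PolynomialInterpolation::lagrange(&x, &y)?;
    // x = 0.5 is one of the sample points, so this should print (approximately) 0.4
    println!("Interpolated value at 0.5: {}", poly.evaluate(0.5));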
    
    // FFT operations
    let signal = Array::from_vec(vec![1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]);
    // Window the signal before transforming
    let windowed_signal = signal.apply_window("hann")?;
    // Compute FFT
    let spectrum = windowed_signal.fft()?;
    // Shift frequencies so the zero-frequency bin is centered
    let _centered = spectrum.fftshift_complex()?;
    println!("FFT power spectrum: {}", spectrum.power_spectrum()?);
    
    // Statistical operations
    let data = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0]);
    println!("mean = {}", data.mean()?);
    println!("std = {}", data.std()?);
    
    // Sparse array operations
    let mut sparse = SparseArray::new(&[10, 10]);
    sparse.set(&[0, 0], 1.0)?;
    sparse.set(&[5, 5], 2.0)?;
    println!("Density: {}", sparse.density());
    
    // SIMD-accelerated operations
    let result = simd_ops::apply_simd(&data, |x| x * x + 2.0 * x + 1.0)?;
    println!("SIMD result: {}", result);

    // Random number generation
    let rng = random::default_rng();
    let uniform = rng.random::<f64>(&[3])?;
    let normal = rng.normal(0.0, 1.0, &[3])?;
    println!("Random uniform [0,1): {}", uniform);
    println!("Random normal: {}", normal);

    Ok(())
}
```

## Performance

NumRS2 is designed with performance as a primary goal:

- **Rust's Zero-Cost Abstractions**: Compile-time optimization without runtime overhead
- **BLAS/LAPACK Integration**: Industry-standard libraries for linear algebra operations
- **SIMD Vectorization**: Parallel processing at the CPU instruction level with automatic CPU feature detection
- **Memory Layout Optimization**: Cache-friendly data structures and memory alignment
- **Data Placement Strategies**: Optimized memory placement for better cache utilization
- **Adaptive Parallelization**: Smart thresholds to determine when parallel execution is beneficial
- **Scheduling Optimization**: Intelligent selection of work scheduling strategies based on workload
- **Fine-grained Parallelism**: Advanced workload partitioning for better load balancing
- **Modern Random Generation**: Advanced thread-safe RNG with PCG64 algorithm for high-quality randomness
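Adaptive parallelization boils down to a size check before splitting the work. The sketch below sums a slice sequentially when it is small and otherwise splits it across scoped threads from the standard library. `PAR_THRESHOLD` is an illustrative value, and NumRS2 itself uses Rayon rather than raw threads, so this only shows the idea, not the library's scheduler.

```rust
use std::thread;

/// Illustrative cutoff: below this many elements, spawning threads costs more than it saves.
const PAR_THRESHOLD: usize = 100_000;

fn adaptive_sum(xs: &[f64]) -> f64 {
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    if xs.len() < PAR_THRESHOLD || workers == 1 {
        return xs.iter().sum(); // small input or single core: stay sequential
    }
    // One contiguous chunk per worker (the last chunk may be shorter).
    let chunk = (xs.len() + workers - 1) / workers;
    thread::scope(|s| {
        let handles: Vec<_> = xs
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().sum::<f64>()))
            .collect();
        // Combine partial sums; the summation order differs from the sequential loop.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```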

## Expression Templates

NumRS2 provides a powerful expression templates system for lazy evaluation and performance optimization:

### SharedArray - Reference-Counted Arrays

```rust
use numrs2::prelude::*;

fn main() {
    // Create shared arrays with natural operator syntax
    let a: SharedArray<f64> = SharedArray::from_vec(vec![1.0, 2.0, 3.0, 4.0]);
    let b: SharedArray<f64> = SharedArray::from_vec(vec![10.0, 20.0, 30.0, 40.0]);

    // Cheap cloning (O(1) - just increments reference count)
    let a_clone = a.clone();

    // Natural operator overloading
    let sum = a.clone() + b.clone();         // [11.0, 22.0, 33.0, 44.0]
    let product = a.clone() * b.clone();     // [10.0, 40.0, 90.0, 160.0]
    let scaled = a.clone() * 2.0;            // [2.0, 4.0, 6.0, 8.0]
    let result = (a.clone() + b.clone()) * 2.0 - 5.0;  // [17.0, 39.0, 61.0, 83.0]
}
```

### SharedExpr - Lifetime-Free Lazy Evaluation

```rust
use numrs2::expr::{SharedExpr, SharedExprBuilder};
use numrs2::prelude::*; // provides SharedArray

fn main() {
    // Build expressions lazily - no computation until eval()
    let c: SharedArray<f64> = SharedArray::from_vec(vec![1.0, 2.0, 3.0, 4.0]);
    let expr = SharedExprBuilder::from_shared_array(c);
    let squared = expr.map(|x| x * x);   // Expression built, not evaluated
    let result = squared.eval();         // [1.0, 4.0, 9.0, 16.0] - evaluated here
}
```
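Conceptually, lazy evaluation amounts to recording operations and running them only when `eval()` is called, so a chain of maps becomes one pass over the data. The hypothetical `LazyMap` type below sketches that idea with boxed closures; it is not how `SharedExpr` is implemented.

```rust
/// A deferred element-wise expression: the source data plus a composed operation.
struct LazyMap {
    data: Vec<f64>,
    op: Box<dyn Fn(f64) -> f64>,
}

impl LazyMap {
    /// Starts with the identity operation; nothing is computed yet.
    fn new(data: Vec<f64>) -> Self {
        Self { data, op: Box::new(|x: f64| x) }
    }

    /// Composes another operation onto the pending chain without touching the data.
    fn map(self, f: impl Fn(f64) -> f64 + 'static) -> Self {
        let prev = self.op;
        Self { data: self.data, op: Box::new(move |x: f64| f(prev(x))) }
    }

    /// Runs the whole fused chain in a single pass over the data.
    fn eval(&self) -> Vec<f64> {
        self.data.iter().map(|&x| (self.op)(x)).collect()
    }
}
```

Chaining `.map(|x| x * x).map(|x| x + 1.0)` on `[1.0, 2.0, 3.0, 4.0]` and calling `eval()` gives `[2.0, 5.0, 10.0, 17.0]`, computed in one loop.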

### Common Subexpression Elimination (CSE)

```rust
use numrs2::expr::{CachedExpr, ExprCache};

// Automatic caching of repeated computations
// (`sum_expr` is an existing expression value, not defined in this snippet)
let cache: ExprCache<f64> = ExprCache::new();
let cached_expr = CachedExpr::new(sum_expr.into_expr(), cache.clone());

let result1 = cached_expr.eval();  // Computes and caches
let result2 = cached_expr.eval();  // Uses cached result
```
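The caching behaviour can be pictured as a value that is computed on first use and reused afterwards. This sketch uses the standard library's `OnceCell`; the `Cached` type is hypothetical and does not mirror the internals of `CachedExpr` or `ExprCache`.

```rust
use std::cell::OnceCell;

/// Runs `compute` on the first `eval` and returns the stored result on later calls.
struct Cached<F: Fn() -> Vec<f64>> {
    compute: F,
    cache: OnceCell<Vec<f64>>,
}

impl<F: Fn() -> Vec<f64>> Cached<F> {
    fn new(compute: F) -> Self {
        Self { compute, cache: OnceCell::new() }
    }

    /// First call computes and stores; subsequent calls only read the cache.
    fn eval(&self) -> &[f64] {
        self.cache.get_or_init(|| (self.compute)())
    }
}
```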

### Memory Access Pattern Optimization

```rust
use numrs2::memory_optimize::access_patterns::*;

// Detect memory layout for optimization
let layout = detect_layout(&[100, 100], &[100, 1]);  // CContiguous

// Get optimization hints for array shapes
let hints = OptimizationHints::default_for::<f64>(10000);
println!("Block size: {}", hints.block_size);
println!("Use parallel: {}", hints.use_parallel);

// Cache-aware iteration for large arrays
let block_iter = BlockedIterator::new(10000, 64);
for block in block_iter {
    // Process block.start..block.end with cache efficiency
}

// Cache-aware operations (`src`, `dst`, `a`, `b`, and `result` are existing buffers, not defined in this snippet)
cache_aware_transform(&src, &mut dst, |x| x * 2.0);
cache_aware_binary_op(&a, &b, &mut result, |x, y| x + y);
```
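The two ideas behind `detect_layout` and `BlockedIterator` can be sketched in plain Rust: an array is C-contiguous when each stride (in elements) equals the product of the sizes of all later axes, and blocked iteration walks `0..len` in fixed-size chunks. Both helpers below are hypothetical illustrations, not code from the `access_patterns` module.

```rust
use std::ops::Range;

/// True if `strides` (in elements) are the row-major (C-order) strides for `shape`.
/// Axes of size 1 are skipped because their stride never affects addressing.
fn is_c_contiguous(shape: &[usize], strides: &[usize]) -> bool {
    if shape.len() != strides.len() {
        return false;
    }
    let mut expected = 1;
    for (&dim, &stride) in shape.iter().zip(strides).rev() {
        if dim > 1 && stride != expected {
            return false;
        }
        expected *= dim; // the next axis to the left jumps over this whole axis
    }
    true
}

/// Splits 0..len into consecutive half-open blocks of at most `block` elements (block > 0).
fn blocks(len: usize, block: usize) -> impl Iterator<Item = Range<usize>> {
    (0..len).step_by(block).map(move |start| start..(start + block).min(len))
}
```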

See the [expression templates example](examples/expression_templates_example.rs) for a comprehensive demonstration.

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
numrs2 = "0.1.0-rc.3"
```

**Note:** NumRS2 uses OxiBLAS, a pure Rust BLAS/LAPACK implementation with no C dependencies, so you do NOT need to install system BLAS/LAPACK libraries.

To use LAPACK functionality (pure Rust via OxiBLAS):
```bash
cargo build --features lapack
cargo test --features lapack
```

OxiBLAS provides:
- Pure Rust implementation with SIMD optimizations (AVX2/NEON)
- No external C dependencies required
- 80-172% of OpenBLAS performance (competitive or faster on Apple M3)
- Complete BLAS Level 1/2/3 and LAPACK operations

## Implementation Details

NumRS2 is built on top of several battle-tested libraries:

- **ndarray**: Provides the foundation for n-dimensional arrays
- **ndarray-linalg**: Provides BLAS/LAPACK bindings for linear algebra
- **num-complex**: Complex number support for advanced operations
- **BLAS/LAPACK** (provided by OxiBLAS, a pure Rust implementation): Powers high-performance linear algebra routines
- **Rayon**: Enables parallel computation capabilities
- **num-traits**: Provides generic numeric traits for numerical operations

## Features

NumRS2 provides a comprehensive suite of numerical computing capabilities:

### Core Functionality
- **N-dimensional arrays** with efficient memory layout and broadcasting
- **Linear algebra operations** with BLAS/LAPACK integration
- **Matrix decompositions** (SVD, QR, Cholesky, LU, Schur, COD)
- **Eigenvalue and eigenvector computation**
- **Mathematical functions** with numerical stability optimizations

### Performance Optimizations
- **SIMD acceleration** with automatic CPU feature detection
- **Parallel processing** with adaptive scheduling and load balancing  
- **Memory optimization** with cache-friendly data structures
- **Vectorized operations** for improved computational efficiency

### Advanced Features
- **Fast Fourier Transform** with 1D/2D transforms and windowing functions
- **Polynomial operations** and interpolation methods
- **Sparse matrix support** for memory-efficient computations
- **Random number generation** with multiple distribution support
- **Statistical analysis** functions and descriptive statistics
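Sparse matrices save memory by storing only the non-zero entries. The sketch below shows the CSR layout (row pointers, column indices, values), built from COO-style triplets, and a matrix-vector product that touches only the stored entries. The `Csr` type is hypothetical; NumRS2's sparse types and their APIs are not shown here, and this version does not merge duplicate entries.

```rust
/// Compressed Sparse Row matrix: non-zeros stored row by row.
struct Csr {
    n_cols: usize,
    indptr: Vec<usize>,  // row i's entries live in indptr[i]..indptr[i + 1]
    indices: Vec<usize>, // column index of each stored value
    data: Vec<f64>,      // the stored non-zero values
}

impl Csr {
    /// Builds CSR from (row, col, value) triplets (COO) for an n_rows x n_cols matrix.
    fn from_triplets(n_rows: usize, n_cols: usize, mut t: Vec<(usize, usize, f64)>) -> Self {
        t.sort_by_key(|&(r, c, _)| (r, c));
        // Count entries per row, then prefix-sum the counts into row offsets.
        let mut indptr = vec![0; n_rows + 1];
        for &(r, _, _) in &t {
            indptr[r + 1] += 1;
        }
        for i in 0..n_rows {
            indptr[i + 1] += indptr[i];
        }
        let indices = t.iter().map(|&(_, c, _)| c).collect();
        let data = t.iter().map(|&(_, _, v)| v).collect();
        Self { n_cols, indptr, indices, data }
    }

    /// y = A x, visiting only the stored non-zeros of each row.
    fn matvec(&self, x: &[f64]) -> Vec<f64> {
        assert_eq!(x.len(), self.n_cols);
        (0..self.indptr.len() - 1)
            .map(|i| {
                (self.indptr[i]..self.indptr[i + 1])
                    .map(|k| self.data[k] * x[self.indices[k]])
                    .sum::<f64>()
            })
            .collect()
    }
}
```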

### Integration & Interoperability
- **GPU acceleration** support via WGPU (optional)
- **SciRS2 integration** for advanced statistical distributions (optional)
- **Memory-mapped arrays** for large dataset handling
- **Serialization support** for data persistence

## 📖 Documentation

### 📚 Comprehensive Guides
- **[Architecture Guide](docs/ARCHITECTURE.md)** - System design and core concepts
- **[Migration Guide](docs/MIGRATION_GUIDE.md)** - Upgrading from previous versions
- **[Trait System Guide](docs/TRAIT_GUIDE.md)** - Generic programming with NumRS2
- **[Error Handling Guide](docs/ERROR_HANDLING.md)** - Robust error management
- **[Memory Management Guide](docs/MEMORY_MANAGEMENT.md)** - Optimizing memory usage

### 🔗 Additional Resources
- [Official API Documentation](https://docs.rs/numrs2) - Complete API reference
- [Getting Started Guide](GETTING_STARTED.md) - Essential information for beginners
- [Installation Guide](INSTALL.md) - Detailed installation instructions
- [User Guide](GUIDE.md) - Comprehensive guide to all NumRS2 features
- [NumPy Migration Guide](NUMPY_MIGRATION.md) - Guide for NumPy users transitioning to NumRS2
- [Implementation Status](IMPLEMENTATION_STATUS.md) - Current status and next steps
- [Contributing Guide](CONTRIBUTING.md) - How to contribute to NumRS2

Module-specific documentation:
  - [Random Module Guide](examples/README_RANDOM.md) - Random number generation
  - [Statistics Module Guide](examples/README_STATISTICS.md) - Statistical functions
  - [Linear Algebra Guide](examples/README_LINALG.md) - Linear algebra operations
  - [Polynomial Guide](examples/README_POLYNOMIAL.md) - Polynomial operations
  - [FFT Guide](examples/README_FFT.md) - Fast Fourier Transform

Testing Documentation:
  - [Testing Guide](tests/README.md) - Guide to the NumRS2 testing approach
  - Property-based testing for mathematical operations
    - Property tests for linear algebra operations
    - Property tests for special functions
    - Statistical validation of random distributions
  - Reference testing
    - Reference tests for random distributions
    - Reference tests for linear algebra operations
    - Reference tests for special functions
  - Benchmarking
    - Linear algebra benchmarks
    - Special functions benchmarks

## Examples

Check out the `examples/` directory for more usage examples:

- `basic_usage.rs`: Core array operations and manipulations
- `linalg_example.rs`: Linear algebra operations and solvers
- `simd_example.rs`: SIMD-accelerated computations
- `memory_optimize_example.rs`: Memory layout optimization for cache efficiency
- `parallel_optimize_example.rs`: Parallelization optimization techniques
- `random_distributions_example.rs`: Comprehensive examples of random number generation
- See the [examples README](examples/README.md) for more details

## Development

NumRS2 is in active development. See [TODO.md](TODO.md) for upcoming features and the development roadmap.

## Testing

NumRS2's tests depend on the `approx` crate. Run the test suite from the repository root with:

```bash
cargo test
```

To run the property-based and statistical tests for the random module:

```bash
cargo test --test test_random_statistical
cargo test --test test_random_properties
cargo test --test test_random_advanced
```

## Contributing

NumRS2 is a community-driven project, and we welcome contributions from everyone. There are many ways to contribute:

- **Code**: Implement new features or fix bugs
- **Documentation**: Improve guides, docstrings, or examples
- **Testing**: Write tests or improve existing ones
- **Reviewing**: Review pull requests from other contributors
- **Performance**: Identify bottlenecks or implement optimizations
- **Examples**: Create example code showing library usage

If you're interested in contributing, please read our [Contributing Guide](CONTRIBUTING.md) for detailed instructions on how to get started.

For significant changes, please open an issue to discuss your ideas first.

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.