//! GPU acceleration for NumRS2
//!
//! This module provides GPU-accelerated versions of NumRS2 array operations, built on WGPU.
//! The implementation keeps the same API as the CPU-based operations, so existing code
//! needs few changes, and it targets faster execution on large data sets.
//!
//! ## Feature Flag
//!
//! GPU acceleration is enabled via the `gpu` feature flag in `Cargo.toml`:
//!
//! ```toml
//! [dependencies]
//! numrs2 = { version = "0.1.1", features = ["gpu"] }
//! ```
//!
//! ## Example
//!
//! ```rust,ignore
//! use numrs2::array::Array;
//! use numrs2::gpu;
//!
//! #[cfg(feature = "gpu")]
//! fn main() -> numrs2::error::Result<()> {
//!     // Create two arrays on the CPU (using f32 for better GPU compatibility)
//!     let a = Array::from_vec(vec![1.0f32, 2.0, 3.0, 4.0, 5.0]).reshape(&[5]);
//!     let b = Array::from_vec(vec![5.0f32, 4.0, 3.0, 2.0, 1.0]).reshape(&[5]);
//!
//!     // Create GPU arrays from CPU arrays
//!     let gpu_a = gpu::GpuArray::from_array(&a)?;
//!     let gpu_b = gpu::GpuArray::from_array(&b)?;
//!
//!     // Perform GPU-accelerated addition
//!     let gpu_result = gpu::add(&gpu_a, &gpu_b)?;
//!
//!     // Convert back to CPU array
//!     let result = gpu_result.to_array()?;
//!
//!     // Should be [6.0, 6.0, 6.0, 6.0, 6.0]
//!     println!("Result: {:?}", result);
//!
//!     Ok(())
//! }
//!
//! #[cfg(not(feature = "gpu"))]
//! fn main() {
//!     println!("GPU support is not enabled. Recompile with --features gpu");
//! }
//! ```
//!
//! ## Supported Operations
//!
//! - Basic arithmetic: add, subtract, multiply, divide
//! - Element-wise functions: exp, log, sin, cos, etc.
//! - Matrix operations: matmul, transpose
//! - Reduction operations: sum, mean, min, max
//! - Batching operations: automatic batching of small operations for improved throughput
//!
//! ## Advanced Features
//!
//! - **Automatic Batching**: Queue small operations and execute them together to reduce overhead
//! - **Dynamic Optimization**: Adaptive batch sizes based on GPU occupancy and performance
//! - **Memory Management**: Buffer pooling, aliasing, and efficient data transfer strategies
//! - **Shader Composition**: Build complex operations from simpler kernels
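//!
//! The automatic batching idea can be illustrated with a minimal, CPU-only sketch.
//! `BatchQueue`, `push`, `flush`, and `max_batch` are hypothetical names invented for
//! this illustration and are not part of the NumRS2 API. A real implementation would
//! presumably submit queued work to the GPU in one go rather than run closures on the CPU.
//!
//! ```rust
//! /// Hypothetical queue that defers small operations and runs them together.
//! struct BatchQueue {
//!     pending: Vec<Box<dyn FnOnce() -> f32>>,
//!     max_batch: usize,
//!     flushes: usize,
//!     results: Vec<f32>,
//! }
//!
//! impl BatchQueue {
//!     fn new(max_batch: usize) -> Self {
//!         Self { pending: Vec::new(), max_batch, flushes: 0, results: Vec::new() }
//!     }
//!
//!     /// Queue an operation; flush automatically once the batch is full.
//!     fn push(&mut self, op: impl FnOnce() -> f32 + 'static) {
//!         self.pending.push(Box::new(op));
//!         if self.pending.len() >= self.max_batch {
//!             self.flush();
//!         }
//!     }
//!
//!     /// Execute everything queued so far as a single batch.
//!     fn flush(&mut self) {
//!         if self.pending.is_empty() {
//!             return;
//!         }
//!         self.flushes += 1;
//!         for op in self.pending.drain(..) {
//!             self.results.push(op());
//!         }
//!     }
//! }
//!
//! fn main() {
//!     let mut queue = BatchQueue::new(4);
//!     for i in 0..10 {
//!         let x = i as f32;
//!         queue.push(move || x * 2.0);
//!     }
//!     queue.flush(); // run the two leftover operations
//!     println!("{} results in {} batches", queue.results.len(), queue.flushes);
//! }
//! ```
//!
//! Ten queued operations with a batch size of four run as three batches instead of ten
//! separate submissions, which is where the reduced per-operation overhead comes from.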
//!
//! ## Limitations
//!
//! - GPU arrays used together in an operation must share the same data type (f32 or f64)
//! - Operations between CPU and GPU arrays are not directly supported
//! - Not all NumRS2 operations are currently accelerated
//! - Performance benefits are most noticeable for large arrays
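//!
//! Because the benefit depends on array size, callers may want to choose a backend per
//! operation. The sketch below is one way to do that, not something this module provides:
//! `Backend`, `choose_backend`, and the `GPU_MIN_ELEMENTS` cutoff are hypothetical, and a
//! realistic threshold would have to be measured on the target device.
//!
//! ```rust
//! /// Where an operation should run.
//! #[derive(Debug, PartialEq)]
//! enum Backend {
//!     Cpu,
//!     Gpu,
//! }
//!
//! /// Hypothetical cutoff (an assumption, not a measured value).
//! const GPU_MIN_ELEMENTS: usize = 1 << 16;
//!
//! /// Pick a backend from the element count and GPU availability.
//! fn choose_backend(len: usize, gpu_available: bool) -> Backend {
//!     if gpu_available && len >= GPU_MIN_ELEMENTS {
//!         Backend::Gpu
//!     } else {
//!         Backend::Cpu
//!     }
//! }
//!
//! fn main() {
//!     // Small arrays stay on the CPU, where transfer overhead would dominate.
//!     println!("{:?}", choose_backend(5, true));
//!     // Large arrays go to the GPU when one is available.
//!     println!("{:?}", choose_backend(1_000_000, true));
//!     // Without a GPU, everything falls back to the CPU.
//!     println!("{:?}", choose_backend(1_000_000, false));
//! }
//! ```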
// Re-export public types
pub use GpuArray;
pub use ;
pub use *;
pub use get_gpu_info;
// Conditionally include GPU modules when the feature is enabled
// Placeholder stubs for non-GPU builds
;
;