//! GPU acceleration for computationally intensive operations
//!
//! This module provides GPU-accelerated implementations for:
//! - FFT-based Kernel Density Estimation (KDE)
//! - Feature matrix operations
//! - Statistical calculations
//!
//! ## Performance
//!
//! GPU acceleration pays off most for **batched multi-channel operations**. The table
//! below compares batched GPU runs against sequential CPU runs:
//!
//! | Configuration | Batched GPU | Sequential CPU | Speedup |
//! |--------------|-------------|----------------|---------|
//! | 5 channels, 50K events | 250 µs | 4.9 ms | **19.7x** |
//! | 5 channels, 100K events | 421 µs | 10.1 ms | **24.0x** |
//! | 5 channels, 500K events | 1.8 ms | 54.0 ms | **30.3x** |
//! | 10 channels, 500K events | 4.1 ms | 109 ms | **26.6x** |
//! | 10 channels, 1M events | 7.8 ms | 253 ms | **32.3x** |
//!
//! Even at the smallest size measured (5 channels, 50K events), batching yields a 19.7x speedup.
//!
//! ## Implementation Details
//!
//! - **Backend**: WGPU (WebGPU) via the Burn framework
//! - **Custom kernels**: optional CubeCL kernels for complex multiplication
//! - **Batching**: GPU context reuse and kernel caching amortize per-call overhead
//! - **Fallback**: automatic CPU fallback when no GPU is available
//!
//! ## Usage
//!
//! The GPU path is selected automatically when both conditions hold:
//! - the crate is built with `--features gpu`
//! - a GPU is available at runtime
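//!
//! The selection logic can be pictured with the minimal sketch below. It is not this
//! module's implementation: `DemoGpu`, `Backend`, `sum_cpu`, and `sum_auto` are
//! hypothetical names invented for illustration, and the device probe is hard-coded to
//! fail so that the CPU fallback path is the one exercised.
//!
//! ```rust
//! /// Hypothetical stand-in for a GPU handle (NOT this crate's `GpuContext`).
//! struct DemoGpu;
//!
//! impl DemoGpu {
//!     /// Pretend device probe; always reports "no GPU" in this sketch.
//!     fn try_new() -> Option<Self> {
//!         None
//!     }
//!
//!     /// Placeholder for a GPU kernel; computes on the CPU for illustration.
//!     fn sum(&self, data: &[f32]) -> f32 {
//!         data.iter().sum()
//!     }
//! }
//!
//! /// Which path actually ran.
//! #[derive(Debug, PartialEq)]
//! enum Backend {
//!     Gpu,
//!     Cpu,
//! }
//!
//! /// CPU reference implementation, used as the fallback.
//! fn sum_cpu(data: &[f32]) -> f32 {
//!     data.iter().sum()
//! }
//!
//! /// Use the GPU only if the `gpu` feature is compiled in AND a device is found;
//! /// otherwise fall back to the CPU.
//! fn sum_auto(data: &[f32]) -> (f32, Backend) {
//!     if cfg!(feature = "gpu") {
//!         if let Some(gpu) = DemoGpu::try_new() {
//!             return (gpu.sum(data), Backend::Gpu);
//!         }
//!     }
//!     (sum_cpu(data), Backend::Cpu)
//! }
//! ```
//!
//! Calling `sum_auto(&[1.0, 2.0, 3.0])` in this sketch takes the CPU branch, because the
//! probe never finds a device. Callers get the same result type from either path.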
//!
//! For batched operations, use `kde_fft_batched_gpu()` with `GpuContext` for best performance.
pub use ;
pub use GpuContext;
pub use kde_fft_gpu;
pub use ;
pub use build_feature_matrix_gpu;
pub use ;
// Threshold constants removed - GPU is now used whenever available
// Batched operations provide speedup even for smaller datasets (50K+ events)