//! # Operation Dispatch Layer
//!
//! This module defines and dispatches tensor operations across different compute backends,
//! including CPU, GPU (WGPU), and optionally CUDA.
//!
//! ## Submodules
//!
//! - [`cpu`] — Multi-threaded + SIMD CPU operations (default fallback backend)
//! - [`wgpu`] *(opt-in)* — GPU compute shader pipelines using `wgpu`
//! - [`cuda`] *(planned)* — CUDA backend for NVIDIA GPUs (not yet implemented)
//! - [`dispatch`] — Dynamic backend switching and unified operation interfaces
//!
//! ## Backend Selection
//!
//! All core operations are backend-agnostic from the user's perspective;
//! dispatch is handled internally, based on compile-time features or runtime flags.
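//!
//! A minimal sketch of compile-time selection via `cfg` (the `add` function
//! and the inline `cpu` module are illustrative assumptions, not this crate's
//! actual API):
//!
//! ```rust
//! // Hypothetical CPU fallback standing in for the crate's `cpu` module.
//! mod cpu {
//!     pub fn add(a: &[f32], b: &[f32]) -> Vec<f32> {
//!         a.iter().zip(b).map(|(x, y)| x + y).collect()
//!     }
//! }
//!
//! /// Use the GPU path when the `wgpu` feature is enabled, else the CPU path.
//! pub fn add(a: &[f32], b: &[f32]) -> Vec<f32> {
//!     #[cfg(feature = "wgpu")]
//!     {
//!         return wgpu::add(a, b);
//!     }
//!     cpu::add(a, b)
//! }
//! ```
//!
//! Because the branch not taken is removed at compile time, a statically
//! chosen backend incurs no runtime dispatch cost.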
//!
//! ## Extending the Backend
//!
//! To add a new operation:
//!
//! 1. Implement it in one or more backends (e.g. `cpu::my_op`, `wgpu::my_op`)
//! 2. Add it to the `dispatch` module for unified access
//! 3. Add shape/consistency checks in a backend-agnostic location
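//!
//! These steps can be sketched as follows (`Backend`, `my_op`, and the inline
//! `cpu` module are hypothetical names, not this crate's actual API):
//!
//! ```rust
//! // Step 1: hypothetical CPU implementation of the new op.
//! mod cpu {
//!     pub fn my_op(x: &[f32]) -> Vec<f32> {
//!         x.iter().map(|v| v * 2.0).collect()
//!     }
//! }
//!
//! /// Runtime backend selector used by the dispatch layer.
//! #[derive(Clone, Copy)]
//! pub enum Backend {
//!     Cpu,
//!     // Wgpu and Cuda variants would sit behind their feature flags.
//! }
//!
//! // Steps 2 and 3: unified entry point with a backend-agnostic check.
//! pub fn my_op(backend: Backend, x: &[f32]) -> Result<Vec<f32>, String> {
//!     if x.is_empty() {
//!         return Err("my_op: empty input".into());
//!     }
//!     match backend {
//!         Backend::Cpu => Ok(cpu::my_op(x)),
//!     }
//! }
//! ```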
//!
//! ## Notes
//!
//! - GPU acceleration is only used when the feature flag is enabled
//! - CUDA support is not implemented yet; the module dispatches to WebGPU
//! - Operations must return both forward values and backward closures
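//!
//! The forward/backward contract can be illustrated with an elementwise
//! square, where `y = x^2` and `dy/dx = 2x` (a simplified sketch; the actual
//! operation signatures may differ):
//!
//! ```rust
//! /// Returns the forward value together with a backward closure that maps
//! /// the upstream gradient to the input gradient.
//! fn square(x: Vec<f32>) -> (Vec<f32>, impl Fn(&[f32]) -> Vec<f32>) {
//!     let y: Vec<f32> = x.iter().map(|v| v * v).collect();
//!     // Capture the input by move so the gradient can be computed later.
//!     let backward = move |grad_out: &[f32]| -> Vec<f32> {
//!         grad_out.iter().zip(&x).map(|(g, v)| g * 2.0 * v).collect()
//!     };
//!     (y, backward)
//! }
//! ```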
//!
//! ## Goals
//!
//! - Keep backends cleanly separated
//! - Zero-cost dispatch when statically chosen
//! - Maximal performance with minimal code duplication
//!
//! ## Feature Flags
//!
//! - `wgpu` — Enables `wgpu` (WebGPU) backend
//! - `cuda` — Enables placeholder CUDA module (dispatches to WGPU)