//! Tensor operations
//!
//! This module defines operation traits and implementations for
//! arithmetic, matrix operations, reductions, and activations.
//!
//! # Design
//!
//! Operations are defined as traits that are implemented by `RuntimeClient`.
//! This gives each operation access to the device and allocator it needs to
//! create output tensors.
//!
//! ```text
//! RuntimeClient<R>
//! └── implements TensorOps<R>
//!     ├── add, sub, mul, div (binary arithmetic)
//!     ├── neg, sqrt, exp, ... (unary operations)
//!     ├── matmul (matrix multiplication)
//!     ├── sum, mean, max, min (reductions)
//!     └── relu, sigmoid, softmax (activations)
//! ```
//!
//! # Implementing Operations for a New Backend
//!
//! To add operations for a new backend (e.g., CUDA, WebGPU):
//!
//! 1. **Implement `TensorOps<YourRuntime>` for your `Client` type:**
//! ```ignore
//! impl TensorOps<CudaRuntime> for CudaClient {
//!     fn add(&self, a: &Tensor<CudaRuntime>, b: &Tensor<CudaRuntime>) -> Result<Tensor<CudaRuntime>> {
//!         // 1. Validate that the shapes are broadcastable
//!         let out_shape = broadcast_shape(a.shape(), b.shape())
//!             .ok_or(Error::BroadcastError { ... })?;
//!
//!         // 2. Allocate the output tensor
//!         let out = Tensor::empty(&out_shape, a.dtype(), self.device());
//!
//!         // 3. Dispatch the kernel
//!         cuda_add_kernel(a.ptr(), b.ptr(), out.ptr(), ...);
//!
//!         Ok(out)
//!     }
//!     // ... other operations
//! }
//! ```
//!
//! 2. **Use helper types for operation parameters:**
//! - [`BinaryOp`], [`UnaryOp`] - Operation kind enums for dispatch
//! - [`MatmulParams`] - Matrix multiplication configuration
//! - [`ReduceOp`] - Reduction operation kinds
//! - [`ActivationKind`] - Activation function kinds
//!
//! 3. **Use validation helpers:**
//! - `broadcast_shape` - Compute broadcast shape for binary ops
//! - `validate_matmul_shapes` - Validate matmul dimensions
//! - `reduce_output_shape` - Compute reduction output shape
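//!
//! As a rough illustration, `broadcast_shape` can be implemented with the
//! usual right-aligned broadcasting rule (a sketch; the crate's actual
//! signature and error handling may differ):
//!
//! ```
//! /// Right-aligned broadcasting: paired dimensions are compatible when
//! /// they are equal or when either of them is 1.
//! pub fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
//!     let ndim = a.len().max(b.len());
//!     let mut out = vec![0; ndim];
//!     for i in 0..ndim {
//!         // Dimensions missing on the left are treated as 1.
//!         let da = if i < ndim - a.len() { 1 } else { a[i - (ndim - a.len())] };
//!         let db = if i < ndim - b.len() { 1 } else { b[i - (ndim - b.len())] };
//!         out[i] = match (da, db) {
//!             (x, y) if x == y => x,
//!             (1, y) => y,
//!             (x, 1) => x,
//!             _ => return None, // incompatible pair of dimensions
//!         };
//!     }
//!     Some(out)
//! }
//! ```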
//!
//! # Operation Categories
//!
//! ## Element-wise Operations
//! Binary (add, sub, mul, div) and unary (neg, abs, sqrt, exp, log, sin, cos, tanh).
//!
//! **Note:** Broadcasting is implemented for binary arithmetic and comparison ops
//! via strided kernels on CPU.
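//!
//! The strided approach gives each input a stride of 0 along broadcast
//! dimensions, so the same element is re-read instead of materializing a
//! larger copy. A sketch (the helper names here are illustrative, not the
//! crate's actual internals):
//!
//! ```
//! /// Strides for viewing a tensor of `shape` as the broadcast `out_shape`:
//! /// real dimensions keep their row-major stride; size-1 and missing
//! /// leading dimensions get stride 0.
//! fn broadcast_strides(shape: &[usize], out_shape: &[usize]) -> Vec<usize> {
//!     let mut strides = vec![1usize; shape.len()];
//!     for i in (0..shape.len().saturating_sub(1)).rev() {
//!         strides[i] = strides[i + 1] * shape[i + 1];
//!     }
//!     let pad = out_shape.len() - shape.len();
//!     (0..out_shape.len())
//!         .map(|i| if i < pad || shape[i - pad] == 1 { 0 } else { strides[i - pad] })
//!         .collect()
//! }
//!
//! /// Linear offset of an N-d `index` under the given `strides`.
//! fn offset(index: &[usize], strides: &[usize]) -> usize {
//!     index.iter().zip(strides).map(|(i, s)| i * s).sum()
//! }
//! ```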
//!
//! ## Matrix Operations
//! Matrix multiplication with batching support. Inner dimensions must match.
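//!
//! A sketch of the shape check for the batched case (this version requires
//! batch dimensions to match exactly; the real helper may also broadcast
//! them):
//!
//! ```
//! /// Validate `[.., m, k] x [.., k, n] -> [.., m, n]`.
//! fn validate_matmul_shapes(a: &[usize], b: &[usize]) -> Result<Vec<usize>, String> {
//!     if a.len() < 2 || b.len() < 2 {
//!         return Err("matmul requires at least 2-D operands".into());
//!     }
//!     let (m, ka) = (a[a.len() - 2], a[a.len() - 1]);
//!     let (kb, n) = (b[b.len() - 2], b[b.len() - 1]);
//!     if ka != kb {
//!         return Err(format!("inner dimensions differ: {ka} vs {kb}"));
//!     }
//!     if a[..a.len() - 2] != b[..b.len() - 2] {
//!         return Err("batch dimensions differ".into());
//!     }
//!     let mut out = a[..a.len() - 2].to_vec();
//!     out.extend([m, n]);
//!     Ok(out)
//! }
//! ```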
//!
//! ## Reductions
//! Sum, mean, max, min over specified dimensions with optional keepdim.
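//!
//! `reduce_output_shape` then follows directly: reduced dimensions become
//! size 1 with `keepdim` and are dropped otherwise (a sketch; the crate's
//! actual signature may differ):
//!
//! ```
//! fn reduce_output_shape(shape: &[usize], dims: &[usize], keepdim: bool) -> Vec<usize> {
//!     shape
//!         .iter()
//!         .enumerate()
//!         .filter_map(|(i, &d)| match (dims.contains(&i), keepdim) {
//!             (true, true) => Some(1), // reduced, kept as size 1
//!             (true, false) => None,   // reduced, dropped
//!             (false, _) => Some(d),   // untouched
//!         })
//!         .collect()
//! }
//! ```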
//!
//! ## Activations
//! ReLU, sigmoid, softmax for neural network layers.
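//!
//! For example, softmax is typically computed in its numerically stable
//! form, subtracting the maximum before exponentiating (a sketch over a
//! 1-D slice; the real kernel applies this along a chosen dimension):
//!
//! ```
//! fn softmax(x: &[f32]) -> Vec<f32> {
//!     // Subtracting the max leaves the result mathematically unchanged
//!     // but keeps exp() from overflowing for large logits.
//!     let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
//!     let exps: Vec<f32> = x.iter().map(|&v| (v - max).exp()).collect();
//!     let sum: f32 = exps.iter().sum();
//!     exps.iter().map(|e| e / sum).collect()
//! }
//! ```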
pub
pub
pub
pub
pub
pub
pub
pub
// Re-export user-facing types
pub use ActivationKind;
pub use ;
pub use MatmulParams;
pub use ReduceOp;
pub use SemiringOp;
pub use SpecialFunctions;
// Internal re-exports (accessible within the crate only)
pub(crate) use broadcast_shape;
pub(crate) use ;
pub(crate) use ;
pub(crate) use Fp8MatmulOps;
pub(crate) use ;
pub(crate) use ;