//! This crate provides several implementations of the reduce algorithm that
//! can run on multiple GPU backends using CubeCL.
//!
//! A reduction is a tensor operation that maps a rank `R` tensor to a rank `R - 1` tensor
//! by combining all elements along a given axis with some binary operator.
//! This is often also called folding.
//!
//! The main entry point is the [`reduce`] function, which automatically performs a reduction
//! for a given instruction implementing the [`ReduceInstruction`] trait and a given [`ReduceStrategy`].
//! The [`instructions`] module provides implementations of the [`ReduceInstruction`] trait for common operations.
//! Finally, the [`primitives`] module provides many reusable primitives for building general reduction algorithms.
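// The definition of a reduction given in the crate docs can be illustrated with a
// small CPU-only reference. This is a minimal sketch, not the GPU implementation
// provided by this crate: the name `reduce_axis_reference`, its signature, and the
// assumption of a contiguous row-major buffer are all hypothetical and exist only
// for illustration.

```rust
/// Fold a contiguous row-major tensor along `axis` with the binary operator `op`,
/// starting from `init`. The result is stored row-major with `axis` removed.
/// Hypothetical helper for illustration only; not part of the crate's API.
fn reduce_axis_reference(
    data: &[f32],
    shape: &[usize],
    axis: usize,
    init: f32,
    op: impl Fn(f32, f32) -> f32,
) -> Vec<f32> {
    assert!(axis < shape.len(), "axis must be less than the rank");
    assert_eq!(data.len(), shape.iter().product::<usize>());

    // Split the shape into (outer, len, inner) around the reduced axis.
    let outer: usize = shape[..axis].iter().product();
    let len = shape[axis];
    let inner: usize = shape[axis + 1..].iter().product();

    let mut out = vec![init; outer * inner];
    for o in 0..outer {
        for k in 0..len {
            for i in 0..inner {
                let acc = &mut out[o * inner + i];
                *acc = op(*acc, data[(o * len + k) * inner + i]);
            }
        }
    }
    out
}
```

// For the row-major matrix `[[0, 1], [2, 3]]`, summing along axis 0 with this sketch
// gives `[2.0, 4.0]`, and summing along axis 1 gives `[1.0, 5.0]`.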
pub use crateReduceStrategy;
use crate::;
pub use ;
use *;
pub use *;
pub use ;
pub use shared_sum;
/// Reduce the given `axis` of the `input` tensor using the instruction `Inst` and write the result into `output`.
///
/// An optional [`ReduceStrategy`] can be provided to force the reduction to use a specific algorithm. If omitted,
/// the function makes a best-effort choice of the best strategy supported by the provided `client`.
///
/// Returns an error if `strategy` is `Some(strategy)` and the specified strategy is not supported by the `client`.
/// Also returns an error if `axis` is not less than the rank of `input` or if the shape of `output` is invalid.
/// The shape of `output` must be the same as that of `input`, except with a value of 1 for the given `axis`.
///
/// # Example
///
/// This example shows how to sum the rows of a small `2 x 2` matrix into a `1 x 2` vector.
/// For more details, see the CubeCL documentation.
///
/// ```ignore
/// use cubecl_reduce::instructions::Sum;
///
/// let client = /* ... */;
/// let size_f32 = std::mem::size_of::<f32>();
/// let axis = 0; // 0 for rows, 1 for columns in the case of a matrix.
///
/// // Create input and output handles.
/// let input_handle = client.create(f32::as_bytes(&[0.0, 1.0, 2.0, 3.0]));
/// let input = unsafe {
///     TensorHandleRef::from_raw_parts(
///         &input_handle,
///         &[2, 1], // Strides of a row-major `2 x 2` matrix.
///         &[2, 2], // Shape.
///         size_f32,
///     )
/// };
///
/// // The output keeps the input shape, except for a size of 1 along `axis`.
/// let output_shape = [1, 2];
/// let output_stride = [2, 1];
/// let output_handle = client.empty(2 * size_f32);
/// let output = unsafe {
///     TensorHandleRef::from_raw_parts(
///         &output_handle,
///         &output_stride,
///         &output_shape,
///         size_f32,
///     )
/// };
///
/// // Here `R` is a `cubecl::Runtime`.
/// let result = reduce::<R, f32, f32, Sum>(&client, input, output, axis, None);
///
/// if result.is_ok() {
///     let binding = output_handle.binding();
///     let bytes = client.read_one(binding);
///     let output_values = f32::from_bytes(&bytes);
///     println!("Output = {:?}", output_values); // Should print [2.0, 4.0].
/// }
/// ```
// Check that the given axis is less than the rank of the input.
// Check that the output shape matches the input shape with the given axis set to 1.
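// The two checks above can be sketched on plain shape slices as follows. This is a
// minimal illustration under stated assumptions: `ShapeCheckError` and
// `check_reduce_shapes` are hypothetical names introduced here, not the error type
// or helper actually used by this crate.

```rust
/// Hypothetical error type for this sketch; not the crate's actual error enum.
#[derive(Debug, PartialEq)]
enum ShapeCheckError {
    AxisOutOfBounds { axis: usize, rank: usize },
    OutputShapeMismatch { expected: Vec<usize>, found: Vec<usize> },
}

/// Validate `axis` and `output_shape` against `input_shape`.
/// Hypothetical helper for illustration only.
fn check_reduce_shapes(
    input_shape: &[usize],
    output_shape: &[usize],
    axis: usize,
) -> Result<(), ShapeCheckError> {
    // The axis must be strictly less than the rank of the input.
    let rank = input_shape.len();
    if axis >= rank {
        return Err(ShapeCheckError::AxisOutOfBounds { axis, rank });
    }

    // The output shape must equal the input shape with the reduced axis set to 1.
    let mut expected = input_shape.to_vec();
    expected[axis] = 1;
    if output_shape != expected.as_slice() {
        return Err(ShapeCheckError::OutputShapeMismatch {
            expected,
            found: output_shape.to_vec(),
        });
    }
    Ok(())
}
```

// With an input of shape `[2, 2]` and `axis = 0`, only an output of shape `[1, 2]`
// passes both checks in this sketch.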