use Tensor;
use crateBackend;
/// Calculates the pairwise Euclidean distances for a given 2D tensor.
///
/// # Arguments
/// * `x` - A 2D tensor of shape `(n_samples, n_features)`, where each row is a sample and each column is a feature.
///
/// # Returns
/// A 1D tensor of shape `(n_samples,)` containing the pairwise distances (upper triangular part of the distance matrix).
///
/// The pairwise Euclidean distances are computed with broadcasting: the samples are subtracted
/// from each other, the differences are squared, and the squares are summed across the features.
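// Illustrative sketch only, not the crate's implementation: the subtract,
// square, and sum steps described above, written against plain `Vec<f64>` rows
// instead of the tensor type. The name `pairwise_euclidean_sketch` and the
// choice to return the full n x n matrix are assumptions made for clarity.

```rust
/// Hypothetical helper: full `n x n` matrix of Euclidean distances between rows.
/// Assumes every row has the same number of features.
fn pairwise_euclidean_sketch(x: &[Vec<f64>]) -> Vec<Vec<f64>> {
    x.iter()
        .map(|a| {
            x.iter()
                .map(|b| {
                    // Subtract feature-wise, square, sum over features, then take the root.
                    a.iter()
                        .zip(b)
                        .map(|(p, q)| (p - q) * (p - q))
                        .sum::<f64>()
                        .sqrt()
                })
                .collect::<Vec<f64>>()
        })
        .collect()
}
```

// The matrix is symmetric with a zero diagonal, so the entries above the
// diagonal carry all of the information.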
/// Computes the sum of the K smallest pairwise squared Euclidean distances for each sample in the input tensor.
///
/// This function calculates the squared Euclidean distances between all pairs of samples in the input tensor `x` using a method
/// that avoids creating a full 3D tensor of pairwise differences. It then returns the sum of the K smallest distances for each sample.
///
/// # Parameters
/// - `x`: A 2D tensor of shape `(n_samples, n_features)` representing the dataset, where each row is a sample and each column is a feature.
/// - `k`: The number of nearest neighbors to consider when computing the sum of distances.
///
/// # Returns
/// - A 1D tensor of shape `(n_samples,)` containing, for each sample, the sum of the squared Euclidean distances
/// to its K nearest neighbors. The distances are computed with broadcasting to avoid creating large intermediate tensors.
///
/// # Example
/// ```rust
/// let x = Tensor::from([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]);
/// let k = 2;
/// let result = euclidean_knn(x, k);
/// println!("{:?}", result); // Output: sum of squared distances for each sample to its 2 nearest neighbors
/// ```
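// Illustrative sketch only: the "sum of the K smallest squared distances"
// idea on plain `Vec<f64>` rows. The name `knn_sq_dist_sum_sketch` is
// hypothetical, and the sketch ASSUMES a sample's zero distance to itself is
// excluded; the documented function may treat the self-distance differently.

```rust
/// Hypothetical sketch: for each row, sum the `k` smallest squared Euclidean
/// distances to the other rows (self-distance excluded by assumption).
fn knn_sq_dist_sum_sketch(x: &[Vec<f64>], k: usize) -> Vec<f64> {
    (0..x.len())
        .map(|i| {
            // Squared distances from row i to every other row.
            let mut d: Vec<f64> = (0..x.len())
                .filter(|&j| j != i)
                .map(|j| {
                    x[i].iter()
                        .zip(&x[j])
                        .map(|(a, b)| (a - b) * (a - b))
                        .sum::<f64>()
                })
                .collect();
            // Sort ascending; `total_cmp` gives a total order over f64.
            d.sort_by(|a, b| a.total_cmp(b));
            // If fewer than k neighbors exist, all of them are summed.
            d.iter().take(k).sum::<f64>()
        })
        .collect()
}
```

// For the rows [1, 2], [3, 4], [5, 6] the squared distances are 8, 32, and 8,
// so with k = 2 the sketch returns [40, 16, 40].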
/// Compute the L1 (Manhattan) norm of each row in a 2-D tensor.
///
/// For each sample row `x[i]`, returns `∑_j |x[i, j]|`.
///
/// # Arguments
///
/// * `tensor` — A 2-D tensor of shape `[n_samples, n_features]`.
///
/// # Returns
///
/// A 1-D tensor of shape `[n_samples]` where element `i` is the sum of
/// absolute feature values for sample `i`.
///
/// # Note
///
/// This computes the L1 norm of each row, *not* pairwise Manhattan distances
/// between rows. It is used as a distance-from-origin proxy when the
/// `Manhattan` metric is selected.
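// Illustrative sketch only: the per-row L1 norm described above, on plain
// `Vec<f64>` rows rather than the tensor type. The name `row_l1_norms_sketch`
// is hypothetical.

```rust
/// Hypothetical sketch: element `i` of the result is the sum of absolute
/// feature values of row `i`.
fn row_l1_norms_sketch(x: &[Vec<f64>]) -> Vec<f64> {
    x.iter()
        .map(|row| row.iter().map(|v| v.abs()).sum::<f64>())
        .collect()
}
```

// As the note says, this measures each row against the origin; no pair of
// rows is ever compared with another.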
/// Computes the cosine similarity between each row of a 2D tensor and the first row.
///
/// This function calculates the cosine similarity between each sample (row) in the input tensor
/// and the first sample (first row). The cosine similarity is defined as the dot product of two
/// vectors divided by the product of their magnitudes (L2 norms). The result is a 1D tensor where
/// each element represents the cosine similarity between the corresponding row and the first row.
///
/// # Arguments
/// * `tensor` - A 2D tensor of shape `(n_samples, n_features)` representing the data. The function
/// computes cosine similarity between each row (sample) and the first row.
///
/// # Returns
/// A 1D tensor of shape `(n_samples,)` containing the cosine similarity between the first row and
/// every row of the input tensor, including the first row itself. The values are in the range [-1, 1], where 1 indicates
/// identical orientation, 0 indicates orthogonality, and -1 indicates opposite orientation.
///
/// # Example
/// ```
/// let tensor = Tensor::from([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]);
/// let similarities = cosine(tensor);
/// // `similarities` is a 1D tensor of cosine similarities between the first row and every row
/// ```
///
/// # Notes
/// The function uses the following steps:
/// 1. Computes the L2 norm (magnitude) of the first row.
/// 2. Computes the dot product of each row with the first row.
/// 3. Computes the L2 norm of each row.
/// 4. Divides the dot product by the product of the norms to compute cosine similarity.
///
/// # Performance
/// This function clones the tensor multiple times, which may impact performance for large tensors.
/// Optimizations could be made to minimize memory allocations and cloning.
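// Illustrative sketch only: the four steps listed in the notes above, on
// plain `Vec<f64>` rows. The name `cosine_to_first_row_sketch` is
// hypothetical. It ASSUMES an empty input should yield an empty result, and a
// zero-norm row produces NaN because no epsilon guard is applied.

```rust
/// Hypothetical sketch: cosine similarity of every row with the first row.
fn cosine_to_first_row_sketch(x: &[Vec<f64>]) -> Vec<f64> {
    // Local dot product; nested so it cannot clash with other items.
    fn dot(a: &[f64], b: &[f64]) -> f64 {
        a.iter().zip(b).map(|(p, q)| p * q).sum()
    }

    let first = match x.first() {
        Some(row) => row,
        None => return Vec::new(),
    };
    // Step 1: L2 norm of the reference (first) row.
    let first_norm = dot(first, first).sqrt();

    x.iter()
        .map(|row| {
            // Step 2: dot product with the first row.
            let d = dot(row, first);
            // Step 3: L2 norm of this row.
            let norm = dot(row, row).sqrt();
            // Step 4: divide by the product of the two norms.
            d / (norm * first_norm)
        })
        .collect()
}
```

// Borrowing slices means nothing is cloned here, which is one way to
// sidestep the allocation concern raised in the performance note.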
/// Computes the Minkowski distance between each row of a tensor and the first row.
///
/// The Minkowski distance is a generalized distance metric defined as:
///
/// D(x, y) = (sum_i |x_i - y_i|^p)^(1/p)
///
/// Where `x` and `y` are vectors (rows), and `p` is the order of the distance. When `p = 1`,
/// this becomes the **Manhattan distance**, and when `p = 2`, it becomes the **Euclidean distance**.
///
/// This function calculates the Minkowski distance between each row of the input tensor and the
/// first row of the tensor. It returns a 1D tensor containing the computed distances for each row.
///
/// # Arguments
/// * `tensor` - A 2D tensor of shape `(n_samples, n_features)`, where each row represents a sample.
/// * `p` - A scalar value representing the order of the Minkowski distance. `p` must be a positive number.
///
/// # Returns
/// A 1D tensor of shape `(n_samples,)` where each element is the Minkowski distance between the
/// corresponding row of the input tensor and the first row.
///
/// # Example
/// ```
/// let tensor = Tensor::from([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]);
/// let distances = minkowski(tensor, 2.0);
/// // `distances` will contain the Euclidean distances between each row and the first row.
/// ```
///
/// # Notes
/// - The first row of the tensor is used as the reference row to compute distances.
/// - The function supports any positive value of `p`. For `p = 1`, it computes Manhattan distance,
/// and for `p = 2`, it computes Euclidean distance.
/// - The function works element-wise along rows and sums over features (columns) to compute the distance.
///
/// # Performance
/// The function clones the tensor to avoid modifying the original data. For large tensors, this may
/// incur some overhead due to memory allocation. You may want to explore optimization techniques like
/// in-place operations if memory usage is a concern.
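// Illustrative sketch only: the formula D(x, y) = (sum_i |x_i - y_i|^p)^(1/p)
// applied between every row and the first row, on plain `Vec<f64>` rows. The
// name `minkowski_to_first_row_sketch` is hypothetical, and the sketch
// ASSUMES `p > 0` without validating it.

```rust
/// Hypothetical sketch: Minkowski distance of order `p` from every row to the
/// first row. An empty input yields an empty result.
fn minkowski_to_first_row_sketch(x: &[Vec<f64>], p: f64) -> Vec<f64> {
    let first = match x.first() {
        Some(row) => row,
        None => return Vec::new(),
    };
    x.iter()
        .map(|row| {
            row.iter()
                .zip(first)
                // |x_i - y_i|^p for each feature.
                .map(|(a, b)| (a - b).abs().powf(p))
                // Sum over features, then take the p-th root.
                .sum::<f64>()
                .powf(1.0 / p)
        })
        .collect()
}
```

// With `p = 1.0` this reduces to the Manhattan distance and with `p = 2.0` to
// the Euclidean distance; the first entry is always 0 because the reference
// row is compared with itself.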