wsp/lib.rs
1//!
2//! A rust implementation of the WSP space filling algorithm.
3//!
4//! This is based on the paper from J. Santiago _et al_:
5//! ```raw
6//! [1] Santiago, J., Claeys-Bruno, M., & Sergent, M. (2012). Construction of space-filling designs using WSP algorithm for high dimensional spaces. Chemometrics and Intelligent Laboratory Systems, 113, 26-31.
7//! ```
8//!
9//! ## Usage
10//!
11//! Add the following line to the `Cargo.toml` file:
12//! ```toml
13//! [dependencies]
14//! wsp = "0.1.6"
15//! ```
16//! ## Use cases
17//!
18//! A space-filling algorithm enables to remove points to close to each other in a given space. Given a minimal distance `d_min` and an initial set of points, wsp builds a subset where all remaining points are at least `d_min` distant from each other.
19//!
20//! wsp also provides an alternative version of the classical WSP algorithm. Certain scenarios do not have any clue about the `d_min` to choose, but require a given number of remaining points in the subset. `adaptive_wsp` search the best `d_min` to create a subset of a target number of points.
21//!
22//! ## Example
23//!
24//! ### WSP
25//!
26//! The following example generates an initial set of 1000 points from a uniform random distribution, in a 20-dimensions space. The generation uses the seed 51. The minimal distance is arbitrarily set to 3.0.
27//!
28//! ```rust
29//! // Generates the initial set
30//! let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
31//!
32//! // Only keep distant enough points
33//! let d_min = 3.0;
34//! wsp::wsp(&mut points, d_min);
35//!
36//! // Iterate over the remaining points
37//! for valid_point in points.get_remaining() {
38//! println!("{:?}", valid_point);
39//! }
40//! ```
41//!
42//! ### Adaptive WSP
43//!
44//! The next example uses the `adaptive_wsp` function with verbose mode. The initial set is similar to the previous example thanks to the seed. We aim to find a minimal distance such that the resulting set only contains 100 points.
45//!
46//! ```rust
47//! // Generates the initial set
48//! let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
49//!
50//! // Adaptive WSP makes a binary search to reach the target
51//! // number of remaining points
52//! let objective_nb: usize = 100;
53//! wsp::adaptive_wsp(&mut points, objective_nb, false);
54//!
55//! // Save the result in a CSV file
56//! if let Err(err) = points.save_in_csv("wsp.csv", false) {
57//! println!("Error writing in CSV: {}", err);
58//! std::process::exit(1);
59//! }
60//! ```
61//!
62//! ## Binary usage
63
64//! Use `cargo install wsp` to install a binary version of the rust-wsp crate. Both `wsp()` and `adaptive_wsp` are available through the command line.
65//!
66//! ### Classic WSP
67//!
68//! You may run the classic WSP version with
69//!
70//! ```bash
71//! wsp -n 1000 -m 20 -d 0.5
72//! ```
73//!
74//! This will run the WSP algorithm with 1000 initial points. Each point has a dimension of 10. The minimal distance between each point, as detailed in the paper, is set to 0.5. For now, the algorithm uses the l1 (Manhattan) distance, as it provides better separation in high dimensional space than the l2 (euclidian) distance. The result is stored in a file named `wsp.csv`. Each row represents a point in a 20 dimensions space that is far enough from its other neighbours. You may change the output file with the `-o` option.
75//!
76//! ### Adaptive WSP
77//!
78//! ```bash
79//! $ wsp -n 1000 -m 20 --adaptive 100
80//! ```
81//!
82//! Similarly to the previous command, the initial space if filled with 1000 points of 20 dimensions. However, now, the user does not need to specify the minimal distance between points, but instead the _desired_ number of points in the resulting set. The algorithm will perform a binary search on the minimal distance until 1) the resulting set contains the desired number of points or 2) there is not distance that can be found to reach this quantity. In the second scenario, rust-wsp uses the minimal distance resulting in a space where the number of points is _as close as possible_ to the desired value.
83//!
84//! Consider the example below, where we want 200 points in a space of 20 dimensions, initially filled with 1000 points:
85//! ```bash
86//! $ wsp -n 1000 -m 20 --adaptive 200 -v
87//! Iter #1: distance=7.430897367982144, nb_active=5
88//! Iter #2: distance=5.028120018334874, nb_active=98
89//! Iter #3: distance=3.826731343511239, nb_active=591
90//! Iter #4: distance=4.427425680923057, nb_active=270
91//! Iter #5: distance=4.727772849628966, nb_active=168
92//! Iter #6: distance=4.577599265276012, nb_active=209
93//! Iter #7: distance=4.652686057452489, nb_active=170
94//! Iter #8: distance=4.615142661364251, nb_active=195
95//! Iter #9: distance=4.5963709633201315, nb_active=207
96//! Iter #10: distance=4.605756812342191, nb_active=194
97//! Iter #11: distance=4.601063887831161, nb_active=191
98//! Iter #12: distance=4.5987174255756464, nb_active=206
99//! Iter #13: distance=4.599890656703404, nb_active=206
100//! Iter #14: distance=4.600477272267282, nb_active=201
101//! Iter #15: distance=4.600770580049222, nb_active=194
102//! Iter #16: distance=4.600623926158252, nb_active=194
103//! Iter #17: distance=4.600550599212767, nb_active=197
104//! [...]
105//! Iter #53: distance=4.60049404718639, nb_active=201
106//! Iter #54: distance=4.600494047186391, nb_active=197
107//! Last iter: best approximation is distance=4.600477272267282, nb_active=201
108//! Nb active: 201
109//! ```
110//!
111//! The algorithm performs 54 iterations until the minimal distance search space is completely explored. It will recompute the space (if needed) qith the minimal distance resulting in the best approximation of the target number of active points in the set. Here, it is 201, with an error of 1 compared to the objective. The resulting matrix is also stored in a file named `wsp.csv` by default.
112//!
113//! ### More help
114//!
115//! Run `wsp -h` or `wsp --help` for more information about the arguments.
116
117use rand::rngs::SmallRng;
118use rand::{Rng, SeedableRng};
119use serde::Serialize;
120use std::cmp::Ordering;
121use std::error::Error;
122
123#[derive(Debug, Serialize)]
124struct Record {
125 point: Vec<f64>,
126}
127
128/// Internal representation of the WSP algorithm values.
129/// It is needed for the computation and to store information about the resulting point set.
130pub struct PointSet {
131 /// Points of the initial set
132 pub points: Vec<Vec<f64>>,
133 /// All ditances between all points
134 pub distance_matrix: Vec<Vec<f64>>,
135 /// If true, the point is still in the set. Otherwise, the point is considered as removed of the point set.
136 /// The user MUST only consider points with 'true' values as the only points in the resulting set
137 pub active: Vec<bool>,
138 /// Number of active points in the set
139 pub nb_active: usize,
140 /// For each point, the idx sorted increasingly with distance
141 /// to improve performance
142 idx_sort: Vec<Vec<usize>>,
143 /// For each point, the idx in the idx_sort of the closest active point
144 idx_active: Vec<usize>,
145 /// Visited point to avoid looping over the same point several times => ensures that we clear all the space
146 visited: Vec<bool>,
147 /// Minimal distance between points in the point set
148 d_min: f64,
149 /// Maximal distance between points in the point set
150 d_max: f64,
151}
152
153impl PointSet {
154 /// Creates a 'PointSet' from an already initialised vector of points.
155 ///
156 /// # Arguments
157 ///
158 /// * `points` - The pre-initialised set of points.
159 ///
160 /// # Example
161 ///
162 /// ```
163 /// let points: Vec<Vec<f64>> = vec![vec![1.0, 0.0, 1.0], vec![0.5, 0.5, 0.5]];
164 /// let poinset = wsp::PointSet::init_from_preset(points); // Give ownership
165 /// ```
166 pub fn init_from_preset(points: Vec<Vec<f64>>) -> PointSet {
167 // First compute the distance matrix, then move "points" to the
168 // output structure
169 let (distance_matrix, d_min, d_max) = PointSet::compute_distance_matrix(&points, None);
170
171 let mut p = PointSet {
172 distance_matrix,
173 active: vec![true; points.len()],
174 nb_active: points.len(),
175 idx_sort: Vec::with_capacity(points.len()),
176 // Start at 1 because closest is itself
177 idx_active: vec![1; points.len()],
178 visited: vec![false; points.len()],
179 points,
180 d_max,
181 d_min,
182 };
183 p.compute_closest_idx();
184 p
185 }
186
187 /// Creates a 'PointSet' using a random initialisation of the points following a uniform distribution.
188 ///
189 /// # Arguments
190 ///
191 /// * `nb_points` - The number of points in the set before running WSP.
192 /// * `nb_dim` - The dimension of the points.
193 /// * `seed` - The seed used for the uniform sampling of the coordinates of the points.
194 ///
195 /// # Example
196 ///
197 /// The following code snippet creates a PointSet with 100 points of dimension 10. The seed 51 is used for the
198 /// generation of the points.
199 /// ```
200 /// let poinset = wsp::PointSet::init_from_random(100, 10, 51); // Give ownership
201 /// ```
202 pub fn init_from_random(nb_points: usize, nb_dim: usize, seed: u64) -> PointSet {
203 let mut points: Vec<Vec<f64>> = Vec::with_capacity(nb_points);
204
205 let mut rng = SmallRng::seed_from_u64(seed);
206
207 // Generate random points
208 for _ in 0..nb_points {
209 let mut point: Vec<f64> = Vec::with_capacity(nb_dim);
210 for _ in 0..nb_dim {
211 point.push(rng.gen::<f64>());
212 }
213 points.push(point);
214 }
215
216 PointSet::init_from_preset(points)
217 }
218
219 fn reset_reseach_params(&mut self) {
220 self.nb_active = self.points.len();
221 self.active = vec![true; self.nb_active];
222 self.idx_active = vec![1; self.nb_active];
223 self.visited = vec![false; self.nb_active];
224 }
225
226 fn compute_closest_idx(&mut self) {
227 for i in 0..self.nb_active {
228 let mut idxs: Vec<usize> = (0..self.nb_active).collect();
229 idxs.sort_by(|&a, &b| {
230 self.distance_matrix[i][a]
231 .partial_cmp(&self.distance_matrix[i][b])
232 .unwrap()
233 });
234 self.idx_sort.push(idxs);
235 }
236 }
237
238 fn compute_distance_matrix(
239 points: &[Vec<f64>],
240 distance_algo: Option<&dyn Fn(&[f64], &[f64]) -> f64>,
241 ) -> (Vec<Vec<f64>>, f64, f64) {
242 let nb_points = points.len();
243 let mut distance_matrix = vec![vec![0.0f64; nb_points]; nb_points];
244 let mut dmin: f64 = f64::MAX;
245 let mut dmax: f64 = 0.0;
246 for i in 0..nb_points {
247 for j in i + 1..nb_points {
248 distance_matrix[i][j] = match distance_algo {
249 Some(algo) => algo(&points[i], &points[j]),
250 None => manhattan_distance(&points[i], &points[j]),
251 };
252
253 distance_matrix[j][i] = distance_matrix[i][j]; // Primitive type copy
254 dmin = dmin.min(distance_matrix[i][j]);
255 dmax = dmax.max(distance_matrix[i][j]);
256 }
257 }
258 (distance_matrix, dmin, dmax)
259 }
260
261 /// Stores a PointSet in a CSV file. This will store in a matrix form the active points in the PointSet.
262 /// Each row represents an active point, and each column a dimension in the space.
263 ///
264 /// # Arguments
265 ///
266 /// * `filepath` - The path to the file where to store the PointSet points.
267 /// * `transpose` - Transpose the matrix in the output CSV
268 ///
269 /// # Example
270 ///
271 /// The following code snippet stores the PointSet *before* running WSP
272 /// ```
273 /// let points = wsp::PointSet::init_from_random(100, 10, 51);
274 ///
275 /// if let Err(err) = points.save_in_csv("wsp.csv", false) {
276 /// eprintln!("Error writing in CSV: {}", err);
277 /// std::process::exit(1);
278 /// }
279 /// ```
280 pub fn save_in_csv(&self, filepath: &str, transpose: bool) -> Result<(), Box<dyn Error>> {
281 let mut wrt = csv::WriterBuilder::new()
282 .has_headers(false)
283 .from_path(filepath)?;
284
285 // Use star notation just to show that we understand it
286 let points = if transpose {
287 let mut transposed: Vec<Vec<f64>> =
288 vec![Vec::with_capacity(self.nb_active); self.points[0].len()];
289 for (i, point) in (*self.points).iter().enumerate() {
290 if self.active[i] {
291 for i in 0..point.len() {
292 transposed[i].push(point[i]);
293 }
294 }
295 }
296 transposed
297 } else {
298 self.points
299 .iter()
300 .enumerate()
301 .filter(|(i, _)| self.active[*i])
302 .to_owned()
303 .unzip::<usize, &Vec<f64>, Vec<usize>, Vec<&Vec<f64>>>()
304 .1
305 .iter()
306 .map(|x| x.to_owned().to_owned())
307 .collect()
308 };
309 for point in points.iter() {
310 wrt.serialize(Record {
311 point: point.clone(),
312 })?;
313 }
314 Ok(())
315 }
316
317 /// Returns a new vector containing only the active points of the PointSet.
318 ///
319 /// # Example
320 ///
321 /// The following code snippet stores the PointSet *before* running WSP
322 /// ```
323 /// // Generates the initial set
324 /// let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
325 ///
326 /// // Only keep distant enough points
327 /// let d_min = 3.0;
328 /// wsp::wsp(&mut points, d_min);
329 ///
330 /// // Iterate over the remaining points
331 /// for valid_point in points.get_remaining() {
332 /// println!("{:?}", valid_point);
333 /// }
334 /// ```
335 pub fn get_remaining(&self) -> Vec<Vec<f64>> {
336 let mut points: Vec<Vec<f64>> = Vec::with_capacity(self.nb_active);
337 for i in 0..self.points.len() {
338 if self.active[i] {
339 points.push(self.points[i].clone());
340 }
341 }
342 points
343 }
344}
345
346fn _distance_sq(p1: &[f64], p2: &[f64]) -> f64 {
347 let mut dist: f64 = 0.0;
348 for i in 0..p1.len() {
349 dist += (p1[i] - p2[i]) * (p1[i] - p2[i]);
350 }
351 dist
352}
353
354fn manhattan_distance(p1: &[f64], p2: &[f64]) -> f64 {
355 p1.iter()
356 .zip(p2.iter())
357 .fold(0.0, |dist, (d1, d2)| dist + (d1 - d2).abs())
358}
359
360fn wsp_loop_fast(set: &mut PointSet, d_min: f64, mut origin: usize) {
361 loop {
362 let idxs_this_origin = &mut set.idx_sort[origin];
363
364 // Iterate over all "active" points closest to the current origin
365 // We may iterate over inactive points due to previous loop
366 // We stop iterating once we find the next closest point
367 // that is 1) active and 2) at a higher distance than *d_min*
368 let mut closest_origin = set.idx_active[origin];
369 set.visited[origin] = true;
370 loop {
371 if closest_origin >= set.points.len() {
372 return;
373 }
374 let point_idx = idxs_this_origin[closest_origin];
375 if !set.active[point_idx] {
376 // Not active point
377 closest_origin += 1;
378 continue;
379 } else if set.distance_matrix[origin][point_idx] < d_min {
380 // Point too close to the origin => kill
381 set.active[point_idx] = false;
382 set.nb_active -= 1;
383 closest_origin += 1;
384 } else if set.visited[point_idx] {
385 closest_origin += 1;
386 } else {
387 // Closest active point remaining is far enough from the origin
388 // Stop the loop and this point is the next origin
389 // Update the closest_origin of the current origin just in case
390 set.idx_active[origin] = closest_origin;
391 origin = idxs_this_origin[closest_origin];
392 break; // Further points will always be at a higher distance
393 }
394 }
395 }
396}
397/// Returns a new vector containing only the active points of the PointSet.
398///
399/// # Example
400///
401/// The following code snippet stores the PointSet *before* running WSP
402/// ```
403/// // Generates the initial set
404/// let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
405///
406/// // Only keep distant enough points
407/// let d_min = 3.0;
408/// wsp::wsp(&mut points, d_min);
409///
410/// // Iterate over the remaining points
411/// for valid_point in points.get_remaining() {
412/// println!("{:?}", valid_point);
413/// }
414/// ```
415/// Executes the WSP space filling algorithm according to the paper.
416/// (Pseudo-)randomly chooses an origin, and removes all points too close to it
417/// according to the d_min value of the PointSet structure.
418/// Then, the new origin is the closest valid point from the old origin.
419/// The algorithm iterates like this until all points have been visited or removed.
420///
421/// # Arguments
422///
423/// * `set` - The PointSet instance. `set` is mutably borrowed.
424/// * `d_min` - The desired minimal distance between all remaining points in the PointSet.
425///
426/// # Example
427///
428/// ```
429/// let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
430/// let d_min = 3.0;
431/// wsp::wsp(&mut points, d_min);
432/// ```
433pub fn wsp(set: &mut PointSet, d_min: f64) {
434 // Step 3: chose random point
435 let mut rng = SmallRng::seed_from_u64(10);
436 let origin: usize = rng.gen::<usize>() % set.points.len();
437
438 // Step 4, 5, 6: call specific algorithm for speed
439 wsp_loop_fast(set, d_min, origin);
440}
441
442/// This is an adaptive version of the WSP algorithm.
443/// The traditional algorithm requires a d_min and
444/// based on that we obtain a set of a given number of points.
445/// Here we adaptively change d_min to get (an approximation of)
446/// the desired number of points active after the algorithm.
447///
448/// # Arguments
449///
450/// * `set` - The PointSet instance. `set` is mutably borrowed.
451/// * `obj_nb` - The desired number of points remaining active in the set after the algorithm.
452/// * `verbose` - Print running information about the iterations of the algorithm.
453///
454/// # Example
455///
456/// ```
457/// let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
458/// let objective_nb: usize = 100;
459/// wsp::adaptive_wsp(&mut points, objective_nb, false);
460/// ```
461pub fn adaptive_wsp(set: &mut PointSet, obj_nb: usize, verbose: bool) {
462 let mut d_min = set.d_min;
463 let mut d_max = set.d_max;
464 let mut d_search = (d_min + d_max) / 2.0;
465 let mut iter = 0;
466 let mut best_distance = 0.0;
467 let mut best_difference_active = set.nb_active - obj_nb;
468 loop {
469 iter += 1;
470 wsp(set, d_search);
471
472 // Binary search the best d_min
473 if verbose {
474 println!(
475 "Iter #{}: distance={}, nb_active={}",
476 iter, d_search, set.nb_active
477 );
478 }
479 match set.nb_active.cmp(&obj_nb) {
480 Ordering::Greater => d_min = d_search,
481 Ordering::Less => d_max = d_search,
482 Ordering::Equal => return,
483 };
484
485 // The search space is not continuous.
486 // We must also track the best result to recover it afterwards
487 if (set.nb_active as i32 - obj_nb as i32).abs() < best_difference_active as i32 {
488 best_difference_active = (set.nb_active as i32 - obj_nb as i32).abs() as usize;
489 best_distance = d_search;
490 }
491
492 // Stop condition if we cannot exactly reach the target number
493 let last_d_search = d_search;
494 d_search = (d_min + d_max) / 2.0;
495 if (last_d_search - d_search).abs() <= f64::EPSILON {
496 break;
497 }
498
499 // Reset parameters for the next iteration
500 set.reset_reseach_params();
501 }
502
503 // Recompute a last time if the best distance is not the last computed distance
504 if (best_distance - d_search).abs() > f64::EPSILON {
505 d_search = best_distance;
506 set.reset_reseach_params();
507 wsp(set, d_search);
508 }
509 if verbose {
510 println!(
511 "Last iter: best approximation is distance={}, nb_active={}",
512 d_search, set.nb_active
513 );
514 }
515}
516
517#[cfg(test)]
518mod tests {
519 use super::*;
520 #[test]
521 fn test_distance_sq() {
522 let mut p1: Vec<f64> = vec![1.0, 0.0];
523 let mut p2 = vec![0.0, 0.0];
524 assert_eq!(_distance_sq(&p1, &p2), 1.0);
525
526 p1 = vec![2.0, 2.0];
527 p2 = vec![2.0, 9.0];
528 assert_eq!(_distance_sq(&p1, &p2), 49.0);
529 }
530
531 #[test]
532 fn test_manhattan_distance() {
533 let p1 = vec![0.0, 0.0, 0.0];
534 let p2 = vec![0.5, 0.5, 1.0];
535 let p3 = vec![1.0, 0.0, 0.5];
536 assert_eq!(manhattan_distance(&p1, &p2), 2.0);
537 assert_eq!(manhattan_distance(&p1, &p3), 1.5);
538 assert_eq!(manhattan_distance(&p2, &p3), 1.5);
539 assert_eq!(manhattan_distance(&p1, &p1), 0.0);
540 }
541
542 #[test]
543 fn test_distance_matrix() {
544 let p1 = vec![0.0, 0.0];
545 let p2 = vec![4.0, 0.0];
546 let p3 = vec![4.0, 3.0];
547 let (distance_matrix, d_min, d_max) =
548 PointSet::compute_distance_matrix(&vec![p1, p2, p3], Some(&_distance_sq));
549
550 let true_distance = vec![
551 vec![0.0, 16.0, 25.0],
552 vec![16.0, 0.0, 9.0],
553 vec![25.0, 9.0, 0.0],
554 ];
555
556 for i in 0..3 {
557 for j in 0..3 {
558 assert_eq!(distance_matrix[i][j], true_distance[i][j]);
559 }
560 }
561
562 assert_eq!(d_min, 9.0);
563 assert_eq!(d_max, 25.0);
564 }
565
566 #[test]
567 fn test_closest_idx() {
568 let p1 = vec![0.0, 0.0];
569 let p2 = vec![1.0, 0.1];
570 let p3 = vec![1.0, 1.0];
571 let p4 = vec![2.0, 1.0];
572 let pointset = PointSet::init_from_preset(vec![p1, p2, p3, p4]);
573
574 let true_idxs = vec![
575 vec![0, 1, 2, 3],
576 vec![1, 2, 0, 3],
577 vec![2, 1, 3, 0],
578 vec![3, 2, 1, 0],
579 ];
580
581 for i in 0..4 {
582 for j in 0..4 {
583 assert_eq!(pointset.idx_sort[i][j], true_idxs[i][j]);
584 }
585 }
586 }
587
588 #[test]
589 fn test_iterative_fast_1() {
590 let p1 = vec![0.0, 0.0];
591 let p2 = vec![1.0, 0.1];
592 let p3 = vec![1.0, 1.0];
593 let p4 = vec![2.0, 1.0];
594 let mut pointset = PointSet::init_from_preset(vec![p1, p2, p3, p4]);
595
596 wsp_loop_fast(&mut pointset, 1.0, 1);
597
598 // The expected behaviour is
599 // 1) * p3 too close => becomes inactive
600 // * p1 far enough => becomes new origin
601 // 2) * p1 far from p2 => p2 becomes origin
602 // * no change in the set => stop iteration
603 assert_eq!(pointset.active[0], true);
604 assert_eq!(pointset.active[1], true);
605 assert_eq!(pointset.active[2], false);
606 assert_eq!(pointset.active[3], true);
607
608 assert_eq!(pointset.nb_active, 3);
609 }
610
611 #[test]
612 fn test_all_points_visited() {
613 let d_min: f64 = 0.04;
614 let mut points = PointSet::init_from_random(1000, 3, 51);
615 wsp(&mut points, d_min);
616
617 // All points are either visited or inactive
618 for i in 0..1000 {
619 assert!(points.visited[i] || !points.active[i]);
620 }
621 }
622
623 #[test]
624 fn test_min_dist_ok() {
625 let d_min: f64 = 0.04;
626 let mut points = PointSet::init_from_random(1000, 3, 51);
627 wsp(&mut points, d_min);
628
629 // All active points have a distance higher or equal to d_min
630 for i in 0..999 {
631 if !points.active[i] {
632 continue;
633 }
634 for j in i + 1..1000 {
635 if !points.active[j] {
636 continue;
637 }
638 assert!(points.distance_matrix[i][j] >= d_min);
639 }
640 }
641 }
642}