wsp/
lib.rs

1//!
2//! A rust implementation of the WSP space filling algorithm.
3//!
4//! This is based on the paper from J. Santiago _et al_:
5//! ```raw
6//! [1] Santiago, J., Claeys-Bruno, M., & Sergent, M. (2012). Construction of space-filling designs using WSP algorithm for high dimensional spaces. Chemometrics and Intelligent Laboratory Systems, 113, 26-31.
7//! ```
8//!
9//! ## Usage
10//!
11//! Add the following line to the `Cargo.toml` file:
12//! ```toml
13//! [dependencies]
14//! wsp = "0.1.6"
15//! ```
16//! ## Use cases
17//!
18//! A space-filling algorithm enables to remove points to close to each other in a given space. Given a minimal distance `d_min` and an initial set of points, wsp builds a subset where all remaining points are at least `d_min` distant from each other.
19//!
20//! wsp also provides an alternative version of the classical WSP algorithm. Certain scenarios do not have any clue about the `d_min` to choose, but require a given number of remaining points in the subset. `adaptive_wsp` search the best `d_min` to create a subset of a target number of points.
21//!
22//! ## Example
23//!
24//! ### WSP
25//!
26//! The following example generates an initial set of 1000 points from a uniform random distribution, in a 20-dimensions space. The generation uses the seed 51. The minimal distance is arbitrarily set to 3.0.
27//!
28//! ```rust
29//! // Generates the initial set
30//! let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
31//!
32//! // Only keep distant enough points
33//! let d_min = 3.0;
34//! wsp::wsp(&mut points, d_min);
35//!
36//! // Iterate over the remaining points
37//! for valid_point in points.get_remaining() {
38//!     println!("{:?}", valid_point);
39//! }
40//! ```
41//!
42//! ### Adaptive WSP
43//!
44//! The next example uses the `adaptive_wsp` function with verbose mode. The initial set is similar to the previous example thanks to the seed. We aim to find a minimal distance such that the resulting set only contains 100 points.
45//!
46//! ```rust
47//! // Generates the initial set
48//! let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
49//!
50//! // Adaptive WSP makes a binary search to reach the target
51//! // number of remaining points
52//! let objective_nb: usize = 100;
53//! wsp::adaptive_wsp(&mut points, objective_nb, false);
54//!
55//! // Save the result in a CSV file
56//! if let Err(err) = points.save_in_csv("wsp.csv", false) {
57//!     println!("Error writing in CSV: {}", err);
58//!     std::process::exit(1);
59//! }
60//! ```
61//!
62//! ## Binary usage
63
64//! Use `cargo install wsp` to install a binary version of the rust-wsp crate. Both `wsp()` and `adaptive_wsp` are available through the command line.
65//!
66//! ### Classic WSP
67//!
68//! You may run the classic WSP version with
69//!
70//! ```bash
71//! wsp -n 1000 -m 20 -d 0.5
72//! ```
73//!
74//! This will run the WSP algorithm with 1000 initial points. Each point has a dimension of 10. The minimal distance between each point, as detailed in the paper, is set to 0.5. For now, the algorithm uses the l1 (Manhattan) distance, as it provides better separation in high dimensional space than the l2 (euclidian) distance. The result is stored in a file named `wsp.csv`. Each row represents a point in a 20 dimensions space that is far enough from its other neighbours. You may change the output file with the `-o` option.
75//!
76//! ### Adaptive WSP
77//!
78//! ```bash
79//! $ wsp -n 1000 -m 20 --adaptive 100
80//! ```
81//!
82//! Similarly to the previous command, the initial space if filled with 1000 points of 20 dimensions. However, now, the user does not need to specify the minimal distance between points, but instead the _desired_ number of points in the resulting set. The algorithm will perform a binary search on the minimal distance until 1) the resulting set contains the desired number of points or 2) there is not distance that can be found to reach this quantity. In the second scenario, rust-wsp uses the minimal distance resulting in a space where the number of points is _as close as possible_ to the desired value.
83//!
84//! Consider the example below, where we want 200 points in a space of 20 dimensions, initially filled with 1000 points:
85//! ```bash
86//! $ wsp -n 1000 -m 20 --adaptive 200 -v
87//! Iter #1: distance=7.430897367982144, nb_active=5
88//! Iter #2: distance=5.028120018334874, nb_active=98
89//! Iter #3: distance=3.826731343511239, nb_active=591
90//! Iter #4: distance=4.427425680923057, nb_active=270
91//! Iter #5: distance=4.727772849628966, nb_active=168
92//! Iter #6: distance=4.577599265276012, nb_active=209
93//! Iter #7: distance=4.652686057452489, nb_active=170
94//! Iter #8: distance=4.615142661364251, nb_active=195
95//! Iter #9: distance=4.5963709633201315, nb_active=207
96//! Iter #10: distance=4.605756812342191, nb_active=194
97//! Iter #11: distance=4.601063887831161, nb_active=191
98//! Iter #12: distance=4.5987174255756464, nb_active=206
99//! Iter #13: distance=4.599890656703404, nb_active=206
100//! Iter #14: distance=4.600477272267282, nb_active=201
101//! Iter #15: distance=4.600770580049222, nb_active=194
102//! Iter #16: distance=4.600623926158252, nb_active=194
103//! Iter #17: distance=4.600550599212767, nb_active=197
104//! [...]
105//! Iter #53: distance=4.60049404718639, nb_active=201
106//! Iter #54: distance=4.600494047186391, nb_active=197
107//! Last iter: best approximation is distance=4.600477272267282, nb_active=201
108//! Nb active: 201
109//! ```
110//!
111//! The algorithm performs 54 iterations until the minimal distance search space is completely explored. It will recompute the space (if needed) qith the minimal distance resulting in the best approximation of the target number of active points in the set. Here, it is 201, with an error of 1 compared to the objective. The resulting matrix is also stored in a file named `wsp.csv` by default.
112//!
113//! ### More help
114//!
115//! Run `wsp -h` or `wsp --help` for more information about the arguments.
116
117use rand::rngs::SmallRng;
118use rand::{Rng, SeedableRng};
119use serde::Serialize;
120use std::cmp::Ordering;
121use std::error::Error;
122
123#[derive(Debug, Serialize)]
124struct Record {
125    point: Vec<f64>,
126}
127
128/// Internal representation of the WSP algorithm values.
129/// It is needed for the computation and to store information about the resulting point set.
130pub struct PointSet {
131    /// Points of the initial set
132    pub points: Vec<Vec<f64>>,
133    /// All ditances between all points
134    pub distance_matrix: Vec<Vec<f64>>,
135    /// If true, the point is still in the set. Otherwise, the point is considered as removed of the point set.
136    /// The user MUST only consider points with 'true' values as the only points in the resulting set
137    pub active: Vec<bool>,
138    /// Number of active points in the set
139    pub nb_active: usize,
140    /// For each point, the idx sorted increasingly with distance
141    /// to improve performance
142    idx_sort: Vec<Vec<usize>>,
143    /// For each point, the idx in the idx_sort of the closest active point
144    idx_active: Vec<usize>,
145    /// Visited point to avoid looping over the same point several times => ensures that we clear all the space
146    visited: Vec<bool>,
147    /// Minimal distance between points in the point set
148    d_min: f64,
149    /// Maximal distance between points in the point set
150    d_max: f64,
151}
152
153impl PointSet {
154    /// Creates a 'PointSet' from an already initialised vector of points.
155    ///
156    /// # Arguments
157    ///
158    /// * `points` - The pre-initialised set of points.
159    ///
160    /// # Example
161    ///
162    /// ```
163    /// let points: Vec<Vec<f64>> = vec![vec![1.0, 0.0, 1.0], vec![0.5, 0.5, 0.5]];
164    /// let poinset = wsp::PointSet::init_from_preset(points); // Give ownership
165    /// ```
166    pub fn init_from_preset(points: Vec<Vec<f64>>) -> PointSet {
167        // First compute the distance matrix, then move "points" to the
168        // output structure
169        let (distance_matrix, d_min, d_max) = PointSet::compute_distance_matrix(&points, None);
170
171        let mut p = PointSet {
172            distance_matrix,
173            active: vec![true; points.len()],
174            nb_active: points.len(),
175            idx_sort: Vec::with_capacity(points.len()),
176            // Start at 1 because closest is itself
177            idx_active: vec![1; points.len()],
178            visited: vec![false; points.len()],
179            points,
180            d_max,
181            d_min,
182        };
183        p.compute_closest_idx();
184        p
185    }
186
187    /// Creates a 'PointSet' using a random initialisation of the points following a uniform distribution.
188    ///
189    /// # Arguments
190    ///
191    /// * `nb_points` - The number of points in the set before running WSP.
192    /// * `nb_dim` - The dimension of the points.
193    /// * `seed` - The seed used for the uniform sampling of the coordinates of the points.
194    ///
195    /// # Example
196    ///
197    /// The following code snippet creates a PointSet with 100 points of dimension 10. The seed 51 is used for the
198    /// generation of the points.
199    /// ```
200    /// let poinset = wsp::PointSet::init_from_random(100, 10, 51); // Give ownership
201    /// ```
202    pub fn init_from_random(nb_points: usize, nb_dim: usize, seed: u64) -> PointSet {
203        let mut points: Vec<Vec<f64>> = Vec::with_capacity(nb_points);
204
205        let mut rng = SmallRng::seed_from_u64(seed);
206
207        // Generate random points
208        for _ in 0..nb_points {
209            let mut point: Vec<f64> = Vec::with_capacity(nb_dim);
210            for _ in 0..nb_dim {
211                point.push(rng.gen::<f64>());
212            }
213            points.push(point);
214        }
215
216        PointSet::init_from_preset(points)
217    }
218
219    fn reset_reseach_params(&mut self) {
220        self.nb_active = self.points.len();
221        self.active = vec![true; self.nb_active];
222        self.idx_active = vec![1; self.nb_active];
223        self.visited = vec![false; self.nb_active];
224    }
225
226    fn compute_closest_idx(&mut self) {
227        for i in 0..self.nb_active {
228            let mut idxs: Vec<usize> = (0..self.nb_active).collect();
229            idxs.sort_by(|&a, &b| {
230                self.distance_matrix[i][a]
231                    .partial_cmp(&self.distance_matrix[i][b])
232                    .unwrap()
233            });
234            self.idx_sort.push(idxs);
235        }
236    }
237
238    fn compute_distance_matrix(
239        points: &[Vec<f64>],
240        distance_algo: Option<&dyn Fn(&[f64], &[f64]) -> f64>,
241    ) -> (Vec<Vec<f64>>, f64, f64) {
242        let nb_points = points.len();
243        let mut distance_matrix = vec![vec![0.0f64; nb_points]; nb_points];
244        let mut dmin: f64 = f64::MAX;
245        let mut dmax: f64 = 0.0;
246        for i in 0..nb_points {
247            for j in i + 1..nb_points {
248                distance_matrix[i][j] = match distance_algo {
249                    Some(algo) => algo(&points[i], &points[j]),
250                    None => manhattan_distance(&points[i], &points[j]),
251                };
252
253                distance_matrix[j][i] = distance_matrix[i][j]; // Primitive type copy
254                dmin = dmin.min(distance_matrix[i][j]);
255                dmax = dmax.max(distance_matrix[i][j]);
256            }
257        }
258        (distance_matrix, dmin, dmax)
259    }
260
261    /// Stores a PointSet in a CSV file. This will store in a matrix form the active points in the PointSet.
262    /// Each row represents an active point, and each column a dimension in the space.
263    ///
264    /// # Arguments
265    ///
266    /// * `filepath` - The path to the file where to store the PointSet points.
267    /// * `transpose` - Transpose the matrix in the output CSV
268    ///
269    /// # Example
270    ///
271    /// The following code snippet stores the PointSet *before* running WSP
272    /// ```
273    /// let points = wsp::PointSet::init_from_random(100, 10, 51);
274    ///
275    /// if let Err(err) = points.save_in_csv("wsp.csv", false) {
276    ///     eprintln!("Error writing in CSV: {}", err);
277    ///     std::process::exit(1);
278    /// }
279    /// ```
280    pub fn save_in_csv(&self, filepath: &str, transpose: bool) -> Result<(), Box<dyn Error>> {
281        let mut wrt = csv::WriterBuilder::new()
282            .has_headers(false)
283            .from_path(filepath)?;
284
285        // Use star notation just to show that we understand it
286        let points = if transpose {
287            let mut transposed: Vec<Vec<f64>> =
288                vec![Vec::with_capacity(self.nb_active); self.points[0].len()];
289            for (i, point) in (*self.points).iter().enumerate() {
290                if self.active[i] {
291                    for i in 0..point.len() {
292                        transposed[i].push(point[i]);
293                    }
294                }
295            }
296            transposed
297        } else {
298            self.points
299                .iter()
300                .enumerate()
301                .filter(|(i, _)| self.active[*i])
302                .to_owned()
303                .unzip::<usize, &Vec<f64>, Vec<usize>, Vec<&Vec<f64>>>()
304                .1
305                .iter()
306                .map(|x| x.to_owned().to_owned())
307                .collect()
308        };
309        for point in points.iter() {
310            wrt.serialize(Record {
311                point: point.clone(),
312            })?;
313        }
314        Ok(())
315    }
316
317    /// Returns a new vector containing only the active points of the PointSet.
318    ///
319    /// # Example
320    ///
321    /// The following code snippet stores the PointSet *before* running WSP
322    /// ```
323    /// // Generates the initial set
324    /// let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
325    ///
326    /// // Only keep distant enough points
327    /// let d_min = 3.0;
328    /// wsp::wsp(&mut points, d_min);
329    ///
330    /// // Iterate over the remaining points
331    /// for valid_point in points.get_remaining() {
332    ///     println!("{:?}", valid_point);
333    /// }
334    /// ```
335    pub fn get_remaining(&self) -> Vec<Vec<f64>> {
336        let mut points: Vec<Vec<f64>> = Vec::with_capacity(self.nb_active);
337        for i in 0..self.points.len() {
338            if self.active[i] {
339                points.push(self.points[i].clone());
340            }
341        }
342        points
343    }
344}
345
346fn _distance_sq(p1: &[f64], p2: &[f64]) -> f64 {
347    let mut dist: f64 = 0.0;
348    for i in 0..p1.len() {
349        dist += (p1[i] - p2[i]) * (p1[i] - p2[i]);
350    }
351    dist
352}
353
354fn manhattan_distance(p1: &[f64], p2: &[f64]) -> f64 {
355    p1.iter()
356        .zip(p2.iter())
357        .fold(0.0, |dist, (d1, d2)| dist + (d1 - d2).abs())
358}
359
360fn wsp_loop_fast(set: &mut PointSet, d_min: f64, mut origin: usize) {
361    loop {
362        let idxs_this_origin = &mut set.idx_sort[origin];
363
364        // Iterate over all "active" points closest to the current origin
365        // We may iterate over inactive points due to previous loop
366        // We stop iterating once we find the next closest point
367        // that is 1) active and 2) at a higher distance than *d_min*
368        let mut closest_origin = set.idx_active[origin];
369        set.visited[origin] = true;
370        loop {
371            if closest_origin >= set.points.len() {
372                return;
373            }
374            let point_idx = idxs_this_origin[closest_origin];
375            if !set.active[point_idx] {
376                // Not active point
377                closest_origin += 1;
378                continue;
379            } else if set.distance_matrix[origin][point_idx] < d_min {
380                // Point too close to the origin => kill
381                set.active[point_idx] = false;
382                set.nb_active -= 1;
383                closest_origin += 1;
384            } else if set.visited[point_idx] {
385                closest_origin += 1;
386            } else {
387                // Closest active point remaining is far enough from the origin
388                // Stop the loop and this point is the next origin
389                // Update the closest_origin of the current origin just in case
390                set.idx_active[origin] = closest_origin;
391                origin = idxs_this_origin[closest_origin];
392                break; // Further points will always be at a higher distance
393            }
394        }
395    }
396}
397/// Returns a new vector containing only the active points of the PointSet.
398///
399/// # Example
400///
401/// The following code snippet stores the PointSet *before* running WSP
402/// ```
403/// // Generates the initial set
404/// let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
405///
406/// // Only keep distant enough points
407/// let d_min = 3.0;
408/// wsp::wsp(&mut points, d_min);
409///
410/// // Iterate over the remaining points
411/// for valid_point in points.get_remaining() {
412///     println!("{:?}", valid_point);
413/// }
414/// ```
415/// Executes the WSP space filling algorithm according to the paper.
416/// (Pseudo-)randomly chooses an origin, and removes all points too close to it
417/// according to the d_min value of the PointSet structure.
418/// Then, the new origin is the closest valid point from the old origin.
419/// The algorithm iterates like this until all points have been visited or removed.
420///
421/// # Arguments
422///
423/// * `set` - The PointSet instance. `set` is mutably borrowed.
424/// * `d_min` - The desired minimal distance between all remaining points in the PointSet.
425///
426/// # Example
427///
428/// ```
429/// let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
430/// let d_min = 3.0;
431/// wsp::wsp(&mut points, d_min);
432/// ```
433pub fn wsp(set: &mut PointSet, d_min: f64) {
434    // Step 3: chose random point
435    let mut rng = SmallRng::seed_from_u64(10);
436    let origin: usize = rng.gen::<usize>() % set.points.len();
437
438    // Step 4, 5, 6: call specific algorithm for speed
439    wsp_loop_fast(set, d_min, origin);
440}
441
442/// This is an adaptive version of the WSP algorithm.
443/// The traditional algorithm requires a d_min and
444/// based on that we obtain a set of a given number of points.
445/// Here we adaptively change d_min to get (an approximation of)
446/// the desired number of points active after the algorithm.
447///
448/// # Arguments
449///
450/// * `set` - The PointSet instance. `set` is mutably borrowed.
451/// * `obj_nb` - The desired number of points remaining active in the set after the algorithm.
452/// * `verbose` - Print running information about the iterations of the algorithm.
453///
454/// # Example
455///
456/// ```
457/// let mut points = wsp::PointSet::init_from_random(1000, 20, 51);
458/// let objective_nb: usize = 100;
459/// wsp::adaptive_wsp(&mut points, objective_nb, false);
460/// ```
461pub fn adaptive_wsp(set: &mut PointSet, obj_nb: usize, verbose: bool) {
462    let mut d_min = set.d_min;
463    let mut d_max = set.d_max;
464    let mut d_search = (d_min + d_max) / 2.0;
465    let mut iter = 0;
466    let mut best_distance = 0.0;
467    let mut best_difference_active = set.nb_active - obj_nb;
468    loop {
469        iter += 1;
470        wsp(set, d_search);
471
472        // Binary search the best d_min
473        if verbose {
474            println!(
475                "Iter #{}: distance={}, nb_active={}",
476                iter, d_search, set.nb_active
477            );
478        }
479        match set.nb_active.cmp(&obj_nb) {
480            Ordering::Greater => d_min = d_search,
481            Ordering::Less => d_max = d_search,
482            Ordering::Equal => return,
483        };
484
485        // The search space is not continuous.
486        // We must also track the best result to recover it afterwards
487        if (set.nb_active as i32 - obj_nb as i32).abs() < best_difference_active as i32 {
488            best_difference_active = (set.nb_active as i32 - obj_nb as i32).abs() as usize;
489            best_distance = d_search;
490        }
491
492        // Stop condition if we cannot exactly reach the target number
493        let last_d_search = d_search;
494        d_search = (d_min + d_max) / 2.0;
495        if (last_d_search - d_search).abs() <= f64::EPSILON {
496            break;
497        }
498
499        // Reset parameters for the next iteration
500        set.reset_reseach_params();
501    }
502
503    // Recompute a last time if the best distance is not the last computed distance
504    if (best_distance - d_search).abs() > f64::EPSILON {
505        d_search = best_distance;
506        set.reset_reseach_params();
507        wsp(set, d_search);
508    }
509    if verbose {
510        println!(
511            "Last iter: best approximation is distance={}, nb_active={}",
512            d_search, set.nb_active
513        );
514    }
515}
516
517#[cfg(test)]
518mod tests {
519    use super::*;
520    #[test]
521    fn test_distance_sq() {
522        let mut p1: Vec<f64> = vec![1.0, 0.0];
523        let mut p2 = vec![0.0, 0.0];
524        assert_eq!(_distance_sq(&p1, &p2), 1.0);
525
526        p1 = vec![2.0, 2.0];
527        p2 = vec![2.0, 9.0];
528        assert_eq!(_distance_sq(&p1, &p2), 49.0);
529    }
530
531    #[test]
532    fn test_manhattan_distance() {
533        let p1 = vec![0.0, 0.0, 0.0];
534        let p2 = vec![0.5, 0.5, 1.0];
535        let p3 = vec![1.0, 0.0, 0.5];
536        assert_eq!(manhattan_distance(&p1, &p2), 2.0);
537        assert_eq!(manhattan_distance(&p1, &p3), 1.5);
538        assert_eq!(manhattan_distance(&p2, &p3), 1.5);
539        assert_eq!(manhattan_distance(&p1, &p1), 0.0);
540    }
541
542    #[test]
543    fn test_distance_matrix() {
544        let p1 = vec![0.0, 0.0];
545        let p2 = vec![4.0, 0.0];
546        let p3 = vec![4.0, 3.0];
547        let (distance_matrix, d_min, d_max) =
548            PointSet::compute_distance_matrix(&vec![p1, p2, p3], Some(&_distance_sq));
549
550        let true_distance = vec![
551            vec![0.0, 16.0, 25.0],
552            vec![16.0, 0.0, 9.0],
553            vec![25.0, 9.0, 0.0],
554        ];
555
556        for i in 0..3 {
557            for j in 0..3 {
558                assert_eq!(distance_matrix[i][j], true_distance[i][j]);
559            }
560        }
561
562        assert_eq!(d_min, 9.0);
563        assert_eq!(d_max, 25.0);
564    }
565
566    #[test]
567    fn test_closest_idx() {
568        let p1 = vec![0.0, 0.0];
569        let p2 = vec![1.0, 0.1];
570        let p3 = vec![1.0, 1.0];
571        let p4 = vec![2.0, 1.0];
572        let pointset = PointSet::init_from_preset(vec![p1, p2, p3, p4]);
573
574        let true_idxs = vec![
575            vec![0, 1, 2, 3],
576            vec![1, 2, 0, 3],
577            vec![2, 1, 3, 0],
578            vec![3, 2, 1, 0],
579        ];
580
581        for i in 0..4 {
582            for j in 0..4 {
583                assert_eq!(pointset.idx_sort[i][j], true_idxs[i][j]);
584            }
585        }
586    }
587
588    #[test]
589    fn test_iterative_fast_1() {
590        let p1 = vec![0.0, 0.0];
591        let p2 = vec![1.0, 0.1];
592        let p3 = vec![1.0, 1.0];
593        let p4 = vec![2.0, 1.0];
594        let mut pointset = PointSet::init_from_preset(vec![p1, p2, p3, p4]);
595
596        wsp_loop_fast(&mut pointset, 1.0, 1);
597
598        // The expected behaviour is
599        // 1) * p3 too close => becomes inactive
600        //    * p1 far enough => becomes new origin
601        // 2) * p1 far from p2 => p2 becomes origin
602        //    * no change in the set => stop iteration
603        assert_eq!(pointset.active[0], true);
604        assert_eq!(pointset.active[1], true);
605        assert_eq!(pointset.active[2], false);
606        assert_eq!(pointset.active[3], true);
607
608        assert_eq!(pointset.nb_active, 3);
609    }
610
611    #[test]
612    fn test_all_points_visited() {
613        let d_min: f64 = 0.04;
614        let mut points = PointSet::init_from_random(1000, 3, 51);
615        wsp(&mut points, d_min);
616
617        // All points are either visited or inactive
618        for i in 0..1000 {
619            assert!(points.visited[i] || !points.active[i]);
620        }
621    }
622
623    #[test]
624    fn test_min_dist_ok() {
625        let d_min: f64 = 0.04;
626        let mut points = PointSet::init_from_random(1000, 3, 51);
627        wsp(&mut points, d_min);
628
629        // All active points have a distance higher or equal to d_min
630        for i in 0..999 {
631            if !points.active[i] {
632                continue;
633            }
634            for j in i + 1..1000 {
635                if !points.active[j] {
636                    continue;
637                }
638                assert!(points.distance_matrix[i][j] >= d_min);
639            }
640        }
641    }
642}