Crate exact_clustering

Crate exact_clustering 

Source
Expand description

Find optimal clusterings and hierarchical clusterings on up to 32 points. If you only need approximate clusterings, there are excellent other crates that run significantly faster.

For weighted and unweighted kmedian-clustering (where centers are chosen from the points themselves), see KMedian. For kmeans-clustering (where centers are chosen from the ambient space), see KMeans and WeightedKMeans. If you’d like to solve other clustering-problems, implement the Cost-trait (and feel free to submit a pull-request!), or submit an issue on GitHub.

Among others, the Cost-trait allows you to calculate:

§Example

use ndarray::prelude::*;
use std::collections::BTreeSet;
use exact_clustering::{Cost as _, KMedian, KMeans};

// Set of 2d-points looking like ⠥
let points = vec![
    array![0.0, 0.0],
    array![1.0, 0.0],
    array![0.0, 2.0],
];
// Instances are mutable to allow caching cluster-costs
let mut kmedian = KMedian::l1(&points).unwrap();
// All optimal clusterings are calculated at once to permit some speedups.
let (cost, clusters) = &kmedian.optimal_clusterings()[2];

assert_eq!(*cost, 1.0);
// Each cluster in the returned clustering is a set of point-indices:
assert_eq!(
    BTreeSet::from([BTreeSet::from([0, 1]), BTreeSet::from([2])]),
    clusters
        .iter()
        .cloned()
        .map(BTreeSet::from_iter)
        .collect(),
);

let price_of_hierarchy = kmedian.price_of_hierarchy().0;
assert_eq!(price_of_hierarchy, 1.0);

let price_of_greedy = kmedian.price_of_greedy().0;
assert_eq!(price_of_greedy, 1.0);

Structs§

Cluster
A compact representation of a cluster of points, using a bitset.
ClusterIter
An iterator over the point-indices in a cluster.
KMeans
A clustering-problem where each center can be any point in the metric space.
KMedian
A clustering-problem where each center must be one of the points that are to be clustered.
WeightedKMeans
A weighted clustering-problem where each center can be any point in the metric space.

Enums§

Error
An error-type for creating clustering-problems.

Constants§

MAX_POINT_COUNT
The maximum number of points we can cluster before we overflow Storage.

Traits§

Cost
A trait for cost-functions for a class of clustering-problems.

Functions§

cluster_from_iterator
Construct a cluster from an iterator of point-indices.

Type Aliases§

Clustering
A partition of a set of points into disjoint clusters.
Point
A single point.
Storage
The storage-medium for representing clusters compactly via a bitset.
WeightedPoint
A weighted point.