pub struct Dbscan;
Expand description

DBSCAN (Density-based Spatial Clustering of Applications with Noise) clusters together points which are close together with enough neighbors labelled points which are sparsely neighbored as noise. As points may be part of a cluster or noise the predict method returns Array1<Option<usize>>

As it groups together points in dense regions the number of clusters is determined by the dataset and distance tolerance not the user.

We provide an implemention of the standard O(N^2) query-based algorithm of which more details can be found in the next section or here.

The standard DBSCAN algorithm isn’t iterative and therefore there’s no fit method provided only predict.

The algorithm

The algorithm iterates over each point in the dataset and for every point not yet assigned to a cluster:

  • Find all points within the neighborhood of size tolerance
  • If the number of points in the neighborhood is below a minimum size label as noise
  • Otherwise label the point with the cluster ID and repeat with each of the neighbours

Tutorial

Let’s do a walkthrough of an example running DBSCAN on some data.

use linfa::traits::*;
use linfa_clustering::{DbscanParams, Dbscan};
use linfa_datasets::generate;
use ndarray::{Axis, array, s};
use ndarray_rand::rand::SeedableRng;
use rand_xoshiro::Xoshiro256Plus;
use approx::assert_abs_diff_eq;

// Our random number generator, seeded for reproducibility
let seed = 42;
let mut rng = Xoshiro256Plus::seed_from_u64(seed);

// `expected_centroids` has shape `(n_centroids, n_features)`
// i.e. three points in the 2-dimensional plane
let expected_centroids = array![[0., 1.], [-10., 20.], [-1., 10.]];
// Let's generate a synthetic dataset: three blobs of observations
// (100 points each) centered around our `expected_centroids`
let observations = generate::blobs(100, &expected_centroids, &mut rng);

// Let's configure and run our DBSCAN algorithm
// We use the builder pattern to specify the hyperparameters
// `min_points` is the only mandatory parameter.
// If you don't specify the others (e.g. `tolerance`)
// default values will be used.
let min_points = 3;
let clusters = Dbscan::params(min_points)
    .tolerance(1e-2)
    .transform(&observations)
    .unwrap();
// Points are `None` if noise `Some(id)` if belonging to a cluster.

Implementations§

source§

impl Dbscan

source

pub fn params<F: Float>( min_points: usize ) -> DbscanParams<F, L2Dist, CommonNearestNeighbour>

Configures the hyperparameters with the minimum number of points required to form a cluster

Defaults are provided if the optional parameters are not specified:

  • tolerance = 1e-4
  • dist_fn = L2Dist (Euclidean distance)
  • nn_algo = KdTree
source

pub fn params_with<F: Float, D: Distance<F>, N: NearestNeighbour>( min_points: usize, dist_fn: D, nn_algo: N ) -> DbscanParams<F, D, N>

Configures the hyperparameters with the minimum number of points, a custom distance metric, and a custom nearest neighbour algorithm

Trait Implementations§

source§

impl Clone for Dbscan

source§

fn clone(&self) -> Dbscan

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for Dbscan

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl PartialEq for Dbscan

source§

fn eq(&self, other: &Dbscan) -> bool

This method tests for self and other values to be equal, and is used by ==.
1.0.0 · source§

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
source§

impl Eq for Dbscan

source§

impl StructuralEq for Dbscan

source§

impl StructuralPartialEq for Dbscan

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for Twhere T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for Twhere T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for Twhere T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<Q, K> Equivalent<K> for Qwhere Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

source§

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.
§

impl<Q, K> Equivalent<K> for Qwhere Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

§

fn equivalent(&self, key: &K) -> bool

Checks if this value is equivalent to the given key. Read more
§

impl<Q, K> Equivalent<K> for Qwhere Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

§

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for Twhere U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

§

impl<T> Pointable for T

§

const ALIGN: usize = _

The alignment of pointer.
§

type Init = T

The type for initializers.
§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
source§

impl<T> ToOwned for Twhere T: Clone,

§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for Twhere U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for Twhere U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<V, T> VZip<V> for Twhere V: MultiLane<T>,

§

fn vzip(self) -> V