Struct clusterphobia::clustering::bcubed::BCubed

pub struct BCubed { /* fields omitted */ }

The B-Cubed extrinsic measure of the similarity of two Clusterings.

A similarity of one means perfect concordance between clusters and gold-standard truth set categories. The closer the similarity gets to zero, the worse the concordance.

The B-Cubed measure was proposed in this paper:

[1] A. Bagga and B. Baldwin. Entity-based cross-document coreferencing using the vector space model. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics - Volume 1, ACL ’98, pages 79–85, 1998.

There are many measures of clustering accuracy, some better than others. The following paper compared many of them and found B-Cubed the best according to four formal constraints:

  1. Cluster Homogeneity
  2. Cluster Completeness
  3. Rag Bag
  4. Cluster Size vs. Quantity

[2] E. Amigo, J. Gonzalo, J. Artiles, and F. Verdejo. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Departamento de Lenguajes y Sistemas Informaticos, UNED, Madrid, Spain, May 11, 2009.

A subsequent paper identified a case in which B-Cubed fares poorly, namely unbalanced datasets where one cluster dominates:

[3] J. G. Moreno and G. Dias. Adapted B-CUBED metrics to unbalanced datasets. Normandie University, France.

This third paper proposed a refined version of B-Cubed, but the refinements add significantly to processing time, so they are not employed here. The algorithm used here follows the definition in section 2.1 of [3]; the refined version is in section 2.2.

Notation:

  • 𝔽 = F-measure (the final similarity measure)
  • ℙ = Precision (a measure of homogeneity)
  • ℝ = Recall (a measure of completeness)
  • α = weighting factor (defaults to 0.5)
  • ℕ = number of points
  • k, k* = number of categories in the solution π and in the gold standard π*, respectively (the two generally differ)
  • i = category index
  • πᵢ = the ith category of the solution clustering
  • π*ᵢ = the ith category of the gold standard
  • g₀ = tests whether two items share a category in the solution clustering
  • g*₀ = tests whether two items share a category in the gold standard

$$
\frac{1}{\mathbb{F}_{b^3}} = \frac{\alpha}{\mathbb{P}_{b^3}} + \frac{1 - \alpha}{\mathbb{R}_{b^3}}
$$

$$
\mathbb{P}_{b^3} = \frac{1}{\mathbb{N}} \sum_{i=1}^{k} \frac{1}{|\pi_i|} \sum_{x_j \in \pi_i} \sum_{x_l \in \pi_i} g^*_0(x_j, x_l)
$$

$$
\mathbb{R}_{b^3} = \frac{1}{\mathbb{N}} \sum_{i=1}^{k^*} \frac{1}{|\pi^*_i|} \sum_{x_j \in \pi^*_i} \sum_{x_l \in \pi^*_i} g_0(x_j, x_l)
$$

$$
g_0(x_i, x_j) = \begin{cases} 1 & \text{if } \exists l : x_i \in \pi_l \wedge x_j \in \pi_l \\ 0 & \text{otherwise} \end{cases}
$$

$$
g^*_0(x_i, x_j) = \begin{cases} 1 & \text{if } \exists l : x_i \in \pi^*_l \wedge x_j \in \pi^*_l \\ 0 & \text{otherwise} \end{cases}
$$

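The sums above can be evaluated literally. The sketch below is a minimal illustration, not the crate's implementation: it assumes each clustering is given as a slice with one category label per point (the same index denoting the same point in both), and the helper names `categories`, `b3_sum`, `precision`, `recall`, and `f_measure` are hypothetical. It visits every ordered pair within each category, so it costs O(N²) in the worst case.

```rust
use std::collections::HashMap;

/// Group point indices by category label: one Vec per category πᵢ.
fn categories(labels: &[usize]) -> HashMap<usize, Vec<usize>> {
    let mut groups: HashMap<usize, Vec<usize>> = HashMap::new();
    for (point, &label) in labels.iter().enumerate() {
        groups.entry(label).or_default().push(point);
    }
    groups
}

/// (1/ℕ) Σᵢ (1/|πᵢ|) Σ_{xⱼ∈πᵢ} Σ_{xₗ∈πᵢ} g(xⱼ, xₗ), where the categories πᵢ
/// come from `outer` and g tests whether two points share a label in `other`.
fn b3_sum(outer: &[usize], other: &[usize]) -> f64 {
    assert_eq!(outer.len(), other.len(), "both clusterings must label the same points");
    assert!(!outer.is_empty(), "at least one point is required");
    let mut total = 0.0;
    for members in categories(outer).values() {
        // Count ordered pairs (xⱼ, xₗ), including j == l, that share a label in `other`.
        let mut shared = 0usize;
        for &j in members {
            for &l in members {
                if other[j] == other[l] {
                    shared += 1;
                }
            }
        }
        total += shared as f64 / members.len() as f64;
    }
    total / outer.len() as f64
}

/// ℙ: categories from the solution, g*₀ evaluated against the gold standard.
fn precision(solution: &[usize], gold: &[usize]) -> f64 {
    b3_sum(solution, gold)
}

/// ℝ: categories from the gold standard, g₀ evaluated against the solution.
fn recall(solution: &[usize], gold: &[usize]) -> f64 {
    b3_sum(gold, solution)
}

/// 1/𝔽 = α/ℙ + (1 - α)/ℝ
fn f_measure(p: f64, r: f64, alpha: f64) -> f64 {
    1.0 / (alpha / p + (1.0 - alpha) / r)
}
```

For example, with gold labels `[0, 0, 0, 1, 1]` and solution labels `[0, 0, 1, 1, 1]`, precision and recall both come out to 11/15 ≈ 0.733, and so does the F-measure for any alpha. Comparing a clustering with itself gives precision and recall of exactly one.
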
Methods

impl BCubed

pub fn new(precision: f64, recall: f64, alpha: f64) -> Self

Create a BCubed value from already-known components: precision, recall, and the weighting factor alpha.

pub fn get_precision(&self) -> f64

Get the precision, a measure of homogeneity ranging from zero to one.

pub fn get_recall(&self) -> f64

Get the recall, a measure of completeness ranging from zero to one.

pub fn get_alpha(&self) -> f64

Get alpha, the weighting factor between zero and one that shifts the similarity calculation toward precision (alpha near one) or toward recall (alpha near zero).

pub fn similarity(&self) -> f64

The F-measure (a weighted harmonic mean) of precision and recall: a single measure of the quality of the clustering.
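
The fields of `BCubed` are omitted from its documentation, so the following is only a sketch of how a value built by `new` might answer the getters and `similarity`. `BCubedSketch` is a hypothetical stand-in, not the crate's struct; its `similarity` applies the F-measure formula 1/𝔽 = α/ℙ + (1 - α)/ℝ given earlier.

```rust
/// Hypothetical stand-in for `BCubed`; the real struct's fields are not documented.
/// The derives mirror the trait implementations listed for `BCubed`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BCubedSketch {
    precision: f64,
    recall: f64,
    alpha: f64,
}

impl BCubedSketch {
    /// Build a value from already-known components, as `BCubed::new` is documented to do.
    pub fn new(precision: f64, recall: f64, alpha: f64) -> Self {
        BCubedSketch { precision, recall, alpha }
    }

    pub fn get_precision(&self) -> f64 {
        self.precision
    }

    pub fn get_recall(&self) -> f64 {
        self.recall
    }

    pub fn get_alpha(&self) -> f64 {
        self.alpha
    }

    /// Weighted harmonic mean of precision and recall: 1/F = alpha/P + (1 - alpha)/R.
    /// B-Cubed precision and recall are always positive for a non-empty clustering
    /// (every point shares a category with itself), so neither division is by zero.
    pub fn similarity(&self) -> f64 {
        1.0 / (self.alpha / self.precision + (1.0 - self.alpha) / self.recall)
    }
}
```

With alpha = 0.5 this reduces to the familiar 2ℙℝ/(ℙ + ℝ), so `BCubedSketch::new(0.5, 1.0, 0.5).similarity()` is 2/3.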

pub fn compare<C: Chopped, M: Chopped, G: Iterator<Item = C>>(
    solution: &Clustering<C, M, G>,
    gold_standard: &Clustering<C, M, G>,
    alpha: f64
) -> Self

Compare two Clusterings and compute the BCubed value.

  • solution - The Clustering whose quality is to be assessed.
  • gold_standard - The perfect Clustering whose categories are all properly assigned.
  • alpha - A value between zero and one, used to weight precision and recall.
    • If alpha is 0.5, precision and recall are weighted equally.
    • If alpha is zero, only recall is used.
    • If alpha is one, only precision is used.
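
`compare` takes `Clustering` values whose construction is not described on this page, so the sketch below substitutes a hypothetical `compare_labels` function that takes one category label per point for each clustering and returns (precision, recall, similarity). Rather than evaluating the double sums pair by pair, it uses an equivalent count: the number of ordered pairs in πᵢ that share a gold category equals Σ_c n_ic², where n_ic is the number of points in πᵢ with gold label c (and symmetrically for recall). This is one way to get O(N) expected time; the crate may compute it differently.

```rust
use std::collections::HashMap;

/// Hypothetical label-based analogue of `BCubed::compare`.
/// Returns (precision, recall, similarity).
fn compare_labels(solution: &[usize], gold: &[usize], alpha: f64) -> (f64, f64, f64) {
    assert_eq!(solution.len(), gold.len(), "both clusterings must label the same points");
    assert!(!solution.is_empty(), "at least one point is required");
    assert!((0.0..=1.0).contains(&alpha), "alpha must lie between zero and one");

    // joint[(s, g)] = number of points with solution label s and gold label g.
    let mut joint: HashMap<(usize, usize), usize> = HashMap::new();
    let mut solution_size: HashMap<usize, usize> = HashMap::new();
    let mut gold_size: HashMap<usize, usize> = HashMap::new();
    for (&s, &g) in solution.iter().zip(gold) {
        *joint.entry((s, g)).or_insert(0) += 1;
        *solution_size.entry(s).or_insert(0) += 1;
        *gold_size.entry(g).or_insert(0) += 1;
    }

    // Each cell contributes count² ordered same-category pairs to both sums,
    // divided by the size of the enclosing solution or gold category.
    let n = solution.len() as f64;
    let (mut precision, mut recall) = (0.0, 0.0);
    for (&(s, g), &count) in &joint {
        let pairs = (count * count) as f64;
        precision += pairs / solution_size[&s] as f64;
        recall += pairs / gold_size[&g] as f64;
    }
    precision /= n;
    recall /= n;

    // 1/F = alpha/P + (1 - alpha)/R; alpha = 1 gives P, alpha = 0 gives R.
    let similarity = 1.0 / (alpha / precision + (1.0 - alpha) / recall);
    (precision, recall, similarity)
}
```

As an example of how alpha matters, put all 100 points of an unbalanced gold standard (90 points in one category, 10 singletons) into a single cluster. Recall is 1 and precision is 0.811, so the similarity is about 0.811 for alpha = 1, 1.0 for alpha = 0, and about 0.896 for alpha = 0.5. A clustering that makes no distinctions at all still scores highly, which is one simple illustration of the kind of weakness on unbalanced datasets that [3] addresses.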

Trait Implementations

impl Clone for BCubed

impl Copy for BCubed

impl PartialEq<BCubed> for BCubed

impl Debug for BCubed

impl StructuralPartialEq for BCubed

Auto Trait Implementations

impl Send for BCubed

impl Sync for BCubed

impl Unpin for BCubed

impl UnwindSafe for BCubed

impl RefUnwindSafe for BCubed

Blanket Implementations

impl<T, U> Into<U> for T where
    U: From<T>,

impl<T> From<T> for T

impl<T> ToOwned for T where
    T: Clone

type Owned = T

The resulting type after obtaining ownership.

impl<T, U> TryFrom<U> for T where
    U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

impl<T> Borrow<T> for T where
    T: ?Sized

impl<T> BorrowMut<T> for T where
    T: ?Sized

impl<T> Any for T where
    T: 'static + ?Sized

impl<V, T> VZip<V> for T where
    V: MultiLane<T>,