Struct clusterphobia::clustering::bcubed::BCubed
The B-Cubed extrinsic measure of the similarity of two Clusterings.
A similarity of one means perfect concordance between clusters and gold-standard truth set categories. The closer the similarity gets to zero, the worse the concordance.
The B-Cubed measure was proposed in this paper:
[1] A. Bagga and B. Baldwin. Entity-based cross-document coreferencing using the vector space model. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics - Volume 1, ACL ’98, pages 79–85, 1998.
There are many measures of clustering accuracy, some better than others. The following paper compared many of them and found B-Cubed to be the best according to four formal constraints:
- Cluster Homogeneity
- Cluster Completeness
- Rag Bag
- Cluster Size vs. Quantity
[2] E. Amigo, J. Gonzalo, J. Artiles, and F. Verdejo. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Departamento de Lenguajes y Sistemas Informaticos, UNED, Madrid, Spain, May 11, 2009.
A subsequent paper identified a case where B-Cubed fares poorly (unbalanced datasets in which one cluster dominates):
[3] J. G. Moreno and G. Dias. Adapted B-CUBED metrics to unbalanced datasets. Normandie University, France.
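As an illustration of the weakness described in [3] (this example is ours, not taken from that paper), consider the degenerate clustering that lumps all ℕ points into a single cluster. Every gold category then lies inside that one cluster, so recall is one, and a point from a gold category of size n_g has precision n_g / ℕ, so averaged precision is Σ (n_g / ℕ)². When one category dominates, that sum stays close to one even though the clustering separates nothing. The helper `single_cluster_bcubed` is hypothetical, written only to show the arithmetic.

```rust
/// Hypothetical helper (not part of clusterphobia): B-Cubed scores for the
/// degenerate clustering that puts all points into one cluster, given the
/// sizes of the gold-standard categories.
fn single_cluster_bcubed(gold_sizes: &[usize], alpha: f64) -> (f64, f64, f64) {
    let n: usize = gold_sizes.iter().sum();
    let n = n as f64;
    // Each point in gold category g shares the single cluster with n_g
    // same-category points out of n, so its precision is n_g / n.
    // Averaging over all n points gives Σ_g (n_g / n)².
    let precision: f64 = gold_sizes
        .iter()
        .map(|&ng| (ng as f64 / n).powi(2))
        .sum();
    // Every gold category is wholly contained in the one cluster.
    let recall = 1.0;
    // Weighted harmonic mean: 1/F = α/P + (1 - α)/R.
    let f = 1.0 / (alpha / precision + (1.0 - alpha) / recall);
    (precision, recall, f)
}
```

For gold categories of sizes 9 and 1, this gives ℙ = 0.82 and 𝔽 ≈ 0.90 at alpha = 0.5; for a balanced 5/5 split the same lump-everything clustering scores only ℙ = 0.5 and 𝔽 ≈ 0.67.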
This third paper proposed a refined version of B-Cubed, but its added complexity significantly increases processing time, so those refinements are not used here. The algorithm implemented here follows the definition in section 2.1 of that paper; the refined version is in section 2.2.
Notation:
- 𝔽 = F-measure (final similarity measure)
- ℙ = Precision (a measure of homogeneity)
- ℝ = Recall (a measure of completeness)
- α = Weighting factor (defaults to 0.5)
- ℕ = Number of points
- k = Number of categories (varies between the π and π* Clusterings)
- i = category index
- πᵢ = cluster solution for the ith category
- π*ᵢ = gold standard for the ith category
- g₀ = tests whether two items share the same category in the clustering
- g*₀ = tests whether two items share the same category in the gold standard

$$
\frac{1}{\mathbb{F}_{b^3}} = \frac{\alpha}{\mathbb{P}_{b^3}} + \frac{1 - \alpha}{\mathbb{R}_{b^3}}
$$

$$
\mathbb{P}_{b^3} = \frac{1}{\mathbb{N}} \sum_{i=1}^{k} \frac{1}{|\pi_i|} \sum_{x_j \in \pi_i} \sum_{x_l \in \pi_i} g^*_0(x_j, x_l)
$$

$$
\mathbb{R}_{b^3} = \frac{1}{\mathbb{N}} \sum_{i=1}^{k} \frac{1}{|\pi^*_i|} \sum_{x_j \in \pi^*_i} \sum_{x_l \in \pi^*_i} g_0(x_j, x_l)
$$

$$
g_0(x_i, x_j) = \begin{cases} 1 & \text{if } \exists l : x_i \in \pi_l \land x_j \in \pi_l \\ 0 & \text{otherwise} \end{cases}
$$

$$
g^*_0(x_i, x_j) = \begin{cases} 1 & \text{if } \exists l : x_i \in \pi^*_l \land x_j \in \pi^*_l \\ 0 & \text{otherwise} \end{cases}
$$
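To make the double sums concrete, here is a minimal sketch that evaluates them for two flat labelings of the same points. It is not clusterphobia's implementation: the function `bcubed` and the representation of each clustering as a slice of category labels are assumptions for illustration. Rather than evaluating g₀ and g*₀ for every pair, it uses the fact that, inside πᵢ, the pairs with g*₀ = 1 are exactly the pairs drawn from the same intersection of πᵢ with a gold category, so each inner double sum equals a sum of squared intersection sizes.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of clusterphobia): B-Cubed precision, recall
/// and F-measure for two flat labelings of the same points.
/// `solution[j]` and `gold[j]` are point j's category in π and in π*.
fn bcubed(solution: &[usize], gold: &[usize], alpha: f64) -> (f64, f64, f64) {
    assert_eq!(solution.len(), gold.len(), "both labelings must cover the same points");
    assert!(!solution.is_empty(), "need at least one point");
    let n = solution.len() as f64;

    // joint[(s, g)] = |π_s ∩ π*_g|; sol_size[s] = |π_s|; gold_size[g] = |π*_g|.
    let mut joint: HashMap<(usize, usize), usize> = HashMap::new();
    let mut sol_size: HashMap<usize, usize> = HashMap::new();
    let mut gold_size: HashMap<usize, usize> = HashMap::new();
    for (&s, &g) in solution.iter().zip(gold) {
        *joint.entry((s, g)).or_insert(0) += 1;
        *sol_size.entry(s).or_insert(0) += 1;
        *gold_size.entry(g).or_insert(0) += 1;
    }

    // Inside π_s, the double sum of g*₀ equals Σ_g |π_s ∩ π*_g|².
    // The recall double sum is symmetric, with π and π* swapped.
    let (mut p_sum, mut r_sum) = (0.0, 0.0);
    for (&(s, g), &count) in &joint {
        let c2 = (count * count) as f64;
        p_sum += c2 / sol_size[&s] as f64;
        r_sum += c2 / gold_size[&g] as f64;
    }
    let precision = p_sum / n;
    let recall = r_sum / n;

    // 1/F = α/P + (1 - α)/R; P and R are positive because every point matches itself.
    let f = 1.0 / (alpha / precision + (1.0 - alpha) / recall);
    (precision, recall, f)
}
```

Counting intersections keeps the cost roughly linear in ℕ, instead of the O(ℕ²) cost of evaluating g₀ and g*₀ for every pair literally.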
Methods
impl BCubed
pub fn new(precision: f64, recall: f64, alpha: f64) -> Self
Create a BCubed value from its known components: precision, recall, and alpha.
pub fn get_precision(&self) -> f64
Get the precision, a measure of homogeneity from zero to one.
pub fn get_recall(&self) -> f64
Get the recall, a measure of completeness from zero to one.
pub fn get_alpha(&self) -> f64
Get alpha, the weighting factor between zero and one that shifts the similarity calculation toward favoring precision (alpha near one) or recall (alpha near zero).
pub fn similarity(&self) -> f64
The F-measure (a harmonic mean of precision and recall, weighted by alpha), a unified measure of the quality of the clustering.
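The accessor methods and `similarity` can be mirrored by a small stand-alone struct. This is a sketch for illustration, not clusterphobia's source: the field names are assumptions, as is the choice to return zero when the F-measure's denominator vanishes. The derived traits match the Clone, Copy, PartialEq, and Debug implementations listed under Trait Implementations. Rearranging 1/𝔽 = α/ℙ + (1 − α)/ℝ gives 𝔽 = ℙℝ / (αℝ + (1 − α)ℙ), which is what `similarity` computes here.

```rust
/// Illustrative stand-in for clusterphobia's BCubed. Method names and
/// signatures follow the documentation; field names are assumptions.
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct BCubed {
    precision: f64,
    recall: f64,
    alpha: f64,
}

impl BCubed {
    pub fn new(precision: f64, recall: f64, alpha: f64) -> Self {
        BCubed { precision, recall, alpha }
    }

    pub fn get_precision(&self) -> f64 {
        self.precision
    }

    pub fn get_recall(&self) -> f64 {
        self.recall
    }

    pub fn get_alpha(&self) -> f64 {
        self.alpha
    }

    /// Weighted harmonic mean, rearranged as F = P·R / (α·R + (1 - α)·P)
    /// so that no division by P or R happens directly.
    pub fn similarity(&self) -> f64 {
        let (p, r, a) = (self.precision, self.recall, self.alpha);
        let denom = a * r + (1.0 - a) * p;
        // Assumed convention: return 0 if the denominator is zero. B-Cubed
        // precision and recall are positive on non-empty data anyway,
        // because every point matches itself.
        if denom == 0.0 {
            0.0
        } else {
            p * r / denom
        }
    }
}
```

Because the struct is `Copy`, a value can be passed around or stored without cloning, and the derived `PartialEq` compares all three fields.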
pub fn compare<C: Chopped, M: Chopped, G: Iterator<Item = C>>(
solution: &Clustering<C, M, G>,
gold_standard: &Clustering<C, M, G>,
alpha: f64
) -> Self
Compare two Clusterings and compute the BCubed value.
- solution - The Clustering whose quality is to be assessed.
- gold_standard - The perfect Clustering whose categories are all properly assigned.
- alpha - A value between zero and one, used to weight precision and recall.
  - If alpha is 0.5, precision and recall are weighted equally.
  - If alpha is zero, only recall is used.
  - If alpha is one, only precision is used.
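The three alpha cases follow directly from substituting into 1/𝔽 = α/ℙ + (1 − α)/ℝ:

$$
\alpha = 1:\quad \frac{1}{\mathbb{F}} = \frac{1}{\mathbb{P}} \;\Rightarrow\; \mathbb{F} = \mathbb{P}
$$

$$
\alpha = 0:\quad \frac{1}{\mathbb{F}} = \frac{1}{\mathbb{R}} \;\Rightarrow\; \mathbb{F} = \mathbb{R}
$$

$$
\alpha = 0.5:\quad \frac{1}{\mathbb{F}} = \frac{1}{2}\left(\frac{1}{\mathbb{P}} + \frac{1}{\mathbb{R}}\right) \;\Rightarrow\; \mathbb{F} = \frac{2\,\mathbb{P}\,\mathbb{R}}{\mathbb{P} + \mathbb{R}}
$$

For example, with ℙ = 0.6 and ℝ = 0.9, alpha = 0.5 gives 𝔽 = 1.08 / 1.5 = 0.72, below the arithmetic mean of 0.75, since a harmonic mean is pulled toward the smaller value.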
Trait Implementations
impl Clone for BCubed
impl Copy for BCubed
impl PartialEq<BCubed> for BCubed
impl Debug for BCubed
impl StructuralPartialEq for BCubed
Auto Trait Implementations
impl Send for BCubed
impl Sync for BCubed
impl Unpin for BCubed
impl UnwindSafe for BCubed
impl RefUnwindSafe for BCubed
Blanket Implementations
impl<T, U> Into<U> for T where
U: From<T>,
impl<T> From<T> for T
impl<T> ToOwned for T where
T: Clone,
type Owned = T
The resulting type after obtaining ownership.
fn to_owned(&self) -> T
fn clone_into(&self, target: &mut T)
impl<T, U> TryFrom<U> for T where
U: Into<T>,
type Error = Infallible
The type returned in the event of a conversion error.
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto<U> for T where
U: TryFrom<T>,
type Error = <U as TryFrom<T>>::Error
The type returned in the event of a conversion error.
fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
impl<T> Borrow<T> for T where
T: ?Sized,
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Any for T where
T: 'static + ?Sized,
impl<V, T> VZip<V> for T where
V: MultiLane<T>,