Expand description

Tukey’s method

The original method uses two “fences” to classify the data. All the observations “inside” the fences are considered “normal”, and the rest are considered outliers.

The fences are computed from the quartiles of the sample, according to the following formula:

// q1, q3 are the first and third quartiles
let iqr = q3 - q1;  // The interquartile range
let (f1, f2) = (q1 - 1.5 * iqr, q3 + 1.5 * iqr);  // the "fences"

let is_outlier = |x| if x > f1 && x < f2 { true } else { false };

The classifier provided here adds two extra outer fences:

let (f3, f4) = (q1 - 3 * iqr, q3 + 3 * iqr);  // the outer "fences"

The extra fences add a sense of “severity” to the classification. Data points outside of the outer fences are considered “severe” outliers, whereas points outside the inner fences are just “mild” outliers, and, as the original method, everything inside the inner fences is considered “normal” data.

Some ASCII art for the visually oriented people:

         LOW-ish                NORMAL-ish                 HIGH-ish
        x   |       +    |  o o  o    o   o o  o  |        +   |   x
            f3           f1                       f2           f4

Legend:
o: "normal" data (not an outlier)
+: "mild" outlier
x: "severe" outlier

Structs

Iterator over the labeled data
A classified/labeled sample.

Enums

Labels used to classify outliers

Functions

Classifies the sample, and returns a labeled sample.