// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.
//! The 1R (Holte, 1993) rule learning algorithm.
//!
//! # 1R is a baseline rule learning algorithm
//!
//! The algorithm generates a rule for each attribute in a dataset,
//! and then picks the "one rule" that has the best accuracy.
//!
//! Each rule (hypothesis) is a set of cases:
//! for every value of the attribute,
//! the prediction (the `then` part) is the most frequent class for examples with that attribute value.
//!
//! This is a baseline learner, useful as a point of comparison for more sophisticated algorithms.
//! A related idea is "0R" (zero rule), which always predicts the most frequent class in the dataset.
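//!
//! As a sketch of that induction loop for a single attribute (hypothetical code,
//! not this crate's implementation, using plain slices rather than ndarray;
//! the helper name `one_rule` is made up for illustration):
//!
//! ```
//! use std::collections::HashMap;
//!
//! // For one attribute column: map each value to its most frequent class,
//! // then measure how often that mapping predicts the training data.
//! fn one_rule<'a>(
//!     column: &[&'a str],
//!     classes: &[&'a str],
//! ) -> (HashMap<&'a str, &'a str>, f64) {
//!     // Count how often each class occurs for each attribute value.
//!     let mut counts: HashMap<&str, HashMap<&str, usize>> = HashMap::new();
//!     for (&v, &c) in column.iter().zip(classes.iter()) {
//!         *counts.entry(v).or_default().entry(c).or_insert(0) += 1;
//!     }
//!     // Predict the most frequent class for each attribute value.
//!     let cases: HashMap<&str, &str> = counts
//!         .iter()
//!         .map(|(&v, cs)| (v, *cs.iter().max_by_key(|&(_, n)| *n).unwrap().0))
//!         .collect();
//!     // Accuracy = fraction of training rows predicted correctly.
//!     let correct = column
//!         .iter()
//!         .zip(classes.iter())
//!         .filter(|&&(ref _pair, _)| true) // placeholder removed below
//!         .count();
//!     let correct = column
//!         .iter()
//!         .zip(classes.iter())
//!         .filter(|&(&v, &c)| cases[v] == c)
//!         .count();
//!     (cases, correct as f64 / column.len() as f64)
//! }
//!
//! let season = ["summer", "summer", "winter", "winter"];
//! let class = ["hot", "hot", "cold", "cold"];
//! let (cases, accuracy) = one_rule(&season, &class);
//! assert_eq!(cases["summer"], "hot");
//! assert_eq!(accuracy, 1.0);
//! ```
//!
//! 1R runs this for every column and keeps the column whose rule scores best.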
//!
//! # Examples
//!
//! This crate uses [ndarray](https://docs.rs/ndarray/0.13.0/ndarray/) to represent attributes and classes.
//!
//! ```
//! use ndarray::prelude::*;
//! use oner_induction::{Rule, Case, Accuracy, discover};
//!
//! let examples = array![
//! ["sunny", "summer"],
//! ["sunny", "summer"],
//! ["cloudy", "winter"],
//! ["sunny", "winter"]
//! ];
//!
//! let classes = array![
//! "hot",
//! "hot",
//! "cold",
//! "cold"
//! ];
//!
//! // Discover the best rule, and the column it applies to:
//! let rule: Option<(usize, Rule<&str, &str>)> =
//! discover(&examples.view(), &classes.view());
//!
//! // Expected accuracy is 100%
//! let accuracy = Accuracy(1.0);
//!
//! // The "rule" is a set of cases (conditions, or "IF...THENs"):
//! let cases = vec![
//! Case { attribute_value: "summer", predicted_class: "hot" },
//! Case { attribute_value: "winter", predicted_class: "cold" }
//! ];
//!
//! // Column 1 is the Season (winter or summer)
//! assert_eq!(rule, Some( (1, Rule { cases, accuracy }) ));
//! ```
//!
//! # References
//!
//! - Holte, R.C. (1993) Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. _Machine Learning_ 11: 63. [https://doi.org/10.1023/A:1022631118932](https://doi.org/10.1023/A:1022631118932).
//! - Molnar, C. (2019) _Interpretable Machine Learning_. In particular: [Learn Rules from a Single Feature (OneR)](https://christophm.github.io/interpretable-ml-book/rules.html#learn-rules-from-a-single-feature-oner).
//!
//! # Terminology
//!
//! I'm following the terminology from Holte (1993):
//!
//! - Attribute (a.k.a. feature)
//! - Value (the value of an attribute or class)
//! - Class (classification, prediction)
//! - Example (instance)
//!
//! In generic parameters, `A` is for attribute and `C` is for class.
//!
//! # Limitations
//!
//! This crate assumes numeric data has already been converted to categorical data.
//!
//! See <https://docs.rs/oner_quantize> for an implementation of the 1R quantization algorithm.
//!
use std::hash::Hash;

mod discover;

/// A prediction based on an attribute value.
#[derive(Debug, Clone, PartialEq)]
pub struct Case<A, C> {
    pub attribute_value: A,
    pub predicted_class: C,
}

/// The rule for an attribute, together with the training data accuracy.
#[derive(Debug, Clone, PartialEq)]
pub struct Rule<A, C> {
    pub cases: Vec<Case<A, C>>,
    pub accuracy: Accuracy,
}

/// Fraction of correct predictions out of all rows in the training data.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Accuracy(pub f64);

pub use discover::discover;