This library is a re-implementation of the SeqEval library. It is built with a focus on performance and soundness.
§SCHEMES
The following SchemeType variants are currently supported:
- IOB1: I marks a token inside a chunk, O a token outside any chunk, and B the beginning of a chunk that immediately follows another chunk of the same named entity.
- IOB2: Same as IOB1, except that a B tag is given to every token that begins a chunk.
- IOE1: An E tag marks the last token of a chunk that immediately precedes another chunk of the same named entity.
- IOE2: Same as IOE1, except that an E tag is given to every token that ends a chunk.
- BILOU/IOBES: ‘E’ and ‘L’ denote the Last or Ending token of a chunk, while ‘S’ and ‘U’ denote a chunk made of a single (Unit) token.
The BILOU and IOBES schemes are only supported in strict mode.
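To illustrate how a scheme turns a sequence of tokens into chunks, here is a minimal sketch of IOB2 chunk extraction. The function iob2_chunks and the Chunk alias are hypothetical names, not part of Rusev's API, and the sketch makes a simplifying assumption: an I tag that does not continue a chunk of the same class is ignored rather than reported or treated as the start of a new chunk.

```rust
/// A chunk: (class, start index inclusive, end index exclusive).
type Chunk = (String, usize, usize);

/// Hypothetical helper (not part of Rusev's API): extracts chunks from a
/// sequence of IOB2 tokens such as ["B-PER", "I-PER", "O"].
fn iob2_chunks(tokens: &[&str]) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    // The chunk currently being built: (class, start index).
    let mut current: Option<(String, usize)> = None;
    for (i, token) in tokens.iter().enumerate() {
        // Split "B-PER" into prefix "B" and class "PER"; "O" has no class.
        let (prefix, class) = match token.split_once('-') {
            Some((p, c)) => (p, c),
            None => (*token, ""),
        };
        // An I tag continues the open chunk only if the class matches.
        let continues = prefix == "I" && matches!(&current, Some((c, _)) if c == class);
        if !continues {
            // Close the open chunk, if any, just before this token.
            if let Some((c, start)) = current.take() {
                chunks.push((c, start, i));
            }
            // Under IOB2, only a B tag may open a new chunk.
            if prefix == "B" {
                current = Some((class.to_string(), i));
            }
        }
    }
    // Close a chunk that runs until the end of the sequence.
    if let Some((c, start)) = current {
        chunks.push((c, start, tokens.len()));
    }
    chunks
}

fn main() {
    let tokens = ["B-PER", "I-PER", "O", "B-LOC", "B-LOC", "I-LOC"];
    for (class, start, end) in iob2_chunks(&tokens) {
        println!("{class}: tokens {start}..{end}");
    }
}
```

Note how two consecutive B-LOC tags produce two separate LOC chunks: in IOB2 every chunk starts with its own B tag.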
§More information about schemes
§Terminology
This library partially reuses the terminology of the SeqEval library. The concepts might not be mapped one to one.
- A class is an entity we are interested in, such as ‘LOC’ for location, ‘PER’ for person, ‘GEO’ for geography, etc. It can be anything, but must be represented by a string.
- A token is a string containing a class, such as GEO, LOC or PER, and a prefix, for example B-PER. The prefix indicates where we are in the current chunk. For a given scheme, the possible prefixes are the letters of the scheme, such as I-O-B or I-O-E. Prefixes are limited to the letters O, I, B, E, U and L. It is essential that the tokens use these prefixes.
- A chunk is a list of at least one token associated with a named entity. For example, ["B-PER", "I-PER", "I-PER"] is a chunk.
- A Scheme gives us enough information to parse a list of tokens into chunks. The SchemeType can be used to autodetect the Scheme used in a given list of sequences.
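A token can be split into its prefix and its class as sketched below. The Prefix enum and parse_token are hypothetical illustrations, not Rusev's API; the sketch assumes that the prefix and the class are separated by the first '-' and that a bare O carries no class.

```rust
/// Hypothetical prefix type (not Rusev's own): the six letters allowed as prefixes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Prefix {
    O,
    I,
    B,
    E,
    U,
    L,
}

/// Hypothetical parser: splits a token such as "B-PER" into its prefix and class.
/// Returns None if the prefix is not one of O, I, B, E, U or L.
fn parse_token(token: &str) -> Option<(Prefix, &str)> {
    // A bare "O" has no '-' separator, so its class is empty.
    let (raw_prefix, class) = token.split_once('-').unwrap_or((token, ""));
    let prefix = match raw_prefix {
        "O" => Prefix::O,
        "I" => Prefix::I,
        "B" => Prefix::B,
        "E" => Prefix::E,
        "U" => Prefix::U,
        "L" => Prefix::L,
        // Any other prefix is rejected.
        _ => return None,
    };
    Some((prefix, class))
}

fn main() {
    for token in ["B-PER", "I-GEO", "O", "X-LOC"] {
        println!("{token:?} -> {:?}", parse_token(token));
    }
}
```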
§Example
Here is a simple example showing how to use this library through the config
API (e.g. classification_report_conf):
```rust
use rusev::{SchemeType, RusevConfigBuilder, DefaultRusevConfig, classification_report_conf};

// One sequence of true tokens and one sequence of predicted tokens.
let y_true = vec![vec!["B-TEST", "B-NOTEST", "O", "B-TEST"]];
let y_pred = vec![vec!["O", "B-NOTEST", "B-OTHER", "B-TEST"]];
// Evaluate with the IOB2 scheme in strict mode.
let config: DefaultRusevConfig =
    RusevConfigBuilder::default().scheme(SchemeType::IOB2).strict(true).build();
let wrapped_reporter = classification_report_conf(y_true, y_pred, config);
let reporter = wrapped_reporter.unwrap();
// The report lists the overall averages first, then one row per class.
let expected_report = "Class, Precision, Recall, Fscore, Support
Overall_Weighted, 1, 0.6666667, 0.77777785, 3
Overall_Micro, 0.6666667, 0.6666667, 0.6666667, 3
Overall_Macro, 0.6666667, 0.5, 0.5555556, 3
NOTEST, 1, 1, 1, 1
OTHER, 0, 0, 0, 0
TEST, 1, 0.5, 0.6666667, 2\n";
assert_eq!(expected_report, reporter.to_string());
```
It is also possible to call the classification_report function directly and
specify each parameter manually.
Rusev also exposes the precision_recall_fscore_support function, which computes the
precision, recall, F-score and support for a given Average and a given beta used in
the F-score calculation.
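The overall rows of the example report follow the usual definitions of precision, recall and the F-beta score, F = (1 + β²)·P·R / (β²·P + R). The sketch below recomputes them from per-class chunk counts. Counts, prf, averages and the other names are hypothetical and do not reflect Rusev's internals, and the sketch assumes that a zero denominator yields 0. Fed with the chunk counts of the example (NOTEST: 1 true, 1 predicted, 1 correct; OTHER: 0 true, 1 predicted; TEST: 2 true, 1 predicted, 1 correct), it reproduces the Overall_Micro, Overall_Macro and Overall_Weighted values up to f32 rounding.

```rust
/// Hypothetical per-class counts (not Rusev's types): true positives,
/// number of predicted chunks and number of true chunks (the support).
struct Counts {
    tp: f32,
    pred: f32,
    truth: f32,
}

/// Division that returns 0 when the denominator is 0.
fn safe_div(num: f32, den: f32) -> f32 {
    if den == 0.0 { 0.0 } else { num / den }
}

/// Precision, recall and F-beta score from raw counts.
fn prf(c: &Counts, beta: f32) -> (f32, f32, f32) {
    let p = safe_div(c.tp, c.pred);
    let r = safe_div(c.tp, c.truth);
    let b2 = beta * beta;
    let f = safe_div((1.0 + b2) * p * r, b2 * p + r);
    (p, r, f)
}

/// Weighted mean of `values` (0 if all weights are 0).
fn weighted_mean(values: &[f32], weights: &[f32]) -> f32 {
    let num: f32 = values.iter().zip(weights).map(|(v, w)| v * w).sum();
    safe_div(num, weights.iter().sum())
}

/// Returns [micro, macro, weighted] averages as (precision, recall, f-score).
fn averages(classes: &[Counts], beta: f32) -> [(f32, f32, f32); 3] {
    let per_class: Vec<(f32, f32, f32)> = classes.iter().map(|c| prf(c, beta)).collect();
    let p: Vec<f32> = per_class.iter().map(|m| m.0).collect();
    let r: Vec<f32> = per_class.iter().map(|m| m.1).collect();
    let f: Vec<f32> = per_class.iter().map(|m| m.2).collect();
    // Micro: pool the counts of every class, then compute the metrics once.
    let pooled = Counts {
        tp: classes.iter().map(|c| c.tp).sum(),
        pred: classes.iter().map(|c| c.pred).sum(),
        truth: classes.iter().map(|c| c.truth).sum(),
    };
    let micro = prf(&pooled, beta);
    // Macro: every class gets the same weight.
    let ones = vec![1.0_f32; classes.len()];
    let macro_avg = (weighted_mean(&p, &ones), weighted_mean(&r, &ones), weighted_mean(&f, &ones));
    // Weighted: each class is weighted by its support.
    let support: Vec<f32> = classes.iter().map(|c| c.truth).collect();
    let weighted = (weighted_mean(&p, &support), weighted_mean(&r, &support), weighted_mean(&f, &support));
    [micro, macro_avg, weighted]
}

fn main() {
    let classes = [
        Counts { tp: 1.0, pred: 1.0, truth: 1.0 }, // NOTEST
        Counts { tp: 0.0, pred: 1.0, truth: 0.0 }, // OTHER
        Counts { tp: 1.0, pred: 1.0, truth: 2.0 }, // TEST
    ];
    let [micro, macro_avg, weighted] = averages(&classes, 1.0);
    println!("micro    {micro:?}");
    println!("macro    {macro_avg:?}");
    println!("weighted {weighted:?}");
}
```

A class with no true chunks (OTHER here) still drags the macro average down, whereas the weighted average ignores it because its support is 0.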
§Structs
- ClassMetrics - Data structure holding metrics about a given class.
- Reporter - The reporter holds the per-class metrics and the overall metrics. It can be used to
display the results (i.e. pretty-print them) as if they were collected into a dataframe, and it
can be consumed to obtain a BTreeSet containing the metrics. The reporter can be built with the
classification_report function.
- RusevConfig - Config struct used to simplify passing parameters to the main functions of
Rusev. It implements the Default trait.
- RusevConfigBuilder - This builder can be used to build and customize a RusevConfig structure.
§Enums
- Average - Enumeration of the different types of averaging supported by this crate. An Average
can be parsed from a &str.
- ComputationError - Error enum encompassing the many types of failure that can happen when computing the precision, recall, F-score and support.
- DivByZeroStrat - How should a division by zero be handled? We can replace the denominator by 1,
return an error, or replace the result of the division with 0. SeqEval uses the ReplaceBy0
strategy by default. Using ReturnError is not recommended because it stops the computation; it can be useful if you believe no denominator should ever be 0.
- SchemeType - Enumeration of the supported schemes. They indicate how the tokens should be parsed and chunked.
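The three division-by-zero policies described above can be sketched as follows. The enum ZeroDivPolicy, its variant DenominatorBy1, the DivByZero error and the divide function are hypothetical; only the ReplaceBy0 and ReturnError names come from the description above, and Rusev's actual DivByZeroStrat may be shaped differently.

```rust
/// Hypothetical mirror of the three policies (not Rusev's exact enum).
#[derive(Debug, Clone, Copy)]
enum ZeroDivPolicy {
    /// Replace a zero denominator by 1.
    DenominatorBy1,
    /// Replace the result of the division by 0.
    ReplaceBy0,
    /// Abort the computation with an error.
    ReturnError,
}

/// Hypothetical error returned under the ReturnError policy.
#[derive(Debug, PartialEq)]
struct DivByZero;

fn divide(num: f32, den: f32, policy: ZeroDivPolicy) -> Result<f32, DivByZero> {
    if den != 0.0 {
        return Ok(num / den);
    }
    match policy {
        // num / 1 is simply num.
        ZeroDivPolicy::DenominatorBy1 => Ok(num),
        ZeroDivPolicy::ReplaceBy0 => Ok(0.0),
        ZeroDivPolicy::ReturnError => Err(DivByZero),
    }
}

fn main() {
    // Precision of a class that was never predicted: 0 true positives / 0 predictions.
    let policies = [
        ZeroDivPolicy::DenominatorBy1,
        ZeroDivPolicy::ReplaceBy0,
        ZeroDivPolicy::ReturnError,
    ];
    for policy in policies {
        println!("{policy:?}: {:?}", divide(0.0, 0.0, policy));
    }
}
```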
§Functions
- classification_report - One of the main entry points of the Rusev library. This function computes the precision, recall, F-score and support of the true and predicted tokens. It returns information about the individual classes and different overall averages. The returned structure can be used to pretty-print the results or be converted into a HashSet.
- classification_report_conf - One of the main entry points of the Rusev library. It computes and
returns the same information as classification_report, but instead of taking the raw parameters,
it takes a RusevConfig struct and uses sensible defaults.
- precision_recall_fscore_support - One of the main entry points of the Rusev library. This function computes the precision, recall,
F-score and support of the true and predicted tokens. It does NOT check the lengths of
y_true and y_pred.
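Because precision_recall_fscore_support does not check the lengths of y_true and y_pred, a caller may want to validate them first. Below is a minimal sketch of such a guard; check_lengths is a hypothetical helper, not part of Rusev, and it assumes the inputs are lists of sequences of &str, as in the example above.

```rust
/// Hypothetical caller-side guard: checks that y_true and y_pred contain the
/// same number of sequences and that each pair of sequences has the same length.
fn check_lengths(y_true: &[Vec<&str>], y_pred: &[Vec<&str>]) -> Result<(), String> {
    if y_true.len() != y_pred.len() {
        return Err(format!(
            "{} true sequences but {} predicted sequences",
            y_true.len(),
            y_pred.len()
        ));
    }
    for (i, (t, p)) in y_true.iter().zip(y_pred).enumerate() {
        if t.len() != p.len() {
            return Err(format!(
                "sequence {i}: {} true tokens but {} predicted tokens",
                t.len(),
                p.len()
            ));
        }
    }
    Ok(())
}

fn main() {
    let y_true = vec![vec!["B-TEST", "O"], vec!["B-PER"]];
    let y_pred = vec![vec!["B-TEST", "O"], vec!["B-PER", "O"]];
    match check_lengths(&y_true, &y_pred) {
        Ok(()) => println!("inputs are consistent"),
        Err(e) => println!("refusing to compute metrics: {e}"),
    }
}
```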
§Type Aliases
- DefaultRusevConfig - Reasonable default configuration when computing metrics.
- PrecisionRecallFScoreTrueSum - Type alias representing the output of
precision_recall_fscore_support. Each element contains a one-dimensional array of f32s: the first
holds the precision, the second the recall, the third the F-score and the last one the support.