Crate rusev

This library is a re-implementation of the SeqEval library. It is built with a focus on performance and soundness.

§SCHEMES

The following schemes are currently supported:

IOB1: I marks a token inside a chunk, O marks a token outside any chunk, and B marks the beginning of a chunk that immediately follows another chunk of the same named entity.
IOB2: Same as IOB1, except that a B tag is given to every token at the beginning of a chunk.
IOE1: An E tag marks the last token of a chunk that immediately precedes another chunk of the same named entity.
IOE2: Same as IOE1, except that an E tag is given to every token at the end of a chunk.
BILOU/IOBES: ‘L’ (Last) and ‘E’ (End) mark the last token of a chunk, while ‘U’ (Unit) and ‘S’ (Single) mark a chunk made of a single token.

The BILOU and IOBES schemes are only supported in strict mode.
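
To make the difference between IOB1 and IOB2 concrete, here is a minimal sketch that rewrites an IOB1 tag sequence as IOB2. It does not use the rusev API: the function name `iob1_to_iob2` is hypothetical, and the code assumes well-formed IOB1 input with tags of the form `PREFIX-CLASS` or `O`.

```rust
/// Rewrite a well-formed IOB1 tag sequence as IOB2.
/// Hypothetical helper for illustration; it is not part of rusev.
fn iob1_to_iob2(tags: &[&str]) -> Vec<String> {
    let mut out = Vec::with_capacity(tags.len());
    // Class of the previous token, or None if it was outside any chunk.
    let mut prev_class: Option<&str> = None;
    for &tag in tags {
        let parsed = tag.split_once('-');
        match parsed {
            Some((prefix, class)) => {
                // In IOB2 every chunk starts with B: either the tag was
                // already B, or the previous token was not in this class.
                let starts_chunk = prefix == "B" || prev_class != Some(class);
                let new_prefix = if starts_chunk { "B" } else { "I" };
                out.push(format!("{}-{}", new_prefix, class));
            }
            // Tags without a class (such as "O") are copied unchanged.
            None => out.push(tag.to_string()),
        }
        prev_class = parsed.map(|(_, class)| class);
    }
    out
}
```

For example, the IOB1 sequence `I-PER I-PER O I-LOC B-LOC` becomes `B-PER I-PER O B-LOC B-LOC`: only the chunk starts change, and the `B-LOC` that already separated two adjacent LOC chunks is kept.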

This library partially reuses the terminology of the SeqEval library. The concepts might not map one to one.

A class is an entity we are interested in, such as ‘LOC’ for location, ‘PER’ for person, ‘GEO’ for geography, etc. It can be anything.
A token is a string containing a class, such as GEO, LOC or PER, and a prefix. The prefix indicates where the token sits in the current chunk. For a given scheme, the possible prefixes are the letters of the scheme, such as I, O and B, or I, O and E. A prefix can only be a single ASCII character.
A chunk is a list of at least one token associated with a named entity.
A scheme gives us enough information to parse a list of tokens into chunks.
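
As an illustration of how tokens, chunks and schemes relate, the sketch below groups IOB2 tokens into chunks. It is not the crate's parser: the `Chunk` struct and `chunks_iob2` function are hypothetical names, and the parsing is lenient (an `I` token that does not continue the open chunk simply starts a new one), which may differ from rusev's behaviour.

```rust
/// A run of tokens belonging to one named entity.
/// Hypothetical type for illustration; it is not part of rusev.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Chunk {
    class: String,
    start: usize, // index of the first token
    end: usize,   // index of the last token (inclusive)
}

/// Leniently group IOB2 tokens ("B-PER", "I-PER", "O", ...) into chunks.
fn chunks_iob2(tokens: &[&str]) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    // Class and start index of the chunk currently being built, if any.
    let mut open: Option<(&str, usize)> = None;
    for (i, &token) in tokens.iter().enumerate() {
        let parsed = token.split_once('-');
        // An I token with the same class extends the open chunk.
        let continues = match (parsed, open) {
            (Some(("I", class)), Some((open_class, _))) => class == open_class,
            _ => false,
        };
        if continues {
            continue;
        }
        // Anything else closes the open chunk at the previous token.
        if let Some((class, start)) = open.take() {
            chunks.push(Chunk { class: class.to_string(), start, end: i - 1 });
        }
        // B (or a stray I) opens a new chunk; O and unknown tags do not.
        if let Some((prefix, class)) = parsed {
            if prefix == "B" || prefix == "I" {
                open = Some((class, i));
            }
        }
    }
    // Close a chunk that runs to the end of the input.
    if let Some((class, start)) = open {
        chunks.push(Chunk { class: class.to_string(), start, end: tokens.len() - 1 });
    }
    chunks
}
```

With the tokens `B-PER I-PER O B-LOC B-LOC I-LOC`, this yields three chunks: PER over tokens 0 to 1, LOC over token 3, and LOC over tokens 4 to 5.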

ClassMetrics
Data structure holding the metrics of a given class.
Reporter
The reporter holds the metrics of a given class and the overall metrics. It can be used to display the results (i.e. pretty-print them) as if they were collected into a dataframe, and it can be consumed to obtain a BTreeSet containing the metrics. The reporter can be built with the classification_report function.
RusevConfig
Config struct used to simplify passing parameters to the main functions of Rusev. It implements the Default trait.
RusevConfigBuilder
This builder can be used to build and customize a RusevConfig structure.

Average
ComputationError
Error enum encompassing the many types of failures that can happen when computing the precision, recall, f-score and support.
DivByZeroStrat
How do we handle cases with a division by zero? Do we replace the denominator by 1, return an error, or replace the division result with 0? SeqEval uses the ReplaceBy0 strategy by default. Using ReturnError is not recommended, because it stops the computation. It can be useful if you believe there should be no 0 in the denominator.
SchemeType
Enumeration of the supported schemes. They are used to indicate how we are supposed to parse and chunk the different tokens.
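
The three division-by-zero strategies described under DivByZeroStrat can be sketched as follows. This is an illustration only: the enum `ZeroDivision`, its variant names and the `divide` function are assumptions made for this example and are not the crate's definitions.

```rust
/// The three ways of handling a zero denominator.
/// Hypothetical enum for illustration; it is not rusev's DivByZeroStrat.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ZeroDivision {
    /// Divide by 1 instead of 0.
    ReplaceDenominatorBy1,
    /// Abort the computation with an error.
    ReturnError,
    /// Use 0 as the result of the division.
    ReplaceResultBy0,
}

/// Divide `num` by `den`, applying `strategy` only when `den` is zero.
fn divide(num: f32, den: f32, strategy: ZeroDivision) -> Result<f32, String> {
    if den != 0.0 {
        return Ok(num / den);
    }
    match strategy {
        ZeroDivision::ReplaceDenominatorBy1 => Ok(num / 1.0),
        ZeroDivision::ReturnError => Err(format!("division of {} by zero", num)),
        ZeroDivision::ReplaceResultBy0 => Ok(0.0),
    }
}
```

A zero denominator typically appears when a class has no predicted chunks (precision) or no true chunks (recall), so replacing the result by 0 lets the remaining classes still be scored.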

classification_report
Main entrypoint of the Rusev library. This function computes the precision, recall, fscore and support of the true and predicted tokens. It returns information about the individual classes and different overall averages. The returned structure can be used to prettyprint the results or be converted into a HashSet.
classification_report_conf
Main entrypoint of the Rusev library. This function computes the precision, recall, fscore and support of the true and predicted tokens. It returns information about the individual classes and different overall averages. The returned structure can be used to prettyprint the results or be converted into a HashSet. Instead of taking in the raw parameters, this function takes a RusevConfig struct.
precision_recall_fscore_support
One of the main entrypoints of the Rusev library. This function computes the precision, recall, fscore and support of the true and predicted tokens. This method does NOT check the lengths of y_true and y_pred.
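
For readers unfamiliar with the metrics named above, the following sketch computes per-class precision, recall, f-score and support from two sets of chunks. It is not the rusev implementation and does not call its API: `Entity`, `Scores`, `ratio` and `per_class_scores` are hypothetical names, chunks are assumed to be `(class, start, end)` triples, a predicted chunk counts as correct only on an exact match, and a zero denominator is replaced by a result of 0.

```rust
use std::collections::{BTreeMap, HashSet};

/// A chunk identified by (class, first token index, last token index).
type Entity = (String, usize, usize);

/// Metrics for one class. Hypothetical type; it is not part of rusev.
#[derive(Debug, PartialEq)]
struct Scores {
    precision: f32,
    recall: f32,
    f1: f32,
    support: usize, // number of true chunks of this class
}

/// Division that yields 0 when the denominator is 0.
fn ratio(num: f32, den: f32) -> f32 {
    if den == 0.0 { 0.0 } else { num / den }
}

fn per_class_scores(truth: &HashSet<Entity>, pred: &HashSet<Entity>) -> BTreeMap<String, Scores> {
    // class -> (true positives, predicted chunks, true chunks)
    let mut counts: BTreeMap<&str, (usize, usize, usize)> = BTreeMap::new();
    for entity in truth {
        counts.entry(entity.0.as_str()).or_default().2 += 1;
    }
    for entity in pred {
        let c = counts.entry(entity.0.as_str()).or_default();
        c.1 += 1;
        // Exact match on class and boundaries counts as a true positive.
        if truth.contains(entity) {
            c.0 += 1;
        }
    }
    counts
        .into_iter()
        .map(|(class, (tp, n_pred, n_true))| {
            let precision = ratio(tp as f32, n_pred as f32);
            let recall = ratio(tp as f32, n_true as f32);
            let f1 = ratio(2.0 * precision * recall, precision + recall);
            (class.to_string(), Scores { precision, recall, f1, support: n_true })
        })
        .collect()
}
```

If the true chunks are PER over tokens 0 to 1, LOC over token 3 and LOC over tokens 4 to 5, and the predictions are PER over 0 to 1 and a single LOC over 3 to 5, then PER scores 1.0 on all three metrics with a support of 1, while LOC scores 0 on all three with a support of 2, because the predicted LOC boundaries match neither true chunk.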

DefaultRusevConfig
Reasonable default configuration for computing metrics.
PrecisionRecallFScoreTrueSum
Type alias representing the output of precision_recall_fscore_support. Each array contains a vector of f32. The first array contains the precision, the second the recall, the third the f-score and the last one the support.