Crate tokenizations

Functions for aligning tokenizations.

Functions

get_alignments

Returns the alignments between two tokenizations, a2b (from a to b) and b2a (from b to a), based on the shortest edit script (SES).
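The following is a minimal sketch of how such an alignment can be derived. It rests on several assumptions. The tokens on each side are joined into one character sequence. Characters are matched with an LCS-based diff, because a longest common subsequence is exactly what a shortest insert/delete edit script leaves unchanged. Each token is then mapped to every token on the other side that shares at least one matched character. The names `get_alignments_sketch`, `flatten`, `char_matches` and `project` are hypothetical, and `Vec<Vec<usize>>` is an assumed stand-in for `Alignment`. No text normalization is applied. The crate's own implementation and diff algorithm may differ.

```rust
/// Flattens tokens into one char sequence and records, for each char,
/// the index of the token it came from.
fn flatten(tokens: &[&str]) -> (Vec<char>, Vec<usize>) {
    let mut chars = Vec::new();
    let mut owner = Vec::new();
    for (t, tok) in tokens.iter().enumerate() {
        for c in tok.chars() {
            chars.push(c);
            owner.push(t);
        }
    }
    (chars, owner)
}

/// Matches chars of `a` and `b` along one longest common subsequence.
/// Returns per-char maps: `Some(j)` if matched, `None` if deleted/inserted.
fn char_matches(a: &[char], b: &[char]) -> (Vec<Option<usize>>, Vec<Option<usize>>) {
    let (n, m) = (a.len(), b.len());
    // dp[i][j] = LCS length of the suffixes a[i..] and b[j..].
    let mut dp = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            dp[i][j] = if a[i] == b[j] {
                dp[i + 1][j + 1] + 1
            } else {
                dp[i + 1][j].max(dp[i][j + 1])
            };
        }
    }
    // Walk the table forward, keeping equal chars and skipping the rest.
    let (mut a2b, mut b2a) = (vec![None; n], vec![None; m]);
    let (mut i, mut j) = (0, 0);
    while i < n && j < m {
        if a[i] == b[j] {
            a2b[i] = Some(j);
            b2a[j] = Some(i);
            i += 1;
            j += 1;
        } else if dp[i + 1][j] >= dp[i][j + 1] {
            i += 1; // delete a[i]
        } else {
            j += 1; // insert b[j]
        }
    }
    (a2b, b2a)
}

/// Lifts a char-level map to tokens: source token `s` maps to every
/// destination token that owns a char matched to one of `s`'s chars.
fn project(src_owner: &[usize], dst_owner: &[usize], map: &[Option<usize>], n_src: usize) -> Vec<Vec<usize>> {
    let mut out = vec![Vec::new(); n_src];
    for (ci, m) in map.iter().enumerate() {
        if let Some(cj) = *m {
            let (s, d) = (src_owner[ci], dst_owner[cj]);
            // Matches are monotone, so checking the last entry avoids duplicates.
            if out[s].last() != Some(&d) {
                out[s].push(d);
            }
        }
    }
    out
}

/// Hypothetical stand-in for `get_alignments`: returns (a2b, b2a).
fn get_alignments_sketch(a: &[&str], b: &[&str]) -> (Vec<Vec<usize>>, Vec<Vec<usize>>) {
    let (ca, oa) = flatten(a);
    let (cb, ob) = flatten(b);
    let (c_a2b, c_b2a) = char_matches(&ca, &cb);
    (project(&oa, &ob, &c_a2b, a.len()), project(&ob, &oa, &c_b2a, b.len()))
}

fn main() {
    let (a2b, b2a) = get_alignments_sketch(&["ab", "c"], &["a", "bc"]);
    println!("{:?}", a2b); // [[0, 1], [1]]
    println!("{:?}", b2a); // [[0], [0, 1]]
}
```

Here token `ab` overlaps both `a` and `bc`, so `a2b[0]` is `[0, 1]`. The dynamic-programming table uses O(n·m) memory in the total character counts. A production implementation would more likely use a linear-space diff such as Myers' algorithm.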

get_charmap

Returns the character mappings c_a2b (from a to b) and c_b2a (from b to a) based on the shortest edit script (SES).
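As a rough illustration of what a character mapping contains, the sketch below uses the following convention: `c_a2b[i] == Some(j)` means character `i` of `a` survives the edit script as character `j` of `b`, and `None` means the character was deleted or inserted. `Vec<Option<usize>>` is an assumed representation, since this page does not show how `CharMap` is defined. `charmap_sketch` is a hypothetical name. The diff is a plain LCS dynamic program with no normalization and is not necessarily what the crate does internally.

```rust
/// Hypothetical sketch: character maps between `a` and `b` from an LCS-based
/// diff. `a2b[i] == Some(j)` means char `i` of `a` corresponds to char `j` of `b`.
fn charmap_sketch(a: &str, b: &str) -> (Vec<Option<usize>>, Vec<Option<usize>>) {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let (n, m) = (a.len(), b.len());
    // dp[i][j] = length of the longest common subsequence of a[i..] and b[j..].
    let mut dp = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            dp[i][j] = if a[i] == b[j] {
                dp[i + 1][j + 1] + 1
            } else {
                dp[i + 1][j].max(dp[i][j + 1])
            };
        }
    }
    // Unmatched chars stay `None`: they are deletions (in a) or insertions (in b).
    let (mut a2b, mut b2a) = (vec![None; n], vec![None; m]);
    let (mut i, mut j) = (0, 0);
    while i < n && j < m {
        if a[i] == b[j] {
            a2b[i] = Some(j);
            b2a[j] = Some(i);
            i += 1;
            j += 1;
        } else if dp[i + 1][j] >= dp[i][j + 1] {
            i += 1;
        } else {
            j += 1;
        }
    }
    (a2b, b2a)
}

fn main() {
    let (a2b, b2a) = charmap_sketch("abc", "axbc");
    println!("{:?}", a2b); // [Some(0), Some(2), Some(3)]
    println!("{:?}", b2a); // [Some(0), None, Some(1), Some(2)]
}
```

The inserted `x` has no counterpart in `a`, so it maps to `None`. Every other character keeps its relative order on both sides.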

get_original_spans

Returns the spans of the tokens in original_text, as index pairs, based on the shortest edit script (SES). This is useful, for example, for mapping tokens produced from normalized text back to the original, unnormalized text.
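Below is a minimal sketch of span recovery. It assumes that spans are `(start, end)` pairs of character (not byte) indices with an exclusive `end`, and that a token with no matchable characters yields `None`. Normalization is modeled by a caller-supplied `normalize` function applied to both sides. `original_spans_sketch` and `normalize` are hypothetical names, and the real function's return type and normalization may differ. Characters are matched along a longest common subsequence, which is what a shortest insert/delete edit script leaves unchanged.

```rust
/// Hypothetical sketch: (start, end) char-index spans of `tokens` in `original`,
/// end exclusive. Both sides are passed through `normalize` before diffing.
fn original_spans_sketch<F: Fn(char) -> char>(
    tokens: &[&str],
    original: &str,
    normalize: F,
) -> Vec<Option<(usize, usize)>> {
    // Flatten tokens into one char sequence, remembering each char's token.
    let mut tok_chars = Vec::new();
    let mut owner = Vec::new();
    for (t, tok) in tokens.iter().enumerate() {
        for c in tok.chars() {
            tok_chars.push(normalize(c));
            owner.push(t);
        }
    }
    let orig_chars: Vec<char> = original.chars().map(|c| normalize(c)).collect();
    let (n, m) = (tok_chars.len(), orig_chars.len());
    // dp[i][j] = LCS length of tok_chars[i..] and orig_chars[j..].
    let mut dp = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            dp[i][j] = if tok_chars[i] == orig_chars[j] {
                dp[i + 1][j + 1] + 1
            } else {
                dp[i + 1][j].max(dp[i][j + 1])
            };
        }
    }
    // Each matched char extends its token's span to cover original index j.
    let mut spans: Vec<Option<(usize, usize)>> = vec![None; tokens.len()];
    let (mut i, mut j) = (0, 0);
    while i < n && j < m {
        if tok_chars[i] == orig_chars[j] {
            let t = owner[i];
            let span = match spans[t] {
                None => (j, j + 1),
                Some((start, _)) => (start, j + 1),
            };
            spans[t] = Some(span);
            i += 1;
            j += 1;
        } else if dp[i + 1][j] >= dp[i][j + 1] {
            i += 1; // token char absent from the original
        } else {
            j += 1; // original char absent from the tokens (e.g. punctuation)
        }
    }
    spans
}

fn main() {
    let spans = original_spans_sketch(&["hello", "world"], "Hello, World!", |c| c.to_ascii_lowercase());
    println!("{:?}", spans); // [Some((0, 5)), Some((7, 12))]
}
```

With ASCII lowercasing as the normalization, the lowercase tokens map back onto the capitalized original, and the comma and space are skipped.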

Type Definitions

Alignment
CharMap