List of all items
Structs
Functions
- segmenter::split_multi
- segmenter::split_newline
- segmenter::split_single
- segmenter::to_unix_linebreaks
- tokenizer::space_tokenizer
- tokenizer::split_contractions
- tokenizer::split_possessive_markers
- tokenizer::symbol_tokenizer
- tokenizer::web_tokenizer
- tokenizer::word_tokenizer
Statics
- segmenter::ABBREVIATIONS
- segmenter::BEFORE_LOWER
- segmenter::CONTINUATIONS
- segmenter::DO_NOT_CROSS_LINES
- segmenter::LONE_WORD
- segmenter::LOWER_WORD
- segmenter::MAY_CROSS_ONE_LINE
- segmenter::MIDDLE_INITIAL_END
- segmenter::NON_UNIX_LINEBREAK
- segmenter::UPPER_CASE_END
- segmenter::UPPER_CASE_START
- segmenter::UPPER_WORD_START
- segmenter::dates::ENDS_IN_DATE_DIGITS
- segmenter::dates::MONTH
- tokenizer::APOSTROPHE_LIKE
- tokenizer::HYPHENATED_LINEBREAK
- tokenizer::IS_CONTRACTION
- tokenizer::IS_POSSESSIVE
- tokenizer::SPACES
- tokenizer::SYMBOLIC
- tokenizer::URI_OR_MAIL
- tokenizer::WORD_BITS