List of all items
Structs
- ScrubConfig
- flash::FlashText
- flash::KeywordMatch
- subword::BpeTokenizer
- tokenize::Token
- tokenize::TokenRef
Enums
Traits
Functions
- fold::fold
- fold::strip_diacritics
- html::decode_entities
- ngram::char_ngrams
- ngram::token_ngrams
- ngram::word_ngrams
- scrub
- scrub_with
- similarity::char_ngram_jaccard
- similarity::trigram_jaccard
- similarity::weighted_word_char_ngram_jaccard
- similarity::word_jaccard
- spans::clean_span_boundary
- spans::clean_span_head
- spans::clean_span_tail
- stopwords::get
- stopwords::is_english_stopword
- tokenize::sentences
- tokenize::tokenize_refs_with_offsets
- tokenize::tokenize_with_offsets
- tokenize::words
- unicode::bidi_controls_with_offsets
- unicode::collapse_whitespace
- unicode::collapse_whitespace_into
- unicode::contains_bidi_controls
- unicode::contains_zero_width
- unicode::nfc
- unicode::nfd
- unicode::nfkc
- unicode::nfkd
- unicode::normalize_newlines
- unicode::normalize_newlines_into
- unicode::remove_bidi_controls
- unicode::remove_bidi_controls_into
- unicode::remove_zero_width
- unicode::remove_zero_width_into
- unicode::trim_lines_preserve_spaces
- unicode::zero_width_with_offsets
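
The index above gives item paths only, with no signatures or behavior. As a rough illustration of the technique that a name like similarity::trigram_jaccard suggests, here is a minimal standalone sketch of Jaccard similarity over character trigram sets. It does not call this crate. The names `char_trigrams` and `trigram_jaccard_sketch`, the `&str` to `f64` signature, and the choice to return 0.0 when neither input has a trigram are all assumptions made for this example; the crate's actual function may differ in signature, normalization, and edge-case handling.

```rust
use std::collections::HashSet;

/// Collect the set of distinct character trigrams in `text`.
/// Operates on Unicode scalar values (`char`), not on bytes.
/// Hypothetical helper for this sketch, not part of the crate.
fn char_trigrams(text: &str) -> HashSet<[char; 3]> {
    let chars: Vec<char> = text.chars().collect();
    // `windows(3)` yields nothing for inputs shorter than three chars.
    chars.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B| over character trigram sets.
/// Assumption for this sketch: if neither input has a trigram, return 0.0.
fn trigram_jaccard_sketch(a: &str, b: &str) -> f64 {
    let sa = char_trigrams(a);
    let sb = char_trigrams(b);
    let union = sa.union(&sb).count();
    if union == 0 {
        return 0.0;
    }
    let intersection = sa.intersection(&sb).count();
    intersection as f64 / union as f64
}
```

For example, "abcd" has trigrams {abc, bcd} and "bcde" has {bcd, cde}, so the sketch scores the pair 1/3: one shared trigram out of three distinct ones.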