Crate stop_words

About

Stop words are words that carry little meaning and are typically removed as a preprocessing step before text analysis or natural language processing. This crate contains common stop words for a variety of languages. All stop word lists are from this resource.

This crate currently includes the following languages:

  • Arabic
  • Bulgarian
  • Catalan
  • Czech
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Hebrew
  • Hindi
  • Hungarian
  • Indonesian
  • Italian
  • Norwegian
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Slovak
  • Spanish
  • Swedish
  • Turkish
  • Ukrainian
  • Vietnamese

Constants

LANGUAGES

Constant containing an array of the available language names, spelled out in full.

LANGUAGES_ISO_693_1

Constant containing an array of the available languages as two-letter ISO 639-1 codes.

LANGUAGES_ISO_693_2T

Constant containing an array of the available languages as three-letter ISO 639-2/T codes.

Functions

get

The only function you'll ever need! Given a language code or name, it returns the common stop words for that language as a Vec<String>.
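As a minimal sketch of how a Vec<String> like the one get returns might be used, the block below filters stop words out of a sentence. It does not call the crate: sample_stop_words is a hypothetical stand-in for a call such as get("english"), and remove_stop_words is an illustrative helper, not part of this crate's API.

```rust
/// Hypothetical stand-in for the list that `get` would return for English.
/// The real list is much longer; these four words are for illustration only.
fn sample_stop_words() -> Vec<String> {
    ["the", "a", "is", "on"].iter().map(|w| w.to_string()).collect()
}

/// Illustrative helper (not part of the crate): lowercases each
/// whitespace-separated token and drops those found in `stop_words`.
fn remove_stop_words(text: &str, stop_words: &[String]) -> Vec<String> {
    text.split_whitespace()
        .map(|token| token.to_lowercase())
        // Linear scan over the list for every token.
        .filter(|token| !stop_words.contains(token))
        .collect()
}
```

Each token here is checked with a linear scan over the list, which is fine for short inputs; for larger texts a set-based lookup is usually the better fit.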

vec_to_set

This function converts the standard Vec<String> output of get to a HashSet<String>.
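The conversion described above can be sketched with the standard library alone. The function to_set below is a hypothetical equivalent written for illustration, and is not claimed to be how vec_to_set is implemented; is_stop_word is likewise an assumed helper name.

```rust
use std::collections::HashSet;

/// Hypothetical equivalent of a Vec-to-set conversion:
/// consumes the vector and collects its words into a HashSet,
/// so duplicate entries collapse into one.
fn to_set(words: Vec<String>) -> HashSet<String> {
    words.into_iter().collect()
}

/// Illustrative membership check. `HashSet<String>` accepts a `&str`
/// key, so no allocation is needed per lookup.
fn is_stop_word(set: &HashSet<String>, token: &str) -> bool {
    set.contains(token)
}
```

A set is the usual choice when many tokens have to be tested, because each lookup is a hash probe rather than a scan of the whole list.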