Crate stop_words

About

Stop words are words that carry little meaning on their own and are typically removed as a preprocessing step before text analysis or natural language processing. This crate provides common stop words for a variety of languages, drawing its lists from this resource and from NLTK.

This crate currently includes the following languages:

  • Arabic
  • Azerbaijani
  • Bulgarian
  • Catalan
  • Czech
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Hebrew
  • Hindi
  • Hungarian
  • Indonesian
  • Italian
  • Kazakh
  • Nepali
  • Norwegian
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Slovak
  • Slovenian
  • Spanish
  • Swedish
  • Tajik
  • Turkish
  • Ukrainian
  • Vietnamese

Constants

LANGUAGES

Constant containing an array of available language names, spelled out

LANGUAGES_ISO_693_1

Constant containing an array of available languages, as ISO 639-1 codes

LANGUAGES_ISO_693_2T

Constant containing an array of available languages, as ISO 639-2/T codes
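
Since a language can be identified by its name or by either kind of code, a caller may want to normalize user input before looking anything up. The sketch below shows one way to do that. It does not use the crate: `NAMES`, `CODES_1`, `CODES_2T`, and `resolve_language` are hypothetical stand-ins, and it assumes the three arrays are parallel (same language at the same index), which this page does not confirm for the real constants.

```rust
// Hypothetical stand-ins for the three constants described above.
// The real arrays cover every listed language and may be ordered or cased differently.
const NAMES: [&str; 3] = ["english", "french", "german"];
const CODES_1: [&str; 3] = ["en", "fr", "de"]; // ISO 639-1
const CODES_2T: [&str; 3] = ["eng", "fra", "deu"]; // ISO 639-2/T

/// Maps a spelled-out name, a two-letter code, or a three-letter code
/// to the spelled-out name, or returns `None` if nothing matches.
/// Assumes the three arrays are parallel.
fn resolve_language(input: &str) -> Option<&'static str> {
    let needle = input.trim().to_lowercase();
    let needle = needle.as_str();

    NAMES
        .iter()
        .position(|name| *name == needle)
        .or_else(|| CODES_1.iter().position(|code| *code == needle))
        .or_else(|| CODES_2T.iter().position(|code| *code == needle))
        .map(|index| NAMES[index])
}

fn main() {
    for input in ["en", "FRA", " German ", "xx"] {
        println!("{:?} -> {:?}", input, resolve_language(input));
    }
}
```

Trimming and lowercasing first keeps the comparison forgiving about how the caller spells the identifier; unknown input yields `None` rather than a panic.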

Functions

get

The only function you'll ever need! Given a language code or name, it returns common stop words as a Vec<String>.
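
As a rough illustration of what a caller might do with such a Vec<String>, here is a minimal sketch of the stop word removal step. It does not call the crate: `sample_stop_words` is a hypothetical stand-in returning a tiny hand-written list, not the crate's actual English list, and `remove_stop_words` is an illustrative helper rather than part of this API.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the list that `get` is described as returning.
/// A tiny hand-written sample, not the crate's real English list.
fn sample_stop_words() -> Vec<String> {
    ["the", "a", "is", "on", "and"]
        .iter()
        .map(|word| word.to_string())
        .collect()
}

/// Lowercases the text, splits it on non-alphanumeric characters,
/// and keeps only the tokens that are not in the stop word list.
fn remove_stop_words(text: &str, stop_words: &[String]) -> Vec<String> {
    // A set gives constant-time membership checks per token.
    let stop_set: HashSet<&str> = stop_words.iter().map(String::as_str).collect();
    let lowered = text.to_lowercase();

    let kept: Vec<String> = lowered
        .split(|c: char| !c.is_alphanumeric())
        .filter(|token| !token.is_empty() && !stop_set.contains(token))
        .map(String::from)
        .collect();
    kept
}

fn main() {
    let stop_words = sample_stop_words();
    let kept = remove_stop_words("The cat is on the mat, and a dog barks.", &stop_words);
    println!("{:?}", kept);
}
```

Converting the list to a `HashSet` once up front avoids scanning the whole vector for every token, which matters for longer lists and documents.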

get_nltk

Ok, you might need this function too. It fetches stop words specifically from the NLTK lists.