Crate stop_words
About
Stop words are words that carry little meaning on their own and are typically removed as a preprocessing step before text analysis or natural language processing. This crate contains common stop words for a variety of languages, drawing its lists from this resource and from NLTK.
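To make the preprocessing step concrete, here is a minimal sketch of stop word removal using only the standard library. The function name `remove_stop_words` and the caller-supplied word list are assumptions made for illustration; they are not part of this crate, and in practice the list would come from the crate's own functions.

```rust
use std::collections::HashSet;

/// Removes stop words from `text`, comparing case-insensitively.
/// `stop_words` is a hypothetical, caller-supplied list (not this crate's data).
fn remove_stop_words(text: &str, stop_words: &[&str]) -> Vec<String> {
    // A HashSet gives constant-time membership checks on average.
    let stops: HashSet<String> = stop_words.iter().map(|w| w.to_lowercase()).collect();

    text.split_whitespace()
        // Strip leading/trailing punctuation so that "dog." matches "dog".
        .map(|token| {
            token
                .trim_matches(|c: char| !c.is_alphanumeric())
                .to_lowercase()
        })
        // Drop tokens that were only punctuation, and drop the stop words.
        .filter(|token| !token.is_empty() && !stops.contains(token))
        .collect()
}
```

Calling `remove_stop_words("The quick brown fox jumps over the lazy dog.", &["the", "over"])` keeps only the content-bearing tokens, lowercased. Whitespace splitting is a deliberate simplification; real text usually calls for a proper tokenizer.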
This crate currently includes the following languages:
- Arabic
- Azerbaijani
- Bulgarian
- Catalan
- Czech
- Danish
- Dutch
- English
- Finnish
- French
- German
- Greek
- Hebrew
- Hindi
- Hungarian
- Indonesian
- Italian
- Kazakh
- Nepali
- Norwegian
- Polish
- Portuguese
- Romanian
- Russian
- Slovak
- Slovenian
- Spanish
- Swedish
- Tajik
- Turkish
- Ukrainian
- Vietnamese
Constants
LANGUAGES | Constant containing an array of available language names, spelled out |
LANGUAGES_ISO_693_1 | Constant containing an array of available language names, using ISO 639-1 codes |
LANGUAGES_ISO_693_2T | Constant containing an array of available language names, using ISO 639-2/T codes |
Functions
get | The only function you'll ever need! Given a language code or name, it returns the common stop words for that language. |
get_nltk | OK, you might need this function too. It fetches stop words specifically from the NLTK lists. |
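The idea of accepting either a language code or a spelled-out name can be sketched as a lookup over parallel arrays. Everything below is a hypothetical stand-in: `NAMES`, `CODES`, `SAMPLES`, and `lookup` are invented for illustration, the word lists are tiny samples, and returning `Option` for an unknown language is an assumption. This page does not show the crate's actual signatures or its behavior on unrecognized input.

```rust
// Hypothetical stand-ins: parallel arrays of language names and ISO 639-1 codes.
const NAMES: [&str; 3] = ["english", "french", "german"];
const CODES: [&str; 3] = ["en", "fr", "de"];

// Tiny illustrative samples, not the crate's actual lists.
const SAMPLES: [&[&str]; 3] = [
    &["the", "and", "of"],
    &["le", "la", "et"],
    &["der", "die", "und"],
];

/// Returns sample stop words for a language given either its spelled-out
/// name or its two-letter code, or `None` if the language is not recognized.
fn lookup(language: &str) -> Option<Vec<String>> {
    // Normalize so that " English " and "english" resolve identically.
    let key = language.trim().to_lowercase();

    // Try the spelled-out names first, then fall back to the codes.
    let index = NAMES
        .iter()
        .position(|name| *name == key)
        .or_else(|| CODES.iter().position(|code| *code == key))?;

    Some(SAMPLES[index].iter().map(|w| w.to_string()).collect())
}
```

With this sketch, `lookup("en")` and `lookup("English")` resolve to the same list, while an unknown input such as `lookup("klingon")` yields `None`. Keeping names and codes in index-aligned arrays means one position lookup serves both spellings.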