Crate stop_words

About

Stop words are words that carry little meaning and are typically removed as a preprocessing step before text analysis or natural language processing. This crate provides common stop words for a variety of languages, using the stop word lists from Stopwords ISO and NLTK.

Usage

Using this crate is fairly straightforward:

// Get the stop words
let words = stop_words::get(stop_words::LANGUAGE::English);

// Print them
for word in words {
    println!("{}", word);
}

The get function accepts either a member of the LANGUAGE enum or a two-letter ISO 639-1 language code, given as either a str or a String.
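
As an illustration of how a single function can accept an enum member, a str, or a String, here is a minimal sketch built on a generic `Into<String>` parameter. The `Language` enum and the `language_code` function are hypothetical stand-ins written for this example; they are not the crate's actual definitions, and the crate may well be implemented differently.

```rust
// Hypothetical stand-in for an enum of languages (NOT the crate's LANGUAGE enum).
#[derive(Debug, Clone, Copy)]
enum Language {
    English,
    French,
}

// Converting the enum into a String yields its two-letter code.
impl From<Language> for String {
    fn from(lang: Language) -> String {
        match lang {
            Language::English => "en".to_string(),
            Language::French => "fr".to_string(),
        }
    }
}

/// Hypothetical helper: accepts a Language member, a &str, or a String,
/// and returns the language code as a String.
fn language_code<T: Into<String>>(input: T) -> String {
    input.into()
}

fn main() {
    // All three argument types go through the same generic function.
    println!("{}", language_code(Language::English));
    println!("{}", language_code(Language::French));
    println!("{}", language_code("en"));
    println!("{}", language_code(String::from("en")));
}
```

The appeal of this pattern is that callers who already hold a language code as text do not need to map it back to an enum member first.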

You can find a complete example of how to read in a text file and remove stop words here.
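
For a rough idea of what stop word removal looks like, here is a minimal sketch that filters an in-memory string rather than a file. The `remove_stop_words` function is a hypothetical helper, and the short hard-coded word list is an assumption standing in for the list that `stop_words::get` would return, so that the sketch runs without any dependencies.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the tokens of `text` that are not stop words.
/// Each token is stripped of surrounding punctuation and lowercased first.
fn remove_stop_words(text: &str, stop_words: &HashSet<String>) -> Vec<String> {
    text.split_whitespace()
        .map(|token| {
            token
                .trim_matches(|c: char| !c.is_alphanumeric())
                .to_lowercase()
        })
        .filter(|token| !token.is_empty() && !stop_words.contains(token))
        .collect()
}

fn main() {
    // Stand-in list (an assumption for this sketch). With the crate, the words
    // would come from stop_words::get(stop_words::LANGUAGE::English) instead.
    let stop_words: HashSet<String> = ["the", "is", "a", "of", "and"]
        .iter()
        .map(|w| w.to_string())
        .collect();

    let kept = remove_stop_words("The quick fox is a friend of the dog.", &stop_words);
    println!("{:?}", kept); // prints ["quick", "fox", "friend", "dog"]
}
```

Storing the stop words in a `HashSet` keeps each membership check cheap, which matters when filtering long texts.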

Language Availability

This crate supports all languages from Stopwords ISO and from NLTK. The table below lists the languages by ISO 639-1 code.

Language Coverage Table
ISO 639-1 Code | Language | Stopwords ISO | NLTK
aa | Afar
ab | Abkhazian
af | Afrikaans
ak | Akan
sq | Albanian
am | Amharic
ar | Arabic
an | Aragonese
hy | Armenian
as | Assamese
av | Avaric
ae | Avestan
ay | Aymara
az | Azerbaijani
ba | Bashkir
bm | Bambara
eu | Basque
be | Belarusian
bn | Bengali
bh | Bihari languages
bi | Bislama
bo | Tibetan
bs | Bosnian
br | Breton
bg | Bulgarian
my | Burmese
ca | Catalan; Valencian
cs | Czech
ch | Chamorro
ce | Chechen
zh | Chinese
cu | Church Slavic; Old Slavonic; Church Slavonic; Old Bulgarian; Old Church Slavonic
cv | Chuvash
kw | Cornish
co | Corsican
cr | Cree
cy | Welsh
da | Danish
de | German
dv | Divehi; Dhivehi; Maldivian
nl | Dutch; Flemish
dz | Dzongkha
el | Greek, Modern (1453-)
en | English
eo | Esperanto
et | Estonian
ee | Ewe
fo | Faroese
fa | Persian
fj | Fijian
fi | Finnish
fr | French
fy | Western Frisian
ff | Fulah
ka | Georgian
gd | Gaelic; Scottish Gaelic
ga | Irish
gl | Galician
gv | Manx
gn | Guarani
gu | Gujarati
ht | Haitian; Haitian Creole
ha | Hausa
he | Hebrew
hz | Herero
hi | Hindi
ho | Hiri Motu
hr | Croatian
hu | Hungarian
ig | Igbo
is | Icelandic
io | Ido
ii | Sichuan Yi; Nuosu
iu | Inuktitut
ie | Interlingue; Occidental
ia | Interlingua (International Auxiliary Language Association)
id | Indonesian
ik | Inupiaq
it | Italian
jv | Javanese
ja | Japanese
kl | Kalaallisut; Greenlandic
kn | Kannada
ks | Kashmiri
kr | Kanuri
kk | Kazakh
km | Central Khmer
ki | Kikuyu; Gikuyu
rw | Kinyarwanda
ky | Kirghiz; Kyrgyz
kv | Komi
kg | Kongo
ko | Korean
kj | Kuanyama; Kwanyama
ku | Kurdish
lo | Lao
la | Latin
lv | Latvian
li | Limburgan; Limburger; Limburgish
ln | Lingala
lt | Lithuanian
lb | Luxembourgish; Letzeburgesch
lu | Luba-Katanga
lg | Ganda
mk | Macedonian
mh | Marshallese
ml | Malayalam
mi | Maori
mr | Marathi
ms | Malay
mg | Malagasy
mt | Maltese
mn | Mongolian
na | Nauru
nv | Navajo; Navaho
nr | Ndebele, South; South Ndebele
nd | Ndebele, North; North Ndebele
ng | Ndonga
ne | Nepali
nn | Norwegian Nynorsk; Nynorsk, Norwegian
nb | Bokmål, Norwegian; Norwegian Bokmål
no | Norwegian
ny | Chichewa; Chewa; Nyanja
oc | Occitan (post 1500)
oj | Ojibwa
or | Oriya
om | Oromo
os | Ossetian; Ossetic
pa | Panjabi; Punjabi
pi | Pali
pl | Polish
pt | Portuguese
ps | Pushto; Pashto
qu | Quechua
rm | Romansh
ro | Romanian; Moldavian; Moldovan
rn | Rundi
ru | Russian
sg | Sango
sa | Sanskrit
si | Sinhala; Sinhalese
sk | Slovak
sl | Slovenian
se | Northern Sami
sm | Samoan
sn | Shona
sd | Sindhi
so | Somali
st | Sotho, Southern
es | Spanish; Castilian
sc | Sardinian
sr | Serbian
ss | Swati
su | Sundanese
sw | Swahili
sv | Swedish
ty | Tahitian
ta | Tamil
tt | Tatar
te | Telugu
tg | Tajik
tl | Tagalog
th | Thai
ti | Tigrinya
to | Tonga (Tonga Islands)
tn | Tswana
ts | Tsonga
tk | Turkmen
tr | Turkish
tw | Twi
ug | Uighur; Uyghur
uk | Ukrainian
ur | Urdu
uz | Uzbek
ve | Venda
vi | Vietnamese
vo | Volapük
wa | Walloon
wo | Wolof
xh | Xhosa
yi | Yiddish
yo | Yoruba
za | Zhuang; Chuang
zu | Zulu

Enums

LANGUAGE: Enum containing available language names.

Functions

get: This function is the only one you’ll ever need! It fetches the stop words for a language, given either a member of the LANGUAGE enum or a two-letter ISO 639-1 language code as either a str or a String.