Crate stopwords

This library provides stopwords datasets from popular text processing engines.

This can help reproduce the results of text analysis pipelines written in different languages and with different tools.

§Usage

[dependencies]
stopwords = "0.1.0"

extern crate stopwords;

use std::collections::HashSet;
use stopwords::{Spark, Language, Stopwords};

fn main() {
    // Collect the English stopwords from the Spark provider into a set for fast lookup.
    let stops: HashSet<_> = Spark::stopwords(Language::English).unwrap().iter().collect();
    let mut tokens = vec!["broccoli", "is", "good", "to", "eat"];
    // Keep only the tokens that are not stopwords.
    tokens.retain(|s| !stops.contains(s));
    assert_eq!(tokens, vec!["broccoli", "good", "eat"]);
}
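
The filtering step in the example above can be pulled into a reusable helper. The following is a minimal sketch, not part of the crate: `remove_stopwords` is a hypothetical name, the stopword list is a tiny hand-written stand-in rather than the crate's data, and the lowercase comparison assumes that input tokens may be mixed-case while the list is lowercase.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the tokens that are not in `stops`,
/// comparing in lowercase so that "IS" is treated like "is".
fn remove_stopwords<'a>(tokens: &[&'a str], stops: &[&str]) -> Vec<&'a str> {
    // Build the lookup set once; membership checks are then O(1) on average.
    let stops: HashSet<String> = stops.iter().map(|s| s.to_lowercase()).collect();
    tokens
        .iter()
        .copied()
        .filter(|t| !stops.contains(&t.to_lowercase()))
        .collect()
}

fn main() {
    // Stand-in list for illustration only (an assumption, not the crate's data).
    let stops = ["is", "to", "the"];
    let kept = remove_stopwords(&["Broccoli", "IS", "good", "to", "eat"], &stops);
    assert_eq!(kept, vec!["Broccoli", "good", "eat"]);
}
```

Because the helper takes a plain slice of string slices, it could be fed a list obtained from any provider, as long as that list can be borrowed as `&[&str]`.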

Structs§

LanguageError
Language parse error.
NLTK
Data from NLTK - Python natural language toolkit.
SkLearn
Data from scikit-learn - Python machine learning library.
Spark
Data from Apache Spark - Scala engine for large-scale data processing.

Enums§

Language
Supported languages. Each provider supports only a subset of this list.

Traits§

Stopwords
Interface for getting stopwords from different providers.
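
Since each provider supports only a subset of the languages, code that is generic over providers has to handle a missing list. The sketch below illustrates the pattern with local stand-in types only: `Lang`, `Provider`, `ToyA`, `ToyB`, and `stopwords_or_empty` are all hypothetical, the word lists are made up, and the `Option`-returning signature is an assumption about the shape of the real `Stopwords` trait, which may differ.

```rust
// Hypothetical stand-ins that imitate the shape of a provider trait.
// None of these names or word lists come from the crate itself.
#[derive(Clone, Copy)]
enum Lang {
    English,
    Finnish,
}

trait Provider {
    /// Returns `None` when the provider has no list for `lang`.
    fn stopwords(lang: Lang) -> Option<&'static [&'static str]>;
}

const A_ENGLISH: &[&str] = &["is", "to"];
const B_ENGLISH: &[&str] = &["is", "to", "the"];
const B_FINNISH: &[&str] = &["ja", "on"];

struct ToyA; // supports English only
struct ToyB; // supports English and Finnish

impl Provider for ToyA {
    fn stopwords(lang: Lang) -> Option<&'static [&'static str]> {
        match lang {
            Lang::English => Some(A_ENGLISH),
            Lang::Finnish => None,
        }
    }
}

impl Provider for ToyB {
    fn stopwords(lang: Lang) -> Option<&'static [&'static str]> {
        match lang {
            Lang::English => Some(B_ENGLISH),
            Lang::Finnish => Some(B_FINNISH),
        }
    }
}

/// Generic over any provider: falls back to an empty list instead of panicking.
fn stopwords_or_empty<P: Provider>(lang: Lang) -> &'static [&'static str] {
    P::stopwords(lang).unwrap_or(&[])
}

fn main() {
    assert_eq!(stopwords_or_empty::<ToyA>(Lang::English).len(), 2);
    assert_eq!(stopwords_or_empty::<ToyA>(Lang::Finnish).len(), 0);
    assert_eq!(stopwords_or_empty::<ToyB>(Lang::Finnish).len(), 2);
}
```

Falling back to an empty list keeps a pipeline running for unsupported languages; calling `unwrap()` directly, as in the usage example, is the stricter choice when a missing list should be treated as a bug.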