Crate stopwords

This library provides stopwords datasets from popular text processing engines.

This can help reproduce the results of text analysis pipelines written in different languages and with different tools.

Usage

[dependencies]
stopwords = "0.1.0"

extern crate stopwords;

use std::collections::HashSet;
use stopwords::{Spark, Language, Stopwords};

fn main() {
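    // Collect the provider's English stopword list into a set for fast lookups.
    let stops: HashSet<_> = Spark::stopwords(Language::English).unwrap().iter().collect();
    let mut tokens = vec!["broccoli", "is", "good", "to", "eat"];
    // Keep only the tokens that are not stopwords.
    tokens.retain(|s| !stops.contains(s));
    assert_eq!(tokens, vec!["broccoli", "good", "eat"]);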
}

Structs

LanguageError

Language parse error.

NLTK

Data from NLTK, the Python Natural Language Toolkit.

SkLearn

Data from scikit-learn, a Python machine learning library.

Spark

Data from Apache Spark, an engine for large-scale data processing written in Scala.

Enums

Language

Supported languages. Each provider supports only a subset of this list.

Traits

Stopwords

Interface for getting stopwords from different providers.
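
Because each provider supports only a subset of languages, code that is generic over providers may want to handle a missing list instead of calling unwrap. The sketch below is one way this could look. It does not use the crate itself: `Language`, `Stopwords`, and `DemoProvider` are hypothetical stand-ins defined locally so the example compiles on its own, the `Option` return type is an assumption based on the `unwrap()` call in the usage example, and `remove_stopwords` is an illustrative helper, not part of the crate.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins mirroring the assumed shape of the crate's API.
// With the real crate you would instead write
// `use stopwords::{Language, Stopwords};` and pick a real provider.
#[derive(Clone, Copy, Debug)]
pub enum Language {
    English,
    German,
}

pub trait Stopwords {
    // Assumed signature: `None` when the provider has no list for `language`.
    fn stopwords(language: Language) -> Option<&'static [&'static str]>;
}

// Tiny illustrative list, not real provider data.
const DEMO_ENGLISH: &[&str] = &["is", "to", "the"];

// Hypothetical provider that only knows English.
pub struct DemoProvider;

impl Stopwords for DemoProvider {
    fn stopwords(language: Language) -> Option<&'static [&'static str]> {
        match language {
            Language::English => Some(DEMO_ENGLISH),
            Language::German => None,
        }
    }
}

// Illustrative helper: filters `tokens` with the list from provider `P`.
// If `P` does not support `language`, the tokens are returned unchanged.
pub fn remove_stopwords<'a, P: Stopwords>(
    tokens: &[&'a str],
    language: Language,
) -> Vec<&'a str> {
    match P::stopwords(language) {
        Some(list) => {
            let stops: HashSet<&str> = list.iter().copied().collect();
            tokens
                .iter()
                .copied()
                .filter(|t| !stops.contains(*t))
                .collect()
        }
        None => tokens.to_vec(),
    }
}
```

Calling `remove_stopwords::<DemoProvider>(&tokens, Language::English)` selects the provider through the type parameter, so switching providers only changes the call site. Returning the tokens unchanged for an unsupported language is one possible policy; returning an error to the caller would be another.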