This library provides stopwords datasets from popular text processing engines.
This can help reproduce the results of text analysis pipelines written in different languages and with different tools.
Usage§
[dependencies]
stopwords = "0.1.0"
extern crate stopwords;
use std::collections::HashSet;
use stopwords::{Spark, Language, Stopwords};
fn main() {
    // Collect the provider's stopword list into a set for fast lookup.
    let stops: HashSet<_> = Spark::stopwords(Language::English).unwrap().iter().collect();
    let mut tokens = vec!["broccoli", "is", "good", "to", "eat"];
    // Keep only the tokens that are not stopwords.
    tokens.retain(|s| !stops.contains(s));
    assert_eq!(tokens, vec!["broccoli", "good", "eat"]);
}
Structs§
- LanguageError - Language parse error (a parsing sketch follows this list).
- NLTK - Data from NLTK, the Python natural language toolkit.
- SkLearn - Data from scikit-learn, the Python machine learning library.
- Spark - Data from Apache Spark, the Scala engine for large-scale data processing.
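The LanguageError struct suggests that Language can be parsed from a string, presumably via FromStr with LanguageError as the error type. A minimal sketch under that assumption; the accepted spelling ("english") is also assumed:

extern crate stopwords;

use std::str::FromStr;
use stopwords::{Language, Spark, Stopwords};

fn main() {
    // Assumption: `Language` implements `FromStr` with `LanguageError` as the
    // error type and accepts a lowercase name such as "english".
    match Language::from_str("english") {
        Ok(lang) => {
            let count = Spark::stopwords(lang).map(|w| w.len()).unwrap_or(0);
            println!("Spark ships {} stopwords for that language", count);
        }
        Err(_) => println!("unrecognized language name"),
    }
}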
Enums§
- Language - Supported languages. Each provider supports only a subset of this list (see the sketch after this list for handling an unsupported combination).
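Because each provider covers only a subset of languages, stopwords returns an Option rather than panicking (the usage example above unwraps it). A short sketch of checking for a missing list; Language::French and a SkLearn implementation of the trait are assumptions here:

extern crate stopwords;

use stopwords::{Language, SkLearn, Stopwords};

fn main() {
    // A provider that does not ship the requested language returns None.
    // `Language::French` and SkLearn's trait impl are assumed for illustration.
    match SkLearn::stopwords(Language::French) {
        Some(words) => println!("scikit-learn ships {} French stopwords", words.len()),
        None => println!("scikit-learn has no French stopword list"),
    }
}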
Traits§
- Stopwords - Interface for getting stopwords from different providers (a generic sketch follows).
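Since the providers share this trait, token filtering can stay generic over the backend. A sketch assuming the trait exposes stopwords as an associated function returning Option<&'static [&'static str]>, as the usage example suggests; strip_stopwords is a hypothetical helper, and an NLTK trait impl is assumed:

extern crate stopwords;

use std::collections::HashSet;
use stopwords::{Language, NLTK, Spark, Stopwords};

// Hypothetical helper: filters tokens with whichever provider is selected
// through the type parameter.
fn strip_stopwords<P: Stopwords>(language: Language, tokens: &mut Vec<&str>) {
    if let Some(words) = P::stopwords(language) {
        let stops: HashSet<&str> = words.iter().copied().collect();
        tokens.retain(|t| !stops.contains(*t));
    }
}

fn main() {
    let mut spark_tokens = vec!["the", "quick", "brown", "fox"];
    strip_stopwords::<Spark>(Language::English, &mut spark_tokens);

    let mut nltk_tokens = vec!["the", "quick", "brown", "fox"];
    strip_stopwords::<NLTK>(Language::English, &mut nltk_tokens);

    println!("Spark: {:?} / NLTK: {:?}", spark_tokens, nltk_tokens);
}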