Text processing and feature extraction for NLP tasks.
Provides tokenization, count-based vectorization, and TF-IDF weighting.
All vectorizers produce sparse CSR matrices via crate::sparse::CsrMatrix.
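To make the pipeline concrete, here is a minimal from-scratch sketch of count-based vectorization followed by TF-IDF weighting over a CSR layout. It does not use this crate's API: the `Csr` struct, `count_vectorize`, and `tfidf_in_place` are hypothetical names introduced for illustration. The whitespace tokenizer, lowercasing, and the `ln(n_docs / df) + 1` IDF variant are assumptions, since this page does not state which tokenizer, IDF formula, or normalization `TfidfVectorizer` actually applies. Normalization is omitted entirely.

```rust
use std::collections::BTreeMap;

/// Minimal CSR container (hypothetical; not `crate::sparse::CsrMatrix`).
struct Csr {
    indptr: Vec<usize>,  // row i occupies indices[indptr[i]..indptr[i + 1]]
    indices: Vec<usize>, // column index of each stored value
    data: Vec<f64>,      // stored values
    n_cols: usize,
}

/// Build a sorted vocabulary and a CSR matrix of raw term counts.
fn count_vectorize(docs: &[&str]) -> (Vec<String>, Csr) {
    // Collect the distinct tokens, then number them in sorted order.
    let mut vocab: BTreeMap<String, usize> = BTreeMap::new();
    for doc in docs {
        for tok in doc.split_whitespace() {
            vocab.insert(tok.to_lowercase(), 0);
        }
    }
    for (i, idx) in vocab.values_mut().enumerate() {
        *idx = i;
    }

    let mut csr = Csr { indptr: vec![0], indices: vec![], data: vec![], n_cols: vocab.len() };
    for doc in docs {
        // Aggregate counts per column so each column appears once per row.
        let mut row: BTreeMap<usize, f64> = BTreeMap::new();
        for tok in doc.split_whitespace() {
            *row.entry(vocab[&tok.to_lowercase()]).or_insert(0.0) += 1.0;
        }
        for (col, count) in row {
            csr.indices.push(col);
            csr.data.push(count);
        }
        csr.indptr.push(csr.indices.len());
    }
    (vocab.into_keys().collect(), csr)
}

/// Reweight counts in place with idf = ln(n_docs / df) + 1 (an assumed variant).
fn tfidf_in_place(csr: &mut Csr) {
    let n_docs = (csr.indptr.len() - 1) as f64;
    // Document frequency: each stored entry is one (document, term) pair.
    let mut df = vec![0.0_f64; csr.n_cols];
    for &col in &csr.indices {
        df[col] += 1.0;
    }
    for (value, &col) in csr.data.iter_mut().zip(&csr.indices) {
        *value *= (n_docs / df[col]).ln() + 1.0;
    }
}

fn main() {
    let docs = ["the cat sat", "the dog sat", "the cat played"];
    let (vocab, mut csr) = count_vectorize(&docs);
    tfidf_in_place(&mut csr);
    for row in 0..csr.indptr.len() - 1 {
        for k in csr.indptr[row]..csr.indptr[row + 1] {
            println!("doc {row}: {} = {:.3}", vocab[csr.indices[k]], csr.data[k]);
        }
    }
}
```

Under this weighting, a term that occurs in every document (here "the") keeps its raw count, while rarer terms such as "dog" are scaled up.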
§Example
use scry_learn::text::{CountVectorizer, TfidfVectorizer};
let docs = ["the cat sat", "the dog sat", "the cat played"];
// Count vectorizer
let mut cv = CountVectorizer::new();
let counts = cv.fit_transform(&docs);
// TF-IDF vectorizer
let mut tfidf = TfidfVectorizer::new();
let matrix = tfidf.fit_transform(&docs);
Re-exports§
pub use count::CountVectorizer;
pub use tfidf::TfidfNorm;
pub use tfidf::TfidfVectorizer;
Modules§
Functions§
- sparse_to_dataset - Convert a sparse CSR matrix (from a text vectorizer) into a Dataset.
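
As a rough illustration of what a sparse-to-dense conversion involves, the sketch below expands raw CSR arrays into dense rows. It is not the implementation of `sparse_to_dataset`: the helper `csr_to_dense` is hypothetical, and it assumes the target holds dense row-major feature vectors, which this page does not confirm for `Dataset`.

```rust
/// Expand CSR arrays into dense rows, filling absent entries with 0.0.
/// Hypothetical helper; the real `Dataset` layout is not described here.
fn csr_to_dense(
    indptr: &[usize],
    indices: &[usize],
    data: &[f64],
    n_cols: usize,
) -> Vec<Vec<f64>> {
    let mut rows = Vec::with_capacity(indptr.len().saturating_sub(1));
    // Each adjacent pair in `indptr` delimits one row's stored entries.
    for w in indptr.windows(2) {
        let mut row = vec![0.0; n_cols];
        for k in w[0]..w[1] {
            row[indices[k]] = data[k];
        }
        rows.push(row);
    }
    rows
}

fn main() {
    // Two rows, three columns: [[1, 0, 2], [0, 3, 0]].
    let dense = csr_to_dense(&[0, 2, 3], &[0, 2, 1], &[1.0, 2.0, 3.0], 3);
    println!("{dense:?}");
}
```

Dense expansion costs memory proportional to documents times vocabulary size, so it is best reserved for small vocabularies or downstream code that cannot consume sparse input.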