Module count_vectorizer


Count vectorizer: convert text documents to a term-count matrix.

Tokenizes documents by splitting on non-alphanumeric characters, builds a vocabulary, and produces a term-count matrix of shape (n_docs, n_vocab).
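The pipeline described above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual implementation: the function names `tokenize`, `build_vocabulary`, and `count_matrix` are hypothetical, and both the lowercasing step and the alphabetical column order are assumptions that the module description does not state.

```rust
use std::collections::BTreeMap;

/// Split a document on non-alphanumeric characters, dropping empty tokens.
/// Lowercasing is an assumption of this sketch.
fn tokenize(doc: &str) -> Vec<String> {
    doc.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Map every distinct token to a column index.
/// Columns follow sorted term order (an assumption of this sketch).
fn build_vocabulary(docs: &[&str]) -> BTreeMap<String, usize> {
    let mut vocab: BTreeMap<String, usize> = BTreeMap::new();
    for doc in docs {
        for tok in tokenize(doc) {
            vocab.entry(tok).or_insert(0);
        }
    }
    // BTreeMap iterates in key order, so indices follow sorted terms.
    for (i, idx) in vocab.values_mut().enumerate() {
        *idx = i;
    }
    vocab
}

/// Produce an (n_docs, n_vocab) matrix of term counts.
/// Tokens missing from the vocabulary are ignored.
fn count_matrix(docs: &[&str], vocab: &BTreeMap<String, usize>) -> Vec<Vec<usize>> {
    docs.iter()
        .map(|doc| {
            let mut row = vec![0usize; vocab.len()];
            for tok in tokenize(doc) {
                if let Some(&col) = vocab.get(&tok) {
                    row[col] += 1;
                }
            }
            row
        })
        .collect()
}
```

Calling `build_vocabulary` on a set of documents and then passing the result to `count_matrix` yields one row per document and one column per vocabulary term. Building the vocabulary separately from counting mirrors the unfitted/fitted split in the structs listed below.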

Structs

CountVectorizer
An unfitted count vectorizer.
FittedCountVectorizer
A fitted count vectorizer holding the learned vocabulary.