Module linfa_preprocessing::tf_idf_vectorization
Term frequency - inverse document frequency vectorization methods
Structs
Counts the occurrences of each vocabulary entry, learned during fitting, in a sequence of texts and scales them by the inverse document frequency defined by the method. Each vocabulary entry is mapped to an integer value that is used to index the count in the result.
Similar to CountVectorizer, but instead of just counting the term frequency of each vocabulary entry in each given document, it computes the term frequency times the inverse document frequency, thus giving more importance to entries that appear many times but in only some of the documents. The weight function can be adjusted by setting the appropriate method. This struct provides the same string processing customizations described in CountVectorizer.
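The weighting described above can be illustrated with a minimal standalone sketch. It uses only the standard library and does not call this crate's API: the helper names `fit_vocabulary`, `smooth_idf`, and `tf_idf` are hypothetical, tokenization is plain lowercased whitespace splitting, and the smoothed idf formula is an assumption chosen for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: learn a vocabulary by assigning each distinct
/// lowercased, whitespace-separated token an integer index
/// (alphabetical order, so the mapping is deterministic).
fn fit_vocabulary(docs: &[&str]) -> BTreeMap<String, usize> {
    let tokens: BTreeSet<String> = docs
        .iter()
        .flat_map(|d| d.split_whitespace())
        .map(|t| t.to_lowercase())
        .collect();
    tokens.into_iter().enumerate().map(|(i, t)| (t, i)).collect()
}

/// Assumed smoothed idf: ln((1 + n) / (1 + df)) + 1.
fn smooth_idf(n_docs: usize, doc_freq: usize) -> f64 {
    ((1.0 + n_docs as f64) / (1.0 + doc_freq as f64)).ln() + 1.0
}

/// Hypothetical helper: dense (documents x vocabulary) matrix of
/// term frequency times inverse document frequency.
fn tf_idf(docs: &[&str], vocab: &BTreeMap<String, usize>) -> Vec<Vec<f64>> {
    // Term frequency: raw count of each vocabulary entry per document.
    let mut counts = vec![vec![0usize; vocab.len()]; docs.len()];
    for (row, doc) in counts.iter_mut().zip(docs) {
        for token in doc.split_whitespace() {
            if let Some(&col) = vocab.get(&token.to_lowercase()) {
                row[col] += 1;
            }
        }
    }

    // Document frequency: number of documents containing each entry.
    let mut doc_freq = vec![0usize; vocab.len()];
    for row in &counts {
        for (col, &count) in row.iter().enumerate() {
            if count > 0 {
                doc_freq[col] += 1;
            }
        }
    }

    // Scale every count by the idf of its vocabulary entry.
    counts
        .iter()
        .map(|row| {
            row.iter()
                .zip(&doc_freq)
                .map(|(&tf, &df)| tf as f64 * smooth_idf(docs.len(), df))
                .collect()
        })
        .collect()
}
```

For the documents "the cat sat", "the dog sat", and "the cat ran", the entry "the" occurs in every document and keeps a weight of 1.0 under this formula, while "dog", which occurs in a single document, receives a larger weight in the document that contains it.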
Enums
Methods for computing the inverse document frequency of a vocabulary entry
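As a rough illustration of how such methods can differ, the sketch below implements three commonly used inverse document frequency formulas. The enum `IdfVariant`, its variant names, and the function `idf` are hypothetical and written for this example; the formulas are assumptions and may not match the variants this crate's enum actually defines.

```rust
/// Hypothetical enum for illustration only; not this crate's type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum IdfVariant {
    /// ln((1 + n) / (1 + df)) + 1
    Smooth,
    /// ln(n / df) + 1
    NonSmooth,
    /// ln(n / (1 + df))
    Textbook,
}

/// Compute the idf of an entry that appears in `df` of `n` documents.
/// `NonSmooth` assumes `df > 0`; otherwise the ratio is infinite.
fn idf(variant: IdfVariant, n: usize, df: usize) -> f64 {
    let (n, df) = (n as f64, df as f64);
    match variant {
        IdfVariant::Smooth => ((1.0 + n) / (1.0 + df)).ln() + 1.0,
        IdfVariant::NonSmooth => (n / df).ln() + 1.0,
        IdfVariant::Textbook => (n / (1.0 + df)).ln(),
    }
}
```

Under these assumed formulas, an entry that appears in every document gets an idf of exactly 1 with the first two variants and a slightly negative value with the third, so the choice of method changes how strongly ubiquitous entries are discounted.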