Module linfa_preprocessing::tf_idf_vectorization

Term frequency - inverse document frequency vectorization methods

Structs

FittedTfIdfVectorizer

Counts the occurrences of each vocabulary entry, learned during fitting, in a sequence of texts and scales them by the inverse document frequency defined by the method. Each vocabulary entry is mapped to an integer value that is used to index the count in the result.

TfIdfVectorizer

Similar to CountVectorizer, but instead of just counting the term frequency of each vocabulary entry in each given document, it computes the term frequency times the inverse document frequency, thus giving more importance to entries that appear many times but only in some documents. The weight function can be adjusted by setting the appropriate method. This struct provides the same string processing customizations described in CountVectorizer.
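To make the weighting concrete, here is a minimal hand-rolled sketch of term frequency times inverse document frequency. It does not use the crate's API: the helper names `build_vocabulary` and `tf_idf` are hypothetical, tokenization is plain lowercase whitespace splitting, and the smoothed formula `ln((1 + n) / (1 + df)) + 1` is assumed as one common convention for the inverse document frequency.

```rust
use std::collections::HashMap;

/// Hypothetical helper: maps each distinct lowercase token to an integer index,
/// in order of first appearance.
fn build_vocabulary(docs: &[&str]) -> HashMap<String, usize> {
    let mut vocab = HashMap::new();
    for doc in docs {
        for token in doc.split_whitespace() {
            let next = vocab.len();
            vocab.entry(token.to_lowercase()).or_insert(next);
        }
    }
    vocab
}

/// Hypothetical helper: returns one row per document, where `row[i]` is the
/// count of vocabulary entry `i` scaled by an assumed smoothed idf.
fn tf_idf(docs: &[&str], vocab: &HashMap<String, usize>) -> Vec<Vec<f64>> {
    let n_docs = docs.len() as f64;

    // Term frequency: raw counts of each vocabulary entry per document.
    let mut rows = vec![vec![0.0_f64; vocab.len()]; docs.len()];
    for (row, doc) in rows.iter_mut().zip(docs) {
        for token in doc.split_whitespace() {
            if let Some(&i) = vocab.get(&token.to_lowercase()) {
                row[i] += 1.0;
            }
        }
    }

    // Document frequency: number of documents containing each entry.
    let mut df = vec![0.0_f64; vocab.len()];
    for row in &rows {
        for (i, &count) in row.iter().enumerate() {
            if count > 0.0 {
                df[i] += 1.0;
            }
        }
    }

    // Scale each count by the assumed smoothed inverse document frequency.
    for row in rows.iter_mut() {
        for (i, value) in row.iter_mut().enumerate() {
            let idf = ((1.0 + n_docs) / (1.0 + df[i])).ln() + 1.0;
            *value *= idf;
        }
    }
    rows
}
```

With the documents `"one two"` and `"one three three"`, the entry `one` appears in every document and keeps a weight of 1.0 in each row, while `three` appears twice in a single document and ends up with a weight of about 2.81 in the second row.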

Enums

TfIdfMethod

Methods for computing the inverse document frequency of a vocabulary entry
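As an illustration of how such a method selection might work, the sketch below defines a hypothetical `IdfVariant` enum with three widely used inverse document frequency formulas. The enum name, its variants, and the exact formulas are assumptions made for this example; consult the `TfIdfMethod` documentation for the variants and definitions the crate actually provides.

```rust
/// Hypothetical enum for illustration only; not the crate's `TfIdfMethod`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum IdfVariant {
    /// ln((1 + n) / (1 + df)) + 1: never divides by zero.
    Smooth,
    /// ln(n / df) + 1: infinite when df is zero.
    NonSmooth,
    /// ln(n / (1 + df)): can be zero or negative for very common entries.
    Textbook,
}

impl IdfVariant {
    /// Computes the inverse document frequency of an entry that appears in
    /// `df` out of `n` documents.
    fn idf(self, n: usize, df: usize) -> f64 {
        let n = n as f64;
        let df = df as f64;
        match self {
            IdfVariant::Smooth => ((1.0 + n) / (1.0 + df)).ln() + 1.0,
            IdfVariant::NonSmooth => (n / df).ln() + 1.0,
            IdfVariant::Textbook => (n / (1.0 + df)).ln(),
        }
    }
}
```

Under these assumed definitions, an entry present in every document gets a weight of exactly 1.0 from the first two variants, whereas the third variant can drive the weight of very common entries to zero or below.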