Struct linfa_preprocessing::CountVectorizer

pub struct CountVectorizer { /* fields omitted */ }

Counts the occurrences of each vocabulary entry, learned during fitting, in a sequence of documents. Each vocabulary entry is mapped to an integer value that is used to index the count in the result.
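As a rough illustration of what learning a vocabulary involves, the sketch below assigns each distinct token an integer index. It uses only the standard library and is not the crate's implementation: the helper name `fit_vocabulary`, the whitespace tokenization, the lowercasing, and the first-seen index order are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of linfa): learns a vocabulary from a
/// sequence of documents. Each distinct lowercased, whitespace-separated
/// token receives the next free integer index, in first-seen order.
fn fit_vocabulary(docs: &[&str]) -> HashMap<String, usize> {
    let mut vocab = HashMap::new();
    for doc in docs {
        for token in doc.split_whitespace() {
            // The index of a new entry is the number of entries seen so far.
            let next = vocab.len();
            vocab.entry(token.to_lowercase()).or_insert(next);
        }
    }
    vocab
}
```

Calling `fit_vocabulary(&["one two", "two three"])` under these assumptions yields three entries, with `"two"` mapped to a single index even though it appears in both documents.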

Implementations

Construct a new set of parameters

Number of vocabulary entries learned during fitting

Given a sequence of n documents, produces a sparse array of shape (n, vocabulary_entries) where column j of row i is the number of occurrences of vocabulary entry j in the document at index i. Vocabulary entry j is the string at the j-th position in the vocabulary. If a vocabulary entry does not occur in a document, the corresponding cell in the sparse matrix is set to None.
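The counting step described above can be sketched with the standard library alone. This is an illustration, not the crate's code: the function name `transform_sketch`, the `SparseRow` alias, and whitespace tokenization are assumptions. Each row stores only the columns that occur, so looking up an absent column returns `None`, mirroring the behavior described for the sparse matrix.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical sparse row: column index -> count. Columns whose vocabulary
/// entry never occurs in the document are simply not stored.
type SparseRow = BTreeMap<usize, usize>;

/// Hypothetical helper (not part of linfa): counts occurrences of each
/// vocabulary entry in each document. Column j corresponds to vocabulary[j].
fn transform_sketch(vocabulary: &[&str], docs: &[&str]) -> Vec<SparseRow> {
    // Reverse lookup from entry to its column index.
    let index: HashMap<&str, usize> = vocabulary
        .iter()
        .enumerate()
        .map(|(j, word)| (*word, j))
        .collect();

    docs.iter()
        .map(|doc| {
            let mut row = SparseRow::new();
            for token in doc.split_whitespace() {
                // Tokens outside the vocabulary are ignored.
                if let Some(&j) = index.get(token) {
                    *row.entry(j).or_insert(0) += 1;
                }
            }
            row
        })
        .collect()
}
```

With the vocabulary `["apple", "banana", "cherry"]` and the document `"apple banana apple"`, row 0 holds a count of 2 in column 0 and 1 in column 1, while `row.get(&2)` is `None`.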

Given a sequence of n file names, produces a sparse array of shape (n, vocabulary_entries) where column j of row i is the number of occurrences of vocabulary entry j in the document contained in the file at index i. Vocabulary entry j is the string at the j-th position in the vocabulary. If a vocabulary entry does not occur in a document, the corresponding cell in the sparse matrix is set to None.

The files will be read using the specified encoding, and any sequence unrecognized by the encoding will be handled according to trap.
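To make the role of a trap policy concrete, the sketch below decodes raw bytes as UTF-8 under two policies. The `Trap` enum and `decode_utf8` function are hypothetical stand-ins written for this example and are not the types the crate accepts. The sketch assumes only that a trap chooses between failing on an invalid byte sequence and substituting a replacement character.

```rust
/// Hypothetical stand-in for a trap policy (not the crate's type).
enum Trap {
    /// Fail on the first byte sequence the encoding does not recognize.
    Strict,
    /// Substitute U+FFFD for each unrecognized sequence.
    Replace,
}

/// Hypothetical helper: decodes bytes (for example, the result of
/// `std::fs::read`) as UTF-8, handling invalid sequences according to `trap`.
fn decode_utf8(bytes: &[u8], trap: Trap) -> Result<String, std::str::Utf8Error> {
    match trap {
        Trap::Strict => std::str::from_utf8(bytes).map(|s| s.to_owned()),
        Trap::Replace => Ok(String::from_utf8_lossy(bytes).into_owned()),
    }
}
```

In this sketch, the bytes `[b'h', b'i', 0xFF]` produce an error under `Trap::Strict` and the string `"hi\u{FFFD}"` under `Trap::Replace`.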

Contains all vocabulary entries, in the same order used by the transform methods.
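Because the vocabulary order matches the column order of the transformed output, a column index can be turned back into its string by position. The following sketch shows that lookup. The helper `label_counts` and the `(column, count)` pair representation of a row are assumptions for illustration, not part of the crate.

```rust
/// Hypothetical helper (not part of linfa): pairs each stored count of a
/// sparse row with the vocabulary entry its column index refers to.
/// `row` holds (column index, count) pairs; `vocabulary[j]` names column j.
fn label_counts<'a>(
    vocabulary: &'a [String],
    row: &[(usize, usize)],
) -> Vec<(&'a str, usize)> {
    row.iter()
        .map(|&(j, count)| (vocabulary[j].as_str(), count))
        .collect()
}
```

Given the vocabulary `["apple", "banana", "cherry"]` and a row with the pairs `(0, 2)` and `(2, 1)`, the result pairs `"apple"` with 2 and `"cherry"` with 1.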
