Struct linfa_preprocessing::CountVectorizer
pub struct CountVectorizer { /* fields omitted */ }
Counts the occurrences of each vocabulary entry, learned during fitting, in a sequence of documents. Each vocabulary entry is mapped to an integer value that is used to index the count in the result.
Implementations
Construct a new set of parameters
Given a sequence of n documents, produces a sparse array of size (n, vocabulary_entries) where column j of row i is the number of occurrences of vocabulary entry j in the document of index i. Vocabulary entry j is the string at the j-th position in the vocabulary. If a vocabulary entry was not encountered in a document, then the corresponding cell in the sparse matrix will be set to None.
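The counting rule described above can be illustrated with a minimal, standard-library-only sketch. It is not the linfa implementation: `count_documents` is a hypothetical helper, tokens are assumed to be whitespace-separated and case-sensitive, and each sparse row is modelled as a `HashMap` from column index to count instead of a `CsMat`.

```rust
use std::collections::HashMap;

/// Hypothetical helper, not part of linfa: counts vocabulary entries per document.
/// Each returned row maps a column index j to the number of occurrences of
/// vocabulary entry j; entries that never occur are simply absent from the map.
fn count_documents(vocabulary: &[&str], documents: &[&str]) -> Vec<HashMap<usize, usize>> {
    // Map each vocabulary entry to its column index j.
    let index: HashMap<&str, usize> = vocabulary
        .iter()
        .enumerate()
        .map(|(j, entry)| (*entry, j))
        .collect();

    documents
        .iter()
        .map(|doc| {
            let mut row: HashMap<usize, usize> = HashMap::new();
            // Assumption: tokens are whitespace-separated and matched exactly.
            for token in doc.split_whitespace() {
                if let Some(&j) = index.get(token) {
                    *row.entry(j).or_insert(0) += 1;
                }
            }
            row
        })
        .collect()
}
```

Looking up a column that was never counted returns `None` in this sketch, which mirrors the behaviour described for unencountered vocabulary entries.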
pub fn transform_files<P: AsRef<Path>>(
    &self,
    input: &[P],
    encoding: EncodingRef,
    trap: DecoderTrap
) -> CsMat<usize>
Given a sequence of n file names, produces a sparse array of size (n, vocabulary_entries) where column j of row i is the number of occurrences of vocabulary entry j in the document contained in the file of index i. Vocabulary entry j is the string at the j-th position in the vocabulary. If a vocabulary entry was not encountered in a document, then the corresponding cell in the sparse matrix will be set to None.
The files will be read using the specified encoding, and any sequence unrecognized by the encoding will be handled according to trap.
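The role of the trap when reading files can be sketched with the standard library alone. This is an illustration under stated assumptions, not how linfa reads files: `Trap` and `read_documents` are hypothetical names, only UTF-8 is handled, and just two policies are modelled (fail on an invalid sequence, or replace it with U+FFFD).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for a decoder trap; it is not the `DecoderTrap` type.
#[derive(Clone, Copy)]
enum Trap {
    /// Fail as soon as an invalid byte sequence is found.
    Strict,
    /// Replace each invalid byte sequence with U+FFFD.
    Replace,
}

/// Hypothetical helper: reads every file as UTF-8, applying `trap` to invalid sequences.
fn read_documents<P: AsRef<Path>>(input: &[P], trap: Trap) -> io::Result<Vec<String>> {
    let mut documents = Vec::with_capacity(input.len());
    for path in input {
        let bytes = fs::read(path)?;
        let text = match trap {
            Trap::Strict => String::from_utf8(bytes)
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?,
            Trap::Replace => String::from_utf8_lossy(&bytes).into_owned(),
        };
        documents.push(text);
    }
    Ok(documents)
}
```

The decoded strings keep the order of the input paths, so row i of any count matrix built from them would still correspond to the file of index i.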
Contains all vocabulary entries, in the same order used by the transform methods.
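Because the vocabulary shares its ordering with the matrix columns, a column index can be turned back into its string by position. The sketch below assumes a row is available as `(column, count)` pairs; `label_counts` is a hypothetical helper, not a linfa method.

```rust
/// Hypothetical helper, not part of linfa: pairs each stored count with its vocabulary entry.
fn label_counts<'a>(vocabulary: &[&'a str], row: &[(usize, usize)]) -> Vec<(&'a str, usize)> {
    row.iter()
        // Column j refers to the entry at position j of the vocabulary;
        // out-of-range columns are skipped rather than causing a panic.
        .filter_map(|&(j, count)| vocabulary.get(j).map(|entry| (*entry, count)))
        .collect()
}
```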