Struct linfa_preprocessing::CountVectorizer
pub struct CountVectorizer { /* private fields */ }
Counts the occurrences of each vocabulary entry, learned during fitting, in a sequence of documents. Each vocabulary entry is mapped to an integer value that is used to index the count in the result.
Implementations
impl CountVectorizer
pub fn params() -> CountVectorizerParams
Constructs a new set of parameters.
pub fn transform<T: ToString, D: Data<Elem = T>>( &self, x: &ArrayBase<D, Ix1> ) -> CsMat<usize>
Given a sequence of n documents, produces a sparse array of size (n, vocabulary_entries) where column j of row i is the number of occurrences of vocabulary entry j in the document of index i. Vocabulary entry j is the string at the j-th position in the vocabulary. If a vocabulary entry was not encountered in a document, then the corresponding cell in the sparse matrix will be set to None.
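The row and column layout described above can be illustrated with a minimal, std-only sketch. It is not linfa's implementation: the helper name count_occurrences, the whitespace tokenization, and the use of one BTreeMap per row as a stand-in for a sparse matrix are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper, not part of linfa.
/// Returns one map per document (row i), keyed by vocabulary index (column j).
/// Only non-zero counts are stored, mimicking a sparse row.
fn count_occurrences(vocabulary: &[&str], documents: &[&str]) -> Vec<BTreeMap<usize, usize>> {
    documents
        .iter()
        .map(|doc| {
            let mut row = BTreeMap::new();
            // Naive whitespace tokenization: an assumption made for this sketch.
            for token in doc.split_whitespace() {
                // Column j is the position of the token in the vocabulary.
                if let Some(j) = vocabulary.iter().position(|entry| *entry == token) {
                    *row.entry(j).or_insert(0) += 1;
                }
            }
            row
        })
        .collect()
}
```

With the vocabulary [apple, banana, cherry] and the document "apple banana apple", row 0 stores 2 at column 0 and 1 at column 1. Column 2 has no stored entry, so looking it up yields None, which mirrors the behaviour described for cells of entries that were not encountered.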
pub fn transform_files<P: AsRef<Path>>( &self, input: &[P], encoding: EncodingRef, trap: DecoderTrap ) -> CsMat<usize>
Given a sequence of n file names, produces a sparse array of size (n, vocabulary_entries) where column j of row i is the number of occurrences of vocabulary entry j in the document contained in the file of index i. Vocabulary entry j is the string at the j-th position in the vocabulary. If a vocabulary entry was not encountered in a document, then the corresponding cell in the sparse matrix will be set to None.
The files will be read using the specified encoding, and any byte sequence the encoding does not recognize will be handled according to trap.
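To make the role of a trap concrete, here is a small std-only sketch of two possible policies for byte sequences that cannot be decoded. The Trap enum and decode_utf8 function are hypothetical stand-ins written for this example, not the types this method accepts, and UTF-8 is assumed as the encoding.

```rust
/// Hypothetical stand-in for a decoder trap; not the type used by `transform_files`.
enum Trap {
    /// Fail on the first invalid byte sequence.
    Strict,
    /// Substitute each invalid sequence with U+FFFD and keep going.
    Replace,
}

/// Decodes raw file bytes as UTF-8 according to the chosen policy.
fn decode_utf8(bytes: &[u8], trap: Trap) -> Result<String, std::str::Utf8Error> {
    match trap {
        Trap::Strict => std::str::from_utf8(bytes).map(|s| s.to_owned()),
        Trap::Replace => Ok(String::from_utf8_lossy(bytes).into_owned()),
    }
}
```

Under these assumptions, a strict policy surfaces the error to the caller, while a replacing policy keeps the document usable at the cost of altered tokens near the invalid bytes.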
pub fn vocabulary(&self) -> &Vec<String>
Contains all vocabulary entries, in the same order used by the transform methods.
Trait Implementations
impl Clone for CountVectorizer
fn clone(&self) -> CountVectorizer
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.