pub struct CountVectorizer { /* private fields */ }

Counts how often each vocabulary entry, learned during fitting, occurs in each document of a sequence. Each vocabulary entry is mapped to an integer index, which identifies the column holding its count in the result.

Implementations

impl CountVectorizer

pub fn params() -> CountVectorizerParams

Construct a new set of parameters

pub fn nentries(&self) -> usize

Returns the number of vocabulary entries learned during fitting.

pub fn transform<T: ToString, D: Data<Elem = T>>( &self, x: &ArrayBase<D, Ix1> ) -> CsMat<usize>

Given a sequence of n documents, produces a sparse matrix of shape (n, vocabulary_entries) in which the entry at row i, column j is the number of occurrences of vocabulary entry j in the document at index i. Vocabulary entry j is the string at position j of the vocabulary. If a vocabulary entry does not occur in a document, the corresponding cell is not stored in the sparse matrix, so looking it up yields None rather than an explicit zero.
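
The counting scheme described above can be illustrated with a minimal, std-only sketch. It is not this crate's implementation: the helper names `fit_vocabulary` and `count_rows` are hypothetical, tokenization is assumed to be plain whitespace splitting, and each sparse row is modeled as a `BTreeMap` from column index to count instead of a `CsMat`.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: learn a vocabulary as the sorted set of distinct
/// whitespace-separated tokens across all documents.
fn fit_vocabulary(docs: &[&str]) -> Vec<String> {
    let mut vocab: Vec<String> = docs
        .iter()
        .flat_map(|doc| doc.split_whitespace())
        .map(str::to_string)
        .collect();
    vocab.sort();
    vocab.dedup();
    vocab
}

/// Hypothetical helper: one sparse row per document, mapping column j to the
/// number of occurrences of vocabulary entry j. Columns with a zero count
/// are simply absent from the map.
fn count_rows(vocab: &[String], docs: &[&str]) -> Vec<BTreeMap<usize, usize>> {
    docs.iter()
        .map(|doc| {
            let mut row = BTreeMap::new();
            for token in doc.split_whitespace() {
                // The vocabulary is sorted, so a binary search finds the column.
                if let Ok(j) = vocab.binary_search_by(|v| v.as_str().cmp(token)) {
                    *row.entry(j).or_insert(0) += 1;
                }
            }
            row
        })
        .collect()
}

fn main() {
    let docs = ["one two two", "three one"];
    let vocab = fit_vocabulary(&docs);
    let rows = count_rows(&vocab, &docs);
    println!("vocabulary: {:?}", vocab);
    for (i, row) in rows.iter().enumerate() {
        println!("document {}: {:?}", i, row);
    }
    // A vocabulary entry missing from a document has no stored cell.
    println!("row 0, column 1: {:?}", rows[0].get(&1));
}
```

Looking up a column that was never counted returns `None` from the map, which mirrors how an unstored cell behaves in a sparse matrix.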

pub fn transform_files<P: AsRef<Path>>( &self, input: &[P], encoding: EncodingRef, trap: DecoderTrap ) -> CsMat<usize>

Given a sequence of n file names, produces a sparse matrix of shape (n, vocabulary_entries) in which the entry at row i, column j is the number of occurrences of vocabulary entry j in the document contained in the file at index i. Vocabulary entry j is the string at position j of the vocabulary. If a vocabulary entry does not occur in a document, the corresponding cell is not stored in the sparse matrix, so looking it up yields None rather than an explicit zero.

The files are read using the specified encoding, and any byte sequence that the encoding cannot decode is handled according to trap.
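
To make the role of a decoder trap concrete, here is a std-only sketch of two common policies for malformed input: failing outright or substituting a replacement character. The `Trap` enum and `decode_utf8` function are hypothetical stand-ins written for this illustration; they are not the `EncodingRef` or `DecoderTrap` types taken by `transform_files`, and the sketch assumes UTF-8 input only.

```rust
/// Hypothetical policy for malformed input. This is a stand-in for
/// illustration, not the trap type accepted by `transform_files`.
#[derive(Clone, Copy)]
enum Trap {
    /// Fail on the first malformed byte sequence.
    Strict,
    /// Substitute U+FFFD for each malformed byte sequence.
    Replace,
}

/// Decode bytes as UTF-8, handling malformed sequences according to `trap`.
fn decode_utf8(bytes: &[u8], trap: Trap) -> Result<String, String> {
    match trap {
        Trap::Strict => String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string()),
        Trap::Replace => Ok(String::from_utf8_lossy(bytes).into_owned()),
    }
}

fn main() {
    // 0xFF can never appear in well-formed UTF-8.
    let bytes = b"caf\xFF";
    println!("strict:  {:?}", decode_utf8(bytes, Trap::Strict));
    println!("replace: {:?}", decode_utf8(bytes, Trap::Replace));
}
```

The choice matters for counting: a strict policy rejects a damaged file, while a replacing policy keeps it but may introduce tokens containing the replacement character.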

pub fn vocabulary(&self) -> &Vec<String>

Returns all vocabulary entries, in the same order used by the transform methods: the entry at position j corresponds to column j of the transformed matrix.
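
Because the vocabulary order matches the column order, column indices can be translated back into terms. The std-only sketch below assumes a sparse row is available as `(column, count)` pairs; the `label_counts` helper is hypothetical and not part of this crate.

```rust
/// Hypothetical helper: pair each stored count with the vocabulary entry
/// for its column. Out-of-range columns are skipped.
fn label_counts<'a>(vocabulary: &'a [String], row: &[(usize, usize)]) -> Vec<(&'a str, usize)> {
    row.iter()
        .filter_map(|&(j, count)| vocabulary.get(j).map(|term| (term.as_str(), count)))
        .collect()
}

fn main() {
    // Stand-in for the slice returned by a fitted vectorizer's vocabulary.
    let vocabulary = vec!["one".to_string(), "three".to_string(), "two".to_string()];
    // Stored cells of one document: column 0 occurs once, column 2 twice.
    let row = [(0, 1), (2, 2)];
    for (term, count) in label_counts(&vocabulary, &row) {
        println!("{}: {}", term, count);
    }
}
```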

Trait Implementations

impl Clone for CountVectorizer

fn clone(&self) -> CountVectorizer

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for CountVectorizer

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V