Crate linfa_preprocessing


Preprocessing

The Big Picture

linfa-preprocessing is a crate in the linfa ecosystem, an effort to create a toolkit for classical Machine Learning implemented in pure Rust, akin to Python’s scikit-learn.

Current state

linfa-preprocessing provides a pure Rust implementation of:

Standard scaling
Min-max scaling
Max-abs scaling
Normalization (l1, l2 and max norm)
Count vectorization
Term frequency - inverse document frequency count vectorization
Whitening
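
To make the scaling and normalization entries above concrete, here is a minimal sketch of the underlying arithmetic in plain Rust, applied to a single feature column (or a single sample, for normalization). It deliberately does not use the linfa-preprocessing API: the function names `standard_scale`, `min_max_scale`, `max_abs_scale` and `l2_normalize` are hypothetical, and details such as dividing by n for the variance and the handling of constant columns are assumptions of this sketch rather than documented behavior of the crate.

```rust
/// Standard scaling: subtract the mean, then divide by the standard deviation.
/// Assumption: population standard deviation (divide by n).
fn standard_scale(values: &[f64]) -> Vec<f64> {
    if values.is_empty() {
        return Vec::new();
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();
    // A constant column has zero spread; only center it to avoid dividing by zero.
    let scale = if std_dev == 0.0 { 1.0 } else { std_dev };
    values.iter().map(|v| (v - mean) / scale).collect()
}

/// Min-max scaling: map the observed [min, max] linearly onto [lo, hi].
fn min_max_scale(values: &[f64], lo: f64, hi: f64) -> Vec<f64> {
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    values
        .iter()
        // A constant column is mapped to the lower bound.
        .map(|v| if range == 0.0 { lo } else { lo + (v - min) / range * (hi - lo) })
        .collect()
}

/// Max-abs scaling: divide by the largest absolute value, giving values in [-1, 1].
fn max_abs_scale(values: &[f64]) -> Vec<f64> {
    let max_abs = values.iter().fold(0.0_f64, |m, v| m.max(v.abs()));
    if max_abs == 0.0 {
        return values.to_vec();
    }
    values.iter().map(|v| v / max_abs).collect()
}

/// L2 normalization of one sample: divide each entry by the Euclidean norm.
fn l2_normalize(sample: &[f64]) -> Vec<f64> {
    let norm = sample.iter().map(|v| v * v).sum::<f64>().sqrt();
    if norm == 0.0 {
        return sample.to_vec();
    }
    sample.iter().map(|v| v / norm).collect()
}
```

Note the difference in direction: the three scalers operate per feature (down a column), whereas normalization operates per sample (across a row).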

Re-exports

pub use error::PreprocessingError;

pub use error::Result;

Modules

error

Error definitions for preprocessing

linear_scaling

Linear Scaling methods

norm_scaling

Sample normalization methods

tf_idf_vectorization

Term frequency - inverse document frequency vectorization methods

whitening

Methods for decorrelating data

Structs

CountVectorizer

Counts the occurrences of each vocabulary entry, learned during fitting, in a sequence of documents. Each vocabulary entry is mapped to an integer value that is used to index the count in the result.

CountVectorizerParams

CountVectorizerValidParams

Count vectorizer: learns a vocabulary from a sequence of documents (or file paths) and maps each vocabulary entry to an integer value, producing a FittedCountVectorizer that can be used to count the occurrences of each vocabulary entry in any sequence of documents. Alternatively, a user-specified vocabulary can be used for fitting.
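
The fit-then-count workflow described above can be illustrated with a small standalone sketch. It is not the crate's implementation or API: `tokenize`, `fit_vocabulary` and `count_occurrences` are hypothetical helpers, and the choices to split on whitespace, lowercase every token, and assign indices in alphabetical order are assumptions made only for this example.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical tokenizer: lowercase, whitespace-separated tokens.
fn tokenize(doc: &str) -> Vec<String> {
    doc.split_whitespace().map(|t| t.to_lowercase()).collect()
}

/// "Fitting": learn the vocabulary and map each entry to an integer index.
/// Entries are indexed in sorted order so the mapping is deterministic.
fn fit_vocabulary(docs: &[&str]) -> HashMap<String, usize> {
    let unique: BTreeSet<String> = docs.iter().flat_map(|d| tokenize(d)).collect();
    unique
        .into_iter()
        .enumerate()
        .map(|(index, term)| (term, index))
        .collect()
}

/// "Transforming": one row per document, one column per vocabulary entry.
/// Tokens that are not in the fitted vocabulary are ignored.
fn count_occurrences(vocab: &HashMap<String, usize>, docs: &[&str]) -> Vec<Vec<usize>> {
    docs.iter()
        .map(|doc| {
            let mut row = vec![0; vocab.len()];
            for token in tokenize(doc) {
                if let Some(&index) = vocab.get(&token) {
                    row[index] += 1;
                }
            }
            row
        })
        .collect()
}
```

In this sketch the vocabulary is fitted once and can then be applied to any later sequence of documents, which mirrors the separation between the vectorizer and its fitted counterpart. A user-specified vocabulary would simply replace the result of `fit_vocabulary`.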