Crate ferrolearn_preprocess

§ferrolearn-preprocess

Data preprocessing transformers for the ferrolearn machine learning framework.

This crate provides standard scalers, encoders, imputers, and feature selection utilities that follow the ferrolearn Fit/Transform trait pattern.

§Scalers

All scalers are generic over F: Float + Send + Sync + 'static and implement Fit<Array2<F>, ()> (returning a Fitted* type) and FitTransform<Array2<F>>. The fitted types implement Transform<Array2<F>>.
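The split between an unfitted configuration, a Fitted* type, and `Transform` can be illustrated with a minimal, std-only sketch. The traits `Fit` and `Transform` and the `*Sketch` types below are simplified, hypothetical stand-ins, not ferrolearn_core's actual definitions: they use `Vec<Vec<f64>>` instead of `ndarray::Array2<F>` and `String` errors. The sketch also assumes the population standard deviation and leaves constant columns centered but unscaled.

```rust
// Hypothetical, std-only sketch of the Fit -> Fitted* -> Transform pattern.
// `Fit`, `Transform`, and the *Sketch types are illustrative stand-ins,
// not the real ferrolearn_core traits or ferrolearn_preprocess types.

type Matrix = Vec<Vec<f64>>; // rows are samples, columns are features

/// Learns parameters from data and returns a separate fitted type.
trait Fit<X> {
    type Fitted;
    fn fit(&self, x: &X) -> Result<Self::Fitted, String>;
}

/// Applies previously learned parameters to (possibly new) data.
trait Transform<X> {
    fn transform(&self, x: &X) -> Result<X, String>;
}

/// Unfitted scaler: configuration only, no learned state.
struct StandardScalerSketch;

/// Fitted scaler: per-column mean and standard deviation.
struct FittedStandardScalerSketch {
    mean: Vec<f64>,
    std: Vec<f64>,
}

impl Fit<Matrix> for StandardScalerSketch {
    type Fitted = FittedStandardScalerSketch;

    fn fit(&self, x: &Matrix) -> Result<Self::Fitted, String> {
        let n_cols = x.first().ok_or("cannot fit on an empty matrix")?.len();
        let n = x.len() as f64;
        let mut mean = vec![0.0; n_cols];
        let mut std = vec![0.0; n_cols];
        // First pass: column sums, then divide to get means.
        for row in x {
            if row.len() != n_cols {
                return Err("rows must all have the same length".into());
            }
            for (j, v) in row.iter().enumerate() {
                mean[j] += v;
            }
        }
        for m in mean.iter_mut() {
            *m /= n;
        }
        // Second pass: squared deviations from the mean.
        for row in x {
            for (j, v) in row.iter().enumerate() {
                std[j] += (v - mean[j]).powi(2);
            }
        }
        for s in std.iter_mut() {
            // Population variance (divide by n) is an assumption of this sketch.
            *s = (*s / n).sqrt();
            // Constant column: avoid dividing by zero.
            if *s == 0.0 {
                *s = 1.0;
            }
        }
        Ok(FittedStandardScalerSketch { mean, std })
    }
}

impl Transform<Matrix> for FittedStandardScalerSketch {
    fn transform(&self, x: &Matrix) -> Result<Matrix, String> {
        x.iter()
            .map(|row| {
                if row.len() != self.mean.len() {
                    return Err(format!(
                        "expected {} columns, got {}",
                        self.mean.len(),
                        row.len()
                    ));
                }
                Ok(row
                    .iter()
                    .enumerate()
                    .map(|(j, v)| (v - self.mean[j]) / self.std[j])
                    .collect::<Vec<f64>>())
            })
            .collect()
    }
}

fn main() {
    let x: Matrix = vec![vec![1.0, 10.0], vec![2.0, 20.0], vec![3.0, 30.0]];
    // fit() only learns statistics; the fitted value is what can transform.
    let fitted = StandardScalerSketch.fit(&x).expect("fit failed");
    let scaled = fitted.transform(&x).expect("transform failed");
    println!("{:?}", scaled);
}
```

Keeping the learned statistics on a separate fitted type means an unfitted scaler has no `transform` method at all, so the compiler rejects transforming before fitting.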

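§Encoders

  • one_hot_encoder::OneHotEncoder: one-hot encode categorical integer features.
  • ordinal_encoder::OrdinalEncoder: map string categories to integer indices.
  • label_encoder::LabelEncoder: map string labels to integer indices.
  • binary_encoder::BinaryEncoder: encode categorical integers as binary digits.
  • target_encoder::TargetEncoder: encode categorical features using target statistics.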

§Imputers

  • imputer::SimpleImputer — fill missing (NaN) values per feature column using Mean, Median, MostFrequent, or Constant strategy.
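The four strategies can be sketched in plain Rust as follows. This is a hypothetical illustration, not `SimpleImputer`'s implementation: `Strategy`, `fill_value`, and `impute` are illustrative names, statistics are computed over the non-NaN entries of each column, and the MostFrequent tie-breaking rule (smallest value wins) is an assumption.

```rust
// Hypothetical std-only sketch of per-column imputation strategies.
// `Strategy`, `fill_value`, and `impute` are illustrative names, not
// ferrolearn's ImputeStrategy or SimpleImputer API.

#[derive(Clone, Copy, Debug)]
#[allow(dead_code)] // not every variant is used in main()
enum Strategy {
    Mean,
    Median,
    MostFrequent,
    Constant(f64),
}

/// Computes the fill value for one column from its non-NaN entries.
/// Returns None when a statistic is requested for an all-NaN column.
fn fill_value(column: &[f64], strategy: Strategy) -> Option<f64> {
    let mut observed: Vec<f64> = column.iter().copied().filter(|v| !v.is_nan()).collect();
    // Unwrapping the comparison is safe because NaNs were filtered out above.
    observed.sort_by(|a, b| a.partial_cmp(b).unwrap());
    match strategy {
        Strategy::Constant(c) => Some(c),
        _ if observed.is_empty() => None,
        Strategy::Mean => Some(observed.iter().sum::<f64>() / observed.len() as f64),
        Strategy::Median => {
            let mid = observed.len() / 2;
            Some(if observed.len() % 2 == 0 {
                (observed[mid - 1] + observed[mid]) / 2.0
            } else {
                observed[mid]
            })
        }
        Strategy::MostFrequent => {
            // Scan runs of equal values in the sorted column; on ties the
            // first (smallest) value keeps the lead.
            let (mut best, mut best_count) = (observed[0], 0);
            let mut i = 0;
            while i < observed.len() {
                let mut j = i;
                while j < observed.len() && observed[j] == observed[i] {
                    j += 1;
                }
                if j - i > best_count {
                    best = observed[i];
                    best_count = j - i;
                }
                i = j;
            }
            Some(best)
        }
    }
}

/// Learns one fill value per column, then replaces that column's NaNs.
/// A fitted imputer would store these values so new data can reuse them.
fn impute(x: &mut [Vec<f64>], strategy: Strategy) {
    let n_cols = x.first().map_or(0, |row| row.len());
    for j in 0..n_cols {
        let column: Vec<f64> = x.iter().map(|row| row[j]).collect();
        if let Some(fill) = fill_value(&column, strategy) {
            for row in x.iter_mut() {
                if row[j].is_nan() {
                    row[j] = fill;
                }
            }
        }
    }
}

fn main() {
    let mut x = vec![
        vec![1.0, f64::NAN],
        vec![f64::NAN, 4.0],
        vec![7.0, 4.0],
        vec![4.0, 10.0],
    ];
    impute(&mut x, Strategy::Median);
    println!("{:?}", x); // NaNs replaced by each column's median
}
```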

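§Feature Selection

  • feature_selection::VarianceThreshold, feature_selection::SelectKBest, and feature_selection::SelectFromModel: feature selection transformers.
  • select_percentile::SelectPercentile: select features by percentile of highest scores.
  • rfe::RFE and rfe::RFECV: recursive feature elimination, optionally with cross-validation.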

§Feature Engineering

§Pipeline Integration

StandardScaler<f64>, MinMaxScaler<f64>, RobustScaler<f64>, MaxAbsScaler<f64>, Normalizer<f64>, PowerTransformer<f64>, PolynomialFeatures<f64>, SimpleImputer<f64>, VarianceThreshold<f64>, SelectKBest<f64>, and SelectFromModel<f64> each implement PipelineTransformer so they can be used as steps inside a Pipeline.
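Chaining works because every step exposes the same fit/transform interface. The std-only sketch below shows the idea with a hypothetical `Step` trait, two toy steps, and a `Pipeline` holding `Vec<Box<dyn Step>>`; none of these are ferrolearn's `PipelineTransformer` or `Pipeline` types, whose signatures may differ.

```rust
// Hypothetical std-only sketch of a pipeline of boxed steps. `Step`, `Center`,
// `MaxAbs`, and `Pipeline` are illustrative stand-ins, not ferrolearn types.

type Matrix = Vec<Vec<f64>>;

trait Step {
    /// Learns state from `x` and returns the transformed data.
    fn fit_transform(&mut self, x: &Matrix) -> Matrix;
    /// Applies previously learned state to `x`.
    fn transform(&self, x: &Matrix) -> Matrix;
}

/// Subtracts each column's training mean.
#[derive(Default)]
struct Center {
    mean: Vec<f64>,
}

impl Step for Center {
    fn fit_transform(&mut self, x: &Matrix) -> Matrix {
        let n = x.len() as f64;
        let n_cols = x.first().map_or(0, |row| row.len());
        self.mean = (0..n_cols)
            .map(|j| x.iter().map(|row| row[j]).sum::<f64>() / n)
            .collect();
        self.transform(x)
    }

    fn transform(&self, x: &Matrix) -> Matrix {
        x.iter()
            .map(|row| row.iter().zip(&self.mean).map(|(v, m)| v - m).collect::<Vec<f64>>())
            .collect()
    }
}

/// Divides each column by its training maximum absolute value.
#[derive(Default)]
struct MaxAbs {
    scale: Vec<f64>,
}

impl Step for MaxAbs {
    fn fit_transform(&mut self, x: &Matrix) -> Matrix {
        let n_cols = x.first().map_or(0, |row| row.len());
        self.scale = (0..n_cols)
            .map(|j| {
                let m = x.iter().map(|row| row[j].abs()).fold(0.0, f64::max);
                if m == 0.0 { 1.0 } else { m } // avoid division by zero
            })
            .collect();
        self.transform(x)
    }

    fn transform(&self, x: &Matrix) -> Matrix {
        x.iter()
            .map(|row| row.iter().zip(&self.scale).map(|(v, s)| v / s).collect::<Vec<f64>>())
            .collect()
    }
}

struct Pipeline {
    steps: Vec<Box<dyn Step>>,
}

impl Pipeline {
    /// Fits each step on the previous step's output.
    fn fit_transform(&mut self, x: &Matrix) -> Matrix {
        let mut data = x.clone();
        for step in self.steps.iter_mut() {
            data = step.fit_transform(&data);
        }
        data
    }

    /// Replays every fitted step in order.
    fn transform(&self, x: &Matrix) -> Matrix {
        self.steps.iter().fold(x.clone(), |data, step| step.transform(&data))
    }
}

fn main() {
    let train: Matrix = vec![vec![1.0, -2.0], vec![3.0, 2.0], vec![5.0, 6.0]];
    let mut pipe = Pipeline { steps: Vec::new() };
    pipe.steps.push(Box::new(Center::default()));
    pipe.steps.push(Box::new(MaxAbs::default()));
    println!("{:?}", pipe.fit_transform(&train));
    println!("{:?}", pipe.transform(&vec![vec![7.0, 2.0]]));
}
```

During fitting, each step sees the output of the step before it; afterwards, `transform` replays the learned state in the same order, so new data goes through exactly the preprocessing the training data did.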

§Examples

```rust
use ferrolearn_preprocess::StandardScaler;
use ferrolearn_core::traits::FitTransform;
use ndarray::array;

let x = array![[1.0_f64, 10.0], [2.0, 20.0], [3.0, 30.0]];
let scaled = StandardScaler::<f64>::new().fit_transform(&x).unwrap();
// scaled columns now have mean ≈ 0 and std ≈ 1
```
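As a worked check of the example, assuming the scaler divides by the population standard deviation (n in the denominator, as scikit-learn's `StandardScaler` does), the first column transforms as:

```latex
\begin{aligned}
\mu_1 &= \frac{1 + 2 + 3}{3} = 2, \qquad
\sigma_1 = \sqrt{\frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3}} = \sqrt{\frac{2}{3}} \approx 0.8165,\\
z &= \frac{x - \mu_1}{\sigma_1} \approx (-1.2247,\ 0,\ 1.2247).
\end{aligned}
```

The second column, (10, 20, 30), has μ₂ = 20 and σ₂ = 10·√(2/3), so it scales to the same values. If the sample standard deviation (n − 1 in the denominator) were used instead, both columns would become (−1, 0, 1).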

§Re-exports

pub use binarizer::Binarizer;
pub use column_transformer::ColumnSelector;
pub use column_transformer::ColumnTransformer;
pub use column_transformer::FittedColumnTransformer;
pub use column_transformer::Remainder;
pub use column_transformer::make_column_transformer;
pub use feature_selection::FittedSelectKBest;
pub use feature_selection::FittedVarianceThreshold;
pub use feature_selection::ScoreFunc;
pub use feature_selection::SelectFromModel;
pub use feature_selection::SelectKBest;
pub use feature_selection::VarianceThreshold;
pub use function_transformer::FunctionTransformer;
pub use imputer::FittedSimpleImputer;
pub use imputer::ImputeStrategy;
pub use imputer::SimpleImputer;
pub use label_encoder::FittedLabelEncoder;
pub use label_encoder::LabelEncoder;
pub use max_abs_scaler::FittedMaxAbsScaler;
pub use max_abs_scaler::MaxAbsScaler;
pub use min_max_scaler::FittedMinMaxScaler;
pub use min_max_scaler::MinMaxScaler;
pub use normalizer::Normalizer;
pub use one_hot_encoder::FittedOneHotEncoder;
pub use one_hot_encoder::OneHotEncoder;
pub use ordinal_encoder::FittedOrdinalEncoder;
pub use ordinal_encoder::OrdinalEncoder;
pub use polynomial_features::PolynomialFeatures;
pub use power_transformer::FittedPowerTransformer;
pub use power_transformer::PowerTransformer;
pub use robust_scaler::FittedRobustScaler;
pub use robust_scaler::RobustScaler;
pub use standard_scaler::FittedStandardScaler;
pub use standard_scaler::StandardScaler;
pub use binary_encoder::BinaryEncoder;
pub use binary_encoder::FittedBinaryEncoder;
pub use iterative_imputer::FittedIterativeImputer;
pub use iterative_imputer::InitialStrategy;
pub use iterative_imputer::IterativeImputer;
pub use kbins_discretizer::BinEncoding;
pub use kbins_discretizer::BinStrategy;
pub use kbins_discretizer::FittedKBinsDiscretizer;
pub use kbins_discretizer::KBinsDiscretizer;
pub use knn_imputer::FittedKNNImputer;
pub use knn_imputer::KNNImputer;
pub use knn_imputer::KNNWeights;
pub use quantile_transformer::FittedQuantileTransformer;
pub use quantile_transformer::OutputDistribution;
pub use quantile_transformer::QuantileTransformer;
pub use rfe::RFE;
pub use rfe::RFECV;
pub use select_percentile::FittedSelectPercentile;
pub use select_percentile::SelectPercentile;
pub use spline_transformer::FittedSplineTransformer;
pub use spline_transformer::KnotStrategy;
pub use spline_transformer::SplineTransformer;
pub use target_encoder::FittedTargetEncoder;
pub use target_encoder::TargetEncoder;

§Modules

binarizer
Binarizer: threshold features to binary values.
binary_encoder
Binary encoder: encode categorical integers as binary digits.
column_transformer
Column transformer: apply different transformers to different column subsets.
feature_selection
Feature selection transformers.
function_transformer
Function transformer: apply a user-provided function element-wise.
imputer
Simple imputer: fill missing (NaN) values per feature column.
iterative_imputer
Iterative imputer: fill missing values by modeling each feature as a function of all other features.
kbins_discretizer
K-bins discretizer: bin continuous features into discrete intervals.
knn_imputer
KNN imputer: fill missing (NaN) values using K-nearest neighbors.
label_encoder
Label encoder: maps string labels to integer indices.
max_abs_scaler
Max-absolute scaler: scale each feature by its maximum absolute value.
min_max_scaler
Min-max scaler: scales each feature to a given range.
normalizer
Normalizer: scale each sample (row) to unit norm.
one_hot_encoder
One-hot encoder for categorical integer features.
ordinal_encoder
Ordinal encoder: map string categories to integer indices.
polynomial_features
Polynomial features: generate polynomial and interaction features.
power_transformer
Power transformer: apply a power transform to make data more Gaussian.
quantile_transformer
Quantile transformer: map features to a uniform or normal distribution.
rfe
Recursive Feature Elimination (RFE) and RFE with Cross-Validation (RFECV).
robust_scaler
Robust scaler: median and IQR-based scaling.
select_percentile
Select features by percentile of highest scores.
spline_transformer
Spline transformer: generate B-spline basis functions for each feature.
standard_scaler
Standard scaler: zero-mean, unit-variance scaling.
target_encoder
Target encoder: encode categorical features using target statistics.
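Several of the per-feature and per-sample transforms listed above reduce to short formulas. The std-only sketch below shows min-max scaling, max-abs scaling, L2 row normalization, and thresholding. The function names, the `[lo, hi]` parameters, the choice of the L2 norm, and the strict `>` comparison in `binarize` are assumptions modeled on the scikit-learn transformers of the same names, not ferrolearn's code.

```rust
// Hypothetical std-only sketches of a few transforms named in the module list.
// Formulas follow the usual scikit-learn definitions (an assumption); these are
// not ferrolearn's implementations or APIs.

/// min_max_scaler: map a column linearly onto [lo, hi].
fn min_max(column: &[f64], lo: f64, hi: f64) -> Vec<f64> {
    let min = column.iter().copied().fold(f64::INFINITY, f64::min);
    let max = column.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Constant column: use range 1 so every value maps to `lo`.
    let range = if max > min { max - min } else { 1.0 };
    column.iter().map(|v| lo + (v - min) / range * (hi - lo)).collect()
}

/// max_abs_scaler: divide a column by its largest absolute value.
fn max_abs(column: &[f64]) -> Vec<f64> {
    let m = column.iter().fold(0.0_f64, |acc, v| acc.max(v.abs()));
    let m = if m == 0.0 { 1.0 } else { m }; // all-zero column stays zero
    column.iter().map(|v| v / m).collect()
}

/// normalizer: scale one sample (row) to unit L2 norm.
fn l2_normalize(row: &[f64]) -> Vec<f64> {
    let norm = row.iter().map(|v| v * v).sum::<f64>().sqrt();
    let norm = if norm == 0.0 { 1.0 } else { norm }; // zero row stays zero
    row.iter().map(|v| v / norm).collect()
}

/// binarizer: 1.0 where the value is strictly above the threshold, else 0.0.
fn binarize(row: &[f64], threshold: f64) -> Vec<f64> {
    row.iter().map(|&v| if v > threshold { 1.0 } else { 0.0 }).collect()
}

fn main() {
    println!("{:?}", min_max(&[1.0, 2.0, 3.0], 0.0, 1.0));
    println!("{:?}", max_abs(&[-4.0, 2.0]));
    println!("{:?}", l2_normalize(&[3.0, 4.0]));
    println!("{:?}", binarize(&[0.2, 0.5, 0.7], 0.5));
}
```

Note the axis difference: the two scalers compute statistics down each column (per feature), while the normalizer works across each row (per sample) and therefore needs no fitted state.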