§ferrolearn-preprocess
Data preprocessing transformers for the ferrolearn machine learning framework.
This crate provides standard scalers, encoders, imputers, and feature
selection utilities that follow the ferrolearn Fit/Transform trait
pattern.
§Scalers
All scalers are generic over F: Float + Send + Sync + 'static and implement
Fit<Array2<F>, ()> (returning a Fitted* type) and
FitTransform<Array2<F>>. The fitted types
implement Transform<Array2<F>>.
- StandardScaler — zero-mean, unit-variance scaling
- MinMaxScaler — scale features to a given range (default [0, 1])
- RobustScaler — median / IQR-based scaling, robust to outliers
- MaxAbsScaler — scale by maximum absolute value so values are in [-1, 1]
- normalizer::Normalizer — normalize each sample (row) to unit norm
- power_transformer::PowerTransformer — Yeo-Johnson power transform
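As a standalone illustration of what the first of these scalers computes, the following sketch applies zero-mean, unit-variance scaling to a single column. It uses plain `Vec<f64>` rather than the crate's `Array2<F>` API, so it only demonstrates the math, not ferrolearn-preprocess itself.

```rust
// Zero-mean, unit-variance scaling of one feature column,
// sketched on a plain Vec; illustrative math only, not the
// ferrolearn-preprocess StandardScaler API.
fn standard_scale(column: &[f64]) -> Vec<f64> {
    let n = column.len() as f64;
    let mean = column.iter().sum::<f64>() / n;
    // Population variance, the usual convention for scalers.
    let var = column.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    column.iter().map(|x| (x - mean) / std).collect()
}

fn main() {
    let scaled = standard_scale(&[1.0, 2.0, 3.0]);
    let mean: f64 = scaled.iter().sum::<f64>() / scaled.len() as f64;
    // The scaled column has mean ~0 and unit variance.
    println!("{:?} mean={:.1}", scaled, mean);
}
```

MinMaxScaler and MaxAbsScaler follow the same per-column pattern with `(x - min) / (max - min)` and `x / max(|x|)` respectively.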
§Encoders
- OneHotEncoder — encode Array2<usize> categorical columns as binary columns
- LabelEncoder — map Array1<String> labels to integer indices
- ordinal_encoder::OrdinalEncoder — map string categories to integers in order of first appearance
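To make the one-hot expansion concrete, here is a minimal sketch for a single categorical integer column: each value in 0..k becomes a row with a single 1.0 in that category's position. Plain `Vec`s are used instead of the crate's `Array2<usize>` input, so this shows the encoding scheme only, not the OneHotEncoder API.

```rust
// One-hot encode a single categorical integer column.
// Categories are assumed to be the integers 0..k.
fn one_hot(column: &[usize]) -> Vec<Vec<f64>> {
    // Number of binary output columns = max category + 1.
    let k = column.iter().copied().max().map_or(0, |m| m + 1);
    column
        .iter()
        .map(|&c| {
            let mut row = vec![0.0; k];
            row[c] = 1.0; // single active indicator per sample
            row
        })
        .collect()
}

fn main() {
    let encoded = one_hot(&[0, 2, 1]);
    println!("{:?}", encoded);
}
```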
§Imputers
imputer::SimpleImputer — fill missing (NaN) values per feature column using the Mean, Median, MostFrequent, or Constant strategy.
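The Mean strategy can be sketched on one column as follows: compute the mean of the non-missing entries and substitute it for each NaN. This operates on a plain `Vec<f64>` for illustration and is not the SimpleImputer API itself.

```rust
// Mean imputation of one feature column: NaN entries are
// replaced by the mean of the observed (non-NaN) values.
fn impute_mean(column: &[f64]) -> Vec<f64> {
    let present: Vec<f64> = column.iter().copied().filter(|x| !x.is_nan()).collect();
    let mean = present.iter().sum::<f64>() / present.len() as f64;
    column
        .iter()
        .map(|&x| if x.is_nan() { mean } else { x })
        .collect()
}

fn main() {
    // The NaN in the middle becomes the mean of 1.0 and 3.0.
    let filled = impute_mean(&[1.0, f64::NAN, 3.0]);
    println!("{:?}", filled);
}
```

The Median and MostFrequent strategies substitute a different per-column statistic in the same way.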
§Feature Selection
- feature_selection::VarianceThreshold — remove features with variance below a configurable threshold.
- feature_selection::SelectKBest — keep the K features with the highest ANOVA F-scores against class labels.
- feature_selection::SelectFromModel — keep features whose importance weight (from a pre-fitted model) meets a configurable threshold.
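The variance-threshold criterion is simple enough to sketch directly: compute each column's variance and keep the indices of columns above the threshold. Columns are represented here as `Vec<f64>` slices for illustration; this is not the VarianceThreshold API.

```rust
// Population variance of one feature column.
fn variance(column: &[f64]) -> f64 {
    let n = column.len() as f64;
    let mean = column.iter().sum::<f64>() / n;
    column.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n
}

// Indices of columns whose variance exceeds the threshold.
fn selected_columns(columns: &[Vec<f64>], threshold: f64) -> Vec<usize> {
    columns
        .iter()
        .enumerate()
        .filter(|(_, c)| variance(c) > threshold)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Column 0 is constant (variance 0) and is dropped even at
    // threshold 0; column 1 varies and is kept.
    let cols = vec![vec![5.0, 5.0, 5.0], vec![1.0, 2.0, 3.0]];
    println!("{:?}", selected_columns(&cols, 0.0));
}
```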
§Feature Engineering
- polynomial_features::PolynomialFeatures — generate polynomial and interaction features
- binarizer::Binarizer — threshold features to binary values
- function_transformer::FunctionTransformer — apply a user-provided function element-wise
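For intuition about the polynomial expansion, here is what a degree-2 transform produces for one sample with two features: the bias term plus every monomial up to degree 2, including the interaction term. This hand-rolled sketch is illustrative only, not the PolynomialFeatures API.

```rust
// Degree-2 polynomial features for one sample (a, b):
// [1, a, b, a^2, a*b, b^2] — bias, linear, square, and
// interaction terms.
fn poly2(a: f64, b: f64) -> Vec<f64> {
    vec![1.0, a, b, a * a, a * b, b * b]
}

fn main() {
    println!("{:?}", poly2(2.0, 3.0));
}
```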
§Pipeline Integration
StandardScaler<f64>, MinMaxScaler<f64>, RobustScaler<f64>,
MaxAbsScaler<f64>, Normalizer<f64>, PowerTransformer<f64>,
PolynomialFeatures<f64>, SimpleImputer<f64>, VarianceThreshold<f64>,
SelectKBest<f64>, and SelectFromModel<f64> each implement
PipelineTransformer
so they can be used as steps inside a
Pipeline.
§Examples
use ferrolearn_preprocess::StandardScaler;
use ferrolearn_core::traits::FitTransform;
use ndarray::array;
let x = array![[1.0_f64, 10.0], [2.0, 20.0], [3.0, 30.0]];
let scaled = StandardScaler::<f64>::new().fit_transform(&x).unwrap();
// scaled columns now have mean ≈ 0 and std ≈ 1

Re-exports§
pub use binarizer::Binarizer;
pub use column_transformer::ColumnSelector;
pub use column_transformer::ColumnTransformer;
pub use column_transformer::FittedColumnTransformer;
pub use column_transformer::Remainder;
pub use column_transformer::make_column_transformer;
pub use feature_selection::FittedSelectKBest;
pub use feature_selection::FittedVarianceThreshold;
pub use feature_selection::ScoreFunc;
pub use feature_selection::SelectFromModel;
pub use feature_selection::SelectKBest;
pub use feature_selection::VarianceThreshold;
pub use function_transformer::FunctionTransformer;
pub use imputer::FittedSimpleImputer;
pub use imputer::ImputeStrategy;
pub use imputer::SimpleImputer;
pub use label_encoder::FittedLabelEncoder;
pub use label_encoder::LabelEncoder;
pub use max_abs_scaler::FittedMaxAbsScaler;
pub use max_abs_scaler::MaxAbsScaler;
pub use min_max_scaler::FittedMinMaxScaler;
pub use min_max_scaler::MinMaxScaler;
pub use normalizer::Normalizer;
pub use one_hot_encoder::FittedOneHotEncoder;
pub use one_hot_encoder::OneHotEncoder;
pub use ordinal_encoder::FittedOrdinalEncoder;
pub use ordinal_encoder::OrdinalEncoder;
pub use polynomial_features::PolynomialFeatures;
pub use power_transformer::FittedPowerTransformer;
pub use power_transformer::PowerTransformer;
pub use robust_scaler::FittedRobustScaler;
pub use robust_scaler::RobustScaler;
pub use standard_scaler::FittedStandardScaler;
pub use standard_scaler::StandardScaler;
pub use binary_encoder::BinaryEncoder;
pub use binary_encoder::FittedBinaryEncoder;
pub use iterative_imputer::FittedIterativeImputer;
pub use iterative_imputer::InitialStrategy;
pub use iterative_imputer::IterativeImputer;
pub use kbins_discretizer::BinEncoding;
pub use kbins_discretizer::BinStrategy;
pub use kbins_discretizer::FittedKBinsDiscretizer;
pub use kbins_discretizer::KBinsDiscretizer;
pub use knn_imputer::FittedKNNImputer;
pub use knn_imputer::KNNImputer;
pub use knn_imputer::KNNWeights;
pub use quantile_transformer::FittedQuantileTransformer;
pub use quantile_transformer::OutputDistribution;
pub use quantile_transformer::QuantileTransformer;
pub use rfe::RFE;
pub use rfe::RFECV;
pub use select_percentile::FittedSelectPercentile;
pub use select_percentile::SelectPercentile;
pub use spline_transformer::FittedSplineTransformer;
pub use spline_transformer::KnotStrategy;
pub use spline_transformer::SplineTransformer;
pub use target_encoder::FittedTargetEncoder;
pub use target_encoder::TargetEncoder;
Modules§
- binarizer - Binarizer: threshold features to binary values.
- binary_encoder - Binary encoder: encode categorical integers as binary digits.
- column_transformer - Column transformer: apply different transformers to different column subsets.
- feature_selection - Feature selection transformers.
- function_transformer - Function transformer: apply a user-provided function element-wise.
- imputer - Simple imputer: fill missing (NaN) values per feature column.
- iterative_imputer - Iterative imputer: fill missing values by modeling each feature as a function of all other features.
- kbins_discretizer - K-bins discretizer: bin continuous features into discrete intervals.
- knn_imputer - KNN imputer: fill missing (NaN) values using K-nearest neighbors.
- label_encoder - Label encoder: maps string labels to integer indices.
- max_abs_scaler - Max-absolute scaler: scale each feature by its maximum absolute value.
- min_max_scaler - Min-max scaler: scales each feature to a given range.
- normalizer - Normalizer: scale each sample (row) to unit norm.
- one_hot_encoder - One-hot encoder for categorical integer features.
- ordinal_encoder - Ordinal encoder: map string categories to integer indices.
- polynomial_features - Polynomial features: generate polynomial and interaction features.
- power_transformer - Power transformer: apply a power transform to make data more Gaussian.
- quantile_transformer - Quantile transformer: map features to a uniform or normal distribution.
- rfe - Recursive Feature Elimination (RFE) and RFE with Cross-Validation (RFECV).
- robust_scaler - Robust scaler: median and IQR-based scaling.
- select_percentile - Select features by percentile of highest scores.
- spline_transformer - Spline transformer: generate B-spline basis functions for each feature.
- standard_scaler - Standard scaler: zero-mean, unit-variance scaling.
- target_encoder - Target encoder: encode categorical features using target statistics.