Module preprocessing

Streaming preprocessing utilities for feature transformation.

These transformers process features incrementally, maintaining running statistics that update with each sample – no batch recomputation needed.
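As an illustration of what "running statistics that update with each sample" can look like, here is a minimal, self-contained sketch of Welford's online mean/variance recurrence, the technique the normalizer module is described as using. The `RunningStats` type and its methods are hypothetical names chosen for this example; they are not the crate's API, and the crate's own handling of edge cases (such as zero variance) may differ.

```rust
/// Running mean and variance for a single feature, updated one sample at a
/// time with Welford's algorithm. Hypothetical illustration type, not part
/// of the crate.
#[derive(Debug, Clone, Default)]
pub struct RunningStats {
    count: u64,
    mean: f64,
    /// Sum of squared deviations from the current mean.
    m2: f64,
}

impl RunningStats {
    pub fn new() -> Self {
        Self::default()
    }

    /// Fold one observation into the statistics in O(1) time and memory.
    pub fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        // The second factor uses the already-updated mean; this is the
        // step that distinguishes Welford's recurrence from the naive
        // sum-of-squares formula.
        self.m2 += delta * (x - self.mean);
    }

    pub fn mean(&self) -> f64 {
        self.mean
    }

    /// Population variance; 0.0 before any sample has been seen.
    pub fn variance(&self) -> f64 {
        if self.count == 0 {
            0.0
        } else {
            self.m2 / self.count as f64
        }
    }

    /// Standardize `x` against the current statistics.
    /// Assumption of this sketch: a (near-)constant feature maps to 0.0.
    pub fn standardize(&self, x: f64) -> f64 {
        let std = self.variance().sqrt();
        if std < 1e-12 {
            0.0
        } else {
            (x - self.mean) / std
        }
    }
}
```

Keeping one such accumulator per feature gives standardization without storing or revisiting earlier samples.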

Modules

| Module | Purpose |
|--------|---------|
| normalizer | Welford-based online standardization (zero-mean, unit-variance) |
| feature_selector | EWMA importance tracking with dynamic feature masking |
| ccipca | Candid Covariance-free Incremental PCA – streaming dimensionality reduction |
| feature_hasher | Feature hashing (hashing trick) for fixed-size dimensionality reduction |
| min_max | Streaming min-max scaler for feature normalization to a target range |
| one_hot | Streaming one-hot encoder with online category discovery |
| target_encoder | Streaming target encoder with Bayesian smoothing |
| polynomial | Polynomial and interaction feature generation |
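To make "EWMA importance tracking with dynamic feature masking" concrete, the sketch below keeps an exponentially weighted moving average of per-feature importance scores and zeroes out features whose smoothed score falls below a threshold. Everything here is an assumption made for illustration: the `EwmaMask` type, its parameters (`alpha`, `threshold`), the choice to seed the average with the first observation, and masking by zeroing are not taken from the crate and may not match how `OnlineFeatureSelector` behaves.

```rust
/// Exponentially weighted moving average of per-feature importances with a
/// threshold mask. Hypothetical illustration type, not the crate's API.
pub struct EwmaMask {
    /// Weight given to the newest importance value, in (0, 1].
    alpha: f64,
    /// Features with a smoothed score below this value are masked.
    threshold: f64,
    scores: Vec<f64>,
    initialized: bool,
}

impl EwmaMask {
    pub fn new(n_features: usize, alpha: f64, threshold: f64) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        Self {
            alpha,
            threshold,
            scores: vec![0.0; n_features],
            initialized: false,
        }
    }

    /// Blend a new importance vector into the running scores.
    pub fn update(&mut self, importances: &[f64]) {
        assert_eq!(importances.len(), self.scores.len());
        if !self.initialized {
            // Assumption: seed the average with the first observation.
            self.scores.copy_from_slice(importances);
            self.initialized = true;
            return;
        }
        let alpha = self.alpha;
        for (s, &x) in self.scores.iter_mut().zip(importances) {
            *s = alpha * x + (1.0 - alpha) * *s;
        }
    }

    /// Return a copy of `features` with low-importance entries set to 0.0.
    pub fn mask(&self, features: &[f64]) -> Vec<f64> {
        assert_eq!(features.len(), self.scores.len());
        features
            .iter()
            .zip(&self.scores)
            .map(|(&x, &s)| if s >= self.threshold { x } else { 0.0 })
            .collect()
    }

    pub fn scores(&self) -> &[f64] {
        &self.scores
    }
}
```

Because the mask is recomputed from the current scores on every call, a feature can drop out and later re-enter as its smoothed importance changes, which is the "dynamic" part of the description.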

Example

use irithyll::preprocessing::{IncrementalNormalizer, OnlineFeatureSelector};

let mut norm = IncrementalNormalizer::new();
let standardized = norm.update_and_transform(&[100.0, 0.5, -3.0]);

let mut selector = OnlineFeatureSelector::new(3, 0.5, 0.1, 10);
selector.update_importances(&[0.9, 0.1, 0.8]);
let masked = selector.mask_features(&standardized);

Re-exports

pub use ccipca::CCIPCA;
pub use feature_hasher::FeatureHasher;
pub use feature_selector::OnlineFeatureSelector;
pub use min_max::MinMaxScaler;
pub use normalizer::IncrementalNormalizer;
pub use one_hot::OneHotEncoder;
pub use polynomial::PolynomialFeatures;
pub use target_encoder::TargetEncoder;
pub use crate::pipeline::StreamingPreprocessor;

Modules

ccipca
Candid Covariance-free Incremental PCA (CCIPCA) for streaming dimensionality reduction.
feature_hasher
Feature hashing (hashing trick) for dimensionality reduction.
feature_selector
EWMA-based online feature selector with dynamic importance masking.
min_max
Streaming min-max scaler for feature normalization.
normalizer
Welford online mean/variance normalizer for incremental standardization.
one_hot
Streaming one-hot encoder for categorical features.
polynomial
Degree-2 polynomial feature generation for interaction modeling.
target_encoder
Streaming target encoder for categorical features with Bayesian smoothing.
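For readers unfamiliar with target encoding under Bayesian smoothing, the following self-contained sketch shows one common formulation: each category is encoded as a blend of its own running target mean and the global running mean, weighted by how many samples the category has seen. The `SmoothedTargetEncoder` type, the `smoothing` parameter, and the exact formula are assumptions for illustration; the crate's `TargetEncoder` may use a different prior or update rule.

```rust
use std::collections::HashMap;

/// Streaming target encoder using additive (m-estimate) smoothing:
///   encoded = (category_sum + m * global_mean) / (category_count + m)
/// Hypothetical illustration type, not the crate's API.
pub struct SmoothedTargetEncoder {
    /// Pseudo-count `m`: how strongly rare categories shrink to the prior.
    smoothing: f64,
    global_sum: f64,
    global_count: u64,
    /// Per-category (sum of targets, number of samples).
    per_category: HashMap<String, (f64, u64)>,
}

impl SmoothedTargetEncoder {
    pub fn new(smoothing: f64) -> Self {
        assert!(smoothing >= 0.0, "smoothing must be non-negative");
        Self {
            smoothing,
            global_sum: 0.0,
            global_count: 0,
            per_category: HashMap::new(),
        }
    }

    /// Record one (category, target) observation.
    pub fn update(&mut self, category: &str, target: f64) {
        self.global_sum += target;
        self.global_count += 1;
        let entry = self
            .per_category
            .entry(category.to_string())
            .or_insert((0.0, 0));
        entry.0 += target;
        entry.1 += 1;
    }

    /// Encode a category with the statistics seen so far.
    /// Unseen categories fall back to the global mean (0.0 if no data yet).
    pub fn encode(&self, category: &str) -> f64 {
        if self.global_count == 0 {
            return 0.0;
        }
        let prior = self.global_sum / self.global_count as f64;
        match self.per_category.get(category) {
            Some(&(sum, n)) => {
                (sum + self.smoothing * prior) / (n as f64 + self.smoothing)
            }
            None => prior,
        }
    }
}
```

In a streaming setting it is usually safer to call `encode` before `update` for each sample, so that a sample's own target does not leak into its encoding.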