§SciRS2 Transform - Data Transformation and Preprocessing
scirs2-transform provides comprehensive data transformation utilities for machine learning, offering normalization, feature engineering, dimensionality reduction, encoding, imputation, and pipelines with SIMD acceleration and out-of-core processing for large datasets.
§🎯 Key Features
- Normalization: Min-max, Z-score, robust scaling, quantile normalization
- Feature Engineering: Polynomial features, interaction terms, binning
- Dimensionality Reduction: PCA, SVD, t-SNE, UMAP, LDA
- Encoding: One-hot, label, ordinal, target encoding
- Imputation: Mean, median, mode, KNN, iterative imputation
- Pipelines: Chained transformations with fit/transform API
- Performance: SIMD operations, streaming, out-of-core processing
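To make the Z-score normalization named in the list above concrete, here is a minimal sketch in plain Rust using only the standard library. It is not the crate's implementation: the function name `zscore_columns` is hypothetical, the data is a row-major `Vec<Vec<f64>>` rather than an `ndarray` array, all rows are assumed to have the same length, and the population standard deviation (divide by n) is an assumption that the crate may or may not share.

```rust
/// Standardize each column of a row-major table to zero mean and unit variance.
/// Hypothetical helper for illustration only; not part of scirs2-transform.
/// Assumes every row has the same number of columns.
fn zscore_columns(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    if rows.is_empty() {
        return Vec::new();
    }
    let n_rows = rows.len() as f64;
    let n_cols = rows[0].len();

    // Column means: sum first, then divide once.
    let mut means = vec![0.0; n_cols];
    for row in rows {
        for (j, v) in row.iter().enumerate() {
            means[j] += v;
        }
    }
    for m in means.iter_mut() {
        *m /= n_rows;
    }

    // Column standard deviations (population form: divide by n).
    let mut stds = vec![0.0; n_cols];
    for row in rows {
        for (j, v) in row.iter().enumerate() {
            stds[j] += (v - means[j]).powi(2);
        }
    }
    for s in stds.iter_mut() {
        *s = (*s / n_rows).sqrt();
    }

    // z = (x - mean) / std; (near-)constant columns are mapped to 0.0.
    rows.iter()
        .map(|row| {
            row.iter()
                .enumerate()
                .map(|(j, v)| {
                    if stds[j] > f64::EPSILON {
                        (v - means[j]) / stds[j]
                    } else {
                        0.0
                    }
                })
                .collect()
        })
        .collect()
}

fn main() {
    let data = vec![vec![1.0, 10.0], vec![2.0, 10.0], vec![3.0, 10.0]];
    for row in zscore_columns(&data) {
        println!("{:?}", row);
    }
}
```

Mapping constant columns to zero avoids a division by zero; other conventions (returning an error, or leaving the column unchanged) are equally defensible.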
§📦 Module Overview
| SciRS2 Module | scikit-learn Equivalent | Description |
|---|---|---|
| normalize | sklearn.preprocessing.StandardScaler | Data normalization/standardization |
| features | sklearn.preprocessing.PolynomialFeatures | Feature engineering |
| reduction | sklearn.decomposition.PCA | Dimensionality reduction |
| encoding | sklearn.preprocessing.OneHotEncoder | Categorical encoding |
| impute | sklearn.impute.SimpleImputer | Missing value imputation |
| pipeline | sklearn.pipeline.Pipeline | Transformation pipelines |
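The fit/transform pattern behind chained transformations can be sketched in a few lines of standard-library Rust. Everything named here (`Step`, `MeanFill`, `MinMaxScale`, `Chain`) is hypothetical and chosen for illustration; it does not reproduce the signatures of the crate's own `Pipeline` or `Transformer` types. The sketch shows the general idea only: each step learns its parameters from the output of the previous step, and the fitted parameters are then reused on new data.

```rust
/// A transformation step that learns parameters in `fit` and applies them in `transform`.
/// Hypothetical trait for illustration; not the scirs2-transform API.
trait Step {
    fn fit(&mut self, data: &[f64]);
    fn transform(&self, data: &[f64]) -> Vec<f64>;
}

/// Replaces NaN values with the mean of the values observed during `fit`.
#[derive(Default)]
struct MeanFill {
    mean: f64,
}

impl Step for MeanFill {
    fn fit(&mut self, data: &[f64]) {
        let observed: Vec<f64> = data.iter().copied().filter(|v| !v.is_nan()).collect();
        self.mean = if observed.is_empty() {
            0.0
        } else {
            observed.iter().sum::<f64>() / observed.len() as f64
        };
    }
    fn transform(&self, data: &[f64]) -> Vec<f64> {
        data.iter()
            .map(|v| if v.is_nan() { self.mean } else { *v })
            .collect()
    }
}

/// Rescales values using the minimum and maximum seen during `fit`.
#[derive(Default)]
struct MinMaxScale {
    min: f64,
    max: f64,
}

impl Step for MinMaxScale {
    fn fit(&mut self, data: &[f64]) {
        self.min = data.iter().copied().fold(f64::INFINITY, f64::min);
        self.max = data.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    }
    fn transform(&self, data: &[f64]) -> Vec<f64> {
        let range = self.max - self.min;
        data.iter()
            .map(|v| if range > 0.0 { (v - self.min) / range } else { 0.0 })
            .collect()
    }
}

/// Applies steps in order; each step is fitted on the previous step's output.
struct Chain {
    steps: Vec<Box<dyn Step>>,
}

impl Chain {
    fn fit_transform(&mut self, data: &[f64]) -> Vec<f64> {
        let mut current = data.to_vec();
        for step in self.steps.iter_mut() {
            step.fit(&current);
            current = step.transform(&current);
        }
        current
    }
    fn transform(&self, data: &[f64]) -> Vec<f64> {
        self.steps
            .iter()
            .fold(data.to_vec(), |acc, step| step.transform(&acc))
    }
}

fn main() {
    let steps: Vec<Box<dyn Step>> = vec![
        Box::new(MeanFill::default()),
        Box::new(MinMaxScale::default()),
    ];
    let mut chain = Chain { steps };
    println!("{:?}", chain.fit_transform(&[1.0, f64::NAN, 3.0]));
    println!("{:?}", chain.transform(&[f64::NAN, 5.0]));
}
```

Because `transform` reuses the fitted minimum and maximum, values outside the training range can fall outside [0, 1]; whether to clip them is a design choice the sketch leaves open.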
§ Quick Start
[dependencies]
scirs2-transform = "0.1.5"

use scirs2_transform::normalize::{normalize_array, NormalizationMethod};
use scirs2_core::ndarray::Array2;

// Standardize data (Z-score normalization)
let data = Array2::<f64>::zeros((100, 5));
let normalized = normalize_array(&data, NormalizationMethod::ZScore, 0).expect("should succeed");

§ Version: 0.1.5 (January 15, 2026)
Re-exports§
pub use decomposition::DictionaryLearning;
pub use decomposition::NMF;
pub use encoding::BinaryEncoder;
pub use encoding::EncodedOutput;
pub use encoding::FrequencyEncoder;
pub use encoding::OneHotEncoder;
pub use encoding::OrdinalEncoder;
pub use encoding::SparseMatrix;
pub use encoding::TargetEncoder;
pub use encoding::WOEEncoder;
pub use error::Result;
pub use error::TransformError;
pub use features::binarize;
pub use features::discretize_equal_frequency;
pub use features::discretize_equal_width;
pub use features::log_transform;
pub use features::power_transform;
pub use features::PolynomialFeatures;
pub use features::PowerTransformer;
pub use impute::DistanceMetric;
pub use impute::ImputeStrategy;
pub use impute::IterativeImputer;
pub use impute::KNNImputer;
pub use impute::MissingIndicator;
pub use impute::SimpleImputer;
pub use impute::WeightingScheme;
pub use normalize::normalize_array;
pub use normalize::normalize_vector;
pub use normalize::NormalizationMethod;
pub use normalize::Normalizer;
pub use pipeline::make_column_transformer;
pub use pipeline::make_pipeline;
pub use pipeline::ColumnTransformer;
pub use pipeline::Pipeline;
pub use pipeline::RemainderOption;
pub use pipeline::Transformer;
pub use reduction::factor_analysis;
pub use reduction::scree_plot_data;
pub use reduction::trustworthiness;
pub use reduction::AffinityMethod;
pub use reduction::DiffusionMaps;
pub use reduction::FactorAnalysis;
pub use reduction::FactorAnalysisResult;
pub use reduction::GraphMethod;
pub use reduction::Isomap;
pub use reduction::LaplacianEigenmaps;
pub use reduction::RotationMethod;
pub use reduction::ScreePlotData;
pub use reduction::SpectralEmbedding;
pub use reduction::TruncatedSVD;
pub use reduction::LDA;
pub use reduction::LLE;
pub use reduction::PCA;
pub use reduction::TSNE;
pub use reduction::UMAP;
pub use scaling::MaxAbsScaler;
pub use scaling::QuantileTransformer;
pub use selection::MutualInfoSelector;
pub use selection::RecursiveFeatureElimination;
pub use selection::VarianceThreshold;
pub use time_series::FourierFeatures;
pub use time_series::LagFeatures;
pub use time_series::TimeSeriesFeatures;
pub use time_series::WaveletFeatures;
pub use normalize_simd::simd_l2_normalize_1d;
pub use normalize_simd::simd_maxabs_normalize_1d;
pub use normalize_simd::simd_minmax_normalize_1d;
pub use normalize_simd::simd_normalize_adaptive;
pub use normalize_simd::simd_normalize_batch;
pub use normalize_simd::simd_normalizearray;
pub use normalize_simd::simd_zscore_normalize_1d;
pub use normalize_simd::AdaptiveBlockSizer;
pub use features_simd::simd_binarize;
pub use features_simd::simd_polynomial_features_optimized;
pub use features_simd::simd_power_transform;
pub use features_simd::SimdPolynomialFeatures;
pub use scaling_simd::SimdMaxAbsScaler;
pub use scaling_simd::SimdRobustScaler;
pub use scaling_simd::SimdStandardScaler;
pub use graph::adjacency_to_edge_list;
pub use graph::edge_list_to_adjacency;
pub use graph::ActivationType;
pub use graph::DeepWalk;
pub use graph::GraphAutoencoder;
pub use graph::LaplacianType;
pub use graph::Node2Vec;
pub use image::resize_images;
pub use image::rgb_to_grayscale;
pub use image::BlockNorm;
pub use image::HOGDescriptor;
pub use image::ImageNormMethod;
pub use image::ImageNormalizer;
pub use image::PatchExtractor;
pub use optimization_config::AdaptiveParameterTuner;
pub use optimization_config::AdvancedConfigOptimizer;
pub use optimization_config::AutoTuner;
pub use optimization_config::ConfigurationPredictor;
pub use optimization_config::DataCharacteristics;
pub use optimization_config::OptimizationConfig;
pub use optimization_config::OptimizationReport;
pub use optimization_config::PerformanceMetric;
pub use optimization_config::SystemMonitor;
pub use optimization_config::SystemResources;
pub use optimization_config::TransformationRecommendation;
pub use out_of_core::csv_chunks;
pub use out_of_core::ChunkedArrayReader;
pub use out_of_core::ChunkedArrayWriter;
pub use out_of_core::OutOfCoreConfig;
pub use out_of_core::OutOfCoreNormalizer;
pub use out_of_core::OutOfCoreTransformer;
pub use performance::EnhancedPCA;
pub use performance::EnhancedStandardScaler;
pub use streaming::OutlierMethod;
pub use streaming::StreamingFeatureSelector;
pub use streaming::StreamingMinMaxScaler;
pub use streaming::StreamingOutlierDetector;
pub use streaming::StreamingPCA;
pub use streaming::StreamingQuantileTracker;
pub use streaming::StreamingStandardScaler;
pub use streaming::StreamingTransformer;
pub use streaming::WindowedStreamingTransformer;
pub use text::CountVectorizer;
pub use text::HashingVectorizer;
pub use text::StreamingCountVectorizer;
pub use text::TfidfVectorizer;
pub use utils::ArrayMemoryPool;
pub use utils::DataChunker;
pub use utils::PerfUtils;
pub use utils::ProcessingStrategy;
pub use utils::StatUtils;
pub use utils::TypeConverter;
pub use utils::ValidationUtils;
pub use signal_transforms::cqt::CQTConfig;
pub use signal_transforms::cqt::Chromagram;
pub use signal_transforms::cqt::WindowFunction;
pub use signal_transforms::cqt::CQT;
pub use signal_transforms::cwt::ComplexMorletWavelet;
pub use signal_transforms::cwt::ContinuousWavelet;
pub use signal_transforms::cwt::GaussianWavelet;
pub use signal_transforms::cwt::MexicanHatWavelet;
pub use signal_transforms::cwt::MorletWavelet;
pub use signal_transforms::cwt::CWT;
pub use signal_transforms::dwt::BoundaryMode;
pub use signal_transforms::dwt::Dwt2dCoeffs;
pub use signal_transforms::dwt::WaveletFilters;
pub use signal_transforms::dwt::WaveletType;
pub use signal_transforms::dwt::DWT;
pub use signal_transforms::dwt::DWT2D;
pub use signal_transforms::dwt::DWTN;
pub use signal_transforms::mfcc::MFCCConfig;
pub use signal_transforms::mfcc::MelFilterbank;
pub use signal_transforms::mfcc::MFCC;
pub use signal_transforms::stft::PaddingMode;
pub use signal_transforms::stft::STFTConfig;
pub use signal_transforms::stft::Spectrogram;
pub use signal_transforms::stft::SpectrogramScaling;
pub use signal_transforms::stft::WindowType;
pub use signal_transforms::stft::STFT;
pub use signal_transforms::wpt::denoise_wpt;
pub use signal_transforms::wpt::BestBasisCriterion;
pub use signal_transforms::wpt::WaveletPacketNode;
pub use signal_transforms::wpt::WPT;
pub use gpu::GpuMatrixOps;
pub use gpu::GpuPCA;
pub use gpu::GpuTSNE;
pub use distributed::AutoScalingConfig;
pub use distributed::CircuitBreaker;
pub use distributed::ClusterHealthSummary;
pub use distributed::DistributedConfig;
pub use distributed::DistributedCoordinator;
pub use distributed::DistributedPCA;
pub use distributed::EnhancedDistributedCoordinator;
pub use distributed::NodeHealth;
pub use distributed::NodeInfo;
pub use distributed::NodeStatus;
pub use distributed::PartitioningStrategy;
pub use auto_feature_engineering::AdvancedMetaLearningSystem;
pub use auto_feature_engineering::AutoFeatureEngineer;
pub use auto_feature_engineering::DatasetMetaFeatures;
pub use auto_feature_engineering::EnhancedMetaFeatures;
pub use auto_feature_engineering::MultiObjectiveRecommendation;
pub use auto_feature_engineering::TransformationConfig;
pub use auto_feature_engineering::TransformationType;
pub use quantum_optimization::AdvancedQuantumMetrics;
pub use quantum_optimization::AdvancedQuantumOptimizer;
pub use quantum_optimization::AdvancedQuantumParams;
pub use quantum_optimization::QuantumHyperparameterTuner;
pub use quantum_optimization::QuantumInspiredOptimizer;
pub use quantum_optimization::QuantumParticle;
pub use quantum_optimization::QuantumTransformationOptimizer;
pub use neuromorphic_adaptation::AdvancedNeuromorphicMetrics;
pub use neuromorphic_adaptation::AdvancedNeuromorphicProcessor;
pub use neuromorphic_adaptation::NeuromorphicAdaptationNetwork;
pub use neuromorphic_adaptation::NeuromorphicMemorySystem;
pub use neuromorphic_adaptation::NeuromorphicTransformationSystem;
pub use neuromorphic_adaptation::SpikingNeuron;
pub use neuromorphic_adaptation::SystemState;
pub use neuromorphic_adaptation::TransformationEpisode;
pub use kernel::center_kernel_matrix;
pub use kernel::cross_gram_matrix;
pub use kernel::estimate_rbf_gamma;
pub use kernel::gram_matrix;
pub use kernel::is_positive_semidefinite;
pub use kernel::kernel_alignment;
pub use kernel::kernel_diagonal;
pub use kernel::kernel_eval;
pub use kernel::KernelPCA;
pub use kernel::KernelRidgeRegression;
pub use kernel::KernelType;
pub use monitoring::AlertConfig;
pub use monitoring::AlertType;
pub use monitoring::DriftDetectionResult;
pub use monitoring::DriftMethod;
pub use monitoring::PerformanceMetrics;
pub use monitoring::TransformationMonitor;
Modules§
- auto_feature_engineering: Automated feature engineering with meta-learning (auto-generated module structure)
- decomposition: Matrix decomposition techniques
- distributed: Distributed processing for multi-node transformation pipelines
- encoding: Categorical data encoding utilities
- error: Error types and error handling for the data transformation module
- features: Feature engineering techniques and utilities
- features_simd: SIMD-accelerated feature engineering operations
- gpu: GPU-accelerated transformations
- graph: Graph embedding transformers for graph-based feature extraction
- image: Image processing transformers for feature extraction
- impute: Missing value imputation utilities
- kernel: Kernel methods (Kernel PCA, Kernel Ridge Regression, kernel functions)
- monitoring: Production monitoring with drift detection and model degradation alerts
- neuromorphic_adaptation: Neuromorphic computing integration for real-time transformation adaptation
- normalize: Data normalization and standardization utilities, including basic normalization methods
- normalize_simd: SIMD-accelerated normalization operations
- optimization_config: Optimization configuration and auto-tuning system
- out_of_core: Out-of-core processing for large datasets
- performance: Performance optimizations and enhanced implementations
- pipeline: Pipeline API for chaining transformations
- quantum_optimization: Quantum-inspired optimization for data transformations
- reduction: Dimensionality reduction algorithms and techniques
- scaling: Advanced scaling and transformation methods
- scaling_simd: SIMD-accelerated scaling operations
- selection: Feature selection utilities
- signal_transforms: Signal transforms (DWT, CWT, WPT, STFT, MFCC, CQT, Chromagram); signal transform module for v0.2.0
- streaming: Streaming transformations for continuous data processing
- text: Text processing transformers for feature extraction
- time_series: Time series feature extraction
- utils: Utility functions and helpers for data transformation