SciRS2 Transform - Data Transformation and Preprocessing
scirs2-transform provides comprehensive data transformation utilities for machine learning, offering normalization, feature engineering, dimensionality reduction, encoding, imputation, and pipelines with SIMD acceleration and out-of-core processing for large datasets.
Key Features
- Normalization: Min-max, Z-score, robust scaling, quantile normalization
- Feature Engineering: Polynomial features, interaction terms, binning
- Dimensionality Reduction: PCA, SVD, t-SNE, UMAP, LDA
- Encoding: One-hot, label, ordinal, target encoding
- Imputation: Mean, median, mode, KNN, iterative imputation
- Pipelines: Chained transformations with fit/transform API
- Performance: SIMD operations, streaming, out-of-core processing
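The fit/transform idea behind the Pipelines bullet can be sketched in plain Rust. This is a minimal illustration, not the crate's API: the `Step` trait, the `Center` and `MaxAbs` structs, and the `fit_transform` helper are hypothetical names invented here, and the real `Pipeline` and `Transformer` types may look quite different.

```rust
/// Hypothetical trait for illustration only; not the crate's `Transformer`.
trait Step {
    /// Learn parameters from the data.
    fn fit(&mut self, data: &[f64]);
    /// Apply the learned parameters to the data.
    fn transform(&self, data: &[f64]) -> Vec<f64>;
}

/// Subtracts the mean learned during `fit`.
struct Center {
    mean: f64,
}

impl Step for Center {
    fn fit(&mut self, data: &[f64]) {
        self.mean = data.iter().sum::<f64>() / data.len() as f64;
    }
    fn transform(&self, data: &[f64]) -> Vec<f64> {
        data.iter().map(|x| x - self.mean).collect()
    }
}

/// Divides by the largest absolute value learned during `fit`.
struct MaxAbs {
    scale: f64,
}

impl Step for MaxAbs {
    fn fit(&mut self, data: &[f64]) {
        let m = data.iter().fold(0.0_f64, |acc, x| acc.max(x.abs()));
        // Fall back to 1.0 so an all-zero input is not divided by zero.
        self.scale = if m > 0.0 { m } else { 1.0 };
    }
    fn transform(&self, data: &[f64]) -> Vec<f64> {
        data.iter().map(|x| x / self.scale).collect()
    }
}

/// Fits each step on the output of the previous step and returns the final output.
fn fit_transform(steps: &mut [Box<dyn Step>], data: &[f64]) -> Vec<f64> {
    let mut current = data.to_vec();
    for step in steps.iter_mut() {
        step.fit(&current);
        current = step.transform(&current);
    }
    current
}
```

Chaining `Center` and then `MaxAbs` over a slice first removes the mean and then rescales the centered values into the range [-1, 1]. The point of the design is that each step learns its parameters from the data as the previous step left it.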
Module Overview
| SciRS2 Module | scikit-learn Equivalent | Description |
|---|---|---|
| normalize | sklearn.preprocessing.StandardScaler | Data normalization/standardization |
| features | sklearn.preprocessing.PolynomialFeatures | Feature engineering |
| reduction | sklearn.decomposition.PCA | Dimensionality reduction |
| encoding | sklearn.preprocessing.OneHotEncoder | Categorical encoding |
| impute | sklearn.impute.SimpleImputer | Missing value imputation |
| pipeline | sklearn.pipeline.Pipeline | Transformation pipelines |
Quick Start
[dependencies]
scirs2-transform = "0.1.0-rc.2"

use scirs2_transform::normalize::{normalize_array, NormalizationMethod};
use scirs2_core::ndarray::Array2;
// Standardize data (Z-score normalization)
let data = Array2::<f64>::zeros((100, 5));
let normalized = normalize_array(&data, NormalizationMethod::ZScore, 0).unwrap();

Version: 0.1.0-rc.2 (October 03, 2025)
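For intuition about what the Quick Start's `NormalizationMethod::ZScore` call computes, here is a minimal plain-Rust sketch of column-wise Z-score standardization. It rests on several assumptions: that axis `0` means each column is standardized independently, that the population standard deviation is used, and that zero-variance columns map to `0.0`. The crate may differ on any of these, and `zscore_columns` is a hypothetical helper, not part of scirs2-transform.

```rust
/// Standardizes each column of a row-major matrix to zero mean and unit variance.
/// Columns with zero variance are set to 0.0 rather than divided by zero
/// (a choice made for this sketch, not necessarily the crate's behavior).
fn zscore_columns(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n_rows = rows.len();
    if n_rows == 0 {
        return Vec::new();
    }
    let n_cols = rows[0].len();
    let mut out = vec![vec![0.0; n_cols]; n_rows];
    for j in 0..n_cols {
        let mean = rows.iter().map(|r| r[j]).sum::<f64>() / n_rows as f64;
        // Population variance (divide by n); an assumption of this sketch.
        let var = rows.iter().map(|r| (r[j] - mean).powi(2)).sum::<f64>() / n_rows as f64;
        let std = var.sqrt();
        for i in 0..n_rows {
            out[i][j] = if std > 0.0 { (rows[i][j] - mean) / std } else { 0.0 };
        }
    }
    out
}
```

The Quick Start input is an all-zeros array, so every column there has zero variance. How the library handles that case is worth checking before relying on the output.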
Re-exports
pub use decomposition::DictionaryLearning;
pub use decomposition::NMF;
pub use encoding::BinaryEncoder;
pub use encoding::EncodedOutput;
pub use encoding::FrequencyEncoder;
pub use encoding::OneHotEncoder;
pub use encoding::OrdinalEncoder;
pub use encoding::SparseMatrix;
pub use encoding::TargetEncoder;
pub use encoding::WOEEncoder;
pub use error::Result;
pub use error::TransformError;
pub use features::binarize;
pub use features::discretize_equal_frequency;
pub use features::discretize_equal_width;
pub use features::log_transform;
pub use features::power_transform;
pub use features::PolynomialFeatures;
pub use features::PowerTransformer;
pub use impute::DistanceMetric;
pub use impute::ImputeStrategy;
pub use impute::IterativeImputer;
pub use impute::KNNImputer;
pub use impute::MissingIndicator;
pub use impute::SimpleImputer;
pub use impute::WeightingScheme;
pub use normalize::normalize_array;
pub use normalize::normalize_vector;
pub use normalize::NormalizationMethod;
pub use normalize::Normalizer;
pub use pipeline::make_column_transformer;
pub use pipeline::make_pipeline;
pub use pipeline::ColumnTransformer;
pub use pipeline::Pipeline;
pub use pipeline::RemainderOption;
pub use pipeline::Transformer;
pub use reduction::trustworthiness;
pub use reduction::AffinityMethod;
pub use reduction::Isomap;
pub use reduction::SpectralEmbedding;
pub use reduction::TruncatedSVD;
pub use reduction::LDA;
pub use reduction::LLE;
pub use reduction::PCA;
pub use reduction::TSNE;
pub use reduction::UMAP;
pub use scaling::MaxAbsScaler;
pub use scaling::QuantileTransformer;
pub use selection::MutualInfoSelector;
pub use selection::RecursiveFeatureElimination;
pub use selection::VarianceThreshold;
pub use time_series::FourierFeatures;
pub use time_series::LagFeatures;
pub use time_series::TimeSeriesFeatures;
pub use time_series::WaveletFeatures;
pub use graph::adjacency_to_edge_list;
pub use graph::edge_list_to_adjacency;
pub use graph::ActivationType;
pub use graph::DeepWalk;
pub use graph::GraphAutoencoder;
pub use graph::LaplacianType;
pub use graph::Node2Vec;
pub use image::resize_images;
pub use image::rgb_to_grayscale;
pub use image::BlockNorm;
pub use image::HOGDescriptor;
pub use image::ImageNormMethod;
pub use image::ImageNormalizer;
pub use image::PatchExtractor;
pub use optimization_config::AdaptiveParameterTuner;
pub use optimization_config::AdvancedConfigOptimizer;
pub use optimization_config::AutoTuner;
pub use optimization_config::ConfigurationPredictor;
pub use optimization_config::DataCharacteristics;
pub use optimization_config::OptimizationConfig;
pub use optimization_config::OptimizationReport;
pub use optimization_config::PerformanceMetric;
pub use optimization_config::SystemMonitor;
pub use optimization_config::SystemResources;
pub use optimization_config::TransformationRecommendation;
pub use out_of_core::csv_chunks;
pub use out_of_core::ChunkedArrayReader;
pub use out_of_core::ChunkedArrayWriter;
pub use out_of_core::OutOfCoreConfig;
pub use out_of_core::OutOfCoreNormalizer;
pub use out_of_core::OutOfCoreTransformer;
pub use performance::EnhancedPCA;
pub use performance::EnhancedStandardScaler;
pub use streaming::OutlierMethod;
pub use streaming::StreamingFeatureSelector;
pub use streaming::StreamingMinMaxScaler;
pub use streaming::StreamingOutlierDetector;
pub use streaming::StreamingPCA;
pub use streaming::StreamingQuantileTracker;
pub use streaming::StreamingStandardScaler;
pub use streaming::StreamingTransformer;
pub use streaming::WindowedStreamingTransformer;
pub use text::CountVectorizer;
pub use text::HashingVectorizer;
pub use text::StreamingCountVectorizer;
pub use text::TfidfVectorizer;
pub use utils::ArrayMemoryPool;
pub use utils::DataChunker;
pub use utils::PerfUtils;
pub use utils::ProcessingStrategy;
pub use utils::StatUtils;
pub use utils::TypeConverter;
pub use utils::ValidationUtils;
pub use auto_feature_engineering::AdvancedMetaLearningSystem;
pub use auto_feature_engineering::AutoFeatureEngineer;
pub use auto_feature_engineering::DatasetMetaFeatures;
pub use auto_feature_engineering::EnhancedMetaFeatures;
pub use auto_feature_engineering::MultiObjectiveRecommendation;
pub use auto_feature_engineering::TransformationConfig;
pub use auto_feature_engineering::TransformationType;
pub use quantum_optimization::AdvancedQuantumMetrics;
pub use quantum_optimization::AdvancedQuantumOptimizer;
pub use quantum_optimization::AdvancedQuantumParams;
pub use quantum_optimization::QuantumHyperparameterTuner;
pub use quantum_optimization::QuantumInspiredOptimizer;
pub use quantum_optimization::QuantumParticle;
pub use quantum_optimization::QuantumTransformationOptimizer;
pub use neuromorphic_adaptation::AdvancedNeuromorphicMetrics;
pub use neuromorphic_adaptation::AdvancedNeuromorphicProcessor;
pub use neuromorphic_adaptation::NeuromorphicAdaptationNetwork;
pub use neuromorphic_adaptation::NeuromorphicMemorySystem;
pub use neuromorphic_adaptation::NeuromorphicTransformationSystem;
pub use neuromorphic_adaptation::SpikingNeuron;
pub use neuromorphic_adaptation::SystemState;
pub use neuromorphic_adaptation::TransformationEpisode;
Modules
- auto_feature_engineering: Automated feature engineering with meta-learning
- decomposition: Matrix decomposition techniques
- encoding: Categorical data encoding utilities
- error: Error types for the data transformation module
- features: Feature engineering utilities
- graph: Graph embedding transformers for graph-based feature extraction
- image: Image processing transformers for feature extraction
- impute: Missing value imputation utilities
- neuromorphic_adaptation: Neuromorphic computing integration for real-time transformation adaptation
- normalize: Data normalization and standardization utilities
- optimization_config: Optimization configuration and auto-tuning system
- out_of_core: Out-of-core processing for large datasets
- performance: Performance optimizations and enhanced implementations
- pipeline: Pipeline API for chaining transformations
- quantum_optimization: Quantum-inspired optimization for data transformations
- reduction: Dimensionality reduction techniques
- scaling: Advanced scaling and transformation methods
- selection: Feature selection utilities
- streaming: Streaming transformations for continuous data processing
- text: Text processing transformers for feature extraction
- time_series: Time series feature extraction
- utils: Utility functions and helpers for data transformation