§SciRS2 Transform - Data Transformation and Preprocessing
scirs2-transform provides comprehensive data transformation utilities for machine learning, offering normalization, feature engineering, dimensionality reduction, encoding, imputation, and pipelines with SIMD acceleration and out-of-core processing for large datasets.
§Key Features
- Normalization: Min-max, Z-score, robust scaling, quantile normalization
- Feature Engineering: Polynomial features, interaction terms, binning
- Dimensionality Reduction: PCA, SVD, t-SNE, UMAP, LDA
- Encoding: One-hot, label, ordinal, target encoding
- Imputation: Mean, median, mode, KNN, iterative imputation
- Pipelines: Chained transformations with fit/transform API
- Performance: SIMD operations, streaming, out-of-core processing
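The fit/transform chaining mentioned above can be illustrated in plain Rust. This is only a minimal sketch under stated assumptions: the `Step` trait, `MeanImputer`, `MinMaxScaler`, and `fit_transform_chain` are hypothetical names defined here for illustration and are not the crate's `Transformer` or `Pipeline` API. It fits each step on the output of the previous one, using mean imputation followed by min-max scaling on a single column.

```rust
// Hypothetical trait for illustration; NOT the scirs2-transform `Transformer` API.
trait Step {
    /// Learn parameters from the training column.
    fn fit(&mut self, data: &[f64]);
    /// Apply the learned parameters to a column.
    fn transform(&self, data: &[f64]) -> Vec<f64>;
}

/// Replaces NaN entries with the mean of the non-missing training values.
#[derive(Default)]
struct MeanImputer {
    mean: f64,
}

impl Step for MeanImputer {
    fn fit(&mut self, data: &[f64]) {
        let present: Vec<f64> = data.iter().copied().filter(|v| !v.is_nan()).collect();
        self.mean = if present.is_empty() {
            0.0
        } else {
            present.iter().sum::<f64>() / present.len() as f64
        };
    }

    fn transform(&self, data: &[f64]) -> Vec<f64> {
        data.iter()
            .map(|&v| if v.is_nan() { self.mean } else { v })
            .collect()
    }
}

/// Rescales values to [0, 1] using the training minimum and maximum.
#[derive(Default)]
struct MinMaxScaler {
    min: f64,
    max: f64,
}

impl Step for MinMaxScaler {
    fn fit(&mut self, data: &[f64]) {
        self.min = data.iter().copied().fold(f64::INFINITY, f64::min);
        self.max = data.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    }

    fn transform(&self, data: &[f64]) -> Vec<f64> {
        let range = self.max - self.min;
        // A constant column has zero range; map it to 0.0 instead of dividing by zero.
        data.iter()
            .map(|&v| if range > 0.0 { (v - self.min) / range } else { 0.0 })
            .collect()
    }
}

/// Fits each step on the output of the previous step and returns the final output.
fn fit_transform_chain(steps: &mut [Box<dyn Step>], data: &[f64]) -> Vec<f64> {
    let mut current = data.to_vec();
    for step in steps.iter_mut() {
        step.fit(&current);
        current = step.transform(&current);
    }
    current
}

fn main() {
    let column = [1.0, f64::NAN, 3.0, 5.0];
    let mut steps: Vec<Box<dyn Step>> = vec![
        Box::new(MeanImputer::default()),
        Box::new(MinMaxScaler::default()),
    ];
    let out = fit_transform_chain(&mut steps, &column);
    println!("{:?}", out);
}
```

Keeping the learned parameters inside each step is what lets the same fitted chain be reapplied to new data without refitting.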
§Module Overview
| SciRS2 Module | scikit-learn Equivalent | Description |
|---|---|---|
| normalize | sklearn.preprocessing.StandardScaler | Data normalization/standardization |
| features | sklearn.preprocessing.PolynomialFeatures | Feature engineering |
| reduction | sklearn.decomposition.PCA | Dimensionality reduction |
| encoding | sklearn.preprocessing.OneHotEncoder | Categorical encoding |
| impute | sklearn.impute.SimpleImputer | Missing value imputation |
| pipeline | sklearn.pipeline.Pipeline | Transformation pipelines |
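To make the categorical encoding row concrete, here is a minimal sketch of one-hot encoding in plain Rust. The `one_hot` helper is hypothetical and written only for illustration; it does not use or mirror the crate's `OneHotEncoder`. Categories are sorted alphabetically so the column order is deterministic.

```rust
use std::collections::BTreeSet;

/// Returns the sorted category list and one indicator row per input label.
/// Hypothetical helper for illustration; not the crate's `OneHotEncoder`.
fn one_hot(labels: &[&str]) -> (Vec<String>, Vec<Vec<u8>>) {
    // A BTreeSet deduplicates the labels and yields them in sorted order.
    let categories: Vec<String> = labels
        .iter()
        .map(|s| s.to_string())
        .collect::<BTreeSet<String>>()
        .into_iter()
        .collect();

    // Each row has a 1 in the column of its own category and 0 elsewhere.
    let rows: Vec<Vec<u8>> = labels
        .iter()
        .map(|label| {
            categories
                .iter()
                .map(|c| u8::from(c.as_str() == *label))
                .collect()
        })
        .collect();

    (categories, rows)
}

fn main() {
    let (categories, rows) = one_hot(&["red", "green", "red", "blue"]);
    println!("{:?}", categories);
    for row in &rows {
        println!("{:?}", row);
    }
}
```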
§Quick Start

    [dependencies]
    scirs2-transform = "0.1.0-rc.1"

    use scirs2_transform::normalize::{normalize_array, NormalizationMethod};
    use scirs2_core::ndarray::Array2;

    // Standardize data (Z-score normalization)
    let data = Array2::<f64>::zeros((100, 5));
    let normalized = normalize_array(&data, NormalizationMethod::ZScore, 0).unwrap();
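For readers who want to see what Z-score normalization computes, the following is a minimal plain-Rust sketch that standardizes each column of a small row-major matrix. It rests on assumptions the page does not confirm: that the trailing `0` argument in the quick start selects the axis, that the population standard deviation is used, and that constant columns (such as the all-zero data above) should map to zeros. The `zscore_columns` function is hypothetical and the crate's actual behavior may differ.

```rust
/// Standardizes each column of a row-major matrix to zero mean and unit variance.
/// Assumption: population standard deviation (divide by n); constant columns become zeros.
/// Hypothetical helper for illustration; not the crate's `normalize_array`.
fn zscore_columns(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    if rows.is_empty() {
        return Vec::new();
    }
    let n = rows.len() as f64;
    let cols = rows[0].len();
    let mut out = rows.to_vec();

    for j in 0..cols {
        let mean = rows.iter().map(|r| r[j]).sum::<f64>() / n;
        let var = rows.iter().map(|r| (r[j] - mean).powi(2)).sum::<f64>() / n;
        let std = var.sqrt();
        for row in out.iter_mut() {
            // Guard against division by zero for constant columns.
            row[j] = if std > 0.0 { (row[j] - mean) / std } else { 0.0 };
        }
    }
    out
}

fn main() {
    let data = vec![vec![1.0, 10.0], vec![3.0, 10.0], vec![5.0, 10.0]];
    for row in zscore_columns(&data) {
        println!("{:?}", row);
    }
}
```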
§Version: 0.1.0-rc.1 (October 03, 2025)
Re-exports§
pub use decomposition::DictionaryLearning;
pub use decomposition::NMF;
pub use encoding::BinaryEncoder;
pub use encoding::EncodedOutput;
pub use encoding::FrequencyEncoder;
pub use encoding::OneHotEncoder;
pub use encoding::OrdinalEncoder;
pub use encoding::SparseMatrix;
pub use encoding::TargetEncoder;
pub use encoding::WOEEncoder;
pub use error::Result;
pub use error::TransformError;
pub use features::binarize;
pub use features::discretize_equal_frequency;
pub use features::discretize_equal_width;
pub use features::log_transform;
pub use features::power_transform;
pub use features::PolynomialFeatures;
pub use features::PowerTransformer;
pub use impute::DistanceMetric;
pub use impute::ImputeStrategy;
pub use impute::IterativeImputer;
pub use impute::KNNImputer;
pub use impute::MissingIndicator;
pub use impute::SimpleImputer;
pub use impute::WeightingScheme;
pub use normalize::normalize_array;
pub use normalize::normalize_vector;
pub use normalize::NormalizationMethod;
pub use normalize::Normalizer;
pub use pipeline::make_column_transformer;
pub use pipeline::make_pipeline;
pub use pipeline::ColumnTransformer;
pub use pipeline::Pipeline;
pub use pipeline::RemainderOption;
pub use pipeline::Transformer;
pub use reduction::trustworthiness;
pub use reduction::AffinityMethod;
pub use reduction::Isomap;
pub use reduction::SpectralEmbedding;
pub use reduction::TruncatedSVD;
pub use reduction::LDA;
pub use reduction::LLE;
pub use reduction::PCA;
pub use reduction::TSNE;
pub use reduction::UMAP;
pub use scaling::MaxAbsScaler;
pub use scaling::QuantileTransformer;
pub use selection::MutualInfoSelector;
pub use selection::RecursiveFeatureElimination;
pub use selection::VarianceThreshold;
pub use time_series::FourierFeatures;
pub use time_series::LagFeatures;
pub use time_series::TimeSeriesFeatures;
pub use time_series::WaveletFeatures;
pub use graph::adjacency_to_edge_list;
pub use graph::edge_list_to_adjacency;
pub use graph::ActivationType;
pub use graph::DeepWalk;
pub use graph::GraphAutoencoder;
pub use graph::LaplacianType;
pub use graph::Node2Vec;
pub use image::resize_images;
pub use image::rgb_to_grayscale;
pub use image::BlockNorm;
pub use image::HOGDescriptor;
pub use image::ImageNormMethod;
pub use image::ImageNormalizer;
pub use image::PatchExtractor;
pub use optimization_config::AdaptiveParameterTuner;
pub use optimization_config::AdvancedConfigOptimizer;
pub use optimization_config::AutoTuner;
pub use optimization_config::ConfigurationPredictor;
pub use optimization_config::DataCharacteristics;
pub use optimization_config::OptimizationConfig;
pub use optimization_config::OptimizationReport;
pub use optimization_config::PerformanceMetric;
pub use optimization_config::SystemMonitor;
pub use optimization_config::SystemResources;
pub use optimization_config::TransformationRecommendation;
pub use out_of_core::csv_chunks;
pub use out_of_core::ChunkedArrayReader;
pub use out_of_core::ChunkedArrayWriter;
pub use out_of_core::OutOfCoreConfig;
pub use out_of_core::OutOfCoreNormalizer;
pub use out_of_core::OutOfCoreTransformer;
pub use performance::EnhancedPCA;
pub use performance::EnhancedStandardScaler;
pub use streaming::OutlierMethod;
pub use streaming::StreamingFeatureSelector;
pub use streaming::StreamingMinMaxScaler;
pub use streaming::StreamingOutlierDetector;
pub use streaming::StreamingPCA;
pub use streaming::StreamingQuantileTracker;
pub use streaming::StreamingStandardScaler;
pub use streaming::StreamingTransformer;
pub use streaming::WindowedStreamingTransformer;
pub use text::CountVectorizer;
pub use text::HashingVectorizer;
pub use text::StreamingCountVectorizer;
pub use text::TfidfVectorizer;
pub use utils::ArrayMemoryPool;
pub use utils::DataChunker;
pub use utils::PerfUtils;
pub use utils::ProcessingStrategy;
pub use utils::StatUtils;
pub use utils::TypeConverter;
pub use utils::ValidationUtils;
pub use auto_feature_engineering::AdvancedMetaLearningSystem;
pub use auto_feature_engineering::AutoFeatureEngineer;
pub use auto_feature_engineering::DatasetMetaFeatures;
pub use auto_feature_engineering::EnhancedMetaFeatures;
pub use auto_feature_engineering::MultiObjectiveRecommendation;
pub use auto_feature_engineering::TransformationConfig;
pub use auto_feature_engineering::TransformationType;
pub use quantum_optimization::AdvancedQuantumMetrics;
pub use quantum_optimization::AdvancedQuantumOptimizer;
pub use quantum_optimization::AdvancedQuantumParams;
pub use quantum_optimization::QuantumHyperparameterTuner;
pub use quantum_optimization::QuantumInspiredOptimizer;
pub use quantum_optimization::QuantumParticle;
pub use quantum_optimization::QuantumTransformationOptimizer;
pub use neuromorphic_adaptation::AdvancedNeuromorphicMetrics;
pub use neuromorphic_adaptation::AdvancedNeuromorphicProcessor;
pub use neuromorphic_adaptation::NeuromorphicAdaptationNetwork;
pub use neuromorphic_adaptation::NeuromorphicMemorySystem;
pub use neuromorphic_adaptation::NeuromorphicTransformationSystem;
pub use neuromorphic_adaptation::SpikingNeuron;
pub use neuromorphic_adaptation::SystemState;
pub use neuromorphic_adaptation::TransformationEpisode;
Modules§
- auto_feature_engineering: Automated feature engineering with meta-learning
- decomposition: Matrix decomposition techniques
- encoding: Categorical data encoding utilities
- error: Error types for the data transformation module
- features: Feature engineering techniques
- graph: Graph embedding transformers for graph-based feature extraction
- image: Image processing transformers for feature extraction
- impute: Missing value imputation utilities
- neuromorphic_adaptation: Neuromorphic computing integration for real-time transformation adaptation
- normalize: Data normalization and standardization utilities
- optimization_config: Optimization configuration and auto-tuning system
- out_of_core: Out-of-core processing for large datasets
- performance: Performance optimizations and enhanced implementations
- pipeline: Pipeline API for chaining transformations
- quantum_optimization: Quantum-inspired optimization for data transformations
- reduction: Dimensionality reduction algorithms
- scaling: Advanced scaling and transformation methods
- selection: Feature selection utilities
- streaming: Streaming transformations for continuous data processing
- text: Text processing transformers for feature extraction
- time_series: Time series feature extraction
- utils: Utility functions and helpers for data transformation