Crate scirs2_transform

SciRS2 Transform - Data Transformation and Preprocessing

scirs2-transform provides comprehensive data transformation utilities for machine learning, offering normalization, feature engineering, dimensionality reduction, encoding, imputation, and pipelines with SIMD acceleration and out-of-core processing for large datasets.

🎯 Key Features

  • Normalization: Min-max, Z-score, robust scaling, quantile normalization
  • Feature Engineering: Polynomial features, interaction terms, binning
  • Dimensionality Reduction: PCA, SVD, t-SNE, UMAP, LDA
  • Encoding: One-hot, label, ordinal, target encoding
  • Imputation: Mean, median, mode, KNN, iterative imputation
  • Pipelines: Chained transformations with fit/transform API
  • Performance: SIMD operations, streaming, out-of-core processing
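The fit/transform pattern named in the Pipelines item can be sketched in plain Rust. This is an illustration only: the `FitTransform` trait, the `MinMax` and `Center` structs, and `fit_transform_chain` are hypothetical names invented for this sketch and are not the crate's `Transformer` or `Pipeline` API. It assumes each step is fitted on the output of the previous step and that the column is non-empty.

```rust
/// Hypothetical trait for illustration; not the crate's `Transformer` trait.
trait FitTransform {
    /// Learn parameters from one feature column.
    fn fit(&mut self, column: &[f64]);
    /// Apply the learned parameters to a column.
    fn transform(&self, column: &[f64]) -> Vec<f64>;
}

/// Min-max scaling to the range [0, 1].
#[derive(Default)]
struct MinMax {
    min: f64,
    max: f64,
}

impl FitTransform for MinMax {
    fn fit(&mut self, column: &[f64]) {
        self.min = column.iter().copied().fold(f64::INFINITY, f64::min);
        self.max = column.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    }

    fn transform(&self, column: &[f64]) -> Vec<f64> {
        let range = self.max - self.min;
        column
            .iter()
            // A constant column has zero range; map it to 0.0 instead of dividing by zero.
            .map(|&x| if range == 0.0 { 0.0 } else { (x - self.min) / range })
            .collect()
    }
}

/// Centering: subtract the mean learned during `fit`.
#[derive(Default)]
struct Center {
    mean: f64,
}

impl FitTransform for Center {
    fn fit(&mut self, column: &[f64]) {
        // Assumes a non-empty column; an empty one would yield NaN.
        self.mean = column.iter().sum::<f64>() / column.len() as f64;
    }

    fn transform(&self, column: &[f64]) -> Vec<f64> {
        column.iter().map(|&x| x - self.mean).collect()
    }
}

/// Fit each step on the output of the previous step and return the final output.
fn fit_transform_chain(steps: &mut [Box<dyn FitTransform>], column: &[f64]) -> Vec<f64> {
    let mut current = column.to_vec();
    for step in steps.iter_mut() {
        step.fit(&current);
        current = step.transform(&current);
    }
    current
}
```

Chaining `MinMax` and then `Center` over the column `[2.0, 4.0, 6.0]` first rescales it to `[0.0, 0.5, 1.0]` and then subtracts the mean 0.5. Keeping the learned parameters inside each step is what lets the same fitted chain be reapplied to new data later.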

📦 Module Overview

| SciRS2 Module | scikit-learn Equivalent | Description |
|---|---|---|
| normalize | sklearn.preprocessing.StandardScaler | Data normalization/standardization |
| features | sklearn.preprocessing.PolynomialFeatures | Feature engineering |
| reduction | sklearn.decomposition.PCA | Dimensionality reduction |
| encoding | sklearn.preprocessing.OneHotEncoder | Categorical encoding |
| impute | sklearn.impute.SimpleImputer | Missing value imputation |
| pipeline | sklearn.pipeline.Pipeline | Transformation pipelines |
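
To make two of the rows above concrete, here is a minimal plain-Rust sketch of what mean imputation and one-hot encoding compute. The function names `impute_mean` and `one_hot` are hypothetical and chosen for this illustration; they are not the signatures of the crate's `SimpleImputer` or `OneHotEncoder`. The sketch assumes missing values are represented as NaN and that categories are ordered alphabetically.

```rust
/// Replace NaN entries in a column with the mean of the observed values.
/// If every entry is missing, the column is returned unchanged.
fn impute_mean(column: &[f64]) -> Vec<f64> {
    let observed: Vec<f64> = column.iter().copied().filter(|x| !x.is_nan()).collect();
    if observed.is_empty() {
        return column.to_vec();
    }
    let mean = observed.iter().sum::<f64>() / observed.len() as f64;
    column
        .iter()
        .map(|&x| if x.is_nan() { mean } else { x })
        .collect()
}

/// One-hot encode string labels.
/// Returns the sorted, de-duplicated categories and one indicator row per label.
fn one_hot(labels: &[&str]) -> (Vec<String>, Vec<Vec<u8>>) {
    let mut categories: Vec<String> = labels.iter().map(|s| s.to_string()).collect();
    categories.sort();
    categories.dedup();

    let rows: Vec<Vec<u8>> = labels
        .iter()
        .map(|label| {
            categories
                .iter()
                // 1 in the position of the matching category, 0 elsewhere.
                .map(|c| u8::from(c.as_str() == *label))
                .collect::<Vec<u8>>()
        })
        .collect();

    (categories, rows)
}
```

For example, `impute_mean(&[1.0, f64::NAN, 3.0])` fills the gap with 2.0, and `one_hot(&["b", "a", "b"])` yields the categories `["a", "b"]` with rows `[0, 1]`, `[1, 0]`, `[0, 1]`.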

🚀 Quick Start

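Add the crate to Cargo.toml:

[dependencies]
scirs2-transform = "0.1.5"

Then standardize an array. The example also imports from scirs2_core, so that crate must be listed as a dependency as well:
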
use scirs2_transform::normalize::{normalize_array, NormalizationMethod};
use scirs2_core::ndarray::Array2;

// Standardize data (Z-score normalization)
let data = Array2::<f64>::zeros((100, 5));
let normalized = normalize_array(&data, NormalizationMethod::ZScore, 0).expect("should succeed");
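
For readers who want to see what Z-score normalization along axis 0 computes, the following is a dependency-free sketch, not the crate's implementation. The function name `zscore_columns` is hypothetical. It assumes the population standard deviation (dividing by n), rows of equal length, and that a constant column is mapped to zeros; how `normalize_array` treats constant columns, such as the all-zeros input above, is not stated on this page.

```rust
/// Standardize each column of a row-major matrix to mean 0 and unit variance.
/// Assumes every row has the same length as the first row.
fn zscore_columns(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n_rows = rows.len();
    if n_rows == 0 {
        return Vec::new();
    }
    let n_cols = rows[0].len();
    let n = n_rows as f64;
    let mut out = rows.to_vec();

    for j in 0..n_cols {
        // Column mean and population variance (divide by n, not n - 1).
        let mean = rows.iter().map(|r| r[j]).sum::<f64>() / n;
        let var = rows.iter().map(|r| (r[j] - mean).powi(2)).sum::<f64>() / n;
        let std = var.sqrt();

        for (i, row) in rows.iter().enumerate() {
            // Assumption of this sketch: a constant column becomes all zeros.
            out[i][j] = if std == 0.0 { 0.0 } else { (row[j] - mean) / std };
        }
    }
    out
}
```

With the rows `[1.0, 10.0]` and `[3.0, 10.0]`, the first column has mean 2 and standard deviation 1, so it becomes `-1.0` and `1.0`, while the constant second column becomes zeros.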

🔒 Version: 0.1.5 (January 15, 2026)

Re-exports

pub use decomposition::DictionaryLearning;
pub use decomposition::NMF;
pub use encoding::BinaryEncoder;
pub use encoding::EncodedOutput;
pub use encoding::FrequencyEncoder;
pub use encoding::OneHotEncoder;
pub use encoding::OrdinalEncoder;
pub use encoding::SparseMatrix;
pub use encoding::TargetEncoder;
pub use encoding::WOEEncoder;
pub use error::Result;
pub use error::TransformError;
pub use features::binarize;
pub use features::discretize_equal_frequency;
pub use features::discretize_equal_width;
pub use features::log_transform;
pub use features::power_transform;
pub use features::PolynomialFeatures;
pub use features::PowerTransformer;
pub use impute::DistanceMetric;
pub use impute::ImputeStrategy;
pub use impute::IterativeImputer;
pub use impute::KNNImputer;
pub use impute::MissingIndicator;
pub use impute::SimpleImputer;
pub use impute::WeightingScheme;
pub use normalize::normalize_array;
pub use normalize::normalize_vector;
pub use normalize::NormalizationMethod;
pub use normalize::Normalizer;
pub use pipeline::make_column_transformer;
pub use pipeline::make_pipeline;
pub use pipeline::ColumnTransformer;
pub use pipeline::Pipeline;
pub use pipeline::RemainderOption;
pub use pipeline::Transformer;
pub use reduction::factor_analysis;
pub use reduction::scree_plot_data;
pub use reduction::trustworthiness;
pub use reduction::AffinityMethod;
pub use reduction::DiffusionMaps;
pub use reduction::FactorAnalysis;
pub use reduction::FactorAnalysisResult;
pub use reduction::GraphMethod;
pub use reduction::Isomap;
pub use reduction::LaplacianEigenmaps;
pub use reduction::RotationMethod;
pub use reduction::ScreePlotData;
pub use reduction::SpectralEmbedding;
pub use reduction::TruncatedSVD;
pub use reduction::LDA;
pub use reduction::LLE;
pub use reduction::PCA;
pub use reduction::TSNE;
pub use reduction::UMAP;
pub use scaling::MaxAbsScaler;
pub use scaling::QuantileTransformer;
pub use selection::MutualInfoSelector;
pub use selection::RecursiveFeatureElimination;
pub use selection::VarianceThreshold;
pub use time_series::FourierFeatures;
pub use time_series::LagFeatures;
pub use time_series::TimeSeriesFeatures;
pub use time_series::WaveletFeatures;
pub use normalize_simd::simd_l2_normalize_1d;
pub use normalize_simd::simd_maxabs_normalize_1d;
pub use normalize_simd::simd_minmax_normalize_1d;
pub use normalize_simd::simd_normalize_adaptive;
pub use normalize_simd::simd_normalize_batch;
pub use normalize_simd::simd_normalizearray;
pub use normalize_simd::simd_zscore_normalize_1d;
pub use normalize_simd::AdaptiveBlockSizer;
pub use features_simd::simd_binarize;
pub use features_simd::simd_polynomial_features_optimized;
pub use features_simd::simd_power_transform;
pub use features_simd::SimdPolynomialFeatures;
pub use scaling_simd::SimdMaxAbsScaler;
pub use scaling_simd::SimdRobustScaler;
pub use scaling_simd::SimdStandardScaler;
pub use graph::adjacency_to_edge_list;
pub use graph::edge_list_to_adjacency;
pub use graph::ActivationType;
pub use graph::DeepWalk;
pub use graph::GraphAutoencoder;
pub use graph::LaplacianType;
pub use graph::Node2Vec;
pub use image::resize_images;
pub use image::rgb_to_grayscale;
pub use image::BlockNorm;
pub use image::HOGDescriptor;
pub use image::ImageNormMethod;
pub use image::ImageNormalizer;
pub use image::PatchExtractor;
pub use optimization_config::AdaptiveParameterTuner;
pub use optimization_config::AdvancedConfigOptimizer;
pub use optimization_config::AutoTuner;
pub use optimization_config::ConfigurationPredictor;
pub use optimization_config::DataCharacteristics;
pub use optimization_config::OptimizationConfig;
pub use optimization_config::OptimizationReport;
pub use optimization_config::PerformanceMetric;
pub use optimization_config::SystemMonitor;
pub use optimization_config::SystemResources;
pub use optimization_config::TransformationRecommendation;
pub use out_of_core::csv_chunks;
pub use out_of_core::ChunkedArrayReader;
pub use out_of_core::ChunkedArrayWriter;
pub use out_of_core::OutOfCoreConfig;
pub use out_of_core::OutOfCoreNormalizer;
pub use out_of_core::OutOfCoreTransformer;
pub use performance::EnhancedPCA;
pub use performance::EnhancedStandardScaler;
pub use streaming::OutlierMethod;
pub use streaming::StreamingFeatureSelector;
pub use streaming::StreamingMinMaxScaler;
pub use streaming::StreamingOutlierDetector;
pub use streaming::StreamingPCA;
pub use streaming::StreamingQuantileTracker;
pub use streaming::StreamingStandardScaler;
pub use streaming::StreamingTransformer;
pub use streaming::WindowedStreamingTransformer;
pub use text::CountVectorizer;
pub use text::HashingVectorizer;
pub use text::StreamingCountVectorizer;
pub use text::TfidfVectorizer;
pub use utils::ArrayMemoryPool;
pub use utils::DataChunker;
pub use utils::PerfUtils;
pub use utils::ProcessingStrategy;
pub use utils::StatUtils;
pub use utils::TypeConverter;
pub use utils::ValidationUtils;
pub use signal_transforms::cqt::CQTConfig;
pub use signal_transforms::cqt::Chromagram;
pub use signal_transforms::cqt::WindowFunction;
pub use signal_transforms::cqt::CQT;
pub use signal_transforms::cwt::ComplexMorletWavelet;
pub use signal_transforms::cwt::ContinuousWavelet;
pub use signal_transforms::cwt::GaussianWavelet;
pub use signal_transforms::cwt::MexicanHatWavelet;
pub use signal_transforms::cwt::MorletWavelet;
pub use signal_transforms::cwt::CWT;
pub use signal_transforms::dwt::BoundaryMode;
pub use signal_transforms::dwt::Dwt2dCoeffs;
pub use signal_transforms::dwt::WaveletFilters;
pub use signal_transforms::dwt::WaveletType;
pub use signal_transforms::dwt::DWT;
pub use signal_transforms::dwt::DWT2D;
pub use signal_transforms::dwt::DWTN;
pub use signal_transforms::mfcc::MFCCConfig;
pub use signal_transforms::mfcc::MelFilterbank;
pub use signal_transforms::mfcc::MFCC;
pub use signal_transforms::stft::PaddingMode;
pub use signal_transforms::stft::STFTConfig;
pub use signal_transforms::stft::Spectrogram;
pub use signal_transforms::stft::SpectrogramScaling;
pub use signal_transforms::stft::WindowType;
pub use signal_transforms::stft::STFT;
pub use signal_transforms::wpt::denoise_wpt;
pub use signal_transforms::wpt::BestBasisCriterion;
pub use signal_transforms::wpt::WaveletPacketNode;
pub use signal_transforms::wpt::WPT;
pub use gpu::GpuMatrixOps;
pub use gpu::GpuPCA;
pub use gpu::GpuTSNE;
pub use distributed::AutoScalingConfig;
pub use distributed::CircuitBreaker;
pub use distributed::ClusterHealthSummary;
pub use distributed::DistributedConfig;
pub use distributed::DistributedCoordinator;
pub use distributed::DistributedPCA;
pub use distributed::EnhancedDistributedCoordinator;
pub use distributed::NodeHealth;
pub use distributed::NodeInfo;
pub use distributed::NodeStatus;
pub use distributed::PartitioningStrategy;
pub use auto_feature_engineering::AdvancedMetaLearningSystem;
pub use auto_feature_engineering::AutoFeatureEngineer;
pub use auto_feature_engineering::DatasetMetaFeatures;
pub use auto_feature_engineering::EnhancedMetaFeatures;
pub use auto_feature_engineering::MultiObjectiveRecommendation;
pub use auto_feature_engineering::TransformationConfig;
pub use auto_feature_engineering::TransformationType;
pub use quantum_optimization::AdvancedQuantumMetrics;
pub use quantum_optimization::AdvancedQuantumOptimizer;
pub use quantum_optimization::AdvancedQuantumParams;
pub use quantum_optimization::QuantumHyperparameterTuner;
pub use quantum_optimization::QuantumInspiredOptimizer;
pub use quantum_optimization::QuantumParticle;
pub use quantum_optimization::QuantumTransformationOptimizer;
pub use neuromorphic_adaptation::AdvancedNeuromorphicMetrics;
pub use neuromorphic_adaptation::AdvancedNeuromorphicProcessor;
pub use neuromorphic_adaptation::NeuromorphicAdaptationNetwork;
pub use neuromorphic_adaptation::NeuromorphicMemorySystem;
pub use neuromorphic_adaptation::NeuromorphicTransformationSystem;
pub use neuromorphic_adaptation::SpikingNeuron;
pub use neuromorphic_adaptation::SystemState;
pub use neuromorphic_adaptation::TransformationEpisode;
pub use kernel::center_kernel_matrix;
pub use kernel::cross_gram_matrix;
pub use kernel::estimate_rbf_gamma;
pub use kernel::gram_matrix;
pub use kernel::is_positive_semidefinite;
pub use kernel::kernel_alignment;
pub use kernel::kernel_diagonal;
pub use kernel::kernel_eval;
pub use kernel::KernelPCA;
pub use kernel::KernelRidgeRegression;
pub use kernel::KernelType;
pub use monitoring::AlertConfig;
pub use monitoring::AlertType;
pub use monitoring::DriftDetectionResult;
pub use monitoring::DriftMethod;
pub use monitoring::PerformanceMetrics;
pub use monitoring::TransformationMonitor;

Modules

auto_feature_engineering
Automated feature engineering with meta-learning
decomposition
Matrix decomposition techniques
distributed
Distributed processing for multi-node transformation pipelines
encoding
Categorical data encoding utilities
error
Error types and error handling for the data transformation module
features
Feature engineering techniques and utilities
features_simd
SIMD-accelerated feature engineering operations
gpu
GPU-accelerated transformations
graph
Graph embedding transformers for graph-based feature extraction
image
Image processing transformers for feature extraction
impute
Missing value imputation utilities
kernel
Kernel methods (Kernel PCA, Kernel Ridge Regression, kernel functions)
monitoring
Production monitoring with drift detection and model degradation alerts
neuromorphic_adaptation
Neuromorphic computing integration for real-time transformation adaptation
normalize
Data normalization and standardization utilities
normalize_simd
SIMD-accelerated normalization operations
optimization_config
Optimization configuration and auto-tuning system
out_of_core
Out-of-core processing for large datasets
performance
Performance optimizations and enhanced implementations
pipeline
Pipeline API for chaining transformations
quantum_optimization
Quantum-inspired optimization for data transformations
reduction
Dimensionality reduction algorithms and techniques
scaling
Advanced scaling and transformation methods
scaling_simd
SIMD-accelerated scaling operations
selection
Feature selection utilities
signal_transforms
Signal transforms (DWT, CWT, WPT, STFT, MFCC, CQT, Chromagram); signal transform module for v0.2.0
streaming
Streaming transformations for continuous data processing
text
Text processing transformers for feature extraction
time_series
Time series feature extraction
utils
Utility functions and helpers for data transformation