Crate sklears_impute

Missing value imputation strategies

This crate provides strategies for handling missing values in datasets. It includes simple imputation methods as well as more sophisticated approaches such as iterative imputation, KNN-based imputation, matrix factorization, and Bayesian methods.
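
To make "simple imputation" concrete, here is a minimal sketch of column-mean imputation in plain Rust. It does not use this crate's API: the function name `impute_column_means`, the `Vec<Vec<f64>>` row layout, and the use of `NaN` as the missing-value marker are all assumptions made for illustration.

```rust
/// Replace every NaN in a rectangular table with the mean of the observed
/// (non-NaN) values in the same column.
/// Hypothetical helper for illustration; not part of sklears_impute.
/// Columns with no observed values are left unchanged.
fn impute_column_means(rows: &mut [Vec<f64>]) {
    // Assumes all rows have the same length as the first row.
    let n_cols = rows.first().map_or(0, |r| r.len());
    for col in 0..n_cols {
        // Sum and count the observed values in this column.
        let (sum, count) = rows
            .iter()
            .map(|r| r[col])
            .filter(|v| !v.is_nan())
            .fold((0.0_f64, 0_usize), |(s, c), v| (s + v, c + 1));
        if count == 0 {
            continue; // nothing to estimate the mean from
        }
        let mean = sum / count as f64;
        // Fill the gaps with the column mean.
        for row in rows.iter_mut() {
            if row[col].is_nan() {
                row[col] = mean;
            }
        }
    }
}
```

Calling `impute_column_means(&mut data)` mutates the table in place. The more sophisticated methods listed above differ mainly in how the fill value is estimated, for example from neighbouring rows or from a fitted model.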

Re-exports

pub use advanced::analyze_breakdown_point;
pub use advanced::BreakdownPointAnalysis;
pub use advanced::CopulaImputer;
pub use advanced::CopulaParameters;
pub use advanced::EmpiricalCDF;
pub use advanced::EmpiricalQuantile;
pub use advanced::FactorAnalysisImputer;
pub use advanced::KDEImputer;
pub use advanced::LocalLinearImputer;
pub use advanced::LowessImputer;
pub use advanced::MultivariateNormalImputer;
pub use advanced::RobustRegressionImputer;
pub use advanced::TrimmedMeanImputer;
pub use bayesian::BayesianLinearImputer;
pub use bayesian::BayesianLogisticImputer;
pub use bayesian::BayesianModel;
pub use bayesian::BayesianModelAveraging;
pub use bayesian::BayesianModelAveragingResults;
pub use bayesian::BayesianMultipleImputer;
pub use bayesian::ConvergenceDiagnostics;
pub use bayesian::HierarchicalBayesianImputer;
pub use bayesian::HierarchicalBayesianSample;
pub use bayesian::PooledResults;
pub use bayesian::VariationalBayesImputer;
pub use benchmarks::AccuracyMetrics;
pub use benchmarks::BenchmarkDatasetGenerator;
pub use benchmarks::BenchmarkSuite;
pub use benchmarks::ImputationBenchmark;
pub use benchmarks::ImputationComparison;
pub use benchmarks::MissingPattern;
pub use benchmarks::MissingPatternGenerator;
pub use categorical::AssociationRule;
pub use categorical::AssociationRuleImputer;
pub use categorical::CategoricalClusteringImputer;
pub use categorical::CategoricalRandomForestImputer;
pub use categorical::HotDeckImputer;
pub use categorical::Item;
pub use categorical::Itemset;
pub use core::utils;
pub use core::ConvergenceInfo;
pub use core::ImputationError;
pub use core::ImputationMetadata;
pub use core::ImputationOutputWithMetadata;
pub use core::ImputationResult;
pub use core::Imputer;
pub use core::ImputerConfig;
pub use core::MissingPatternHandler;
pub use core::QualityAssessment;
pub use core::StatisticalValidator;
pub use core::TrainableImputer;
pub use core::TransformableImputer;
pub use dimensionality::CompressedSensingImputer;
pub use dimensionality::ICAImputer;
pub use dimensionality::ManifoldLearningImputer;
pub use dimensionality::PCAImputer;
pub use dimensionality::SparseImputer;
pub use domain_specific::CreditScoringImputer;
pub use domain_specific::DemographicDataImputer;
pub use domain_specific::EconomicIndicatorImputer;
pub use domain_specific::FinancialTimeSeriesImputer;
pub use domain_specific::GenomicImputer;
pub use domain_specific::LongitudinalStudyImputer;
pub use domain_specific::MetabolomicsImputer;
pub use domain_specific::MissingResponseHandler;
pub use domain_specific::PhylogeneticImputer;
pub use domain_specific::PortfolioDataImputer;
pub use domain_specific::ProteinExpressionImputer;
pub use domain_specific::RiskFactorImputer;
pub use domain_specific::SingleCellRNASeqImputer;
pub use domain_specific::SocialNetworkImputer;
pub use domain_specific::SurveyDataImputer;
pub use ensemble::ExtraTreesImputer;
pub use ensemble::GradientBoostingImputer;
pub use ensemble::RandomForestImputer;
pub use fluent_api::pluggable::ComposedPipeline;
pub use fluent_api::pluggable::DataCharacteristics;
pub use fluent_api::pluggable::DataType;
pub use fluent_api::pluggable::ImputationInstance;
pub use fluent_api::pluggable::ImputationMiddleware;
pub use fluent_api::pluggable::ImputationModule;
pub use fluent_api::pluggable::LogLevel;
pub use fluent_api::pluggable::LoggingMiddleware;
pub use fluent_api::pluggable::MissingPatternType;
pub use fluent_api::pluggable::ModuleConfig;
pub use fluent_api::pluggable::ModuleConfigSchema;
pub use fluent_api::pluggable::ModuleRegistry;
pub use fluent_api::pluggable::ParameterGroup;
pub use fluent_api::pluggable::ParameterRange;
pub use fluent_api::pluggable::ParameterSchema;
pub use fluent_api::pluggable::ParameterType;
pub use fluent_api::pluggable::PipelineComposer;
pub use fluent_api::pluggable::PipelineStage;
pub use fluent_api::pluggable::StageCondition;
pub use fluent_api::pluggable::ValidationMiddleware;
pub use fluent_api::quick;
pub use fluent_api::DeepLearningBuilder;
pub use fluent_api::EnsembleImputationBuilder;
pub use fluent_api::GaussianProcessBuilder;
pub use fluent_api::ImputationBuilder;
pub use fluent_api::ImputationMethod;
pub use fluent_api::ImputationPipeline;
pub use fluent_api::ImputationPreset;
pub use fluent_api::IterativeImputationBuilder;
pub use fluent_api::KNNImputationBuilder;
pub use fluent_api::PostprocessingConfig;
pub use fluent_api::PreprocessingConfig;
pub use fluent_api::SimpleImputationBuilder;
pub use fluent_api::ValidationConfig;
pub use independence::chi_square_independence_test;
pub use independence::cramers_v_association_test;
pub use independence::fisher_exact_independence_test;
pub use independence::kolmogorov_smirnov_independence_test;
pub use independence::pattern_sensitivity_analysis;
pub use independence::run_independence_test_suite;
pub use independence::sensitivity_analysis;
pub use independence::ChiSquareTestResult;
pub use independence::CramersVTestResult;
pub use independence::FisherExactTestResult;
pub use independence::IndependenceTestSuite;
pub use independence::KolmogorovSmirnovTestResult;
pub use independence::MARSensitivityCase;
pub use independence::MNARSensitivityCase;
pub use independence::MissingDataAssessment;
pub use independence::PatternSensitivityResult;
pub use independence::RobustnessSummary;
pub use independence::SensitivityAnalysisResult;
pub use information_theoretic::EntropyImputer;
pub use information_theoretic::InformationGainImputer;
pub use information_theoretic::MDLImputer;
pub use information_theoretic::MaxEntropyImputer;
pub use information_theoretic::MutualInformationImputer;
pub use kernel::GPPredictionResult;
pub use kernel::GaussianProcessImputer;
pub use kernel::KernelRidgeImputer;
pub use kernel::ReproducingKernelImputer;
pub use kernel::SVRImputer;
pub use memory_profiler::ImputationMemoryBenchmark;
pub use memory_profiler::MemoryProfiler;
pub use memory_profiler::MemoryProfilingResult;
pub use memory_profiler::MemoryStats;
pub use mixed_type::HeterogeneousImputer;
pub use mixed_type::MixedTypeMICEImputer;
pub use mixed_type::MixedTypeMultipleImputationResults;
pub use mixed_type::OrdinalImputer;
pub use mixed_type::VariableMetadata;
pub use mixed_type::VariableParameters;
pub use mixed_type::VariableType;
pub use multivariate::CanonicalCorrelationImputer;
pub use neural::AutoencoderImputer;
pub use neural::DiffusionImputer;
pub use neural::GANImputer;
pub use neural::MLPImputer;
pub use neural::NeuralODEImputer;
pub use neural::NormalizingFlowImputer;
pub use neural::VAEImputer;
pub use parallel::AdaptiveStreamingImputer;
pub use parallel::MemoryEfficientImputer;
pub use parallel::MemoryMappedData;
pub use parallel::MemoryOptimizedImputer;
pub use parallel::MemoryStrategy;
pub use parallel::OnlineStatistics;
pub use parallel::ParallelConfig;
pub use parallel::ParallelIterativeImputer;
pub use parallel::ParallelKNNImputer;
pub use parallel::SharedDataRef;
pub use parallel::SparseMatrix;
pub use parallel::StreamingImputer;
pub use simd_ops::SimdDistanceCalculator;
pub use simd_ops::SimdImputationOps;
pub use simd_ops::SimdKMeans;
pub use simd_ops::SimdMatrixOps;
pub use simd_ops::SimdStatistics;
pub use simple::MissingIndicator;
pub use simple::SimpleImputer;
pub use timeseries::ARIMAImputer;
pub use timeseries::KalmanFilterImputer;
pub use timeseries::SeasonalDecompositionImputer;
pub use timeseries::StateSpaceImputer;
pub use type_safe::ClassifiedArray;
pub use type_safe::Complete;
pub use type_safe::CompleteArray;
pub use type_safe::FixedSizeArray;
pub use type_safe::FixedSizeValidation;
pub use type_safe::ImputationQualityMetrics;
pub use type_safe::MARArray;
pub use type_safe::MCARArray;
pub use type_safe::MNARArray;
pub use type_safe::MissingMechanism;
pub use type_safe::MissingPatternValidator;
pub use type_safe::MissingValueDetector;
pub use type_safe::NaNDetector;
pub use type_safe::SentinelDetector;
pub use type_safe::TypeSafeImputation;
pub use type_safe::TypeSafeMeanImputer;
pub use type_safe::TypeSafeMissingOps;
pub use type_safe::TypedArray;
pub use type_safe::UnknownMechanism;
pub use type_safe::WithMissing;
pub use type_safe::MAR;
pub use type_safe::MCAR;
pub use type_safe::MNAR;
pub use visualization::create_completeness_matrix;
pub use visualization::create_missing_correlation_heatmap;
pub use visualization::create_missing_distribution_plot;
pub use visualization::create_missing_pattern_plot;
pub use visualization::export_correlation_csv;
pub use visualization::export_missing_pattern_csv;
pub use visualization::generate_missing_summary_stats;
pub use visualization::CompletenessMatrix;
pub use visualization::MissingCorrelationHeatmap;
pub use visualization::MissingDistributionPlot;
pub use visualization::MissingPatternPlot;
pub use approximate::ApproximateConfig;
pub use approximate::ApproximateKNNImputer;
pub use approximate::ApproximateSimpleImputer;
pub use approximate::ApproximationStrategy;
pub use approximate::LocalityHashTable;
pub use approximate::SketchingImputer;
pub use distributed::CommunicationStrategy;
pub use distributed::DistributedConfig;
pub use distributed::DistributedKNNImputer;
pub use distributed::DistributedSimpleImputer;
pub use distributed::DistributedWorker;
pub use distributed::ImputationCoordinator;
pub use out_of_core::IndexType;
pub use out_of_core::MemoryManager;
pub use out_of_core::NeighborIndex;
pub use out_of_core::OutOfCoreConfig;
pub use out_of_core::OutOfCoreKNNImputer;
pub use out_of_core::OutOfCoreSimpleImputer;
pub use out_of_core::PrefetchStrategy;
pub use sampling::AdaptiveSamplingImputer;
pub use sampling::ImportanceSamplingImputer;
pub use sampling::ParametricDistribution;
pub use sampling::ProposalDistribution;
pub use sampling::QuasiSequenceType;
pub use sampling::SampleDistribution;
pub use sampling::SamplingConfig;
pub use sampling::SamplingSimpleImputer;
pub use sampling::SamplingStrategy;
pub use sampling::StratifiedSamplingImputer;
pub use sampling::WeightFunction;

Modules

advanced
Advanced imputation methods
approximate
Approximate imputation algorithms for fast processing
bayesian
Bayesian imputation methods
benchmarks
Benchmarking and comparison utilities for imputation methods
categorical
Categorical data imputation methods
core
Core types and traits for imputation operations
dimensionality
Dimensionality reduction-based imputation methods
distributed
Distributed imputation algorithms for large-scale missing data processing
domain_specific
Domain-specific imputation methods
ensemble
Ensemble-based imputation methods
fluent_api
Fluent API and builder patterns for easy imputation configuration
independence
Independence tests for missing data mechanisms
information_theoretic
Information-theoretic imputation methods
kernel
Kernel-based imputation methods
memory_profiler
Memory profiling and monitoring for imputation operations
mixed_type
Mixed-type data imputation methods
multivariate
Multivariate imputation methods
neural
Neural network-based imputation methods
out_of_core
Out-of-core imputation algorithms for datasets larger than memory
parallel
Parallel imputation algorithms for high-performance missing data processing
sampling
Sampling-based imputation methods
simd_ops
Optimized numerical operations for high-performance imputation
simple
Simple imputation methods
timeseries
Time series imputation methods
type_safe
Type-safe missing data operations with phantom types for compile-time validation
visualization
Missing data visualization utilities
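
The `type_safe` entry above mentions phantom types for compile-time validation. The following is a minimal sketch of that general technique using `std::marker::PhantomData`. The names `Column`, `WithGaps`, `NoGaps`, `fill_with`, and `sum` are hypothetical and chosen for illustration; they are not the types or methods this crate exports.

```rust
use std::marker::PhantomData;

/// Marker type: the column may still contain missing values (NaN).
struct WithGaps;
/// Marker type: the column has been filled and contains no NaN.
struct NoGaps;

/// A column of values tagged, at the type level only, with its state.
/// Hypothetical type for illustration; not part of sklears_impute.
struct Column<State> {
    values: Vec<f64>,
    _state: PhantomData<State>,
}

impl Column<WithGaps> {
    /// Wrap raw values that may contain NaN.
    fn new(values: Vec<f64>) -> Self {
        Column { values, _state: PhantomData }
    }

    /// Replace every NaN with `fill` and move to the `NoGaps` state.
    /// Panics if `fill` itself is NaN, since that would break the guarantee.
    fn fill_with(self, fill: f64) -> Column<NoGaps> {
        assert!(!fill.is_nan(), "fill value must not be NaN");
        let values = self
            .values
            .into_iter()
            .map(|v| if v.is_nan() { fill } else { v })
            .collect();
        Column { values, _state: PhantomData }
    }
}

impl Column<NoGaps> {
    /// Only defined for filled columns, so the sum cannot silently become NaN
    /// because of a leftover gap.
    fn sum(&self) -> f64 {
        self.values.iter().sum()
    }
}
```

With this layout, calling `sum` on a `Column<WithGaps>` is a compile error rather than a runtime surprise, and the marker types add no runtime cost because `PhantomData` is zero-sized.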

Macros

profile_memory
Convenience macro for profiling memory usage

Structs

KNNImputer
K-Nearest Neighbors Imputer
KNNImputerTrained
Trained state for KNNImputer
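
As background for the K-Nearest Neighbors entries above, here is a minimal sketch of the general KNN imputation idea in plain Rust. It is not the `KNNImputer` implementation or its API: the functions `shared_distance` and `knn_impute`, the `NaN` missing marker, and the choice of a root-mean-square distance over the coordinates observed in both rows are assumptions of this sketch.

```rust
/// Root-mean-square difference over the coordinates observed in both rows.
/// Returns None when the rows share no observed coordinate.
/// Hypothetical helper for illustration; not part of sklears_impute.
fn shared_distance(a: &[f64], b: &[f64]) -> Option<f64> {
    let mut sum = 0.0_f64;
    let mut shared = 0_usize;
    for (x, y) in a.iter().zip(b) {
        if !x.is_nan() && !y.is_nan() {
            sum += (x - y).powi(2);
            shared += 1;
        }
    }
    if shared == 0 {
        None
    } else {
        Some((sum / shared as f64).sqrt())
    }
}

/// Fill each NaN with the mean of that column over the `k` nearest rows
/// that have the column observed. Donor values always come from the
/// original data, never from previously imputed cells.
fn knn_impute(data: &[Vec<f64>], k: usize) -> Vec<Vec<f64>> {
    let mut out = data.to_vec();
    for (i, row) in data.iter().enumerate() {
        for col in 0..row.len() {
            if !row[col].is_nan() {
                continue;
            }
            // Candidate donors: other rows with this column observed,
            // paired as (distance to the current row, donor value).
            let mut donors: Vec<(f64, f64)> = data
                .iter()
                .enumerate()
                .filter(|(j, other)| *j != i && !other[col].is_nan())
                .filter_map(|(_, other)| {
                    shared_distance(row, other).map(|d| (d, other[col]))
                })
                .collect();
            donors.sort_by(|a, b| a.0.total_cmp(&b.0));
            donors.truncate(k);
            // Leave the cell as NaN if no usable donor exists.
            if !donors.is_empty() {
                let total: f64 = donors.iter().map(|(_, v)| v).sum();
                out[i][col] = total / donors.len() as f64;
            }
        }
    }
    out
}
```

Splitting a fit step from a transform step, as the `KNNImputer` / `KNNImputerTrained` pair suggests, would let the donor rows be stored once and reused on new data; this sketch folds both steps into one call for brevity.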

Functions

analyze_missing_patterns
Analyze missing data patterns
missing_completeness_matrix
Compute missing completeness matrix
missing_correlation_matrix
Compute missing correlation matrix
missing_data_summary
Generate comprehensive missing data summary
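
To illustrate the kind of quantities such missing-data summaries typically involve, here is a minimal sketch in plain Rust. It does not call the functions listed above, whose signatures are not shown on this page. The helper names `missing_indicators`, `missing_rate_per_column`, and `missing_correlation`, and the use of `NaN` as the missing marker, are assumptions for illustration.

```rust
/// Indicator table: 1.0 where a value is missing (NaN), 0.0 otherwise.
/// Hypothetical helper for illustration; not part of sklears_impute.
fn missing_indicators(data: &[Vec<f64>]) -> Vec<Vec<f64>> {
    data.iter()
        .map(|r| r.iter().map(|v| if v.is_nan() { 1.0 } else { 0.0 }).collect())
        .collect()
}

/// Fraction of missing entries in each column of a rectangular table.
fn missing_rate_per_column(data: &[Vec<f64>]) -> Vec<f64> {
    let n_rows = data.len();
    let n_cols = data.first().map_or(0, |r| r.len());
    (0..n_cols)
        .map(|c| {
            let missing = data.iter().filter(|r| r[c].is_nan()).count();
            missing as f64 / n_rows as f64
        })
        .collect()
}

/// Pearson correlation between the missing indicators of columns `a` and `b`.
/// Returns None for empty data or when either indicator is constant,
/// because the correlation is undefined in those cases.
fn missing_correlation(data: &[Vec<f64>], a: usize, b: usize) -> Option<f64> {
    let ind = missing_indicators(data);
    if ind.is_empty() {
        return None;
    }
    let n = ind.len() as f64;
    let mean = |c: usize| ind.iter().map(|r| r[c]).sum::<f64>() / n;
    let (ma, mb) = (mean(a), mean(b));
    let cov: f64 = ind.iter().map(|r| (r[a] - ma) * (r[b] - mb)).sum();
    let va: f64 = ind.iter().map(|r| (r[a] - ma).powi(2)).sum();
    let vb: f64 = ind.iter().map(|r| (r[b] - mb).powi(2)).sum();
    if va == 0.0 || vb == 0.0 {
        None
    } else {
        Some(cov / (va * vb).sqrt())
    }
}
```

A correlation near 1 between two columns' indicators means they tend to be missing in the same rows, which is a hint that the missingness is not completely at random.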