Crate sklears

§sklears - Machine Learning in Rust

A comprehensive machine learning library inspired by scikit-learn’s intuitive API, combining it with Rust’s performance, safety guarantees, and fearless concurrency.

§Overview

sklears brings the familiar scikit-learn API to Rust with:

  • >99% scikit-learn API coverage (validated for version 0.1.0-beta.1)
  • 14-20x performance improvements (validated) over Python implementations
  • Memory safety without garbage collection overhead
  • Type-safe APIs that catch errors at compile time
  • Zero-copy operations for efficient data handling
  • Native parallelism with fearless concurrency via Rayon
  • GPU acceleration with optional CUDA and WebGPU backends

§Quick Start

use sklears::linear::LinearRegression;
use sklears::traits::{Fit, Predict};
use scirs2_core::ndarray::{Array1, Array2};

// Create training data: 100 samples with 5 features each
let x_train = Array2::from_shape_vec((100, 5), (0..500).map(|i| i as f64).collect()).unwrap();
// Targets: a 1-D array with one value per sample
let y_train = Array1::from_vec((0..100).map(|i| i as f64).collect());

// Train a linear regression model (fit returns the trained model)
let model = LinearRegression::new();
let trained_model = model.fit(&x_train, &y_train).unwrap();

// Make predictions
let predictions = trained_model.predict(&x_train).unwrap();
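For intuition about what `fit` estimates, here is a minimal std-only sketch of ordinary least squares restricted to a single feature plus an intercept. It illustrates the technique under that one-feature assumption and is not sklears' implementation, which handles many features through its linear algebra backends; the helper name `ols_1d` is hypothetical.

```rust
/// Ordinary least squares for one feature: y ≈ slope * x + intercept.
/// Hypothetical helper; returns None for empty or mismatched inputs,
/// or when all x values are identical (slope undefined).
fn ols_1d(x: &[f64], y: &[f64]) -> Option<(f64, f64)> {
    if x.is_empty() || x.len() != y.len() {
        return None;
    }
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;

    // Unnormalized covariance of (x, y) and variance of x.
    let mut sxy = 0.0;
    let mut sxx = 0.0;
    for (&xi, &yi) in x.iter().zip(y) {
        sxy += (xi - mean_x) * (yi - mean_y);
        sxx += (xi - mean_x) * (xi - mean_x);
    }
    if sxx == 0.0 {
        return None;
    }
    let slope = sxy / sxx;
    let intercept = mean_y - slope * mean_x;
    Some((slope, intercept))
}

fn main() {
    // Data generated from y = 2x + 1, so the fit should recover slope 2, intercept 1.
    let x = [0.0, 1.0, 2.0, 3.0, 4.0];
    let y: Vec<f64> = x.iter().map(|v| 2.0 * v + 1.0).collect();
    let (slope, intercept) = ols_1d(&x, &y).unwrap();
    println!("slope = {slope:.3}, intercept = {intercept:.3}");
    // Predict a new point with the fitted coefficients.
    println!("prediction at x = 10: {:.3}", slope * 10.0 + intercept);
}
```

The closed form (slope = Sxy / Sxx, intercept = mean_y - slope * mean_x) is the one-dimensional special case of the normal equations that a multi-feature solver generalizes.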

§Feature Flags

sklears uses feature flags to allow selective compilation of algorithm modules, utilities, and optional performance and interop features:

§Algorithm Modules

  • linear - Linear models (LinearRegression, Ridge, Lasso, LogisticRegression)
  • clustering - Clustering algorithms (KMeans, DBSCAN, etc.)
  • ensemble - Ensemble methods (RandomForest, GradientBoosting, AdaBoost)
  • svm - Support Vector Machines
  • tree - Decision trees
  • neural - Neural networks (MLP, autoencoders)
  • neighbors - K-Nearest Neighbors algorithms
  • decomposition - Dimensionality reduction (PCA, NMF, ICA)
  • naive-bayes - Naive Bayes classifiers
  • gaussian-process - Gaussian Process models

§Utilities

  • preprocessing - Data preprocessing and transformers
  • metrics - Evaluation metrics
  • model-selection - Cross-validation and hyperparameter search
  • datasets - Dataset generators and loaders
  • feature-selection - Feature selection algorithms
  • feature-extraction - Feature extraction methods

§Performance & Interop

  • parallel - Enable Rayon parallelism (enabled by default)
  • serde - Serialization support
  • simd - SIMD optimizations
  • gpu - GPU acceleration (CUDA/WebGPU)
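Flags like these are normally consumed through Cargo's `cfg(feature = "...")` conditional compilation. The sketch below shows that general pattern with hypothetical functions `backend_name` and `sum_squares`, switching implementations on a `parallel` feature. It is not sklears' code: to stay dependency-free, the "parallel" branch uses std scoped threads as a stand-in for Rayon.

```rust
/// Hypothetical: reports which code path this build was compiled with.
fn backend_name() -> &'static str {
    // cfg! is evaluated at compile time; it is true only when the crate
    // is built with `--features parallel`.
    if cfg!(feature = "parallel") {
        "parallel"
    } else {
        "sequential"
    }
}

/// Parallel variant: compiled only when the `parallel` feature is enabled.
/// Splits the slice in two and sums one half on a scoped thread.
#[cfg(feature = "parallel")]
fn sum_squares(data: &[f64]) -> f64 {
    let (left, right) = data.split_at(data.len() / 2);
    std::thread::scope(|s| {
        let handle = s.spawn(|| left.iter().map(|v| v * v).sum::<f64>());
        let right_sum: f64 = right.iter().map(|v| v * v).sum();
        handle.join().unwrap() + right_sum
    })
}

/// Sequential fallback: compiled when the `parallel` feature is disabled.
#[cfg(not(feature = "parallel"))]
fn sum_squares(data: &[f64]) -> f64 {
    data.iter().map(|v| v * v).sum()
}

fn main() {
    println!("backend: {}", backend_name());
    println!("sum of squares: {}", sum_squares(&[1.0, 2.0, 3.0]));
}
```

Both variants share one signature, so callers never need to know which flag was set; only the compiled body changes.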

§Architecture

sklears follows a three-layer architecture:

  1. Data Layer: Polars DataFrames for efficient data manipulation
  2. Computation Layer: NumRS2/ndarray arrays with BLAS/LAPACK backends
  3. Algorithm Layer: ML algorithms leveraging SciRS2’s scientific computing

§Type-Safe State Machines

Models use Rust’s type system to prevent common errors at compile time:

use sklears::linear::LinearRegression;
use sklears::traits::{Fit, Predict};
use scirs2_core::ndarray::{Array1, Array2};

// Explicit element types so the arrays' f64 type is not left to inference
let x: Array2<f64> = Array2::zeros((10, 5));
let y: Array1<f64> = Array1::zeros(10);

let model = LinearRegression::new(); // Untrained state

// ❌ This won't compile: can't predict with an untrained model
// let predictions = model.predict(&x);

// ✅ After fitting, the model transitions to the Trained state
let trained = model.fit(&x, &y).unwrap();
let predictions = trained.predict(&x).unwrap();
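The compile-time guarantee above is an instance of the typestate pattern: the model's state is a type parameter, and `predict` is only defined for the trained type. Below is a minimal std-only sketch of the pattern. `MeanRegressor`, `Untrained`, and `Trained` are hypothetical names, the "model" simply predicts the mean of its training targets, and sklears' own types and traits may be structured differently.

```rust
use std::marker::PhantomData;

/// Marker type: the model has not been fitted yet.
struct Untrained;
/// Marker type: the model has been fitted.
struct Trained;

/// Hypothetical regressor that predicts the mean of the training targets.
struct MeanRegressor<State> {
    mean: f64,
    _state: PhantomData<State>,
}

impl MeanRegressor<Untrained> {
    fn new() -> Self {
        MeanRegressor { mean: 0.0, _state: PhantomData }
    }

    /// Consumes the untrained model and returns a trained one.
    fn fit(self, y: &[f64]) -> Result<MeanRegressor<Trained>, String> {
        if y.is_empty() {
            return Err("cannot fit on empty targets".to_string());
        }
        let mean = y.iter().sum::<f64>() / y.len() as f64;
        Ok(MeanRegressor { mean, _state: PhantomData })
    }
}

impl MeanRegressor<Trained> {
    /// Only exists on the Trained state, so calling it earlier fails to compile.
    fn predict(&self, n_samples: usize) -> Vec<f64> {
        vec![self.mean; n_samples]
    }
}

fn main() {
    let model = MeanRegressor::<Untrained>::new();
    // model.predict(3); // error: no method `predict` for MeanRegressor<Untrained>
    let trained = model.fit(&[1.0, 2.0, 3.0]).unwrap();
    println!("{:?}", trained.predict(2));
}
```

Because `fit` takes `self` by value, the untrained value cannot be reused after fitting, and `PhantomData` keeps the state marker at zero runtime cost.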

§Performance

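Benchmarks against scikit-learn report the following wall-clock times:

| Operation | Dataset size | scikit-learn | sklears | Speedup |
|---|---|---|---|---|
| Linear Regression | 1M × 100 | 2.3 s | 0.52 s | 4.4x |
| K-Means | 100K × 50 | 5.1 s | 0.48 s | 10.6x |
| Random Forest | 50K × 20 | 12.8 s | 0.71 s | 18.0x |
| StandardScaler | 1M × 100 | 0.84 s | 0.016 s | 52.5x |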
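The Speedup column is the scikit-learn time divided by the sklears time. A small sketch that recomputes it from the timings in the table above (the values are copied from the table; the helper name `speedup` is hypothetical):

```rust
/// Speedup of sklears over scikit-learn: baseline time / new time.
fn speedup(sklearn_secs: f64, sklears_secs: f64) -> f64 {
    sklearn_secs / sklears_secs
}

fn main() {
    // (operation, scikit-learn seconds, sklears seconds)
    let rows = [
        ("Linear Regression", 2.3, 0.52),
        ("K-Means", 5.1, 0.48),
        ("Random Forest", 12.8, 0.71),
        ("StandardScaler", 0.84, 0.016),
    ];
    for (name, base, new) in rows {
        // One decimal place, matching the table's Speedup column.
        println!("{name}: {:.1}x", speedup(base, new));
    }
}
```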

§Integration with SciRS2

sklears is built on the SciRS2 ecosystem for scientific computing:

  • scirs2-core - Core array operations and random number generation
  • scirs2-linalg - Linear algebra (SVD, QR, eigenvalues, BLAS/LAPACK)
  • scirs2-optimize - Optimization algorithms (L-BFGS, gradient descent)
  • scirs2-stats - Statistical functions and distributions
  • scirs2-neural - Neural network primitives and autograd

§Examples

See the examples/ directory for comprehensive examples:

  • Basic linear regression
  • Classification pipelines
  • Cross-validation and hyperparameter tuning
  • Custom estimators
  • Neural network training

§Documentation

§Minimum Supported Rust Version (MSRV)

Rust 1.70 or later is required.

§Re-exports

pub use sklears_utils as utils;
pub use sklears_linear as linear;
pub use sklears_clustering as clustering;
pub use sklears_neighbors as neighbors;
pub use sklears_model_selection as model_selection;
pub use sklears_metrics as metrics;

§Modules

advanced_array_ops
advanced_benchmarking
Advanced Benchmarking Suite with Performance Regression Detection
algorithm_markers
api_analyzers
API Analysis Engines and Validation Components
api_data_structures
Core Data Structures for API Reference Generation
api_formatters
Output Formatters and Document Generators
api_generator_config
API Generator Configuration Module
async_traits
auto_benchmark_generation
Automatic Benchmark Generation System
autodiff
benchmarking
code_coverage
compatibility
compile_time_macros
Compile-Time Model Verification and Macro System
compile_time_validation
contribution
dataset
dependency_audit
dependent_types
Dependent Type Experiments for sklears-core
derive_macros
distributed
distributed_algorithms
Distributed Machine Learning Algorithms
dsl_impl
Domain-Specific Language (DSL) implementation for machine learning pipelines
effect_types
ensemble_improvements
error
exhaustive_error_handling
exotic_hardware
Auto-generated module structure
exotic_hardware_impls
Concrete Exotic Hardware Implementations
fallback_strategies
features
formal_verification
Formal Verification System for Machine Learning Algorithms
format_io
formatting
input_sanitization
interactive_api_reference
Interactive API Reference Generator
interactive_playground
Auto-generated module structure
macros
memory_safety
mock_objects
parallel
performance_profiling
Advanced Performance Profiling and Optimization Framework
performance_reporting
plugin
Plugin System Module
plugin_marketplace_impl
Concrete Plugin Marketplace Implementation
prelude
public
refinement_types
Refinement Types System for sklears-core
search_engines
streaming_lifetimes
trait_explorer
Trait Explorer Module
traits
tutorial_examples
Concrete Tutorial Examples and Learning Paths
tutorial_system
types
unsafe_audit
validation
validation_examples
wasm_playground_impl
WebAssembly Playground Implementation

§Macros

auto_benchmark
Macro to automatically generate benchmarks for a type
benchmark_suite
Creates comprehensive benchmarking suite for ML algorithms
cfg_feature
Macro for conditional compilation based on feature flags
cfg_impl
Macro for feature-gated function implementations
cfg_type
Macro for conditional type definitions based on features
define_algorithm_category
Macro for defining algorithm categories with compile-time checking
define_estimator
Advanced macro for creating ML estimators with builder pattern and validation
define_ml_algorithm
Creates a complete ML algorithm with all necessary boilerplate
define_ml_float_bounds
Helper macro for creating trait bound combinations commonly used in ML
destructure
Macro for easy destructuring of complex types
error_context
Macro for adding location context automatically
estimator_test_suite
Creates a test suite for an estimator implementation
impl_algorithm_markers
Macro for implementing multiple marker traits at once
impl_default_config
Helper macro for creating default trait implementations
impl_ml_traits
Implements standard machine learning traits for an estimator
io_effect
parameter_map
Creates a simple parameter mapping for algorithm configurations
pattern_guard
Macro for creating pattern guards with custom validation logic
pure_effect
Convenience macros for effect creation
quick_dataset
Advanced macro definitions for sklears-core
random_effect
refinement_predicate
Macro to create a custom refinement predicate
simd_operations
Creates SIMD-optimized operation implementations
validate
Convenience macro for validation
validate_performance
Macro for performance validation
validated_param
Macro for creating compile-time validated parameters
validation_rules
Macro for creating type-safe validation rules
verify_dimensions
Macro for dimension verification
verify_model
Macro for model verification
with_fallback
Convenience macro for executing operations with fallback