sklears-core 0.1.0-alpha.2

Core traits, types, and utilities for the sklears machine learning library

sklears-core


The foundational crate for sklears, providing core traits, types, and utilities that power the entire machine learning ecosystem. Production-ready with 100% test coverage.

Latest release: 0.1.0-alpha.2 (December 22, 2025). See the workspace release notes for highlights and upgrade guidance.

Overview

sklears-core provides the fundamental building blocks for all sklears algorithms:

  • Core Traits: Comprehensive ML abstractions with type-safe state management
  • Advanced Type System: Compile-time validation, phantom types, const generics
  • Performance Infrastructure: SIMD, GPU support, memory pooling, parallel processing
  • Error Handling: Rich error types with context propagation and recovery
  • Integration: scikit-learn compatibility, format I/O, cross-framework support

Status

  • Implementation: 0.1.0-alpha.2 ships with >99% of the planned v0.1 APIs implemented.
  • Validation: Covered by the 11,292 passing workspace tests (69 skipped) executed on December 22, 2025.
  • Performance: Achieves 3-100x improvements as designed via SIMD, threading, and cache-friendly layouts.
  • API Stability: Breaking changes still possible before beta; stabilization roadmap tracked in the root TODO.md.

Core Trait System

Base Traits

Estimator<State>

The foundational trait for all ML models with compile-time state tracking:

pub trait Estimator<State = Untrained> {
    type Config;
    type Error: std::error::Error;
}

Learning Traits

// Supervised learning
pub trait Fit<X, Y, State = Untrained> {
    type Fitted;
    fn fit(self, x: &X, y: &Y) -> Result<Self::Fitted>;
}

// Incremental/online learning
pub trait PartialFit<X, Y> {
    fn partial_fit(&mut self, x: &X, y: &Y) -> Result<()>;
}

// Unsupervised learning
pub trait FitTransform<X, Y = (), Output = X> {
    fn fit_transform(self, x: &X, y: Option<&Y>) -> Result<Output>;
}
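
As an illustration of the incremental-learning contract, here is a minimal sketch that uses only the standard library. The `PartialFit` trait below is a local stand-in with a `String` error rather than the crate's own trait, and `RunningMean` is a hypothetical toy model that is not part of sklears-core.

```rust
// Local stand-in for the `PartialFit` trait (assumption: the real trait
// returns the crate's own `Result`; a `String` error is used here so the
// sketch compiles with the standard library only).
pub trait PartialFit<X, Y> {
    fn partial_fit(&mut self, x: &X, y: &Y) -> Result<(), String>;
}

/// Hypothetical toy model: tracks the running mean of every target seen.
#[derive(Debug, Default)]
pub struct RunningMean {
    count: usize,
    mean: f64,
}

impl RunningMean {
    /// Returns the current mean, or `None` before any batch was seen.
    pub fn mean(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.mean)
        }
    }
}

impl PartialFit<Vec<Vec<f64>>, Vec<f64>> for RunningMean {
    fn partial_fit(&mut self, x: &Vec<Vec<f64>>, y: &Vec<f64>) -> Result<(), String> {
        if x.len() != y.len() {
            return Err(format!("x has {} rows but y has {} targets", x.len(), y.len()));
        }
        for &target in y {
            self.count += 1;
            // Incremental update: m_n = m_(n-1) + (y_n - m_(n-1)) / n
            self.mean += (target - self.mean) / self.count as f64;
        }
        Ok(())
    }
}
```

Because `partial_fit` takes `&mut self`, the same value can absorb any number of batches, in contrast to `fit`, which consumes the untrained model.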

Prediction Traits

// Standard predictions
pub trait Predict<X, Output> {
    fn predict(&self, x: &X) -> Result<Output>;
}

// Probabilistic predictions
pub trait PredictProba<X, Output> {
    fn predict_proba(&self, x: &X) -> Result<Output>;
}

// Decision scores
pub trait DecisionFunction<X, Output> {
    fn decision_function(&self, x: &X) -> Result<Output>;
}
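
The three prediction traits often relate to each other: a decision score can be thresholded into a label or squashed into a probability. The sketch below shows one way that could look. The traits are local copies with a `String` error, and `LinearScorer` is a hypothetical binary model invented for this example, not an sklears estimator.

```rust
// Local stand-ins for the traits above (assumption: `String` replaces the
// crate's error type so the sketch needs only the standard library).
type Result<T> = std::result::Result<T, String>;

pub trait DecisionFunction<X, Output> {
    fn decision_function(&self, x: &X) -> Result<Output>;
}
pub trait PredictProba<X, Output> {
    fn predict_proba(&self, x: &X) -> Result<Output>;
}
pub trait Predict<X, Output> {
    fn predict(&self, x: &X) -> Result<Output>;
}

/// Hypothetical binary linear model with fixed weights.
pub struct LinearScorer {
    pub weights: Vec<f64>,
    pub bias: f64,
}

impl DecisionFunction<Vec<Vec<f64>>, Vec<f64>> for LinearScorer {
    fn decision_function(&self, x: &Vec<Vec<f64>>) -> Result<Vec<f64>> {
        x.iter()
            .map(|row| {
                if row.len() != self.weights.len() {
                    return Err(format!(
                        "expected {} features, got {}",
                        self.weights.len(),
                        row.len()
                    ));
                }
                let dot: f64 = row.iter().zip(&self.weights).map(|(a, w)| a * w).sum();
                Ok(self.bias + dot)
            })
            .collect()
    }
}

impl PredictProba<Vec<Vec<f64>>, Vec<f64>> for LinearScorer {
    fn predict_proba(&self, x: &Vec<Vec<f64>>) -> Result<Vec<f64>> {
        // Logistic sigmoid of the raw score gives P(class = 1).
        let scores = self.decision_function(x)?;
        Ok(scores.into_iter().map(|s| 1.0 / (1.0 + (-s).exp())).collect())
    }
}

impl Predict<Vec<Vec<f64>>, Vec<u8>> for LinearScorer {
    fn predict(&self, x: &Vec<Vec<f64>>) -> Result<Vec<u8>> {
        // Class 1 when the score is strictly positive, class 0 otherwise.
        let scores = self.decision_function(x)?;
        Ok(scores.into_iter().map(|s| u8::from(s > 0.0)).collect())
    }
}
```

Keeping `Output` as a trait parameter lets the same model return labels from `predict` and floating-point values from the other two methods.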

Advanced Features

Async Trait Support

pub trait AsyncFit<X, Y> {
    // Associated type for the trained model returned by `fit_async`
    type Fitted;
    async fn fit_async(self, x: &X, y: &Y) -> Result<Self::Fitted>;
}

pub trait AsyncPredict<X, Output> {
    async fn predict_async(&self, x: &X) -> Result<Output>;
}

GPU Acceleration

use sklears_core::gpu::GpuContext;

pub trait GpuAccelerated {
    // Associated type for the GPU-resident counterpart of `Self`
    type GpuVersion;
    fn to_gpu(self, ctx: &GpuContext) -> Result<Self::GpuVersion>;
}

Type-Safe State Management

Prevent common ML errors at compile time:

use std::marker::PhantomData;
use sklears_core::{Untrained, Trained};

// Model starts untrained
struct Model<State = Untrained> {
    config: Config,
    state: PhantomData<State>,
    weights_: Option<Weights>,
}

// Only untrained models can be fitted
impl Fit<X, Y> for Model<Untrained> {
    type Fitted = Model<Trained>;
    
    fn fit(self, x: &X, y: &Y) -> Result<Self::Fitted> {
        // Training logic...
        Ok(Model {
            config: self.config,
            state: PhantomData,
            weights_: Some(trained_weights),
        })
    }
}

// Only trained models can predict
impl Predict<X, Y> for Model<Trained> {
    fn predict(&self, x: &X) -> Result<Y> {
        let weights = self.weights_.as_ref().unwrap(); // Safe!
        // Prediction logic...
    }
}

This prevents the following mistakes, all caught at compile time:

  • Calling predict() on untrained models
  • Accessing parameters before fitting
  • Double-fitting models
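
The state-tracking pattern can be tried without the crate. The following is a minimal, standard-library-only sketch; `MeanRegressor` and the local `Untrained`/`Trained` marker types are hypothetical stand-ins defined here for illustration, not the sklears-core types.

```rust
use std::marker::PhantomData;

// Local marker types (assumption: these mirror, but are not, the crate's
// `Untrained` and `Trained` markers).
pub struct Untrained;
pub struct Trained;

/// Hypothetical model that predicts the mean of its training targets.
pub struct MeanRegressor<State = Untrained> {
    mean: Option<f64>,
    _state: PhantomData<State>,
}

impl MeanRegressor<Untrained> {
    pub fn new() -> Self {
        MeanRegressor { mean: None, _state: PhantomData }
    }

    /// Consumes the untrained model, so it cannot be fitted a second time.
    pub fn fit(self, y: &[f64]) -> Result<MeanRegressor<Trained>, String> {
        if y.is_empty() {
            return Err("cannot fit on an empty target".to_string());
        }
        let mean = y.iter().sum::<f64>() / y.len() as f64;
        Ok(MeanRegressor { mean: Some(mean), _state: PhantomData })
    }
}

impl MeanRegressor<Trained> {
    /// Only exists in the `Trained` state.
    pub fn predict(&self, n_samples: usize) -> Vec<f64> {
        // `fit` is the only constructor of the `Trained` state and always
        // stores `Some`, so this `expect` cannot fail.
        let mean = self.mean.expect("set by fit");
        vec![mean; n_samples]
    }
}
```

With these definitions, `MeanRegressor::new().predict(3)` fails to compile because `predict` is not defined for `MeanRegressor<Untrained>`, and calling `fit` twice on the same binding fails because the first call moves the value.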

Advanced Type System

Compile-Time Validation

use sklears_core::validation::{ValidatedConfig, PositiveValidator};

#[derive(ValidatedConfig)]
struct HyperParams {
    #[validate(PositiveValidator)]
    learning_rate: f64,
    
    #[validate(RangeValidator { min: 0.0, max: 1.0 })]
    dropout: f64,
}
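
The expansion of the derive macro is not shown in this README. As a rough mental model only, the two attributes above correspond to checks like the hand-written constructor below. `ConfigError` and the constructor are hypothetical names for this sketch, and the checks run at construction time rather than at compile time.

```rust
/// Hypothetical error type for this sketch (not the crate's error type).
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    NotPositive { field: &'static str, value: f64 },
    OutOfRange { field: &'static str, value: f64, min: f64, max: f64 },
}

#[derive(Debug)]
pub struct HyperParams {
    learning_rate: f64,
    dropout: f64,
}

impl HyperParams {
    /// Builds a config only if both fields pass their checks.
    pub fn new(learning_rate: f64, dropout: f64) -> Result<Self, ConfigError> {
        // Positive check; NaN is rejected explicitly.
        if learning_rate.is_nan() || learning_rate <= 0.0 {
            return Err(ConfigError::NotPositive {
                field: "learning_rate",
                value: learning_rate,
            });
        }
        // Inclusive range check; `contains` is false for NaN.
        if !(0.0..=1.0).contains(&dropout) {
            return Err(ConfigError::OutOfRange {
                field: "dropout",
                value: dropout,
                min: 0.0,
                max: 1.0,
            });
        }
        Ok(Self { learning_rate, dropout })
    }

    pub fn learning_rate(&self) -> f64 {
        self.learning_rate
    }

    pub fn dropout(&self) -> f64 {
        self.dropout
    }
}
```

Keeping the fields private means a `HyperParams` value can only be obtained through `new`, so every instance in the program has passed validation.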

Phantom Types for Safety

use std::marker::PhantomData;
use sklears_core::phantom::{Classification, Regression};

struct Metrics<T> {
    _task: PhantomData<T>,
}

// Type-safe metric selection
impl Metrics<Classification> {
    fn accuracy(&self) -> f64 { ... }
}

impl Metrics<Regression> {
    fn mse(&self) -> f64 { ... }
}
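
To make the phantom-type idea concrete, here is a self-contained sketch with the method bodies filled in. The marker types are defined locally, and the `y_true`/`y_pred` fields and the `new` constructor are assumptions added for this example; the crate's `Metrics` type may look different.

```rust
use std::marker::PhantomData;

// Local task markers (assumption: stand-ins for the crate's phantom types).
pub struct Classification;
pub struct Regression;

pub struct Metrics<T> {
    y_true: Vec<f64>,
    y_pred: Vec<f64>,
    _task: PhantomData<T>,
}

impl<T> Metrics<T> {
    /// Shared constructor; rejects empty or mismatched inputs.
    pub fn new(y_true: Vec<f64>, y_pred: Vec<f64>) -> Result<Self, String> {
        if y_true.is_empty() || y_true.len() != y_pred.len() {
            return Err("y_true and y_pred must be non-empty and equally long".to_string());
        }
        Ok(Metrics { y_true, y_pred, _task: PhantomData })
    }
}

impl Metrics<Classification> {
    /// Fraction of exact label matches (labels encoded as 0.0, 1.0, ...).
    pub fn accuracy(&self) -> f64 {
        let hits = self
            .y_true
            .iter()
            .zip(&self.y_pred)
            .filter(|(t, p)| t == p)
            .count();
        hits as f64 / self.y_true.len() as f64
    }
}

impl Metrics<Regression> {
    /// Mean squared error.
    pub fn mse(&self) -> f64 {
        let sum: f64 = self
            .y_true
            .iter()
            .zip(&self.y_pred)
            .map(|(t, p)| (t - p).powi(2))
            .sum();
        sum / self.y_true.len() as f64
    }
}
```

Calling `mse()` on a `Metrics<Classification>` value is a compile error, since that method only exists in the `Metrics<Regression>` impl block.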

Performance Features

SIMD Optimizations

use sklears_core::simd::SimdOps;

// Automatic SIMD acceleration
let distances = SimdOps::euclidean_distance_matrix(&points);
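
For reference, the quantity being computed is the symmetric matrix of pairwise Euclidean distances. The scalar sketch below shows that computation in plain Rust. It is not the crate's SIMD implementation, and the free function with a `&[Vec<f64>]` input is an assumption made for this example; it also assumes every point has the same number of coordinates.

```rust
/// Scalar reference for a pairwise Euclidean distance matrix.
/// Entry `[i][j]` is the distance between `points[i]` and `points[j]`.
pub fn euclidean_distance_matrix(points: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = points.len();
    let mut out = vec![vec![0.0; n]; n];
    for i in 0..n {
        // Only the upper triangle is computed; the matrix is symmetric
        // and the diagonal stays at zero.
        for j in (i + 1)..n {
            let squared: f64 = points[i]
                .iter()
                .zip(&points[j])
                .map(|(a, b)| (a - b).powi(2))
                .sum();
            let d = squared.sqrt();
            out[i][j] = d;
            out[j][i] = d;
        }
    }
    out
}
```

A baseline like this is also handy as a correctness oracle when testing a vectorized version against it.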

Memory Efficiency

use sklears_core::memory::{MemoryPool, CacheOptimized};

// Memory pooling for allocations
let pool = MemoryPool::new(1_000_000);
let array = pool.allocate_array::<f64>(1000)?;

// Cache-friendly operations
let accumulator = CacheOptimizedAccumulator::new();

Error Handling

Rich error types with context:

use sklears_core::{Result, SklearsError, validate};

fn train_model(x: &Array2<f64>, y: &Array1<f64>, learning_rate: f64) -> Result<Model> {
    // Comprehensive validation
    validate::check_consistent_length(x, y)?;
    validate::check_finite(learning_rate, "learning_rate")?;
    validate::check_no_missing(x)?;
    
    // Error context propagation
    let model = complex_training(x, y)
        .context("Failed during gradient computation")?;
    
    Ok(model)
}
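
How a `.context(...)` call can attach a message while keeping the underlying cause is sketched below using only the standard library. `ContextError`, the `Context` extension trait, and `parse_learning_rate` are hypothetical names for this illustration and are not the sklears-core error types.

```rust
use std::fmt;

/// Hypothetical wrapper that pairs a message with the original error.
#[derive(Debug)]
pub struct ContextError {
    context: String,
    source: Box<dyn std::error::Error + 'static>,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.context, self.source)
    }
}

impl std::error::Error for ContextError {
    // Exposes the wrapped error so callers can walk the cause chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&*self.source)
    }
}

/// Extension trait that adds `.context(...)` to any `Result` whose error
/// implements `std::error::Error`.
pub trait Context<T> {
    fn context(self, msg: &str) -> Result<T, ContextError>;
}

impl<T, E: std::error::Error + 'static> Context<T> for Result<T, E> {
    fn context(self, msg: &str) -> Result<T, ContextError> {
        self.map_err(|e| ContextError {
            context: msg.to_string(),
            source: Box::new(e),
        })
    }
}

/// Example caller: parses a learning rate and labels any failure.
pub fn parse_learning_rate(raw: &str) -> Result<f64, ContextError> {
    raw.trim().parse::<f64>().context("invalid learning_rate")
}
```

The wrapper leaves the success path untouched and only allocates when an error actually occurs.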

Macro System

Powerful macros for boilerplate reduction:

// Quick dataset creation
let dataset = quick_dataset! {
    features: [[1.0, 2.0], [3.0, 4.0]],
    target: [0, 1],
    feature_names: ["x1", "x2"]
};

// ML-specific bounds
define_ml_float_bounds!(MLFloat: Float + NumCast + Sum);

// Automatic test generation
estimator_test_suite!(MyEstimator, {
    test_fit_predict: (X, y),
    test_persistence: true,
    test_clone: true,
});

Integration & Compatibility

scikit-learn API Compatibility

use sklears_core::sklearn_compat::SklearnEstimator;

// Drop-in replacement for sklearn models
let model = SklearnEstimator::from_sklearn(sklearn_model)?;

Cross-Framework Support

// NumPy arrays
let np_array = array.to_numpy()?;

// PyTorch tensors
let tensor = array.to_torch_tensor()?;

// Polars DataFrames
let df = Dataset::from_polars(dataframe)?;

Format I/O

Comprehensive format support:

  • CSV, JSON, Parquet
  • HDF5, NPY/NPZ
  • Arrow, Feather
  • ONNX, PMML, MLflow

Builder Pattern

Consistent API across all estimators:

let model = LinearRegression::builder()
    .learning_rate(0.01)
    .max_iter(1000)
    .early_stopping(true)
    .validation_fraction(0.2)
    .n_jobs(4)
    .random_state(42)
    .build()?;
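
A fallible `build()` is what lets the builder reject bad hyperparameters in one place. Below is a minimal sketch of that shape. `SgdConfig`, its three fields, and the default values are assumptions chosen for this example and do not describe `LinearRegression` or any other sklears estimator.

```rust
/// Hypothetical validated configuration produced by the builder.
#[derive(Debug, Clone, PartialEq)]
pub struct SgdConfig {
    learning_rate: f64,
    max_iter: usize,
    random_state: Option<u64>,
}

/// Builder holding not-yet-validated values.
#[derive(Debug, Clone)]
pub struct SgdConfigBuilder {
    learning_rate: f64,
    max_iter: usize,
    random_state: Option<u64>,
}

impl SgdConfig {
    /// Starts a builder with arbitrary defaults picked for this sketch.
    pub fn builder() -> SgdConfigBuilder {
        SgdConfigBuilder { learning_rate: 0.01, max_iter: 1000, random_state: None }
    }

    pub fn learning_rate(&self) -> f64 {
        self.learning_rate
    }
    pub fn max_iter(&self) -> usize {
        self.max_iter
    }
    pub fn random_state(&self) -> Option<u64> {
        self.random_state
    }
}

impl SgdConfigBuilder {
    // Each setter takes and returns `self` so calls can be chained.
    pub fn learning_rate(mut self, value: f64) -> Self {
        self.learning_rate = value;
        self
    }
    pub fn max_iter(mut self, value: usize) -> Self {
        self.max_iter = value;
        self
    }
    pub fn random_state(mut self, seed: u64) -> Self {
        self.random_state = Some(seed);
        self
    }

    /// Validates all fields once, at the end of the chain.
    pub fn build(self) -> Result<SgdConfig, String> {
        if !self.learning_rate.is_finite() || self.learning_rate <= 0.0 {
            return Err(format!(
                "learning_rate must be finite and positive, got {}",
                self.learning_rate
            ));
        }
        if self.max_iter == 0 {
            return Err("max_iter must be at least 1".to_string());
        }
        Ok(SgdConfig {
            learning_rate: self.learning_rate,
            max_iter: self.max_iter,
            random_state: self.random_state,
        })
    }
}
```

Validating in `build()` rather than in each setter keeps the setters infallible, which is what allows the uninterrupted method chain ending in a single `?`.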

Testing Infrastructure

Property-Based Testing

use proptest::prelude::*;
use sklears_core::testing::properties;

proptest! {
    #[test]
    fn test_model_properties(
        x in array_strategy(),
        y in target_strategy()
    ) {
        properties::assert_fit_deterministic(&model, &x, &y);
        properties::assert_predict_shape(&model, &x, &y);
    }
}

Mock Objects

use sklears_core::testing::MockEstimator;

let mock = MockEstimator::new()
    .expect_fit()
    .returning(|x, y| Ok(trained_model));

Contributing

We welcome contributions! See CONTRIBUTING.md.

License

Licensed under either of:

  • Apache License, Version 2.0
  • MIT license

Citation

@software{sklears_core,
  title = {sklears-core: Type-Safe ML Foundation for Rust},
  author = {Cool Japan Team},
  year = {2025},
  url = {https://github.com/cool-japan/sklears}
}