sklears-datasets
Latest release: 0.1.0-beta.1 (January 1, 2026). See the workspace release notes for highlights and upgrade guidance.
Overview
sklears-datasets centralizes the dataset loaders, synthetic generators, and data utilities used throughout the sklears ecosystem. It mirrors scikit-learn's datasets module while adding Rust-first performance and IO enhancements.
Key Features
- Classic Loaders: Diabetes, Iris, Digits, Wine, Breast Cancer, 20 Newsgroups, and more.
- Synthetic Generators: make_blobs, make_moons, make_circles, Gaussian quantiles, regression surfaces, and streaming generators.
- File IO: CSV, Parquet, Arrow IPC, and memory-mapped dataset support with Polars integration.
- Benchmark Utilities: Deterministic dataset splits and sampling strategies for reproducible experiments.
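To illustrate what a deterministic split means in practice, here is a minimal, std-only sketch: shuffle the sample indices with a seeded generator, then cut off a test fraction. It is not the sklears-datasets API. The `seeded_split` function and the `SplitMix64` helper are hypothetical names introduced for this example, and a real implementation would likely use a proper RNG crate.

```rust
/// Small deterministic PRNG (SplitMix64). Hypothetical stand-in for a real RNG crate.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: returns (train_indices, test_indices).
/// The same (n_samples, test_fraction, seed) always yields the same split.
fn seeded_split(n_samples: usize, test_fraction: f64, seed: u64) -> (Vec<usize>, Vec<usize>) {
    let mut indices: Vec<usize> = (0..n_samples).collect();
    let mut rng = SplitMix64(seed);

    // Fisher-Yates shuffle driven by the seeded generator.
    // The modulo introduces a negligible bias, which is acceptable for a sketch.
    for i in (1..n_samples).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        indices.swap(i, j);
    }

    // The last n_test shuffled indices become the test set.
    let n_test = ((n_samples as f64 * test_fraction).round() as usize).min(n_samples);
    let test = indices.split_off(n_samples - n_test);
    (indices, test)
}

fn main() {
    let (train, test) = seeded_split(10, 0.3, 42);
    println!("train = {:?}", train);
    println!("test  = {:?}", test);

    // Re-running with the same seed reproduces the split exactly.
    assert_eq!(seeded_split(10, 0.3, 42), (train, test));
}
```

Working with indices rather than copying rows keeps the split cheap and lets the same partition be applied to features and targets alike.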
Quick Start
use sklears_datasets::{/* … */};

// Built-in dataset
let iris = load_iris()?;
println!(/* … */);

// Synthetic data
let blobs = make_blobs(/* … */)
    .n_features(/* … */)
    .centers(/* … */)
    .cluster_std(/* … */)
    .random_state(/* … */)
    .build()?;
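For readers unfamiliar with blob generators, the sketch below shows the general idea behind a make_blobs-style generator: each sample is a cluster center plus isotropic Gaussian noise, drawn from a seeded generator so the output is reproducible. This is a std-only illustration, not the crate's implementation. `blobs_sketch`, `SplitMix64`, and the round-robin label assignment are assumptions made for the example.

```rust
/// Small deterministic PRNG (SplitMix64). Hypothetical stand-in for a real RNG crate.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in (0, 1], so that ln() below is always finite.
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_normal(&mut self) -> f64 {
        let u1 = self.next_f64();
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Hypothetical sketch: returns (points, labels), where each point is its
/// cluster center plus Gaussian noise scaled by `cluster_std`.
fn blobs_sketch(
    n_samples: usize,
    centers: &[Vec<f64>],
    cluster_std: f64,
    seed: u64,
) -> (Vec<Vec<f64>>, Vec<usize>) {
    assert!(!centers.is_empty(), "at least one center is required");
    let mut rng = SplitMix64(seed);
    let mut points = Vec::with_capacity(n_samples);
    let mut labels = Vec::with_capacity(n_samples);

    for i in 0..n_samples {
        // Assign samples to centers round-robin so clusters stay balanced.
        let label = i % centers.len();
        let point: Vec<f64> = centers[label]
            .iter()
            .map(|&c| c + cluster_std * rng.next_normal())
            .collect();
        points.push(point);
        labels.push(label);
    }
    (points, labels)
}

fn main() {
    let centers = vec![vec![0.0, 0.0], vec![10.0, 10.0]];
    let (points, labels) = blobs_sketch(6, &centers, 0.5, 7);
    for (p, l) in points.iter().zip(&labels) {
        println!("label {} -> {:?}", l, p);
    }
}
```

Fixing the seed plays the same role as `random_state` in the builder above: identical inputs produce identical points, which is what makes experiments repeatable.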
Status
- All loaders/generators validated through the 11,292 passing workspace tests for 0.1.0-beta.1.
- Supports lazy loading and streaming for large-scale workflows.
- Future work (federated dataset shards, synthetic time series) tracked in this crate's TODO.md.