# sklears-datasets
[crates.io](https://crates.io/crates/sklears-datasets)
[docs.rs](https://docs.rs/sklears-datasets)
[License](../../LICENSE)
[Rust](https://www.rust-lang.org)
> **Latest release:** `0.1.0-beta.1` (January 1, 2026). See the [workspace release notes](../../docs/releases/0.1.0-beta.1.md) for highlights and upgrade guidance.
## Overview
`sklears-datasets` centralizes the dataset loaders, synthetic generators, and data utilities used throughout the sklears ecosystem. It mirrors scikit-learn’s `sklearn.datasets` module while adding Rust-first performance and IO enhancements.
## Key Features
- **Classic Loaders**: Diabetes, Iris, Digits, Wine, Breast Cancer, 20 Newsgroups, and more.
- **Synthetic Generators**: `make_blobs`, `make_moons`, `make_circles`, Gaussian quantiles, regression surfaces, and streaming generators.
- **File IO**: CSV, Parquet, Arrow IPC, and memory-mapped dataset support with Polars integration.
- **Benchmark Utilities**: Deterministic dataset splits and sampling strategies for reproducible experiments.
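
To illustrate what a deterministic split means in practice, here is a minimal, standard-library-only sketch. It is not the crate's implementation: `SplitMix64` and `seeded_split` are hypothetical names introduced for this example, and the crate's own split API may look quite different. The idea is simply that a fixed seed drives the shuffle, so the same seed always yields the same train/test partition.

```rust
/// Tiny SplitMix64 generator, used here only so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper (not part of sklears-datasets): shuffles the indices
/// `0..n` with a seeded Fisher-Yates pass and returns `(train, test)`.
fn seeded_split(n: usize, test_fraction: f64, seed: u64) -> (Vec<usize>, Vec<usize>) {
    let mut rng = SplitMix64(seed);
    let mut indices: Vec<usize> = (0..n).collect();

    // Fisher-Yates shuffle; the modulo reduction is slightly biased,
    // which is acceptable for an illustration.
    for i in (1..n).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        indices.swap(i, j);
    }

    // The first `n_test` shuffled indices form the test set, the rest the train set.
    let n_test = (((n as f64) * test_fraction).round() as usize).min(n);
    let train = indices.split_off(n_test);
    (train, indices)
}
```

Calling `seeded_split(150, 0.2, 42)` twice returns identical index sets, which is the property that makes experiments reproducible across runs.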
## Quick Start
```rust
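use sklears_datasets::{load_iris, make_blobs};

// Assumes the crate's error type implements `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Built-in dataset
    let iris = load_iris()?;
    println!("{} samples, {} features", iris.data.nrows(), iris.data.ncols());

    // Synthetic data: 1000 samples, 10 features, 4 clusters, fixed seed
    let blobs = make_blobs(1000)
        .n_features(10)
        .centers(4)
        .cluster_std(2.5)
        .random_state(Some(42))
        .build()?;

    Ok(())
}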
```
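
For readers unfamiliar with blob generators, the following standard-library-only sketch shows the general idea behind the `make_blobs` parameters above: each sample is a cluster center plus isotropic Gaussian noise scaled by the cluster standard deviation, with a seed making the output repeatable. It is an assumption-laden illustration, not the crate's code. `SplitMix64` and `blobs_sketch` are hypothetical names, the sketch is fixed to two features, and it assigns samples to centers round-robin, which may differ from what `make_blobs` actually does.

```rust
/// Tiny SplitMix64 generator so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in (0, 1], built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_normal(&mut self) -> f64 {
        let u1 = self.next_f64(); // strictly positive, so ln(u1) is finite
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Hypothetical 2-D blob generator (not the sklears-datasets implementation).
/// Returns the sampled points and the index of the center each one belongs to.
fn blobs_sketch(
    n_samples: usize,
    centers: &[[f64; 2]],
    cluster_std: f64,
    seed: u64,
) -> (Vec<[f64; 2]>, Vec<usize>) {
    assert!(!centers.is_empty(), "at least one center is required");
    let mut rng = SplitMix64(seed);
    let mut points = Vec::with_capacity(n_samples);
    let mut labels = Vec::with_capacity(n_samples);

    for i in 0..n_samples {
        // Round-robin assignment keeps the clusters balanced.
        let label = i % centers.len();
        let c = centers[label];
        let x = c[0] + cluster_std * rng.next_normal();
        let y = c[1] + cluster_std * rng.next_normal();
        points.push([x, y]);
        labels.push(label);
    }
    (points, labels)
}
```

With `cluster_std` set to `0.0` every point collapses onto its center, and larger values spread the clusters until they overlap, which is why the parameter is a convenient knob for controlling task difficulty.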
## Status
- All loaders and generators are validated by the 11,292 passing workspace tests for `0.1.0-beta.1`.
- Supports lazy loading and streaming for large-scale workflows.
- Future work (federated dataset shards, synthetic time series) tracked in this crate’s `TODO.md`.
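
As a rough picture of what streaming means for large files, the sketch below reads comma-separated numeric rows lazily and yields them in fixed-size batches, so only one batch is held in memory at a time. It uses only the standard library and is not the crate's API: `BatchReader` is a hypothetical type invented for this example, and it assumes a headerless CSV with purely numeric fields.

```rust
use std::error::Error;
use std::io::BufRead;

/// Hypothetical streaming reader (not part of sklears-datasets): pulls lines
/// on demand and groups the parsed rows into batches of `batch_size`.
struct BatchReader<R: BufRead> {
    lines: std::io::Lines<R>,
    batch_size: usize,
}

impl<R: BufRead> BatchReader<R> {
    fn new(reader: R, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { lines: reader.lines(), batch_size }
    }
}

impl<R: BufRead> Iterator for BatchReader<R> {
    type Item = Result<Vec<Vec<f64>>, Box<dyn Error>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut batch = Vec::with_capacity(self.batch_size);

        while batch.len() < self.batch_size {
            match self.lines.next() {
                Some(Ok(line)) => {
                    // Skip blank lines instead of treating them as rows.
                    if line.trim().is_empty() {
                        continue;
                    }
                    let row: Result<Vec<f64>, _> =
                        line.split(',').map(|f| f.trim().parse::<f64>()).collect();
                    match row {
                        Ok(values) => batch.push(values),
                        Err(e) => return Some(Err(e.into())),
                    }
                }
                Some(Err(e)) => return Some(Err(e.into())),
                None => break, // end of input
            }
        }

        // A final partial batch is still returned; an empty one ends iteration.
        if batch.is_empty() { None } else { Some(Ok(batch)) }
    }
}
```

Wrapping a `std::fs::File` in a `std::io::BufReader` and passing it to `BatchReader::new(reader, 1024)` would let a `for` loop process an arbitrarily large file one batch at a time.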