dataset_ml/lib.rs
//! Built-in dataset implementations for machine learning.
//!
//! `dataset-ml` provides ready-to-use loaders for classic ML datasets built on top
//! of [`dataset_core::Dataset`]. Each module is a worked example showing how to wrap
//! `Dataset<T>` for a concrete data source: downloading from a URL, verifying a
//! SHA-256 hash, parsing CSV records, and exposing typed accessors backed by
//! [`ndarray`].
//!
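//! As a rough illustration of the parsing step alone, the sketch below turns
//! Iris-style CSV text (four numeric columns followed by a class name) into
//! feature rows and integer labels using only the standard library. The
//! `parse_rows` helper, the inline sample rows, and the label encoding are
//! hypothetical, not this crate's API; downloading, SHA-256 verification, and
//! conversion to [`ndarray`] arrays are left out.
//!
//! ```
//! // Hypothetical helper, not part of `dataset_ml`: parses "f1,f2,f3,f4,class"
//! // lines into feature rows plus the index of each class name in `classes`.
//! fn parse_rows(csv: &str, classes: &[&str]) -> Result<(Vec<[f64; 4]>, Vec<usize>), String> {
//!     let mut features = Vec::new();
//!     let mut labels = Vec::new();
//!     for (i, line) in csv.lines().enumerate() {
//!         let line = line.trim();
//!         if line.is_empty() {
//!             continue; // tolerate blank lines such as a trailing newline
//!         }
//!         let fields: Vec<&str> = line.split(',').collect();
//!         if fields.len() != 5 {
//!             return Err(format!("line {}: expected 5 fields, got {}", i + 1, fields.len()));
//!         }
//!         // Parse the four numeric feature columns.
//!         let mut row = [0.0_f64; 4];
//!         for (slot, field) in row.iter_mut().zip(&fields[..4]) {
//!             *slot = field.parse::<f64>().map_err(|e| format!("line {}: {}", i + 1, e))?;
//!         }
//!         // Encode the class name as its position in `classes`.
//!         let label = classes
//!             .iter()
//!             .position(|c| *c == fields[4])
//!             .ok_or_else(|| format!("line {}: unknown class {:?}", i + 1, fields[4]))?;
//!         features.push(row);
//!         labels.push(label);
//!     }
//!     Ok((features, labels))
//! }
//!
//! fn main() {
//!     let csv = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n";
//!     let classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"];
//!     let (features, labels) = parse_rows(csv, &classes).unwrap();
//!     assert_eq!(features[1], [7.0, 3.2, 4.7, 1.4]);
//!     assert_eq!(labels, vec![0, 1]);
//! }
//! ```
//!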
//! # Datasets
//!
//! | Module                               | Samples | Features | Task Type      |
//! |--------------------------------------|---------|----------|----------------|
//! | [`iris`]                             | 150     | 4        | Classification |
//! | [`boston_housing`]                   | 506     | 13       | Regression     |
//! | [`diabetes`]                         | 768     | 8        | Classification |
//! | [`titanic`]                          | 891     | 11       | Classification |
//! | [`wine_quality::red_wine_quality`]   | 1,599   | 11       | Regression     |
//! | [`wine_quality::white_wine_quality`] | 4,898   | 11       | Regression     |
//!
//! # Example
//!
//! ```no_run
//! use dataset_ml::iris::Iris;
//!
//! let iris = Iris::new("./data");
//! // The first call downloads, verifies, and parses the file; later calls reuse the cache.
//! let (features, labels) = iris.data().unwrap();
//! assert_eq!(features.shape(), &[150, 4]);
//! ```
//!
//! All loaders are lazy: the first call downloads and parses the file, and every
//! subsequent call returns a cached reference. See the individual module docs
//! for each dataset's features, target, sample count, and source.
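//!
//! The load-once behavior described above can be sketched with
//! [`std::sync::OnceLock`]. The `LazyData` type below is a hypothetical
//! stand-in, not this crate's internals: its loader closure plays the role of
//! "download and parse", runs only on the first call, and later calls borrow
//! the cached value. Error handling for a failed download is left out to keep
//! the sketch short.
//!
//! ```
//! use std::cell::Cell;
//! use std::sync::OnceLock;
//!
//! // Hypothetical lazily initialised container; not part of `dataset_ml`'s API.
//! struct LazyData<T> {
//!     cell: OnceLock<T>,
//! }
//!
//! impl<T> LazyData<T> {
//!     fn new() -> Self {
//!         LazyData { cell: OnceLock::new() }
//!     }
//!
//!     // Runs `load` only if nothing is cached yet, then returns a shared reference.
//!     fn get_or_load(&self, load: impl FnOnce() -> T) -> &T {
//!         self.cell.get_or_init(load)
//!     }
//! }
//!
//! fn main() {
//!     let loads = Cell::new(0); // counts how often the "download + parse" step runs
//!     let data: LazyData<Vec<f64>> = LazyData::new();
//!
//!     let first = data.get_or_load(|| {
//!         loads.set(loads.get() + 1);
//!         vec![5.1, 3.5, 1.4, 0.2]
//!     });
//!     assert_eq!(first.len(), 4);
//!
//!     // The cache is already filled, so this loader is never called.
//!     let second = data.get_or_load(|| {
//!         loads.set(loads.get() + 1);
//!         Vec::new()
//!     });
//!     assert_eq!(second.len(), 4);
//!     assert_eq!(loads.get(), 1);
//! }
//! ```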

/// Boston Housing dataset module.
///
/// Contains the Boston Housing dataset for predicting median house values
/// in Boston suburbs from features such as per-capita crime rate, average
/// number of rooms per dwelling, and accessibility to highways.
pub mod boston_housing;

/// Diabetes dataset module.
///
/// Contains the Pima Indians Diabetes dataset for binary classification
/// based on 8 diagnostic measurements.
pub mod diabetes;

/// Iris flower dataset module.
///
/// Contains the classic Iris dataset for classifying iris flowers into
/// three species (setosa, versicolor, virginica) based on sepal and petal
/// measurements.
pub mod iris;

/// Titanic dataset module.
///
/// Contains data about Titanic passengers for predicting survival from
/// features such as passenger class, sex, age, and fare.
pub mod titanic;

/// Wine Quality dataset module.
///
/// Contains wine quality assessment data for predicting quality scores
/// from physicochemical properties such as acidity, residual sugar, and
/// alcohol content.
pub mod wine_quality;

pub use boston_housing::BostonHousing;
pub use diabetes::Diabetes;
pub use iris::Iris;
pub use titanic::Titanic;
pub use wine_quality::{red_wine_quality::RedWineQuality, white_wine_quality::WhiteWineQuality};