A generic, thread-safe dataset container with lazy loading and caching.
dataset-core provides Dataset<T>, a lightweight wrapper that pairs a storage
directory with a lazily-initialized value of any type T. The actual downloading
and parsing logic is supplied by the caller through a loader closure, making
Dataset<T> suitable for any data source — local files, remote URLs, databases,
or in-memory generation.
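To make the lazy-loading contract concrete, here is a minimal sketch of how a container like this *could* be built on `std::sync::OnceLock`. It is not the crate's source. The type name `LazyDataset` and its fields are hypothetical, and the `load` signature is only inferred from the Quick Start example. Because `OnceLock::get_or_try_init` is not yet stable, the sketch checks the cell, runs the loader, and then stores the result with `set`. A failed load caches nothing, so a later call retries. If two threads race on an empty cache, both may run the loader, but only the first stored value is kept.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for `Dataset<T>`: a storage directory plus a
/// lazily initialized, thread-safe cache. Not the crate's actual code.
pub struct LazyDataset<T> {
    dir: String,
    cell: OnceLock<T>,
}

impl<T> LazyDataset<T> {
    /// Records the storage directory; nothing is loaded yet.
    pub fn new(dir: &str) -> Self {
        Self {
            dir: dir.to_string(),
            cell: OnceLock::new(),
        }
    }

    /// Returns the cached value, running `loader(dir)` only if the cache is empty.
    /// Errors go back to the caller and nothing is cached, so a later call retries.
    pub fn load<E, F>(&self, loader: F) -> Result<&T, E>
    where
        F: FnOnce(&str) -> Result<T, E>,
    {
        // Fast path: already initialized.
        if let Some(value) = self.cell.get() {
            return Ok(value);
        }
        let value = loader(&self.dir)?;
        // If another thread stored a value first, `set` hands ours back and we drop it.
        let _ = self.cell.set(value);
        Ok(self.cell.get().expect("cell was initialized above"))
    }
}
```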
On top of this core type, the crate offers optional feature-gated modules:
- `utils`: helper functions for downloading files, extracting archives, verifying SHA-256 hashes, and managing temporary directories.
- `datasets`: ready-to-use loaders for classic ML datasets (Iris, Boston Housing, Diabetes, Titanic, Wine Quality). These also serve as reference implementations showing how to wrap `Dataset<T>` for a concrete use case.
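The reference-implementation pattern can be sketched without the crate: a concrete type owns a directory and a lazily filled cache, and exposes a typed accessor. Everything here is hypothetical, including `TinyCsv`, its embedded CSV text, and the `(features, labels)` return shape borrowed from the Iris example. A real loader would download and read files under `dir` instead of parsing a constant, and would presumably wrap the crate's own `Dataset<T>` rather than a bare `OnceLock`.

```rust
use std::sync::OnceLock;

/// Hypothetical concrete dataset wrapper, shaped like the built-in loaders
/// (`new(dir)` plus `data()`), but NOT taken from the crate.
pub struct TinyCsv {
    dir: String,
    cache: OnceLock<(Vec<Vec<f64>>, Vec<String>)>,
}

impl TinyCsv {
    // Stand-in for a downloaded file: two numeric columns plus a label column.
    const RAW: &'static str = "5.1,3.5,setosa\n7.0,3.2,versicolor\n6.3,3.3,virginica";

    pub fn new(dir: &str) -> Self {
        Self { dir: dir.to_string(), cache: OnceLock::new() }
    }

    /// Parses every line into (numeric features, string label).
    /// A real loader would read files under `_dir` instead of `RAW`.
    fn parse(_dir: &str) -> Result<(Vec<Vec<f64>>, Vec<String>), String> {
        let mut features = Vec::new();
        let mut labels = Vec::new();
        for (i, line) in Self::RAW.lines().enumerate() {
            let cols: Vec<&str> = line.split(',').collect();
            // The last column is the label; the rest are numeric features.
            let (label, nums) = cols
                .split_last()
                .ok_or_else(|| format!("line {i}: empty row"))?;
            let row = nums
                .iter()
                .map(|c| c.trim().parse::<f64>().map_err(|e| format!("line {i}: {e}")))
                .collect::<Result<Vec<f64>, String>>()?;
            features.push(row);
            labels.push(label.to_string());
        }
        Ok((features, labels))
    }

    /// Parses on the first call; later calls return borrowed views of the cache.
    pub fn data(&self) -> Result<(&[Vec<f64>], &[String]), String> {
        if self.cache.get().is_none() {
            let parsed = Self::parse(&self.dir)?;
            let _ = self.cache.set(parsed); // a concurrent winner's value is kept instead
        }
        let (features, labels) = self.cache.get().expect("cache initialized above");
        Ok((features.as_slice(), labels.as_slice()))
    }
}
```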
§Feature Flags
| Feature | What it enables |
|---|---|
| `utils` | `download_to`, `unzip`, `create_temp_dir`, `file_sha256_matches`, `acquire_dataset`, and the `error` module |
| `datasets` | All built-in dataset loaders (implies `utils`) |
With no features enabled, only `Dataset<T>` is available, and its only dependency is the standard library's `std::sync::OnceLock`.
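The thread-safety claim rests on a standard-library guarantee: when many threads call `OnceLock::get_or_init` at the same time, exactly one initializer runs and every caller receives a reference to the same value. The sketch below demonstrates that behavior directly. It does not use `dataset_core`, and the helper `race_to_init` is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

/// Hypothetical helper: `n` threads race to initialize one `OnceLock`.
/// Returns how many times the initializer ran and the address each thread saw.
fn race_to_init(n: usize) -> (usize, Vec<usize>) {
    let cell: OnceLock<Vec<String>> = OnceLock::new();
    let calls = AtomicUsize::new(0);
    // Shared references are `Copy`, so each `move` closure gets its own copy.
    let (cell_ref, calls_ref) = (&cell, &calls);

    let addrs: Vec<usize> = thread::scope(|s| {
        let handles: Vec<_> = (0..n)
            .map(|_| {
                s.spawn(move || {
                    // Only one of these initializers will actually execute.
                    let value = cell_ref.get_or_init(|| {
                        calls_ref.fetch_add(1, Ordering::SeqCst);
                        vec!["hello".to_string(), "world".to_string()]
                    });
                    value as *const Vec<String> as usize
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    (calls.load(Ordering::SeqCst), addrs)
}
```

Calling `race_to_init(8)` should report a single initializer run and eight identical addresses, which is the same "load once, share one reference" property the Quick Start asserts with `std::ptr::eq`.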
§Quick Start — Dataset<T>
```rust
use dataset_core::Dataset;

fn my_loader(dir: &str) -> Result<Vec<String>, std::io::Error> {
    // In a real use case you would read/download files from `dir`.
    Ok(vec!["hello".to_string(), "world".to_string()])
}

let ds: Dataset<Vec<String>> = Dataset::new("./my_data");

// First call runs the loader; subsequent calls return the cached reference.
let data = ds.load(my_loader).unwrap();
assert_eq!(data.len(), 2);

let data_again = ds.load(my_loader).unwrap();
assert!(std::ptr::eq(data, data_again)); // same reference, no reload
```
§Built-in Datasets (feature `datasets`)
| Dataset | Samples | Features | Task Type |
|---|---|---|---|
| Iris | 150 | 4 | Classification |
| Boston Housing | 506 | 13 | Regression |
| Diabetes | 768 | 8 | Classification |
| Titanic | 891 | 11 | Classification |
| Wine Quality (Red) | 1,599 | 11 | Regression |
| Wine Quality (White) | 4,898 | 11 | Regression |
```rust
use dataset_core::datasets::iris::Iris;

let iris = Iris::new("./data");
let (features, labels) = iris.data().unwrap();
assert_eq!(features.shape(), &[150, 4]);
```
§Utility Functions (feature `utils`)
- `download_to`: download a remote file into a directory
- `unzip`: extract a ZIP archive
- `create_temp_dir`: create a self-cleaning temporary directory
- `file_sha256_matches`: verify a file's SHA-256 hash
- `acquire_dataset`: cache-aware dataset acquisition workflow (temp dir → prepare → optional hash check → move to final location)
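The acquisition workflow (stage in a temporary directory, prepare, optionally verify, then move into place) can be sketched with the standard library alone. This is an illustration under assumptions, not the crate's API. The function name `acquire`, its signature, the `.partial` staging directory, and the generic `verify` callback are all hypothetical. The callback stands in for a SHA-256 check because `std` cannot compute SHA-256.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical sketch of a cache-aware acquisition workflow; it is NOT the
/// crate's `acquire_dataset`, whose real signature may differ.
/// `verify` stands in for a SHA-256 check (std has no SHA-256 implementation).
fn acquire<P, V>(final_dir: &Path, prepare: P, verify: Option<V>) -> io::Result<PathBuf>
where
    P: FnOnce(&Path) -> io::Result<()>,
    V: FnOnce(&Path) -> io::Result<bool>,
{
    // Cache hit: an earlier run already produced the final directory.
    if final_dir.exists() {
        return Ok(final_dir.to_path_buf());
    }

    // Stage next to the target (same filesystem, so `rename` can move it).
    let staging = final_dir.with_extension("partial");
    if staging.exists() {
        fs::remove_dir_all(&staging)?; // leftover from an interrupted run
    }
    fs::create_dir_all(&staging)?;

    let result = (|| -> io::Result<()> {
        prepare(staging.as_path())?;
        if let Some(check) = verify {
            if !check(staging.as_path())? {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "verification failed"));
            }
        }
        // Only a fully prepared, verified directory is moved into place.
        fs::rename(&staging, final_dir)
    })();

    if result.is_err() {
        let _ = fs::remove_dir_all(&staging); // never leave partial data behind
    }
    result.map(|()| final_dir.to_path_buf())
}
```

Staging next to the final directory keeps both on the same filesystem, which `fs::rename` needs in order to move the directory in one step. If preparation is interrupted, only the `.partial` directory is left behind, and the next run deletes it before starting over.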
§Re-exports
- `pub use error::DataFormatErrorKind;`
- `pub use error::DatasetError;`
- `pub use utils::acquire_dataset;`
- `pub use utils::create_temp_dir;`
- `pub use utils::download_to;`
- `pub use utils::file_sha256_matches;`
- `pub use utils::unzip;`
§Modules
- `datasets`: Built-in dataset implementations.
- `error`: Error handling module.
- `utils`: Utility functions for dataset authors.
§Structs
- `Dataset`: A generic, thread-safe dataset container with lazy loading and in-memory caching.