A generic, thread-safe dataset container with lazy loading and caching.
dataset-core provides Dataset<T>, a lightweight wrapper that pairs a storage
directory with a lazily-initialized value of any type T. The actual downloading
and parsing logic is supplied by the caller through a loader closure, making
Dataset<T> suitable for any data source — local files, remote URLs, databases,
or in-memory generation.
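The lazy-loading contract described above can be sketched in a few lines on top of `std::sync::OnceLock`. This is a minimal illustration, not the crate's source. `LazyDataset` is a hypothetical stand-in for `Dataset<T>`, and the choice to return loader errors without caching them (so a later call can retry) is our assumption.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for `Dataset<T>`: a storage directory paired with
/// a lazily initialized value. The real type's fields and bounds may differ.
pub struct LazyDataset<T> {
    storage_dir: String,
    value: OnceLock<T>,
}

impl<T> LazyDataset<T> {
    pub fn new(storage_dir: &str) -> Self {
        Self {
            storage_dir: storage_dir.to_string(),
            value: OnceLock::new(),
        }
    }

    /// Returns the cached value, running `loader` only when nothing is cached.
    /// Assumption: loader errors are returned but not cached.
    pub fn load<E, F>(&self, loader: F) -> Result<&T, E>
    where
        F: FnOnce(&str) -> Result<T, E>,
    {
        if let Some(value) = self.value.get() {
            return Ok(value);
        }
        let loaded = loader(&self.storage_dir)?;
        // If another thread stored a value first, this closure is skipped,
        // `loaded` is dropped, and the already stored value is returned.
        Ok(self.value.get_or_init(|| loaded))
    }
}

fn main() {
    let ds: LazyDataset<Vec<u32>> = LazyDataset::new("./my_data");
    let mut calls = 0;
    let first = ds
        .load(|_dir| {
            calls += 1;
            Ok::<_, String>(vec![1, 2, 3])
        })
        .unwrap();
    // The value is already cached, so this loader never runs.
    let second = ds
        .load(|_dir| {
            calls += 1;
            Ok::<_, String>(vec![9])
        })
        .unwrap();
    assert!(std::ptr::eq(first, second));
    assert_eq!(calls, 1);
}
```

Under these assumptions, a failed first load leaves the cell empty, so the next call with a working loader still succeeds.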
On top of this core type, the crate offers one optional, feature-gated module:

- `utils`: helper functions for downloading files, extracting archives, verifying SHA-256 hashes, and managing temporary directories.
Ready-to-use loaders for classic ML datasets (Iris, Boston Housing, Diabetes,
Titanic, Wine Quality) live in the companion crate
dataset-ml, which depends on
dataset-core with the utils feature enabled and serves as the reference
implementation for wrapping Dataset<T>.
## Feature Flags
| Feature | What it enables |
|---|---|
| `utils` | `download_to`, `unzip`, `create_temp_dir`, `file_sha256_matches`, `acquire_dataset`, and the `error` module |
With no features enabled, only `Dataset<T>` is available, and the crate relies solely on the standard library's `std::sync::OnceLock`.
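Because the core type builds on `std::sync::OnceLock`, its once-only initialization under contention comes from the standard library: when several threads call `get_or_init` at the same time, only one initializer runs and every thread receives the same value. The sketch below exercises `OnceLock` directly rather than `Dataset<T>`. `load_concurrently` is a hypothetical helper, and whether `Dataset::load` calls `get_or_init` internally is an assumption.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

/// Hypothetical helper: `threads` threads race to initialize one cell.
/// Returns how many times the initializer actually ran.
fn load_concurrently(threads: usize) -> usize {
    let init_calls = AtomicUsize::new(0);
    let cell: OnceLock<Vec<String>> = OnceLock::new();
    // Scoped threads may borrow `cell` and `init_calls` from this stack frame.
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                let data = cell.get_or_init(|| {
                    init_calls.fetch_add(1, Ordering::SeqCst);
                    vec!["hello".to_string(), "world".to_string()]
                });
                assert_eq!(data.len(), 2);
            });
        }
    });
    init_calls.load(Ordering::SeqCst)
}

fn main() {
    // Eight threads, one initialization.
    assert_eq!(load_concurrently(8), 1);
}
```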
## Quick Start: `Dataset<T>`
```rust
use dataset_core::Dataset;

fn my_loader(dir: &str) -> Result<Vec<String>, std::io::Error> {
    // In a real use case you would read/download files from `dir`.
    Ok(vec!["hello".to_string(), "world".to_string()])
}

let ds: Dataset<Vec<String>> = Dataset::new("./my_data");

// First call runs the loader; subsequent calls return the cached reference.
let data = ds.load(my_loader).unwrap();
assert_eq!(data.len(), 2);

let data_again = ds.load(my_loader).unwrap();
assert!(std::ptr::eq(data, data_again)); // same reference, no reload
```

## Utility Functions (feature `utils`)
- `download_to`: download a remote file into a directory
- `unzip`: extract a ZIP archive
- `create_temp_dir`: create a self-cleaning temporary directory
- `file_sha256_matches`: verify a file’s SHA-256 hash
- `acquire_dataset`: cache-aware dataset acquisition workflow (temp dir → prepare → optional hash check → move to final location)
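The `acquire_dataset` workflow listed above (temp dir → prepare → optional hash check → move to final location) can be approximated with the standard library alone. In this sketch, `acquire_sketch` and `TempDirGuard` are hypothetical names, not the crate's API or signatures. A caller-supplied `verify` closure stands in for the SHA-256 check that `file_sha256_matches` performs, since `std` has no SHA-256 implementation.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};

static TEMP_COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `create_temp_dir`: the directory is removed
/// when the guard is dropped, unless it was renamed away first.
struct TempDirGuard {
    path: PathBuf,
}

impl TempDirGuard {
    fn new(parent: &Path) -> io::Result<Self> {
        let n = TEMP_COUNTER.fetch_add(1, Ordering::SeqCst);
        let path = parent.join(format!(".tmp-{}-{}", std::process::id(), n));
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }
}

impl Drop for TempDirGuard {
    fn drop(&mut self) {
        // Errors are ignored: after a successful rename the path is gone.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Hypothetical sketch of the acquire workflow. Returns Ok(true) when the
/// data was freshly prepared and Ok(false) on a cache hit.
fn acquire_sketch<P, V>(final_dir: &Path, prepare: P, verify: Option<V>) -> io::Result<bool>
where
    P: FnOnce(&Path) -> io::Result<()>,
    V: FnOnce(&Path) -> bool,
{
    // Cache hit: an earlier run already produced the final directory.
    if final_dir.exists() {
        return Ok(false);
    }
    let parent = final_dir.parent().unwrap_or(Path::new("."));
    fs::create_dir_all(parent)?;
    // Stage next to the final location so the rename stays on one filesystem.
    let temp = TempDirGuard::new(parent)?;
    prepare(&temp.path)?;
    if let Some(check) = verify {
        if !check(&temp.path) {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "verification failed"));
        }
    }
    fs::rename(&temp.path, final_dir)?;
    Ok(true)
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join(format!("acquire-demo-{}", std::process::id()));
    let _ = fs::remove_dir_all(&root);
    let final_dir = root.join("my_dataset");
    let prepare = |dir: &Path| fs::write(dir.join("data.csv"), "1.0,2.0\n");
    let verify = Some(|dir: &Path| dir.join("data.csv").is_file());
    assert!(acquire_sketch(&final_dir, prepare, verify)?); // freshly prepared
    assert!(!acquire_sketch(&final_dir, prepare, verify)?); // cache hit
    fs::remove_dir_all(&root)
}
```

Staging in a sibling directory and finishing with `fs::rename` means a reader never sees a half-written dataset at the final path, and a failed `prepare` or `verify` leaves nothing behind because the guard's `Drop` removes the staging directory.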
## Re-exports

- `pub use error::DataFormatErrorKind;`
- `pub use error::DatasetError;`
- `pub use utils::acquire_dataset;`
- `pub use utils::create_temp_dir;`
- `pub use utils::download_to;`
- `pub use utils::file_sha256_matches;`
- `pub use utils::unzip;`
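## Modules

- `error` (feature `utils`): error types, including `DatasetError` and `DataFormatErrorKind`
- `utils` (feature `utils`): helper functions for downloading files, extracting archives, verifying SHA-256 hashes, and managing temporary directories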
## Structs

- `Dataset`: a generic, thread-safe dataset container with lazy loading and in-memory caching.