pub struct Dataset<T> { /* private fields */ }
A generic, thread-safe dataset container with lazy loading and in-memory caching.
Dataset<T> is a thin caching wrapper that holds a storage_dir (the directory
where dataset files are stored on disk) and a lazily-initialized value of type T.
The actual downloading and parsing logic is provided by the caller through a loader
closure passed to Dataset::load.
This struct is designed to be the building block for both the built-in datasets shipped with this crate and any custom datasets defined by external users.
§Type Parameter
T - The type of the parsed dataset. Can be any type, such as (Array2<f64>, Array1<f64>), a custom struct, or any other data representation. T must implement Send + Sync for Dataset<T> to be shared across threads.
§Thread Safety
Dataset<T> is Send + Sync when T is Send + Sync. The internal OnceLock
ensures that the loader closure runs at most once, even when multiple threads call
Dataset::load concurrently.
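The at-most-once behavior described above can be illustrated with a minimal sketch. This is not the crate's implementation (the real fields are private); `SketchDataset`, `init_lock`, and `count_loader_runs` are hypothetical names introduced here. The sketch assumes a `Mutex` is used alongside the `OnceLock` so that a fallible loader is serialized, and it assumes a failed load leaves the cache empty so a later call can retry.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};
use std::thread;

/// Hypothetical stand-in for `Dataset<T>`; the crate's real fields are private.
pub struct SketchDataset<T> {
    storage_dir: String,
    cell: OnceLock<T>,
    // Assumption: serializes initialization so a fallible loader cannot race.
    init_lock: Mutex<()>,
}

impl<T> SketchDataset<T> {
    pub fn new(storage_dir: &str) -> Self {
        Self {
            storage_dir: storage_dir.to_string(),
            cell: OnceLock::new(),
            init_lock: Mutex::new(()),
        }
    }

    pub fn load<E>(&self, loader: impl FnOnce(&str) -> Result<T, E>) -> Result<&T, E> {
        // Fast path: the value is already cached, so the loader is ignored.
        if let Some(value) = self.cell.get() {
            return Ok(value);
        }
        // Slow path: only one thread at a time may attempt initialization.
        let _guard = self.init_lock.lock().unwrap_or_else(|e| e.into_inner());
        // Re-check: another thread may have finished while we were waiting.
        if let Some(value) = self.cell.get() {
            return Ok(value);
        }
        // On error nothing is stored, so a later call can try again.
        let value = loader(&self.storage_dir)?;
        Ok(self.cell.get_or_init(|| value))
    }

    pub fn is_loaded(&self) -> bool {
        self.cell.get().is_some()
    }
}

/// Calls `load` from `n_threads` threads and returns how often the loader ran.
pub fn count_loader_runs(n_threads: usize) -> usize {
    let dataset: SketchDataset<Vec<u32>> = SketchDataset::new("./my_data");
    let runs = AtomicUsize::new(0);
    thread::scope(|scope| {
        for _ in 0..n_threads {
            scope.spawn(|| {
                let data = dataset
                    .load(|_dir| {
                        runs.fetch_add(1, Ordering::SeqCst);
                        Ok::<_, std::io::Error>(vec![1, 2, 3])
                    })
                    .unwrap();
                assert_eq!(data.len(), 3);
            });
        }
    });
    runs.load(Ordering::SeqCst)
}
```

In this sketch, every thread receives a reference to the same cached vector, and the counter shows that the loader body ran a single time regardless of how many threads called `load`.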
§Example
use dataset_core::Dataset;
// Define a simple loader that reads a value from the storage directory path.
// The loader can return any error type you choose.
fn my_loader(dir: &str) -> Result<Vec<String>, std::io::Error> {
    // In a real use case, you would download/read files from `dir`.
    // Here we just demonstrate the caching behavior.
    Ok(vec!["hello".to_string(), "world".to_string()])
}
let dataset: Dataset<Vec<String>> = Dataset::new("./my_data");
// The first call to `load` triggers the loader
let data = dataset.load(my_loader).unwrap();
assert_eq!(data.len(), 2);
// Subsequent calls return the cached reference instantly
let data_again = dataset.load(my_loader).unwrap();
assert!(std::ptr::eq(data, data_again)); // same reference, no re-load
// Check whether data has been loaded
assert!(dataset.is_loaded());
§Implementations
impl<T> Dataset<T>
pub fn new(storage_dir: &str) -> Self
Create a new Dataset instance without loading any data.
This is a lightweight operation that only stores the storage directory path.
No I/O or network requests are performed until Dataset::load is called.
§Parameters
storage_dir - Directory where dataset files will be stored. The directory will be created automatically when the loader runs if it does not exist.
§Returns
A new Dataset<T> instance ready for lazy loading.
pub fn load<E>(
    &self,
    loader: impl FnOnce(&str) -> Result<T, E>,
) -> Result<&T, E>
Load the dataset, executing the loader on first call and caching the result.
On the first call, the loader closure is invoked with the storage directory
path. The returned value is cached internally. All subsequent calls — from any
thread — return a reference to the cached value without running the loader again.
§Parameters
loader - A closure or function that takes the storage directory path (&str) and returns Result<T, E>. This is where you perform downloading, file I/O, and parsing. The loader is only called once; if the data is already cached, it is ignored.
§Returns
Ok(&T) - A reference to the cached dataset.
§Errors
Returns any error produced by the loader closure on first invocation.
Once data is successfully loaded and cached, this method never returns an error.
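A loader of the kind described under Parameters might look like the following sketch, which uses only the standard library. The function name `load_values`, the file name `values.txt`, and its contents are hypothetical and chosen purely for illustration; the write step stands in for a real download.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical loader: ensures `dir` exists, then parses `values.txt`
/// (one number per line) into a `Vec<f64>`.
pub fn load_values(dir: &str) -> Result<Vec<f64>, io::Error> {
    // Create the storage directory if it is missing.
    fs::create_dir_all(dir)?;
    let path = Path::new(dir).join("values.txt");

    // Stand-in for a download step: write a small file on first use.
    if !path.exists() {
        fs::write(&path, "1.5\n2.5\n4.0\n")?;
    }

    // Parse each non-empty line, mapping parse failures to an I/O error.
    let text = fs::read_to_string(&path)?;
    text.lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| {
            line.trim()
                .parse::<f64>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
        })
        .collect()
}
```

Because the function takes a `&str` and returns `Result<T, E>`, it matches the shape that `load` expects and could be passed as the loader argument with `T = Vec<f64>` and `E = std::io::Error`.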
pub fn is_loaded(&self) -> bool
Check whether the dataset has been loaded into memory.
§Returns
true if Dataset::load has been called successfully at least once,
false otherwise.