Struct Dataset 

pub struct Dataset<T> { /* private fields */ }

A generic, thread-safe dataset container with lazy loading and in-memory caching.

Dataset<T> is a thin caching wrapper that holds a storage_dir (the directory where dataset files are stored on disk) and a lazily-initialized value of type T. The actual downloading and parsing logic is provided by the caller through a loader closure passed to Dataset::load.

This struct is designed to be the building block for both the built-in datasets shipped with this crate and any custom datasets defined by external users.

§Type Parameter

  • T - The type of the parsed dataset. Can be any type, such as (Array2<f64>, Array1<f64>), a custom struct, or any other data representation. T must implement Send + Sync for Dataset<T> to be shared across threads.
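
As an illustration of a custom `T`, the sketch below defines a parsed-dataset struct and a loader function with the shape `Dataset::load` expects (`&str` in, `Result<T, E>` out). The `Samples` struct, the `load_samples` function, the `samples.csv` file name, and the CSV layout are all assumptions made for this example; none of them is part of the crate.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical parsed dataset: one feature row and one label per sample.
#[derive(Debug, PartialEq)]
pub struct Samples {
    pub features: Vec<[f64; 2]>,
    pub labels: Vec<String>,
}

/// Loader with the shape `Dataset::load` expects: `&str` in, `Result<T, E>` out.
/// Reads `<dir>/samples.csv` (assumed file name) with lines like `1.0,2.0,label`.
pub fn load_samples(dir: &str) -> Result<Samples, io::Error> {
    let text = fs::read_to_string(Path::new(dir).join("samples.csv"))?;
    let mut samples = Samples { features: Vec::new(), labels: Vec::new() };

    for line in text.lines().filter(|l| !l.trim().is_empty()) {
        let fields: Vec<&str> = line.split(',').map(str::trim).collect();
        if fields.len() != 3 {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "expected 3 fields"));
        }
        // Convert a parse failure into an `io::Error` so the loader has one error type.
        let parse = |s: &str| {
            s.parse::<f64>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
        };
        samples.features.push([parse(fields[0])?, parse(fields[1])?]);
        samples.labels.push(fields[2].to_string());
    }
    Ok(samples)
}
```

With the crate in scope, a function like this would be passed to `load` on a `Dataset<Samples>`, in the same way `my_loader` is passed in the example further down the page.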

§Thread Safety

Dataset<T> is Send when T is Send, and Sync when T is Send + Sync. The internal OnceLock ensures that the loader closure runs at most once, even when multiple threads call Dataset::load concurrently.
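
To make the run-once guarantee concrete, here is a minimal sketch of one way such a cache could be built on `std::sync::OnceLock`. It is a stand-in written for this page, not the crate's source: the `LazyCache` type, its `init_lock` mutex, and the `count_loader_runs` helper are all hypothetical, and the real `Dataset<T>` may be implemented differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};
use std::thread;

/// Hypothetical stand-in for a lazily loaded cache; NOT the crate's source.
pub struct LazyCache<T> {
    storage_dir: String,
    cell: OnceLock<T>,
    init_lock: Mutex<()>, // serializes the first load (an assumption of this sketch)
}

impl<T> LazyCache<T> {
    pub fn new(storage_dir: &str) -> Self {
        Self {
            storage_dir: storage_dir.to_string(),
            cell: OnceLock::new(),
            init_lock: Mutex::new(()),
        }
    }

    pub fn load<E>(&self, loader: impl FnOnce(&str) -> Result<T, E>) -> Result<&T, E> {
        // Fast path: the value is already cached, so the loader is dropped unused.
        if let Some(value) = self.cell.get() {
            return Ok(value);
        }
        // Slow path: only one thread at a time may run a loader.
        let _guard = self.init_lock.lock().unwrap_or_else(|e| e.into_inner());
        // Re-check: another thread may have filled the cell while we waited.
        if let Some(value) = self.cell.get() {
            return Ok(value);
        }
        let value = loader(&self.storage_dir)?;
        Ok(self.cell.get_or_init(|| value))
    }
}

/// Spawns `threads` scoped threads that all call `load`, and returns how many
/// times the loader body actually ran.
pub fn count_loader_runs(threads: usize) -> usize {
    let cache: LazyCache<Vec<u32>> = LazyCache::new("./my_data");
    let runs = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                let data = cache
                    .load(|_dir| {
                        runs.fetch_add(1, Ordering::SeqCst);
                        Ok::<_, std::io::Error>(vec![1, 2, 3])
                    })
                    .unwrap();
                assert_eq!(data.len(), 3);
            });
        }
    });
    runs.load(Ordering::SeqCst)
}
```

In this sketch the mutex, not the `OnceLock` alone, is what keeps two racing threads from both running a loader; the `OnceLock` provides the shared `&T` that every caller receives afterwards.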

§Example

use dataset_core::Dataset;

// Define a simple loader. It receives the storage directory path and
// can return any error type you choose.
fn my_loader(_dir: &str) -> Result<Vec<String>, std::io::Error> {
    // In a real use case, you would download/read files from `_dir`.
    // Here we just demonstrate the caching behavior.
    Ok(vec!["hello".to_string(), "world".to_string()])
}

let dataset: Dataset<Vec<String>> = Dataset::new("./my_data");

// The first call to `load` triggers the loader
let data = dataset.load(my_loader).unwrap();
assert_eq!(data.len(), 2);

// Subsequent calls return the cached reference instantly
let data_again = dataset.load(my_loader).unwrap();
assert!(std::ptr::eq(data, data_again)); // same reference, no re-load

// Check whether data has been loaded
assert!(dataset.is_loaded());

Implementations§

impl<T> Dataset<T>

pub fn new(storage_dir: &str) -> Self

Create a new Dataset instance without loading any data.

This is a lightweight operation that only stores the storage directory path. No I/O or network requests are performed until Dataset::load is called.

§Parameters
  • storage_dir - Directory where dataset files are stored. If it does not exist, it is created automatically when the loader runs.
§Returns

A new Dataset<T> instance ready for lazy loading.

pub fn load<E>(&self, loader: impl FnOnce(&str) -> Result<T, E>) -> Result<&T, E>

Load the dataset, executing the loader on first call and caching the result.

On the first call, the loader closure is invoked with the storage directory path. The returned value is cached internally. All subsequent calls — from any thread — return a reference to the cached value without running the loader again.

§Parameters
  • loader - A closure or function that takes the storage directory path (&str) and returns Result<T, E>. This is where you perform downloading, file I/O, and parsing. The loader runs only while nothing is cached; if the data is already cached, the loader passed to that call is dropped without being run.
§Returns
  • Ok(&T) - A reference to the cached dataset.
§Errors

Returns any error produced by the loader closure. A failed load caches nothing, so a later call runs its loader again. Once data is successfully loaded and cached, this method never returns an error.
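
The error behavior can be sketched with a small stand-in that mirrors the documented `load` and `is_loaded` signatures. `LazyCache` and `retry_after_failure` are hypothetical names introduced for this illustration, and the body of `load` is an assumption about how such a cache might work, not the crate's actual code.

```rust
use std::io;
use std::sync::OnceLock;

/// Hypothetical stand-in mirroring the documented `load`/`is_loaded` signatures.
pub struct LazyCache<T> {
    storage_dir: String,
    cell: OnceLock<T>,
}

impl<T> LazyCache<T> {
    pub fn new(storage_dir: &str) -> Self {
        Self { storage_dir: storage_dir.to_string(), cell: OnceLock::new() }
    }

    pub fn load<E>(&self, loader: impl FnOnce(&str) -> Result<T, E>) -> Result<&T, E> {
        if let Some(value) = self.cell.get() {
            return Ok(value);
        }
        // On `Err` the `?` returns early, so nothing is stored in the cell.
        let value = loader(&self.storage_dir)?;
        Ok(self.cell.get_or_init(|| value))
    }

    pub fn is_loaded(&self) -> bool {
        self.cell.get().is_some()
    }
}

/// Returns (loaded after a failed load, loaded after a retry, value seen by a third call).
pub fn retry_after_failure() -> (bool, bool, u32) {
    let cache: LazyCache<u32> = LazyCache::new("./my_data");

    // First attempt fails: the error is returned and nothing is cached.
    let first = cache.load(|_dir| {
        Err::<u32, _>(io::Error::new(io::ErrorKind::NotFound, "missing file"))
    });
    assert!(first.is_err());
    let after_failure = cache.is_loaded();

    // Second attempt succeeds and populates the cache.
    cache.load(|_dir| Ok::<_, io::Error>(7)).unwrap();
    let after_retry = cache.is_loaded();

    // This loader would fail, but it never runs because a value is cached.
    let third = *cache
        .load(|_dir| Err::<u32, _>(io::Error::new(io::ErrorKind::Other, "unused")))
        .unwrap();

    (after_failure, after_retry, third)
}
```

Because the error type `E` is chosen per call, a cache of this shape has nowhere to keep a failure, which is why a retry with a working loader can still succeed.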

pub fn is_loaded(&self) -> bool

Check whether the dataset has been loaded into memory.

§Returns

true if Dataset::load has been called successfully at least once, false otherwise.

pub fn storage_dir(&self) -> &str

Get the storage directory path.

§Returns

The storage directory path as a string slice.

Trait Implementations§

impl<T> Debug for Dataset<T>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

impl<T> !Freeze for Dataset<T>

impl<T> RefUnwindSafe for Dataset<T>

impl<T> Send for Dataset<T>
where T: Send,

impl<T> Sync for Dataset<T>
where T: Sync + Send,

impl<T> Unpin for Dataset<T>
where T: Unpin,

impl<T> UnsafeUnpin for Dataset<T>
where T: UnsafeUnpin,

impl<T> UnwindSafe for Dataset<T>
where T: UnwindSafe,

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.