pub struct Iris { /* private fields */ }
A struct representing the Iris dataset with lazy loading.
The dataset is not loaded until you call one of the data accessor methods. Once loaded, the data is cached for subsequent accesses.
§About Dataset
The Iris dataset is a classic dataset for classification tasks. It contains three iris species with 50 samples each, along with four measurements of each flower. One species is linearly separable from the other two, but those two are not linearly separable from each other.
Features:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
Labels:
- species name (as &str): "setosa", "versicolor", "virginica"
See more information at https://archive.ics.uci.edu/dataset/53/iris
§Citation
R. A. Fisher. “Iris,” UCI Machine Learning Repository, [Online]. Available: https://doi.org/10.24432/C56C76
§Thread Safety
This struct automatically implements Send and Sync because all of its fields implement them, so it is safe to share across threads.
The internal Dataset ensures thread-safe lazy initialization.
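The lazy, thread-safe loading described above can be sketched with the standard library alone. This is a minimal illustration of the pattern, not this crate's implementation: `LazyTable`, `rows`, `load_count`, and `shared_read_demo` are hypothetical names, and the two sample rows stand in for a real download.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

/// Hypothetical stand-in for a lazily loaded dataset (not the crate's type).
pub struct LazyTable {
    rows: OnceLock<Vec<[f64; 4]>>,
    loads: AtomicUsize,
}

impl LazyTable {
    /// Cheap constructor: nothing is loaded yet.
    pub fn new() -> Self {
        Self { rows: OnceLock::new(), loads: AtomicUsize::new(0) }
    }

    /// The first caller runs the loader; every later caller gets the cached rows.
    pub fn rows(&self) -> &[[f64; 4]] {
        self.rows
            .get_or_init(|| {
                // Count how often the (pretend) expensive load actually runs.
                self.loads.fetch_add(1, Ordering::SeqCst);
                vec![[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4]]
            })
            .as_slice()
    }

    /// Number of times the loader closure has executed.
    pub fn load_count(&self) -> usize {
        self.loads.load(Ordering::SeqCst)
    }
}

/// Reads the table from four threads; returns (rows seen, loads performed).
pub fn shared_read_demo() -> (usize, usize) {
    let table = LazyTable::new();
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                let _ = table.rows();
            });
        }
    });
    (table.rows().len(), table.load_count())
}
```

`OnceLock::get_or_init` runs its closure at most once even when several threads race to call it, which is why the load counter stays at 1 in this sketch.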
§Example
use dataset_core::datasets::iris::Iris;
let download_dir = "./iris"; // the code will create the directory if it doesn't exist
let dataset = Iris::new(download_dir);
let features = dataset.features().unwrap();
let labels = dataset.labels().unwrap();
let (features, labels) = dataset.data().unwrap(); // alternatively, fetch both at once
// you can use `.to_owned()` to get owned copies of the data
let mut features_owned = features.to_owned();
let mut labels_owned = labels.to_owned();
// Example: Modify feature values
features_owned[[0, 0]] = 5.5;
labels_owned[0] = "setosa-modified";
assert_eq!(features.shape(), &[150, 4]);
assert_eq!(labels.len(), 150);
// clean up: remove the downloaded files (optional)
std::fs::remove_dir_all(download_dir).unwrap();
Implementations
impl Iris
pub fn new(storage_dir: &str) -> Self
Create a new Iris instance without loading data.
The dataset will be loaded lazily when you first call any data accessor method. This is a lightweight operation that only stores the storage directory.
§Parameters
storage_dir - Directory where the dataset will be stored.
§Returns
Self - Iris instance ready for lazy loading.
pub fn features(&self) -> Result<&Array2<f64>, DatasetError>
Get a reference to the feature matrix.
This method triggers lazy loading on first call. Subsequent calls return the cached data instantly.
§Returns
&Array2<f64> - Reference to the feature matrix with shape (150, 4), whose columns are:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
§Errors
Returns DatasetError if:
- Download fails due to network issues
- File extraction or I/O operations fail
- Data format is invalid (wrong number of columns, unparseable values, or invalid labels)
- Dataset size doesn’t match expected dimensions (150 samples, 4 features)
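The data-format and size checks listed above can be illustrated with a small std-only sketch. It assumes the raw file uses the UCI CSV layout (four numeric columns followed by a label such as `Iris-setosa`). `ParseError`, `parse_row`, and `parse_all` are hypothetical names for illustration only; this crate reports such failures through `DatasetError`, and its actual parsing code may differ.

```rust
/// Hypothetical error type for this sketch (the crate itself uses `DatasetError`).
#[derive(Debug, PartialEq)]
pub enum ParseError {
    WrongColumnCount(usize),
    BadNumber(String),
    UnknownLabel(String),
    WrongRowCount { expected: usize, found: usize },
}

/// Parses one line such as "5.1,3.5,1.4,0.2,Iris-setosa".
pub fn parse_row(line: &str) -> Result<([f64; 4], &'static str), ParseError> {
    let cols: Vec<&str> = line.trim().split(',').collect();
    if cols.len() != 5 {
        return Err(ParseError::WrongColumnCount(cols.len()));
    }
    // The first four columns must all parse as floating-point numbers.
    let mut features = [0.0_f64; 4];
    for (slot, raw) in features.iter_mut().zip(&cols[..4]) {
        *slot = raw
            .trim()
            .parse::<f64>()
            .map_err(|_| ParseError::BadNumber((*raw).to_string()))?;
    }
    // Map the raw label onto one of the three known species names.
    let label = match cols[4].trim() {
        "Iris-setosa" => "setosa",
        "Iris-versicolor" => "versicolor",
        "Iris-virginica" => "virginica",
        other => return Err(ParseError::UnknownLabel(other.to_string())),
    };
    Ok((features, label))
}

/// Parses every non-empty line, then checks the total row count.
pub fn parse_all(
    text: &str,
    expected_rows: usize,
) -> Result<Vec<([f64; 4], &'static str)>, ParseError> {
    let rows = text
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(parse_row)
        .collect::<Result<Vec<_>, _>>()?;
    if rows.len() != expected_rows {
        return Err(ParseError::WrongRowCount { expected: expected_rows, found: rows.len() });
    }
    Ok(rows)
}
```

Collecting into `Result<Vec<_>, _>` stops at the first malformed row, so a caller sees one specific cause rather than a partially parsed table.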
pub fn labels(&self) -> Result<&Array1<&'static str>, DatasetError>
Get a reference to the labels vector.
This method triggers lazy loading on first call. Subsequent calls return the cached data instantly.
§Returns
&Array1<&'static str> - Reference to the labels vector with shape (150,) containing species names ("setosa", "versicolor", "virginica")
§Errors
Returns DatasetError if:
- Download fails due to network issues
- File extraction or I/O operations fail
- Data format is invalid (wrong number of columns, unparseable values, or invalid labels)
- Dataset size doesn’t match expected dimensions (150 samples)
pub fn data(&self) -> Result<(&Array2<f64>, &Array1<&'static str>), DatasetError>
Get both features and labels as references.
This method triggers lazy loading on first call. Subsequent calls return the cached data instantly.
§Returns
&Array2<f64> - Reference to the feature matrix with shape (150, 4), whose columns are:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
&Array1<&'static str> - Reference to the labels vector with shape (150,) containing species names ("setosa", "versicolor", "virginica")
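Because the features and labels are row-aligned, they can be walked in parallel. The sketch below shows one way to do that, using plain slices in place of the ndarray types so it stays self-contained. `mean_by_species` is a hypothetical helper, not part of this crate, and the column index follows the feature order listed above.

```rust
use std::collections::BTreeMap;

/// Mean of one feature column per species, given row-aligned features and labels.
/// Hypothetical helper for illustration; it works on slices, not ndarray arrays.
pub fn mean_by_species(
    features: &[[f64; 4]],
    labels: &[&'static str],
    column: usize,
) -> BTreeMap<&'static str, f64> {
    // Accumulate (sum, count) per species label.
    let mut sums: BTreeMap<&'static str, (f64, usize)> = BTreeMap::new();
    for (row, label) in features.iter().zip(labels) {
        let entry = sums.entry(*label).or_insert((0.0, 0));
        entry.0 += row[column];
        entry.1 += 1;
    }
    // Turn each (sum, count) pair into a mean.
    sums.into_iter()
        .map(|(label, (sum, n))| (label, sum / n as f64))
        .collect()
}
```

With the real dataset, the same loop could presumably be written over the rows of the feature matrix zipped with the labels vector, assuming the usual ndarray row iteration.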
§Errors
Returns DatasetError if:
- Download fails due to network issues
- File extraction or I/O operations fail
- Data format is invalid (wrong number of columns, unparseable values, or invalid labels)
- Dataset size doesn’t match expected dimensions (150 samples, 4 features)