use load_iris_raw_data;
use ndarray::{Array1, Array2};
use std::sync::OnceLock;
// Use `OnceLock` for thread-safe lazy initialization
static IRIS_DATA: OnceLock<(Array1<&'static str>, Array2<f64>, Array1<&'static str>)> =
    OnceLock::new();
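
// A minimal, std-only sketch of the memoization pattern that the `OnceLock`
// static above supports. It is an illustration under stated assumptions, not
// this module's actual code: `DemoData`, `DEMO_RAW`, `DEMO_CACHE`,
// `DEMO_PARSE_COUNT`, `demo_parse`, `demo_load`, and `demo_load_owned` are
// hypothetical names, plain `Vec`s stand in for the ndarray types, and the
// three-row sample (with assumed header names) stands in for the raw data.
use std::sync::atomic::{AtomicUsize, Ordering};

type DemoData = (Vec<&'static str>, Vec<[f64; 4]>, Vec<&'static str>);

const DEMO_RAW: &str = "sepal_length,sepal_width,petal_length,petal_width,species\n\
5.1,3.5,1.4,0.2,Iris-setosa\n\
7.0,3.2,4.7,1.4,Iris-versicolor\n\
6.3,3.3,6.0,2.5,Iris-virginica";

static DEMO_CACHE: std::sync::OnceLock<DemoData> = std::sync::OnceLock::new();
// Counts how often the parser actually runs, to make the memoization visible.
static DEMO_PARSE_COUNT: AtomicUsize = AtomicUsize::new(0);

// Parses the header row, then splits every data row into 4 features and 1 label.
fn demo_parse() -> DemoData {
    DEMO_PARSE_COUNT.fetch_add(1, Ordering::SeqCst);
    let mut lines = DEMO_RAW.lines();
    let headers: Vec<&'static str> = lines
        .next()
        .expect("missing header row")
        .split(',')
        .collect();
    let mut features = Vec::new();
    let mut labels = Vec::new();
    for line in lines {
        let cols: Vec<&'static str> = line.split(',').collect();
        assert_eq!(cols.len(), 5, "expected 5 columns per row");
        let mut row = [0.0_f64; 4];
        for (slot, text) in row.iter_mut().zip(&cols[..4]) {
            *slot = text.parse().expect("feature is not a valid f64");
        }
        features.push(row);
        labels.push(cols[4]);
    }
    (headers, features, labels)
}

// Read-only access: the first call parses, later calls reuse the cached tuple.
fn demo_load() -> &'static DemoData {
    DEMO_CACHE.get_or_init(demo_parse)
}

// Owned access: clones the cached tuple so the caller may mutate its copy.
fn demo_load_owned() -> DemoData {
    demo_load().clone()
}

// In this sketch, repeated `demo_load()` calls return the same `&'static` tuple
// and run `demo_parse` once; mutating the result of `demo_load_owned()` leaves
// the cached data untouched.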
/// Internal function to load and process the raw iris dataset.
///
/// This function loads the raw iris dataset, parses the CSV-like format,
/// and converts it into structured ndarray arrays. It handles the parsing
/// of headers and data rows, extracting features and labels from the dataset.
///
/// # Returns
///
/// - `Array1<&'static str>`: Array of column headers from the dataset
/// - `Array2<f64>`: Feature matrix with shape (150, 4) where each row represents a flower sample and each column represents a feature
/// - `Array1<&'static str>`: Target labels array with shape (150,) containing species classifications (Iris-setosa, Iris-versicolor, Iris-virginica)
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format (150 samples, 5 columns total)
/// - Memory allocation fails during array creation

/// Loads the Iris dataset with memoization.
///
/// The Iris dataset contains measurements of 150 iris flowers from three different species:
/// - Iris-setosa
/// - Iris-versicolor
/// - Iris-virginica
///
/// This function uses memoization: the dataset is parsed once, on the first call, and every
/// call returns static references to the same cached data.
///
/// # Returns
///
/// - `&'static Array1<&'static str>`: Static reference to the headers of the dataset
/// - `&'static Array2<f64>`: Static reference to a 2D array of shape (150, 4) containing the feature measurements:
///   - sepal length in cm
///   - sepal width in cm
///   - petal length in cm
///   - petal width in cm
/// - `&'static Array1<&'static str>`: Static reference to a 1D array of length 150 containing the species labels
///
/// # Examples
/// ```rust
/// use rustyml::dataset::iris::load_iris;
///
/// let (headers, features, labels) = load_iris();
/// assert_eq!(headers.len(), 5);
/// assert_eq!(features.shape(), &[150, 4]);
/// assert_eq!(labels.len(), 150);
/// ```
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format (150 samples, 5 columns total)
/// - Memory allocation fails during array creation

/// Loads the Iris dataset and returns owned copies.
///
/// Use this function when you need owned data that can be modified.
/// For read-only access, prefer `load_iris()`, which returns references.
///
/// The Iris dataset contains measurements of 150 iris flowers from three different species:
/// - Iris-setosa
/// - Iris-versicolor
/// - Iris-virginica
///
/// # Returns
///
/// - `Array1<&'static str>`: Owned array of column headers from the dataset, containing the 4 feature names plus the target label name (5 entries in total)
/// - `Array2<f64>`: Owned feature matrix with shape (150, 4) where each row represents a flower sample and each column represents a feature (sepal length, sepal width, petal length, petal width)
/// - `Array1<&'static str>`: Owned target labels array with shape (150,) containing species classifications (Iris-setosa, Iris-versicolor, Iris-virginica)
///
/// # Performance Notes
///
/// This function creates owned copies by cloning the static data, so every call allocates and copies all three arrays.
/// If you only need read-only access to the data, use `load_iris()` instead to avoid the copy.
///
/// # Examples
/// ```rust
/// use rustyml::dataset::iris::load_iris_owned;
///
/// let (mut headers, mut features, mut labels) = load_iris_owned();
///
/// // You can now modify the data since these are owned copies
/// assert_eq!(headers.len(), 5);
/// assert_eq!(features.shape(), &[150, 4]);
/// assert_eq!(labels.len(), 150);
///
/// // Example: Modify feature values and labels (not possible with the shared references from `load_iris`)
/// features[[0, 0]] = 5.5;
/// labels[0] = "Modified-setosa";
/// ```
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format (150 samples, 5 columns total)
/// - Memory allocation fails during array creation