use ;
use ;
use std::sync::OnceLock;
// Use `OnceLock` for thread-safe lazy initialization of the red wine dataset
static RED_WINE_DATA: = new;
// Use `OnceLock` for thread-safe lazy initialization of the white wine dataset
static WHITE_WINE_DATA: = new;
/// Parses wine quality dataset from raw string data into structured arrays.
///
/// This internal function extracts feature headers and wine quality data from raw string format,
/// converting them into ndarray structures suitable for machine learning operations.
///
/// # Parameters
///
/// - `raw_data` - Raw string containing both headers and wine data
/// - `n_samples` - Number of data samples expected
///
/// # Returns
///
/// - `Array1<&'static str>` - Array of feature names (headers)
/// - `Array2<f64>` - 2D array of wine quality features with shape (n_samples, 12)
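///
/// A minimal sketch of this parsing step, not the crate's actual implementation.
/// It assumes the raw text holds one comma-separated header line followed by
/// semicolon-separated numeric rows (the layout described for the raw loaders), and it
/// uses plain `Vec`s instead of ndarray arrays so that it stays self-contained. The
/// name `parse_raw` is hypothetical.
///
/// ```rust
/// // Hypothetical helper: returns (headers, rows) parsed from `raw`.
/// fn parse_raw(raw: &'static str, n_samples: usize) -> (Vec<&'static str>, Vec<Vec<f64>>) {
///     // Skip blank lines so a trailing newline does not produce an empty row.
///     let mut lines = raw.lines().map(str::trim).filter(|l| !l.is_empty());
///
///     // Assumption: the first non-empty line holds comma-separated column names.
///     let headers: Vec<&'static str> = lines
///         .next()
///         .expect("missing header line")
///         .split(',')
///         .map(str::trim)
///         .collect();
///
///     // Assumption: every remaining line holds semicolon-separated f64 values.
///     let rows: Vec<Vec<f64>> = lines
///         .map(|line| {
///             line.split(';')
///                 .map(|v| v.trim().parse::<f64>().expect("invalid f64 value"))
///                 .collect::<Vec<f64>>()
///         })
///         .collect();
///
///     // Panic on structural mismatches, mirroring the documented panic conditions.
///     assert_eq!(rows.len(), n_samples, "unexpected sample count");
///     assert!(rows.iter().all(|r| r.len() == headers.len()), "ragged row");
///     (headers, rows)
/// }
/// ```
///
/// Calling `parse_raw("fixed acidity,pH,quality\n7.4;3.51;5\n", 1)` under these
/// assumptions yields three headers and a single row of three values.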
/// Internal function to load and process the raw red wine quality dataset.
///
/// This function loads the raw red wine quality dataset, parses the comma-separated headers
/// and semicolon-separated data format, and converts it into structured ndarray arrays.
///
/// # Returns
///
/// - `Array1<&'static str>`: Array of column headers from the dataset
/// - `Array2<f64>`: Feature matrix where each row represents a wine sample and each column represents a feature
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format
/// - Memory allocation fails during array creation
/// Internal function to load and process the raw white wine quality dataset.
///
/// This function loads the raw white wine quality dataset, parses the comma-separated headers
/// and semicolon-separated data format, and converts it into structured ndarray arrays.
///
/// # Returns
///
/// - `Array1<&'static str>`: Array of column headers from the dataset
/// - `Array2<f64>`: Feature matrix where each row represents a wine sample and each column represents a feature
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format
/// - Memory allocation fails during array creation
/// Loads the red wine quality dataset with memoization for machine learning tasks.
///
/// This function provides access to a curated red wine quality dataset containing
/// physicochemical properties and quality ratings. The dataset includes 11 features
/// such as acidity levels, sugar content, pH, and alcohol percentage, along with
/// quality scores ranging from 3 to 8. The parsed data is memoized, so repeated calls
/// return the same static references without re-parsing.
///
/// # Returns
///
/// A tuple containing:
/// - `&'static Array1<&'static str>` - Static reference to an array of column names:
///   - fixed acidity
///   - volatile acidity
///   - citric acid
///   - residual sugar
///   - chlorides
///   - free sulfur dioxide
///   - total sulfur dioxide
///   - density
///   - pH
///   - sulphates
///   - alcohol
///   - quality
/// - `&'static Array2<f64>` - Static reference to a 2D data matrix with shape (1599, 12)
///   containing the 11 feature columns followed by the quality column
///
/// # Example
/// ```rust
/// use rustyml::dataset::wine_quality::load_red_wine_quality;
///
/// let (headers, features) = load_red_wine_quality();
///
/// // Access feature names
/// println!("Features: {:?}", headers);
///
/// // Use the feature matrix for machine learning
/// assert_eq!(features.ncols(), 12); // 11 features + quality
/// assert_eq!(features.nrows(), 1599); // 1599 samples
///
/// // Example: Extract quality scores (last column)
/// let quality_scores = features.column(11); // Quality is the 12th column (index 11)
/// ```
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format
/// - Memory allocation fails during array creation
/// Loads the white wine quality dataset with memoization for machine learning tasks.
///
/// This function provides access to a curated white wine quality dataset with
/// the same structure as the red wine dataset. It contains physicochemical
/// properties and quality ratings specifically for white wine samples.
/// The dataset uses the same 12 columns (11 features plus quality) but with different
/// value ranges typical for white wine characteristics. The parsed data is memoized, so
/// repeated calls return the same static references.
///
/// # Returns
///
/// A tuple containing:
/// - `&'static Array1<&'static str>` - Static reference to an array of column names:
///   - fixed acidity
///   - volatile acidity
///   - citric acid
///   - residual sugar
///   - chlorides
///   - free sulfur dioxide
///   - total sulfur dioxide
///   - density
///   - pH
///   - sulphates
///   - alcohol
///   - quality
/// - `&'static Array2<f64>` - Static reference to a 2D data matrix with shape (4898, 12)
///   containing the 11 feature columns followed by the quality column
///
/// # Example
/// ```rust
/// use rustyml::dataset::wine_quality::load_white_wine_quality;
/// let (headers, features) = load_white_wine_quality();
///
/// // Access feature names
/// println!("Features: {:?}", headers);
///
/// // Use the feature matrix for machine learning
/// assert_eq!(features.ncols(), 12); // 11 features + quality
/// assert_eq!(features.nrows(), 4898); // 4898 samples
///
/// // Example: Extract quality scores (last column)
/// let quality_scores = features.column(11); // Quality is the 12th column (index 11)
/// ```
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format
/// - Memory allocation fails during array creation
/// Loads the red wine quality dataset and returns owned copies.
///
/// Use this function when you need owned data that can be modified.
/// For read-only access, prefer `load_red_wine_quality()` which returns references.
///
/// # Returns
///
/// - `Array1<&'static str>`: Owned array of column headers from the dataset, containing 12 feature names
/// - `Array2<f64>`: Owned feature matrix with shape (1599, 12) where each row represents a wine sample and each column represents a feature (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol, quality)
///
/// # Performance Notes
///
/// This function creates owned copies by cloning the static data, which incurs additional memory allocation.
/// If you only need read-only access to the data, use `load_red_wine_quality()` instead for better performance.
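///
/// A minimal sketch of how a borrowing loader and an owning loader can share one
/// memoized value, not the crate's actual implementation. The names `DATA`, `Dataset`,
/// `load`, and `load_owned` and the tiny inline dataset are hypothetical, and plain
/// `Vec`s stand in for the ndarray arrays.
///
/// ```rust
/// use std::sync::OnceLock;
///
/// // Hypothetical stand-in for the dataset: (column names, rows).
/// type Dataset = (Vec<&'static str>, Vec<Vec<f64>>);
///
/// // Filled on first access, then shared by every later call.
/// static DATA: OnceLock<Dataset> = OnceLock::new();
///
/// // Borrowing accessor: the closure runs at most once; later calls reuse the value.
/// fn load() -> (&'static Vec<&'static str>, &'static Vec<Vec<f64>>) {
///     let (headers, rows) = DATA.get_or_init(|| {
///         (vec!["pH", "quality"], vec![vec![3.51, 5.0], vec![3.20, 5.0]])
///     });
///     (headers, rows)
/// }
///
/// // Owning accessor: clones the cached value so the caller may mutate its copy.
/// fn load_owned() -> Dataset {
///     let (headers, rows) = load();
///     (headers.clone(), rows.clone())
/// }
/// ```
///
/// In this sketch, mutating the result of `load_owned()` leaves the cached value
/// untouched, while each call pays for a fresh allocation and copy.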
///
/// # Examples
/// ```rust
/// use rustyml::dataset::wine_quality::load_red_wine_quality_owned;
///
/// let (headers, mut features) = load_red_wine_quality_owned();
///
/// // You can now modify the data since these are owned copies
/// assert_eq!(headers.len(), 12);
/// assert_eq!(features.shape(), &[1599, 12]);
///
/// // Example: Modify feature values (not possible with references)
/// features[[0, 0]] = 10.0;
/// ```
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format
/// - Memory allocation fails during array creation
/// Loads the white wine quality dataset and returns owned copies.
///
/// Use this function when you need owned data that can be modified.
/// For read-only access, prefer `load_white_wine_quality()` which returns references.
///
/// # Returns
///
/// - `Array1<&'static str>`: Owned array of column headers from the dataset, containing 12 feature names
/// - `Array2<f64>`: Owned feature matrix with shape (4898, 12) where each row represents a wine sample and each column represents a feature (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol, quality)
///
/// # Performance Notes
///
/// This function creates owned copies by cloning the static data, which incurs additional memory allocation.
/// If you only need read-only access to the data, use `load_white_wine_quality()` instead for better performance.
///
/// # Examples
/// ```rust
/// use rustyml::dataset::wine_quality::load_white_wine_quality_owned;
/// let (headers, mut features) = load_white_wine_quality_owned();
///
/// // You can now modify the data since these are owned copies
/// assert_eq!(headers.len(), 12);
/// assert_eq!(features.shape(), &[4898, 12]);
///
/// // Example: Modify feature values (not possible with references)
/// features[[0, 0]] = 10.0;
/// ```
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format
/// - Memory allocation fails during array creation