use load_boston_housing_raw_data;
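use ndarray::{Array1, Array2};
use std::sync::OnceLock;

// Use `OnceLock` for thread-safe lazy initialization
static BOSTON_HOUSING_DATA: OnceLock<(Array1<&'static str>, Array2<f64>, Array1<f64>)> =
    OnceLock::new();
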
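// Illustrative sketch only, not this crate's implementation: a std-only
// miniature of the parse-once, cache, then hand out references (or clones)
// pattern documented in this file. Every `sketch_`/`SKETCH_` name is
// hypothetical, the two data rows are made up, and `Vec` stands in for the
// ndarray types so the sketch needs no external crates.

// Parsed columns: (headers, one feature row per sample, targets).
type SketchData = (Vec<&'static str>, Vec<Vec<f64>>, Vec<f64>);

// Hypothetical CSV-like input: a header line, then numeric rows whose last
// column is the target value.
const SKETCH_RAW: &str = "A,B,TARGET\n1.0,2.0,10.0\n3.0,4.0,20.0";

// Cache cell; it is filled on the first call to `sketch_load`.
static SKETCH_CACHE: std::sync::OnceLock<SketchData> = std::sync::OnceLock::new();

// Parses the raw text. Panics on a malformed number or a wrong column count.
fn sketch_parse() -> SketchData {
    let mut lines = SKETCH_RAW.lines();
    let headers: Vec<&'static str> = lines.next().expect("missing header").split(',').collect();
    let mut features = Vec::new();
    let mut targets = Vec::new();
    for line in lines {
        let mut row: Vec<f64> = line
            .split(',')
            .map(|v| v.trim().parse::<f64>().expect("invalid f64"))
            .collect();
        assert_eq!(row.len(), headers.len(), "unexpected column count");
        // The last column is the target; the rest are features.
        targets.push(row.pop().expect("empty row"));
        features.push(row);
    }
    (headers, features, targets)
}

// Returns references into the cache; `sketch_parse` runs at most once, even
// when several threads call this concurrently.
fn sketch_load() -> (&'static Vec<&'static str>, &'static Vec<Vec<f64>>, &'static Vec<f64>) {
    let (headers, features, targets) = SKETCH_CACHE.get_or_init(sketch_parse);
    (headers, features, targets)
}

// Returns independent copies that the caller may mutate; the cache is untouched.
fn sketch_load_owned() -> SketchData {
    let (headers, features, targets) = sketch_load();
    (headers.clone(), features.clone(), targets.clone())
}
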
/// Internal function to load and process the raw Boston Housing dataset.
///
/// This function loads the raw Boston Housing dataset, parses its CSV-like format,
/// and converts it into structured ndarray arrays. It parses the header
/// and data rows, extracting the features and target values from the dataset.
///
/// # Returns
///
/// - `Array1<&'static str>` - Array of the 14 column headers from the dataset
/// - `Array2<f64>` - Feature matrix with shape (506, 13), where each row is a housing sample and each column is a feature
/// - `Array1<f64>` - Target array with shape (506,) containing median home values (MEDV) in $1000s
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format (506 samples, 14 columns total)
/// - Memory allocation fails during array creation

/// Loads the Boston Housing dataset with memoization.
///
/// The Boston Housing dataset contains information about housing values in
/// suburbs of Boston. The dataset includes 13 features for predicting
/// the median value of owner-occupied homes (MEDV).
///
/// This function uses memoization: the dataset is parsed only once, on the first call,
/// and every call returns static references to the cached data, so repeated calls
/// neither re-parse nor copy it.
///
/// # Features
///
/// - CRIM: per capita crime rate by town
/// - ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
/// - INDUS: proportion of non-retail business acres per town
/// - CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
/// - NOX: nitric oxides concentration (parts per 10 million)
/// - RM: average number of rooms per dwelling
/// - AGE: proportion of owner-occupied units built prior to 1940
/// - DIS: weighted distances to five Boston employment centres
/// - RAD: index of accessibility to radial highways
/// - TAX: full-value property-tax rate per $10,000
/// - PTRATIO: pupil-teacher ratio by town
/// - B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
/// - LSTAT: % lower status of the population
///
/// # Returns
/// - `&'static Array1<&'static str>` - Static reference to the headers of the dataset (13 features + MEDV)
/// - `&'static Array2<f64>` - Static reference to the feature matrix where each row is a sample and each column is a feature
/// - `&'static Array1<f64>` - Static reference to median home values (MEDV) in $1000s
///
/// # Examples
/// ```rust
/// use rustyml::dataset::boston_housing::load_boston_housing;
///
/// let (headers, features, medv) = load_boston_housing();
/// assert_eq!(headers.len(), 14);
/// assert_eq!(features.shape(), &[506, 13]);
/// assert_eq!(medv.len(), 506);
/// ```
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format (506 samples, 14 columns total)
/// - Memory allocation fails during array creation

/// Loads the Boston Housing dataset and returns owned copies.
///
/// Use this function when you need owned data that can be modified.
/// For read-only access, prefer `load_boston_housing()` which returns references.
///
/// # Returns
/// - `Array1<&'static str>` - Owned array of 14 column headers
/// - `Array2<f64>` - Owned feature matrix (506x13)
/// - `Array1<f64>` - Owned target values array (MEDV)
///
/// # Performance
/// This function creates owned copies by cloning the static data, which incurs additional memory allocation.
/// If you only need read-only access to the data, use `load_boston_housing()` instead for better performance.
///
/// # Examples
/// ```rust
/// use rustyml::dataset::boston_housing::load_boston_housing_owned;
///
/// let (mut headers, mut features, mut medv) = load_boston_housing_owned();
///
/// // You can now modify the data since these are owned copies
/// assert_eq!(headers.len(), 14);
/// assert_eq!(features.shape(), &[506, 13]);
/// assert_eq!(medv.len(), 506);
///
/// // Example: Modify feature values (not possible with references)
/// features[[0, 0]] = 0.1;
/// medv[0] = 25.5;
/// ```
///
/// # Panics
///
/// This function will panic if:
/// - The raw data cannot be parsed as valid f64 values
/// - The dataset structure doesn't match the expected format (506 samples, 14 columns total)
/// - Memory allocation fails during array creation