<!-- PCU: examples-decision-tree-regression | contract: contracts/apr-page-examples-decision-tree-regression-v1.yaml -->
<!-- Example: cargo run -p aprender-core --example decision_tree_regression -->
<!-- Status: enforced -->

# Decision Tree Regression - Housing Price Prediction

**Status**: ✅ Complete (Verified with 16+ tests)

This case study demonstrates decision tree regression for predicting continuous values (housing prices) using the CART algorithm with Mean Squared Error criterion.

**What You'll Learn**:
- When to use decision trees for regression vs linear models
- How MSE splitting criterion works
- Effect of max_depth on overfitting
- Hyperparameter tuning (min_samples_split, min_samples_leaf)
- Handling non-linear relationships

**Prerequisites**: Basic understanding of regression metrics (R², MSE)

---

## Problem Statement

**Task**: Predict house prices (continuous values) from features like square footage, bedrooms, and age.

**Why Decision Tree Regression?**
- **Non-linear relationships**: Price doesn't scale linearly with size
- **Feature interactions**: a large old house prices differently from a small old one
- **Interpretability**: Real estate agents can explain "rules"
- **No feature scaling**: Use raw sqft, years, etc.

**When NOT to use**:
- Linear relationships → Use LinearRegression (simpler, better generalization)
- Need smooth predictions → Trees predict step functions
- Extrapolation beyond training range → Trees can't extrapolate (see the sketch below)
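
To make the last point concrete, here is a minimal sketch of the extrapolation limitation, using the same `DecisionTreeRegressor` API shown later on this page (the data is made up for illustration):

```rust
use aprender::prelude::*;

// Train only on 1000-2000 sqft houses (illustrative prices).
let x_train = Matrix::from_vec(5, 1, vec![
    1000.0, 1250.0, 1500.0, 1750.0, 2000.0,
]).unwrap();
let y_train = Vector::from_slice(&[150.0, 180.0, 250.0, 300.0, 350.0]);

let mut tree = DecisionTreeRegressor::new().with_max_depth(3);
tree.fit(&x_train, &y_train).unwrap();

// A 4000 sqft house is far outside the training range, but the tree
// can only return the mean of whichever leaf the sample falls into.
// It will never predict above the largest target seen in training.
let x_far = Matrix::from_vec(1, 1, vec![4000.0]).unwrap();
println!("4000 sqft → ${:.0}k", tree.predict(&x_far).as_slice()[0]);
// Expect a value near $350k, not an extrapolated ~$700k.
```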

---

## Dataset

### Simulated Housing Data

```rust
// Features: [sqft, bedrooms, bathrooms, age]
// Target: price (in thousands)
let x_train = Matrix::from_vec(20, 4, vec![
    // Small houses
    1000.0, 2.0, 1.0, 50.0,  // $140k
    1100.0, 2.0, 1.0, 45.0,  // $145k
    1200.0, 2.0, 1.0, 40.0,  // $150k
    1300.0, 2.0, 1.5, 35.0,  // $160k
    // Medium houses
    1500.0, 3.0, 2.0, 25.0,  // $250k
    1600.0, 3.0, 2.0, 20.0,  // $265k
    // ... (more samples)
    // Luxury houses (exponential price increase)
    4000.0, 7.0, 5.0, 0.5,   // $1100k
    4500.0, 8.0, 6.0, 0.5,   // $1350k
]).unwrap();

let y_train = Vector::from_slice(&[
    140.0, 145.0, 150.0, 160.0,  // Small
    250.0, 265.0, 280.0, 295.0,  // Medium
    360.0, 410.0, 480.0, 550.0,  // Large
    650.0, 720.0, 800.0, 920.0,  // Very large
    1100.0, 1350.0, 1600.0, 1950.0,  // Luxury
]);
```

**Data Characteristics**:
- 20 training samples, 4 features
- Price increases non-linearly with size
- Age discount effect
- Multiple price tiers

---

## Implementation

### Step 1: Train Basic Regression Tree

```rust
use aprender::prelude::*;

// Create and configure tree
let mut tree = DecisionTreeRegressor::new()
    .with_max_depth(5);

// Fit to training data
tree.fit(&x_train, &y_train).unwrap();

// Predict on test data
let x_test = Matrix::from_vec(1, 4, vec![
    1900.0, 4.0, 2.0, 12.0  // Medium-large house
]).unwrap();

let predicted_price = tree.predict(&x_test);
println!("Predicted: ${:.0}k", predicted_price.as_slice()[0]);
// Output: Predicted: $295k

// Evaluate with R² score
let r2 = tree.score(&x_train, &y_train);
println!("R² Score: {:.4}", r2);
// Output: R² Score: 1.0000 (perfect on training data)
```

**Key API Methods**:
- `new()`: Create tree with default parameters
- `with_max_depth(depth)`: Limit tree depth (prevent overfitting)
- `fit(&x, &y)`: Train tree on data (MSE criterion)
- `predict(&x)`: Predict continuous values
- `score(&x, &y)`: Compute R² score

**Test Reference**: `src/tree/mod.rs::test_regression_tree_fit_simple_linear`

---

### Step 2: Compare with Linear Regression

Decision trees excel at non-linear patterns. Let's compare:

```rust
// Train both models
let mut tree = DecisionTreeRegressor::new().with_max_depth(5);
let mut linear = LinearRegression::new();

tree.fit(&x_train, &y_train).unwrap();
linear.fit(&x_train, &y_train).unwrap();

// Compare R² scores
let tree_r2 = tree.score(&x_train, &y_train);
let linear_r2 = linear.score(&x_train, &y_train);

println!("Decision Tree R²: {:.4}", tree_r2);   // 1.0000
println!("Linear Regression R²: {:.4}", linear_r2); // 0.9844
println!("Tree advantage: {:.4}", tree_r2 - linear_r2); // 0.0156
```

**Why Tree Performs Better**:
- Captures non-linear price tiers (small/medium/large/luxury)
- Learns feature interactions (size × age)
- No assumption of linear relationship

**When Linear Wins**:
- Truly linear relationships
- Small datasets (better generalization)
- Need smooth predictions

**Test Reference**: `src/tree/mod.rs::test_regression_tree_vs_linear`

---

### Step 3: Understanding MSE Splitting

**How it works**:
1. For each feature and each candidate threshold, compute the MSE of the left and right children
2. Choose the split that maximizes variance reduction (parent MSE minus the size-weighted child MSEs)
3. Leaf nodes predict the mean of their training samples

**Example Split Decision**:
```text
Parent node: [140, 145, 250, 265, 1100, 1350]
Mean = 541.67, MSE ≈ 240,906

Candidate split: sqft ≤ 1500
  Left:  [140, 145]  → Mean = 142.5, MSE = 6.25
  Right: [250, 265, 1100, 1350] → Mean = 741.25, MSE ≈ 241,855

Weighted MSE = (2/6)·6.25 + (4/6)·241,855 ≈ 161,239
Variance Reduction = 240,906 - 161,239 ≈ 79,667 ✅ Good split!
```

**Pure Node Example**:
```text
Node: [250, 250, 250]
Mean = 250, MSE = 0 → Stop splitting (pure)
```
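
To check the arithmetic, the split evaluation above can be reproduced in a few lines of plain Rust (no aprender types needed). This is a sketch of the computation, not the library's internal code:

```rust
/// Mean of a slice.
fn mean(v: &[f64]) -> f64 {
    v.iter().sum::<f64>() / v.len() as f64
}

/// MSE of a node: average squared deviation from the node mean.
fn node_mse(v: &[f64]) -> f64 {
    let m = mean(v);
    v.iter().map(|x| (x - m).powi(2)).sum::<f64>() / v.len() as f64
}

fn main() {
    let parent = [140.0, 145.0, 250.0, 265.0, 1100.0, 1350.0];
    let (left, right) = parent.split_at(2); // candidate split: sqft ≤ 1500

    let n = parent.len() as f64;
    let weighted = (left.len() as f64 / n) * node_mse(left)
        + (right.len() as f64 / n) * node_mse(right);

    println!("parent MSE   = {:.0}", node_mse(&parent));            // ~240,906
    println!("weighted MSE = {:.0}", weighted);                     // ~161,239
    println!("reduction    = {:.0}", node_mse(&parent) - weighted); // ~79,667
}
```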

**Test Reference**: `src/tree/mod.rs::test_regression_tree_constant_target`

---

## Hyperparameter Tuning

### max_depth: Controlling Complexity

```rust
let depths = [2, 3, 5, 10];

for &depth in &depths {
    let mut tree = DecisionTreeRegressor::new().with_max_depth(depth);
    tree.fit(&x_train, &y_train).unwrap();

    let r2 = tree.score(&x_train, &y_train);
    println!("max_depth={}: R² = {:.4}", depth, r2);
}

// Output:
// max_depth=2: R² = 0.9374  (underfitting)
// max_depth=3: R² = 0.9903  (good balance)
// max_depth=5: R² = 1.0000  (perfect fit)
// max_depth=10: R² = 1.0000 (potential overfitting)
```

**Interpretation**:
- **depth=2**: Too shallow, can't capture complexity → underfitting
- **depth=3**: Good balance, likely generalizes well
- **depth=5+**: Perfect training fit, risk of overfitting on test data

**Rule of Thumb**:
- Start with max_depth = 3-5
- Increase if underfitting (low train R²)
- Decrease if overfitting (high train R², low test R²)
- Use cross-validation to find the optimal depth (a hold-out sketch follows)
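
This page doesn't cover aprender's cross-validation utilities (if any), so the sketch below selects `max_depth` with a hand-rolled hold-out split; the quadratic toy data is purely illustrative:

```rust
use aprender::prelude::*;

// Hand-rolled hold-out split on y = x² data: odd x values train, even x values test.
let x_train = Matrix::from_vec(8, 1, vec![
    1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0,
]).unwrap();
let y_train = Vector::from_slice(&[
    1.0, 9.0, 25.0, 49.0, 81.0, 121.0, 169.0, 225.0,
]);
let x_test = Matrix::from_vec(4, 1, vec![2.0, 6.0, 10.0, 14.0]).unwrap();
let y_test = Vector::from_slice(&[4.0, 36.0, 100.0, 196.0]);

// Pick the depth with the best held-out R², not the best training R².
let mut best_depth = 1;
let mut best_r2 = None;
for depth in 1..=6 {
    let mut tree = DecisionTreeRegressor::new().with_max_depth(depth);
    tree.fit(&x_train, &y_train).unwrap();

    let test_r2 = tree.score(&x_test, &y_test);
    println!("max_depth={}: test R² = {:.4}", depth, test_r2);
    if best_r2.map_or(true, |b| test_r2 > b) {
        best_r2 = Some(test_r2);
        best_depth = depth;
    }
}
println!("Selected max_depth = {}", best_depth);
```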

**Test Reference**: `src/tree/mod.rs::test_regression_tree_max_depth`

---

### min_samples_split: Pruning Parameter

```rust
// Default tree (no pruning)
let mut tree_default = DecisionTreeRegressor::new()
    .with_max_depth(10);

// Pruned tree (requires 4 samples to split)
let mut tree_pruned = DecisionTreeRegressor::new()
    .with_max_depth(10)
    .with_min_samples_split(4)
    .with_min_samples_leaf(2);

tree_default.fit(&x_train, &y_train).unwrap();
tree_pruned.fit(&x_train, &y_train).unwrap();

let r2_default = tree_default.score(&x_train, &y_train);
let r2_pruned = tree_pruned.score(&x_train, &y_train);

println!("Default tree R²: {:.4}", r2_default); // 1.0000
println!("Pruned tree R²: {:.4}", r2_pruned);   // 0.9658
```

**Effect of Pruning**:
- **min_samples_split=4**: Don't split nodes with < 4 samples
- **min_samples_leaf=2**: Ensure each leaf has ≥ 2 samples
- **Result**: Simpler tree, prevents overfitting on small groups

**When to Use**:
- Noisy data (prevents fitting to outliers)
- Small datasets (improves generalization)
- Prefer simpler models (Occam's razor)

**Test Reference**: `src/tree/mod.rs::test_regression_tree_min_samples_*`

---

## Non-Linear Patterns

Decision trees naturally handle non-linear relationships. Example with quadratic data:

```rust
// Pure quadratic relationship: y = x²
let x_quad = Matrix::from_vec(10, 1, vec![
    1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0
]).unwrap();

let y_quad = Vector::from_slice(&[
    1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0, 81.0, 100.0
]);

// Train both models
let mut tree = DecisionTreeRegressor::new().with_max_depth(4);
let mut linear = LinearRegression::new();

tree.fit(&x_quad, &y_quad).unwrap();
linear.fit(&x_quad, &y_quad).unwrap();

let tree_r2 = tree.score(&x_quad, &y_quad);
let linear_r2 = linear.score(&x_quad, &y_quad);

println!("Decision Tree R²: {:.4}", tree_r2);   // 1.0000
println!("Linear Regression R²: {:.4}", linear_r2); // 0.9498
```

**Why Tree Wins**:
- Learns step function approximation of parabola
- No need for manual feature engineering (x²)
- Captures local patterns

**Linear Model Struggles**:
- Tries to fit straight line to curve
- Needs polynomial features: `[x, x²]`
- Can't learn the curve without feature engineering (see the sketch below)
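
For contrast, here is the hand-engineered fix the tree makes unnecessary: adding an `x²` column lets `LinearRegression` fit the parabola exactly. A sketch using the same `Matrix`/`LinearRegression` API as above:

```rust
use aprender::prelude::*;

// Same quadratic data, but each row now carries [x, x²] (row-major layout).
let x_poly = Matrix::from_vec(10, 2, vec![
    1.0, 1.0,
    2.0, 4.0,
    3.0, 9.0,
    4.0, 16.0,
    5.0, 25.0,
    6.0, 36.0,
    7.0, 49.0,
    8.0, 64.0,
    9.0, 81.0,
    10.0, 100.0,
]).unwrap();
let y_quad = Vector::from_slice(&[
    1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0, 81.0, 100.0,
]);

let mut linear = LinearRegression::new();
linear.fit(&x_poly, &y_quad).unwrap();

// y = x² lies exactly in the span of the [x, x²] features, so the fit
// is perfect: the "engineering" the tree did automatically is manual here.
println!("Linear R² with [x, x²]: {:.4}", linear.score(&x_poly, &y_quad)); // 1.0000
```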

**Visualization**:
```text
x    True y   Tree Pred   Linear Pred
1    1        1.0         -11.0
2    4        4.0         0.0
3    9        9.0         11.0
5    25       25.0        33.0
10   100      100.0       88.0
```

Decision tree predictions match exactly (or very closely), while the linear model shows systematic error: it underpredicts at both ends of the range and overpredicts in the middle, as a straight line must when fit to a convex curve.

**Test Reference**: `src/tree/mod.rs::test_regression_tree_predict_nonlinear`

---

## Edge Cases and Validation

### Constant Target

```rust
// All houses same price (constant target)
let x = Matrix::from_vec(5, 1, vec![1.0, 2.0, 3.0, 4.0, 5.0]).unwrap();
let y = Vector::from_slice(&[5.0, 5.0, 5.0, 5.0, 5.0]);

let mut tree = DecisionTreeRegressor::new().with_max_depth(3);
tree.fit(&x, &y).unwrap();

// Should predict constant value
let predictions = tree.predict(&x);
for &pred in predictions.as_slice() {
    assert!((pred - 5.0).abs() < 1e-5); // All ≈ 5.0
}
```

**Behavior**: Tree creates single leaf node (MSE = 0, pure node).

**Test Reference**: `src/tree/mod.rs::test_regression_tree_constant_target`

---

### Single Sample

```rust
// Edge case: only 1 training sample
let x = Matrix::from_vec(1, 2, vec![1.0, 2.0]).unwrap();
let y = Vector::from_slice(&[10.0]);

let mut tree = DecisionTreeRegressor::new().with_max_depth(3);
tree.fit(&x, &y).unwrap();

// Predict on same sample
let pred = tree.predict(&x);
assert!((pred.as_slice()[0] - 10.0).abs() < 1e-5);
```

**Behavior**: Creates single leaf with mean = 10.0.

**Test Reference**: `src/tree/mod.rs::test_regression_tree_single_sample`

---

### Validation Errors

```rust
// Error: Mismatched dimensions
let x = Matrix::from_vec(5, 2, vec![0.0; 10]).unwrap(); // 5 samples × 2 features (values arbitrary)
let y = Vector::from_slice(&[1.0, 2.0, 3.0]); // Only 3 labels!

let mut tree = DecisionTreeRegressor::new();
assert!(tree.fit(&x, &y).is_err()); // Returns error

// Error: Predict before fit
let tree = DecisionTreeRegressor::new();
// tree.predict(&x); // Would panic!
```

**Validation Checks**:
- `x.rows() == y.len()` (sample count match)
- Tree must be fitted before predict
- Feature count must match between train and test

**Test Reference**: `src/tree/mod.rs::test_regression_tree_validation_*`

---

## Practical Recommendations

### When to Use Decision Tree Regression

✅ **Use when**:
- Non-linear relationships in data
- Feature interactions are important
- Interpretability is needed (can visualize tree)
- Features are on mixed scales/units (trees need no scaling)
- Building block for ensembles (Random Forest)

❌ **Don't use when**:
- Linear relationships (use LinearRegression)
- Small datasets (< 50 samples, risk overfitting)
- Need smooth predictions (trees predict step functions)
- Extrapolation required (beyond training range)

### Hyperparameter Selection Guide

| Parameter | Typical Range | Effect | When to Increase | When to Decrease |
|-----------|---------------|--------|------------------|------------------|
| **max_depth** | 3-10 | Tree complexity | Underfitting (low train R²) | Overfitting (train R² >> test R²) |
| **min_samples_split** | 2-10 | Minimum samples to split | Overfitting | Underfitting |
| **min_samples_leaf** | 1-5 | Minimum leaf size | Overfitting | Underfitting |

**Tuning Process**:
1. Start with defaults: `max_depth=5`, `min_samples_split=2`, `min_samples_leaf=1`
2. Check train/test R² (use cross-validation)
3. If overfitting: Decrease max_depth or increase min_samples_*
4. If underfitting: Increase max_depth or decrease min_samples_*
5. Use grid search for the optimal combination (sketched below)
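
A hand-rolled version of step 5, reusing the hold-out split idea from the max_depth sketch earlier on this page (aprender's own hyperparameter-search utilities, if any, are not covered here):

```rust
// `x_train`/`y_train` and `x_test`/`y_test` are a hold-out split as in
// the max_depth selection sketch above.
let depths = [2, 3, 5, 7];
let min_splits = [2, 4, 8];

let mut best_params = (depths[0], min_splits[0]);
let mut best_r2 = None;

for &depth in &depths {
    for &min_split in &min_splits {
        let mut tree = DecisionTreeRegressor::new()
            .with_max_depth(depth)
            .with_min_samples_split(min_split);
        tree.fit(&x_train, &y_train).unwrap();

        // Score on the held-out set; training R² would just reward complexity.
        let test_r2 = tree.score(&x_test, &y_test);
        if best_r2.map_or(true, |b| test_r2 > b) {
            best_r2 = Some(test_r2);
            best_params = (depth, min_split);
        }
    }
}

println!("Best (max_depth, min_samples_split) = {:?}", best_params);
```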

### Debugging Checklist

**Low R² on training data**:
- [ ] Tree too shallow (increase max_depth)
- [ ] Too much pruning (decrease min_samples_split/leaf)
- [ ] Data has no predictive signal

**Perfect train R², poor test R²**:
- [ ] Overfitting! (decrease max_depth)
- [ ] Add pruning (increase min_samples_split/leaf)
- [ ] Need more training data

**Unexpected predictions**:
- [ ] Check feature scaling (not needed, but verify units)
- [ ] Inspect tree structure (if implemented)
- [ ] Verify training data quality

---

## Full Example Code

```rust
use aprender::prelude::*;

fn main() {
    // Housing data
    let x_train = Matrix::from_vec(8, 3, vec![
        1500.0, 3.0, 10.0,  // $280k
        2000.0, 4.0, 5.0,   // $350k
        1200.0, 2.0, 30.0,  // $180k
        1800.0, 3.0, 15.0,  // $300k
        2500.0, 5.0, 2.0,   // $450k
        1000.0, 2.0, 50.0,  // $150k
        2200.0, 4.0, 8.0,   // $380k
        1600.0, 3.0, 20.0,  // $260k
    ]).unwrap();

    let y_train = Vector::from_slice(&[
        280.0, 350.0, 180.0, 300.0, 450.0, 150.0, 380.0, 260.0
    ]);

    // Train regression tree
    let mut tree = DecisionTreeRegressor::new()
        .with_max_depth(4)
        .with_min_samples_split(2);

    tree.fit(&x_train, &y_train).unwrap();

    // Evaluate
    let r2 = tree.score(&x_train, &y_train);
    println!("R² Score: {:.3}", r2);

    // Predict on new house
    let x_new = Matrix::from_vec(1, 3, vec![1900.0, 4.0, 12.0]).unwrap();
    let price = tree.predict(&x_new);
    println!("Predicted price: ${:.0}k", price.as_slice()[0]);
}
```

**Run the example**:
```bash
cargo run --example decision_tree_regression
```

---

## Related Reading

**Theory**:
- [Decision Trees Theory](../ml-fundamentals/decision-trees.md) - MSE criterion, CART algorithm
- [Regression Metrics](../ml-fundamentals/regression-metrics.md) - R², MSE, MAE

**Other Algorithms**:
- [Linear Regression](./linear-regression.md) - Baseline comparison
- [Random Forest (Future)](./random-forest-regression.md) - Ensemble of trees

**Code Reference**:
- Implementation: `src/tree/mod.rs` (DecisionTreeRegressor)
- Tests: `src/tree/mod.rs::tests::test_regression_tree_*` (16 tests)
- Example: `examples/decision_tree_regression.rs`

---

## Summary

**Key Takeaways**:
- ✅ Decision tree regression uses MSE criterion (variance reduction)
- ✅ Leaf nodes predict mean of training samples
- ✅ max_depth prevents overfitting (typical: 3-7)
- ✅ Pruning parameters (min_samples_*) add regularization
- ✅ Excels at non-linear relationships without feature engineering
- ✅ Interpretable but can overfit (use ensembles in production)

**Best Practices**:
1. Start with max_depth=5, tune with cross-validation
2. Compare with LinearRegression baseline
3. Use R² for evaluation, check train/test gap
4. Prune with min_samples_split/leaf if overfitting
5. Consider Random Forest for better accuracy

**Verification**: Implementation tested with 16 comprehensive tests in `src/tree/mod.rs`, including edge cases, parameter validation, and comparison with linear regression.

---

**Next**: [Random Forest Regression (Future)](./random-forest-regression.md)

**Previous**: [Decision Tree - Iris Classification](./decision-tree-iris.md)