# TensorLogic Training Examples

This directory contains 11 comprehensive examples demonstrating the `tensorlogic-train` crate's capabilities, including advanced features like curriculum learning, transfer learning, hyperparameter optimization, cross-validation, ensemble methods, and complete integration workflows.

## Running Examples

Run any example with:

```bash
cargo run --example <example_name>
```

For example:
```bash
cargo run --example 01_basic_training
```

## Examples Overview

### 01_basic_training.rs
**Basic regression training with SGD**

Demonstrates:
- Simple linear regression
- MSE loss function
- SGD optimizer with momentum (see the update sketch below)
- Epoch-level callbacks
- Training history tracking

Key concepts:
- Creating synthetic regression data
- Initializing model parameters
- Basic training loop
- Monitoring training progress
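
For intuition, the update rule behind SGD with momentum fits in a few lines. This is a standalone sketch of the math, not the crate's internal optimizer (which manages this for you):

```rust
/// One SGD-with-momentum step over a flat parameter slice.
/// Standalone sketch of the update rule, not the crate's implementation.
fn sgd_momentum_step(
    params: &mut [f64],
    grads: &[f64],
    velocity: &mut [f64],
    lr: f64,       // learning rate
    momentum: f64, // momentum coefficient, e.g. 0.9
) {
    for i in 0..params.len() {
        // v <- momentum * v - lr * g
        velocity[i] = momentum * velocity[i] - lr * grads[i];
        // w <- w + v
        params[i] += velocity[i];
    }
}
```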

Run with:
```bash
cargo run --example 01_basic_training
```

---

### 02_classification_with_metrics.rs
**Multi-class classification with comprehensive metrics**

Demonstrates:
- 3-class classification problem
- Cross-entropy loss with label smoothing
- Adam optimizer
- Multiple evaluation metrics (Accuracy, Precision, Recall, F1)
- Confusion matrix analysis
- Per-class performance reporting

Key concepts:
- Classification data preparation
- Label smoothing for regularization
- Metric tracking during training
- Post-training analysis with confusion matrices
- Per-class metric computation (see the sketch below)
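
As a reminder of what the per-class report contains, precision, recall, and F1 all derive from confusion-matrix counts. A minimal standalone sketch (the crate computes these through its `Metric` types):

```rust
/// Per-class precision, recall, and F1 from confusion-matrix counts
/// (true positives, false positives, false negatives).
/// Standalone sketch, not the crate's Metric implementation.
fn precision_recall_f1(tp: f64, fp: f64, fn_: f64) -> (f64, f64, f64) {
    let precision = if tp + fp > 0.0 { tp / (tp + fp) } else { 0.0 };
    let recall = if tp + fn_ > 0.0 { tp / (tp + fn_) } else { 0.0 };
    let f1 = if precision + recall > 0.0 {
        2.0 * precision * recall / (precision + recall)
    } else {
        0.0
    };
    (precision, recall, f1)
}
```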

Run with:
```bash
cargo run --example 02_classification_with_metrics
```

---

### 03_callbacks_and_checkpointing.rs
**Advanced training with callbacks and state management**

Demonstrates:
- Early stopping to prevent overfitting
- Model checkpointing (save best models)
- Learning rate scheduling (Cosine Annealing)
- Reduce LR on plateau
- Gradient monitoring
- Learning rate finder (optional)
- Training state persistence

Key concepts:
- Callback composition
- Automatic hyperparameter adjustment
- Model persistence
- Gradient health monitoring
- Optimal learning rate discovery

Features:
- **EarlyStoppingCallback**: Stops training when validation stops improving (sketched below)
- **CheckpointCallback**: Saves best model states
- **CosineAnnealingLrScheduler**: Gradually reduces learning rate
- **ReduceLrOnPlateauCallback**: Adaptive LR reduction
- **GradientMonitor**: Detects vanishing/exploding gradients
- **LearningRateFinder**: Find optimal learning rate (uncomment to use)
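
The early-stopping logic boils down to tracking the best validation loss plus a patience counter. A minimal sketch of the pattern the `EarlyStoppingCallback` wraps (struct and field names here are illustrative, not the crate's):

```rust
/// Patience-based early stopping, sketched independently of the crate.
struct EarlyStopper {
    patience: usize, // epochs to wait without improvement
    min_delta: f64,  // minimum change that counts as improvement
    best_loss: f64,
    epochs_without_improvement: usize,
}

impl EarlyStopper {
    fn new(patience: usize, min_delta: f64) -> Self {
        Self {
            patience,
            min_delta,
            best_loss: f64::INFINITY,
            epochs_without_improvement: 0,
        }
    }

    /// Returns true when training should stop.
    fn update(&mut self, val_loss: f64) -> bool {
        if val_loss < self.best_loss - self.min_delta {
            self.best_loss = val_loss;
            self.epochs_without_improvement = 0;
        } else {
            self.epochs_without_improvement += 1;
        }
        self.epochs_without_improvement >= self.patience
    }
}
```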

Run with:
```bash
cargo run --example 03_callbacks_and_checkpointing
```

---

### 04_logical_loss_training.rs
**Training with logical constraints and rules**

Demonstrates:
- Combining supervised loss with logical constraints
- Rule satisfaction loss
- Constraint violation penalties
- Multi-objective loss weighting
- Logic-guided learning with AdamW

Key concepts:
- Multi-objective optimization
- Logical rule enforcement during training
- Constraint satisfaction in neural networks
- Weight decay for regularization
- Domain-specific loss design

Loss composition:
```
L_total = 1.0 × L_supervised + 2.0 × L_rules + 5.0 × L_constraints
```
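
In code, that composition is simply a weighted sum of the individual loss terms. A minimal sketch mirroring the weights above (the crate's actual loss-composition API may differ):

```rust
/// Weighted multi-objective loss, mirroring the composition above.
/// Sketch only; the crate's loss-composition API may differ.
fn total_loss(l_supervised: f64, l_rules: f64, l_constraints: f64) -> f64 {
    1.0 * l_supervised + 2.0 * l_rules + 5.0 * l_constraints
}
```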

This is particularly useful when you want your model to:
- Respect domain-specific rules
- Satisfy hard constraints
- Learn from both data and logical knowledge

Run with:
```bash
cargo run --example 04_logical_loss_training
```

---

### 05_profiling_and_monitoring.rs
**Advanced monitoring with profiling and histogram tracking**

Demonstrates:
- Performance profiling during training
- Weight histogram tracking
- Gradient monitoring
- Comprehensive training diagnostics
- Performance optimization insights
- Combining multiple monitoring callbacks

Key concepts:
- ProfilingCallback for timing analysis
- HistogramCallback for weight distributions
- GradientMonitor for training health
- Comprehensive performance reporting
- Training optimization recommendations

Features:
- Batch/epoch timing measurement
- Throughput metrics (samples/sec, batches/sec)
- Weight distribution tracking
- Gradient health monitoring
- Automatic issue detection

Run with:
```bash
cargo run --example 05_profiling_and_monitoring
```

---

### 06_curriculum_learning.rs
**Progressive training with curriculum learning strategies**

Demonstrates:
- LinearCurriculum (gradual difficulty increase)
- ExponentialCurriculum (rapid ramp-up)
- SelfPacedCurriculum (model-driven pace)
- CompetenceCurriculum (adaptive difficulty)
- TaskCurriculum (multi-task progressive training)
- CurriculumManager for state management

Key concepts:
- Difficulty score computation
- Sample selection based on model competence
- Progressive training from easy to hard samples (see the sketch below)
- Dynamic curriculum adaptation
- Stateful curriculum management
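
For a concrete picture, a linear curriculum raises a difficulty threshold each epoch and admits only the samples beneath it. A standalone sketch of the idea, not the crate's `LinearCurriculum` itself:

```rust
/// Difficulty threshold at a given epoch under a linear curriculum:
/// ramps from `start` to 1.0 over `warmup_epochs`, then stays at 1.0.
fn linear_curriculum_threshold(epoch: usize, warmup_epochs: usize, start: f64) -> f64 {
    let progress = (epoch as f64 / warmup_epochs as f64).min(1.0);
    start + (1.0 - start) * progress
}

/// Select the samples whose difficulty falls under the current threshold.
fn select_samples(difficulties: &[f64], threshold: f64) -> Vec<usize> {
    difficulties
        .iter()
        .enumerate()
        .filter(|(_, &d)| d <= threshold)
        .map(|(i, _)| i)
        .collect()
}
```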

Run with:
```bash
cargo run --example 06_curriculum_learning
```

---

### 07_transfer_learning.rs
**Transfer learning and fine-tuning strategies**

Demonstrates:
- Layer freezing and unfreezing
- Progressive unfreezing (gradual layer activation)
- Discriminative fine-tuning (layer-specific learning rates)
- Feature extraction mode
- TransferLearningManager for unified control

Key concepts:
- Freezing pretrained layers
- Gradual unfreezing to prevent catastrophic forgetting
- Different learning rates for different layers (sketched below)
- Feature extraction vs. fine-tuning
- Multi-phase transfer learning workflow
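
Discriminative fine-tuning just assigns each layer its own learning rate, typically decaying toward the input. A hedged sketch of the idea, independent of the crate's `TransferLearningManager`:

```rust
/// Per-layer learning rates for discriminative fine-tuning: the head gets
/// the base rate, and each earlier layer a geometrically smaller one.
/// Standalone sketch of the idea.
fn discriminative_lrs(num_layers: usize, base_lr: f64, decay: f64) -> Vec<f64> {
    (0..num_layers)
        .map(|layer| {
            // Layer 0 is closest to the input, so it decays the most.
            let depth_from_head = (num_layers - 1 - layer) as i32;
            base_lr * decay.powi(depth_from_head)
        })
        .collect()
}

// Example: 4 layers, base_lr = 1e-4, decay = 0.5
// -> [1.25e-5, 2.5e-5, 5e-5, 1e-4] from input to head.
```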

Run with:
```bash
cargo run --example 07_transfer_learning
```

---

### 08_hyperparameter_optimization.rs
**Automated hyperparameter search**

Demonstrates:
- Grid search (exhaustive exploration)
- Random search (stochastic sampling)
- Parameter space definition (discrete, continuous, log-uniform, int ranges)
- Result tracking and comparison
- Best configuration selection

Key concepts:
- Defining search spaces
- Exhaustive vs. stochastic search
- Log-uniform distributions for learning rates (sketched below)
- Configuration generation and evaluation
- Result analysis and ranking
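
Log-uniform sampling matters because learning rates span orders of magnitude: drawing uniformly in log space gives 1e-5 and 1e-2 equal coverage. A minimal sketch using the `rand` crate's 0.8-style API (assumed as a dependency here):

```rust
use rand::Rng;

/// Sample log-uniformly between `lo` and `hi` (both > 0),
/// e.g. a learning rate between 1e-5 and 1e-1.
fn sample_log_uniform<R: Rng>(rng: &mut R, lo: f64, hi: f64) -> f64 {
    let exponent = rng.gen_range(lo.log10()..hi.log10());
    10f64.powf(exponent)
}
```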

Run with:
```bash
cargo run --example 08_hyperparameter_optimization
```

---

### 09_cross_validation.rs
**Robust model evaluation with cross-validation**

Demonstrates:
- K-Fold cross-validation
- Stratified K-Fold (class-balanced)
- Time Series Split (temporal-aware)
- Leave-One-Out cross-validation
- Result aggregation and statistical analysis

Key concepts:
- Train/validation split strategies (see the sketch below)
- Class distribution preservation
- Temporal ordering for time series
- Result aggregation across folds
- Statistical significance testing
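
Mechanically, K-fold partitions the sample indices into K chunks and holds one out per fold. A standalone sketch (shuffle the indices first for a randomized split):

```rust
/// Yield (train_indices, val_indices) for each of `k` folds over `n` samples.
/// Standalone sketch of the splitting logic, not the crate's implementation.
fn k_fold_splits(n: usize, k: usize) -> Vec<(Vec<usize>, Vec<usize>)> {
    let fold_size = n / k;
    (0..k)
        .map(|fold| {
            let start = fold * fold_size;
            // The last fold absorbs any remainder.
            let end = if fold == k - 1 { n } else { start + fold_size };
            let val: Vec<usize> = (start..end).collect();
            let train: Vec<usize> = (0..n).filter(|i| *i < start || *i >= end).collect();
            (train, val)
        })
        .collect()
}
```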

Run with:
```bash
cargo run --example 09_cross_validation
```

---

### 10_ensemble_learning.rs
**Model ensembling for improved performance**

Demonstrates:
- Voting ensemble (hard and soft voting)
- Averaging ensemble (regression)
- Weighted voting based on model performance
- Stacking ensemble (meta-learner)
- Bagging utilities (bootstrap aggregating)

Key concepts:
- Model diversity and combination
- Hard vs. soft voting (sketched below)
- Meta-learning with stacking
- Bootstrap sampling for bagging
- Out-of-bag validation
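
Soft voting averages each model's class probabilities before taking the argmax, preserving confidence information that hard voting throws away. A standalone sketch:

```rust
/// Soft voting: average per-model class probabilities, then take the argmax.
/// `per_model_probs[m][c]` is model m's probability for class c.
/// Standalone sketch of the idea, not the crate's ensemble API.
fn soft_vote(per_model_probs: &[Vec<f64>]) -> usize {
    let num_classes = per_model_probs[0].len();
    let num_models = per_model_probs.len() as f64;
    (0..num_classes)
        .map(|c| per_model_probs.iter().map(|p| p[c]).sum::<f64>() / num_models)
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(c, _)| c)
        .unwrap()
}
```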

Run with:
```bash
cargo run --example 10_ensemble_learning
```

---

### 11_advanced_integration.rs
**End-to-end production training pipeline**

Demonstrates:
- Complete workflow combining multiple advanced features
- Hyperparameter optimization with random search
- Cross-validation for robust evaluation
- Ensemble learning with bagging
- Production-grade training pipeline

Key concepts:
- Integrating multiple features in a realistic workflow
- Best practices for model development
- Systematic approach to hyperparameter tuning
- Robust evaluation strategies
- Ensemble methods for improved predictions

Workflow:
1. **Hyperparameter Optimization**: Random search to find optimal learning rate and batch size
2. **Cross-Validation**: 5-fold CV for robust performance estimation
3. **Ensemble Learning**: Train 5 models with bagging and soft voting

This example represents a complete, production-ready training pipeline
that can serve as a template for real-world projects.

Run with:
```bash
cargo run --example 11_advanced_integration
```

---

## Common Patterns

### Creating a Trainer

```rust
use tensorlogic_train::{Trainer, TrainerConfig, MseLoss, AdamOptimizer, OptimizerConfig};

let config = TrainerConfig {
    num_epochs: 50,
    batch_size: 32,
    shuffle: true,
    validation_frequency: 1,
    ..Default::default()
};

let loss = Box::new(MseLoss);
let optimizer = Box::new(AdamOptimizer::new(OptimizerConfig::default()));
let trainer = Trainer::new(config, loss, optimizer);
```

### Adding Callbacks

```rust
use tensorlogic_train::{CallbackList, EpochCallback, EarlyStoppingCallback};

let mut callbacks = CallbackList::new();
callbacks.add(Box::new(EpochCallback::new(true)));
callbacks.add(Box::new(EarlyStoppingCallback::new(10, 0.001)));

let trainer = trainer.with_callbacks(callbacks);
```

### Adding Metrics

```rust
use tensorlogic_train::{MetricTracker, Accuracy, F1Score};

let mut metrics = MetricTracker::new();
metrics.add(Box::new(Accuracy::default()));
metrics.add(Box::new(F1Score::default()));

let trainer = trainer.with_metrics(metrics);
```

### Training

```rust
let history = trainer.train(
    &train_data.view(),
    &train_targets.view(),
    Some(&val_data.view()),
    Some(&val_targets.view()),
    &mut parameters,
)?;
```

## Advanced Topics

### Custom Callbacks

Implement the `Callback` trait to create custom training hooks:

```rust
use tensorlogic_train::{Callback, TrainingState, TrainResult};

struct MyCallback;

impl Callback for MyCallback {
    fn on_epoch_end(&mut self, epoch: usize, state: &TrainingState) -> TrainResult<()> {
        println!("Epoch {} completed with loss: {}", epoch, state.train_loss);
        Ok(())
    }
}
```

### Custom Metrics

Implement the `Metric` trait for domain-specific evaluation:

```rust
use tensorlogic_train::{Metric, TrainResult};
use scirs2_core::ndarray::{ArrayView, Ix2};

struct MyMetric;

impl Metric for MyMetric {
    fn name(&self) -> &str {
        "MyMetric"
    }

    fn compute(&mut self, predictions: &ArrayView<f64, Ix2>, targets: &ArrayView<f64, Ix2>) -> TrainResult<f64> {
        // Your metric computation
        Ok(0.0)
    }

    fn reset(&mut self) {}
}
```

### Learning Rate Scheduling

Different schedulers for different scenarios:

- **StepLR**: Step decay at fixed intervals
- **ExponentialLR**: Exponential decay
- **CosineAnnealingLR**: Smooth cosine decay (recommended for most cases; see the formula below)
- **OneCycleLR**: Super-convergence schedule
- **CyclicLR**: Cyclical learning rates
- **WarmupCosine**: Warmup + cosine annealing
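
For reference, cosine annealing follows the standard schedule, where `t` is the current epoch and `T` the total number of epochs:

```
lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))
```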

Example:
```rust
use tensorlogic_train::{OneCycleLrScheduler, LrScheduler};

let scheduler = Box::new(OneCycleLrScheduler::new(
    0.0001,  // initial_lr
    0.01,    // max_lr
    50,      // total_epochs
));

let trainer = trainer.with_scheduler(scheduler);
```

## Tips

### Basic Training (Examples 01-05)

1. **Start Simple**: Begin with `01_basic_training.rs` to understand the basics
2. **Monitor Metrics**: Use `02_classification_with_metrics.rs` to learn comprehensive evaluation
3. **Use Callbacks**: Leverage `03_callbacks_and_checkpointing.rs` for production training
4. **Logical Constraints**: Apply domain knowledge with `04_logical_loss_training.rs`
5. **Performance Monitoring**: Use `05_profiling_and_monitoring.rs` for optimization insights

### Advanced Features (Examples 06-10)

6. **Curriculum Learning** (`06_curriculum_learning.rs`):
   - Start with LinearCurriculum for predictable progression
   - Use SelfPacedCurriculum when model performance varies significantly
   - Combine with early stopping for best results
   - Monitor difficulty distribution across epochs

7. **Transfer Learning** (`07_transfer_learning.rs`):
   - Begin with feature extraction (frozen encoder) for 10 epochs
   - Then use discriminative fine-tuning with smaller LRs
   - Progressive unfreezing helps prevent catastrophic forgetting
   - Use lower learning rates (1e-5 to 1e-4) for fine-tuning

8. **Hyperparameter Optimization** (`08_hyperparameter_optimization.rs`):
   - Use grid search for ≤3 hyperparameters
   - Use random search for ≥4 hyperparameters
   - Always use log-uniform for learning rates
   - Run random search 2-3× longer than grid would take

9. **Cross-Validation** (`09_cross_validation.rs`):
   - Always use stratified K-fold for classification
   - Use time series split for temporal data
   - Report mean ± std dev for transparency
   - 5-10 folds provides good bias-variance tradeoff

10. **Ensemble Learning** (`10_ensemble_learning.rs`):
    - Use 5-10 models for good diversity vs. cost
    - Soft voting > hard voting (when applicable)
    - Stacking can outperform simple voting
    - Ensure model diversity for best results

### General Best Practices

11. **Hyperparameter Tuning**:
    - Use `LearningRateFinder` to find optimal learning rate
    - Start with Adam optimizer for most problems
    - Use AdamW for better generalization
    - Try OneCycleLR for faster convergence

12. **Regularization**:
    - Label smoothing for classification
    - Weight decay (especially with AdamW)
    - Early stopping to prevent overfitting
    - Dropout (implement in your model)
    - Curriculum learning for gradual difficulty

13. **Debugging**:
    - Use `GradientMonitor` to detect training issues
    - Check confusion matrices for classification problems
    - Monitor per-class metrics for imbalanced datasets
    - Save checkpoints frequently
    - Profile with `ProfilingCallback` to find bottlenecks

14. **Production Workflow**:
    - Use cross-validation for robust evaluation
    - Hyperparameter search on dev set
    - Curriculum learning for complex tasks
    - Transfer learning when pretrained models available
    - Ensemble top models for final deployment

## Further Reading

- [Main README](../../README.md) - Project overview
- [API Documentation](../src/lib.rs) - Complete API reference
- [TODO](../TODO.md) - Upcoming features

## Contributing

Found a bug or have an example idea? See [CONTRIBUTING.md](../../../CONTRIBUTING.md).

## License

Apache-2.0