ML Toolkit — machine learning loss functions, optimizer state types, activations, and metrics.
§Determinism Contract
- All functions are deterministic (no randomness except seeded kfold).
- Kahan summation for all reductions.
- Stable sort for AUC-ROC with index tie-breaking.
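The Kahan summation requirement can be sketched as follows; `kahan_sum` is an illustrative standalone function, not this crate's exported API.

```rust
// Kahan (compensated) summation: carries a running error term so the
// reduction does not silently lose low-order bits, making results
// reproducible regardless of accumulated rounding drift.
fn kahan_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0; // compensation for lost low-order bits
    for &x in xs {
        let y = x - c;
        let t = sum + y;
        c = (t - sum) - y; // recovers the part of y that was rounded away
        sum = t;
    }
    sum
}

fn main() {
    // Naive summation of ten 0.1s drifts away from 1.0; the compensated
    // sum stays at the correctly rounded result.
    let xs = vec![0.1_f64; 10];
    println!("naive = {}", xs.iter().sum::<f64>());
    println!("kahan = {}", kahan_sum(&xs));
}
```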
Structs§
- AdamState - Adam optimizer state.
- ConfusionMatrix - Binary confusion matrix.
- EarlyStoppingState - Early stopping state tracker.
- LbfgsState - L-BFGS optimizer state.
- SgdState - SGD optimizer state.
Functions§
- accuracy - Accuracy: (TP + TN) / total.
- adam_step - Adam step: sequential, deterministic.
- apply_dropout - Apply dropout: element-wise multiply data by mask.
- auc_roc - AUC-ROC via trapezoidal rule. DETERMINISM: sort by score with stable sort + index tie-breaking.
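The tie-breaking rule for `auc_roc` can be illustrated with a small sketch; `ranked_indices` is a made-up helper, not the crate's API.

```rust
// Deterministic ordering for AUC-ROC: sort by score descending, breaking
// score ties by original index so the ranking is a total order and is
// identical across platforms and sort implementations.
fn ranked_indices(scores: &[f64]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    // slice::sort_by is stable in Rust; the explicit index tie-break keeps
    // the ordering deterministic even under an unstable sort.
    idx.sort_by(|&a, &b| {
        scores[b]
            .partial_cmp(&scores[a])
            .unwrap()
            .then(a.cmp(&b))
    });
    idx
}

fn main() {
    let scores = [0.3, 0.9, 0.3, 0.5];
    // The two 0.3 scores keep their original relative order (index 0 first).
    println!("{:?}", ranked_indices(&scores)); // [1, 3, 0, 2]
}
```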
- batch_indices - Creates deterministic batch index ranges for mini-batch training.
- batch_norm - Batch normalization (inference mode). y = gamma * (x - running_mean) / sqrt(running_var + eps) + beta.
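A minimal sketch of the documented inference-mode formula, assuming scalar per-feature parameters for brevity; the real function's signature may differ.

```rust
// Batch normalization, inference mode:
// y = gamma * (x - running_mean) / sqrt(running_var + eps) + beta
fn batch_norm_inference(
    x: &[f64],
    gamma: f64,
    beta: f64,
    running_mean: f64,
    running_var: f64,
    eps: f64,
) -> Vec<f64> {
    let inv_std = 1.0 / (running_var + eps).sqrt();
    x.iter()
        .map(|&v| gamma * (v - running_mean) * inv_std + beta)
        .collect()
}

fn main() {
    // mean 2, var 1 => values are mapped to their z-scores.
    let y = batch_norm_inference(&[1.0, 3.0], 1.0, 0.0, 2.0, 1.0, 0.0);
    println!("{:?}", y); // [-1.0, 1.0]
}
```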
- binary_cross_entropy - Binary cross-entropy: -sum(t*ln(p) + (1-t)*ln(1-p)) / n.
- bootstrap - Bootstrap confidence interval for a statistic (e.g., mean). Returns (point_estimate, ci_lower, ci_upper, standard_error). stat_fn is 0=mean, 1=median.
- confusion_matrix - Build confusion matrix from predicted and actual boolean labels.
- cross_entropy_loss - Cross-entropy loss: -sum(target * ln(pred + eps)) / n.
- dropout_mask - Dropout mask generation using seeded RNG for determinism. Returns mask of 0.0 and scale values (1/(1-p)) using inverted dropout.
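The inverted-dropout idea can be sketched as below. The xorshift generator is a stand-in assumption for whatever seeded RNG the crate actually uses; only the 0.0 / 1/(1-p) mask structure comes from the description above.

```rust
// Inverted dropout: the mask holds 0.0 (dropped) or 1/(1-p) (kept), so the
// expected activation is unchanged and inference needs no rescaling.
fn dropout_mask(n: usize, p: f64, seed: u64) -> Vec<f64> {
    let scale = 1.0 / (1.0 - p);
    let mut state = seed.max(1); // xorshift state must be nonzero
    (0..n)
        .map(|_| {
            // xorshift64: cheap, fully deterministic for a given seed
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            let u = (state >> 11) as f64 / (1u64 << 53) as f64; // uniform [0,1)
            if u < p { 0.0 } else { scale }
        })
        .collect()
}

fn main() {
    let mask = dropout_mask(8, 0.5, 42);
    // Same seed => same mask; determinism is the whole point.
    assert_eq!(mask, dropout_mask(8, 0.5, 42));
    println!("{:?}", mask);
}
```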
- embedding - Embedding lookup: maps integer indices to dense vectors.
- f1_score - F1 score: 2 * (precision * recall) / (precision + recall).
- gru_cell - GRU cell forward pass.
- gru_cell_fused - Fused GRU cell: minimizes intermediate tensor allocations.
- hinge_loss - Hinge loss: sum(max(0, 1 - target * pred)) / n.
- huber_loss - Huber loss: quadratic for small errors, linear for large.
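The quadratic/linear split in the Huber loss is easy to get wrong at the boundary, so here is a minimal sketch; the function name and signature are illustrative, not the crate's.

```rust
// Huber loss: quadratic within |error| <= delta, linear beyond, averaged
// over n. The linear branch delta*(|e| - delta/2) matches the quadratic
// branch's value and slope at |e| == delta, so the loss is C1-continuous.
fn huber_loss(pred: &[f64], target: &[f64], delta: f64) -> f64 {
    let total: f64 = pred
        .iter()
        .zip(target)
        .map(|(&p, &t)| {
            let e = (p - t).abs();
            if e <= delta {
                0.5 * e * e
            } else {
                delta * (e - 0.5 * delta)
            }
        })
        .sum();
    total / pred.len() as f64
}

fn main() {
    // error 0.5 takes the quadratic branch (0.125); error 3.0 takes the
    // linear branch (1.0 * 2.5); mean is 1.3125.
    println!("{}", huber_loss(&[0.5, 3.0], &[0.0, 0.0], 1.0)); // 1.3125
}
```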
- kfold_indices - K-fold cross-validation indices. DETERMINISM: uses seeded RNG (Fisher-Yates).
- l1_grad - L1 regularization gradient: lambda * sign(params).
- l1_penalty - L1 regularization penalty: lambda * sum(|params|).
- l2_grad - L2 regularization gradient: lambda * params.
- l2_penalty - L2 regularization penalty: 0.5 * lambda * sum(params^2).
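The four regularization entries above pair up as penalty/gradient; a sketch directly from those formulas (standalone functions, not the crate's signatures):

```rust
// L1/L2 regularization terms and their gradients:
//   l1_penalty = lambda * sum(|w|),        l1_grad = lambda * sign(w)
//   l2_penalty = 0.5 * lambda * sum(w^2),  l2_grad = lambda * w
// The 0.5 factor in l2_penalty makes its gradient exactly lambda * w.
fn l1_penalty(w: &[f64], lambda: f64) -> f64 {
    lambda * w.iter().map(|x| x.abs()).sum::<f64>()
}
fn l1_grad(w: &[f64], lambda: f64) -> Vec<f64> {
    // Note: Rust's f64::signum returns 1.0 for +0.0; a real implementation
    // may special-case zero parameters.
    w.iter().map(|x| lambda * x.signum()).collect()
}
fn l2_penalty(w: &[f64], lambda: f64) -> f64 {
    0.5 * lambda * w.iter().map(|x| x * x).sum::<f64>()
}
fn l2_grad(w: &[f64], lambda: f64) -> Vec<f64> {
    w.iter().map(|x| lambda * x).collect()
}

fn main() {
    let w = [3.0, -4.0];
    // l1 penalty = 0.1 * 7, l2 penalty = 0.05 * 25
    println!("{} {}", l1_penalty(&w, 0.1), l2_penalty(&w, 0.1));
    println!("{:?}", l1_grad(&w, 0.1));
}
```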
- lbfgs_step - L-BFGS step with strong Wolfe line search.
- lr_cosine - Learning rate schedule: cosine annealing. lr = min_lr + 0.5 * (max_lr - min_lr) * (1 + cos(pi * epoch / total_epochs)).
- lr_linear_warmup - Learning rate schedule: linear warmup. lr = initial_lr * min(1.0, epoch / warmup_epochs).
- lr_step_decay - Learning rate schedule: step decay. lr = initial_lr * decay_rate^(floor(epoch / step_size)).
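The three schedule formulas above can be written down directly; parameter names are illustrative and may not match the crate's signatures.

```rust
use std::f64::consts::PI;

// Cosine annealing: max_lr at epoch 0, min_lr at epoch == total.
fn lr_cosine(epoch: u32, total: u32, min_lr: f64, max_lr: f64) -> f64 {
    min_lr + 0.5 * (max_lr - min_lr) * (1.0 + (PI * epoch as f64 / total as f64).cos())
}

// Linear warmup: ramps from 0 to initial_lr over warmup_epochs, then holds.
fn lr_linear_warmup(epoch: u32, warmup_epochs: u32, initial_lr: f64) -> f64 {
    initial_lr * (epoch as f64 / warmup_epochs as f64).min(1.0)
}

// Step decay: multiplies by decay_rate once per completed step_size epochs.
fn lr_step_decay(epoch: u32, step_size: u32, initial_lr: f64, decay_rate: f64) -> f64 {
    initial_lr * decay_rate.powi((epoch / step_size) as i32)
}

fn main() {
    println!("{}", lr_cosine(0, 100, 0.0, 1.0)); // starts at max_lr
    println!("{}", lr_linear_warmup(5, 10, 0.2)); // halfway through warmup
    println!("{}", lr_step_decay(20, 10, 0.1, 0.5)); // decayed twice
}
```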
- lstm_cell - LSTM cell forward pass.
- lstm_cell_fused - Fused LSTM cell: minimizes intermediate tensor allocations.
- mse_loss - Mean Squared Error: sum((pred - target)^2) / n.
- multi_head_attention - Multi-head attention: Q, K, V projections + scaled dot-product attention + output projection.
- pca - Principal Component Analysis via SVD of centered data.
- permutation_test - Permutation test: test whether two groups differ on a statistic. Returns (observed_diff, p_value).
- precision - Precision: TP / (TP + FP).
- recall - Recall / sensitivity: TP / (TP + FN).
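The classification metrics above all derive from the same four confusion-matrix counts; a sketch working from pre-tallied counts (the crate's functions may instead take label arrays):

```rust
// Binary classification metrics from confusion-matrix counts, matching
// the formulas in the list above.
fn accuracy(tp: u32, tn: u32, total: u32) -> f64 {
    (tp + tn) as f64 / total as f64
}
fn precision(tp: u32, fp: u32) -> f64 {
    tp as f64 / (tp + fp) as f64
}
fn recall(tp: u32, fn_: u32) -> f64 {
    tp as f64 / (tp + fn_) as f64
}
// F1 is the harmonic mean of precision and recall.
fn f1_score(p: f64, r: f64) -> f64 {
    2.0 * p * r / (p + r)
}

fn main() {
    // tp=8, fp=2, fn=4, tn=6 => total 20
    let (p, r) = (precision(8, 2), recall(8, 4));
    println!("acc={} p={} r={} f1={}", accuracy(8, 6, 20), p, r, f1_score(p, r));
}
```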
- sgd_step - SGD step: sequential, deterministic.
- stratified_split - Stratified train/test split: maintains class proportions in both sets. labels is an array of integer class labels, test_frac is the fraction for the test set. Returns (train_indices, test_indices).
- train_test_split - Train/test split indices.
- wolfe_line_search - Strong Wolfe line search.