SciRS2 Metrics
Comprehensive machine learning evaluation metrics for the SciRS2 scientific computing ecosystem. Covers classification, regression, clustering, ranking, object detection, information retrieval, generative model evaluation, fairness, segmentation, and streaming/online metrics — with SIMD acceleration and parallel processing throughout.
Features
Classification Metrics
- Accuracy, precision, recall, F1-score, F-beta score
- Matthews correlation coefficient (MCC), Cohen's kappa
- Balanced accuracy, specificity, sensitivity
- ROC curve, AUC
- Precision-recall curve and average precision (AP)
- Confusion matrix and classification report
- Log loss (cross-entropy), Brier score
- Hinge loss, Hamming loss, Jaccard score
- Multi-class and multi-label support (micro/macro/weighted averaging)
- Optimal threshold finding (G-means, custom criteria)
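As an illustration of how several of the listed scores derive from the same four confusion-matrix counts, here is a minimal std-only sketch. The `Confusion` type and its methods are hypothetical names chosen for this example and are not the crate's API; undefined ratios are mapped to 0.0 by assumption.

```rust
/// Counts of a binary confusion matrix (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Confusion {
    pub tp: usize,
    pub fp: usize,
    pub tn: usize,
    pub fn_: usize, // `fn` is a Rust keyword, hence the trailing underscore
}

/// Divide, returning 0.0 when the denominator is zero (an assumed convention).
fn safe_div(num: f64, den: f64) -> f64 {
    if den == 0.0 { 0.0 } else { num / den }
}

impl Confusion {
    /// Tally predictions against ground truth; `true` is the positive class.
    pub fn from_labels(y_true: &[bool], y_pred: &[bool]) -> Self {
        assert_eq!(y_true.len(), y_pred.len(), "inputs must have equal length");
        let mut c = Confusion { tp: 0, fp: 0, tn: 0, fn_: 0 };
        for (&t, &p) in y_true.iter().zip(y_pred) {
            match (t, p) {
                (true, true) => c.tp += 1,
                (false, true) => c.fp += 1,
                (false, false) => c.tn += 1,
                (true, false) => c.fn_ += 1,
            }
        }
        c
    }

    pub fn precision(&self) -> f64 {
        safe_div(self.tp as f64, (self.tp + self.fp) as f64)
    }

    pub fn recall(&self) -> f64 {
        safe_div(self.tp as f64, (self.tp + self.fn_) as f64)
    }

    /// F-beta score; `beta = 1.0` gives the F1-score.
    pub fn f_beta(&self, beta: f64) -> f64 {
        let (p, r) = (self.precision(), self.recall());
        let b2 = beta * beta;
        safe_div((1.0 + b2) * p * r, b2 * p + r)
    }

    /// Matthews correlation coefficient.
    pub fn mcc(&self) -> f64 {
        let (tp, fp) = (self.tp as f64, self.fp as f64);
        let (tn, fn_) = (self.tn as f64, self.fn_ as f64);
        let den = ((tp + fp) * (tp + fn_) * (tn + fp) * (tn + fn_)).sqrt();
        safe_div(tp * tn - fp * fn_, den)
    }
}
```

Calling `Confusion::from_labels(&y_true, &y_pred)` once and then reading `precision()`, `recall()`, `f_beta(1.0)`, and `mcc()` avoids re-scanning the label arrays for each score.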
Regression Metrics
- MSE, RMSE, MAE, median absolute error, max error
- R² score, explained variance, adjusted R²
- MAPE (mean absolute percentage error), SMAPE (symmetric MAPE)
- MSLE (mean squared log error), Huber loss
- Quantile loss (pinball loss), Tweedie deviance
- Relative absolute error, relative squared error
- Normalized RMSE
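For reference, the three most common regression scores can be written in a few lines of plain Rust. This is a sketch of the textbook definitions using slices; the function names are illustrative and do not reflect the crate's signatures.

```rust
/// Mean squared error: average of squared residuals.
pub fn mse(y_true: &[f64], y_pred: &[f64]) -> f64 {
    assert_eq!(y_true.len(), y_pred.len(), "inputs must have equal length");
    assert!(!y_true.is_empty(), "inputs must not be empty");
    let sum: f64 = y_true.iter().zip(y_pred).map(|(&t, &p)| (t - p).powi(2)).sum();
    sum / y_true.len() as f64
}

/// Mean absolute error: average of absolute residuals.
pub fn mae(y_true: &[f64], y_pred: &[f64]) -> f64 {
    assert_eq!(y_true.len(), y_pred.len(), "inputs must have equal length");
    assert!(!y_true.is_empty(), "inputs must not be empty");
    let sum: f64 = y_true.iter().zip(y_pred).map(|(&t, &p)| (t - p).abs()).sum();
    sum / y_true.len() as f64
}

/// Coefficient of determination: 1 - SS_res / SS_tot.
/// Returns None when the targets are constant (SS_tot == 0).
pub fn r2(y_true: &[f64], y_pred: &[f64]) -> Option<f64> {
    assert_eq!(y_true.len(), y_pred.len(), "inputs must have equal length");
    assert!(!y_true.is_empty(), "inputs must not be empty");
    let mean = y_true.iter().sum::<f64>() / y_true.len() as f64;
    let ss_tot: f64 = y_true.iter().map(|&t| (t - mean).powi(2)).sum();
    let ss_res: f64 = y_true.iter().zip(y_pred).map(|(&t, &p)| (t - p).powi(2)).sum();
    if ss_tot == 0.0 { None } else { Some(1.0 - ss_res / ss_tot) }
}
```

RMSE follows as `mse(..).sqrt()`. Returning `None` for constant targets is a design choice of this sketch; other libraries pick a sentinel value instead.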
Clustering Metrics
- Internal (no ground truth): Silhouette score/samples, Calinski-Harabasz index, Davies-Bouldin index, Dunn index
- External (with ground truth): Adjusted Rand Index (ARI), Normalized MI, Adjusted MI, V-measure, Fowlkes-Mallows score
- Homogeneity, completeness, contingency matrix, pair confusion matrix
- Cluster stability, consensus scoring, gap statistic
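To make the external metrics concrete, the following sketch computes the Adjusted Rand Index from a contingency table using only the standard library. The function name `adjusted_rand` is hypothetical, and returning 1.0 for degenerate inputs is an assumption of this example.

```rust
use std::collections::HashMap;

/// Number of unordered pairs among `n` items.
fn choose2(n: usize) -> f64 {
    let n = n as f64;
    n * (n - 1.0) / 2.0
}

/// Adjusted Rand Index between two labelings of the same items.
/// Label values are arbitrary; only the induced partitions matter.
pub fn adjusted_rand(truth: &[usize], pred: &[usize]) -> f64 {
    assert_eq!(truth.len(), pred.len(), "inputs must have equal length");

    // Contingency table cells plus row and column totals.
    let mut cells: HashMap<(usize, usize), usize> = HashMap::new();
    let mut rows: HashMap<usize, usize> = HashMap::new();
    let mut cols: HashMap<usize, usize> = HashMap::new();
    for (&t, &p) in truth.iter().zip(pred) {
        *cells.entry((t, p)).or_insert(0) += 1;
        *rows.entry(t).or_insert(0) += 1;
        *cols.entry(p).or_insert(0) += 1;
    }

    let sum_cells: f64 = cells.values().map(|&n| choose2(n)).sum();
    let sum_rows: f64 = rows.values().map(|&n| choose2(n)).sum();
    let sum_cols: f64 = cols.values().map(|&n| choose2(n)).sum();
    let total = choose2(truth.len());
    if total == 0.0 {
        return 1.0; // fewer than two items: treated as perfect agreement
    }

    let expected = sum_rows * sum_cols / total;
    let max_index = 0.5 * (sum_rows + sum_cols);
    if max_index == expected {
        return 1.0; // degenerate case, e.g. both labelings are a single cluster
    }
    (sum_cells - expected) / (max_index - expected)
}
```

Because the score depends only on co-membership of pairs, relabeling clusters (for example swapping 0 and 1) leaves the result unchanged.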
Ranking and Information Retrieval
- NDCG (normalized discounted cumulative gain), DCG
- Mean Average Precision (MAP), MAP@k
- Mean Reciprocal Rank (MRR)
- Precision@k, Recall@k
- Kendall's tau, Spearman's rank correlation
- Label ranking average precision (LRAP)
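A minimal sketch of DCG and NDCG at a cutoff `k` is shown below. It assumes the linear-gain variant (gain equals the relevance value, discount `log2(rank + 1)`); some libraries use the exponential gain `2^rel - 1` instead. The function names are illustrative only.

```rust
/// DCG@k for relevance values already listed in ranked order.
/// Uses linear gain and a log2(rank + 1) discount, with ranks starting at 1.
pub fn dcg_at_k(ranked_relevance: &[f64], k: usize) -> f64 {
    ranked_relevance
        .iter()
        .take(k)
        .enumerate()
        .map(|(i, &rel)| rel / ((i as f64) + 2.0).log2())
        .sum()
}

/// NDCG@k: DCG of the ranking induced by `scores`, divided by the ideal DCG.
/// Returns 0.0 when no item is relevant.
pub fn ndcg_at_k(relevance: &[f64], scores: &[f64], k: usize) -> f64 {
    assert_eq!(relevance.len(), scores.len(), "inputs must have equal length");

    // Rank items by descending score (stable, so ties keep input order).
    let mut order: Vec<usize> = (0..scores.len()).collect();
    order.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    let ranked: Vec<f64> = order.iter().map(|&i| relevance[i]).collect();

    // The ideal ranking sorts by true relevance.
    let mut ideal = relevance.to_vec();
    ideal.sort_by(|a, b| b.total_cmp(a));

    let idcg = dcg_at_k(&ideal, k);
    if idcg == 0.0 { 0.0 } else { dcg_at_k(&ranked, k) / idcg }
}
```

Normalizing by the ideal DCG keeps the score in [0, 1] for non-negative relevance values, which makes queries with different numbers of relevant items comparable.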
Object Detection Metrics
- Intersection over Union (IoU) for bounding boxes
- Average Precision (AP), mean AP (mAP) at IoU thresholds
- Non-Maximum Suppression (NMS) utilities
- PASCAL VOC and COCO-style evaluation protocols
- Per-class AP breakdown
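The two building blocks of detection evaluation, box IoU and greedy NMS, can be sketched as follows. Boxes are assumed to be `[x1, y1, x2, y2]` with `x1 <= x2` and `y1 <= y2`; the `BBox` alias and function names are hypothetical and not the crate's API.

```rust
/// Axis-aligned box as [x1, y1, x2, y2].
pub type BBox = [f64; 4];

/// Intersection over Union of two boxes; 0.0 when they do not overlap.
pub fn iou(a: BBox, b: BBox) -> f64 {
    let iw = (a[2].min(b[2]) - a[0].max(b[0])).max(0.0);
    let ih = (a[3].min(b[3]) - a[1].max(b[1])).max(0.0);
    let inter = iw * ih;
    let area = |r: BBox| (r[2] - r[0]).max(0.0) * (r[3] - r[1]).max(0.0);
    let union = area(a) + area(b) - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Greedy non-maximum suppression.
/// Returns indices of kept boxes, highest score first.
pub fn nms(boxes: &[BBox], scores: &[f64], iou_threshold: f64) -> Vec<usize> {
    assert_eq!(boxes.len(), scores.len(), "inputs must have equal length");
    let mut order: Vec<usize> = (0..boxes.len()).collect();
    order.sort_by(|&i, &j| scores[j].total_cmp(&scores[i]));

    let mut keep: Vec<usize> = Vec::new();
    for i in order {
        // Keep a box only if it does not overlap too much with any kept box.
        if keep.iter().all(|&k| iou(boxes[i], boxes[k]) <= iou_threshold) {
            keep.push(i);
        }
    }
    keep
}
```

A detection is typically counted as a true positive for mAP@0.5 when its IoU with an unmatched ground-truth box is at least 0.5, so the same `iou` helper serves both NMS and matching.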
Generative Model Evaluation
- Fréchet Inception Distance (FID)
- Inception Score (IS)
- Precision and Recall for generative models
- Maximum Mean Discrepancy (MMD)
- Kernel-based evaluation metrics
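As a small illustration of a kernel-based two-sample metric, the sketch below computes the biased estimate of squared MMD with an RBF kernel. It is restricted to one-dimensional samples for brevity, and the function names and the `gamma` parameterization are assumptions of this example.

```rust
/// RBF kernel on scalars: exp(-gamma * (x - y)^2).
pub fn rbf(x: f64, y: f64, gamma: f64) -> f64 {
    (-gamma * (x - y).powi(2)).exp()
}

/// Mean kernel value over all pairs drawn from `a` and `b`.
fn mean_kernel(a: &[f64], b: &[f64], gamma: f64) -> f64 {
    let mut sum = 0.0;
    for &x in a {
        for &y in b {
            sum += rbf(x, y, gamma);
        }
    }
    sum / (a.len() * b.len()) as f64
}

/// Biased estimate of squared MMD between two 1-D samples:
/// mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
pub fn mmd2_biased(x: &[f64], y: &[f64], gamma: f64) -> f64 {
    assert!(!x.is_empty() && !y.is_empty(), "samples must not be empty");
    mean_kernel(x, x, gamma) + mean_kernel(y, y, gamma) - 2.0 * mean_kernel(x, y, gamma)
}
```

The estimate is zero when both samples are identical and grows as the distributions separate. In practice the inputs would be feature vectors rather than scalars, with the squared difference replaced by a squared Euclidean distance.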
Segmentation Metrics
- Pixel accuracy, mean pixel accuracy
- Intersection over Union (IoU) per-class and mean IoU
- Dice coefficient, Jaccard index
- Boundary F-measure
- Panoptic Quality (PQ)
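For binary masks, pixel accuracy, Dice, and Jaccard all reduce to counting overlaps. The sketch below uses flattened `bool` masks; the function names are illustrative, and scoring two empty masks as 1.0 is an assumed convention.

```rust
/// Returns (intersection, predicted positives, true positives in the mask).
fn overlap_counts(pred: &[bool], truth: &[bool]) -> (usize, usize, usize) {
    assert_eq!(pred.len(), truth.len(), "masks must have equal length");
    let mut inter = 0;
    let mut n_pred = 0;
    let mut n_truth = 0;
    for (&p, &t) in pred.iter().zip(truth) {
        if p { n_pred += 1; }
        if t { n_truth += 1; }
        if p && t { inter += 1; }
    }
    (inter, n_pred, n_truth)
}

/// Dice coefficient: 2|A ∩ B| / (|A| + |B|).
pub fn dice(pred: &[bool], truth: &[bool]) -> f64 {
    let (inter, n_pred, n_truth) = overlap_counts(pred, truth);
    let denom = n_pred + n_truth;
    if denom == 0 { 1.0 } else { 2.0 * inter as f64 / denom as f64 }
}

/// Jaccard index (IoU): |A ∩ B| / |A ∪ B|.
pub fn jaccard(pred: &[bool], truth: &[bool]) -> f64 {
    let (inter, n_pred, n_truth) = overlap_counts(pred, truth);
    let union = n_pred + n_truth - inter;
    if union == 0 { 1.0 } else { inter as f64 / union as f64 }
}

/// Fraction of pixels whose predicted label matches the ground truth.
pub fn pixel_accuracy(pred: &[bool], truth: &[bool]) -> f64 {
    assert_eq!(pred.len(), truth.len(), "masks must have equal length");
    assert!(!pred.is_empty(), "masks must not be empty");
    let hits = pred.iter().zip(truth).filter(|(p, t)| p == t).count();
    hits as f64 / pred.len() as f64
}
```

Dice and Jaccard are monotonically related by `dice = 2J / (1 + J)`, so they rank models identically and differ only in scale.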
Fairness and Bias Detection
- Demographic parity difference and ratio
- Equalized odds difference
- Equal opportunity difference
- Disparate impact ratio
- Consistency score across groups
- Slice analysis for subgroup performance
- Intersectional fairness measures
- Bias detection and robustness testing
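Two of the group-fairness measures above compare positive-prediction rates between groups. The following is a minimal sketch under the assumption of binary predictions and integer group ids; the function names are hypothetical and the crate's own signatures may differ.

```rust
/// Share of positive predictions within one group, or None if the group is empty.
pub fn positive_rate(y_pred: &[bool], groups: &[u8], group: u8) -> Option<f64> {
    assert_eq!(y_pred.len(), groups.len(), "inputs must have equal length");
    let mut n = 0usize;
    let mut positives = 0usize;
    for (&p, &g) in y_pred.iter().zip(groups) {
        if g == group {
            n += 1;
            if p {
                positives += 1;
            }
        }
    }
    if n == 0 { None } else { Some(positives as f64 / n as f64) }
}

/// Absolute difference in positive rates between two groups (0.0 means parity).
pub fn demographic_parity_difference(
    y_pred: &[bool],
    groups: &[u8],
    a: u8,
    b: u8,
) -> Option<f64> {
    let ra = positive_rate(y_pred, groups, a)?;
    let rb = positive_rate(y_pred, groups, b)?;
    Some((ra - rb).abs())
}

/// Ratio of the unprivileged group's positive rate to the privileged group's.
/// Returns None if either group is empty or the privileged rate is zero.
pub fn disparate_impact_ratio(
    y_pred: &[bool],
    groups: &[u8],
    unprivileged: u8,
    privileged: u8,
) -> Option<f64> {
    let u = positive_rate(y_pred, groups, unprivileged)?;
    let p = positive_rate(y_pred, groups, privileged)?;
    if p == 0.0 { None } else { Some(u / p) }
}
```

Neither measure looks at the true labels; equalized odds and equal opportunity differ in that they compare error rates conditioned on the ground truth.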
Advanced Regression Metrics (regression_advanced)
- Pinball (quantile) loss for quantile regression
- Interval score for prediction interval evaluation
- Coverage probability, interval width
- Winkler score
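The quantile and interval scores can be sketched directly from their definitions. The example below assumes the common form of the interval score for a central `(1 - alpha)` prediction interval, which is also frequently what is meant by the Winkler score; the function names are illustrative.

```rust
/// Mean pinball (quantile) loss at quantile level `q` in [0, 1].
pub fn pinball_loss(y_true: &[f64], y_pred: &[f64], q: f64) -> f64 {
    assert!((0.0..=1.0).contains(&q), "q must lie in [0, 1]");
    assert_eq!(y_true.len(), y_pred.len(), "inputs must have equal length");
    assert!(!y_true.is_empty(), "inputs must not be empty");
    let total: f64 = y_true
        .iter()
        .zip(y_pred)
        .map(|(&y, &p)| {
            let d = y - p;
            // Under-prediction is weighted by q, over-prediction by (1 - q).
            if d >= 0.0 { q * d } else { (q - 1.0) * d }
        })
        .sum();
    total / y_true.len() as f64
}

/// Interval score for one observation and a central (1 - alpha) interval:
/// width plus a penalty of 2/alpha times the distance by which `y` misses it.
pub fn interval_score(y: f64, lower: f64, upper: f64, alpha: f64) -> f64 {
    assert!(alpha > 0.0 && alpha < 1.0, "alpha must lie in (0, 1)");
    let mut score = upper - lower;
    if y < lower {
        score += 2.0 / alpha * (lower - y);
    }
    if y > upper {
        score += 2.0 / alpha * (y - upper);
    }
    score
}

/// Fraction of observations that fall inside their prediction interval.
pub fn coverage(y_true: &[f64], lower: &[f64], upper: &[f64]) -> f64 {
    assert!(y_true.len() == lower.len() && y_true.len() == upper.len());
    assert!(!y_true.is_empty(), "inputs must not be empty");
    let inside = (0..y_true.len())
        .filter(|&i| lower[i] <= y_true[i] && y_true[i] <= upper[i])
        .count();
    inside as f64 / y_true.len() as f64
}
```

Lower is better for both losses. Reporting `coverage` alongside the interval score helps separate intervals that are too wide from intervals that miss too often.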
Streaming Metrics (Online Estimation)
- Memory-efficient online evaluation for large-scale and real-time applications
- Optimization patterns (streaming/optimization/patterns/):
  - Batching: group evaluations into batches for efficiency
  - Buffering: ring-buffer-based streaming metric windows
  - Partitioning: shard metrics by key/group
  - Windowing: sliding and tumbling window metrics
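The windowing and buffering patterns can be illustrated with a sliding-window accuracy that uses constant memory. `SlidingAccuracy` below is a hypothetical type written for this example with `std::collections::VecDeque`; it is not the crate's `SlidingWindowMetric`.

```rust
use std::collections::VecDeque;

/// Accuracy over the most recent `capacity` predictions.
pub struct SlidingAccuracy {
    capacity: usize,
    window: VecDeque<bool>, // true = prediction was correct
    correct: usize,         // running count of `true` entries in `window`
}

impl SlidingAccuracy {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self {
            capacity,
            window: VecDeque::with_capacity(capacity),
            correct: 0,
        }
    }

    /// Record one outcome, evicting the oldest when the window is full.
    pub fn update(&mut self, is_correct: bool) {
        if self.window.len() == self.capacity {
            if let Some(true) = self.window.pop_front() {
                self.correct -= 1;
            }
        }
        self.window.push_back(is_correct);
        if is_correct {
            self.correct += 1;
        }
    }

    /// Current windowed accuracy, or None before any update.
    pub fn value(&self) -> Option<f64> {
        if self.window.is_empty() {
            None
        } else {
            Some(self.correct as f64 / self.window.len() as f64)
        }
    }
}
```

Keeping a running `correct` count makes both `update` and `value` O(1), so the cost per prediction does not depend on the window size.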
Statistical Testing and Validation
- McNemar's test for classifier comparison
- Cochran's Q test for multiple classifiers
- Friedman test (non-parametric)
- Wilcoxon signed-rank test
- Bootstrap confidence intervals
- Cross-validation utilities (K-fold, stratified, time series)
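As an example of classifier comparison, McNemar's test only needs the two discordant counts: cases where exactly one of the classifiers is correct. The sketch below computes the continuity-corrected statistic; the function names are illustrative, and the p-value computation is omitted because the standard library has no chi-square CDF.

```rust
/// Approximate 95% critical value of the chi-square distribution with 1 degree of freedom.
pub const CHI2_1DF_95: f64 = 3.841;

/// Returns (b, c): b = only classifier A is correct, c = only classifier B is correct.
pub fn discordant_counts(y_true: &[u32], pred_a: &[u32], pred_b: &[u32]) -> (usize, usize) {
    assert!(y_true.len() == pred_a.len() && y_true.len() == pred_b.len());
    let mut b = 0;
    let mut c = 0;
    for i in 0..y_true.len() {
        let a_ok = pred_a[i] == y_true[i];
        let b_ok = pred_b[i] == y_true[i];
        match (a_ok, b_ok) {
            (true, false) => b += 1,
            (false, true) => c += 1,
            _ => {} // concordant cases carry no information for this test
        }
    }
    (b, c)
}

/// Continuity-corrected McNemar statistic: (|b - c| - 1)^2 / (b + c).
/// Returns None when there are no discordant cases.
pub fn mcnemar_statistic(b: usize, c: usize) -> Option<f64> {
    let n = b + c;
    if n == 0 {
        return None;
    }
    let diff = (b as f64 - c as f64).abs();
    Some((diff - 1.0).powi(2) / n as f64)
}
```

A statistic above `CHI2_1DF_95` suggests the two classifiers' error rates differ at the 5% level. For small discordant counts an exact binomial test is usually preferred over this chi-square approximation.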
Bayesian Evaluation
- Bayes factor model comparison
- BIC, AIC, WAIC, LOO-CV information criteria
- Posterior predictive checks
- Bayesian model averaging
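The two simplest information criteria follow directly from the maximized log-likelihood. This sketch shows AIC and BIC, plus the usual BIC-based approximation to a Bayes factor; the function names are hypothetical, and WAIC and LOO-CV are not covered because they need posterior samples.

```rust
/// Akaike information criterion: 2k - 2 ln L.
pub fn aic(log_likelihood: f64, n_params: usize) -> f64 {
    2.0 * n_params as f64 - 2.0 * log_likelihood
}

/// Bayesian information criterion: k ln n - 2 ln L.
pub fn bic(log_likelihood: f64, n_params: usize, n_samples: usize) -> f64 {
    assert!(n_samples > 0, "n_samples must be positive");
    n_params as f64 * (n_samples as f64).ln() - 2.0 * log_likelihood
}

/// Rough Bayes factor in favor of model 1 over model 2,
/// using the approximation BF_12 ≈ exp((BIC_2 - BIC_1) / 2).
pub fn approx_bayes_factor(bic_model1: f64, bic_model2: f64) -> f64 {
    ((bic_model2 - bic_model1) / 2.0).exp()
}
```

For both criteria lower values indicate a better trade-off between fit and complexity. BIC penalizes extra parameters more heavily than AIC once `ln n` exceeds 2, which happens from 8 samples onward.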
Visualization
- ROC curve, precision-recall curve, calibration curve
- Confusion matrix heatmap
- Learning and validation curves
- Histogram and scatter plots
- Dashboard server (HTTP, real-time, with Chart.js)
- Plotters and Plotly backends
Installation
[dependencies]
scirs2-metrics = "0.4.4"
Selective features:
[dependencies]
scirs2-metrics = { version = "0.4.4", features = ["neural_common", "plotters_backend"] }
Available features:
- plotly_backend (default): interactive web visualizations
- optim_integration (default): integration with scirs2-optimize
- neural_common: integration with scirs2-neural
- plotters_backend: static PNG/SVG via Plotters
- dashboard_server: HTTP dashboard server (requires tokio)
Quick Start
Classification
use ;
use array;
let y_true = array!;
let y_pred = array!;
let y_scores = array!;
let accuracy = accuracy_score?;
let precision = precision_score?;
let recall = recall_score?;
let f1 = f1_score?;
let auc = roc_auc_score?;
println!;
Regression
use ;
use array;
let y_true = array!;
let y_pred = array!;
let mse = mean_squared_error?;
let mae = mean_absolute_error?;
let r2 = r2_score?;
println!;
Clustering
use ;
use ;
let data = arr2;
let pred = array!;
let truth = array!;
let silhouette = silhouette_score?;
let db = davies_bouldin_score?;
let ari = adjusted_rand_index?;
println!;
Object Detection
use ;
// Compute IoU between predicted and ground-truth bounding boxes
// Boxes in [x1, y1, x2, y2] format
let pred_box = ;
let true_box = ;
let iou = iou_score;
// mAP@0.5 over multiple classes
let map50 = compute_map?;
println!;
Information Retrieval
use ;
use array;
// NDCG@5 for a single query
let relevance = array!;
let scores = array!;
let ndcg = ndcg_score?;
// MAP over a query set
let map = mean_average_precision?;
println!;
Fairness Metrics
use ;
let y_true = array!;
let y_pred = array!;
let groups = array!; // 0 = group A, 1 = group B
let dp_diff = demographic_parity?;
let eo_diff = equalized_odds?;
let di_ratio = disparate_impact?;
println!;
Streaming Metrics
use ;
// Sliding window accuracy over last 1000 predictions
let mut window = new;
for in prediction_stream
// Batch evaluations for efficiency
let mut batcher = new;
for batch in dataset.batches
let final_metrics = batcher.finalize?;
Visualization
use ;
let = roc_curve?;
let viz = visualize_roc_curve;
let options = new
.with_width
.with_height
.with_grid
.with_legend;
viz.save_to_file?;
Interactive Dashboard
use ;
let mut config = default;
config.title = "Training Dashboard".to_string();
config.refresh_interval = 2;
let dashboard = new;
dashboard.add_metric?;
dashboard.add_metric?;
// Start HTTP server on port 8080 (requires `dashboard_server` feature)
start_http_server?;
// Export results
let json = dashboard.export_to_json?;
let html = dashboard.generate_html?;
Integration with SciRS2 Ecosystem
Neural Networks (neural_common feature)
use ;
let accuracy = accuracy;
let f1 = f1_score;
let callback = new;
// Pass callback to scirs2-neural trainer
Optimization (optim_integration feature)
use ;
let mut scheduler = new;
let new_lr = scheduler.step_with_metric;
let params = vec!;
let mut tuner = new;
let result = tuner.random_search?;
API Summary
| Module | Key Functions |
|---|---|
| `classification` | `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`, `confusion_matrix`, `average_precision_score` |
| `regression` | `mean_squared_error`, `mean_absolute_error`, `r2_score`, `mape`, `explained_variance_score` |
| `clustering` | `silhouette_score`, `calinski_harabasz_score`, `davies_bouldin_score`, `adjusted_rand_index`, `normalized_mutual_info_score` |
| `ranking` | `ndcg_score`, `mean_average_precision`, `mrr_score`, `precision_at_k` |
| `detection` | `iou_score`, `average_precision`, `compute_map`, `nms` |
| `fairness.advanced` | `demographic_parity`, `equalized_odds`, `disparate_impact` |
| `segmentation` | `pixel_accuracy`, `mean_iou`, `dice_coefficient` |
| `generative` | `frechet_inception_distance`, `inception_score`, `mmd` |
| `streaming.optimization.patterns` | `BatchingAccumulator`, `SlidingWindowMetric`, `BufferingAccumulator`, `PartitionedMetric` |
| `evaluation` | `cross_val_score`, `train_test_split`, `learning_curve`, `grid_search_cv` |
| `visualization` | `visualize_roc_curve`, `visualize_confusion_matrix`, `visualize_metric` |
Performance
- SIMD acceleration with automatic hardware detection (SSE2, AVX2, AVX-512)
- Parallel processing via Rayon for batch metric computation
- Memory-efficient streaming algorithms for large-scale evaluation
- 142+ comprehensive tests with numerical precision validation
- Zero-warning builds
License
Licensed under the Apache License 2.0. See LICENSE for details.
Authors
COOLJAPAN OU (Team KitaSan)