ML-Powered Rosetta Diagnostics (ROSETTA-ML-001)
“Grepping is the stone age. ML enables automatic root cause analysis.”
This module implements ML-based diagnostics for format conversion using aprender’s own algorithms (dogfooding), with no external dependencies.
§Theoretical Foundation
- Tarantula SBFL: Jones et al. (2002) - fault localization via suspiciousness
- Mahalanobis Distance: Mahalanobis (1936) - multivariate anomaly detection
- Wilson Score: Wilson (1927) - confidence intervals for binomial proportions
- BM25 + RRF: Robertson (1994), Cormack (2009) - hybrid retrieval
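The Tarantula and Wilson formulas listed above are short enough to sketch directly. The following is a minimal illustration in plain Rust, assuming nothing about this module's actual types: the function names `tarantula_suspiciousness` and `wilson_interval`, their parameters, and the edge-case conventions (returning 0.0 or the full interval when there is no data) are hypothetical choices, not aprender's API.

```rust
/// Tarantula suspiciousness (Jones et al., 2002) for a single decision.
/// Hypothetical helper for illustration; not part of aprender's API.
///
/// `failed_with` / `passed_with`: failing / passing conversions that took the decision.
/// `total_failed` / `total_passed`: all failing / passing conversions in the batch.
pub fn tarantula_suspiciousness(
    failed_with: u32,
    passed_with: u32,
    total_failed: u32,
    total_passed: u32,
) -> f64 {
    // Fraction of all failures (and of all passes) that involve this decision.
    let fail_ratio = if total_failed == 0 {
        0.0
    } else {
        f64::from(failed_with) / f64::from(total_failed)
    };
    let pass_ratio = if total_passed == 0 {
        0.0
    } else {
        f64::from(passed_with) / f64::from(total_passed)
    };
    let denom = fail_ratio + pass_ratio;
    // Assumption: a decision that was never exercised is scored 0.0.
    if denom == 0.0 {
        0.0
    } else {
        fail_ratio / denom
    }
}

/// Wilson score interval (Wilson, 1927) for a binomial proportion.
/// `z` is the normal quantile, e.g. 1.96 for roughly 95% confidence.
/// Hypothetical helper for illustration; not part of aprender's API.
pub fn wilson_interval(successes: u32, trials: u32, z: f64) -> (f64, f64) {
    // Assumption: with no observations, return the uninformative interval.
    if trials == 0 {
        return (0.0, 1.0);
    }
    let n = f64::from(trials);
    let p = f64::from(successes) / n;
    let z2 = z * z;
    let denom = 1.0 + z2 / n;
    let center = (p + z2 / (2.0 * n)) / denom;
    let half_width = z * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt() / denom;
    // Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    ((center - half_width).max(0.0), (center + half_width).min(1.0))
}
```

For example, a decision present in 3 of 4 failing conversions and 1 of 4 passing ones scores 0.75 under this sketch. The Wilson interval stays usable for small samples, which is why it is a common choice for success rates learned from only a few observations.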
§Dogfooding Matrix
| Task | Algorithm | Module |
|---|---|---|
| Error prediction | LinearRegression | aprender::linear_model |
| Failure clustering | KMeans | aprender::cluster |
| Feature reduction | PCA | aprender::preprocessing |
| Error classification | GaussianNB | aprender::classification |
§References
- Jones, J. A., et al. (2002). Visualization of test information. ICSE ’02.
- Mahalanobis, P. C. (1936). On the generalised distance in statistics.
- Wilson, E. B. (1927). Probable inference. JASA 22(158).
- Robertson, S. E., et al. (1994). Okapi at TREC-3.
- Cormack, G. V., et al. (2009). Reciprocal rank fusion. SIGIR ’09.
Structs§
- AnomalyDetector - Anomaly detector using regularized Mahalanobis distance.
- CanaryFile - Canary file for a model
- CategorySummary - Category summary with Tarantula suspiciousness
- ConversionIssue - Conversion issue identified in Hansei analysis
- DecisionStats - Decision statistics for Tarantula suspiciousness calculation
- ErrorPattern - Error pattern with learned success rate
- ErrorPatternLibrary - Error pattern library with hybrid retrieval
- HanseiReport - Hansei (reflection) report for conversion batch
- TarantulaTracker - Tarantula fault localization tracker
- TensorCanary - Tensor statistics for canary regression testing
- TensorFeatures - 12-dimensional feature vector for tensor anomaly detection.
- WilsonScore - Wilson score confidence interval for binomial proportions.
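The hybrid retrieval mentioned for the error pattern library combines several rankings into one. A minimal sketch of reciprocal rank fusion (RRF) in plain Rust follows, assuming each retriever returns an ordered list of pattern identifiers. The function name `reciprocal_rank_fusion`, the string identifiers, and the tie-breaking rule are hypothetical and do not describe how the library's own types implement fusion.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each ranking contributes 1 / (k + rank) to an
/// item's score, with ranks starting at 1. Hypothetical helper for
/// illustration; not part of aprender's API.
pub fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (index, id) in ranking.iter().enumerate() {
            // `index` is 0-based, so the 1-based rank is index + 1.
            let contribution = 1.0 / (k + index as f64 + 1.0);
            *scores.entry((*id).to_string()).or_insert(0.0) += contribution;
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by identifier so the
    // output order is deterministic (an assumption of this sketch).
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Calling `reciprocal_rank_fusion(&[bm25_ranking, other_ranking], 60.0)` with two ranked lists returns the identifiers ordered by fused score. The constant `k = 60` is the value commonly used with RRF; whether this module uses the same constant is not stated in the source.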
Enums§
- AndonLevel - Andon alert levels (Toyota Production System)
- ConversionCategory - Conversion category for Pareto analysis
- ConversionDecision - Granular conversion decisions tracked for Tarantula analysis.
- FixAction - Fix action for conversion errors
- JidokaViolation - Jidoka stop conditions detected in tensor features
- PatternSource - Source of error pattern
- Priority - Priority level for conversion decisions
- Regression - Regression type detected by canary
- Severity - Issue severity for Hansei report
- Trend - Trend direction for conversion quality