Module rosetta_ml

ML-Powered Rosetta Diagnostics (ROSETTA-ML-001)

“Grepping is the stone age. ML enables automatic root cause analysis.”

This module implements ML-based diagnostics for format conversion using aprender’s own algorithms (dogfooding). No external dependencies.

§Theoretical Foundation

  • Tarantula SBFL: Jones et al. (2002) - fault localization via suspiciousness
  • Mahalanobis Distance: Mahalanobis (1936) - multivariate anomaly detection
  • Wilson Score: Wilson (1927) - confidence intervals for binomial proportions
  • BM25 + RRF: Robertson (1994), Cormack (2009) - hybrid retrieval
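As a rough illustration of the first and third items above, the sketch below computes a Tarantula suspiciousness score and a Wilson score interval from raw counts. The function names and signatures are hypothetical and are not taken from this module's API; they only show the standard formulas from the cited papers.

```rust
/// Tarantula suspiciousness for one element (for example, one conversion decision).
///
/// Hypothetical helper, not part of `rosetta_ml`:
///   susp = (failed_with / total_failed)
///        / (failed_with / total_failed + passed_with / total_passed)
pub fn tarantula_suspiciousness(
    failed_with: u32,
    passed_with: u32,
    total_failed: u32,
    total_passed: u32,
) -> f64 {
    // Treat a ratio with an empty denominator as 0.0 to avoid dividing by zero.
    let fail_ratio = if total_failed == 0 {
        0.0
    } else {
        f64::from(failed_with) / f64::from(total_failed)
    };
    let pass_ratio = if total_passed == 0 {
        0.0
    } else {
        f64::from(passed_with) / f64::from(total_passed)
    };
    let denom = fail_ratio + pass_ratio;
    if denom == 0.0 {
        0.0 // the element was never exercised, so there is no evidence either way
    } else {
        fail_ratio / denom
    }
}

/// Wilson score interval for a binomial proportion.
///
/// Hypothetical helper, not part of `rosetta_ml`. `z` is the normal quantile
/// (1.96 for a 95% interval). Returns `None` for invalid counts.
pub fn wilson_interval(successes: u64, trials: u64, z: f64) -> Option<(f64, f64)> {
    if trials == 0 || successes > trials {
        return None;
    }
    let n = trials as f64;
    let p = successes as f64 / n;
    let z2 = z * z;
    let denom = 1.0 + z2 / n;
    let center = (p + z2 / (2.0 * n)) / denom;
    let half_width = z * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt() / denom;
    // Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    Some(((center - half_width).max(0.0), (center + half_width).min(1.0)))
}
```

A score near 1.0 means the element appears mostly in failing runs; a score near 0.0 means it appears mostly in passing runs. For 8 successes in 10 trials at z = 1.96, the interval is roughly (0.49, 0.94), which is noticeably wider than the point estimate of 0.8 suggests.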

§Dogfooding Matrix

| Task | Algorithm | Module |
|---|---|---|
| Error prediction | LinearRegression | aprender::linear_model |
| Failure clustering | KMeans | aprender::cluster |
| Feature reduction | PCA | aprender::preprocessing |
| Error classification | GaussianNB | aprender::classification |

§References

  1. Jones, J. A., et al. (2002). Visualization of test information to assist fault localization. ICSE ’02.
  2. Mahalanobis, P. C. (1936). On the generalised distance in statistics.
  3. Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. JASA 22(158).
  4. Robertson, S. E., et al. (1994). Okapi at TREC-3.

Structs§

AnomalyDetector
Anomaly detector using regularized Mahalanobis distance.
CanaryFile
Canary file for a model
CategorySummary
Category summary with Tarantula suspiciousness
ConversionIssue
Conversion issue identified in Hansei analysis
DecisionStats
Decision statistics for Tarantula suspiciousness calculation
ErrorPattern
Error pattern with learned success rate
ErrorPatternLibrary
Error pattern library with hybrid retrieval
HanseiReport
Hansei (reflection) report for conversion batch
TarantulaTracker
Tarantula fault localization tracker
TensorCanary
Tensor statistics for canary regression testing
TensorFeatures
12-dimensional feature vector for tensor anomaly detection.
WilsonScore
Wilson score confidence interval for binomial proportions.
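ErrorPatternLibrary is described as using hybrid retrieval, and the module overview cites BM25 + RRF. The sketch below shows reciprocal rank fusion over already-ranked result lists. It is an assumption about how such fusion can be done, not the library's actual implementation: the function name, the use of string IDs, and the constant `k` are all hypothetical (60 is the value commonly used for RRF).

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion (RRF) over several ranked lists of pattern IDs.
///
/// Hypothetical helper, not part of `rosetta_ml`. Each inner `Vec` is one
/// ranking, best match first. An ID's fused score is the sum over rankings
/// of `1 / (k + rank)`, with ranks starting at 1.
pub fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest score first; ties are broken by ID so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because RRF uses only ranks, it can combine a lexical ranking (such as BM25) with any other ranking without normalizing their raw scores against each other.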

Enums§

AndonLevel
Andon alert levels (Toyota Production System)
ConversionCategory
Conversion category for Pareto analysis
ConversionDecision
Granular conversion decisions tracked for Tarantula analysis.
FixAction
Fix action for conversion errors
JidokaViolation
Jidoka stop conditions detected in tensor features
PatternSource
Source of error pattern
Priority
Priority level for conversion decisions
Regression
Regression type detected by canary
Severity
Issue severity for Hansei report
Trend
Trend direction for conversion quality