# Scoring Convergence & Hardening
> Sub-spec of [pmat-spec.md](../pmat-spec.md) | Component 21
## Problem Statement
Four scoring commands exist independently. None share analysis, none persist
results to a shared history, and none gate each other:
| Command | Output | Persists | Gates |
|---------|--------|----------|-------|
| `pmat comply check` | pass/fail + exit code | `.pmat/project.toml` (config + timestamp) | Exit code only |
| `pmat rust-project-score` | 289 pts, grade A-F | Nothing | Nothing |
| `pmat repo-score` | 0-110, separate formula | Optional `--output` file, `--update-badge` | Nothing |
| `pmat work score <id>` | DBC 5-dim, 0-1 | `.pmat-work/` contract files | `work complete` only |
Comply checks for pattern presence (e.g., CB-501 counts `unwrap()` calls,
CB-503 checks `.clippy.toml` existence) while RPS runs actual tools (e.g.,
`cargo clippy`, `cargo audit`). Both analyze dead code and dependency counts
independently. Neither writes to `.pmat-metrics/`, so there is no historical
trend data and no regression detection.
`pmat repo-score` is a third overlapping scorer with yet another formula.
## Design Principle: One Canonical Score Path
**`pmat score` is the single entry point.** All other scoring commands become
internal subsystems that feed into it. Users run ONE command. CI gates on
ONE number.
```
pmat score ← the ONLY command users/CI should run
├── runs comply check internally (CB-xxx checks)
├── runs rust-project-score internally (11 category scorers)
├── reads coverage from cache or runs cargo llvm-cov
├── reads DBC portfolio from .pmat-work/
├── computes geometric composite
├── writes ALL results to .pmat-metrics/commit-<sha>-meta.json
└── exits 0/1 based on gate threshold
```
### Deprecation Status
`pmat score` is implemented. Current status:
| Command | Status | Notes |
|---------|--------|-------|
| `pmat comply check` | Keep as standalone + internal subsystem | Also feeds `pmat score` |
| `pmat rust-project-score` | Keep as standalone + internal subsystem | Also feeds `pmat score` |
| `pmat repo-score` | **Deprecated**; `score` alias freed | Replaced by `pmat score` |
| `pmat work score` | Keep | Per-contract scoring, feeds DBC sub-score |
| `pmat work codebase-score` | Keep | Portfolio aggregate, feeds DBC sub-score |
`pmat comply check` and `pmat rust-project-score` remain available as
standalone commands for debugging. `pmat score` is the canonical path
for gates, CI, and hooks.
## 1. Composite Score (`pmat score`)
### Sub-Score Normalization
| Sub-score | Source | Normalization |
|-----------|--------|---------------|
| RPS | `RustProjectScoreOrchestrator` | Category-average % (already normalized) |
| Comply | `pmat comply check --format json` | `100 - (errors * 10 + warnings * 3)`, floor 0 |
| Coverage | `.pmat-metrics/coverage.result` | Line coverage % |
| Muda | CB-300 (size-normalized) | `100 - muda_score` (lower waste = higher) |
| EvoScore | `.pmat-metrics/commit-*-tests.json` | `pass/total * 100` |
| DBC | `compute_codebase_score()` | 50% coverage + 30% lint + 20% score |
| File Health | density-based | `(1 - over1000/total) * 100` |
| PV Lint | `pv lint --format json` (planned) | 3-gate pass rate * 100 |
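The normalization rules in the table are simple enough to express directly. The sketch below uses hypothetical helper names (`comply_subscore`, `muda_inv_subscore`, `evo_subscore`, `file_health_subscore`), not pmat's actual functions, and reads `over1000` as the number of files longer than 1000 lines. Returning `None` when a denominator is zero is an added assumption; the table does not say how empty inputs are handled.

```rust
/// Comply: 100 - (errors * 10 + warnings * 3), floored at 0.
fn comply_subscore(errors: u32, warnings: u32) -> f64 {
    let penalty = f64::from(errors) * 10.0 + f64::from(warnings) * 3.0;
    (100.0 - penalty).max(0.0)
}

/// Muda: invert the waste score so that lower waste gives a higher sub-score.
fn muda_inv_subscore(muda_score: f64) -> f64 {
    (100.0 - muda_score).clamp(0.0, 100.0)
}

/// EvoScore: pass/total * 100. None when there are no tests (assumption:
/// the caller decides the fallback, since pass/total is undefined).
fn evo_subscore(passed: u32, total: u32) -> Option<f64> {
    if total == 0 {
        None
    } else {
        Some(f64::from(passed) / f64::from(total) * 100.0)
    }
}

/// File Health: (1 - over1000/total) * 100, where `over1000` counts files
/// longer than 1000 lines. None for an empty file set (assumption).
fn file_health_subscore(files_over_1000_lines: u32, total_files: u32) -> Option<f64> {
    if total_files == 0 {
        None
    } else {
        let ratio = f64::from(files_over_1000_lines) / f64::from(total_files);
        Some((1.0 - ratio) * 100.0)
    }
}
```

For example, 2 comply errors and 7 warnings give a Comply sub-score of 59, and 12 errors already hit the floor of 0.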
### Formula
```
composite = (rps * comply * coverage * muda_inv * evo * dbc * file_health) ^ (1/7)
```
When PV Lint is integrated (§10), it becomes the 8th sub-score and
the exponent changes to 1/8.
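Because the composite is a geometric mean, a single zero sub-score drives the
composite to zero, and a weak sub-score cannot be offset by strong ones.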
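A minimal sketch of the composite, assuming sub-scores already normalized to 0-100 and a hypothetical `composite` function name. It averages logarithms, which is equivalent to taking the n-th root of the product, and short-circuits to 0 on any zero sub-score (treating negative inputs the same way is an assumption).

```rust
/// Geometric mean of sub-scores on a 0-100 scale.
/// exp(mean(ln s_i)) == (s_1 * s_2 * ... * s_n)^(1/n).
fn composite(sub_scores: &[f64]) -> f64 {
    // Assumption: empty input, or any non-positive sub-score, yields 0.
    // A zero sub-score "kills" the composite, matching the geometric mean.
    if sub_scores.is_empty() || sub_scores.iter().any(|&s| s <= 0.0) {
        return 0.0;
    }
    let n = sub_scores.len() as f64;
    let mean_ln = sub_scores.iter().map(|s| s.ln()).sum::<f64>() / n;
    mean_ln.exp()
}
```

For sub-scores of 100 and 25 the composite is 50, whereas an arithmetic mean would report 62.5; the geometric mean deliberately penalizes imbalance between sub-scores.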
### Orchestration
`pmat score` runs both subsystems and aggregates:
```
1. Build agent context index (if stale)
2. Run comply checks → produces ComplianceReport
   - Extract: error_count, warning_count, muda_score, evoscore,
     file_health, dead_code_pct from individual check results
3. Run RPS scorers → produces CategoryScore[]
   - Note: comply pattern-checks and RPS tool-runs are complementary,
     not duplicative (comply checks .clippy.toml exists, RPS runs clippy)
4. Read coverage cache (or run cargo llvm-cov if stale)
5. Read DBC portfolio from .pmat-work/
6. Compute composite from all sub-scores
7. Write everything to .pmat-metrics/commit-<sha>-meta.json
8. Run cross-validation invariants (XV-001 through XV-010)
9. Run regression check against previous commits
10. Print report, exit 0/1
```
### Metric Persistence
Every `pmat score` run writes to `.pmat-metrics/commit-<sha>-meta.json`:
```json
{
  "sha": "abc123",
  "timestamp": "2026-03-20T10:30:00Z",
  "composite": 72.4,
  "sub_scores": {
    "rps": 69.4,
    "comply": 85.0,
    "coverage": 94.2,
    "muda_inv": 63.7,
    "evoscore": 84.8,
    "dbc": 78.0,
    "file_health": 71.0
  },
  "rps_categories": { ... },
  "comply_errors": 2,
  "comply_warnings": 7,
  "cross_validation": { "passed": 8, "failed": 2, "violations": ["XV-001", "XV-008"] }
}
```
This is the ONLY place scores are persisted. All trend, regression, and
CI reporting reads from this file.
### CLI
```bash
pmat score # Full run: comply + RPS + composite
pmat score --gate 70 # Exit 1 if composite < 70
pmat score --format json # Machine-readable for CI
pmat score --trend # Sparkline of last 10 commits
pmat score --regression-check # Exit 1 if regression detected
```
## 2. Deep Integration with `pmat query`
`pmat query` already enriches results with TDG grade, coverage, churn,
faults, and duplicates. The composite score extends this in three ways:
### 2a. Score-Aware Ranking (`--rank-by score`)
Functions are ranked by how much they contribute to (or drag down) the
composite score, using the sub-scores as weights:
```bash
pmat query "handler" --rank-by score --limit 10
# Ranks by: tdg_weight * churn_weight * (1 - fault_count/10)
# Functions that are high-TDG, high-churn, and have faults rank first
# These are the functions most likely to improve the composite if fixed
```
### 2b. Score-Gated Search (`--min-score`, `--below-score`)
Filter query results by the score of the file each function lives in. When
the composite is D (66.9) and the bottlenecks are Code Quality (26.9%) and
Muda (63.7), `--below-score` surfaces functions in the files contributing to
those weak categories, while `--min-score` restricts results to files at or
above a threshold:
```bash
pmat query "parse" --min-score 70 # Only functions in files scoring >= 70
pmat query "cache" --below-score 50 # Only functions in files dragging score down
```
### 2c. Score Diagnosis (`pmat query --score-diagnosis`)
Shows which functions are most responsible for each sub-score. Maps
the composite breakdown to concrete code locations:
```bash
pmat query --score-diagnosis --limit 5
# Output:
# COMPOSITE: 66.9 (Grade D)
#
# Dragging Code Quality (26.9%):
# src/services/ast/languages/c_visitor.rs:parse_block cc=81 grade=F
# src/services/lightweight_provability.rs:analyze cc=80 grade=F
#
# Dragging Muda (63.7):
# Inventory: 132 dead items — top: src/services/cache.rs (23 dead)
# Over-processing: cc>20 in 5 files — top: c_visitor.rs (cc=81)
#
# Dragging File Health (71.0):
# 14 files >1000 lines — top: definition.rs (1280 lines)
```
This replaces ad-hoc investigation with a single query that maps the
composite score to actionable code locations via the existing index.
### 2d. Enrichment Flag (`--score`)
Add composite score data to regular query results, similar to existing
`--coverage` and `--churn` flags:
```bash
pmat query "error handling" --score --limit 5
# Each result shows: function, TDG grade, composite sub-score impact
```
The composite score data comes from the persisted
`.pmat-metrics/commit-<sha>-meta.json`. If that data is stale,
`pmat query --score` triggers a lightweight recomputation (comply and file
health only), skipping the full RPS run because it is slow.
## 3. CI Integration
### Single CI Gate
CI should run ONE command. Not `make lint && make test && pmat comply &&
pmat rust-project-score`. One command, one exit code, one artifact.
```yaml
# .github/workflows/quality-gate.yml
name: Quality Gate
on: [push, pull_request]
jobs:
  score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      # Assumes pmat is published on crates.io as `pmat`
      - name: Install pmat
        run: cargo install pmat
      - name: Quality Gate
        run: pmat score --gate 70 --format json > score.json
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: pmat-score
          path: score.json
```
### PR Status Check
`pmat score --format json` output is consumed by a GitHub Action or
CI script that posts the composite score as a PR status check:
```
✅ pmat score: 74.2 (+2.1 from base)
```
### Branch Protection + Clean-Room
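Configure branch protection to require the `pmat score` status check. The
clean-room gate runs `pmat score --gate "${PMAT_GATE_THRESHOLD:-60}" --format json`,
so the threshold defaults to 60 and can be overridden through `PMAT_GATE_THRESHOLD`.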
### Pre-Push Hook
```bash
# Installed by: pmat hooks install
# Fast path: reads cached .pmat-metrics/ JSON, O(1)
pmat score --regression-check --fast
```
If no cached score exists (first push), runs full `pmat score`.
## 4. Score Regression Gates (CB-145)
Compare current composite against `.pmat-metrics/commit-*-meta.json` history.
```
Block if:
- single_commit_delta < -5 (any sub-score dropped >5 pts)
- window_delta < -10 (sliding 5-commit degradation >10 pts)
```
Configuration in `.pmat.yaml`:
```yaml
score:
  gate: 70
  regression:
    single_commit_max_drop: 5
    window_size: 5
    window_max_drop: 10
```
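The two regression rules can be sketched as a check over one score series. Since the rules speak of any sub-score dropping, the same check could run on the composite and on each sub-score series. The names (`RegressionConfig`, `Verdict`, `regression_check`) are hypothetical, and the window semantics are an assumption: the sketch compares the current score with the oldest of the last `window_size` recorded scores.

```rust
/// Thresholds mirroring the `score.regression` keys in `.pmat.yaml`.
struct RegressionConfig {
    single_commit_max_drop: f64,
    window_size: usize,
    window_max_drop: f64,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    SingleCommitDrop(f64), // delta versus the previous commit
    WindowDrop(f64),       // delta versus the start of the window
}

/// `history` holds one score series for previous commits, oldest first.
fn regression_check(history: &[f64], current: f64, cfg: &RegressionConfig) -> Verdict {
    let Some(&previous) = history.last() else {
        return Verdict::Pass; // no history yet, e.g. the first scored commit
    };
    let single_delta = current - previous;
    if single_delta < -cfg.single_commit_max_drop {
        return Verdict::SingleCommitDrop(single_delta);
    }
    // Assumption: the window starts at the oldest of the last `window_size` scores.
    let start = history.len().saturating_sub(cfg.window_size.max(1));
    let window_delta = current - history[start];
    if window_delta < -cfg.window_max_drop {
        return Verdict::WindowDrop(window_delta);
    }
    Verdict::Pass
}
```

A steady slide of 2-3 points per commit never trips the single-commit rule but is caught by the window rule once it accumulates past 10 points.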
## 5. Cross-Validation Invariants (CB-146)
Run as part of `pmat score` step 8. Ten invariant rules detect contradictions
between sub-scores:
| ID | Invariant | Rationale |
|----|-----------|-----------|
| XV-001 | CB-200 pass => RPS Code Quality >= 40% | TDG grades and code quality must agree |
| XV-002 | Muda Inventory > 20 => File Health != A | High waste contradicts good file health |
| XV-003 | Coverage >= 90% => RPS Testing >= 60% | Coverage and testing score must agree |
| XV-004 | CB-304 dead code < 2% => Muda Inventory < 30 | Low dead code means low inventory |
| XV-005 | EvoScore > 0.8 => no regression in window | Improving trend and regression contradict |
| XV-006 | DBC score >= 80% => comply errors < 5 | High DBC and many errors contradict |
| XV-007 | RPS Grade A => composite >= 75 | Category A and low composite contradict |
| XV-008 | Comply 0 errors => RPS >= 60% | Clean comply and low RPS contradict |
| XV-009 | File health A => Muda Over-processing < 15 | Good files mean low complexity waste |
| XV-010 | Coverage < 50% => composite < 80 | Low coverage must cap composite |
Warns on individual violations. Errors if 3+ fail (systemic inconsistency).
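Each invariant is an implication, so it is violated only when its premise holds and its conclusion does not. The sketch below encodes three of the rules (XV-003, XV-006, XV-010) over a hypothetical `Snapshot` struct and applies the severity rule: warn on individual violations, error at three or more. The struct and function names are illustrative, not pmat's.

```rust
/// A subset of the values the invariants read; field names are hypothetical.
struct Snapshot {
    coverage: f64,    // line coverage %
    rps_testing: f64, // RPS Testing category %
    dbc: f64,         // DBC sub-score %
    comply_errors: u32,
    composite: f64,
}

#[derive(Debug, PartialEq)]
enum Severity {
    Clean,
    Warn,
    Error,
}

/// Returns the IDs of violated invariants: premise true, conclusion false.
fn violations(s: &Snapshot) -> Vec<&'static str> {
    let rules: [(&'static str, bool, bool); 3] = [
        ("XV-003", s.coverage >= 90.0, s.rps_testing >= 60.0),
        ("XV-006", s.dbc >= 80.0, s.comply_errors < 5),
        ("XV-010", s.coverage < 50.0, s.composite < 80.0),
    ];
    rules
        .iter()
        .filter(|(_, premise, conclusion)| *premise && !*conclusion)
        .map(|(id, _, _)| *id)
        .collect()
}

/// Individual violations warn; 3 or more signal a systemic inconsistency.
fn severity(violated: usize) -> Severity {
    match violated {
        0 => Severity::Clean,
        1 | 2 => Severity::Warn,
        _ => Severity::Error,
    }
}
```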
## 6. Cross-Crate Stack Compliance (CB-150)
`pmat score --stack` resolves sovereign dependencies from `Cargo.toml`, runs
lightweight scoring on each, and caps the project grade if any dependency scores low.
```bash
pmat score --stack # Include sovereign dep scores
pmat score --stack --deps-only # Show dep scores only
```
If any sovereign dep scores F, project caps at B.
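A sketch of the grade cap, assuming a simplified five-letter `Grade` enum (pmat's real grade type may have finer steps). Deriving `Ord` with variants listed best to worst means `max` picks the worse of two grades, which is exactly what a cap needs.

```rust
/// Letter grades ordered best to worst, so `a.max(b)` is the worse grade.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Grade {
    A,
    B,
    C,
    D,
    F,
}

/// If any sovereign dependency scores F, the project grade is capped at B:
/// an A becomes B, while B or worse is left unchanged.
fn stack_capped_grade(project: Grade, deps: &[Grade]) -> Grade {
    if deps.contains(&Grade::F) {
        project.max(Grade::B)
    } else {
        project
    }
}
```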
## 7. Work-Contract-DBC Binding (CB-1250)
`pmat work complete` runs existing quality gates (`run_quality_gates()`:
changed-module tests, clippy, golden traces) then runs `pmat score --gate 60`
as an advisory composite check (CB-1250). DBC scoring (5 dimensions:
spec_depth, falsification_coverage, invariant_health, subcontracting,
traceability) feeds the DBC sub-score in the composite.
```
pmat work complete PMAT-456
1. run_quality_gates() — tests, clippy, golden traces
2. ALL falsifiable claims must pass
3. DBC contract score >= threshold (default 70%)
4. pmat score --gate 60 (advisory composite check)
5. Evidence bundle written to .pmat-work/PMAT-456/
```
## 8. Spec-Work Linkage (CB-148)
Bidirectional traceability between `docs/specifications/components/*.md`
and `pmat work` tickets. Warns when spec planned sections have no
corresponding work items, or work items reference nonexistent specs.
```bash
pmat work list --spec repo-health # Work items linked to spec
pmat work list --spec-gap # Specs with no tickets
```
## 9. Leak-Driven Fault Checks
Empirical analysis of 241 fix commits across the sovereign stack (30-day
window, 2026-02-18 to 2026-03-20). Three toolable defect classes:
**CB-531 Silent Parameter Discard** (35/241, 15%): Clap `#[arg]` flags that are
accepted but never functionally read. Detected by tracing dataflow from the
Parser struct to the handler.
**CB-534 Division Safety + DIV_RISK fault** (20/241, 8%): Division where the
denominator can be zero and no guard checks it. Extends CB-528 (which covers
`.len()` denominators only) to all division patterns, and adds DIV_RISK to
`detect_fault_patterns()`.
Note: CB-532 was previously used for "assert in library API" (removed for
high false-positive rate). This check uses CB-534 to avoid ID reuse.
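For illustration, the functions below show the shape of code CB-534 is meant to flag and two guarded alternatives. They are examples of the defect class, not pmat's detector; `checked_div` is the standard integer method that returns `None` for a zero divisor.

```rust
/// Flagged shape: nothing guards `count`. Integer division by zero panics;
/// the float equivalent would silently produce NaN or infinity.
fn average_unguarded(total: u64, count: u64) -> u64 {
    total / count
}

/// Guarded integer form: `checked_div` returns None when `count` is 0.
fn average_guarded(total: u64, count: u64) -> Option<u64> {
    total.checked_div(count)
}

/// Guarded float form: an explicit zero check instead of propagating NaN.
fn pass_rate(passed: f64, total: f64) -> Option<f64> {
    if total == 0.0 {
        None
    } else {
        Some(passed / total)
    }
}
```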
**CB-533 Stale Path References** (20/241, 8%): Paths in Makefiles, CI YAML,
and string literals referencing deleted files/directories.
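A deliberately naive sketch of the CB-533 idea: pull slash-containing tokens out of a text file and report those that do not exist relative to a root directory. The `stale_paths` name and the tokenization heuristic are assumptions; a real check would parse Makefile, YAML, and string-literal syntax instead of splitting on whitespace.

```rust
use std::path::Path;

/// Treats every whitespace-separated token containing '/' as a candidate
/// path and returns the candidates that do not exist under `root`.
fn stale_paths(text: &str, root: &Path) -> Vec<String> {
    text.split_whitespace()
        // Strip quotes and trailing commas, e.g. from YAML or string literals.
        .map(|tok| tok.trim_matches(|c: char| c == '"' || c == '\'' || c == ','))
        // Skip URLs and shell/make variables such as $(OUT)/bin.
        .filter(|tok| tok.contains('/') && !tok.contains("://") && !tok.starts_with('$'))
        .filter(|tok| !root.join(tok).exists())
        .map(String::from)
        .collect()
}
```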
## 10. Provable Contracts Integration (CB-1201)
### Problem
`pv lint` runs as a standalone tool across the sovereign stack (trueno: 27
contracts; aprender: 12 apr-cli YAMLs + 109 bindings implemented; entrenar: 2;
realizar: 2). `pmat comply` now runs `pv lint` (CB-1201) and checks enforcement
penetration (CB-1340: aprender at 43.7% aggregate; apr-cli at 63%, below the
95% CLI threshold). `pmat score` integration is tracked as PV-01..PV-05.
### Design
Integrate `pv lint` as the 8th sub-score in `pmat score` and as a comply
enforcement gate in `pmat comply`.
**`pmat score` sub-score (PV Lint)**:
Five Whys root cause: `mean_score` measures contract *completeness* (how many
optional sections are filled in), not *quality*. A valid contract with basic
preconditions but no Kani harness scores ~0.45.
**CB-1202: Contract Coverage** checks whether repos with critical ML/GPU/data
functions have contracts covering them. Keywords: forward, backward, optimizer,
checkpoint, loss, gradient, sampling, kv_cache, tokenize, quantize, kernel,
dispatch, softmax, matmul, gemm, batch. A repo with 93 `forward` functions
and 0 contracts (realizar) = FAIL. Threshold: more than 50% of critical keywords
must have at least 1 contract.
Scoring must therefore weight gates, not just completeness:
```
if no contracts/ AND has critical keywords:
pv_score = 0.0 (FAIL — critical functions without contracts)
elif no contracts/:
pv_score = 50.0 (neutral — no critical functions)
else:
run: pv lint --format json
existence = 30 pts (contracts/ dir exists)
validity = 30 pts if validate gate passes, else 0
cleanliness = 25 pts if audit passes with 0 findings, 15 if findings, 0 if fails
completeness = mean_score * 15 pts (0-15 based on maturity)
pv_score = existence + validity + cleanliness + completeness (0-100)
```
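The gate-weighted pseudocode translates directly into a function. `PvLint` and `Audit` are hypothetical summaries of a `pv lint --format json` run (the real JSON schema is not specified here), `lint` being `None` stands for "no `contracts/` directory", and clamping `mean_score` to 0..=1 is an added assumption.

```rust
/// Outcome of the audit gate.
enum Audit {
    Clean,    // passes with 0 findings
    Findings, // passes with findings
    Failed,
}

/// Hypothetical summary of one `pv lint --format json` run.
struct PvLint {
    validate_passed: bool,
    audit: Audit,
    mean_score: f64, // contract completeness, 0.0..=1.0
}

/// PV Lint sub-score on 0-100. `lint` is None when there is no contracts/ dir.
fn pv_score(has_critical_keywords: bool, lint: Option<&PvLint>) -> f64 {
    let Some(l) = lint else {
        // No contracts: FAIL if critical functions exist, otherwise neutral.
        return if has_critical_keywords { 0.0 } else { 50.0 };
    };
    let existence = 30.0;
    let validity = if l.validate_passed { 30.0 } else { 0.0 };
    let cleanliness = match l.audit {
        Audit::Clean => 25.0,
        Audit::Findings => 15.0,
        Audit::Failed => 0.0,
    };
    let completeness = l.mean_score.clamp(0.0, 1.0) * 15.0;
    existence + validity + cleanliness + completeness
}
```

Because completeness contributes at most 15 points, a structurally valid, clean contract set scores at least 85 even with minimal optional sections, which is the intended shift away from completeness.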
**Critical Five Whys finding**: `pv lint` checks YAML structure only — it never
reads source code. `audit_contract()` takes `&Contract` (YAML data), not a
project path. `FalsificationTest.test` is `Option<String>` parsed but never
resolved. `tests_file`/`tests_module` YAML fields are silently dropped by serde.
Result: trueno declares 105 test refs, 85 don't exist, `pv lint` reports 0 findings.
**pmat enforcement**: ANY missing or failing test = PV score 0.0 (kills composite
via geometric mean). CB-1201 FAILS with "Unfalsifiable" message. This is the
Popperian standard — a claim without evidence is not a claim.
**Upstream fix needed** (PMAT-521): `pv lint` Gate 4 (source verification) +
Gate 5 (test execution). Until then, pmat's `compute_contract_fulfillment`
fills the gap by scanning source for `fn test_*` matches.
**`pmat comply` check (CB-1201)**:
```
if contracts/ exists:
run: pv lint --min-score 0.3
PASS if pv lint exits 0
WARN if pv lint exits 1 (findings below threshold)
FAIL if pv lint exits 2+ (validation errors)
else:
SKIP (no contracts directory)
```
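The exit-code mapping above, as a small sketch. `cb1201_status` and `CheckStatus` are hypothetical names, and `None` stands for the case where there is no `contracts/` directory, so `pv lint` never runs. Treating every code other than 0 and 1 as FAIL slightly generalizes "exits 2+" and is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum CheckStatus {
    Pass,
    Warn,
    Fail,
    Skip,
}

/// Maps the exit code of `pv lint --min-score 0.3` to a CB-1201 status.
fn cb1201_status(exit_code: Option<i32>) -> CheckStatus {
    match exit_code {
        None => CheckStatus::Skip,    // no contracts/ directory
        Some(0) => CheckStatus::Pass,
        Some(1) => CheckStatus::Warn, // findings below threshold
        Some(_) => CheckStatus::Fail, // validation errors (2+)
    }
}
```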
**CI gate**: `quality-gate.yml` runs `pv lint --format sarif` and uploads
findings as GitHub code scanning results.
### Cross-Validation
New invariant XV-011: if CB-1200 (provable contracts) passes and
`contracts/` has more than 10 YAML files, the PV Lint score must be >= 30%.
### Stack Enforcement
`pmat score --stack` runs `pv lint` on each sovereign dep with contracts.
Failed PV Lint in a dependency flags as a warning.
### Target: `core::contracts` (Compiler-Enforced)
ONE contract type: `#[core::contracts::requires]`/`#[ensures]` (Rust nightly,
tracking issue #128044). Zero cost by default, with opt-in `-Z contract-checks=yes`
for CI. No external crates. YAML contracts serve as a bridge until the nightly
feature stabilizes.
See [provable-contracts.md](provable-contracts.md) for full design.
## New Comply Checks Summary
| ID | Check | Severity | Description |
|----|-------|----------|-------------|
| CB-145 | Score Regression | Error | Blocks on quality regression > threshold |
| CB-146 | Score Cross-Validation | Warning/Error | Detects sub-score contradictions |
| CB-147 | Composite Score Gate | Error | Blocks if geometric composite < threshold |
| CB-148 | Spec-Work Traceability | Warning | Specs and work items cross-reference |
| CB-150 | Stack Quality Index | Warning/Error | Cross-crate sovereign stack compliance |
| CB-531 | Silent Parameter Discard | Warning | Clap flags accepted but never read |
| CB-534 | Division Safety | Warning | Division where denominator can be zero |
| CB-533 | Stale Path References | Warning | Paths referencing deleted files |
| CB-1201 | PV Lint Gate | Warning/Error | Provable contracts lint pass rate |
| CB-1202 | Contract Coverage | Error | Critical ML/GPU keywords must have contracts |
| CB-1250 | Work-DBC Binding | Warning/Error | Evidence chain on work completion |
## Implementation Priority
| Priority | Scope | Rationale |
|----------|-------|-----------|
| 1 | `pmat score` orchestrator, metric persistence | Foundation: one command, one file |
| 2 | `pmat query` integration (§2a-2d) | Score-aware search, diagnosis, ranking |
| 3 | CI workflow template, pre-push hook | Deep CI integration: single gate |
| 4 | CB-145 (regression), CB-146 (cross-validation) | Trend tracking + contradiction detection |
| 5 | CB-531, CB-534 (leak-driven) | Top defect classes by empirical volume |
| 6 | CB-150 (stack), CB-148 (spec-work), CB-1250 | Cross-repo + evidence chain |
| 7 | CB-1201 (PV Lint), 8th sub-score | Provable contracts in composite |
## Key Files
| Path | Status | Role |
|------|--------|------|
| `src/cli/handlers/score_handler.rs` | Exists | `pmat score` orchestrator + cross-validation |
| `src/cli/handlers/query_handler/` | Exists (extend) | Add `--score`, `--score-diagnosis`, `--rank-by score` |
| `src/cli/handlers/comply_handlers/check_handlers/check.rs` | Exists | Comply check orchestrator |
| `src/services/rust_project_score/orchestrator.rs` | Exists | RPS 11-category scorer |
| `src/cli/handlers/work_contract_scoring.rs` | Exists | DBC 5-dim per-contract scoring |
| `.pmat-metrics/commit-<sha>-meta.json` | Exists | Composite score persistence |
| `.github/workflows/quality-gate.yml` | Exists | CI template (single gate) |
## References
- RPS v3.0: [repo-health.md](repo-health.md)
- Quality Gates: [quality-gates.md](quality-gates.md)
- Work Management: [work-management.md](work-management.md)
- SWE-CI EvoScore: [swe-ci-evolution.md](swe-ci-evolution.md)
- Code Quality (DBC): [code-quality.md](code-quality.md)