// SHIP-TWO-001 AC-SHIP1-002 / FALSIFY-SHIP-002 algorithm-level PARTIAL discharge.
//
// Spec: docs/specifications/aprender-train/ship-two-models-spec.md
// Contract: contracts/qwen2-e2e-verification-v1.yaml (GATE-QW2E-SHIP-002,
// wired in the same PR that lands this file).
//
// AC-SHIP1-002 states that the MODEL-1 teacher
// (`paiml/qwen2.5-coder-7b-apache-q4k-v1`) must emit syntactically
// valid Python on the canonical prompt `def fib(n):` via
// `apr run <model>.safetensors`. The falsification test
// FALSIFY-SHIP-002 parses the emitted completion with
// `rustpython`/`ruff` and flags any parse error as a ship-blocker.
//
// This file discharges the *decision rule* at `PARTIAL_ALGORITHM_LEVEL`:
// given a count of syntax errors observed on the canonical prompt, the
// verdict is `Pass` iff `syntax_errors ≤
// AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS` (= 0). Because the spec
// text is strict — "emits valid Python" with no tolerance allowance —
// any non-zero error count on the single canonical prompt is a Fail.
// The compute-heavy portion of the AC (actually running the teacher
// and parsing its output) is intentionally out of scope here.
//
// Mirrors the MODEL-2 pattern set by SHIP-017 (GATE-ARCH-370M-005 in
// `crates/aprender-train/src/models/llama_370m.rs`), which also binds
// AC-SHIP2-007 to a `verdict_from_syntax_error_count` const fn. SHIP-017
// tolerates ≤ 1 error across 100 held-out prompts; SHIP-002 is the
// MODEL-1 twin with a tighter rule (0 errors on the single canonical
// prompt) because the 7B teacher should be essentially flawless on
// the canonical `def fib(n):` completion. Authored self-contained
// because SHIP-017 PR #1004 is not yet on main; once it lands, the
// two `verdict_from_syntax_error_count_*` fns should be deduplicated
// into a single parameterized helper.
//
// MODEL-1 is now at 6/10 AC-SHIP1 items touched (SHIP-008 + SHIP-009
// + SHIP-006 + SHIP-007 + SHIP-005 + SHIP-002).
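//
// The deduplication described above could take roughly the following
// shape. This is a minimal sketch under stated assumptions, not the code
// from either PR: `SyntaxVerdict`, `verdict_with_tolerance`,
// `MODEL_1_MAX_ERRORS`, and `MODEL_2_MAX_ERRORS` are hypothetical names
// chosen for illustration. Only the two tolerances (0 for MODEL-1, 1 for
// MODEL-2) come from the text above.

```rust
/// Hypothetical shared verdict type; the real enum is not shown in this file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyntaxVerdict {
    Pass,
    Fail,
}

/// Hypothetical parameterized arbiter: `Pass` iff the observed count is at
/// or below the caller-supplied tolerance. With a runtime-visible
/// `max_tolerated` parameter the `<=` is never vacuous, even when a caller
/// passes 0.
pub const fn verdict_with_tolerance(syntax_errors: usize, max_tolerated: usize) -> SyntaxVerdict {
    if syntax_errors <= max_tolerated {
        SyntaxVerdict::Pass
    } else {
        SyntaxVerdict::Fail
    }
}

/// MODEL-1 tolerance: 0 errors on the single canonical prompt.
pub const MODEL_1_MAX_ERRORS: usize = 0;

/// MODEL-2 tolerance: at most 1 error across 100 held-out prompts.
pub const MODEL_2_MAX_ERRORS: usize = 1;
```

// Under these assumptions each model-specific fn would reduce to a
// one-line wrapper that passes its own tolerance constant.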
/// Spec-authorized tolerance for syntax errors on the canonical
/// AC-SHIP1-002 prompt `def fib(n):`. The spec text — "emits valid
/// Python" — carries no noise allowance, so a single syntax error
/// is a ship-blocker. Holding this as a const locks the threshold
/// at compile time and makes any silent widening (e.g. to 1) a
/// test-breaking edit.
pub const AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS: usize = 0;
/// Binary verdict for FALSIFY-SHIP-002 / GATE-QW2E-SHIP-002.
/// `Pass` iff the observed syntax-error count is at or below the
/// spec tolerance (0). `Fail` otherwise.
/// Algorithm-level verdict rule for FALSIFY-SHIP-002 / GATE-QW2E-SHIP-002
/// / AC-SHIP1-002: the teacher must emit syntactically valid Python on
/// the canonical `def fib(n):` prompt. The input is an integer count
/// of syntax errors produced by the downstream Python AST parse; this
/// function is purely the threshold arbiter.
///
/// Declared `const fn` so the decision rule is evaluable at compile
/// time, matching MODEL-2 SHIP-017's shape exactly (modulo the
/// different tolerance constant).
//
// clippy::absurd_extreme_comparisons fires because
// AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS = 0 makes `<= 0` semantically
// equivalent to `== 0` on an unsigned type. We keep the `<=` shape
// intentionally: it mirrors MODEL-2 SHIP-017's `verdict_from_syntax_error_count`
// (tolerance = 1, where `<=` is non-vacuous) so the two can be
// deduplicated into a single parameterized helper once both PRs land.
pub const
// ─────────────────────────────────────────────────────────────
// Unit tests — FALSIFY-SHIP-002 algorithm-level proof
// ─────────────────────────────────────────────────────────────
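//
// A minimal sketch of what an algorithm-level proof for this rule might
// look like, assuming a two-variant verdict enum and a single-argument
// const fn. `Ship002VerdictSketch` and
// `verdict_from_syntax_error_count_sketch` are hypothetical stand-ins,
// since the real enum and fn names are not visible here. The constant is
// reproduced from this file only so the sketch compiles standalone.

```rust
/// Reproduced from this file: the spec tolerates zero syntax errors.
pub const AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS: usize = 0;

/// Hypothetical stand-in for the verdict enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship002VerdictSketch {
    Pass,
    Fail,
}

/// Hypothetical stand-in for the threshold arbiter. The `<=` shape is kept
/// on purpose, as the comment above the real fn explains.
#[allow(clippy::absurd_extreme_comparisons)]
pub const fn verdict_from_syntax_error_count_sketch(syntax_errors: usize) -> Ship002VerdictSketch {
    if syntax_errors <= AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS {
        Ship002VerdictSketch::Pass
    } else {
        Ship002VerdictSketch::Fail
    }
}

// Compile-time checks: widening the tolerance, or breaking the zero-error
// case, fails the build rather than a test run.
const _: () = assert!(AC_SHIP1_002_MAX_TOLERATED_SYNTAX_ERRORS == 0);
const _: () = assert!(matches!(
    verdict_from_syntax_error_count_sketch(0),
    Ship002VerdictSketch::Pass
));

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn zero_errors_passes() {
        assert_eq!(verdict_from_syntax_error_count_sketch(0), Ship002VerdictSketch::Pass);
    }

    #[test]
    fn any_nonzero_count_fails() {
        // Boundary value, a mid-range value, and the largest representable count.
        for n in [1usize, 100, usize::MAX] {
            assert_eq!(verdict_from_syntax_error_count_sketch(n), Ship002VerdictSketch::Fail);
        }
    }
}
```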