// SHIP-TWO-001 AC-SHIP1-007 / FALSIFY-SHIP-007 algorithm-level PARTIAL discharge.
//
// Spec: docs/specifications/aprender-train/ship-two-models-spec.md
// Contract: contracts/apr-cli-commands-v1.yaml (GATE-BENCH-SHIP-007, to be
// wired in the same PR that lands this file) and the AC row itself:
// `AC-SHIP1-007 | apr bench decode throughput ≥30 tok/s on RTX 4090 (7B Q4_K target)`.
//
// AC-SHIP1-007 states that the MODEL-1 teacher
// (`paiml/qwen2.5-coder-7b-apache-q4k-v1`) must sustain a decode median of
// at least 30 tok/s on an RTX 4090 at the 7B Q4_K quantization. That is a
// ship-blocking performance floor for the teacher artifact — below 30 the
// artifact cannot be declared Ollama-parity-class for 7B Q4_K.
//
// This file discharges the *decision rule* at `PARTIAL_ALGORITHM_LEVEL`:
// given a measured decode tok/s, the verdict is `Pass` iff it is finite
// AND at or above the contract floor (30.0 tok/s). The compute-heavy
// portion of the AC (running `apr bench --iterations 5 --max-tokens 128`
// on live teacher weights on an RTX 4090 host) is intentionally out of
// scope here. The threshold rule is what a live `apr bench` measurement
// must satisfy to earn a Pass, and changing either side of the binding
// (the 30.0 constant, or the `finite AND ≥ floor` shape) breaks this
// file's tests before any bench run is launched.
//
// Mirrors the MODEL-2 pattern set by SHIP-020 (task #150 on branch
// `feat/falsify-ship-020-partial-discharge`, PR #1005 pending merge).
// SHIP-007 is the MODEL-1 twin: identical f32-threshold verdict shape,
// different floor constant (100.0 → 30.0 tok/s — 7B Q4_K is bandwidth-
// bound at ~3.5× the size of the 370M target). Authored self-contained
// because SHIP-020 is not yet on main; once it lands the two
// `verdict_from_decode_tps_*` fns should be deduplicated into a single
// parameterized helper `verdict_from_decode_tps(measured, floor)`.
//
// MODEL-1 is now at 4/10 AC-SHIP1 items touched (SHIP-008 + SHIP-009 +
// SHIP-006 + SHIP-007).
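
// Sketch only, not part of the shipped gate: one possible shape for the
// deduplicated helper proposed above. The module name `dedup_sketch` and
// the enum `DecodeTpsVerdict` are hypothetical; only the signature
// `verdict_from_decode_tps(measured, floor)` comes from the note above.
// MODEL-1 would call it with a 30.0 floor and MODEL-2 with 100.0.
#[allow(dead_code)]
mod dedup_sketch {
    /// Shared verdict type for both decode-throughput gates (assumed shape).
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum DecodeTpsVerdict {
        /// Measured throughput is finite and at or above the floor.
        Pass,
        /// Measured throughput is below the floor or not finite.
        Fail,
    }

    /// `Pass` iff `measured` is finite and at or above `floor`.
    pub fn verdict_from_decode_tps(measured: f32, floor: f32) -> DecodeTpsVerdict {
        // The finiteness check runs first so that +inf cannot satisfy `>=`.
        if measured.is_finite() && measured >= floor {
            DecodeTpsVerdict::Pass
        } else {
            DecodeTpsVerdict::Fail
        }
    }
}
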
/// Minimum acceptable median decode throughput, in tok/s, for the MODEL-1
/// teacher (`paiml/qwen2.5-coder-7b-apache-q4k-v1`) when measured by
/// `apr bench --iterations 5 --max-tokens 128` on an RTX 4090 host.
///
/// Derivation: spec AC-SHIP1-007 binds the 30 tok/s floor to the 7B Q4_K
/// ship criterion. The constant is pinned here so that contract drift in
/// either direction (weakening to 25, or hardening to 35 without updating
/// the AC) is caught at compile and test time, not at a production publish.
/// Keep in lockstep with
/// `docs/specifications/aprender-train/ship-two-models-spec.md` §4.2, row
/// AC-SHIP1-007.
pub const AC_SHIP1_007_MIN_DECODE_TPS_RTX4090_7B: f32 = 30.0;
/// Binary verdict for FALSIFY-SHIP-007 / GATE-BENCH-SHIP-007.
/// `Pass` iff the measured decode throughput is finite AND at or above
/// [`AC_SHIP1_007_MIN_DECODE_TPS_RTX4090_7B`]. `Fail` otherwise (including
/// every non-finite value: NaN, +∞, -∞).
/// Algorithm-level verdict rule for FALSIFY-SHIP-007 / GATE-BENCH-SHIP-007
/// / AC-SHIP1-007: a single f32 threshold check against the MODEL-1 7B
/// Q4_K decode floor. Returns [`Ship007Verdict::Fail`] conservatively for
/// NaN, +∞, and -∞ so that a telemetry or JSON-parse bug can never be
/// silently promoted to a Pass. The full discharge (live `apr bench
/// --iterations 5 --max-tokens 128 paiml/qwen2.5-coder-7b-apache-q4k-v1`
/// on RTX 4090 with median ≥ 30.0) remains blocked on hardware evidence
/// collection.
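// Sketch only: the item bodies were not preserved alongside these doc
// comments. This is a minimal reconstruction of the documented rule,
// assuming a two-variant `Ship007Verdict` enum and the hypothetical
// function name `verdict_from_decode_tps_ship_007` (the file header gives
// only the `verdict_from_decode_tps_*` pattern). It relies on the floor
// constant defined earlier in this file.
#[must_use]
pub fn verdict_from_decode_tps_ship_007(measured_tps: f32) -> Ship007Verdict {
    // `is_finite()` rejects NaN, +inf and -inf before the comparison runs,
    // so +inf cannot slip through `>=` as a spurious Pass.
    if measured_tps.is_finite() && measured_tps >= AC_SHIP1_007_MIN_DECODE_TPS_RTX4090_7B {
        Ship007Verdict::Pass
    } else {
        Ship007Verdict::Fail
    }
}

/// Outcome of the SHIP-007 threshold check (assumed shape: the docs in
/// this file name only the `Pass` and `Fail` variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship007Verdict {
    /// Measured throughput is finite and at or above the floor.
    Pass,
    /// Measured throughput is below the floor or not finite.
    Fail,
}
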
// ─────────────────────────────────────────────────────────────
// Unit tests — FALSIFY-SHIP-007 algorithm-level proof
// ─────────────────────────────────────────────────────────────
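
// Sketch only: the original test bodies are not part of this excerpt.
// These checks depend solely on the floor constant defined earlier in this
// file, so they hold however the verdict function is named. The module
// name and the local `passes` mirror of the `finite AND >= floor` rule are
// hypothetical.
#[cfg(test)]
mod ship_007_floor_tests {
    use super::AC_SHIP1_007_MIN_DECODE_TPS_RTX4090_7B as FLOOR;

    /// Local mirror of the documented rule shape.
    fn passes(measured_tps: f32) -> bool {
        measured_tps.is_finite() && measured_tps >= FLOOR
    }

    #[test]
    fn floor_constant_is_pinned_to_spec_value() {
        // Fails if the constant drifts away from the AC-SHIP1-007 value.
        assert_eq!(FLOOR, 30.0_f32);
    }

    #[test]
    fn boundary_is_inclusive() {
        assert!(passes(30.0));
        assert!(passes(30.5));
        assert!(!passes(29.999));
    }

    #[test]
    fn non_finite_measurements_never_pass() {
        for bad in [f32::NAN, f32::INFINITY, f32::NEG_INFINITY] {
            assert!(!passes(bad));
        }
    }
}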