// SPDX-FileCopyrightText: 2026 John Moxley
// SPDX-License-Identifier: MIT OR Apache-2.0
//! Multiply policy — the schoolbook-vs-Karatsuba algorithm matcher.
//!
//! Like division, the integer multiply choice keys on the **runtime
//! length** of the operands, not the const limb count `N`, so it is a
//! [`Select::ByValue`] case in the canonical policy shape (see
//! `docs/ARCHITECTURE.md` → "Policy file structure"): the const layer
//! settles on "the value decides", the value-matcher classifies the
//! operand lengths and returns an [`Algorithm`] tag, and the dispatcher
//! does an **exhaustive** `match algo` to a pure kernel in
//! [`crate::int::algos::support::limbs`].
//!
//! **One classifier, two doors** (`docs/ARCHITECTURE.md` → "Const entry +
//! slice entry"). The length classifier ([`select`]'s `ByShape`) backs TWO
//! entry points over the SAME decision:
//! - [`dispatch`] — the const-`N` door for `Int<N>×Int<N>` callers (the
//! wide-transcendental work-muls): the lengths are both `N`, so the
//! classifier folds to a const verdict per monomorphisation and additionally
//! takes the [`LimbSize`] (`u64`/`u128`) axis. The hot path.
//! - [`dispatch_slice`] — the runtime-length door for genuine slice callers
//! (the decimal slice roots and the rescale product path) that hold bare
//! `&[u64]` of runtime length and no `N`. It runs the IDENTICAL classifier
//! on `a.len()`/`b.len()` and routes to the `u64` slice kernels — no const
//! `N` means no `u128` packing, but the product is bit-identical.
//!
//! The kernels ([`mul_schoolbook`] / [`mul_karatsuba`]) stay pure; this
//! file owns the *choice* — the benched crossover ([`KARATSUBA_ENGAGE`]) and
//! recursion depth ([`KARATSUBA_RECURSE`]) are policy DATA here, not magic
//! numbers in a kernel.
use crate;
use crate;
use crate;
// ── 1. the real multiply algorithms — NAMED, no `Default` ─────────────
/// The multiply algorithms the length matcher chooses between. Variants
/// are the CamelCase of each slice kernel fn's name minus the `mul_` function
/// prefix (`mul_schoolbook` → `Schoolbook`, `mul_karatsuba` → `Karatsuba`);
/// on the const door, `Schoolbook` runs the generic [`mul_full_limb`] kernel.
// ── 2. the verdict ────────────────────────────────────────────────────
/// A settled algorithm, or "the (runtime) length decides". `ByShape`
/// classifies the operand lengths (known at run time) → the algorithm;
/// `ByAlgorithm` is part of the canonical shape for uniformity.
// ── policy data: the benched crossover threshold ──────────────────────
/// Karatsuba **engage** point: the (equal) operand limb-count at or above which
/// [`dispatch`] routes EVEN-width products to the Limb-generic Karatsuba kernel
/// (`mul_karatsuba_limb::<N, u128>`) instead of the u128 fixed-width schoolbook.
/// File-private policy data.
///
/// **`128`** — the policy-map (`mul_toom3_ab`, every fixed-array candidate ×
/// u64/u128 raced 24..256, pinned) plus the `mul_kara_thresh_ab` recursion-depth
/// sweep localize the crossover to `(96, 128]`: schoolbook-u128 wins `N <= 96`,
/// the u128-packed recursive Karatsuba wins `N >= 128` by **1.34x at N=128 and
/// 1.39x at N=256**. Only EVEN `N` reaches Karatsuba (so it always
/// packs to `u128`); odd / `< 128` widths stay schoolbook. The exact crossover
/// in `(96, 128]` is academic — no shipped storage tier (<=64) or work width
/// (96/128/192/256) lies strictly between 96 and 128.
const KARATSUBA_ENGAGE: usize = 128;
/// Karatsuba **recursion** base: the limb-count below which the kernel stops
/// splitting and runs schoolbook. **`48`** is the swept optimum (`kara_t48`
/// beat `t16/t24/t32` at N=128 and decisively at N=256). Distinct from
/// [`KARATSUBA_ENGAGE`] (when to USE Karatsuba) — this is how DEEP it recurses.
/// The kernel requires `>= 4` (the z1 sum product on `⌈n/2⌉ + 1` limbs only
/// strictly shrinks below `n` once `n >= 4`); 48 satisfies that.
const KARATSUBA_RECURSE: usize = 48;
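// A minimal sketch, not the crate's kernel, of why the recursion base must be
// at least 4. It assumes the usual Karatsuba split, in which the middle (z1)
// product multiplies two half-sums of `⌈n/2⌉ + 1` limbs each. The helper
// names `z1_operand_len` and `z1_shrinks` are hypothetical.

```rust
/// Limb count of each operand of the z1 sum product when an `n`-limb
/// operand is split in half: `⌈n/2⌉` limbs plus one carry limb.
/// (Hypothetical helper, for illustration only.)
pub const fn z1_operand_len(n: usize) -> usize {
    (n + 1) / 2 + 1
}

/// True when recursing on the z1 product strictly shrinks the problem,
/// i.e. when splitting an `n`-limb multiply is guaranteed to terminate.
pub const fn z1_shrinks(n: usize) -> bool {
    z1_operand_len(n) < n
}
```

// At `n = 3` the z1 operands are still 3 limbs wide, so a recursion base
// below 4 could fail to make progress; at the swept base of 48 the z1
// operands are 25 limbs wide.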
// ── 3. the matcher: keyed on the runtime operand lengths ──────────────
/// Pick the multiply algorithm for the operands' lengths. Equal-length EVEN
/// operands at or above [`KARATSUBA_ENGAGE`] take Karatsuba; everything else
/// (unequal, odd, or below the engage point) takes the fixed-width schoolbook.
const
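// The rule described above can be sketched as a plain `const fn`. This is an
// illustration under assumptions, not the crate's `select`: the real matcher
// returns a verdict in the canonical `ByShape` / `ByAlgorithm` shape, which
// this sketch collapses to a bare tag. `AlgorithmSketch`, `select_sketch` and
// `ENGAGE_SKETCH` are hypothetical names.

```rust
/// Hypothetical stand-in for the policy's algorithm tag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AlgorithmSketch {
    Schoolbook,
    Karatsuba,
}

/// Engage threshold, copied from the policy data documented above.
pub const ENGAGE_SKETCH: usize = 128;

/// Equal-length EVEN operands at or above the engage point take Karatsuba;
/// unequal, odd, or shorter operands take schoolbook.
pub const fn select_sketch(a_len: usize, b_len: usize) -> AlgorithmSketch {
    if a_len == b_len && a_len % 2 == 0 && a_len >= ENGAGE_SKETCH {
        AlgorithmSketch::Karatsuba
    } else {
        AlgorithmSketch::Schoolbook
    }
}
```

// Because the function is `const`, the same rule can be evaluated at compile
// time for a const `N` or at run time on `a.len()` / `b.len()`.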
// ── 4. the dispatcher: classify lengths, resolve limb width, dispatch ─
/// Equal-length `Int<N>×Int<N>` full-product dispatcher: the single site
/// every `widen_mul` wide multiply flows through. It resolves the algorithm
/// (Karatsuba for even `N` at or above [`KARATSUBA_ENGAGE`], else schoolbook),
/// then for schoolbook asks the chosen algorithm for its benched limb width
/// ([`Algorithm::limb_size`]) and runs the ONE generic [`mul_full_limb`]
/// kernel at `u64` / `u128`. Both stages are const here, so the `const { … }`
/// block folds them to one direct call per monomorphisation, with the
/// unchosen arms eliminated as dead code.
///
/// `out` must be sized `>= 2·N`. Every arm writes `out` in full (the kernels
/// zero their own accumulators); the result is bit-identical at either limb
/// width and against the historic slice schoolbook.
pub
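// A rough sketch of how a const-`N` door can fold the length decision per
// monomorphisation. It is an assumption-laden illustration: the dispatcher
// documented above uses a `const { … }` block, whereas this sketch swaps in
// an associated const (which also compiles on older toolchains) and returns
// the chosen kernel's name instead of calling a kernel. `Tag`, `classify`,
// `ENGAGE_AT`, `Verdict` and `chosen_kernel` are hypothetical names.

```rust
/// Hypothetical algorithm tag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tag {
    Schoolbook,
    Karatsuba,
}

/// Engage threshold, copied from the policy data documented above.
pub const ENGAGE_AT: usize = 128;

/// Length classifier: equal, even, and at or above the engage point.
pub const fn classify(a_len: usize, b_len: usize) -> Tag {
    if a_len == b_len && a_len % 2 == 0 && a_len >= ENGAGE_AT {
        Tag::Karatsuba
    } else {
        Tag::Schoolbook
    }
}

/// Carries the verdict for one `N` as a compile-time constant.
pub struct Verdict<const N: usize>;

impl<const N: usize> Verdict<N> {
    /// Both lengths are `N`, so the classifier is evaluated at compile time.
    pub const ALGO: Tag = classify(N, N);
}

/// Exhaustive match on a constant scrutinee: each monomorphisation keeps
/// only the arm its `N` selects.
pub fn chosen_kernel<const N: usize>() -> &'static str {
    match Verdict::<N>::ALGO {
        Tag::Karatsuba => "karatsuba",
        Tag::Schoolbook => "schoolbook",
    }
}
```

// The match has no wildcard arm, so adding a variant to the tag forces every
// dispatcher to handle it.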
/// Runtime-length **slice door** over the SAME [`select`] length classifier as
/// the const [`dispatch`] (`docs/ARCHITECTURE.md` → "Const entry + slice
/// entry — one length/shape classifier, two doors"). For the genuine slice
/// callers, namely the decimal slice roots (`sqrt_newton`, `cbrt_newton`) and
/// the rescale product path (`div_widen_scale`), whose operands are bare
/// `&[u64]` of runtime length with no `N` in their types, the const door is
/// unavailable. They route here instead of reaching past the matcher to a
/// hardcoded kernel (the Class-G bypass this door removes).
///
/// The classifier is run on the runtime `a.len()`/`b.len()`: equal-length EVEN
/// operands at or above [`KARATSUBA_ENGAGE`] take the recursive Karatsuba
/// ([`mul_karatsuba`], recursing to schoolbook at [`KARATSUBA_RECURSE`]);
/// everything else (unequal, odd, or below the engage point) takes the slice
/// schoolbook ([`mul_schoolbook`]). The product is **bit-identical** to a plain
/// `mul_schoolbook` call on the same operands at every shape.
///
/// # Limb width
///
/// The slice door runs the **`u64`** kernels only. The [`LimbSize`]
/// (`u128`-packing) axis the const door takes needs a compile-time `N` to
/// size the packed `[L; N]` / `ComputeLimbs` buffers, which a runtime-length
/// slice does not have, so the slice door stays `u64`. That axis is a
/// const-door-only optimisation; the result is the same integer either way.
///
/// # Caller contract
///
/// `out` must be **zeroed by the caller** and sized `>= a.len() + b.len()`
/// (exactly the existing [`mul_schoolbook`] contract — every converted caller
/// already satisfies it). No scratch parameter: the Karatsuba slice entry
/// self-sizes its own (sanctioned width-erased build-max) scratch internally.
/// The Karatsuba arm additionally needs `a.len() == b.len()`, which the
/// classifier guarantees before it is reached (it only engages Karatsuba for
/// equal even lengths `>= KARATSUBA_ENGAGE`).
pub
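// To make the caller contract concrete, here is a minimal `u64` slice
// schoolbook multiply written under the same contract: `out` zeroed by the
// caller and sized `>= a.len() + b.len()`. It is a sketch for illustration,
// not the crate's `mul_schoolbook`; the name `mul_schoolbook_sketch` is
// hypothetical and limbs are assumed to be little-endian.

```rust
/// Accumulates `a * b` into `out` (little-endian `u64` limbs).
/// `out` must be zeroed and hold at least `a.len() + b.len()` limbs.
pub fn mul_schoolbook_sketch(a: &[u64], b: &[u64], out: &mut [u64]) {
    assert!(out.len() >= a.len() + b.len(), "out is too short");
    for (i, &ai) in a.iter().enumerate() {
        let mut carry: u128 = 0;
        for (j, &bj) in b.iter().enumerate() {
            // limb * limb + accumulator limb + carry always fits in a u128.
            let t = (ai as u128) * (bj as u128) + (out[i + j] as u128) + carry;
            out[i + j] = t as u64; // keep the low 64 bits
            carry = t >> 64; // propagate the high 64 bits
        }
        // This limb has not been written by an earlier row, so it is
        // assigned rather than accumulated.
        out[i + b.len()] = carry as u64;
    }
}
```

// A caller allocates the zeroed buffer itself, for example
// `let mut out = [0u64; 4];` for two 2-limb operands, and passes it in.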