// SPDX-FileCopyrightText: 2026 John Moxley
// SPDX-License-Identifier: MIT OR Apache-2.0
//! Truncated-low squaring policy — the limb-width (`u64` / `u128`) matcher.
//!
//! [`BigInt::wrapping_sqr_low_u128`] computes `(x²) mod 2^(64·N)` — the low `N`
//! limbs of the square, the high half never formed — via the ONE generic
//! kernel [`sqr_low_limb`]`<N, L: Limb>`. As with the multiply sibling
//! [`crate::int::policy::mul_low`], there is a single algorithm (the
//! truncated-low symmetric square); what this policy owns is the **second
//! matcher axis** (`docs/ARCHITECTURE.md` → "Limb width — the matcher's second
//! axis"): the [`LimbSize`] the kernel runs in.
//!
//! `u128` limbs halve the limb count (≈¼ the partial products at the cost of a
//! wider 128×128 inner step) and the square keeps its symmetry halving in
//! either width, so `u128` wins on the **wide even** work widths the wide-tier
//! exp/powf Smith squaring runs on. Which cells win is a per-`N` property
//! settled by microbench (`benches/micro/sqr_low_u128_ab.rs`) and recorded in
//! [`limb_size`] as policy DATA — NOT a blanket rule and NOT a kernel literal.
//! `u128` is gated to **even `N`** by [`LimbSize::for_packing`]: packing pairs
//! two `u64` limbs into each `u128`, so an odd `N` would leave the top limb
//! unpaired and drop it. Every entry therefore stays even-`N`-correct.
//!
//! [`BigInt::wrapping_sqr_low_u128`]: crate::int::types::traits::BigInt::wrapping_sqr_low_u128
//! [`sqr_low_limb`]: crate::int::algos::sqr::sqr_low_limb::sqr_low_limb
//! [`LimbSize`]: crate::int::types::compute_limbs::LimbSize
use crate::int::algos::sqr::sqr_low_limb::sqr_low_limb;
use crate::int::types::compute_limbs::LimbSize;
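// ── illustrative sketch (NOT the crate's kernel) ──────────────────────
// A minimal, self-contained sketch of the "truncated-low symmetric square"
// idea described in the module docs, written for plain `u64` limbs in
// little-endian order. Every name below (`sketch_sqr_low_u64`,
// `sketch_mul_low_u64`) is hypothetical and exists only for illustration; the
// real `sqr_low_limb` kernel is generic over the limb type and may be
// organised quite differently.
//
// The sketch forms each off-diagonal product `x[i]·x[j]` (i < j, i + j < N)
// once, doubles the running sum, then adds the diagonal squares. Anything
// that would land at limb `N` or above is simply dropped.
#[allow(dead_code)]
fn sketch_sqr_low_u64<const N: usize>(x: &[u64; N]) -> [u64; N] {
    let mut out = [0u64; N];

    // Off-diagonal terms, each formed once. The carry out of limb N-1 is
    // discarded: that is the truncation to `mod 2^(64·N)`.
    for i in 0..N {
        let mut carry: u128 = 0;
        for j in (i + 1)..N.saturating_sub(i) {
            let t = (x[i] as u128) * (x[j] as u128) + (out[i + j] as u128) + carry;
            out[i + j] = t as u64;
            carry = t >> 64;
        }
    }

    // Double the off-diagonal sum (the symmetry halving): a one-bit left
    // shift across the limbs, with the top bit dropped.
    let mut shifted_in = 0u64;
    for limb in out.iter_mut() {
        let shifted_out = *limb >> 63;
        *limb = (*limb << 1) | shifted_in;
        shifted_in = shifted_out;
    }

    // Diagonal terms: `x[i]²` contributes its low half at limb 2i and its
    // high half at limb 2i + 1, whenever those limbs exist.
    let mut carry: u128 = 0;
    for k in 0..N {
        let sq = (x[k / 2] as u128) * (x[k / 2] as u128);
        let part = if k % 2 == 0 { sq as u64 } else { (sq >> 64) as u64 };
        let t = (out[k] as u128) + (part as u128) + carry;
        out[k] = t as u64;
        carry = t >> 64;
    }
    out
}

// Hypothetical reference: plain truncated-low schoolbook multiply, useful
// only as an oracle for the sketch above, e.g.
// `sketch_sqr_low_u64(&x) == sketch_mul_low_u64(&x, &x)`.
#[allow(dead_code)]
fn sketch_mul_low_u64<const N: usize>(a: &[u64; N], b: &[u64; N]) -> [u64; N] {
    let mut out = [0u64; N];
    for i in 0..N {
        let mut carry: u128 = 0;
        for j in 0..(N - i) {
            let t = (a[i] as u128) * (b[j] as u128) + (out[i + j] as u128) + carry;
            out[i + j] = t as u64;
            carry = t >> 64;
        }
    }
    out
}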
// ── 1. the algorithm — singleton: truncated-low symmetric square ───────
/// The truncated-low squaring algorithm. A singleton: there is one algorithm
/// (the truncated-low symmetric square, [`sqr_low_limb`] — the variant is the
/// CamelCase of the kernel fn minus the `sqr_` prefix).
///
/// The [`LimbSize`] axis is the algorithm's OWN second-stage choice
/// ([`Algorithm::limb_size`]), selected *after* the algorithm and *by* it —
/// the u64/u128 crossover is algorithm-dependent, so it is co-located with the
/// algorithm, not the verdict.
///
/// [`sqr_low_limb`]: crate::int::algos::sqr::sqr_low_limb::sqr_low_limb
// ── 2. the verdict — the algorithm (limb width is the algorithm's own) ─
/// A settled algorithm. The canonical verdict shape: one algorithm at every
/// `N`, so it is always `ByAlgorithm`. The limb width is NOT carried here — it
/// is the chosen algorithm's own [`Algorithm::limb_size`], derived in
/// [`dispatch`].
// ── 3. the matcher ────────────────────────────────────────────────────
/// Pick the algorithm for the truncated-low square. One algorithm at every
/// width, so this is width-independent; the chosen algorithm's own
/// [`Algorithm::limb_size`] carries the only `N`-dependent decision.
const
/// Resolve the full verdict: the algorithm plus its own limb width for this
/// `N`. A named `const fn` (rather than statements inline in `dispatch`'s
/// `const { … }` block) because under `generic_const_exprs` (the nightly
/// `cross-scale-ops` / `exact-scratch-nightly` builds) a generic anonymous
/// constant only admits expression trees — a single call like this folds;
/// a statement block does not.
const
// ── 4. the dispatcher: resolve the algorithm, then its limb width ─────
/// Truncated-low square `out = (x²) mod 2^(64·N)` — the single site
/// [`BigInt::wrapping_sqr_low_u128`] flows through. Two-stage verdict: the
/// algorithm is resolved first, then asked for its own benched limb width
/// ([`Algorithm::limb_size`]). Both are const here, so the `const { … }` block
/// folds them and this compiles to one direct `sqr_low_limb::<N, _>` call per
/// monomorphisation, with the unchosen arm eliminated as dead code. `out` is
/// written in full; the result is bit-identical to
/// [`BigInt::wrapping_mul`]`(x, x)` mod `2^(64·N)` at either width.
///
/// [`BigInt::wrapping_sqr_low_u128`]: crate::int::types::traits::BigInt::wrapping_sqr_low_u128
/// [`BigInt::wrapping_mul`]: crate::int::types::traits::BigInt::wrapping_mul
pub