1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
// SPDX-FileCopyrightText: 2026 John Moxley
// SPDX-License-Identifier: MIT OR Apache-2.0
//! Integer squaring policy -- the half-product algorithm matcher.
//!
//! `Uint<N>::sqr` / `Uint<N>::wrapping_sqr` and the `Int<N>` siblings
//! delegate to [`dispatch`], which follows the canonical policy shape (see
//! `docs/ARCHITECTURE.md` -> "Policy file structure"):
//!
//! 1. an [`Algorithm`] enum -- the real squaring algorithm(s), no `Default`
//! variant;
//! 2. a [`Select`] verdict -- a settled algorithm or "the value decides";
//! 3. a `const fn` [`select`] keyed on `N`, total over the key;
//! 4. dispatch via an inline `const { select::<N>() }` block, then an
//! **exhaustive** `match algo` -- no `_`, no panic.
//!
//! Because `select` is `const` and keyed only on the const generic `N`,
//! the `const { ... }` block folds per monomorphisation and the unchosen arm
//! is dead-arm-eliminated in release: each concrete `Uint<N>` compiles to a
//! direct call to the half-product squaring kernel, no runtime branch.
//!
//! # Algorithm
//!
//! Two const squaring algorithms, both bit-identical (each computes `x^2`
//! modulo `2^BITS` exactly), selected per `N` by a **benched crossover band**:
//!
//! - [`crate::int::algos::sqr::sqr_half_product::sqr_half_product`] computes
//! `x^2` via the const comba half-product kernel
//! [`crate::int::algos::sqr::sqr_low_fixed::sqr_low_fixed`]: it exploits
//! symmetry to form each cross term once and double it, ≈`N^2/4`
//! limb-multiplies. It is the default everywhere outside the band.
//! - [`crate::int::algos::sqr::sqr_schoolbook::sqr_schoolbook`] computes the
//! full `x*x` truncated to the low `N` limbs via the unrolled fixed-width
//! [`crate::int::algos::mul::mul_schoolbook::mul_low_fixed`] kernel
//! (≈`N^2/2` limb-multiplies, no symmetry). Despite the higher multiply
//! count it WINS in the mid-width band: the comba's variable per-column
//! inner-loop bound defeats the unroller exactly where `mul_low_fixed`
//! still unrolls cleanly.
//!
//! The `sqr_full_ab` A/B (`half_product` vs `schoolbook` across `N = 2..64`,
//! both const, the only candidates this `const fn` dispatch can route to;
//! three independent core-pinned runs) localizes the crossover:
//!
//! | N | winner | margin |
//! |-------------|--------------|-------------------|
//! | 2, 3, 4, 6 | ~tie | <= 1.10x (noise) |
//! | 8, 10, 12 | schoolbook | 1.15 .. 1.38x |
//! | 14, 16, .. | half_product | 1.46 .. 1.98x |
//!
//! So the `Schoolbook` band is [`SCHOOLBOOK_LO`]`..=`[`SCHOOLBOOK_HI`]
//! (`8..=13`, placing the upper edge between the N=12 schoolbook win and the
//! N=14 half_product win). The sub-N=8 ties stay on `HalfProduct` — the
//! algorithmically-fewer-multiplies default, no flip on noise. N=8 (D153)
//! and N=12 (D230) are real storage tiers, so the band recovers a real win
//! on the square-and-multiply (`pow`/`cube`) and transcendental paths.
//!
//! The layering points DOWN — each algorithm calls the kernel, never a
//! squaring method on `Uint<N>`.
//!
//! ## What this policy does NOT route to (the const wall)
//!
//! [`dispatch`] is `const fn` (`Uint<N>::wrapping_sqr` is `const fn` and feeds
//! `pow`/`cube`/const contexts), so it can only choose between **const**
//! kernels. The u128-packed truncated-low square
//! [`crate::int::algos::sqr::sqr_low_limb::sqr_low_limb`] is NOT `const fn`
//! (the [`crate::int::types::compute_limbs::Limb`] trait methods are not const),
//! so the `LimbSize` (u64 / u128) axis is INELIGIBLE here. That axis — where
//! the `sqr_full_ab` map shows `u128` overtaking both const arms at `N >= 32`
//! — is owned by the separate **non-const** policy
//! [`crate::int::policy::sqr_low`] (benched via `sqr_low_u128_ab`), which
//! drives `BigInt::wrapping_sqr_low_u128`. This file deliberately does not
//! duplicate that axis.
//!
//! The `ByValue` arm of [`Select`] is present for canonical-shape
//! uniformity; `select` never returns it.
//!
//! # Const-ness
//!
//! `dispatch` IS `const fn`: the algorithm fn computes via the const
//! [`crate::int::algos::sqr::sqr_low_fixed::sqr_low_fixed`] kernel, so the type's
//! `const fn` `wrapping_sqr` can delegate through it. The `ByValue` arm
//! returns the default algorithm tag without invoking the fn pointer
//! (calling a fn pointer is not permitted in `const fn`; merely matching
//! the variant is fine).
use cratesqr_half_product;
use cratesqr_schoolbook;
use crateUint;
// -- 1. the real squaring algorithms -- NAMED, no `Default` ---------------
/// The squaring algorithms this policy chooses between. Variants are the
/// CamelCase of each kernel fn's name minus the `sqr_` function prefix --
/// strict 1:1 with the kernel fns.
// -- 2. the verdict --------------------------------------------------------
/// A settled algorithm, or "the value decides". The sqr picker always
/// returns `ByAlgorithm`: the choice is fully determined by `N` (constant per
/// monomorphisation) via the benched crossover band in [`select`]. `ByValue`
/// is part of the canonical shape for uniformity across functions; `select`
/// never returns it.
// -- policy data: the benched crossover band ------------------------------
/// Inclusive lower edge of the [`Algorithm::Schoolbook`] band. Below this the
/// `sqr_full_ab` map is a statistical tie (`N = 2/3/4/6`, <= 1.10x both
/// directions across runs), so those `N` stay on the fewer-multiplies
/// [`Algorithm::HalfProduct`] default. File-private policy DATA.
const SCHOOLBOOK_LO: usize = 8;
/// Inclusive upper edge of the [`Algorithm::Schoolbook`] band. The const A/B
/// shows `schoolbook` winning at `N = 8/10/12` and `half_product` winning at
/// `N >= 14`, so the band ends at `13` (between the N=12 schoolbook win and
/// the N=14 half_product win). File-private policy DATA.
const SCHOOLBOOK_HI: usize = 13;
// -- 3. the matcher: const, keyed on `N`, total over the key --------------
/// Pick the squaring algorithm for storage limb count `N`. Total over the
/// key: the benched mid-width band [`SCHOOLBOOK_LO`]`..=`[`SCHOOLBOOK_HI`]
/// takes the full unrolled [`Algorithm::Schoolbook`] (it beats the comba
/// there despite forming twice the partial products — the comba's variable
/// inner-loop bound defeats the unroller in that band); every other width
/// takes the symmetric [`Algorithm::HalfProduct`].
const
// -- 4. the dispatcher: fold the verdict, then dispatch --------------------
/// Integer squaring dispatcher for `Uint<N>`.
///
/// Resolves the compile-time algorithm verdict via
/// `const { select::<N>() }` (folds per monomorphisation; dead arms are
/// eliminated in release) then dispatches exhaustively over [`Algorithm`].
///
/// Must be `const fn`: `Int<N>::wrapping_sqr` is itself `const fn` and is
/// called from `const` contexts (and from `pow`/`cube`). The `ByValue` arm
/// returns the default algorithm tag without invoking the fn pointer,
/// satisfying the `const fn` constraint.
pub const