1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
// SPDX-FileCopyrightText: 2026 John Moxley
// SPDX-License-Identifier: MIT OR Apache-2.0
//! Seed leaves — initial Newton estimates for the integer roots (and,
//! later, the decimal `sqrt_fixed` work-width path).
//!
//! A single support-library file (the leaf analogue of
//! `int/algos/limbs.rs`): all the seed leaves plus their shared
//! bit-extraction helper live here together, so the std/no_std scaffolding
//! and `extract_top_u64` are written once.
//!
//! Each seed leaf exposes ONE API whose `std`/`no_std` behaviour is
//! cfg-swapped *internally*, so callers (the isqrt/icbrt Newton kernels)
//! stay agnostic:
//!
//! - under `std`, the seed comes from the **inherent** `f64` intrinsic
//! (`(top as f64).sqrt()` / `.cbrt()`) of the top 64 significant bits,
//! scaled back to the operand's magnitude — ~53 correct bits in one shot;
//! - under `no_std`, the seed is the classical **pure-integer** one-bit
//! estimate (`2^⌈bits/2⌉` for sqrt, `2^⌈bits/3⌉` for cbrt) — built only
//! from Rust primitives, never `libm` / `num_traits::Float`.
//!
//! Both seeds are safe over-estimates: the downward-monotone Newton loop in
//! the kernel self-corrects to the exact floor root regardless of which
//! seed body ran, so the *result* is identical on `std` and `no_std`; only
//! the iteration count differs.
//!
//! All leaves are width-agnostic — a `&[u64]` / `u32` interface — so the
//! decimal-side consumers can reuse them unchanged.
//!
//! # Over-estimate guarantee (the load-bearing invariant — PROOF)
//!
//! Every `sqrt_seed*` leaf returns a value `≥ √n` (the *real* square root,
//! hence `≥ floor(√n)`). This is what makes the caller's downward-monotone
//! Newton recurrence `x ← (x + n/x)/2` valid: from an over-estimate it
//! decreases monotonically and lands on `floor(√n)` or `floor(√n)+1`, which a
//! single final `if x > n/x { x -= 1 }` correction pins exactly. From an
//! UNDER-estimate the recurrence steps *up* on the first iteration, the
//! "stop when it stops decreasing" guard fires immediately, and the routine
//! returns the under-estimate — a WRONG floor. So the over-estimate property
//! is correctness-critical, not a mere performance hint.
//!
//! **Decomposition.** For `n` with bit-length `bits`, write
//! `shift = bits − min(64, bits)` and `top = ⌊n / 2^shift⌋` — the top `≤64`
//! significant bits (leading 1 at bit 63 when `bits ≥ 64`). Then
//! `n = top·2^shift + r` with `0 ≤ r < 2^shift`. The `std` body forms a strict
//! **fixed-point** over-estimate `seed_int` of `√top · 2^F` (of `√(2·top) · 2^F`
//! when `shift` is odd), with `F = min(h, SEED_FRAC_BITS)` fractional bits and
//! `h = ⌊shift/2⌋`, via `⌊f64::sqrt(top) · 2^F⌋ + [frac≠0] + 1` — note the
//! **always-`+1`** — and returns `seed_int · 2^{h−F}`. (`F = 0` recovers the
//! plain integer form; the fractional bits exist so a big `2^h` placement does
//! not balloon the integer-quantisation margin to a relative `2^{-31}` — with
//! them the seed is accurate to a relative `~2^{-49}`, which is what keeps the
//! consumers' Newton divide count at ~1 for roots up to ~98 bits.)
//!
//! **Even `shift = 2h`.** `√n ≤ √((top+1)·2^{2h}) = √(top+1)·2^h`, so it
//! suffices that `seed_int ≥ √(top+1)·2^F`. The body gives
//! `seed_int ≥ √top·2^F − d + 1` where `d` is the f64 deficit (next paragraph,
//! `d < 0.4`), and the gap to cover is
//! `(√(top+1) − √top)·2^F = 2^F/(√(top+1)+√top) ≤ 2^{F−32.5} ≤ 2^{−15.5}`
//! (`top ≥ 2^{63}` whenever `shift ≥ 1`; for `shift = 0` the window is exact,
//! `F = h = 0`, and the original integer-lemma `(√t+1)² ≥ t+1` applies).
//! Since `1 − d > 0.6 > 2^{−15.5}`, `seed = seed_int·2^{h−F} ≥ √n`. ∎
//! The always-`+1` is **essential**: it closes the perfect-square-`top` case
//! (`⌊√top⌋ = √top` exactly), where the low bits `r` push `√n` strictly above
//! `√top·2^h`. Without it the seed under-estimates there — a failure of
//! density `2^{-32}` that uniform-random tests practically never hit.
//!
//! **Odd `shift = 2h+1`.** `√n ≤ √(2(top+1))·2^h`; the body over-estimates
//! `√2·√top·2^F = √(2·top)·2^F`, the gap `(√(2(top+1))−√(2·top))·2^F ≤ 2^{−15}`
//! and the deficit bound below still leave the `+1` dominant; hence
//! `seed ≥ √n`. ∎
//!
//! **f64 rounding (the deficit `d`).** `top ≤ 2^64` rounds to the nearest f64
//! (relative `≤ 2^{−53}`), `f64::sqrt` is correctly rounded (`≤ 2^{−53}`), the
//! odd-shift `SQRT_2` constant + multiply add `≤ 2·2^{−53}`, and the `· 2^F`
//! scale is EXACT (power of two). Total relative deficit `≤ ~2^{−51}` on a
//! magnitude `√(2·top)·2^F ≤ 2^{32.5+F} ≤ 2^{49.5}` (with
//! `SEED_FRAC_BITS = 17`), i.e. `d < 0.4` absolute — dominated by the
//! always-`+1`. This bound is why `SEED_FRAC_BITS` must stay `≤ 17`: at higher
//! `F` the deficit approaches/exceeds 1 and the `+1` no longer covers it.
//!
//! **no_std / tiny-`n` path.** Returns `2^⌈bits/2⌉`; since `n < 2^bits`,
//! `√n < 2^{bits/2} ≤ 2^⌈bits/2⌉`. ∎
//!
//! **Consequence for wider-than-u128 consumers.** The proof holds for
//! ARBITRARY `bits` via the `top`/`shift` decomposition — it is not special to
//! `n` that fit a `u128`. So the `&[u64]` slice leaf [`sqrt_seed`] is a valid
//! over-estimate seed for a radicand of ANY width: a width-`N` isqrt kernel
//! (e.g. the `u256` hypot root) MUST seed through this leaf rather than
//! re-deriving the `· 2^h` scaling by hand. A hand-rolled re-derivation that
//! drops the always-`+1` silently reintroduces the perfect-square-`top` bug
//! that random tests cannot detect — reuse the proven leaf.
//!
//! Hasselgren's trick / seed strategy — Crandall & Pomerance 2005, "Prime
//! Numbers: A Computational Perspective" §9.2.1.
/// Fractional fixed-point bits the `std` sqrt seed bodies keep below the
/// integer part of the unscaled window root (capped by the placement shift
/// `h`). The cap is part of the over-estimate PROOF (module doc, "f64
/// rounding"): at `F ≤ 17` the f64 deficit stays `< 0.4` absolute so the
/// always-`+1` margin still strictly covers it — do NOT raise this without
/// re-deriving that bound.
const SEED_FRAC_BITS: u32 = 17;
/// Extract the top (most significant) 64 significant bits of the
/// little-endian `u64` magnitude `n`, given its bit length `bits` (the seed
/// callers guard `bits >= 2` before calling).
///
/// Returns `(top, shift)` where `shift = bits - min(64, bits)` is the bit
/// position of the start of the extracted window (`n ≈ top << shift`, exact
/// when `bits <= 64`), and `top` holds those bits right-aligned into a
/// `u64` (leading 1 at bit 63 when `bits >= 64`). The 64-bit window fits
/// the f64 mantissa with 11 bits of headroom, which is what makes the `f64`
/// seed bootstrap exact to ~53 bits. Pure bit manipulation — primitives
/// only.
pub const
/// Write a safe over-estimate seed for `floor(sqrt(n))` into `out`.
///
/// `n` is the little-endian `u64` magnitude, `bits = bit_len(n)` (caller
/// guarantees `bits >= 2`), and `out` is the **zeroed** work slice; the
/// seed is OR-ed in, then a minimal non-zero fallback is forced if the
/// placement landed entirely outside the slice — preserving the Newton
/// kernel's pre-loop invariant `x >= 1`.
///
/// `std`: hardware `f64::sqrt` bootstrap (~53 correct bits) with the
/// odd-shift `SQRT_2` correction and round-up. `no_std`: classical
/// pure-integer 1-bit seed `2^ceil(bits/2)`. Both are over-estimates, so
/// the downward-monotone Newton loop converges to the identical floor root.
///
/// Hasselgren's trick — see Crandall & Pomerance 2005, "Prime Numbers: A
/// Computational Perspective" §9.2.1.
pub
/// Over-estimate seed for `floor(sqrt(n))` of a single `u128` scalar `n >= 1`
/// (`bits = 128 - n.leading_zeros()`). The SCALAR sibling of [`sqrt_seed`] for
/// the hot u128 isqrt call sites (hypot's `isqrt_u128`, …) that work in one
/// `u128` rather than a limb slice.
///
/// Same feature-based bootstrap and convergence guarantee as [`sqrt_seed`]:
/// under `std` the hardware `f64::sqrt` of the top 64 significant bits with
/// the odd-shift `SQRT_2` correction + round-up (~53 correct bits in one shot
/// → the downward Newton loop converges in ~1-2 iterations); under `no_std`
/// the classical pure-integer 1-bit `2^ceil(bits/2)`. BOTH are guaranteed
/// over-estimates, so the caller's downward-monotone Newton recurrence never
/// under-runs into a linear floor-walk and lands on the identical floor root
/// on either build. Returns a value `>= floor(sqrt(n))`.
pub
/// Write a safe over-estimate seed for `floor(cbrt(n))` into `out`.
///
/// Same contract and convergence guarantee as [`sqrt_seed`].
///
/// `std`: hardware `f64::cbrt` bootstrap. `cbrt(top · 2^shift) =
/// cbrt(top) · 2^(shift/3)`; the `shift % 3` residue is folded into the
/// **exact** fractional `2^(r/3)` multiplier — `2^(1/3) ≈ 1.2599` for
/// `r=1`, `2^(2/3) ≈ 1.5874` for `r=2` — then rounded up to a strict
/// over-estimate. (The earlier coarse `2^r` integer multiplier — `×2` and
/// `×4` — over-shot the true cbrt by up to ~2.52× and cost the Newton loop
/// several extra `O(W²)` divides per call; the exact fractional multiplier
/// keeps the seed within ~`1 + 2⁻⁵²` of `∛n`, which is the whole tightness
/// win.) `no_std`: classical pure-integer 1-bit seed `2^ceil(bits/3)` (the
/// root has at most `ceil(bits/3)` bits, so this over-estimates). The
/// Brent–Zimmermann downward Newton iteration self-corrects to the floor
/// cube root from any positive over-estimate, so the convergence guarantee
/// is unchanged — only the iteration *count* improves.
///
/// Hasselgren seed strategy — Crandall & Pomerance 2005 §9.2.1.
pub
/// Full-radicand `f64` cube-root seed: `value.cbrt()`, where `value` is the
/// radicand's `f64` approximation (`Int::<W>::as_f64()`). This is the tight
/// seed — the cube root of the WHOLE radicand, not the top-bits
/// window [`cbrt_seed`] extracts — correctly rounded to ~53 bits, so it sits
/// within ~`2⁻⁵²` *relative* of `∛n`. The caller bridges the result back to
/// `Int<W>` (`from_f64`) and lifts it above `⌈∛n⌉` with one Newton pre-step
/// before the downward-monotone loop, so the floor cube root is reached
/// identically regardless of the seed's tightness.
///
/// **Std-only — no `no_std` arm.** Unlike [`cbrt_seed`], the full-radicand
/// recipe has no pure-integer analogue: it is the inherent `f64::cbrt`
/// intrinsic applied to the whole magnitude. A `no_std` cube-root kernel
/// seeds through [`cbrt_seed`] (the top-bits window) instead, so this leaf is
/// gated to `std` exactly like its sole caller
/// (`cbrt_native_fast::icbrt_w_f64_full`). The caller MUST guarantee `value`
/// is finite and in range (`as_f64` would otherwise saturate to `±inf`).
pub
/// Place `seed_int << half_shift` into `out` (OR-ed), then guarantee a
/// non-zero seed. Pure primitive limb writes — part of the leaf, shared by
/// both `std` seed bodies.