1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
// SPDX-FileCopyrightText: 2026 John Moxley
// SPDX-License-Identifier: MIT OR Apache-2.0
//! Const single-/double-limb hardware divide (and the shift-subtract
//! fallback for the rare const multi-limb case).
//!
//! [`div_rem`] is the `const fn` divide the const-evaluable `wrapping_div`
//! / `wrapping_rem` route through so they can run at compile time. The
//! divisor-shape *choice* between the runtime engines lives in
//! [`crate::int::policy::div_rem`].
use crateMg2By1;
use crate;
/// `quot = num / den`, `rem = num % den`, u64 limbs. `const fn`.
///
/// Hardware fast paths:
/// - both fit a single u64 → one native `u64 / u64`
/// - divisor fits a single u64 → a Möller–Granlund 2-by-1 reciprocal
/// divide per dividend limb (one reciprocal precompute, then
/// mul/shift/correct per limb — see [`single_limb_div_rem`])
/// - otherwise → bit shift-subtract (only reached when divisor is
/// multi-limb; the dispatcher routes those to Knuth instead)
pub const
/// `quot = num / d`, `rem = num % d` for a single non-zero u64 divisor `d`,
/// little-endian u64 limbs. Computes one quotient limb per dividend limb
/// (high → low) via the Möller–Granlund 2-by-1 invariant-divisor reciprocal
/// ([`Mg2By1`]) instead of a per-limb software `u128 ÷ u64`.
///
/// On x86_64 the obvious `acc / (d as u128)` (`acc < d·2^64`, quotient fits
/// a u64) does NOT lower to one hardware `DIV r/m64`; LLVM/compiler-builtins
/// emit a full 128÷128 software routine (`__udivti3`). `const fn` rules out
/// inline `asm!` to reach the hardware instruction, so this keeps the divide
/// const-evaluable by replacing the per-limb division with a reciprocal
/// **multiplication**: precompute `d`'s reciprocal once (amortised over every
/// dividend limb), then each limb is a 64×64→128 multiply, a shift and a small
/// correction — no software `__udivti3`.
///
/// `Mg2By1` requires a *normalised* divisor (top bit set) and a high word
/// strictly below it. So `d` is normalised by `s = d.leading_zeros()` into
/// `dn = d << s`; the dividend is streamed in the matching left-shifted
/// domain (each window word `(num[i] << s) | (num[i-1] >> (64-s))`), the
/// running remainder `r` stays `< dn` (the `Mg2By1` precondition), and the
/// true remainder is recovered as `r >> s`. The quotient is unchanged by the
/// common left shift of dividend and divisor.
///
/// Bit-identical to the prior `u128`-division loop for every `(num, d)`.
const