//! Bespoke narrow-`GUARD` `sin_strict` + `cos_strict` kernel slot for
//! `D57<SCALE>` with `SCALE ∈ 18..=22`.
//!
//! The shared wide-tier core sizes `GUARD = 30` for the worst case
//! `SCALE = 57` (~200 rounded Taylor multiplies at an 87-digit working
//! width). At narrow scales the storage target only needs ~22-26
//! working digits to clear 0.5 LSB at storage, so running the
//! `sin_fixed` / `cos_fixed` kernels at the full `SCALE + 30 = 48..52`
//! width is over-provisioned. Halving the working width roughly halves
//! the per-iteration `Int192` `mul` / `div` cost.
//!
//! Unlike the SCALE 44..=56 sibling
//! ([`super::lookup_d57_s44_56_sincos`]), this slot does NOT carry an
//! argument-reduction table. The reduction-table win only pays back
//! when the unreduced Taylor series exceeds ~30 terms (which happens
//! above ~SCALE 30 with the default `GUARD`); at SCALE 18..=22 the
//! generic Taylor evaluator already converges in ~15-20 terms. The
//! reclaim here is purely from narrowing the working width passed to
//! the canonical `sin_fixed` / `cos_fixed` kernels.
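//!
//! As a rough cross-check of that term count, the sketch below estimates
//! how many Taylor terms of `sin` at `r = π/4` still have magnitude of at
//! least `10^-w`. It works in `f64` logarithms only and ignores rounding;
//! the helper name `sin_taylor_terms` is hypothetical and is not part of
//! this crate.
//!
//! ```rust
//! /// Counts the leading terms of `x - x^3/3! + x^5/5! - ...` at
//! /// `x = π/4` whose magnitude is still at least `10^-w`.
//! fn sin_taylor_terms(w: u32) -> u32 {
//!     let log_x = std::f64::consts::FRAC_PI_4.log10();
//!     let mut log_factorial = 0.0_f64; // log10(k!) for the current odd k
//!     let mut k = 1_u32;
//!     let mut terms = 0_u32;
//!     loop {
//!         // log10 of |x^k / k!| for the current odd power k.
//!         let log_term = f64::from(k) * log_x - log_factorial;
//!         if log_term < -f64::from(w) {
//!             return terms;
//!         }
//!         terms += 1;
//!         // Advance k -> k + 2 and extend the factorial to match.
//!         log_factorial += f64::from(k + 1).log10() + f64::from(k + 2).log10();
//!         k += 2;
//!     }
//! }
//! ```
//!
//! Calling it with the narrow widths (`w = 30..34`) and the wide widths
//! (`w = 48..52`) should give counts in the same range as the estimate
//! quoted above.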
//!
//! ## `GUARD_NARROW` selection
//!
//! The shared core's `GUARD = 30` is documented (in
//! `crate::macros::wide_transcendental`) as supporting `~200 ×
//! 0.5 = 100 LSB-of-w` accumulated drift across the longest series
//! the wide tiers run. At SCALE 18..=22 the Taylor series for
//! `sin_fixed` / `cos_fixed` on `r ∈ [0, π/4]` converges in ~20-30
//! rounded multiplies; the matching worst-case drift is ~30 × 0.5 =
//! 15 LSB-of-w. A `GUARD_NARROW = 12` leaves storage 12 decimal
//! digits below `w`, so 15 LSB-of-w is ~15·10⁻¹² in storage units —
//! many orders of magnitude below half a storage ULP for any
//! `SCALE ≤ 22`. The probe set this empirically: every value tried
//! in the 8..=14 band held the wide-tier baseline within 1 LSB of
//! storage on the existing trig regression suite; `12` is the
//! comfortable middle.
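//!
//! The headroom arithmetic above can be replayed with a small sketch. It
//! assumes, as the paragraph does, a worst case of 0.5 LSB-of-w drift per
//! rounded multiply; `drift_in_storage_lsb` is a hypothetical helper, not
//! a crate API.
//!
//! ```rust
//! /// Worst-case accumulated drift, expressed in storage LSBs, after
//! /// `multiplies` rounded multiplies with `guard` extra decimal digits.
//! fn drift_in_storage_lsb(multiplies: u32, guard: u32) -> f64 {
//!     // Each rounded multiply contributes at most 0.5 LSB-of-w.
//!     let drift_lsb_of_w = 0.5 * f64::from(multiplies);
//!     // One storage LSB spans 10^guard LSB-of-w.
//!     drift_lsb_of_w / 10f64.powi(guard as i32)
//! }
//! ```
//!
//! With ~30 multiplies and a guard of 12 this evaluates to roughly
//! 15·10⁻¹² storage LSB, which is the figure quoted above.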
//!
//! Per-call working width drops from `SCALE + 30 = 48..52` to
//! `SCALE + 12 = 30..34` — roughly two-thirds of the bits, which
//! matches the ~25-30% wall-clock reclaim observed in the probe.
use cratewide_trig_d57 as core;
use crateRoundingMode;
use crateInt192;
/// Narrow guard for the SCALE 18..=22 slot. See module docs for the
/// derivation and headroom.
const GUARD_NARROW: u32 = 12;
/// Sin/cos selector. Both share every stage of the reduction; the
/// selector only picks which output to return.
pub
/// Shared narrow-`GUARD` `sin_strict` / `cos_strict` kernel for
/// `D57<SCALE>` with `SCALE ∈ 18..=22`.
///
/// Reroutes the canonical [`core::sin_fixed`] / [`core::cos_fixed`]
/// kernels through a narrower working width
/// `w = SCALE + GUARD_NARROW`. The argument lift uses
/// [`core::to_work_w`] so the storage raw is scaled by `10^GUARD_NARROW`
/// (matching the narrower `w`).
pub
/// Thin entry shim — `sin_strict` for `D57<SCALE>` with
/// `SCALE ∈ 18..=22`. See [`sin_cos_strict`] for the algorithm.
pub
/// Thin entry shim — `cos_strict` for `D57<SCALE>` with
/// `SCALE ∈ 18..=22`. See [`sin_cos_strict`] for the algorithm.
pub
/// `tan_strict` for `D57<SCALE>` with `SCALE ∈ 18..=22` — shares one
/// `sin_cos_fixed` between numerator and denominator, narrowed to
/// `GUARD_NARROW`. Panics if `cos(self) == 0` (odd multiples of π/2).
pub