1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The Infino Authors
//! Runtime SIMD dispatch gates for the vector + bloom kernels.
//!
//! Sibling to `distance.rs` because multiple kernels across the
//! codebase want to query the same per-feature gates — `distance::dot`
//! / `distance::l2_sq` (AVX-512F or AVX2), `quant::estimate_dot_rotated`
//! (AVX-512 VPOPCNTDQ), `supertable::manifest::bloom::contains`
//! (AVX-512F + DQ for `vpternlogq` / `kortestz`), and the Sq8
//! cross-product kernel (AVX-512 VPMOVZXBD or AVX2 VPMOVZXBD).
//!
//! Each gate is a `OnceLock<bool>` cached on first call. The cost
//! per call after the first is one relaxed atomic load (~1 ns)
//! and an inlined `&*` deref — negligible next to the kernel work
//! it gates. Initialization reads `INFINO_DISABLE_AVX512` (or
//! `INFINO_DISABLE_AVX2`) first (the env overrides for A/B perf /
//! regression isolation), then runs the appropriate
//! `is_x86_feature_detected!` chain.
//!
//! Flipping the env var after the first call has **no effect** —
//! gates are sticky once cached.
use ;
/// True iff this binary should use AVX-512 fast-path kernels.
/// Checks the CPUID baseline that *every* AVX-512 kernel in the
/// codebase relies on: F (foundation), BW (byte/word), DQ
/// (doubleword/quadword), VL (vector length).
///
/// Per-instruction extensions (VPOPCNTDQ) live in their own
/// gates ([`has_vpopcntdq`]) because a kernel that uses only
/// those needs them in addition to F — and there's a small
/// but real population of AVX-512F-only hosts (Knights Landing —
/// not in our fleet but cheap to be correct about) that lack the
/// extensions.
///
/// Set `INFINO_DISABLE_AVX512=1` to force the AVX2 / wide path on
/// hosts that *do* support AVX-512 — for A/B perf comparison or
/// regression isolation without rebuilding. Reads the env var
/// exactly once on the first call.
// Every dispatch site that calls this is itself x86-gated, so on other
// targets the function is unused in the library build (it stays defined
// because the `cfg(test)` gate tests reference it on all targets).
pub
/// True iff the host supports AVX-512 VPOPCNTDQ (per-element 64-bit
/// popcount). Required by `quant::estimate_dot_rotated`'s AVX-512
/// rewrite — its "masked add of query lanes keyed by code bits"
/// path uses `_mm512_mask_add_ps` whose mask comes from a code-byte
/// load, but the throughput-equivalent `popcount` formulation in
/// some shapes also benefits.
///
/// Also implies [`avx512_enabled`] (we never enable a specialized
/// kernel on a host without the foundation), so callers should
/// check this gate alone.
pub
/// True iff this binary should use AVX2 fast-path kernels in the
/// "wide" tier. Checks `is_x86_feature_detected!("avx2")` at
/// runtime; near-universally true on production x86_64 hosts (Intel
/// Haswell+ / AMD Excavator+) but not assumed by the build target.
///
/// Sits between [`avx512_enabled`] (the fastest tier — 512-bit) and
/// the portable scalar-widen fallback. Hosts that have AVX-512
/// always also have AVX2, but [`avx512_enabled`] gets checked first
/// at every dispatch site, so the AVX2 gate is only consulted when
/// AVX-512 is off (either no AVX-512 silicon, or
/// `INFINO_DISABLE_AVX512=1`).
///
/// Set `INFINO_DISABLE_AVX2=1` to force the portable scalar-widen
/// path on hosts that *do* support AVX2 — for A/B perf comparison
/// or pinning the universal fallback path under test without
/// rebuilding. Reads the env var exactly once on the first call.
/// Parses `INFINO_DISABLE_AVX512` from the environment. Accepts `1`
/// or `true` (case-insensitive); everything else (including unset)
/// is false. Pulled into its own helper so the parsing logic is
/// shared across the gates above and exercised by unit tests.
// Reached only through the x86-gated `avx512_enabled`, so it is unused
// in a non-x86 library build (its sibling `disable_avx2_env_set` is
// reached on all targets via the `pub` `avx2_enabled`, hence no allowance).
/// Parses `INFINO_DISABLE_AVX2` from the environment. Same accepted
/// values as [`disable_env_set`]; see that function for the contract.