Centralized ISA runtime dispatch codegen for OxiFFT SIMD codelets.
This module generates cached runtime ISA dispatchers that extend the
inline dispatchers in super with an AtomicU8-based ISA level cache.
§Motivation
The basic dispatchers emitted by super::gen_dispatcher run
is_x86_feature_detected! / is_aarch64_feature_detected! on every call.
Each probe is cheap, because the standard library caches the detected features
after the first query. Even so, a hot codelet invoked millions of times per second
may benefit from the cached path, which replaces the repeated feature probes with
a single AtomicU8 load.
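The difference between the two styles can be sketched with a counter. This is a minimal illustration, not the module's code: probe, dispatch_uncached, dispatch_cached, PROBES and CACHE are hypothetical names, probe stands in for a chain of feature-detection macros, and the level values are assumed.

```rust
use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering};

// Hypothetical level values for this sketch only.
const ISA_SSE2: u8 = 1;
const ISA_UNDETECTED: u8 = u8::MAX;

// Counts how often the stand-in probe runs.
static PROBES: AtomicUsize = AtomicUsize::new(0);

// Stand-in for a chain of `is_x86_feature_detected!` checks.
fn probe() -> u8 {
    PROBES.fetch_add(1, Ordering::Relaxed);
    ISA_SSE2
}

// Basic style: probe on every call.
fn dispatch_uncached() -> u8 {
    probe()
}

// Cached style: one atomic load on the hot path; probe only while undetected.
static CACHE: AtomicU8 = AtomicU8::new(ISA_UNDETECTED);

fn dispatch_cached() -> u8 {
    let level = CACHE.load(Ordering::Relaxed);
    if level != ISA_UNDETECTED {
        return level;
    }
    // Racing first calls may each probe; they all store the same value.
    let level = probe();
    CACHE.store(level, Ordering::Relaxed);
    level
}

fn main() {
    for _ in 0..1_000 {
        dispatch_uncached();
    }
    let uncached = PROBES.swap(0, Ordering::Relaxed);
    for _ in 0..1_000 {
        dispatch_cached();
    }
    let cached = PROBES.load(Ordering::Relaxed);
    println!("probes: uncached = {uncached}, cached = {cached}");
}
```

Relaxed ordering is enough here because the cached byte is the only data being shared; no other memory is published through it.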
§Priority order (high → low)
- x86_64: AVX-512F > AVX2+FMA > AVX > SSE2 > scalar
- aarch64: NEON > scalar
- other: scalar
§Generated code shape
For each (size, precision) pair, the proc-macro emits:
- ISA level constants (ISA_SCALAR, ISA_SSE2, …, ISA_UNDETECTED)
- A static DETECTED_ISA_{size}_{TY}: AtomicU8 initialized to ISA_UNDETECTED
- A private detect_isa_{size}_{ty}() -> u8 function that probes the CPU once
- A public {fn_name}_cached(data, sign) dispatcher that reads the cache first
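For (size = 4, f32) the emitted items might look roughly like the sketch below. Several details are assumptions, not taken from the generator: the level values, the data layout (interleaved re/im f32, so 8 floats for size 4), sign as an i32 (-1 forward, +1 inverse, unnormalized), the base name dft4_f32, and the hypothetical scalar kernel dft4_f32_scalar. Only the SSE2 and scalar levels are shown, and every arm calls the scalar kernel so the example stays portable.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical level values (the real constants' values are not listed here).
const ISA_SCALAR: u8 = 0;
const ISA_SSE2: u8 = 1;
const ISA_UNDETECTED: u8 = u8::MAX;

// Cache slot for the (size = 4, f32) pair.
static DETECTED_ISA_4_F32: AtomicU8 = AtomicU8::new(ISA_UNDETECTED);

// Probes the CPU; generated code would walk the full priority order.
fn detect_isa_4_f32() -> u8 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") {
            return ISA_SSE2;
        }
    }
    ISA_SCALAR
}

// Hypothetical scalar size-4 DFT on interleaved (re, im) data, in place.
fn dft4_f32_scalar(data: &mut [f32], sign: i32) {
    assert_eq!(data.len(), 8, "size-4 complex input is 8 floats");
    let s = sign as f32;
    let (x0r, x0i, x1r, x1i) = (data[0], data[1], data[2], data[3]);
    let (x2r, x2i, x3r, x3i) = (data[4], data[5], data[6], data[7]);
    let (a0r, a0i) = (x0r + x2r, x0i + x2i);
    let (a1r, a1i) = (x0r - x2r, x0i - x2i);
    let (b0r, b0i) = (x1r + x3r, x1i + x3i);
    let (b1r, b1i) = (x1r - x3r, x1i - x3i);
    // w * b1 with twiddle w = sign * i.
    let (wr, wi) = (-s * b1i, s * b1r);
    data[0] = a0r + b0r;
    data[1] = a0i + b0i;
    data[2] = a1r + wr;
    data[3] = a1i + wi;
    data[4] = a0r - b0r;
    data[5] = a0i - b0i;
    data[6] = a1r - wr;
    data[7] = a1i - wi;
}

// Public cached dispatcher: read the cache, detect once if needed, then branch.
pub fn dft4_f32_cached(data: &mut [f32], sign: i32) {
    let mut level = DETECTED_ISA_4_F32.load(Ordering::Relaxed);
    if level == ISA_UNDETECTED {
        level = detect_isa_4_f32();
        DETECTED_ISA_4_F32.store(level, Ordering::Relaxed);
    }
    match level {
        // A real dispatcher would call an SSE2 kernel here.
        ISA_SSE2 => dft4_f32_scalar(data, sign),
        _ => dft4_f32_scalar(data, sign),
    }
}

fn main() {
    // Impulse at n = 1; its forward DFT is [1, -i, -1, i].
    let mut data = [0.0f32, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0];
    dft4_f32_cached(&mut data, -1);
    println!("{data:?}");
}
```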
§Proc-macro entry
// Generates a cached dispatcher for size-4 f32.
gen_dispatcher_codelet!(size = 4, ty = f32);
Re-exports§
pub use super::multi_transform::Precision;
Structs§
- DispatcherConfig - Configuration for a cached runtime ISA dispatcher codelet.
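The fields of DispatcherConfig are not listed on this page. Assuming it carries the transform size, the precision and the base function name, the generated item names from the code-shape list could be derived as in this sketch. ConfigSketch, its fields, and the F32/F64 variants of the stand-in Precision enum are all hypothetical.

```rust
/// Stand-in for the re-exported `Precision`; the variant names are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Precision {
    F32,
    F64,
}

impl Precision {
    /// Lower-case type suffix, e.g. "f32".
    fn ty(self) -> &'static str {
        match self {
            Precision::F32 => "f32",
            Precision::F64 => "f64",
        }
    }
}

/// Hypothetical configuration: size, precision and base function name.
struct ConfigSketch {
    size: usize,
    precision: Precision,
    fn_name: String,
}

impl ConfigSketch {
    /// `DETECTED_ISA_{size}_{TY}` uses the upper-case type suffix.
    fn cache_static_name(&self) -> String {
        format!("DETECTED_ISA_{}_{}", self.size, self.precision.ty().to_uppercase())
    }

    /// `detect_isa_{size}_{ty}` uses the lower-case type suffix.
    fn detect_fn_name(&self) -> String {
        format!("detect_isa_{}_{}", self.size, self.precision.ty())
    }

    /// `{fn_name}_cached`.
    fn cached_fn_name(&self) -> String {
        format!("{}_cached", self.fn_name)
    }
}

fn main() {
    let cfg = ConfigSketch {
        size: 4,
        precision: Precision::F32,
        fn_name: "dft4_f32".to_string(),
    };
    println!("{}", cfg.cache_static_name());
    println!("{}", cfg.detect_fn_name());
    println!("{}", cfg.cached_fn_name());
}
```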
Constants§
- ISA_AVX - ISA level for pure AVX (no FMA, no AVX2).
- ISA_AVX2_FMA - ISA level for AVX2 + FMA.
- ISA_AVX512 - ISA level for AVX-512F.
- ISA_NEON - ISA level for NEON (aarch64).
- ISA_SCALAR - ISA level for scalar fallback.
- ISA_SSE2 - ISA level for SSE2.
- ISA_UNDETECTED - Sentinel: ISA not yet detected (stored in the AtomicU8 before the first call).
Functions§
- detect_host_isa - Detect the best ISA available on the current host at runtime.
- generate_dispatcher - Generate a cached runtime ISA dispatcher TokenStream.
- generate_from_macro - Entry point for the gen_dispatcher_codelet! proc-macro.
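A host probe that walks the documented priority order (x86_64: AVX-512F > AVX2+FMA > AVX > SSE2 > scalar; aarch64: NEON > scalar; other: scalar) could look like the sketch below. It is not the actual detect_host_isa: the name detect_best_isa is hypothetical, and the numeric level values are assumed, chosen so that a higher value means a higher x86_64 priority. The detection macros are the standard library's is_x86_feature_detected! and std::arch::is_aarch64_feature_detected!.

```rust
// Hypothetical level values; the real constants' values are not documented here.
pub const ISA_SCALAR: u8 = 0;
pub const ISA_SSE2: u8 = 1;
pub const ISA_AVX: u8 = 2;
pub const ISA_AVX2_FMA: u8 = 3;
pub const ISA_AVX512: u8 = 4;
pub const ISA_NEON: u8 = 5;

// x86_64: check from the highest-priority ISA down.
#[cfg(target_arch = "x86_64")]
pub fn detect_best_isa() -> u8 {
    if is_x86_feature_detected!("avx512f") {
        ISA_AVX512
    } else if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
        ISA_AVX2_FMA
    } else if is_x86_feature_detected!("avx") {
        ISA_AVX
    } else if is_x86_feature_detected!("sse2") {
        ISA_SSE2
    } else {
        ISA_SCALAR
    }
}

// aarch64: NEON or scalar.
#[cfg(target_arch = "aarch64")]
pub fn detect_best_isa() -> u8 {
    if std::arch::is_aarch64_feature_detected!("neon") {
        ISA_NEON
    } else {
        ISA_SCALAR
    }
}

// Every other architecture falls back to scalar.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn detect_best_isa() -> u8 {
    ISA_SCALAR
}

fn main() {
    println!("best ISA level on this host: {}", detect_best_isa());
}
```

Requiring both avx2 and fma before selecting ISA_AVX2_FMA mirrors the combined level in the constants list; a CPU with AVX2 but no FMA falls through to ISA_AVX.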