//! Platform-specific SIMD quantization kernels.
//!
//! This module provides a cached runtime capability detection entry point
//! and re-exports each platform's submodule behind its appropriate feature
//! and target gates.
//!
//! # Runtime detection
//!
//! [`cached_capabilities`] returns a `&'static SimdCapabilities` that is
//! initialised exactly once (via [`std::sync::OnceLock`]) on first access.
//! Subsequent calls return the cached value without re-running detection.
//!
//! # Sub-modules
//!
//! | Module | Feature flag | Target |
//! |--------|-------------|--------|
//! | `avx2` | `simd-avx2` | `x86_64` |

use std::sync::OnceLock;

use crate::SimdCapabilities;

/// Cached SIMD capability detection result.
///
/// Populated on the first call to [`cached_capabilities`] and reused for
/// all subsequent calls.
static CACHED_CAPS: OnceLock<SimdCapabilities> = OnceLock::new();

/// Return the detected SIMD capabilities for this CPU, lazily computed once.
///
/// Uses [`std::sync::OnceLock`] so the underlying detection (`CPUID` on
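/// x86_64, compile-time flag on aarch64) runs at most once per process.
pub fn cached_capabilities() -> &'static SimdCapabilities {
    // Assumes `SimdCapabilities::detect()` is the crate's detection
    // constructor; adjust the name if the crate exposes it differently.
    CACHED_CAPS.get_or_init(SimdCapabilities::detect)
}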

/// AVX2+FMA kernels (x86_64 only, `simd-avx2` feature).
#[cfg(all(feature = "simd-avx2", target_arch = "x86_64"))]
pub mod avx2;
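
/// AVX-512 kernels (x86_64 only, `simd-avx512` feature).
// Module name `avx512` is assumed from the `simd-avx512` feature flag.
#[cfg(all(feature = "simd-avx512", target_arch = "x86_64"))]
pub mod avx512;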
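
/// AArch64 NEON kernels (`simd-neon` feature).
// Module name `neon` is assumed from the `simd-neon` feature flag.
#[cfg(all(feature = "simd-neon", target_arch = "aarch64"))]
pub mod neon;
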
/// oxiblas-backed GEMM kernels for F32, F16, and BF16.
///
/// Always available — no CPU feature gate required since oxiblas is a
/// workspace dependency for all configurations.