//! AVX2+FMA kernel wrappers (`x86_64`, `feature = "simd"`).
//!
//! Phase 20 establishes the dispatch *framework*. To guarantee
//! byte-for-byte scalar parity on every target-feature matrix row, the
//! AVX2 entry points currently delegate to the scalar reference
//! kernels. Each wrapper carries `#[target_feature(enable =
//! "avx2,fma")]` so that, once the scalar bodies are replaced with
//! genuine intrinsic code, the compiler will emit AVX2+FMA instructions
//! without any call-site changes.
//!
//! Every entry point in this file is deliberately `unsafe`:
//! `dispatch::current()` verifies at runtime that the CPU supports
//! AVX2+FMA before calling any of them. The wrappers themselves perform
//! no unsafe operations internally beyond the delegation call, so no
//! `// SAFETY:` comments are needed inside the bodies.
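//!
//! A minimal sketch of the delegate-then-dispatch pattern, for
//! illustration only. The names `cosine_scalar`, `cosine_avx2`, and
//! `cosine` are hypothetical; they are not this crate's real signatures
//! and do not describe the internals of `dispatch::current()`. The
//! sketch assumes only the standard `is_x86_feature_detected!` macro.
//!
//! ```rust
//! /// Hypothetical scalar reference kernel (illustrative signature).
//! fn cosine_scalar(a: &[f32], b: &[f32]) -> f32 {
//!     let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
//!     for (x, y) in a.iter().zip(b) {
//!         dot += x * y;
//!         na += x * x;
//!         nb += y * y;
//!     }
//!     let denom = (na * nb).sqrt();
//!     if denom == 0.0 { 0.0 } else { dot / denom }
//! }
//!
//! /// Hypothetical AVX2+FMA wrapper: for now it only forwards to the
//! /// scalar kernel, but it is already compiled with the features on.
//! #[cfg(target_arch = "x86_64")]
//! #[target_feature(enable = "avx2,fma")]
//! unsafe fn cosine_avx2(a: &[f32], b: &[f32]) -> f32 {
//!     cosine_scalar(a, b)
//! }
//!
//! /// Hypothetical safe dispatcher: checks the CPU at runtime, then
//! /// picks the wrapper or falls back to the scalar kernel.
//! fn cosine(a: &[f32], b: &[f32]) -> f32 {
//!     #[cfg(target_arch = "x86_64")]
//!     {
//!         if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
//!             // SAFETY: both features were detected just above.
//!             return unsafe { cosine_avx2(a, b) };
//!         }
//!     }
//!     cosine_scalar(a, b)
//! }
//! ```
//!
//! In this sketch, `cosine(&[1.0, 0.0], &[1.0, 0.0])` goes through the
//! wrapper on AVX2+FMA-capable CPUs and through the scalar kernel
//! elsewhere; either way the scalar body computes the result.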
use super::scalar;
use crateCodecError;
/// Quantize `values` into `indices` via the AVX2+FMA kernel path.
///
/// # Safety
///
/// Caller must ensure the target CPU supports both AVX2 and FMA
/// instructions. In practice this means routing through
/// [`crate::codec::dispatch::current()`] — that helper runs
/// `is_x86_feature_detected!` before selecting this path.
///
/// # Errors
///
/// Propagates errors from [`super::scalar::quantize_into`].
pub unsafe
/// Dequantize `indices` into `values` via the AVX2+FMA kernel path.
///
/// # Safety
///
/// See [`quantize_into`] for the AVX2+FMA availability contract.
///
/// # Errors
///
/// Propagates errors from [`super::scalar::dequantize_into`].
pub unsafe
/// Cosine similarity via the AVX2+FMA kernel path.
///
/// # Safety
///
/// See [`quantize_into`] for the AVX2+FMA availability contract.
pub unsafe
/// Compute an fp16 residual buffer via the AVX2+FMA kernel path.
///
/// # Safety
///
/// See [`quantize_into`] for the AVX2+FMA availability contract.
pub unsafe
/// Decode and apply an fp16 residual via the AVX2+FMA kernel path.
///
/// # Safety
///
/// See [`quantize_into`] for the AVX2+FMA availability contract.
///
/// # Errors
///
/// Propagates errors from [`super::scalar::apply_residual_into`].
pub unsafe