1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
/// AVX2/FMA-accelerated CPU backend for Poulpy HAL.
///
/// `FFT64Avx` is a zero-sized marker type that selects the AVX2-optimized CPU backend
/// when used as the type parameter `B` in [`Module<B>`](poulpy_hal::layouts::Module)
/// and related HAL types. It implements all open extension point (OEP) traits from
/// `poulpy_hal::oep` using hand-tuned AVX2/FMA SIMD intrinsics and assembly kernels.
///
/// # Backend characteristics
///
/// - **ScalarPrep**: `f64` — DFT-domain coefficients are 64-bit IEEE 754 floats.
/// - **ScalarBig**: `i64` — large-coefficient ring elements use 64-bit signed integers.
/// - **FFT tables**: precomputed twiddle factors stored in the module handle
/// ([`FFT64AvxHandle`]), shared across all operations on the same module.
///
/// # CPU feature requirements
///
/// **Runtime check**: [`Module::new()`](poulpy_hal::layouts::Module::new) verifies that
/// the CPU supports AVX2, AVX, and FMA. If any feature is missing, the constructor panics.
///
/// **Compile-time requirement**: Code must be compiled with `-C target-feature=+avx2,+fma`.
/// Failure to do so results in a compile error.
///
/// # Thread safety
///
/// `FFT64Avx` is `Send + Sync` (derived from being a zero-sized, field-less struct).
/// The `Module<FFT64Avx>` that holds the FFT tables is also `Send + Sync`, so modules can
/// be shared across threads. Individual operations require exclusive (`&mut`) access to their
/// output buffers and scratch space, preventing data races at the API level.
///
/// # Performance notes
///
/// - Optimized for x86-64 CPUs with AVX2 (2013+) and FMA (Haswell and later).
/// - Hand-written assembly FFT kernels outperform compiler-generated intrinsics by ~2×.
/// - Memory layout is vectorized (4 × `i64` per AVX2 register) with 64-byte alignment.
/// - Scalar fallback for non-multiple-of-4 lengths (negligible overhead for typical FHE parameters).
///
/// # Example usage pattern
///
/// This type is typically not used directly but via HAL generic code:
///
/// ```text
/// use poulpy_hal::layouts::Module;
/// use poulpy_cpu_avx::FFT64Avx;
///
/// let module: Module<FFT64Avx> = Module::new(1024);
/// // Use module for FHE operations...
/// ```
pub use ;