AVX-512 / AVX-512-IFMA accelerated CPU backends for the Poulpy lattice cryptography library.
This crate provides three backend implementations for poulpy_hal:
- FFT64Avx512: f64 FFT backend, gated on enable-avx512f.
- NTT120Avx512: Q120 NTT backend over four ~30-bit CRT primes, gated on enable-avx512f.
- NTT126Ifma: Q126 NTT backend over three ~42-bit CRT primes, gated on enable-ifma.
§Architecture
poulpy_hal defines a hardware abstraction layer (HAL) via the
Backend trait and open extension point
(OEP) traits in poulpy_hal::oep. This crate implements those extension
points with AVX-512F, AVX-512-IFMA, AVX2/FMA, and scalar/reference fallback
paths depending on the backend and operation family.
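A minimal, self-contained sketch of that layering follows. Every name in it (Backend, VecAddImpl, VecAdd, Module, RefBackend) is a hypothetical stand-in, not poulpy_hal's real API: a user-facing trait is implemented once, generically, for Module<B> and forwards to an extension-point trait that each backend marker implements, so an accelerated crate only has to supply its own marker and kernel.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the HAL's Backend trait.
pub trait Backend: Sized {
    const NAME: &'static str;
}

/// Hypothetical open extension point: one trait per operation family,
/// implemented by each backend crate.
pub trait VecAddImpl: Backend {
    fn vec_add_impl(out: &mut [i64], a: &[i64], b: &[i64]);
}

/// Hypothetical module handle, parameterized by the backend marker.
pub struct Module<B: Backend> {
    n: usize,
    _be: PhantomData<B>,
}

impl<B: Backend> Module<B> {
    pub fn new(n: usize) -> Self {
        Self { n, _be: PhantomData }
    }
}

/// User-facing HAL trait: implemented once and forwarded to the OEP trait.
pub trait VecAdd {
    fn vec_add(&self, out: &mut [i64], a: &[i64], b: &[i64]);
}

impl<B: VecAddImpl> VecAdd for Module<B> {
    fn vec_add(&self, out: &mut [i64], a: &[i64], b: &[i64]) {
        // Shape checks live in the generic layer; the backend only computes.
        assert!(out.len() == self.n && a.len() == self.n && b.len() == self.n);
        B::vec_add_impl(out, a, b);
    }
}

/// A portable scalar backend. An accelerated crate would add its own marker
/// and a SIMD implementation of the same extension-point trait.
pub struct RefBackend;

impl Backend for RefBackend {
    const NAME: &'static str = "ref";
}

impl VecAddImpl for RefBackend {
    fn vec_add_impl(out: &mut [i64], a: &[i64], b: &[i64]) {
        for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
            *o = x.wrapping_add(*y);
        }
    }
}
```

Callers only name the marker, e.g. Module::<RefBackend>::new(n); swapping in another marker changes which kernel runs without touching the calling code.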
The internal modules are organized by operation domain:
| Module | Domain |
|---|---|
| fft64 | FFT64Avx512 backend and REIM FFT table wrappers |
| znx_avx512 | AVX-512F single ring element arithmetic |
| ntt120_avx512 | NTT120Avx512 NTT, VMP, convolution, and DFT kernels |
| ntt126_ifma | NTT126Ifma IFMA NTT, VMP, SVP, convolution, and DFT code |
| hal_impl | HAL OEP implementations and default wiring |
| vec_znx_big_avx512 | AVX-512F i128 accumulator helpers |
§Scalar types
- FFT64Avx512: ScalarPrep = f64, ScalarBig = i64.
- NTT120Avx512: ScalarPrep = Q120bScalar, ScalarBig = i128.
- NTT126Ifma: ScalarPrep = Q120bScalar, ScalarBig = i128.
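These pairs behave like associated types on the backend marker. The sketch below assumes, as the names suggest, that ScalarPrep is the element type of prepared (DFT/NTT-domain) buffers and ScalarBig that of wide coefficient accumulators. Backend, Fft64Like, Ntt120Like and Q120bStandIn are hypothetical, and the [u64; 4] payload of the stand-in is an arbitrary placeholder, not Q120bScalar's real layout.

```rust
/// Hypothetical backend trait carrying the two scalar types.
pub trait Backend {
    /// Element type of prepared (DFT/NTT-domain) buffers (assumed meaning).
    type ScalarPrep: Copy + Default;
    /// Element type of big coefficient accumulators (assumed meaning).
    type ScalarBig: Copy + Default;
}

/// Opaque placeholder for a Q120b scalar; the payload is NOT the real layout.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
pub struct Q120bStandIn([u64; 4]);

/// Mirrors the FFT64Avx512 pair: f64 / i64.
pub struct Fft64Like;
impl Backend for Fft64Like {
    type ScalarPrep = f64;
    type ScalarBig = i64;
}

/// Mirrors the NTT120Avx512 pair: Q120b / i128.
pub struct Ntt120Like;
impl Backend for Ntt120Like {
    type ScalarPrep = Q120bStandIn;
    type ScalarBig = i128;
}

/// Generic code can size and zero buffers without naming a concrete backend.
pub fn alloc_big<B: Backend>(n: usize) -> Vec<B::ScalarBig> {
    vec![B::ScalarBig::default(); n]
}

/// Bytes per big coefficient for backend B.
pub fn big_bytes_per_coeff<B: Backend>() -> usize {
    std::mem::size_of::<B::ScalarBig>()
}
```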
§CPU requirements
FFT64Avx512 and NTT120Avx512 require x86-64 with AVX-512F. The FFT64
backend also uses AVX2 and FMA kernels and checks those features at module
construction.
NTT126Ifma additionally requires AVX-512-IFMA, AVX-512VL, BMI2, and ADX.
Runtime CPU feature detection is performed in
Module::new(); missing runtime features
cause a descriptive panic.
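A hedged sketch of this kind of check, using the standard library's is_x86_feature_detected! macro. The function names are hypothetical and the feature list simply mirrors the NTT126Ifma requirements stated above; this is not the crate's actual Module::new() code.

```rust
/// Names of required CPU features missing at runtime (hypothetical helper).
#[cfg(target_arch = "x86_64")]
pub fn missing_ifma_features() -> Vec<&'static str> {
    let checks: [(&'static str, bool); 5] = [
        ("avx512f", is_x86_feature_detected!("avx512f")),
        ("avx512ifma", is_x86_feature_detected!("avx512ifma")),
        ("avx512vl", is_x86_feature_detected!("avx512vl")),
        ("bmi2", is_x86_feature_detected!("bmi2")),
        ("adx", is_x86_feature_detected!("adx")),
    ];
    checks
        .iter()
        .filter(|(_, ok)| !*ok)
        .map(|(name, _)| *name)
        .collect()
}

/// On any other architecture the whole backend is unavailable.
#[cfg(not(target_arch = "x86_64"))]
pub fn missing_ifma_features() -> Vec<&'static str> {
    vec!["x86_64"]
}

/// Constructor-style guard: panic with a descriptive message if anything is missing.
pub fn require_ifma_features() {
    let missing = missing_ifma_features();
    if !missing.is_empty() {
        panic!(
            "NTT126Ifma backend requires missing CPU features: {}",
            missing.join(", ")
        );
    }
}
```

Checking once at construction keeps the per-call kernels free of detection overhead.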
§Compile-time requirements
Backends are opt-in through Cargo features and matching target features:
```sh
# Exports FFT64Avx512 and NTT120Avx512
RUSTFLAGS="-C target-feature=+avx512f" \
cargo build --release --features enable-avx512f

# Additionally exports NTT126Ifma
RUSTFLAGS="-C target-feature=+avx512f,+avx512ifma,+avx512vl,+bmi2,+adx" \
cargo build --release --features enable-ifma
```

If neither feature is enabled, this crate compiles as an empty shell so the workspace remains portable on machines without AVX-512. Code that imports AVX-512 backend types must enable the feature that exports them.
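The "empty shell" behavior is ordinary conditional compilation. In this sketch Avx512Marker and exported_backends are hypothetical, and requiring both the Cargo feature and the matching target feature is an assumption about how such gating could be written, not a copy of this crate's cfg attributes.

```rust
// Exists only when the Cargo feature is on AND the target enables AVX-512F.
#[cfg(all(feature = "enable-avx512f", target_feature = "avx512f"))]
pub struct Avx512Marker;

/// Reports which backend names this build would export (hypothetical helper).
pub fn exported_backends() -> Vec<&'static str> {
    let mut out = Vec::new();
    if cfg!(all(feature = "enable-avx512f", target_feature = "avx512f")) {
        out.push("FFT64Avx512");
        out.push("NTT120Avx512");
    }
    if cfg!(all(feature = "enable-ifma", target_feature = "avx512ifma")) {
        out.push("NTT126Ifma");
    }
    // With no features enabled this is empty: the crate still builds.
    out
}
```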
§Correctness guarantees
Operations are deterministic across runs. FFT operations are constrained to preserve the rounding behavior expected by the reference backend, while NTT operations are exact modulo their CRT prime sets.
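Exactness modulo a CRT prime set means a result below the product of the primes can be recovered from its residues with no rounding. The sketch below multiplies two integers residue-wise and reconstructs the product with Garner's algorithm. The four primes are well-known NTT-friendly primes chosen for illustration (about 28 to 30 bits each, product about 2^116); they are not the NTT120Avx512 prime set, and the function names are hypothetical.

```rust
/// b^e mod m using 128-bit intermediates.
fn pow_mod(mut b: u64, mut e: u64, m: u64) -> u64 {
    let mut r: u64 = 1 % m;
    b %= m;
    while e > 0 {
        if e & 1 == 1 {
            r = ((r as u128 * b as u128) % m as u128) as u64;
        }
        b = ((b as u128 * b as u128) % m as u128) as u64;
        e >>= 1;
    }
    r
}

/// Modular inverse for a prime p (Fermat's little theorem).
fn inv_mod(a: u64, p: u64) -> u64 {
    pow_mod(a, p - 2, p)
}

/// Garner's algorithm: rebuild x from x mod p_i, assuming distinct primes
/// and x < p_0 * ... * p_{n-1} < 2^128.
pub fn crt_reconstruct(residues: &[u64], primes: &[u64]) -> u128 {
    let n = primes.len();
    let mut c = vec![0u64; n]; // mixed-radix digits
    for i in 0..n {
        let p = primes[i];
        // Partial value c_0 + c_1 p_0 + ... and radix p_0 ... p_{i-1}, both mod p.
        let (mut acc, mut radix) = (0u64, 1u64);
        for j in 0..i {
            acc = ((acc as u128 + c[j] as u128 * radix as u128) % p as u128) as u64;
            radix = ((radix as u128 * (primes[j] % p) as u128) % p as u128) as u64;
        }
        let diff = (residues[i] % p + p - acc) % p;
        c[i] = ((diff as u128 * inv_mod(radix, p) as u128) % p as u128) as u64;
    }
    // x = c_0 + c_1 p_0 + c_2 p_0 p_1 + ...
    let (mut x, mut radix) = (0u128, 1u128);
    for i in 0..n {
        x += c[i] as u128 * radix;
        radix *= primes[i] as u128;
    }
    x
}

/// Multiply residue-wise, then reconstruct: exact while a * b < prod(primes).
pub fn mul_via_crt(a: u128, b: u128, primes: &[u64]) -> u128 {
    let res: Vec<u64> = primes
        .iter()
        .map(|&p| {
            let (ra, rb) = ((a % p as u128) as u64, (b % p as u128) as u64);
            ((ra as u128 * rb as u128) % p as u128) as u64
        })
        .collect();
    crt_reconstruct(&res, primes)
}
```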
Integer overflow in limb arithmetic is intentional where the bivariate representation relies on wrapping arithmetic to propagate carries correctly across base-2^k limbs.
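Why wrapping is harmless for digit extraction: a wrapping sum agrees with the true sum modulo 2^64, so its low k bits, which determine the centered digit, are always exact; only the carry can be off, by a multiple of 2^(64-k). The sketch assumes little-endian limbs (limb i has weight 2^(k*i)) and centered digits in [-2^(k-1), 2^(k-1)). Poulpy's actual limb order and normalization routine may differ, and the function names are hypothetical.

```rust
/// Split x into a centered base-2^k digit and a carry with x == digit + carry * 2^k.
/// Requires 1 <= k <= 63; the carry computation itself never overflows.
pub fn split_digit(x: i64, k: u32) -> (i64, i64) {
    debug_assert!((1..64).contains(&k));
    // Sign-extend the low k bits: shift them to the top, then arithmetic-shift back.
    let digit = (x << (64 - k)) >> (64 - k);
    // High part, plus one when bit k-1 made the digit negative.
    let carry = (x >> k) + ((x >> (k - 1)) & 1);
    (digit, carry)
}

/// Normalize limbs in place so each is a centered k-bit digit; returns the
/// carry out of the top limb.
pub fn normalize(limbs: &mut [i64], k: u32) -> i64 {
    let mut carry = 0i64;
    for limb in limbs.iter_mut() {
        // Intentional wrapping: the digit (low k bits) stays exact even on overflow.
        let (d, c) = split_digit(limb.wrapping_add(carry), k);
        *limb = d;
        carry = c;
    }
    carry
}
```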
§Safety invariants
Unsafe kernels require:
- the selected backend’s CPU features to be enabled and present at runtime,
- input and output layouts to have matching shapes and documented bounds,
- buffers to satisfy the alignment required by
poulpy_hal::DEFAULTALIGN.
Violating those invariants may cause undefined behavior, panics, or silent arithmetic errors.
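One common way to uphold these invariants is a safe wrapper that checks shapes and alignment before entering the unsafe kernel. In this sketch ASSUMED_ALIGN = 64 (one 512-bit register) stands in for poulpy_hal::DEFAULTALIGN, whose actual value is not stated here, and add_kernel is a scalar placeholder for a SIMD kernel; none of these names come from the crate.

```rust
/// Hypothetical stand-in for poulpy_hal::DEFAULTALIGN.
const ASSUMED_ALIGN: usize = 64;

/// Fixed-size buffer whose storage starts on a 64-byte boundary.
#[repr(C, align(64))]
pub struct Aligned<const N: usize>(pub [i64; N]);

/// Scalar placeholder for an unsafe SIMD kernel.
///
/// # Safety
/// `out`, `a` and `b` must each be valid for `n` elements and ASSUMED_ALIGN-aligned.
unsafe fn add_kernel(out: *mut i64, a: *const i64, b: *const i64, n: usize) {
    for i in 0..n {
        unsafe {
            *out.add(i) = (*a.add(i)).wrapping_add(*b.add(i));
        }
    }
}

/// Safe entry point: validates shapes and alignment, then calls the kernel.
pub fn add_checked(out: &mut [i64], a: &[i64], b: &[i64]) {
    assert_eq!(out.len(), a.len(), "shape mismatch");
    assert_eq!(out.len(), b.len(), "shape mismatch");
    for p in [out.as_ptr() as usize, a.as_ptr() as usize, b.as_ptr() as usize] {
        assert_eq!(p % ASSUMED_ALIGN, 0, "buffer not aligned");
    }
    // SAFETY: lengths match and all pointers are aligned (checked above).
    unsafe { add_kernel(out.as_mut_ptr(), a.as_ptr(), b.as_ptr(), out.len()) }
}
```

Turning a violated invariant into a panic at the boundary trades a small check for never reaching undefined behavior inside the kernel.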
§Threading and concurrency
Backend marker types are zero-sized and Send + Sync. Module<BE> values
hold immutable precomputed tables after construction. Operations take
mutable output references, so normal Rust borrowing rules prevent data races
at the API boundary.
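The threading model can be illustrated with a hypothetical zero-sized marker and a module that owns read-only tables. MarkerBackend, Module and run_parallel are illustrative names only: one module is shared through Arc across threads, and each thread writes to its own output buffer, so the borrow checker rules out data races.

```rust
use std::marker::PhantomData;
use std::sync::Arc;
use std::thread;

/// Hypothetical zero-sized backend marker.
#[derive(Clone, Copy, Debug, Default)]
pub struct MarkerBackend;

/// Hypothetical module: tables are computed once and never mutated afterwards.
pub struct Module<B> {
    table: Vec<u64>,
    _be: PhantomData<B>,
}

impl<B> Module<B> {
    pub fn new(n: usize) -> Self {
        // Stand-in for precomputation done once at construction.
        Self { table: (0..n as u64).collect(), _be: PhantomData }
    }

    /// Shared `&self` (read-only tables), exclusive `&mut` output.
    pub fn scale_into(&self, out: &mut [u64], x: u64) {
        for (o, t) in out.iter_mut().zip(&self.table) {
            *o = t.wrapping_mul(x);
        }
    }
}

/// Compiles only if T may be sent to and shared between threads.
pub fn assert_send_sync<T: Send + Sync>() {}

/// Run one operation per thread against a single shared module.
pub fn run_parallel(module: Arc<Module<MarkerBackend>>, xs: &[u64]) -> Vec<Vec<u64>> {
    let handles: Vec<_> = xs
        .iter()
        .map(|&x| {
            let m = Arc::clone(&module);
            thread::spawn(move || {
                let mut out = vec![0u64; m.table.len()];
                m.scale_into(&mut out, x);
                out
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().expect("worker panicked")).collect()
}
```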
§Feature flags
- enable-avx512f: exports FFT64Avx512 and NTT120Avx512.
- enable-ifma: implies enable-avx512f and also exports NTT126Ifma.
- enable-ckks: wires these backends into poulpy-ckks defaults.
§Platform support
- Required: x86-64.
- Per-backend extensions:
  - FFT64Avx512: AVX-512F + AVX2 + FMA.
  - NTT120Avx512: AVX-512F.
  - NTT126Ifma: AVX-512F + AVX-512-IFMA + AVX-512VL + BMI2 + ADX.
- Non-x86 targets and x86-64 CPUs without the selected feature set are not supported.
§Usage
The public backend marker types are used as type parameters to HAL, core, CKKS, and bin-FHE generic APIs. Application code usually selects one of these types in the backend-owning crate or benchmark harness.
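Selecting a backend typically happens in one place, with the rest of the code generic over the marker. The sketch below uses hypothetical FastBackend and PortableBackend markers and a hypothetical BackendName trait; only the feature name enable-avx512f comes from this page.

```rust
/// Hypothetical markers for an accelerated and a portable backend.
pub struct FastBackend;
pub struct PortableBackend;

/// Hypothetical trait giving each marker a printable name.
pub trait BackendName {
    const NAME: &'static str;
}
impl BackendName for FastBackend {
    const NAME: &'static str = "fast";
}
impl BackendName for PortableBackend {
    const NAME: &'static str = "portable";
}

// The backend-owning crate picks the concrete type once...
#[cfg(feature = "enable-avx512f")]
pub type SelectedBackend = FastBackend;
#[cfg(not(feature = "enable-avx512f"))]
pub type SelectedBackend = PortableBackend;

// ...and downstream code stays generic over B.
pub fn describe<B: BackendName>() -> String {
    format!("running with the {} backend", B::NAME)
}
```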
§Versioning and stability
The public API consists of the backend marker types, FFT table wrappers, and
the ntt126_ifma_api support exports used by benchmarks. Other items are
implementation details.
§Modules
- ntt126_ifma_api - Public surface for tools that drive NTT126Ifma kernels directly (e.g. the benches): the precomputed twiddle tables, the prime set, and the Ntt126IfmaDFTExecute trait used to dispatch a forward / inverse NTT.
§Structs
- FFT64Avx512 - AVX-512F-accelerated CPU backend for Poulpy HAL.
- FFT64Avx512ReimTable - Precomputed twiddle-factor tables for the negacyclic reim FFT and IFFT, dispatching to AVX-512F-accelerated kernels.
- NTT120Avx512 - AVX-512F-accelerated NTT120 CPU backend for Poulpy HAL.
- NTT126Ifma - AVX-512-IFMA-accelerated NTT CPU backend for Poulpy HAL.
- ReimFFTAvx512
- ReimIFFTAvx512