
Crate poulpy_cpu_avx512

AVX-512 / AVX-512-IFMA accelerated CPU backends for the Poulpy lattice cryptography library.

This crate provides three backend implementations for poulpy_hal:

  • FFT64Avx512: f64 FFT backend, gated on enable-avx512f.
  • NTT120Avx512: Q120 NTT backend over four ~30-bit CRT primes, gated on enable-avx512f.
  • NTT126Ifma: Q126 NTT backend over three ~42-bit CRT primes, gated on enable-ifma.

§Architecture

poulpy_hal defines a hardware abstraction layer (HAL) via the Backend trait and open extension point (OEP) traits in poulpy_hal::oep. This crate implements those extension points with AVX-512F, AVX-512-IFMA, AVX2/FMA, and scalar/reference fallback paths depending on the backend and operation family.
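The pattern of an accelerated kernel plus a scalar/reference fallback can be sketched in plain Rust. This is a minimal, hypothetical illustration rather than poulpy's OEP wiring: the names `add`, `add_ref`, and `add_avx2` are invented, and the check runs per call for brevity, whereas the crate documents its checks at module construction. AVX2 stands in for AVX-512F only because its `target_feature` attribute has long been available on stable Rust.

```rust
/// Reference (scalar) path: always available, used as the fallback.
fn add_ref(a: &[i64], b: &[i64], out: &mut [i64]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        // Wrapping add: two's-complement wraparound instead of a debug-mode panic.
        *o = x.wrapping_add(y);
    }
}

/// Accelerated path: the same loop compiled with AVX2 enabled, so the compiler
/// may emit vector instructions. Callers must verify AVX2 support first.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(a: &[i64], b: &[i64], out: &mut [i64]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x.wrapping_add(y);
    }
}

/// Hypothetical extension-point entry: uses the accelerated kernel when the
/// CPU supports it and falls back to the reference path otherwise.
pub fn add(a: &[i64], b: &[i64], out: &mut [i64]) {
    assert!(a.len() == b.len() && b.len() == out.len(), "shape mismatch");
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 availability was checked on the line above.
            unsafe { add_avx2(a, b, out) };
            return;
        }
    }
    add_ref(a, b, out);
}
```

Both paths must produce bit-identical results, which is what lets a backend swap kernels without changing observable behavior.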

The internal modules are organized by operation domain:

Module              Domain
fft64               FFT64Avx512 backend and REIM FFT table wrappers
znx_avx512          AVX-512F single ring element arithmetic
ntt120_avx512       NTT120Avx512 NTT, VMP, convolution, and DFT kernels
ntt126_ifma         NTT126Ifma IFMA NTT, VMP, SVP, convolution, and DFT code
hal_impl            HAL OEP implementations and default wiring
vec_znx_big_avx512  AVX-512F i128 accumulator helpers

§Scalar types

  • FFT64Avx512: ScalarPrep = f64, ScalarBig = i64.
  • NTT120Avx512: ScalarPrep = Q120bScalar, ScalarBig = i128.
  • NTT126Ifma: ScalarPrep = Q120bScalar, ScalarBig = i128.

§CPU requirements

FFT64Avx512 and NTT120Avx512 require x86-64 with AVX-512F. The FFT64 backend also uses AVX2 and FMA kernels and checks those features at module construction.

NTT126Ifma additionally requires AVX-512-IFMA, AVX-512VL, BMI2, and ADX. Runtime CPU feature detection is performed in Module::new(); missing runtime features cause a descriptive panic.
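A runtime check of this kind can be sketched with the standard library's `is_x86_feature_detected!` macro. The helper names and the panic message below are hypothetical; they only mirror the documented behavior (detect the NTT126Ifma feature set, panic with a list of what is missing) and are not the crate's `Module::new()` code.

```rust
/// CPU features the documentation lists for NTT126Ifma.
pub const IFMA_REQUIRED: [&str; 5] = ["avx512f", "avx512ifma", "avx512vl", "bmi2", "adx"];

/// Returns the required features the running CPU does not report.
/// `is_x86_feature_detected!` only accepts string literals, so each is checked explicitly.
pub fn missing_ifma_features() -> Vec<&'static str> {
    #[cfg(target_arch = "x86_64")]
    let detected = [
        std::arch::is_x86_feature_detected!("avx512f"),
        std::arch::is_x86_feature_detected!("avx512ifma"),
        std::arch::is_x86_feature_detected!("avx512vl"),
        std::arch::is_x86_feature_detected!("bmi2"),
        std::arch::is_x86_feature_detected!("adx"),
    ];
    // Non-x86-64 targets never satisfy the requirement.
    #[cfg(not(target_arch = "x86_64"))]
    let detected = [false; 5];

    IFMA_REQUIRED
        .iter()
        .zip(detected)
        .filter(|(_, ok)| !ok)
        .map(|(name, _)| *name)
        .collect()
}

/// Hypothetical constructor-time guard: panics with a descriptive message.
pub fn require_ifma_features() {
    let missing = missing_ifma_features();
    if !missing.is_empty() {
        panic!(
            "NTT126Ifma requires CPU features missing at runtime: {}",
            missing.join(", ")
        );
    }
}
```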

§Compile-time requirements

Backends are opt-in through Cargo features and matching target features:

RUSTFLAGS="-C target-feature=+avx512f" \
    cargo build --release --features enable-avx512f

RUSTFLAGS="-C target-feature=+avx512f,+avx512ifma,+avx512vl,+bmi2,+adx" \
    cargo build --release --features enable-ifma

If neither feature is enabled, this crate compiles as an empty shell so the workspace remains portable on machines without AVX-512. Code that imports AVX-512 backend types must enable the feature that exports them.
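Because the RUSTFLAGS lines above are easy to drop accidentally, a test or startup check can confirm which target features the binary was actually compiled with. This sketch uses only `cfg!(target_feature = ...)`, which is evaluated at compile time; the helper names are hypothetical. It does not see the Cargo features themselves (those are visible as `cfg!(feature = "...")` only inside the crate that declares them).

```rust
/// Target features named in the RUSTFLAGS examples, paired with whether this
/// binary was compiled with each one enabled (`cfg!` is resolved at compile time).
pub fn compiled_target_features() -> [(&'static str, bool); 5] {
    [
        ("avx512f", cfg!(target_feature = "avx512f")),
        ("avx512ifma", cfg!(target_feature = "avx512ifma")),
        ("avx512vl", cfg!(target_feature = "avx512vl")),
        ("bmi2", cfg!(target_feature = "bmi2")),
        ("adx", cfg!(target_feature = "adx")),
    ]
}

/// True when the build matches the `enable-avx512f` RUSTFLAGS line.
pub fn built_for_avx512f() -> bool {
    cfg!(target_feature = "avx512f")
}

/// True when the build matches the `enable-ifma` RUSTFLAGS line.
pub fn built_for_ifma() -> bool {
    compiled_target_features().iter().all(|&(_, on)| on)
}
```

A binary compiled with these flags but run on a CPU lacking them may fault on the first AVX-512 instruction, which is why the runtime checks described under CPU requirements are still needed.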

§Correctness guarantees

Operations are deterministic across runs. FFT operations are constrained to preserve the rounding behavior expected by the reference backend, while NTT operations are exact modulo their CRT prime sets.
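"Exact modulo the CRT prime set" means that any integer of magnitude below half the product M of the primes is uniquely recoverable from its residues. The sketch below shows centered CRT reconstruction in general form; it is not the crate's implementation, the function names are hypothetical, and the test uses small well-known primes because the actual Q120/Q126 primes are not listed here. It assumes distinct odd primes with M < 2^127, which covers four ~30-bit or three ~42-bit primes.

```rust
/// Modular exponentiation by squaring; requires m < 2^64 so products fit in u128.
fn pow_mod(mut base: u128, mut exp: u128, m: u128) -> u128 {
    let mut acc = 1 % m;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % m;
        }
        base = base * base % m;
        exp >>= 1;
    }
    acc
}

/// Reconstructs the unique x in (-M/2, M/2] with x = residues[i] (mod primes[i]),
/// where M is the product of the distinct odd primes. Requires M < 2^127.
fn crt_centered(residues: &[u64], primes: &[u64]) -> i128 {
    assert_eq!(residues.len(), primes.len());
    let m: u128 = primes.iter().map(|&p| p as u128).product();
    let mut acc: u128 = 0;
    for (&r, &p) in residues.iter().zip(primes) {
        let p = p as u128;
        let m_i = m / p; // product of the other primes
        let inv = pow_mod(m_i % p, p - 2, p); // Fermat inverse, valid because p is prime
        // ((r * inv) mod p) * m_i < M, so acc + term < 2M never overflows u128.
        let term = (r as u128 % p) * inv % p * m_i;
        acc = (acc + term) % m;
    }
    // Centered lift from [0, M) to (-M/2, M/2].
    if acc > m / 2 {
        acc as i128 - m as i128
    } else {
        acc as i128
    }
}
```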

Integer overflow in limb arithmetic is intentional where the bivariate representation relies on wrapping arithmetic to propagate carries correctly across base-2^k limbs.
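Two facts make wrapping arithmetic safe here: addition in i64 with wraparound is exact modulo 2^64, so if the true final value fits in i64 the wrapped result equals it even when partial sums overflowed; and carries between base-2^k limbs can be pushed with shifts and masks. The sketch below normalizes limbs into centered base-2^k digits. It is hypothetical: the function names are invented, and the least-significant-first limb order is an assumption for illustration, not a statement about poulpy's layout.

```rust
/// Normalizes limbs (least-significant first, assumed for this sketch) into
/// centered base-2^k digits in [-2^(k-1), 2^(k-1)), pushing each carry into the
/// next limb and returning the carry out of the top limb.
fn normalize_base2k(limbs: &mut [i64], k: u32) -> i64 {
    assert!((1..63).contains(&k));
    let base: i64 = 1 << k;
    let half: i64 = base >> 1;
    let mut carry: i64 = 0;
    for limb in limbs.iter_mut() {
        // Wrapping ops: an overflowing intermediate wraps mod 2^64 instead of panicking.
        let v = limb.wrapping_add(carry);
        let shifted = v.wrapping_add(half);
        // shifted = carry_out * 2^k + r with r in [0, 2^k); `>>` on i64 floors.
        *limb = (shifted & (base - 1)) - half;
        carry = shifted >> k;
    }
    carry
}

/// Evaluates sum(limbs[i] * 2^(k*i)) + carry * 2^(k*n) as i128, for checking.
fn value_base2k(limbs: &[i64], carry: i64, k: u32) -> i128 {
    let mut acc = carry as i128;
    for &d in limbs.iter().rev() {
        acc = acc * (1i128 << k) + d as i128;
    }
    acc
}
```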

§Safety invariants

Unsafe kernels require:

  • the selected backend’s CPU features to be enabled and present at runtime,
  • input and output layouts to have matching shapes and documented bounds,
  • buffers to satisfy the alignment required by poulpy_hal::DEFAULTALIGN.

Violating those invariants may cause undefined behavior, panics, or silent arithmetic errors.
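A safe wrapper can turn these invariants into checked errors before any unsafe kernel runs. This is a hypothetical sketch: the 64-byte `ASSUMED_ALIGN` is a placeholder for `poulpy_hal::DEFAULTALIGN`, whose value is not stated here, and the error type and function names are invented. Only equal lengths are checked as a stand-in for the full shape and bound requirements.

```rust
/// Assumed alignment in bytes; a placeholder for poulpy_hal::DEFAULTALIGN.
const ASSUMED_ALIGN: usize = 64;

/// Over-aligned storage used to obtain buffers that meet ASSUMED_ALIGN.
#[repr(C, align(64))]
pub struct Aligned64(pub [i64; 16]);

#[derive(Debug, PartialEq)]
pub enum PreconditionError {
    ShapeMismatch,
    Misaligned,
    MissingCpuFeature,
}

fn is_aligned<T>(buf: &[T], align: usize) -> bool {
    (buf.as_ptr() as usize) % align == 0
}

/// Hypothetical guard run before entering an unsafe AVX-512F kernel:
/// checks shapes, alignment, and runtime CPU support, in that order.
pub fn check_kernel_preconditions(input: &[i64], output: &[i64]) -> Result<(), PreconditionError> {
    if input.len() != output.len() {
        return Err(PreconditionError::ShapeMismatch);
    }
    if !is_aligned(input, ASSUMED_ALIGN) || !is_aligned(output, ASSUMED_ALIGN) {
        return Err(PreconditionError::Misaligned);
    }
    #[cfg(target_arch = "x86_64")]
    let has_avx512f = std::arch::is_x86_feature_detected!("avx512f");
    #[cfg(not(target_arch = "x86_64"))]
    let has_avx512f = false;
    if !has_avx512f {
        return Err(PreconditionError::MissingCpuFeature);
    }
    Ok(())
}
```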

§Threading and concurrency

Backend marker types are zero-sized and Send + Sync. Module<BE> values hold immutable precomputed tables after construction. Operations take mutable output references, so normal Rust borrowing rules prevent data races at the API boundary.
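The borrowing argument can be seen in a small stand-in. In this hypothetical sketch, `ToyBackendMarker` and `ToyModule` imitate a zero-sized marker and a `Module<BE>` with read-only tables; neither is the crate's type. Threads share the module through `&self` while each receives an exclusive `&mut` output, so the compiler rejects any attempt to hand two threads the same output buffer.

```rust
use std::thread;

/// Hypothetical zero-sized backend marker standing in for types such as FFT64Avx512.
#[derive(Clone, Copy, Debug, Default)]
pub struct ToyBackendMarker;

// Compile-time checks: the marker is zero-sized and Send + Sync.
const fn assert_send_sync<T: Send + Sync>() {}
const _: () = assert_send_sync::<ToyBackendMarker>();
const _: () = assert!(std::mem::size_of::<ToyBackendMarker>() == 0);

/// Hypothetical stand-in for Module<BE>: tables are built once, then only read.
pub struct ToyModule {
    table: Vec<i64>,
}

impl ToyModule {
    pub fn new(n: usize) -> Self {
        Self { table: (0..n as i64).collect() }
    }

    /// Shared `&self` for the tables, exclusive `&mut` for the output.
    pub fn mul_table_into(&self, input: &[i64], out: &mut [i64]) {
        for ((o, &x), &t) in out.iter_mut().zip(input).zip(&self.table) {
            *o = x.wrapping_mul(t);
        }
    }
}

/// Runs one operation per thread against the same module; each thread owns one output.
pub fn parallel_apply(module: &ToyModule, inputs: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let mut outputs: Vec<Vec<i64>> = inputs.iter().map(|v| vec![0; v.len()]).collect();
    thread::scope(|s| {
        for (input, out) in inputs.iter().zip(outputs.iter_mut()) {
            s.spawn(move || module.mul_table_into(input, out));
        }
    });
    outputs
}
```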

§Feature flags

  • enable-avx512f: exports FFT64Avx512 and NTT120Avx512.
  • enable-ifma: implies enable-avx512f and also exports NTT126Ifma.
  • enable-ckks: wires these backends into poulpy-ckks defaults.

§Platform support

  • Required: x86-64.
  • FFT64Avx512: AVX-512F + AVX2 + FMA.
  • NTT120Avx512: AVX-512F.
  • NTT126Ifma: AVX-512F + AVX-512-IFMA + AVX-512VL + BMI2 + ADX.
  • Non-x86 targets and x86-64 CPUs without the selected feature set are not supported.
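The requirement list above can be encoded as data and evaluated against whatever feature probe is available. In this sketch the table is transcribed from the list; `usable_backends` and `detect` are hypothetical helpers, and the returned strings are only names, not the crate's types. `detect` maps names to `is_x86_feature_detected!` calls because that macro accepts only string literals.

```rust
/// Requirement table transcribed from the platform-support list.
const REQUIREMENTS: &[(&str, &[&str])] = &[
    ("FFT64Avx512", &["avx512f", "avx2", "fma"]),
    ("NTT120Avx512", &["avx512f"]),
    ("NTT126Ifma", &["avx512f", "avx512ifma", "avx512vl", "bmi2", "adx"]),
];

/// Returns the backends whose every required feature satisfies `has_feature`.
pub fn usable_backends(has_feature: impl Fn(&str) -> bool) -> Vec<&'static str> {
    REQUIREMENTS
        .iter()
        .filter(|(_, feats)| feats.iter().all(|f| has_feature(*f)))
        .map(|(name, _)| *name)
        .collect()
}

/// Runtime probe for the feature names used in REQUIREMENTS.
#[cfg(target_arch = "x86_64")]
pub fn detect(feature: &str) -> bool {
    match feature {
        "avx512f" => std::arch::is_x86_feature_detected!("avx512f"),
        "avx2" => std::arch::is_x86_feature_detected!("avx2"),
        "fma" => std::arch::is_x86_feature_detected!("fma"),
        "avx512ifma" => std::arch::is_x86_feature_detected!("avx512ifma"),
        "avx512vl" => std::arch::is_x86_feature_detected!("avx512vl"),
        "bmi2" => std::arch::is_x86_feature_detected!("bmi2"),
        "adx" => std::arch::is_x86_feature_detected!("adx"),
        _ => false,
    }
}

/// Non-x86-64 targets are unsupported, so no feature is ever reported.
#[cfg(not(target_arch = "x86_64"))]
pub fn detect(_feature: &str) -> bool {
    false
}
```

Keeping the table separate from the probe makes the selection logic testable with fake feature sets, for example `usable_backends(|f| f == "avx512f")` on a machine without AVX-512.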

§Usage

The public backend marker types are used as type parameters to HAL, core, CKKS, and bin-FHE generic APIs. Application code usually selects one of these types in the backend-owning crate or benchmark harness.
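Selecting a backend by type parameter means generic code is written once and instantiated per marker, with scalar types coming from associated types. The trait below is a hypothetical stand-in, not poulpy_hal's actual `Backend` trait; its associated types mirror the Scalar types section, and `ToyQ120Scalar` is a placeholder because the real `Q120bScalar` layout is not described here.

```rust
use std::mem::size_of;

/// Hypothetical backend trait with the associated scalar types from "Scalar types".
pub trait ToyBackend {
    type ScalarPrep;
    type ScalarBig;
    const NAME: &'static str;
}

/// Zero-sized marker in the style of FFT64Avx512 (f64 prep, i64 big).
pub struct Fft64Marker;
impl ToyBackend for Fft64Marker {
    type ScalarPrep = f64;
    type ScalarBig = i64;
    const NAME: &'static str = "fft64-like";
}

/// Placeholder prepared scalar; not the real Q120bScalar layout.
#[allow(dead_code)]
pub struct ToyQ120Scalar([u64; 4]);

/// Zero-sized marker in the style of NTT120Avx512 (Q120 prep, i128 big).
pub struct Ntt120Marker;
impl ToyBackend for Ntt120Marker {
    type ScalarPrep = ToyQ120Scalar;
    type ScalarBig = i128;
    const NAME: &'static str = "ntt120-like";
}

/// Generic code instantiated per backend purely through the type parameter.
pub fn scalar_widths<BE: ToyBackend>() -> (&'static str, usize, usize) {
    (BE::NAME, size_of::<BE::ScalarPrep>(), size_of::<BE::ScalarBig>())
}
```

For example, `scalar_widths::<Fft64Marker>()` and `scalar_widths::<Ntt120Marker>()` compile to separate monomorphized functions with no runtime dispatch.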

§Versioning and stability

The public API consists of the backend marker types, FFT table wrappers, and the ntt126_ifma_api support exports used by benchmarks. Other items are implementation details.

§Modules

ntt126_ifma_api
Public surface for tools that drive NTT126Ifma kernels directly (e.g. the benches): the precomputed twiddle tables, the prime set, and the Ntt126IfmaDFTExecute trait used to dispatch a forward / inverse NTT.

§Structs

FFT64Avx512
AVX-512F-accelerated CPU backend for Poulpy HAL.
FFT64Avx512ReimTable
Precomputed twiddle-factor tables for the negacyclic reim FFT and IFFT, dispatching to AVX-512F-accelerated kernels.
NTT120Avx512
AVX-512F-accelerated NTT120 CPU backend for Poulpy HAL.
NTT126Ifma
AVX-512-IFMA-accelerated NTT CPU backend for Poulpy HAL.
ReimFFTAvx512
ReimIFFTAvx512