poulpy-cpu-ref 0.6.0

Portable reference CPU implementations of poulpy-hal open extension points

🐙 Poulpy-CPU-REF

Poulpy-CPU-REF is the reference (portable) CPU backend for Poulpy.

It implements the Poulpy HAL extension traits without requiring SIMD or specialized CPU instructions, making it suitable for:

  • all CPU architectures (x86_64, aarch64, arm, riscv64, …)
  • development machines and CI runners
  • environments without AVX or other advanced SIMD support

This backend integrates transparently with:

  • poulpy-hal
  • poulpy-core
  • poulpy-ckks
  • poulpy-bin-fhe

When is this backend used?

The FFT64 and NTT120 reference HAL backends are always available; they require no compilation flags and no special CPU features.

A reference backend is selected automatically when:

  • the project does not request an optimized backend, or
  • the target CPU does not support the requested SIMD backend (e.g., AVX), or
  • portability and reproducibility are more important than raw performance.

No additional configuration is required to use it.
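
To judge whether a given host would qualify for a SIMD backend at all, a runtime probe can help. The following is a minimal sketch using only the Rust standard library; the function names host_has_avx2_fma and suggested_backend are hypothetical and are not part of poulpy-cpu-ref, and the crates themselves may select backends differently (for example at compile time).

```rust
/// Hypothetical helper (not part of poulpy): reports whether the host CPU
/// advertises both AVX2 and FMA at runtime.
#[cfg(target_arch = "x86_64")]
pub fn host_has_avx2_fma() -> bool {
    std::is_x86_feature_detected!("avx2") && std::is_x86_feature_detected!("fma")
}

/// On every other architecture the answer is always "no".
#[cfg(not(target_arch = "x86_64"))]
pub fn host_has_avx2_fma() -> bool {
    false
}

/// Hypothetical helper: names the FFT64 backend that would make sense on
/// this host. It only returns a label; it does not construct a backend.
pub fn suggested_backend() -> &'static str {
    if host_has_avx2_fma() {
        "FFT64Avx"
    } else {
        "FFT64Ref"
    }
}
```

On a machine without AVX2 or FMA, suggested_backend() returns "FFT64Ref", which matches the fallback behavior described above.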

Higher-level backend wiring is feature-gated:

  • enable-core wires the reference backends into poulpy-core defaults.
  • enable-ckks wires the reference backends into poulpy-ckks defaults and also enables core support.

Useful test commands:

# HAL/reference backend tests
cargo test -p poulpy-cpu-ref

# Core conformance tests on FFT64Ref and NTT120Ref
cargo test -p poulpy-cpu-ref --features enable-core

# CKKS conformance tests on FFT64Ref and NTT120Ref
cargo test -p poulpy-cpu-ref --features enable-ckks

🧪 Basic Usage

This crate exposes two backends:

use poulpy_cpu_ref::{FFT64Ref, NTT120Ref};
use poulpy_hal::{api::ModuleNew, layouts::Module};

let log_n: usize = 10;

// f64 FFT backend
let fft_module: Module<FFT64Ref> = Module::<FFT64Ref>::new(1 << log_n);

// Q120 NTT backend (CRT over four ~30-bit primes)
let ntt_module: Module<NTT120Ref> = Module::<NTT120Ref>::new(1 << log_n);

Both work on all supported platforms and architectures.


Performance Notes

poulpy-cpu-ref prioritizes:

  • portability
  • correctness
  • ease of debugging

For maximum performance on x86_64 CPUs with AVX2 + FMA support, consider enabling the optional optimized backend:

poulpy-cpu-avx (feature: enable-avx)

For x86_64 CPUs with AVX-512 support, consider the AVX-512 backend:

poulpy-cpu-avx512 (features: enable-avx512f, enable-ifma)

Benchmarks and applications can switch between backends without changing source code. Backend selection can be handled with feature flags, for example:

#[cfg(all(feature = "enable-avx", target_arch = "x86_64", target_feature = "avx2", target_feature = "fma"))]
use poulpy_cpu_avx::FFT64Avx as BackendImpl;

#[cfg(not(all(feature = "enable-avx", target_arch = "x86_64", target_feature = "avx2", target_feature = "fma")))]
use poulpy_cpu_ref::FFT64Ref as BackendImpl;

The same pattern applies to NTT120 backends (NTT120Ref / NTT120Avx).
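
The reason downstream code does not change is that it only ever names the alias. The sketch below illustrates this with stand-in types so that it compiles on its own: the Backend trait, RefBackend, AvxBackend, describe and selected are all hypothetical, and the feature = "enable-avx" condition is left out because the snippet is not built through Cargo.

```rust
/// Hypothetical stand-in for a backend marker trait.
pub trait Backend {
    const NAME: &'static str;
}

/// Stand-ins for the reference and AVX backend types.
#[allow(dead_code)]
pub struct RefBackend;
#[allow(dead_code)]
pub struct AvxBackend;

impl Backend for RefBackend {
    const NAME: &'static str = "ref";
}
impl Backend for AvxBackend {
    const NAME: &'static str = "avx";
}

// Exactly one of these two aliases survives conditional compilation.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2", target_feature = "fma"))]
pub type BackendImpl = AvxBackend;

#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2", target_feature = "fma")))]
pub type BackendImpl = RefBackend;

/// Generic code is written once against the trait.
pub fn describe<B: Backend>() -> String {
    format!("running on the {} backend", B::NAME)
}

/// Application code refers only to the alias, never to a concrete backend.
pub fn selected() -> String {
    describe::<BackendImpl>()
}
```

Building with AVX2 and FMA enabled for the target (for instance through -C target-cpu=native on a capable machine) switches the alias; the body of selected stays untouched.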


🤝 Contributors

To implement your own backend (SIMD or accelerator):

  1. Define a backend struct and implement the Backend trait from poulpy-hal.
  2. For each HAL operation family, either call the blanket default or implement the OEP trait directly with a custom dispatch.
  3. For each poulpy-core operation family, either call the corresponding impl_*_defaults_full! macro to inherit the portable implementation, or implement the OEP trait directly to override it.
  4. Optionally, do the same for poulpy-ckks behind a backend-owned enable-ckks feature using the impl_ckks_*_defaults! macros or direct OEP trait implementations.

At every layer the macro and the direct implementation are mutually exclusive per operation family: the macro opts the backend into the portable default path, while a direct OEP impl replaces it entirely. There is no requirement to use the macros; a backend that needs full control can implement every OEP trait by hand.
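
The exclusivity follows from Rust's coherence rules: a trait can be implemented only once per type, so a macro-generated impl and a hand-written impl for the same backend would conflict. The sketch below shows the shape of this pattern with invented names; VecAddImpl, vec_add_portable, impl_vec_add_defaults!, PortableBackend and TunedBackend are hypothetical and do not correspond to actual poulpy traits or macros.

```rust
/// Hypothetical operation family, standing in for an OEP trait.
pub trait VecAddImpl {
    fn vec_add(a: &[i64], b: &[i64]) -> Vec<i64>;
}

/// Portable default shared by every backend that opts in.
pub fn vec_add_portable(a: &[i64], b: &[i64]) -> Vec<i64> {
    a.iter().zip(b).map(|(x, y)| x.wrapping_add(*y)).collect()
}

/// Opting in: the macro writes the impl that forwards to the default.
macro_rules! impl_vec_add_defaults {
    ($backend:ty) => {
        impl VecAddImpl for $backend {
            fn vec_add(a: &[i64], b: &[i64]) -> Vec<i64> {
                vec_add_portable(a, b)
            }
        }
    };
}

pub struct PortableBackend;
pub struct TunedBackend;

// PortableBackend inherits the default path.
impl_vec_add_defaults!(PortableBackend);

// TunedBackend replaces it with a hand-written impl
// (blocks of four as a stand-in for a SIMD kernel).
impl VecAddImpl for TunedBackend {
    fn vec_add(a: &[i64], b: &[i64]) -> Vec<i64> {
        let n = a.len().min(b.len());
        let mut out = Vec::with_capacity(n);
        let mut i = 0;
        while i + 4 <= n {
            for j in i..i + 4 {
                out.push(a[j].wrapping_add(b[j]));
            }
            i += 4;
        }
        // Scalar tail for the remaining elements.
        while i < n {
            out.push(a[i].wrapping_add(b[i]));
            i += 1;
        }
        out
    }
}

// Adding `impl_vec_add_defaults!(TunedBackend);` here would be rejected
// by the compiler as a conflicting implementation of the same trait.
```

Both backends are interchangeable from the caller's point of view: PortableBackend::vec_add and TunedBackend::vec_add return the same result for the same inputs.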

Your backend will automatically integrate with the backend-generic layers:

  • poulpy-hal
  • poulpy-core
  • poulpy-ckks

No modifications to those crates are necessary; the HAL provides the extension points. Scheme crates that still carry crate-specific backend glue, such as parts of poulpy-bin-fhe in v0.6.0, may need follow-up integration work. Only the operations that need a faster implementation require explicit overrides; everything else is inherited from the default layer for free.


For questions or guidance, feel free to open an issue or discussion in the repository.