Skip to main content

poulpy_hal/
lib.rs

1//! # poulpy-hal
2//!
3//! A trait-based Hardware Abstraction Layer (HAL) for lattice-based polynomial
4//! arithmetic over the cyclotomic ring `Z[X]/(X^N + 1)`.
5//!
6//! This crate provides backend-agnostic data layouts and a trait-based API for
7//! polynomial operations commonly used in lattice-based cryptography (LWE/Module-LWE
8//! ciphertexts, key-switching matrices, external products, etc.). It is designed
9//! so that cryptographic schemes can be written once against the [`api`] traits and
10//! then executed on any backend (CPU with AVX2/AVX-512, GPU, FPGA, ...) that
11//! implements the [`oep`] (Open Extension Point) traits.
12//!
13//! ## Core Concepts
14//!
15//! **Ring:** All polynomials live in `Z[X]/(X^N + 1)` where `N` is a power of
16//! two (the *ring degree*). A [`layouts::Module`] encapsulates `N` together with
17//! an optional backend-specific handle (e.g. precomputed FFT twiddle factors).
18//!
19//! **Limbed representation (base-2^k):** Large coefficients are decomposed into
20//! a vector of `size` limbs, each carrying at most `base2k` bits. This is the
21//! *bivariate* view `Z[X, Y]` with `Y = 2^{-k}`, central to gadget
22//! decomposition and normalization.
23//!
24//! **Layout types** ([`layouts`]):
25//! - [`layouts::ScalarZnx`] -- single polynomial with `i64` coefficients.
26//! - [`layouts::VecZnx`] -- vector of `cols` polynomials, each with `size` limbs.
27//! - [`layouts::MatZnx`] -- matrix of polynomials (`rows x cols_in`, each entry a [`layouts::VecZnx`] of `cols_out` polynomials).
28//! - [`layouts::VecZnxBig`] -- vector of polynomials with backend-specific large-coefficient scalars (result accumulator).
29//! - [`layouts::VecZnxDft`] -- vector of polynomials in DFT/NTT domain (backend-specific prepared scalars).
30//! - [`layouts::SvpPPol`] -- prepared scalar polynomial for scalar-vector products.
31//! - [`layouts::VmpPMat`] -- prepared matrix for vector-matrix products.
32//! - [`layouts::CnvPVecL`], [`layouts::CnvPVecR`] -- prepared left/right operands for bivariate convolution.
33//! - [`layouts::ScratchArena`], [`layouts::ScratchOwned`] -- aligned scratch memory for temporary workspace.
34//!
35//! All layout types are generic over a data container `D` (owned `Vec<u8>`, borrowed
36//! `&[u8]` / `&mut [u8]`), enabling zero-copy views and arena-style allocation via
37//! [`layouts::ScratchArena`].
38//!
39//! ## Architecture
40//!
41//! The crate is organized into a four-layer stack:
42//!
43//! 1. **[`api`]** -- Safe, user-facing trait definitions (e.g. [`api::VecZnxAddIntoBackend`],
44//!    [`api::VmpApplyDftToDft`]). Scheme authors program against these.
45//! 2. **[`oep`]** -- Unsafe extension-point layer of per-family backend traits.
46//!    Backend crates implement only the families they own and may reuse helper
47//!    macros or defaults where convenient.
48//! 3. **[`delegates`]** -- Blanket `impl` glue that connects each [`api`] trait to
49//!    the corresponding backend family method on [`layouts::Module`].
50//! 4. **Reference implementations** live in the `poulpy-cpu-ref` crate, which provides
51//!    the portable default backend used by tests and benchmarks.
52//!
53//! ## Testing and Benchmarking
54//!
55//! The [`test_suite`] module provides fully generic, backend-parametric test
56//! functions. Backend crates instantiate these via the
57//! [`backend_test_suite!`](crate::backend_test_suite) and
58//! [`cross_backend_test_suite!`](crate::cross_backend_test_suite) macros to
59//! validate correctness against the reference implementation in
60//! [`poulpy-cpu-ref`](https://docs.rs/poulpy-cpu-ref).
61//!
62//! Analogous Criterion-based benchmark harnesses live in the separate
63//! [`poulpy-bench`](https://docs.rs/poulpy-bench) crate.
64//!
65//! ## Safety Contract
66//!
67//! All [`oep`] extension points are `unsafe` to implement. Implementors must uphold the
68//! contract documented in [`doc::backend_safety`], covering memory domains,
69//! alignment, scratch lifetime, synchronization, aliasing, and numerical
70//! exactness.
71//!
72//! ## Non-Goals
73//!
74//! - This crate does **not** provide a complete cryptographic scheme. It is a
75//!   low-level arithmetic layer consumed by higher-level crates such as
76//!   `poulpy-core` and `poulpy-bin-fhe`.
77//! - It does **not** perform constant-time enforcement. Side-channel resistance
78//!   is the responsibility of the backend and the caller.
79//!
80//! ## Compatibility
81//!
82//! - Requires **nightly** Rust (uses `#![feature(trait_alias)]`).
83//! - All memory allocations are aligned to [`DEFAULTALIGN`] (64 bytes).
84//! - Types matching the API of **spqlios-arithmetic**.
85
86#![allow(non_camel_case_types, non_snake_case, non_upper_case_globals, dead_code, improper_ctypes)]
87#![deny(rustdoc::broken_intra_doc_links)]
88#![cfg_attr(docsrs, feature(doc_cfg))]
89#![feature(trait_alias)]
90
91/// Safe, user-facing trait definitions for polynomial arithmetic operations.
92///
93/// Scheme authors program against these traits; the actual computation is
94/// dispatched to a backend via the [`oep`] extension points.
95pub mod api;
96
97/// Criterion-based benchmark harnesses, generic over any backend.
98/// Blanket implementations connecting [`api`] traits to [`oep`] traits on
99/// [`layouts::Module`].
100///
101/// This module contains no user-facing logic; it exists solely to wire
102/// the safe API layer to the unsafe backend implementations.
103pub mod delegates;
104
105/// Backend-agnostic data layout types for polynomials, vectors, matrices,
106/// and prepared (DFT-domain) representations.
107///
108/// All types are generic over a data container `D` (`Vec<u8>`, `&[u8]`,
109/// `&mut [u8]`) enabling owned, borrowed, and scratch-backed usage.
110pub mod layouts;
111
112/// Open Extension Points: the `unsafe` backend extension layer of per-family
113/// backend traits.
114///
115/// Backend crates implement only the families they own and may delegate to
116/// helper defaults provided by a backend crate (for example `poulpy-cpu-ref`). See
117/// [`doc::backend_safety`] for the safety contract.
118pub mod oep;
119
120/// Deterministic pseudorandom number generation based on ChaCha8.
121pub mod source;
122
123/// Fully generic, backend-parametric test functions.
124///
125/// Backend crates instantiate these via the [`backend_test_suite!`] and
126/// [`cross_backend_test_suite!`] macros.
127pub mod test_suite;
128
129/// Embedded safety contract documentation for backend implementors.
130pub mod doc {
131    /// Safety contract that all [`crate::oep`] trait implementations must uphold.
132    ///
133    /// Covers memory domains, alignment, scratch lifetime, synchronization,
134    /// aliasing, and numerical exactness requirements.
135    #[doc = include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/docs/backend_safety_contract.md"))]
136    pub mod backend_safety {
137        pub const _PLACEHOLDER: () = ();
138    }
139}
140
141/// Default generator of the Galois group `(Z/2NZ)*` for the cyclotomic ring
142/// `Z[X]/(X^N + 1)`.
143///
144/// Used to compute Galois automorphisms `X -> X^{5^k}` and their inverses.
145pub const GALOISGENERATOR: u64 = 5;
146
147/// Default memory alignment in bytes for all allocated buffers.
148///
149/// Set to 64 bytes to match the cache-line size of modern x86 processors
150/// and the alignment required by AVX-512 instructions.
151pub const DEFAULTALIGN: usize = 64;
152
153fn is_aligned_custom<T>(ptr: *const T, align: usize) -> bool {
154    (ptr as usize).is_multiple_of(align)
155}
156
157/// Returns `true` if `ptr` is aligned to [`DEFAULTALIGN`] bytes.
158pub fn is_aligned<T>(ptr: *const T) -> bool {
159    is_aligned_custom(ptr, DEFAULTALIGN)
160}
161
162/// Panics if `ptr` is not aligned to [`DEFAULTALIGN`] bytes.
163///
164/// # Panics
165///
166/// Panics with a descriptive message when the pointer does not satisfy the
167/// default alignment requirement.
168pub fn assert_alignment<T>(ptr: *const T) {
169    assert!(
170        is_aligned(ptr),
171        "invalid alignment: ensure passed bytes have been allocated with [alloc_aligned_u8] or [alloc_aligned]"
172    )
173}
174
175/// Deprecated spelling variant. Use [`assert_alignment`] instead.
176#[inline]
177pub fn assert_alignement<T>(ptr: *const T) {
178    assert_alignment(ptr)
179}
180
181/// Reinterprets a `&[T]` as a `&[V]`.
182///
183/// # Safety (via assertions)
184/// - `V` must not be zero-sized.
185/// - The pointer must be aligned for `V`.
186/// - The total byte length must be a multiple of `size_of::<V>()`.
187pub fn cast<T, V>(data: &[T]) -> &[V] {
188    assert!(size_of::<V>() > 0, "cast: target type V must not be zero-sized");
189    let byte_len: usize = std::mem::size_of_val(data);
190    assert!(
191        byte_len.is_multiple_of(size_of::<V>()),
192        "cast: byte length {} is not a multiple of target size {}",
193        byte_len,
194        size_of::<V>()
195    );
196    let ptr: *const V = data.as_ptr() as *const V;
197    assert!(
198        ptr.align_offset(align_of::<V>()) == 0,
199        "cast: pointer {:p} is not aligned to {} bytes",
200        ptr,
201        align_of::<V>()
202    );
203    let len: usize = byte_len / size_of::<V>();
204    unsafe { std::slice::from_raw_parts(ptr, len) }
205}
206
207/// Reinterprets a `&mut [T]` as a `&mut [V]`.
208///
209/// # Safety (via assertions)
210/// - `V` must not be zero-sized.
211/// - The pointer must be aligned for `V`.
212/// - The total byte length must be a multiple of `size_of::<V>()`.
213pub fn cast_mut<T, V>(data: &mut [T]) -> &mut [V] {
214    assert!(size_of::<V>() > 0, "cast_mut: target type V must not be zero-sized");
215    let byte_len: usize = std::mem::size_of_val(data);
216    assert!(
217        byte_len.is_multiple_of(size_of::<V>()),
218        "cast_mut: byte length {} is not a multiple of target size {}",
219        byte_len,
220        size_of::<V>()
221    );
222    let ptr: *mut V = data.as_mut_ptr() as *mut V;
223    assert!(
224        ptr.align_offset(align_of::<V>()) == 0,
225        "cast_mut: pointer {:p} is not aligned to {} bytes",
226        ptr,
227        align_of::<V>()
228    );
229    let len: usize = byte_len / size_of::<V>();
230    unsafe { std::slice::from_raw_parts_mut(ptr, len) }
231}
232
233/// Minimum allocation size for which the aligned allocator advises
234/// transparent huge pages. Overridable via `POULPY_HUGEPAGE_MIN_BYTES`.
235const HUGEPAGE_ADVISE_THRESHOLD: usize = 2 * 1024 * 1024;
236
237#[cfg(target_os = "linux")]
238fn hugepage_min_bytes() -> usize {
239    use once_cell::sync::Lazy;
240    static THRESH: Lazy<usize> = Lazy::new(|| {
241        std::env::var("POULPY_HUGEPAGE_MIN_BYTES")
242            .ok()
243            .and_then(|v| v.trim().parse().ok())
244            .unwrap_or(HUGEPAGE_ADVISE_THRESHOLD)
245    });
246    *THRESH
247}
248
249/// `madvise(MADV_HUGEPAGE)` on a freshly-allocated range. Skipped if the
250/// pointer is not page-aligned (non-mmap'd heap arenas) or the size is
251/// below the threshold. Failure is silently ignored — advisory only.
252#[cfg(target_os = "linux")]
253fn advise_hugepage(ptr: *mut u8, size: usize) {
254    if size < hugepage_min_bytes() {
255        return;
256    }
257    let page_size = unsafe { libc::sysconf(libc::_SC_PAGESIZE) };
258    if page_size <= 0 {
259        return;
260    }
261    let page_size = page_size as usize;
262    if !(ptr as usize).is_multiple_of(page_size) {
263        return;
264    }
265    let len = size - (size % page_size);
266    if len == 0 {
267        return;
268    }
269    let _ = unsafe { libc::madvise(ptr.cast(), len, libc::MADV_HUGEPAGE) };
270}
271
272#[cfg(not(target_os = "linux"))]
273#[inline(always)]
274fn advise_hugepage(_ptr: *mut u8, _size: usize) {}
275
276/// Allocates a block of bytes with a custom alignment.
277/// Alignment must be a power of two and size a multiple of the alignment.
278/// Allocated memory is initialized to zero.
279///
280/// Large allocations are advised for transparent huge pages via
281/// [`advise_hugepage`] on Linux before the zero-fill.
282///
283/// # Known issue (CRITICAL-2)
284/// The returned `Vec<u8>` was allocated with custom alignment via `std::alloc::alloc`,
285/// but `Vec::drop` will call `std::alloc::dealloc` with `align_of::<u8>() = 1`.
286/// This is technically UB per the `GlobalAlloc` contract (mismatched layout).
287/// In practice it works on all major allocators (glibc, jemalloc, mimalloc) because
288/// they ignore the alignment parameter during deallocation. A proper fix requires
289/// replacing `Vec<u8>` with a custom `AlignedBuf` type that tracks the layout.
290fn alloc_aligned_custom_u8(size: usize, align: usize) -> Vec<u8> {
291    assert!(align.is_power_of_two(), "Alignment must be a power of two but is {align}");
292    assert_eq!(
293        (size * size_of::<u8>()) % align,
294        0,
295        "size={size} must be a multiple of align={align}"
296    );
297    unsafe {
298        let layout: std::alloc::Layout = std::alloc::Layout::from_size_align(size, align).expect("Invalid alignment");
299        let ptr: *mut u8 = std::alloc::alloc(layout);
300        if ptr.is_null() {
301            panic!("Memory allocation failed");
302        }
303        assert!(
304            is_aligned_custom(ptr, align),
305            "Memory allocation at {ptr:p} is not aligned to {align} bytes"
306        );
307        // Advise before write_bytes so the zero-fill faults materialise
308        // 2 MB pages directly rather than relying on khugepaged promotion.
309        advise_hugepage(ptr, size);
310        std::ptr::write_bytes(ptr, 0, size);
311        Vec::from_raw_parts(ptr, size, size)
312    }
313}
314
315/// Allocates a zero-initialized `Vec<T>` with custom alignment.
316///
317/// The total byte size (`size * size_of::<T>()`) must be a multiple of `align`,
318/// and `align` must be a power of two.
319///
320/// # Panics
321///
322/// - If `T` is zero-sized.
323/// - If `align` is not a power of two.
324/// - If `size * size_of::<T>()` is not a multiple of `align`.
325pub fn alloc_aligned_custom<T>(size: usize, align: usize) -> Vec<T> {
326    assert!(size_of::<T>() > 0, "alloc_aligned_custom: zero-sized types are not supported");
327    assert!(align.is_power_of_two(), "Alignment must be a power of two but is {align}");
328
329    assert_eq!(
330        (size * size_of::<T>()) % align,
331        0,
332        "size*size_of::<T>()={} must be a multiple of align={align}",
333        size * size_of::<T>(),
334    );
335
336    let mut vec_u8: Vec<u8> = alloc_aligned_custom_u8(size_of::<T>() * size, align);
337    let ptr: *mut T = vec_u8.as_mut_ptr() as *mut T;
338    let len: usize = vec_u8.len() / size_of::<T>();
339    let cap: usize = vec_u8.capacity() / size_of::<T>();
340    std::mem::forget(vec_u8);
341    unsafe { Vec::from_raw_parts(ptr, len, cap) }
342}
343
344/// Allocates a zero-initialized `Vec<T>` aligned to [`DEFAULTALIGN`] bytes.
345///
346/// The allocation is padded so that the total byte size is a multiple of
347/// [`DEFAULTALIGN`]. This is the primary allocation entry point for all
348/// layout types in the crate.
349///
350/// # Panics
351///
352/// Panics if `T` is zero-sized.
353pub fn alloc_aligned<T>(size: usize) -> Vec<T> {
354    alloc_aligned_custom::<T>(
355        (size * size_of::<T>()).next_multiple_of(DEFAULTALIGN) / size_of::<T>(),
356        DEFAULTALIGN,
357    )
358}