poulpy_hal/lib.rs
1//! # poulpy-hal
2//!
3//! A trait-based Hardware Abstraction Layer (HAL) for lattice-based polynomial
4//! arithmetic over the cyclotomic ring `Z[X]/(X^N + 1)`.
5//!
6//! This crate provides backend-agnostic data layouts and a trait-based API for
7//! polynomial operations commonly used in lattice-based cryptography (LWE/Module-LWE
8//! ciphertexts, key-switching matrices, external products, etc.). It is designed
9//! so that cryptographic schemes can be written once against the [`api`] traits and
10//! then executed on any backend (CPU with AVX2/AVX-512, GPU, FPGA, ...) that
11//! implements the [`oep`] (Open Extension Point) traits.
12//!
13//! ## Core Concepts
14//!
15//! **Ring:** All polynomials live in `Z[X]/(X^N + 1)` where `N` is a power of
16//! two (the *ring degree*). A [`layouts::Module`] encapsulates `N` together with
17//! an optional backend-specific handle (e.g. precomputed FFT twiddle factors).
18//!
19//! **Limbed representation (base-2^k):** Large coefficients are decomposed into
20//! a vector of `size` limbs, each carrying at most `base2k` bits. This is the
21//! *bivariate* view `Z[X, Y]` with `Y = 2^{-k}`, central to gadget
22//! decomposition and normalization.
23//!
24//! **Layout types** ([`layouts`]):
25//! - [`layouts::ScalarZnx`] -- single polynomial with `i64` coefficients.
26//! - [`layouts::VecZnx`] -- vector of `cols` polynomials, each with `size` limbs.
27//! - [`layouts::MatZnx`] -- matrix of polynomials (`rows x cols_in`, each entry a [`layouts::VecZnx`] of `cols_out` polynomials).
28//! - [`layouts::VecZnxBig`] -- vector of polynomials with backend-specific large-coefficient scalars (result accumulator).
29//! - [`layouts::VecZnxDft`] -- vector of polynomials in DFT/NTT domain (backend-specific prepared scalars).
30//! - [`layouts::SvpPPol`] -- prepared scalar polynomial for scalar-vector products.
31//! - [`layouts::VmpPMat`] -- prepared matrix for vector-matrix products.
32//! - [`layouts::CnvPVecL`], [`layouts::CnvPVecR`] -- prepared left/right operands for bivariate convolution.
33//! - [`layouts::ScratchArena`], [`layouts::ScratchOwned`] -- aligned scratch memory for temporary workspace.
34//!
35//! All layout types are generic over a data container `D` (owned `Vec<u8>`, borrowed
36//! `&[u8]` / `&mut [u8]`), enabling zero-copy views and arena-style allocation via
37//! [`layouts::ScratchArena`].
38//!
39//! ## Architecture
40//!
41//! The crate is organized into a four-layer stack:
42//!
43//! 1. **[`api`]** -- Safe, user-facing trait definitions (e.g. [`api::VecZnxAddIntoBackend`],
44//! [`api::VmpApplyDftToDft`]). Scheme authors program against these.
45//! 2. **[`oep`]** -- Unsafe extension-point layer of per-family backend traits.
46//! Backend crates implement only the families they own and may reuse helper
47//! macros or defaults where convenient.
48//! 3. **[`delegates`]** -- Blanket `impl` glue that connects each [`api`] trait to
49//! the corresponding backend family method on [`layouts::Module`].
50//! 4. **Reference implementations** live in the `poulpy-cpu-ref` crate, which provides
51//! the portable default backend used by tests and benchmarks.
52//!
53//! ## Testing and Benchmarking
54//!
55//! The [`test_suite`] module provides fully generic, backend-parametric test
56//! functions. Backend crates instantiate these via the
57//! [`backend_test_suite!`](crate::backend_test_suite) and
58//! [`cross_backend_test_suite!`](crate::cross_backend_test_suite) macros to
59//! validate correctness against the reference implementation in
60//! [`poulpy-cpu-ref`](https://docs.rs/poulpy-cpu-ref).
61//!
62//! Analogous Criterion-based benchmark harnesses live in the separate
63//! [`poulpy-bench`](https://docs.rs/poulpy-bench) crate.
64//!
65//! ## Safety Contract
66//!
67//! All [`oep`] extension points are `unsafe` to implement. Implementors must uphold the
68//! contract documented in [`doc::backend_safety`], covering memory domains,
69//! alignment, scratch lifetime, synchronization, aliasing, and numerical
70//! exactness.
71//!
72//! ## Non-Goals
73//!
74//! - This crate does **not** provide a complete cryptographic scheme. It is a
75//! low-level arithmetic layer consumed by higher-level crates such as
76//! `poulpy-core` and `poulpy-bin-fhe`.
77//! - It does **not** perform constant-time enforcement. Side-channel resistance
78//! is the responsibility of the backend and the caller.
79//!
80//! ## Compatibility
81//!
82//! - Requires **nightly** Rust (uses `#![feature(trait_alias)]`).
83//! - All memory allocations are aligned to [`DEFAULTALIGN`] (64 bytes).
84//! - Types matching the API of **spqlios-arithmetic**.
85
86#![allow(non_camel_case_types, non_snake_case, non_upper_case_globals, dead_code, improper_ctypes)]
87#![deny(rustdoc::broken_intra_doc_links)]
88#![cfg_attr(docsrs, feature(doc_cfg))]
89#![feature(trait_alias)]
90
91/// Safe, user-facing trait definitions for polynomial arithmetic operations.
92///
93/// Scheme authors program against these traits; the actual computation is
94/// dispatched to a backend via the [`oep`] extension points.
95pub mod api;
96
97/// Criterion-based benchmark harnesses, generic over any backend.
98/// Blanket implementations connecting [`api`] traits to [`oep`] traits on
99/// [`layouts::Module`].
100///
101/// This module contains no user-facing logic; it exists solely to wire
102/// the safe API layer to the unsafe backend implementations.
103pub mod delegates;
104
105/// Backend-agnostic data layout types for polynomials, vectors, matrices,
106/// and prepared (DFT-domain) representations.
107///
108/// All types are generic over a data container `D` (`Vec<u8>`, `&[u8]`,
109/// `&mut [u8]`) enabling owned, borrowed, and scratch-backed usage.
110pub mod layouts;
111
112/// Open Extension Points: the `unsafe` backend extension layer of per-family
113/// backend traits.
114///
115/// Backend crates implement only the families they own and may delegate to
116/// helper defaults provided by a backend crate (for example `poulpy-cpu-ref`). See
117/// [`doc::backend_safety`] for the safety contract.
118pub mod oep;
119
120/// Deterministic pseudorandom number generation based on ChaCha8.
121pub mod source;
122
123/// Fully generic, backend-parametric test functions.
124///
125/// Backend crates instantiate these via the [`backend_test_suite!`] and
126/// [`cross_backend_test_suite!`] macros.
127pub mod test_suite;
128
129/// Embedded safety contract documentation for backend implementors.
130pub mod doc {
131 /// Safety contract that all [`crate::oep`] trait implementations must uphold.
132 ///
133 /// Covers memory domains, alignment, scratch lifetime, synchronization,
134 /// aliasing, and numerical exactness requirements.
135 #[doc = include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/docs/backend_safety_contract.md"))]
136 pub mod backend_safety {
137 pub const _PLACEHOLDER: () = ();
138 }
139}
140
141/// Default generator of the Galois group `(Z/2NZ)*` for the cyclotomic ring
142/// `Z[X]/(X^N + 1)`.
143///
144/// Used to compute Galois automorphisms `X -> X^{5^k}` and their inverses.
145pub const GALOISGENERATOR: u64 = 5;
146
147/// Default memory alignment in bytes for all allocated buffers.
148///
149/// Set to 64 bytes to match the cache-line size of modern x86 processors
150/// and the alignment required by AVX-512 instructions.
151pub const DEFAULTALIGN: usize = 64;
152
153fn is_aligned_custom<T>(ptr: *const T, align: usize) -> bool {
154 (ptr as usize).is_multiple_of(align)
155}
156
157/// Returns `true` if `ptr` is aligned to [`DEFAULTALIGN`] bytes.
158pub fn is_aligned<T>(ptr: *const T) -> bool {
159 is_aligned_custom(ptr, DEFAULTALIGN)
160}
161
162/// Panics if `ptr` is not aligned to [`DEFAULTALIGN`] bytes.
163///
164/// # Panics
165///
166/// Panics with a descriptive message when the pointer does not satisfy the
167/// default alignment requirement.
168pub fn assert_alignment<T>(ptr: *const T) {
169 assert!(
170 is_aligned(ptr),
171 "invalid alignment: ensure passed bytes have been allocated with [alloc_aligned_u8] or [alloc_aligned]"
172 )
173}
174
175/// Deprecated spelling variant. Use [`assert_alignment`] instead.
176#[inline]
177pub fn assert_alignement<T>(ptr: *const T) {
178 assert_alignment(ptr)
179}
180
181/// Reinterprets a `&[T]` as a `&[V]`.
182///
183/// # Safety (via assertions)
184/// - `V` must not be zero-sized.
185/// - The pointer must be aligned for `V`.
186/// - The total byte length must be a multiple of `size_of::<V>()`.
187pub fn cast<T, V>(data: &[T]) -> &[V] {
188 assert!(size_of::<V>() > 0, "cast: target type V must not be zero-sized");
189 let byte_len: usize = std::mem::size_of_val(data);
190 assert!(
191 byte_len.is_multiple_of(size_of::<V>()),
192 "cast: byte length {} is not a multiple of target size {}",
193 byte_len,
194 size_of::<V>()
195 );
196 let ptr: *const V = data.as_ptr() as *const V;
197 assert!(
198 ptr.align_offset(align_of::<V>()) == 0,
199 "cast: pointer {:p} is not aligned to {} bytes",
200 ptr,
201 align_of::<V>()
202 );
203 let len: usize = byte_len / size_of::<V>();
204 unsafe { std::slice::from_raw_parts(ptr, len) }
205}
206
207/// Reinterprets a `&mut [T]` as a `&mut [V]`.
208///
209/// # Safety (via assertions)
210/// - `V` must not be zero-sized.
211/// - The pointer must be aligned for `V`.
212/// - The total byte length must be a multiple of `size_of::<V>()`.
213pub fn cast_mut<T, V>(data: &mut [T]) -> &mut [V] {
214 assert!(size_of::<V>() > 0, "cast_mut: target type V must not be zero-sized");
215 let byte_len: usize = std::mem::size_of_val(data);
216 assert!(
217 byte_len.is_multiple_of(size_of::<V>()),
218 "cast_mut: byte length {} is not a multiple of target size {}",
219 byte_len,
220 size_of::<V>()
221 );
222 let ptr: *mut V = data.as_mut_ptr() as *mut V;
223 assert!(
224 ptr.align_offset(align_of::<V>()) == 0,
225 "cast_mut: pointer {:p} is not aligned to {} bytes",
226 ptr,
227 align_of::<V>()
228 );
229 let len: usize = byte_len / size_of::<V>();
230 unsafe { std::slice::from_raw_parts_mut(ptr, len) }
231}
232
233/// Minimum allocation size for which the aligned allocator advises
234/// transparent huge pages. Overridable via `POULPY_HUGEPAGE_MIN_BYTES`.
235const HUGEPAGE_ADVISE_THRESHOLD: usize = 2 * 1024 * 1024;
236
237#[cfg(target_os = "linux")]
238fn hugepage_min_bytes() -> usize {
239 use once_cell::sync::Lazy;
240 static THRESH: Lazy<usize> = Lazy::new(|| {
241 std::env::var("POULPY_HUGEPAGE_MIN_BYTES")
242 .ok()
243 .and_then(|v| v.trim().parse().ok())
244 .unwrap_or(HUGEPAGE_ADVISE_THRESHOLD)
245 });
246 *THRESH
247}
248
249/// `madvise(MADV_HUGEPAGE)` on a freshly-allocated range. Skipped if the
250/// pointer is not page-aligned (non-mmap'd heap arenas) or the size is
251/// below the threshold. Failure is silently ignored — advisory only.
252#[cfg(target_os = "linux")]
253fn advise_hugepage(ptr: *mut u8, size: usize) {
254 if size < hugepage_min_bytes() {
255 return;
256 }
257 let page_size = unsafe { libc::sysconf(libc::_SC_PAGESIZE) };
258 if page_size <= 0 {
259 return;
260 }
261 let page_size = page_size as usize;
262 if !(ptr as usize).is_multiple_of(page_size) {
263 return;
264 }
265 let len = size - (size % page_size);
266 if len == 0 {
267 return;
268 }
269 let _ = unsafe { libc::madvise(ptr.cast(), len, libc::MADV_HUGEPAGE) };
270}
271
272#[cfg(not(target_os = "linux"))]
273#[inline(always)]
274fn advise_hugepage(_ptr: *mut u8, _size: usize) {}
275
276/// Allocates a block of bytes with a custom alignment.
277/// Alignment must be a power of two and size a multiple of the alignment.
278/// Allocated memory is initialized to zero.
279///
280/// Large allocations are advised for transparent huge pages via
281/// [`advise_hugepage`] on Linux before the zero-fill.
282///
283/// # Known issue (CRITICAL-2)
284/// The returned `Vec<u8>` was allocated with custom alignment via `std::alloc::alloc`,
285/// but `Vec::drop` will call `std::alloc::dealloc` with `align_of::<u8>() = 1`.
286/// This is technically UB per the `GlobalAlloc` contract (mismatched layout).
287/// In practice it works on all major allocators (glibc, jemalloc, mimalloc) because
288/// they ignore the alignment parameter during deallocation. A proper fix requires
289/// replacing `Vec<u8>` with a custom `AlignedBuf` type that tracks the layout.
290fn alloc_aligned_custom_u8(size: usize, align: usize) -> Vec<u8> {
291 assert!(align.is_power_of_two(), "Alignment must be a power of two but is {align}");
292 assert_eq!(
293 (size * size_of::<u8>()) % align,
294 0,
295 "size={size} must be a multiple of align={align}"
296 );
297 unsafe {
298 let layout: std::alloc::Layout = std::alloc::Layout::from_size_align(size, align).expect("Invalid alignment");
299 let ptr: *mut u8 = std::alloc::alloc(layout);
300 if ptr.is_null() {
301 panic!("Memory allocation failed");
302 }
303 assert!(
304 is_aligned_custom(ptr, align),
305 "Memory allocation at {ptr:p} is not aligned to {align} bytes"
306 );
307 // Advise before write_bytes so the zero-fill faults materialise
308 // 2 MB pages directly rather than relying on khugepaged promotion.
309 advise_hugepage(ptr, size);
310 std::ptr::write_bytes(ptr, 0, size);
311 Vec::from_raw_parts(ptr, size, size)
312 }
313}
314
315/// Allocates a zero-initialized `Vec<T>` with custom alignment.
316///
317/// The total byte size (`size * size_of::<T>()`) must be a multiple of `align`,
318/// and `align` must be a power of two.
319///
320/// # Panics
321///
322/// - If `T` is zero-sized.
323/// - If `align` is not a power of two.
324/// - If `size * size_of::<T>()` is not a multiple of `align`.
325pub fn alloc_aligned_custom<T>(size: usize, align: usize) -> Vec<T> {
326 assert!(size_of::<T>() > 0, "alloc_aligned_custom: zero-sized types are not supported");
327 assert!(align.is_power_of_two(), "Alignment must be a power of two but is {align}");
328
329 assert_eq!(
330 (size * size_of::<T>()) % align,
331 0,
332 "size*size_of::<T>()={} must be a multiple of align={align}",
333 size * size_of::<T>(),
334 );
335
336 let mut vec_u8: Vec<u8> = alloc_aligned_custom_u8(size_of::<T>() * size, align);
337 let ptr: *mut T = vec_u8.as_mut_ptr() as *mut T;
338 let len: usize = vec_u8.len() / size_of::<T>();
339 let cap: usize = vec_u8.capacity() / size_of::<T>();
340 std::mem::forget(vec_u8);
341 unsafe { Vec::from_raw_parts(ptr, len, cap) }
342}
343
344/// Allocates a zero-initialized `Vec<T>` aligned to [`DEFAULTALIGN`] bytes.
345///
346/// The allocation is padded so that the total byte size is a multiple of
347/// [`DEFAULTALIGN`]. This is the primary allocation entry point for all
348/// layout types in the crate.
349///
350/// # Panics
351///
352/// Panics if `T` is zero-sized.
353pub fn alloc_aligned<T>(size: usize) -> Vec<T> {
354 alloc_aligned_custom::<T>(
355 (size * size_of::<T>()).next_multiple_of(DEFAULTALIGN) / size_of::<T>(),
356 DEFAULTALIGN,
357 )
358}