//! # poulpy-hal
//!
//! A trait-based Hardware Abstraction Layer (HAL) for lattice-based polynomial
//! arithmetic over the cyclotomic ring `Z[X]/(X^N + 1)`.
//!
//! This crate provides backend-agnostic data layouts and a trait-based API for
//! polynomial operations commonly used in lattice-based cryptography (LWE/Module-LWE
//! ciphertexts, key-switching matrices, external products, etc.). It is designed
//! so that cryptographic schemes can be written once against the [`api`] traits and
//! then executed on any backend (CPU with AVX2/AVX-512, GPU, FPGA, ...) that
//! implements the [`oep`] (Open Extension Point) traits.
//!
//! ## Core Concepts
//!
//! **Ring:** All polynomials live in `Z[X]/(X^N + 1)` where `N` is a power of
//! two (the *ring degree*). A [`layouts::Module`] encapsulates `N` together with
//! an optional backend-specific handle (e.g. precomputed FFT twiddle factors).
//!
//! **Limbed representation (base-2^k):** Large coefficients are decomposed into
//! a vector of `size` limbs, each carrying at most `base2k` bits. This is the
//! *bivariate* view `Z[X, Y]` with `Y = 2^{-k}` and `k = base2k`, central to gadget
//! decomposition and normalization.
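//!
//! As a rough illustration of the limbed view, the sketch below splits an `i64`
//! into `size` signed base-`2^k` digits, most significant first, and recombines
//! them. It works on plain integers rather than on the crate's layout types. The
//! names `decompose` and `recompose` and the balanced digit range
//! `[-2^{k-1}, 2^{k-1})` are assumptions made for this example, not the crate's API.
//!
//! ```rust
//! // Hypothetical helper: balanced base-2^k digits of `x`, most significant first.
//! // Assumes `x` fits in `size` digits and is far from the `i64` limits.
//! fn decompose(x: i64, base2k: u32, size: usize) -> Vec<i64> {
//!     assert!((1..=62).contains(&base2k));
//!     let base = 1i64 << base2k;
//!     let half = base >> 1;
//!     let mut limbs = vec![0i64; size];
//!     let mut rest = x;
//!     for j in (0..size).rev() {
//!         // Low k bits as a value in [0, 2^k), then centered into [-2^{k-1}, 2^{k-1}).
//!         let mut digit = rest & (base - 1);
//!         if digit >= half {
//!             digit -= base;
//!         }
//!         limbs[j] = digit;
//!         // `rest - digit` is divisible by 2^k, so the arithmetic shift is exact.
//!         rest = (rest - digit) >> base2k;
//!     }
//!     assert_eq!(rest, 0, "value does not fit in the requested number of limbs");
//!     limbs
//! }
//!
//! // Hypothetical helper: inverse of `decompose` (Horner evaluation at 2^k).
//! fn recompose(limbs: &[i64], base2k: u32) -> i64 {
//!     limbs.iter().fold(0i64, |acc, &digit| (acc << base2k) + digit)
//! }
//! ```
//!
//! With `base2k = 4` and `size = 3`, the value `1000` becomes `[4, -1, -8]`,
//! since `4 * 256 - 1 * 16 - 8 = 1000`.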
//!
//! **Layout types** ([`layouts`]):
//! - [`layouts::ScalarZnx`] -- single polynomial with `i64` coefficients.
//! - [`layouts::VecZnx`] -- vector of `cols` polynomials, each with `size` limbs.
//! - [`layouts::MatZnx`] -- matrix of polynomials (`rows x cols_in`, each entry a [`layouts::VecZnx`] of `cols_out` polynomials).
//! - [`layouts::VecZnxBig`] -- vector of polynomials with backend-specific large-coefficient scalars (result accumulator).
//! - [`layouts::VecZnxDft`] -- vector of polynomials in DFT/NTT domain (backend-specific prepared scalars).
//! - [`layouts::SvpPPol`] -- prepared scalar polynomial for scalar-vector products.
//! - [`layouts::VmpPMat`] -- prepared matrix for vector-matrix products.
//! - [`layouts::CnvPVecL`], [`layouts::CnvPVecR`] -- prepared left/right operands for bivariate convolution.
//! - [`layouts::ScratchArena`], [`layouts::ScratchOwned`] -- aligned scratch memory for temporary workspace.
//!
//! All layout types are generic over a data container `D` (owned `Vec<u8>`, borrowed
//! `&[u8]` / `&mut [u8]`), enabling zero-copy views and arena-style allocation via
//! [`layouts::ScratchArena`].
//!
//! ## Architecture
//!
//! The crate is organized into a four-layer stack:
//!
//! 1. **[`api`]** -- Safe, user-facing trait definitions (e.g. [`api::VecZnxAddIntoBackend`],
//! [`api::VmpApplyDftToDft`]). Scheme authors program against these.
//! 2. **[`oep`]** -- Unsafe extension-point layer of per-family backend traits.
//! Backend crates implement only the families they own and may reuse helper
//! macros or defaults where convenient.
//! 3. **[`delegates`]** -- Blanket `impl` glue that connects each [`api`] trait to
//! the corresponding backend family method on [`layouts::Module`].
//! 4. **Reference implementations** live in the `poulpy-cpu-ref` crate, which provides
//! the portable default backend used by tests and benchmarks.
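//!
//! The layering can be pictured with a deliberately tiny model. Everything in
//! the sketch below (`Module`, `VecAddImpl`, `VecAdd`, `ToyBackend`) is a
//! hypothetical stand-in chosen for illustration; the real traits in [`api`] and
//! [`oep`] have different names and signatures.
//!
//! ```rust
//! use std::marker::PhantomData;
//!
//! // Stand-in for `layouts::Module`: the ring degree plus a backend marker.
//! struct Module<B> {
//!     n: usize,
//!     _backend: PhantomData<B>,
//! }
//!
//! impl<B> Module<B> {
//!     fn new(n: usize) -> Self {
//!         assert!(n.is_power_of_two());
//!         Self { n, _backend: PhantomData }
//!     }
//! }
//!
//! // Extension-point layer: `unsafe` to implement, one trait per operation family.
//! unsafe trait VecAddImpl: Sized {
//!     fn vec_add_impl(module: &Module<Self>, res: &mut [i64], a: &[i64], b: &[i64]);
//! }
//!
//! // API layer: the safe trait that scheme code is written against.
//! trait VecAdd {
//!     fn vec_add(&self, res: &mut [i64], a: &[i64], b: &[i64]);
//! }
//!
//! // Delegate layer: a blanket impl forwarding the API call to the backend.
//! impl<B: VecAddImpl> VecAdd for Module<B> {
//!     fn vec_add(&self, res: &mut [i64], a: &[i64], b: &[i64]) {
//!         B::vec_add_impl(self, res, a, b)
//!     }
//! }
//!
//! // A toy backend that owns the "vector addition" family.
//! struct ToyBackend;
//!
//! unsafe impl VecAddImpl for ToyBackend {
//!     fn vec_add_impl(module: &Module<Self>, res: &mut [i64], a: &[i64], b: &[i64]) {
//!         assert!(res.len() == module.n && a.len() == module.n && b.len() == module.n);
//!         for ((r, x), y) in res.iter_mut().zip(a).zip(b) {
//!             *r = x.wrapping_add(*y);
//!         }
//!     }
//! }
//! ```
//!
//! Code written against `VecAdd` never names `ToyBackend`; swapping the type
//! parameter of `Module` is enough to retarget it.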
//!
//! ## Testing and Benchmarking
//!
//! The [`test_suite`] module provides fully generic, backend-parametric test
//! functions. Backend crates instantiate these via the
//! [`backend_test_suite!`](crate::backend_test_suite) and
//! [`cross_backend_test_suite!`](crate::cross_backend_test_suite) macros to
//! validate correctness against the reference implementation in
//! [`poulpy-cpu-ref`](https://docs.rs/poulpy-cpu-ref).
//!
//! Analogous Criterion-based benchmark harnesses live in the separate
//! [`poulpy-bench`](https://docs.rs/poulpy-bench) crate.
//!
//! ## Safety Contract
//!
//! All [`oep`] extension points are `unsafe` to implement. Implementors must uphold the
//! contract documented in [`doc::backend_safety`], covering memory domains,
//! alignment, scratch lifetime, synchronization, aliasing, and numerical
//! exactness.
//!
//! ## Non-Goals
//!
//! - This crate does **not** provide a complete cryptographic scheme. It is a
//! low-level arithmetic layer consumed by higher-level crates such as
//! `poulpy-core` and `poulpy-bin-fhe`.
//! - It does **not** enforce constant-time execution. Side-channel resistance
//! is the responsibility of the backend and the caller.
//!
//! ## Compatibility
//!
//! - Requires **nightly** Rust (uses `#![feature(trait_alias)]`).
//! - All memory allocations are aligned to [`DEFAULTALIGN`] (64 bytes).
//! - Types match the API of **spqlios-arithmetic**.
/// Safe, user-facing trait definitions for polynomial arithmetic operations.
///
/// Scheme authors program against these traits; the actual computation is
/// dispatched to a backend via the [`oep`] extension points.
/// Criterion-based benchmark harnesses, generic over any backend.
/// Blanket implementations connecting [`api`] traits to [`oep`] traits on
/// [`layouts::Module`].
///
/// This module contains no user-facing logic; it exists solely to wire
/// the safe API layer to the unsafe backend implementations.
/// Backend-agnostic data layout types for polynomials, vectors, matrices,
/// and prepared (DFT-domain) representations.
///
/// All types are generic over a data container `D` (`Vec<u8>`, `&[u8]`,
/// `&mut [u8]`) enabling owned, borrowed, and scratch-backed usage.
/// Open Extension Points: the `unsafe` backend extension layer of per-family
/// backend traits.
///
/// Backend crates implement only the families they own and may delegate to
/// helper defaults provided by a backend crate (for example `poulpy-cpu-ref`). See
/// [`doc::backend_safety`] for the safety contract.
/// Deterministic pseudorandom number generation based on ChaCha8.
/// Fully generic, backend-parametric test functions.
///
/// Backend crates instantiate these via the [`backend_test_suite!`] and
/// [`cross_backend_test_suite!`] macros.
/// Embedded safety contract documentation for backend implementors.
/// Default generator of the cyclic subgroup of order `N/2` in the Galois group
/// `(Z/2NZ)*` of the cyclotomic ring `Z[X]/(X^N + 1)`; together with `-1` it
/// generates the whole group.
///
/// Used to compute Galois automorphisms `X -> X^{5^k}` and their inverses.
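///
/// A minimal sketch of how such an automorphism acts on a coefficient vector,
/// assuming plain `i64` coefficients and a non-negative exponent `k`. The
/// helpers `galois_element` and `apply_automorphism` are hypothetical names for
/// this example and are not part of the crate.
///
/// ```rust
/// const GENERATOR: u64 = 5; // mirrors `GALOISGENERATOR`
///
/// // Computes 5^k mod 2N by square-and-multiply.
/// fn galois_element(k: u64, n: u64) -> u64 {
///     assert!(n.is_power_of_two());
///     let modulus = 2 * n;
///     let mut result = 1 % modulus;
///     let mut base = GENERATOR % modulus;
///     let mut exp = k;
///     while exp > 0 {
///         if exp & 1 == 1 {
///             result = result * base % modulus;
///         }
///         base = base * base % modulus;
///         exp >>= 1;
///     }
///     result
/// }
///
/// // Maps a(X) to a(X^g) in Z[X]/(X^N + 1), where N = a.len() and g is odd.
/// fn apply_automorphism(a: &[i64], g: u64) -> Vec<i64> {
///     let n = a.len() as u64;
///     assert!(n.is_power_of_two() && g % 2 == 1);
///     let mut out = vec![0i64; a.len()];
///     for (i, &coeff) in a.iter().enumerate() {
///         let e = (i as u64 * g) % (2 * n);
///         if e < n {
///             out[e as usize] = coeff;
///         } else {
///             // X^N = -1, so exponents in [N, 2N) wrap around with a sign flip.
///             out[(e - n) as usize] = -coeff;
///         }
///     }
///     out
/// }
/// ```
///
/// For `N = 8` and `g = 5`, the monomial `X^3` is sent to `X^15 = -X^7`.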
pub const GALOISGENERATOR: u64 = 5;
/// Default memory alignment in bytes for all allocated buffers.
///
/// Set to 64 bytes to match the cache-line size of modern x86 processors
/// and the alignment required by AVX-512 instructions.
pub const DEFAULTALIGN: usize = 64;
/// Returns `true` if `ptr` is aligned to [`DEFAULTALIGN`] bytes.
/// Panics if `ptr` is not aligned to [`DEFAULTALIGN`] bytes.
///
/// # Panics
///
/// Panics with a descriptive message when the pointer does not satisfy the
/// default alignment requirement.
/// Deprecated spelling variant. Use [`assert_alignment`] instead.
/// Reinterprets a `&[T]` as a `&[V]`.
///
/// # Safety (via assertions)
/// - `V` must not be zero-sized.
/// - The pointer must be aligned for `V`.
/// - The total byte length must be a multiple of `size_of::<V>()`.
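///
/// One way the three assertions could be arranged is sketched below. The name
/// `cast_slice` is hypothetical, and the sketch is only sound when every bit
/// pattern of `T` is a valid `V` (plain integer types, for instance); it is not
/// claimed to match the function documented here.
///
/// ```rust
/// // Hypothetical helper: view the bytes of `data` as a slice of `V`.
/// fn cast_slice<T, V>(data: &[T]) -> &[V] {
///     let v_size = std::mem::size_of::<V>();
///     assert!(v_size != 0, "V must not be zero-sized");
///     let ptr = data.as_ptr();
///     assert!(
///         ptr as usize % std::mem::align_of::<V>() == 0,
///         "pointer is not aligned for V"
///     );
///     let bytes = std::mem::size_of_val(data);
///     assert!(bytes % v_size == 0, "byte length is not a multiple of size_of::<V>()");
///     // SAFETY: alignment and length were checked above; the caller must only
///     // use types for which any bit pattern is valid.
///     unsafe { std::slice::from_raw_parts(ptr as *const V, bytes / v_size) }
/// }
/// ```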
/// Reinterprets a `&mut [T]` as a `&mut [V]`.
///
/// # Safety (via assertions)
/// - `V` must not be zero-sized.
/// - The pointer must be aligned for `V`.
/// - The total byte length must be a multiple of `size_of::<V>()`.
/// Minimum allocation size for which the aligned allocator advises
/// transparent huge pages. Overridable via `POULPY_HUGEPAGE_MIN_BYTES`.
const HUGEPAGE_ADVISE_THRESHOLD: usize = 2 * 1024 * 1024;
/// Calls `madvise(MADV_HUGEPAGE)` on a freshly allocated range. The call is
/// skipped if the pointer is not page-aligned (non-mmap'd heap arenas) or the
/// size is below the threshold. Failure is silently ignored because the call is
/// advisory only.
/// Allocates a block of bytes with a custom alignment.
/// Alignment must be a power of two and size a multiple of the alignment.
/// Allocated memory is initialized to zero.
///
/// Large allocations are advised for transparent huge pages via
/// [`advise_hugepage`] on Linux before the zero-fill.
///
/// # Known issue (CRITICAL-2)
/// The returned `Vec<u8>` was allocated with custom alignment via `std::alloc::alloc`,
/// but `Vec::drop` will call `std::alloc::dealloc` with `align_of::<u8>() = 1`.
/// This is technically UB per the `GlobalAlloc` contract (mismatched layout).
/// In practice it works on all major allocators (glibc, jemalloc, mimalloc) because
/// they ignore the alignment parameter during deallocation. A proper fix requires
/// replacing `Vec<u8>` with a custom `AlignedBuf` type that tracks the layout.
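///
/// A minimal sketch of what a layout-tracking buffer might look like, using only
/// `std::alloc`. The `AlignedBuf` below is an illustration of the idea, with
/// assumed method names, and not the type planned for this crate.
///
/// ```rust
/// use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
/// use std::ptr::NonNull;
///
/// // Owns a zero-initialized allocation and remembers the layout it was made with.
/// struct AlignedBuf {
///     ptr: NonNull<u8>,
///     layout: Layout,
/// }
///
/// impl AlignedBuf {
///     fn new_zeroed(size: usize, align: usize) -> Self {
///         assert!(size > 0, "this sketch does not handle empty buffers");
///         let layout = Layout::from_size_align(size, align).expect("invalid size or alignment");
///         // SAFETY: `layout` has a non-zero size.
///         let raw = unsafe { alloc_zeroed(layout) };
///         let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
///         Self { ptr, layout }
///     }
///
///     fn as_slice(&self) -> &[u8] {
///         // SAFETY: `ptr` is valid for `layout.size()` initialized bytes.
///         unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
///     }
///
///     fn as_mut_slice(&mut self) -> &mut [u8] {
///         // SAFETY: as above, and `&mut self` guarantees exclusive access.
///         unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
///     }
/// }
///
/// impl Drop for AlignedBuf {
///     fn drop(&mut self) {
///         // SAFETY: deallocates with the exact layout used for allocation,
///         // which is the property `Vec<u8>` cannot provide here.
///         unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
///     }
/// }
/// ```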
/// Allocates a zero-initialized `Vec<T>` with custom alignment.
///
/// The total byte size (`size * size_of::<T>()`) must be a multiple of `align`,
/// and `align` must be a power of two.
///
/// # Panics
///
/// - If `T` is zero-sized.
/// - If `align` is not a power of two.
/// - If `size * size_of::<T>()` is not a multiple of `align`.
/// Allocates a zero-initialized `Vec<T>` aligned to [`DEFAULTALIGN`] bytes.
///
/// The allocation is padded so that the total byte size is a multiple of
/// [`DEFAULTALIGN`]. This is the primary allocation entry point for all
/// layout types in the crate.
///
/// # Panics
///
/// Panics if `T` is zero-sized.
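///
/// The padding rule can be sketched as rounding the byte length up to the next
/// multiple of the alignment. `padded_byte_len` is a hypothetical helper written
/// for this example; the crate's own computation may differ in detail.
///
/// ```rust
/// // Rounds `len * size_of::<T>()` up to a multiple of `align` (a power of two).
/// fn padded_byte_len<T>(len: usize, align: usize) -> usize {
///     let elem = std::mem::size_of::<T>();
///     assert!(elem != 0, "T must not be zero-sized");
///     assert!(align.is_power_of_two());
///     let bytes = len.checked_mul(elem).expect("byte length overflows usize");
///     // Adding `align - 1` and clearing the low bits rounds up to the next multiple.
///     bytes.checked_add(align - 1).expect("padded length overflows usize") & !(align - 1)
/// }
/// ```
///
/// With a 64-byte alignment, three `i64` values (24 bytes) occupy 64 bytes and
/// nine `i64` values (72 bytes) occupy 128 bytes.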