1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
// Copyright (C) 2024-2026 Vaea SAS
// SPDX-License-Identifier: AGPL-3.0-or-later
//
// This file is part of VaeaNTT.
//
// VaeaNTT is free software: you can redistribute it and/or modify it under
// the terms of the GNU Affero General Public License as published by the
// Free Software Foundation, either version 3 of the License, or (at your
// option) any later version.
//
// VaeaNTT is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public
// License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with VaeaNTT. If not, see <https://www.gnu.org/licenses/>.
//! # VaeaNTT — High-Performance Number Theoretic Transforms
//!
//! **VaeaNTT** is a production-grade NTT engine for lattice-based cryptography
//! and Fully Homomorphic Encryption (FHE), optimized for ARM NEON (aarch64)
//! with a portable scalar fallback.
//!
//! ## What is NTT and why does it matter?
//!
//! The **Number Theoretic Transform** is the finite-field analogue of the FFT.
//! It maps a polynomial from coefficient representation to evaluation
//! representation in O(N log N) over Z_q, where q is a prime modulus. In
//! the evaluation domain, polynomial multiplication becomes pointwise — O(N)
//! instead of the naïve O(N²). This makes NTT the critical hot-path in:
//!
//! - **Post-quantum cryptography** — ML-DSA (formerly Dilithium)
//! and other NIST lattice standards multiply polynomials in the ring
//! Z_q\[X\]/(X^N+1) via NTT.
//! - **Fully Homomorphic Encryption (FHE)** — CKKS, BFV, and BGV schemes
//! (SEAL, OpenFHE) rely on multi-prime RNS-NTT with 60–62 bit primes.
//!
//! VaeaNTT covers both use-cases with a single engine: 28-bit NTT for
//! post-quantum signatures and 64-bit NTT for FHE workloads.
//!
//! ## Quick Start
//!
//! ```
//! use vaea_ntt::ntt32::Ntt32Context;
//!
//! // Any NTT-friendly prime < 2^28
//! let ctx = Ntt32Context::new(256, 8_380_417);
//!
//! let mut data = vec![42u32; 256];
//! ctx.forward(&mut data);
//! ctx.inverse(&mut data);
//! assert!(data.iter().all(|&x| x == 42));
//! ```
//!
//! ## Architecture Overview
//!
//! | Module | Pipeline | Description |
//! |--------|----------|-------------|
//! | [`ntt32`] | 28-bit primes (< 2²⁸) | ARM NEON vectorized butterfly (Shoup + Harvey lazy reduction). Scalar fallback on non-aarch64 targets. |
//! | [`ntt64`] | 60–62 bit primes | Barrett reduction. SEAL/OpenFHE-compatible. Cooley-Tukey forward, Gentleman-Sande inverse. |
//! | [`poly`] | Polynomial ring | Polynomials over Z_q\[X\]/(X^N+1) with 64-bit coefficients. Tracks coefficient/NTT domain. |
//! | [`rns`] | Multi-prime CRT | Residue Number System decomposition for FHE. Component-wise NTT on each limb. |
//! | [`pq`] | NIST presets | One-line constructors for ML-DSA-44/65/87. |
//!
//! ## Negacyclic Polynomial Multiplication (ntt32)
//!
//! Multiply two polynomials in Z_q\[X\]/(X^N+1) using the 28-bit pipeline:
//!
//! ```
//! use vaea_ntt::ntt32::{Ntt32Context, generate_primes_28};
//!
//! // Generate an NTT-friendly prime for N = 256
//! let primes = generate_primes_28(256, 1);
//! let q = primes[0];
//! let ctx = Ntt32Context::new(256, q);
//!
//! // a(X) = 1 + 2X
//! let mut a = vec![0u32; 256];
//! a[0] = 1;
//! a[1] = 2;
//!
//! // b(X) = 3 + X
//! let mut b = vec![0u32; 256];
//! b[0] = 3;
//! b[1] = 1;
//!
//! // c(X) = a(X) · b(X) mod (X^256 + 1, q) = 3 + 7X + 2X²
//! let c = ctx.negacyclic_mul(&a, &b);
//! assert_eq!(c[0], 3);
//! assert_eq!(c[1], 7);
//! assert_eq!(c[2], 2);
//! for i in 3..256 {
//! assert_eq!(c[i], 0);
//! }
//! ```
//!
//! ## 64-bit Pipeline (FHE)
//!
//! For FHE workloads requiring 60–62 bit primes, use [`ntt64::Ntt64Arith`]
//! and [`ntt64::Ntt64Context`]:
//!
//! ```
//! use vaea_ntt::ntt64::{Ntt64Arith, Ntt64Context, generate_primes_60};
//!
//! // Generate a 60-bit NTT-friendly prime for N = 4096
//! let primes = generate_primes_60(4096, 60, 1);
//! let arith = Ntt64Arith::new(primes[0]);
//! let ctx = Ntt64Context::new(4096, arith);
//!
//! // Forward + inverse roundtrip
//! let mut data = vec![0u64; 4096];
//! data[0] = 42;
//! data[1] = 100;
//! let original = data.clone();
//!
//! ctx.forward(&mut data);
//! assert_ne!(data, original); // now in NTT domain
//! ctx.inverse(&mut data);
//! assert_eq!(data, original); // roundtrip restored
//! ```
//!
//! ## Post-Quantum Presets
//!
//! Zero-configuration NTT for NIST post-quantum standards:
//!
//! ```
//! use vaea_ntt::pq::{PqScheme, PqNtt};
//!
//! // ML-DSA-65 — digital signatures, NIST Level 3
//! let ntt = PqNtt::new(PqScheme::MlDsa65);
//! assert_eq!(ntt.n(), 256);
//! assert_eq!(ntt.q(), 8_380_417);
//! assert_eq!(ntt.security_level(), 3);
//!
//! // Multiply two polynomials
//! let mut a = vec![0u32; 256];
//! a[0] = 5;
//! let mut b = vec![0u32; 256];
//! b[0] = 7;
//! let c = ntt.multiply(&a, &b);
//! assert_eq!(c[0], 35);
//! ```
//!
//! ## Modules
//!
//! | Module | Use case |
//! |--------|----------|
//! | [`pq`] | Post-quantum presets for ML-DSA |
//! | [`ntt32`] | 28-bit primes (< 2²⁸), ARM NEON vectorized |
//! | [`ntt64`] | 60–62 bit primes for FHE (SEAL/OpenFHE compatible) |
//! | [`poly`] | Polynomials over Z_q\[X\]/(X^N+1) with 64-bit coefficients |
//! | [`rns`] | Multi-prime CRT (Residue Number System) |
//!
//! ## Performance
//!
//! Measured on Apple M3 Pro (single core), `ntt32` pipeline:
//!
//! | Operation | N = 256 | Throughput |
//! |-----------|---------|------------|
//! | Forward NTT | 234 ns | 1.09 billion coeff/s |
//! | Negacyclic multiply | 1.08 µs | — |
//!
//! The `ntt32` pipeline uses the Shoup precomputed-quotient trick with
//! Harvey lazy butterfly reductions. On aarch64, all butterfly stages are
//! NEON-vectorized (4×u32 lanes). The scalar fallback is used on other
//! architectures.
//!
//! ## Security
//!
//! All arithmetic and butterfly routines are **constant-time**:
//!
//! - **No data-dependent branches** — modular reductions use branchless
//! wrapping arithmetic ([`u32::wrapping_add`] / [`u32::wrapping_sub`]).
//! - **No secret-dependent memory access patterns** — twiddle factor
//! indexing depends only on the public transform size N.
//! - Safe to use on secret polynomial coefficients (e.g. ML-DSA signing keys).
//!
//! ## Features
//!
//! | Feature | Default | Description |
//! |---------|---------|-------------|
//! | `std` | **on** | Enables [`std::error::Error`] impl on [`NttError`] |
//! | `rand` | off | Random polynomial generation |
//! | `ffi` | off | C/C++/JS bindings via Diplomat |
//!
//! ## `no_std`
//!
//! This crate is `no_std` compatible (requires `alloc`).
//! Disable default features to use without `std`.
//!
//! ## License
//!
//! VaeaNTT is dual-licensed:
//!
//! - **AGPL-3.0-or-later** for open-source use.
//! - **Commercial license** available from [Vaea SAS](https://vaea.tech)
//! for proprietary / closed-source integration.
extern crate alloc;
extern crate std;
/// Errors returned by NTT context construction.