Expand description
§ntt32 — 28-bit NTT Pipeline
High-performance Number Theoretic Transform for primes < 2^28.
§Architecture
| Module | Description |
|---|---|
arith | Branchless modular arithmetic (add, sub, mul, pow, inv) |
prime | NTT-friendly prime generation and primitive root finding |
scalar | Scalar NTT with Shoup trick + Harvey lazy butterfly |
neon | NEON-vectorized NTT (aarch64 only, all stages) |
context | Unified Ntt32Context with automatic NEON/scalar dispatch |
§Quick Start
use vaea_ntt::ntt32::{Ntt32Context, generate_primes_28};
let primes = generate_primes_28(1024, 1);
let ctx = Ntt32Context::new(1024, primes[0]);
let a = vec![1u32; 1024];
let b = vec![2u32; 1024];
let product = ctx.negacyclic_mul(&a, &b);
assert_eq!(product.len(), 1024);§Performance
Measured on Apple M3 Pro (single core):
| Operation | N = 256 | Throughput |
|---|---|---|
| Forward NTT | 234 ns | 1.09 billion coeff/s |
| Negacyclic multiply | 1.08 µs | — |
§Security
All operations are constant-time:
- No data-dependent branches in any arithmetic or butterfly routine.
- Modular reductions use branchless wrapping arithmetic
(
u32::wrapping_add/u32::wrapping_sub). - Safe to use on secret polynomial coefficients (e.g. ML-DSA keys).
§Primes
NTT-friendly primes must satisfy two constraints:
- Bit-width:
q < 2^28(required by the Shoup / Harvey reduction). - Divisibility:
q ≡ 1 (mod 2N)so that a principal 2N-th root of unity exists inZ_q.
Use generate_primes_28 to find valid primes for a given N:
use vaea_ntt::ntt32::generate_primes_28;
let primes = generate_primes_28(256, 3);
assert_eq!(primes.len(), 3);
for &q in &primes {
assert!(q < (1 << 28));
assert_eq!(q % (2 * 256), 1);
}The ML-DSA (Dilithium) prime 8 380 417 is NTT-friendly for N = 256
(8_380_417 % 512 == 1).
§Forward / Inverse Roundtrip
use vaea_ntt::ntt32::Ntt32Context;
let ctx = Ntt32Context::new(256, 8_380_417);
let mut data = vec![0u32; 256];
data[0] = 1;
data[1] = 2;
let original = data.clone();
ctx.forward(&mut data);
// data is now in NTT domain
assert_ne!(data, original);
ctx.inverse(&mut data);
// roundtrip: data restored
assert_eq!(data, original);Re-exports§
pub use arith::mod_add_28;pub use arith::mod_inv_32;pub use arith::mod_mul_28;pub use arith::mod_pow_32;pub use arith::mod_sub_28;pub use context::Ntt32Context;pub use prime::generate_primes_28;pub use prime::find_primitive_root;pub use prime::is_prime_32;pub use scalar::compute_shoup;pub use scalar::shoup_mul;