Skip to main content

Crate vaea_ntt

Crate vaea_ntt 

Source
Expand description

§VaeaNTT — High-Performance Number Theoretic Transforms

VaeaNTT is a production-grade NTT engine for lattice-based cryptography and Fully Homomorphic Encryption (FHE), optimized for ARM NEON (aarch64) with a portable scalar fallback.

§What is NTT and why does it matter?

The Number Theoretic Transform is the finite-field analogue of the FFT. It maps a polynomial from coefficient representation to evaluation representation in O(N log N) over Z_q, where q is a prime modulus. In the evaluation domain, polynomial multiplication becomes pointwise — O(N) instead of the naïve O(N²). This makes NTT the critical hot-path in:

  • Post-quantum cryptography — ML-DSA (formerly Dilithium) and other NIST lattice standards multiply polynomials in the ring Z_q[X]/(X^N+1) via NTT.
  • Fully Homomorphic Encryption (FHE) — CKKS, BFV, and BGV schemes (SEAL, OpenFHE) rely on multi-prime RNS-NTT with 60–62 bit primes.

VaeaNTT covers both use-cases with a single engine: 28-bit NTT for post-quantum signatures and 64-bit NTT for FHE workloads.

§Quick Start

use vaea_ntt::ntt32::Ntt32Context;

// Any NTT-friendly prime < 2^28
let ctx = Ntt32Context::new(256, 8_380_417);

let mut data = vec![42u32; 256];
ctx.forward(&mut data);
ctx.inverse(&mut data);
assert!(data.iter().all(|&x| x == 42));

§Architecture Overview

ModulePipelineDescription
ntt3228-bit primes (< 2²⁸)ARM NEON vectorized butterfly (Shoup + Harvey lazy reduction). Scalar fallback on non-aarch64 targets.
ntt6460–62 bit primesBarrett reduction. SEAL/OpenFHE-compatible. Cooley-Tukey forward, Gentleman-Sande inverse.
polyPolynomial ringPolynomials over Z_q[X]/(X^N+1) with 64-bit coefficients. Tracks coefficient/NTT domain.
rnsMulti-prime CRTResidue Number System decomposition for FHE. Component-wise NTT on each limb.
pqNIST presetsOne-line constructors for ML-DSA-44/65/87.

§Negacyclic Polynomial Multiplication (ntt32)

Multiply two polynomials in Z_q[X]/(X^N+1) using the 28-bit pipeline:

use vaea_ntt::ntt32::{Ntt32Context, generate_primes_28};

// Generate an NTT-friendly prime for N = 256
let primes = generate_primes_28(256, 1);
let q = primes[0];
let ctx = Ntt32Context::new(256, q);

// a(X) = 1 + 2X
let mut a = vec![0u32; 256];
a[0] = 1;
a[1] = 2;

// b(X) = 3 + X
let mut b = vec![0u32; 256];
b[0] = 3;
b[1] = 1;

// c(X) = a(X) · b(X) mod (X^256 + 1, q) = 3 + 7X + 2X²
let c = ctx.negacyclic_mul(&a, &b);
assert_eq!(c[0], 3);
assert_eq!(c[1], 7);
assert_eq!(c[2], 2);
for i in 3..256 {
    assert_eq!(c[i], 0);
}

§64-bit Pipeline (FHE)

For FHE workloads requiring 60–62 bit primes, use ntt64::Ntt64Arith and ntt64::Ntt64Context:

use vaea_ntt::ntt64::{Ntt64Arith, Ntt64Context, generate_primes_60};

// Generate a 60-bit NTT-friendly prime for N = 4096
let primes = generate_primes_60(4096, 60, 1);
let arith = Ntt64Arith::new(primes[0]);
let ctx = Ntt64Context::new(4096, arith);

// Forward + inverse roundtrip
let mut data = vec![0u64; 4096];
data[0] = 42;
data[1] = 100;
let original = data.clone();

ctx.forward(&mut data);
assert_ne!(data, original); // now in NTT domain
ctx.inverse(&mut data);
assert_eq!(data, original); // roundtrip restored

§Post-Quantum Presets

Zero-configuration NTT for NIST post-quantum standards:

use vaea_ntt::pq::{PqScheme, PqNtt};

// ML-DSA-65 — digital signatures, NIST Level 3
let ntt = PqNtt::new(PqScheme::MlDsa65);
assert_eq!(ntt.n(), 256);
assert_eq!(ntt.q(), 8_380_417);
assert_eq!(ntt.security_level(), 3);

// Multiply two polynomials
let mut a = vec![0u32; 256];
a[0] = 5;
let mut b = vec![0u32; 256];
b[0] = 7;
let c = ntt.multiply(&a, &b);
assert_eq!(c[0], 35);

§Modules

ModuleUse case
pqPost-quantum presets for ML-DSA
ntt3228-bit primes (< 2²⁸), ARM NEON vectorized
ntt6460–62 bit primes for FHE (SEAL/OpenFHE compatible)
polyPolynomials over Z_q[X]/(X^N+1) with 64-bit coefficients
rnsMulti-prime CRT (Residue Number System)

§Performance

Measured on Apple M3 Pro (single core), ntt32 pipeline:

OperationN = 256Throughput
Forward NTT234 ns1.09 billion coeff/s
Negacyclic multiply1.08 µs

The ntt32 pipeline uses the Shoup precomputed-quotient trick with Harvey lazy butterfly reductions. On aarch64, all butterfly stages are NEON-vectorized (4×u32 lanes). The scalar fallback is used on other architectures.

§Security

All arithmetic and butterfly routines are constant-time:

  • No data-dependent branches — modular reductions use branchless wrapping arithmetic (u32::wrapping_add / u32::wrapping_sub).
  • No secret-dependent memory access patterns — twiddle factor indexing depends only on the public transform size N.
  • Safe to use on secret polynomial coefficients (e.g. ML-DSA signing keys).

§Features

FeatureDefaultDescription
stdonEnables std::error::Error impl on NttError
randoffRandom polynomial generation
ffioffC/C++/JS bindings via Diplomat

§no_std

This crate is no_std compatible (requires alloc). Disable default features to use without std.

§License

VaeaNTT is dual-licensed:

  • AGPL-3.0-or-later for open-source use.
  • Commercial license available from Vaea SAS for proprietary / closed-source integration.

Modules§

ntt32
ntt32 — 28-bit NTT Pipeline
ntt64
64-bit NTT Pipeline
poly
Polynomial over Z_q[X]/(X^N + 1)
pq
Post-Quantum Cryptography Presets
rns
Residue Number System (RNS) — Multi-Moduli Decomposition

Enums§

NttError
Errors returned by NTT context construction.