# Falcon (FN-DSA) — Mathematical Deep Dive
> **FIPS 206 · Post-Quantum Digital Signature Standard**
>
> This document explains *every mathematical step* of Falcon signing,
> from the underlying hardness assumption to the final verification
> equation. No prior lattice-cryptography background is assumed.
---
## Table of Contents
1. [The Big Picture — Why Falcon?](#1-the-big-picture--why-falcon)
2. [Mathematical Foundations](#2-mathematical-foundations)
3. [Key Generation](#3-key-generation)
4. [Signing — The "Hash-and-Sign" Paradigm](#4-signing--the-hash-and-sign-paradigm)
5. [Verification](#5-verification)
6. [Security: Why This Is Hard to Break](#6-security-why-this-is-hard-to-break)
7. [Concrete Parameters](#7-concrete-parameters)
---
## 1. The Big Picture — Why Falcon?
Traditional signatures (RSA, ECDSA) rely on number-theoretic problems that
a quantum computer can break with Shor's algorithm.
Falcon replaces them with a **lattice** problem:
> *Given a random lattice basis, find a short vector.*
>
> This is called the **Short Integer Solution (SIS)** problem —
> believed to be hard even for quantum computers.
### Mental Model
Think of a lattice as a 2-D grid of dots. The public key defines the grid.
The **private key** is a secret "shortcut" that lets you jump to a grid
point **near** a given target (the message hash). Without the shortcut,
finding a nearby point is computationally infeasible.
```
· · · · · · · ·
· · · · · · · ·
· · · ☆ · · · · ☆ = message hash point
· · · ● · · · · ● = nearest lattice point (signature)
· · · · · · · ·
· · · · · · · ·
```
---
## 2. Mathematical Foundations
### 2.1 The Polynomial Ring
All Falcon arithmetic lives in the **cyclotomic polynomial ring**:
$$
R_q = \mathbb{Z}_q[x] / (x^n + 1)
$$
- **n** = 512 or 1024 (power of 2)
- **q** = 12289 (a prime, chosen so q ≡ 1 mod 2n for efficient NTT)
- Elements are **polynomials** of degree < n with coefficients in {0, 1, ..., q−1}
- Multiplication wraps around: x^n is replaced by −1
**Example (n = 4, q = 5):**
```
Let a(x) = 1 + 2x + 3x² + 4x³
Let b(x) = 1 + x
a(x) · b(x) mod (x⁴ + 1, 5):
= 1 + 2x + 3x² + 4x³ + x + 2x² + 3x³ + 4x⁴
= 1 + 3x + 5x² + 7x³ + 4x⁴
↓ x⁴ → −1
= (1 − 4) + 3x + 0x² + 2x³ mod 5
= 2 + 3x + 0x² + 2x³
```
### 2.2 The NTRU Equation
At the heart of Falcon is the **NTRU** relation. Key generation produces four
small polynomials f, g, F, G ∈ R such that:
$$
f \cdot G - g \cdot F = q \pmod{x^n + 1}
$$
This is exactly the equation `fG − gF = q` in the ring R.
- **f, g** have very small coefficients (bounded by ~6)
- **F, G** have larger but still bounded coefficients
- The public key is **h = g · f⁻¹ mod q** (a single polynomial in R_q)
### 2.3 The Lattice
The four polynomials form a **2×2 matrix** of polynomials (the "NTRU basis"):
```
┌ ┐
B = │ g −f │ (each entry is a polynomial in R)
│ G −F │
└ ┘
```
This basis generates a **lattice** Λ in ℤ^2n:
```
Λ = { (s₁, s₂) ∈ R² : s₁ + s₂·h ≡ 0 (mod q) }
```
The key insight: **(f, g, F, G) are short** polynomials, so the rows of B
are short lattice vectors. This is the "trap door" — the private key.
An attacker only sees h (the public key), which defines the same lattice but
through a random-looking basis with **long** vectors. Finding short vectors in
that basis is the hard problem.
### 2.4 NTT — The Fast Multiplication Engine
The **Number Theoretic Transform** (NTT) is the "integer FFT". It converts
a polynomial from coefficient form to evaluation form, where multiplication
becomes pointwise:
```
Coefficient form NTT Evaluation form
┌─────────────┐ ──────────────► ┌──────────────────┐
│ a₀ a₁ … aₙ₋₁│ │ â₀ â₁ … âₙ₋₁ │
└─────────────┘ ◄────────────── └──────────────────┘
iNTT
Multiply in evaluation form: ĉᵢ = âᵢ · b̂ᵢ mod q
Cost: O(n) instead of O(n²)
```
**q = 12289** was chosen specifically because 12289 − 1 = 12288 = 2¹² × 3,
so primitive 2n-th roots of unity exist mod q, enabling radix-2 NTT of length
up to 4096.
In the code (`vrfy.rs`):
- **g = 7** is the primitive root used for the NTT tables
- Montgomery multiplication with R = 4091 speeds up modular reductions
---
## 3. Key Generation
```mermaid
flowchart TD
A["🎲 Random Seed"] --> B["Generate small f, g\n(Gaussian coefficients)"]
B --> C{"f invertible\nmod q?"}
C -- No --> B
C -- Yes --> D["Solve NTRU equation\nfG - gF = q\nfor F, G"]
D --> E{"F, G small\nenough?"}
E -- No --> B
E -- Yes --> F["Compute public key\nh = g/f mod q"]
F --> G["🔑 Private Key\n(f, g, F, G)"]
F --> H["🔓 Public Key\n(h)"]
```
### Step by Step
1. **Sample f, g**: Draw coefficients from a discrete Gaussian distribution
with small standard deviation σ. This gives polynomials with most
coefficients in {−1, 0, 1} and a few ±2, ±3.
2. **NTRU Solver**: Find F, G satisfying fG − gF = q using the
**Bézout-like algorithm** over polynomial rings. The code implements this
via tower-of-fields recursive solving plus Chinese Remainder Theorem (CRT)
reconstruction with big-integer arithmetic (`keygen.rs: zint_bezout`).
3. **Public key**: Compute h = g · f⁻¹ mod q in R_q. The inverse f⁻¹ mod q
exists because f has no roots among the n-th roots of unity mod q (checked via NTT: all evaluations non-zero).
In code (`vrfy.rs`):
```
h[u] = mq_div_12289(g_ntt[u], f_ntt[u]) for each NTT slot u
```
---
## 4. Signing — The "Hash-and-Sign" Paradigm
Falcon uses the **GPV framework** (Gentry, Peikert, Vaikuntanathan) for
hash-and-sign signatures over lattices.
### 4.1 High-Level Flow
```mermaid
flowchart TD
subgraph Input
M["📩 Message m"]
SK["🔑 Private Key\n(f, g, F, G)"]
end
M --> H["Hash to point\nc = H(nonce ‖ m)\n∈ ℤ_q^n"]
H --> T["Compute target\nt = (c/q) · B⁻¹\n∈ ℝ²ⁿ"]
T --> S["Discrete Gaussian\nSampling\nz ← D_Λ(t)\n(nearest lattice point)"]
S --> V{"‖(s₁, s₂)‖²\n≤ bound?"}
V -- No\nrestart --> H
V -- Yes --> SIG["✍️ Signature\nσ = (nonce, s₂)"]
style S fill:#f9f,stroke:#333,stroke-width:2px
```
### 4.2 Step 1: Hash to Point
The message m is hashed to a polynomial c ∈ R_q:
```
nonce ← random(40 bytes)
c(x) = HashToPoint( SHAKE256(nonce ‖ m) )
Each coefficient cᵢ ∈ {0, 1, ..., q−1} is extracted from the
SHAKE256 stream, rejecting values ≥ q for uniformity.
```
In the code, `hm[u]` is this hash-to-point output — a polynomial with
coefficients modulo q = 12289.
### 4.3 Step 2: Compute the Target Vector
The signer needs to find a lattice point near the "target" derived from c.
The target is computed by multiplying c by the **inverse basis** B⁻¹:
$$
t = \frac{1}{q} \cdot c \cdot \widetilde{B}
$$
where B̃ is the **Gram-Schmidt orthogonalization** of B (the private basis).
Concretely, the code computes two target polynomials t₀ and t₁:
```
t₀ = FFT(c) · FFT(b₁₁) / q ← multiply by column of B⁻¹
t₁ = FFT(c) · FFT(−b₀₁) / q ← other column
```
Here b₀₁ = −f and b₁₁ = −F come from the basis matrix B = [[g, −f], [G, −F]].
**In code** (`sign.rs: do_sign_tree`, lines 636-650):
```rust
// Convert hash to FFT
fft::fft(&mut tmp[0..n], logn);
let ni = FPR_INVERSE_OF_Q; // = 1/q
// t₀ = c · b₁₁ / q (b₁₁ = −F)
fft::poly_mul_fft(&mut tmp[0..n], &b11[..n], logn);
fft::poly_mulconst(&mut tmp[0..n], ni, logn);
// t₁ = c · (−b₀₁) / q (b₀₁ = −f, so −b₀₁ = f → multiply by −1)
fft::poly_mul_fft(&mut tmp[n..2*n], &b01[..n], logn);
fft::poly_mulconst(&mut tmp[n..2*n], fpr_neg(ni), logn);
```
### 4.4 Step 3: Discrete Gaussian Sampling (The Core Magic)
This is the heart of Falcon. Given the target (t₀, t₁), we need to sample
an integer vector (z₀, z₁) that is:
- **Close** to (t₀, t₁) — i.e., each zᵢ ≈ round(tᵢ)
- **Random** with a specific Gaussian distribution — so the signature
doesn't leak information about the private key
This uses the **Fast Fourier Sampling (FFS)** algorithm.
#### 4.4.1 The Gram Matrix and LDL Tree
First, compute the **Gram matrix** G = B · Bᵀ of the private basis:
$$
G = \begin{pmatrix} \langle b_0, b_0 \rangle & \langle b_0, b_1 \rangle \\\ \langle b_1, b_0 \rangle & \langle b_1, b_1 \rangle \end{pmatrix}
$$
where b₀ = (g, −f) and b₁ = (G, −F). Each entry is a polynomial computed via:
```
G₁₁ = G·adj(G) + F·adj(F) (= |b₁|²)
```
`adj(p)` means the **adjoint**: coefficients in reverse order with sign flip
(complex conjugate in FFT domain).
Then decompose G with **LDL factorization**:
$$
G = L \cdot D \cdot L^\dagger
$$
```
┌ ┐ ┌ ┐ ┌ ┐
G = │ 1 0 │ · │ d₀₀ 0 │ · │ 1 l₁₀* │
│ l₁₀ 1 │ │ 0 d₁₁ │ │ 0 1 │
└ ┘ └ ┘ └ ┘
l₁₀ = G₀₁ / G₀₀
d₀₀ = G₀₀
This LDL decomposition is applied **recursively** in a binary tree
(the "ffLDL tree"). At each level, polynomials are split in half using
FFT split, and LDL is reapplied. The leaves contain the **per-coefficient
variances** σᵢ = σ / √(dᵢᵢ).
```mermaid
graph TD
R["Gram Matrix G\n(n×n block)"] --> L["l₁₀ coefficients"]
R --> D0["d₀₀\n(left subtree)"]
R --> D1["d₁₁\n(right subtree)"]
D0 --> D0L["d₀₀₀"]
D0 --> D0R["d₀₀₁"]
D1 --> D1L["d₁₁₀"]
D1 --> D1R["d₁₁₁"]
D0L --> LEAF1["σ/√leaf₁"]
D0R --> LEAF2["σ/√leaf₂"]
D1L --> LEAF3["σ/√leaf₃"]
D1R --> LEAF4["σ/√leaf₄"]
style R fill:#4ecdc4,stroke:#333
style LEAF1 fill:#ffe66d,stroke:#333
style LEAF2 fill:#ffe66d,stroke:#333
style LEAF3 fill:#ffe66d,stroke:#333
style LEAF4 fill:#ffe66d,stroke:#333
```
**In code** (`sign.rs: expand_privkey`):
```rust
// Build Gram matrix entries: G₀₀, G₀₁, G₁₁
// Then: ffldl_fft(tree, g00, g01, g11, logn, tmp)
// Normalise leaves: tree leaf = σ / sqrt(leaf_value)
ffldl_binary_normalize(&mut expanded_key[tree_off..], logn, logn);
```
#### 4.4.2 Fast Fourier Sampling Algorithm
The FFS algorithm traverses the LDL tree **bottom-up**, sampling one
integer at each leaf using a **discrete Gaussian sampler**:
```
FUNCTION ffSampling(t₀, t₁, Tree):
if leaf:
z₀ ← SampleZ(t₀, σ_leaf) // sample integer near t₀
z₁ ← SampleZ(t₁, σ_leaf) // sample integer near t₁
return (z₀, z₁)
// Recursive step:
(t₁₀, t₁₁) ← split(t₁) // FFT split
(z₁₀, z₁₁) ← ffSampling(t₁₀, t₁₁, Tree.right) // recurse right
z₁ ← merge(z₁₀, z₁₁) // FFT merge
// Adjust t₀ using the LDL factor l₁₀:
tb₀ = t₀ + (t₁ − z₁) · l₁₀ // propagate error
(t₀₀, t₀₁) ← split(tb₀) // FFT split
(z₀₀, z₀₁) ← ffSampling(t₀₀, t₀₁, Tree.left) // recurse left
z₀ ← merge(z₀₀, z₀₁) // FFT merge
return (z₀, z₁)
```
```mermaid
flowchart TD
START["Input: target (t₀, t₁)\nLDL Tree"] --> SPLIT1["Split t₁ into\nt₁₀, t₁₁"]
SPLIT1 --> REC1["Recurse RIGHT\nffSampling(t₁₀, t₁₁,\nTree.right)"]
REC1 --> MERGE1["Merge → z₁"]
MERGE1 --> ADJ["Adjust target:\ntb₀ = t₀ + (t₁-z₁)·l₁₀"]
ADJ --> SPLIT2["Split tb₀ into\nt₀₀, t₀₁"]
SPLIT2 --> REC2["Recurse LEFT\nffSampling(t₀₀, t₀₁,\nTree.left)"]
REC2 --> MERGE2["Merge → z₀"]
MERGE2 --> OUT["Output: (z₀, z₁)"]
style START fill:#ff6b6b,stroke:#333,color:#fff
style OUT fill:#51cf66,stroke:#333,color:#fff
style ADJ fill:#ffd43b,stroke:#333
```
**Key insight**: At the leaves, the sampler draws an integer z close to a
real number t with probability proportional to:
$$
\Pr[z] \propto \exp\!\left(-\frac{(z - t)^2}{2\sigma^2}\right)
$$
This is a **discrete Gaussian** centered on t with standard deviation σ.
#### 4.4.3 The Discrete Gaussian Sampler
The integer sampler uses **rejection sampling** with a base half-Gaussian:
```
FUNCTION SamplerZ(μ, 1/σ):
1. s ← ⌊μ⌋, r ← μ - s // integer + fractional parts
2. dss ← (1/σ)² / 2
3. REPEAT:
z₀ ← BaseSampler() // half-Gaussian, σ₀ = 1.8205
b ← random bit
z ← b + (2b−1)·z₀ // fold to full Gaussian
// Rejection: accept with probability ratio
x ← (z − r)² · dss − z₀² / (2·σ₀²)
Accept with probability exp(−x) · (σ_min/σ)
4. RETURN s + z
```
The base sampler (`gaussian0_sampler`) uses a precomputed cumulative
distribution table (`GAUSS0_DIST`) with 72-bit precision. The Bernoulli
trial `ber_exp` computes exp(−x) using Crandall's algorithm with 63-bit
fixed-point arithmetic.
### 4.5 Step 4: Recover the Signature
After sampling (z₀, z₁), compute the lattice point:
$$
(s_1, s_2) = (z_0, z_1) \cdot B = z_0 \cdot b_0 + z_1 \cdot b_1
$$
The actual signature components are:
```
s₁ = c − (z₀·g + z₁·G) = hm[i] − round(z₀·b₀₀ + z₁·b₁₀)
s₂ = −(z₀·(−f) + z₁·(−F)) = −round(z₀·b₀₁ + z₁·b₁₁)
```
**In code** (`sign.rs: do_sign_tree`, lines 677-711):
```rust
// z₀·b₀₀ + z₁·b₁₀ → tmp[2n..3n]
fft::poly_mul_fft(&mut tmp[2*n..3*n], &b00[..n], logn);
fft::poly_mul_fft(&mut tmp[3*n..4*n], &b10[..n], logn);
fft::poly_add(...);
// z₀·b₀₁ + z₁·b₁₁ → tmp[n..2n]
fft::poly_mul_fft(&mut tmp[n..2*n], &b11[..n], logn);
fft::poly_add(...);
fft::ifft(&mut tmp[0..n], logn); // back to coefficient domain
fft::ifft(&mut tmp[n..2*n], logn);
// s₁[u] = hm[u] − round(tmp[u])
// s₂[u] = −round(tmp[n+u])
```
### 4.6 Step 5: Shortness Check
The signature is accepted only if the vector (s₁, s₂) is **short enough**:
$$
\|s_1\|^2 + \|s_2\|^2 \leq \beta^2
$$
where β² is a security-level-dependent bound. If the norm is too large
(which happens rarely, ~1% of the time), restart from Step 1 with a fresh
nonce.
The bound values from the standard:
| FN-DSA-512 | 512 | 34,034,726 |
| FN-DSA-1024 | 1024 | 70,265,242 |
**In code** (`common.rs: is_short`): checks `‖s₁‖² + ‖s₂‖² ≤ bound`.
---
## 5. Verification
Verification is **simple and elegant** — it requires only the public key h, with no
knowledge of the secret basis (f, g, F, G).
### 5.1 The Verification Equation
Given **public key h**, **message m**, and **signature (nonce, s₂)**:
```
1. c = HashToPoint(SHAKE256(nonce ‖ m)) ← same hash as signing
2. s₁ = c − s₂ · h mod q ← recover s₁ from public info
3. Accept iff ‖s₁‖² + ‖s₂‖² ≤ β² ← check shortness
```
**Why does this work?** During signing, we chose (s₁, s₂) such that:
```
s₁ + s₂ · h ≡ c (mod q)
```
The verifier reconstructs s₁ = c − s₂·h and checks that both s₁ and s₂
are short. An attacker cannot forge short (s₁, s₂) satisfying this
relation without knowing the secret basis — that would solve the hard
lattice problem (SIS).
### 5.2 Verification in Code
```mermaid
flowchart LR
S2["s₂"] --> NTT1["NTT(s₂)"]
NTT1 --> MUL["s₂ · h\n(pointwise in NTT)"]
H["h (public key\nin NTT form)"] --> MUL
MUL --> INTT["iNTT"]
INTT --> SUB["s₁ = c − s₂·h"]
C["c = Hash(nonce‖m)"] --> SUB
SUB --> CHECK{"‖s₁‖² + ‖s₂‖²\n≤ β² ?"}
CHECK -- ✅ --> VALID["VALID"]
CHECK -- ❌ --> INVALID["INVALID"]
style VALID fill:#51cf66,stroke:#333,color:#fff
style INVALID fill:#ff6b6b,stroke:#333,color:#fff
```
**In code** (`vrfy.rs: verify_raw`):
```rust
// Reduce s₂ mod q, then:
mq_ntt(tt, logn); // NTT(s₂)
mq_poly_montymul_ntt(tt, h, logn); // s₂ · h in NTT domain
mq_intt(tt, logn); // back to coefficients
mq_poly_sub(tt, c0, logn); // tt = s₂·h − c = −s₁
// Normalise to signed range and check shortness
is_short(s1, s2, logn)
```
---
## 6. Security: Why This Is Hard to Break
### 6.1 Underlying Hard Problems
| **SIS** (Short Integer Solution) | Given h, find short s₁, s₂ with s₁ + s₂·h ≡ 0 | Lattice sieving: 2^149 (n=512) |
| **NTRU** | Given h = g/f, recover short f, g | Lattice reduction: 2^126 (n=512) |
| **LWE** (Learning With Errors) | Related; Falcon security reduces to SIS on NTRU lattices | — |
### 6.2 Why Gaussian Sampling is Critical
If the signer just rounded (t₀, t₁) to the nearest integers, the signature
would **leak information** about the private key because the rounding errors
would be correlated with the secret basis vectors.
The discrete Gaussian distribution is the **unique** distribution that makes
signatures **statistically independent** of the private key (up to the public
lattice structure). This is why the GPV framework requires Gaussian sampling
— it provides a **zero-knowledge** property.
### 6.3 Side-Channel Defenses
- **Constant-time modular arithmetic**: `mq_add`, `mq_sub`, `mq_montymul`
use branchless wrapping arithmetic to avoid timing leaks
- **Zeroization**: Private keys and PRNG state are zeroized on drop
- **Rejection sampling bound**: The sampler loop is capped at 1000 iterations
to prevent infinite loops from PRNG failure
---
## 7. Concrete Parameters
### FN-DSA-512 (NIST Security Level I, ≈ AES-128)
| Ring dimension n | 512 |
| Modulus q | 12289 |
| σ (signing Gaussian) | 165.736... |
| σ_min | 1.2778... |
| Signature bound β² | 34,034,726 |
| Public key size | 897 bytes |
| Signature size | ~666 bytes |
| Private key size | ~1281 bytes |
### FN-DSA-1024 (NIST Security Level V, ≈ AES-256)
| Ring dimension n | 1024 |
| Modulus q | 12289 |
| σ (signing Gaussian) | 168.388... |
| σ_min | 1.2989... |
| Signature bound β² | 70,265,242 |
| Public key size | 1793 bytes |
| Signature size | ~1280 bytes |
| Private key size | ~2305 bytes |
### Full Pipeline Summary
```mermaid
sequenceDiagram
participant Signer
participant Hash as SHAKE256
participant Sampler as Gaussian Sampler
participant Verifier
Note over Signer: Has private key (f, g, F, G)<br/>and public key h = g/f mod q
Signer->>Hash: nonce ‖ message
Hash-->>Signer: c = HashToPoint (∈ ℤ_q^n)
Signer->>Signer: t = c · B⁻¹ / q (target vector)
Signer->>Sampler: Sample near t using LDL tree
Sampler-->>Signer: (z₀, z₁) lattice point
Signer->>Signer: s₁ = c − (z₀g + z₁G)
Signer->>Signer: s₂ = z₀f + z₁F
alt ‖s₁‖² + ‖s₂‖² > β²
Signer->>Signer: Restart with new nonce
end
Signer->>Verifier: (nonce, s₂)
Verifier->>Hash: nonce ‖ message
Hash-->>Verifier: c
Verifier->>Verifier: s₁ = c − s₂·h mod q
Verifier->>Verifier: Check ‖s₁‖² + ‖s₂‖² ≤ β²
alt Short enough
Note over Verifier: ✅ VALID
else Too long
Note over Verifier: ❌ INVALID
end
```
---
> **Key Takeaway**: Falcon's elegance lies in the separation — the signer uses
> a short secret basis to "jump" near a target point in a lattice, while the
> verifier only needs to check that the jump landed close enough. The discrete
> Gaussian distribution ensures signatures reveal nothing about the secret
> shortcut, and the hardness of finding short lattice vectors without the
> shortcut keeps everything secure against both classical and quantum attackers.