//! GHASH multiplication in `GF(2^128) / (x^128 + x^7 + x^2 + x + 1)`.
//!
//! Specified in NIST SP 800-38D §6.4. This is the polynomial-multiplication
//! primitive used by SM4-GCM (v0.8 W2) and by any other GCM-style AEAD with
//! this reduction polynomial. The hash subkey `H` is secret (it is the
//! underlying block cipher's encryption of the all-zero block under the
//! key), so the multiplication must be constant-time with respect to `H`.
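//!
//! GCM stores field elements bit-reflected: the most significant bit of
//! byte 0 of a block is the coefficient of `x^0`, so multiplying by `x` is
//! a right shift, and `x^128 ≡ x^7 + x^2 + x + 1` folds back in as the
//! constant `0xE1 << 120`. A minimal sketch of that convention, assuming a
//! block is held in a `u128` loaded big-endian (`R`, `block_to_elem` and
//! `mul_by_x` are illustrative names, not items of this module):
//!
//! ```rust
//! /// x^128 ≡ x^7 + x^2 + x + 1 in GCM's bit-reflected order:
//! /// bits 127, 126, 125 and 120 hold the coefficients of 1, x, x^2, x^7.
//! const R: u128 = 0xE1 << 120;
//!
//! /// Load a 16-byte GCM block; the MSB of byte 0 becomes the x^0 coefficient.
//! fn block_to_elem(block: [u8; 16]) -> u128 {
//!     u128::from_be_bytes(block)
//! }
//!
//! /// Multiply by x: every coefficient moves up one degree (a right shift in
//! /// this order). An x^127 term overflows to x^128 and is folded back via R,
//! /// selected with a mask rather than a branch.
//! fn mul_by_x(v: u128) -> u128 {
//!     let carry = v & 1; // coefficient of x^127
//!     (v >> 1) ^ (R & 0u128.wrapping_sub(carry))
//! }
//! ```
//!
//! For instance, `mul_by_x(1)` (that is, `x^127 · x`) returns `0xE1 << 120`.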
//!
//! # Dispatch
//!
//! The public entry point [`ghash_mul`] selects an implementation at
//! runtime based on available CPU features, with silent fallback to
//! the software path:
//!
//! - **x86_64 with PCLMULQDQ + SSE2**: `clmul::ghash_mul_clmul`. PCLMULQDQ
//! computes one 64×64 → 128-bit carryless product per instruction; a full
//! 128×128-bit GHASH multiply takes three or four of them plus a reduction.
//! Available since Intel Westmere (2010) / AMD Bulldozer (2011). Detected
//! at runtime via [`crate::detect::has_pclmulqdq`].
//! - **aarch64 with PMULL (AES extension)**: `pmull::ghash_mul_pmull`.
//! ARMv8.0 Crypto Extensions; present on all Apple Silicon and most
//! modern aarch64 server / mobile chips. Detected at runtime via
//! [`crate::detect::has_pmull`].
//! - **Otherwise**: [`software::ghash_mul_software`], a constant-time
//! bit-serial implementation. Roughly 5-10× slower than the hardware paths,
//! but correct.
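//!
//! The dispatch order above can be sketched with the standard library's
//! feature-detection macros. This is an illustration only: it swaps in
//! `is_x86_feature_detected!` and `std::arch::is_aarch64_feature_detected!`
//! for this crate's `detect` helpers, and `Backend` / `select_backend` are
//! hypothetical names.
//!
//! ```rust
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Backend {
//!     Clmul,    // x86_64 PCLMULQDQ path
//!     Pmull,    // aarch64 PMULL path
//!     Software, // portable bit-serial fallback
//! }
//!
//! /// Probe CPU features in dispatch order, falling back silently to software.
//! fn select_backend() -> Backend {
//!     #[cfg(target_arch = "x86_64")]
//!     {
//!         if is_x86_feature_detected!("pclmulqdq") && is_x86_feature_detected!("sse2") {
//!             return Backend::Clmul;
//!         }
//!     }
//!     #[cfg(target_arch = "aarch64")]
//!     {
//!         if std::arch::is_aarch64_feature_detected!("pmull") {
//!             return Backend::Pmull;
//!         }
//!     }
//!     Backend::Software
//! }
//! ```
//!
//! Because each probe is behind `#[cfg]`, an x86_64 build can never return
//! `Backend::Pmull`, and a build for any other architecture without a
//! hardware path always returns `Backend::Software`.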
//!
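//! Byte-equivalence between the three paths is tested by
//! `tests/ghash_lane_equivalence.rs`. The test cannot be literally
//! exhaustive (there are 2^256 input pairs), and on any one target only
//! the software path and that target's hardware path are available to
//! compare.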
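//!
//! As an illustration of this kind of differential check (not the contents
//! of that test file), the sketch below assembles a 128×128-bit carryless
//! product from four 64×64 products, the schoolbook shape a 64-bit
//! carryless-multiply kernel can use, and compares it with a direct
//! bit-by-bit product over pseudo-random inputs. `clmul64`,
//! `clmul128_lanes`, `clmul128_direct` and the xorshift generator are
//! hypothetical helpers; bits are in natural order (bit `i` is the
//! coefficient of `x^i`) and no reduction is applied.
//!
//! ```rust
//! /// One 64x64 -> 128-bit carryless multiply (what PCLMULQDQ / PMULL do in
//! /// one instruction), emulated in software with masks instead of branches.
//! fn clmul64(a: u64, b: u64) -> u128 {
//!     let a = a as u128;
//!     let mut r = 0u128;
//!     for i in 0..64 {
//!         let mask = 0u128.wrapping_sub(((b >> i) & 1) as u128);
//!         r ^= (a << i) & mask;
//!     }
//!     r
//! }
//!
//! /// 128x128 -> 256-bit carryless product from four 64x64 products.
//! /// Returns (high 128 bits, low 128 bits).
//! fn clmul128_lanes(a: u128, b: u128) -> (u128, u128) {
//!     let (a1, a0) = ((a >> 64) as u64, a as u64);
//!     let (b1, b0) = ((b >> 64) as u64, b as u64);
//!     let lo = clmul64(a0, b0);
//!     let hi = clmul64(a1, b1);
//!     let mid = clmul64(a0, b1) ^ clmul64(a1, b0); // weight x^64
//!     (hi ^ (mid >> 64), lo ^ (mid << 64))
//! }
//!
//! /// Oracle: shift-and-XOR over every bit of `b`. Branchy, so only suitable
//! /// for tests, never for secret data.
//! fn clmul128_direct(a: u128, b: u128) -> (u128, u128) {
//!     let (mut hi, mut lo) = (0u128, 0u128);
//!     for i in 0..128u32 {
//!         if (b >> i) & 1 == 1 {
//!             lo ^= a << i;
//!             if i > 0 {
//!                 hi ^= a >> (128 - i);
//!             }
//!         }
//!     }
//!     (hi, lo)
//! }
//!
//! /// Deterministic xorshift64 generator, so the check needs no dependencies.
//! fn xorshift64(state: &mut u64) -> u64 {
//!     let mut x = *state;
//!     x ^= x << 13;
//!     x ^= x >> 7;
//!     x ^= x << 17;
//!     *state = x;
//!     x
//! }
//!
//! fn random_u128(state: &mut u64) -> u128 {
//!     ((xorshift64(state) as u128) << 64) | xorshift64(state) as u128
//! }
//! ```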
//!
//! # Constant-time discipline
//!
//! All three paths are constant-time over `H`. The software path uses
//! mask-XOR rather than branches; the hardware paths rely on the
//! carryless-multiply instructions having data-independent timing (a
//! multi-cycle but fixed latency). No table lookups, no secret-dependent
//! branches, no `_mm_shuffle_*` against secret indices.
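//!
//! A minimal sketch of the mask-XOR technique, written as NIST SP 800-38D
//! Algorithm 1 on a `u128` in GCM bit order (the MSB of byte 0 is the
//! coefficient of `x^0`). `gf128_mul` is a hypothetical name, not this
//! module's `software::ghash_mul_software`, whose signature may differ.
//! Branch-free source is necessary but not sufficient: the compiler is not
//! obliged to keep it branch-free, so the generated code still needs
//! checking.
//!
//! ```rust
//! /// x^128 ≡ x^7 + x^2 + x + 1 in GCM's bit-reflected order.
//! const R: u128 = 0xE1 << 120;
//!
//! /// Z = X · Y in GF(2^128), visiting all 128 bits of `x` unconditionally.
//! /// Each selection uses an all-ones / all-zeros mask instead of a branch.
//! fn gf128_mul(x: u128, y: u128) -> u128 {
//!     let mut z = 0u128;
//!     let mut v = y;
//!     for i in 0..128 {
//!         // Bit i of x, counting from the MSB (the x^i coefficient).
//!         let bit = (x >> (127 - i)) & 1;
//!         z ^= v & 0u128.wrapping_sub(bit);
//!         // v = v · x, folding any x^128 overflow back in via R.
//!         let carry = v & 1;
//!         v = (v >> 1) ^ (R & 0u128.wrapping_sub(carry));
//!     }
//!     z
//! }
//! ```
//!
//! For example, with the all-zero AES-128 key and IV of the GCM
//! specification's Test Case 2, `H = 0x66e94bd4ef8a2c3b884cfa59ca342b2e`
//! and the ciphertext block is `C = 0x0388dace60b6a392f328c2b971b2fe78`;
//! GHASH over `C` and the length block `0x80`, i.e.
//! `gf128_mul(gf128_mul(C, H) ^ 0x80, H)`, yields
//! `0xf38cbb1ad69223dcc3457ae5b6b0f885`.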
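mod software;
#[cfg(target_arch = "x86_64")]
mod clmul;
#[cfg(target_arch = "aarch64")]
mod pmull;

// Re-export each backend. The hardware modules exist only on their own
// architecture, so their re-exports are gated the same way.
pub use software::ghash_mul_software;
#[cfg(target_arch = "x86_64")]
pub use clmul::ghash_mul_clmul;
#[cfg(target_arch = "aarch64")]
pub use pmull::ghash_mul_pmull;
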
/// GHASH multiplication: `H · X mod (x^128 + x^7 + x^2 + x + 1)`.
///
/// Selects the fastest available implementation at runtime. See the
/// module docstring for dispatch order. Byte-identical output across
/// every dispatch target.