//! 16-way packed bitsliced SM4 S-box (v0.6 W6).
//!
//! Public entry point: [`sbox_x16`]. Operates on 16 independent
//! S-box inputs packed as `[u8; 16]`, returning `[u8; 16]`. The
//! intended consumer is the 4-block batched CBC-decrypt fanout in
//! `gmcrypto_core::sm4::cbc_streaming::Sm4CbcDecryptor::process_chunk`
//! on `aarch64`: 4 SM4 blocks × 4 `tau` bytes per round = 16 bytes
//! per call, which fills the full 128-bit `uint8x16_t` NEON register.
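//!
//! A minimal sketch of that packing, under two assumptions that are
//! not taken from `process_chunk`: lane `4 * b + k` holds byte `k`
//! (big-endian) of block `b`'s `tau` input word, and
//! `sbox_byte_stub` is a hypothetical placeholder, not the SM4
//! S-box. Because every lane is transformed independently, one
//! 16-lane call applies `tau` to all four words at once:
//!
//! ```rust
//! // Hypothetical placeholder for `sbox_byte`; NOT the SM4 S-box.
//! fn sbox_byte_stub(x: u8) -> u8 {
//!     x.rotate_left(3) ^ 0x5a
//! }
//!
//! // Assumed layout: lane 4*b + k = byte k (big-endian) of word b.
//! fn pack_tau_inputs(words: [u32; 4]) -> [u8; 16] {
//!     let mut lanes = [0u8; 16];
//!     for (b, w) in words.iter().enumerate() {
//!         lanes[4 * b..4 * b + 4].copy_from_slice(&w.to_be_bytes());
//!     }
//!     lanes
//! }
//!
//! // Inverse of `pack_tau_inputs`: regroup 16 lanes into 4 words.
//! fn unpack_tau_outputs(lanes: [u8; 16]) -> [u32; 4] {
//!     let mut words = [0u32; 4];
//!     for (b, w) in words.iter_mut().enumerate() {
//!         let mut bytes = [0u8; 4];
//!         bytes.copy_from_slice(&lanes[4 * b..4 * b + 4]);
//!         *w = u32::from_be_bytes(bytes);
//!     }
//!     words
//! }
//!
//! // Lane-wise S-box: output lane i depends only on input lane i.
//! fn sbox_x16_stub(input: [u8; 16]) -> [u8; 16] {
//!     let mut out = [0u8; 16];
//!     for (o, &x) in out.iter_mut().zip(input.iter()) {
//!         *o = sbox_byte_stub(x);
//!     }
//!     out
//! }
//!
//! // `tau` on four words at once: pack, one 16-lane S-box call, unpack.
//! fn tau_x4(words: [u32; 4]) -> [u32; 4] {
//!     unpack_tau_outputs(sbox_x16_stub(pack_tau_inputs(words)))
//! }
//! ```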
//!
//! # Dispatch
//!
//! - On `aarch64`: [`sbox_x16_neon`]. NEON is a compile-time
//! architectural baseline (Q5.12 + Q6.3), so there is no runtime
//! CPU feature detection.
//! - Elsewhere (any non-`aarch64` target): falls back to
//! [`sbox_x16_scalar`], a 16-iteration loop calling the local
//! single-byte [`super::scalar::sbox_byte`].
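//!
//! The same compile-time split can be sketched with `cfg` attributes.
//! In this illustration, `lanes_scalar`, `lanes_neon`, and
//! `lanes_x16` are hypothetical names, and the per-lane map is a
//! placeholder `wrapping_add(1)` rather than the SM4 S-box; the point
//! is that exactly one branch is compiled per target and both must
//! agree lane for lane:
//!
//! ```rust
//! // Portable path: placeholder per-lane map (NOT the SM4 S-box).
//! fn lanes_scalar(input: [u8; 16]) -> [u8; 16] {
//!     let mut out = input;
//!     for x in out.iter_mut() {
//!         *x = x.wrapping_add(1);
//!     }
//!     out
//! }
//!
//! // NEON path, compiled only on aarch64: same map, one vector add.
//! #[cfg(target_arch = "aarch64")]
//! fn lanes_neon(input: [u8; 16]) -> [u8; 16] {
//!     use core::arch::aarch64::{vaddq_u8, vdupq_n_u8, vld1q_u8, vst1q_u8};
//!     let mut out = [0u8; 16];
//!     // SAFETY: NEON is baseline on aarch64, and both pointers cover
//!     // exactly 16 bytes.
//!     unsafe {
//!         let v = vld1q_u8(input.as_ptr());
//!         vst1q_u8(out.as_mut_ptr(), vaddq_u8(v, vdupq_n_u8(1)));
//!     }
//!     out
//! }
//!
//! // Dispatch resolved at compile time; no runtime CPU detection.
//! fn lanes_x16(input: [u8; 16]) -> [u8; 16] {
//!     #[cfg(target_arch = "aarch64")]
//!     {
//!         lanes_neon(input)
//!     }
//!     #[cfg(not(target_arch = "aarch64"))]
//!     {
//!         lanes_scalar(input)
//!     }
//! }
//! ```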
//!
//! # Constant-time discipline
//!
//! Same as [`super::sbox_x8`]: the NEON path uses the shared gate
//! sequence (no table lookups, no secret-derived branches), and the
//! scalar path uses the same gate-only `sbox_byte` from
//! [`super::scalar`].
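//!
//! As a generic illustration of "no secret-derived branches" (not
//! code from this crate; `ct_select` is a hypothetical helper), a
//! choice between two bytes can be expressed with masks instead of
//! an `if` on the secret bit. This is branch-free at the source
//! level only; the compiler is not obliged to preserve that, so the
//! generated code is still worth inspecting:
//!
//! ```rust
//! // Returns `a` when `choice == 1` and `b` when `choice == 0`,
//! // using only AND/OR/NOT on a mask derived from `choice`.
//! fn ct_select(choice: u8, a: u8, b: u8) -> u8 {
//!     // 0x00 when choice is 0, 0xff when choice is 1.
//!     let mask = 0u8.wrapping_sub(choice & 1);
//!     (a & mask) | (b & !mask)
//! }
//! ```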

use super::scalar::sbox_byte;

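/// Scalar fallback: 16 sequential calls into
/// [`super::scalar::sbox_byte`]. Always available.
pub fn sbox_x16_scalar(input: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    // Each lane is independent: lane i of the output is sbox_byte(lane i).
    for (o, &x) in out.iter_mut().zip(input.iter()) {
        *o = sbox_byte(x);
    }
    out
}

/// 16-way packed bitsliced SM4 S-box dispatch.
///
/// On `aarch64`: calls [`sbox_x16_neon`]. Otherwise
/// [`sbox_x16_scalar`].
///
/// Output is byte-identical to applying [`super::scalar::sbox_byte`]
/// to each input byte (verified exhaustively in
/// `tests/lane_position_x16.rs` with lane-position-shifted sweeps
/// per Q6.8 / codex's phase 3 flag #4).
pub fn sbox_x16(input: [u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "aarch64")]
    {
        // SAFETY: NEON is an architectural baseline on aarch64.
        unsafe { sbox_x16_neon(input) }
    }
    #[cfg(not(target_arch = "aarch64"))]
    {
        sbox_x16_scalar(input)
    }
}
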
/// NEON byte-parallel SM4 S-box on 16 independent inputs.
///
/// # Safety
///
/// Caller must be running on `aarch64` (NEON is baseline; no
/// runtime feature check needed). The public dispatch entry
/// [`sbox_x16`] gates this via `cfg(target_arch = "aarch64")`.
pub unsafe