gmcrypto-core 0.13.0

Constant-time-designed pure-Rust SM2/SM3 primitives (no_std + alloc) with an in-CI dudect timing-leak regression harness
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
# gm-crypto-rs

Constant-time-designed pure-Rust SM2 / SM3 / SM4 SDK for Chinese national
cryptography (GB/T 32905 / 32918 / 32907 / GM/T 0009). Sign / verify,
public-key encrypt / decrypt, SM4-CBC, SM4-CTR (single-shot + streaming),
length-flexible batched SM4 block encryption, HMAC-SM3, PBKDF2-HMAC-SM3 —
all secret-touching paths guarded by an in-CI `dudect-bencher`
detectable-leak regression harness.

[![Crates.io](https://img.shields.io/crates/v/gmcrypto-core.svg)](https://crates.io/crates/gmcrypto-core)
[![Documentation](https://docs.rs/gmcrypto-core/badge.svg)](https://docs.rs/gmcrypto-core)
[![License](https://img.shields.io/crates/l/gmcrypto-core.svg)](https://crates.io/crates/gmcrypto-core)

**Personal project notice:** not affiliated with, endorsed by, sponsored by, or
certified by any upstream cryptography project, payment gateway, standards body,
or vendor.

## What this is

A small, auditable, pure-Rust SM2 / SM3 / SM4 SDK whose central
differentiating commitment is that secret-touching code paths are
**constant-time-designed and guarded by an in-CI [`dudect-bencher`](https://docs.rs/dudect-bencher/)
detectable-leak regression harness**: 18 real `ct_*` targets (12
always-on + 2 cfg-gated under `sm4-bitsliced-simd` + 3 cfg-gated under
`sm4-aead` + 1 cfg-gated under `sm4-xts`) plus a deliberately-leaky
`negative_control` that proves
the harness can detect leaks. Most real targets gate at `|tau| < 0.20`;
`ct_sign_k_class` and the direct `ct_fn_invert` / `ct_fp_invert` invert
diagnostics carry target-specific gate policy after the 2026-05-12
recalibration — see [`SECURITY.md`](SECURITY.md) and
[`docs/v0.5-dudect-recalibration.md`](docs/v0.5-dudect-recalibration.md).

The harness reports timing-leak detection events. **It does not prove
constant-time.** Low `|tau|` values mean the test could not detect a leak with
the budget given, not that no leak exists. Language taken directly from
`dudect-bencher`'s own docs.

The harness covers: SM2 sign (split by both private key `d` and nonce
`k` magnitude, with both retry nonces class-tied), SM2 decrypt (split
by recipient `d_B`), SM4 key schedule + single-block encrypt (split by
master key, under default linear-scan and `sm4-bitsliced` paths), the
v0.5 SIMD-packed dispatch (`ct_sm4_encrypt_block_bitsliced_simd`,
cfg-gated), v0.6's batched CBC-decrypt fanout
(`ct_sm4_cbc_decrypt_fanout`, cfg-gated), v0.7's SM4-CTR encrypt
(`ct_sm4_ctr_encrypt`, exercising the public batch path on every
cipher matrix entry), v0.8's SM4-GCM + SM4-CCM decrypt
(`ct_sm4_gcm_decrypt` and `ct_sm4_ccm_decrypt`, cfg-gated on
`sm4-aead`), v0.9's incremental-input buffered SM4-GCM decrypt
(`ct_sm4_gcm_decrypt_buffered`, cfg-gated on `sm4-aead`), HMAC-SM3
(split by key), encrypted-PKCS#8
decrypt (split by password bytes — both classes' blobs valid for their
class's password so both succeed via identical control flow), plus
direct `Fn::invert` and `Fp::invert` diagnostics. The `ct_sign_k_class`
target closes v0.1's structural blind spot to nonce-only leaks.

The `crypto-bigint 0.6 → 0.7.3` upgrade resolved the v0.1-era
`ConstMontyForm::invert` leak directly: on the v0.2 W0 harness both
direct invert diagnostics measured under `|tau| ≈ 0.01`, two orders of
magnitude below the gate. Subsequent GH Actions runner-image drift on
2026-05-12 raised the empirical noise floor on `ct_fn_invert` /
`ct_fp_invert` — both targets moved to PR-smoke telemetry + a nightly
gross-regression sentinel at `|tau| ≥ 0.55`. See
[`docs/v0.5-dudect-recalibration.md`](docs/v0.5-dudect-recalibration.md)
for the data and posture. See [`SECURITY.md`](SECURITY.md) for the full
constant-time discipline.

The differentiator vs. existing Rust SM2 crates (notably
[`RustCrypto/sm2`](https://docs.rs/sm2/), which already aims for constant-time
secret-dependent operations in its design) is **the in-CI regression gate**, not
the design intent in isolation.

## What this isn't

- Not a TLS/TLCP implementation.
- Not SM9, ZUC, post-quantum.
- Not an HSM/SDF/SKF integration.
- Not a certified cryptographic module.
- Not constant-time on CPUs with data-dependent multiply latencies (some older
  x86, some embedded).
- Not a comprehensive SM-crypto library yet — see the milestone roadmap.

## v0.13 scope (shipping)

**C ABI for SM4-XTS.** v0.13 exposes the v0.12 `sm4::mode_xts` core through the
`gmcrypto-c` C ABI (`gmcrypto_sm4_xts_encrypt` / `_decrypt`) behind a new
forwarding `sm4-xts` feature — the deferred FFI half of v0.12, on the
established core-then-FFI cadence (SM4-GCM/CCM core in v0.8 → FFI in v0.10).
Design rationale: [`docs/v0.13-scope.md`](docs/v0.13-scope.md).

- **Additive only — no public API breakage, no new dependency.** The default
  build of both crates is byte-unchanged; `sm4-xts` forwards to the pure-core
  `gmcrypto-core/sm4-xts`.
- Single-shot, mirroring the single-shot SM4-GCM FFI shape minus nonce/AAD/tag:
  32-byte key (`Key1 ‖ Key2`), 16-byte tweak, length-preserving output via the
  `(out, out_capacity, out_actual_len)` convention. Byte-identical to
  `gmcrypto_core::sm4::mode_xts`. New `GMCRYPTO_SM4_XTS_KEY_SIZE` header
  constant; single `GMCRYPTO_ERR` failure mode. **Confidentiality only.**
- Doc-only C example `crates/gmcrypto-c/examples/sm4_xts_sector.c`; 5 new
  `c_smoke` Rust-equivalence tests. No new `gmcrypto-core` API, no new dudect
  target (the FFI is a thin shim over the v0.12 core path).

**Deferred to v0.14** (per [`docs/v0.13-scope.md`](docs/v0.13-scope.md) §5/§6):
parser fuzzing (`cargo-fuzz` on the untrusted-input decode surface — the
recommended pre-v1.0 assurance gate), RustCrypto `aead` trait fit (still
`0.6.0-rc.10`), pinned/noise-isolated dudect runner, AVX-512 `sbox_x64`.

## v0.12 scope (shipped)

**SM4-XTS — tweakable mode for disk/sector encryption.** v0.12 adds
`sm4::mode_xts` behind the new opt-in `sm4-xts` feature: single-shot, full
ciphertext stealing, GB/T 17964-2021 (GM-T OID `1.2.156.10197.1.104.10`),
byte-identical to OpenSSL 3.x EVP `SM4-XTS` (`xts_standard=GB`). Design
rationale: [`docs/v0.12-scope.md`](docs/v0.12-scope.md); KAT sourcing:
[`docs/v0.12-xts-kat-sourcing.md`](docs/v0.12-xts-kat-sourcing.md).

- **Default-features users are unaffected** — additive, opt-in, **no new
  dependency** (the XTS tweak doubling is a trivial bit-reflected
  multiply-by-x, not GHASH, so no `gmcrypto-simd` dep).
- **GB/T 17964, not IEEE 1619** — the two standards differ in the GF(2¹²⁸)
  tweak-doubling convention (GB is the bit-reflected / GHASH-style one), so
  they produce different ciphertext for multi-block / non-aligned data. v0.12
  targets GB (the SM4 national standard + OpenSSL's default for SM4-XTS).
- **Confidentiality only — no authentication.** XTS has no tag; callers needing
  integrity use an AEAD mode (GCM/CCM). The per-data-unit tweak-uniqueness
  contract is the caller's responsibility.
- 32-byte key (`Key1 ‖ Key2`) + raw 16-byte tweak; lengths `[16 B, 16 MiB]`;
  single `None` failure mode. New dudect target `ct_sm4_xts_decrypt`. The whole-
  block bulk rides the `Sm4Cipher::encrypt_blocks` batch API (picks up the SIMD
  fanout under `sm4-bitsliced-simd`).

**Deferred to v0.13** (per [`docs/v0.12-scope.md`](docs/v0.12-scope.md) §5/§6):
C FFI for SM4-XTS, RustCrypto `aead` trait fit, pinned/noise-isolated dudect
runner, AVX-512 `sbox_x64`, CCM incremental input.

## v0.11 scope (shipped)

**RustCrypto trait-fit modernization.** v0.11 migrates the opt-in
`digest-traits` / `cipher-traits` impls from `digest 0.10` / `cipher 0.4` to
`digest 0.11` / `cipher 0.5` (the `crypto-common 0.2` / `hybrid-array`
generation), in-place. Design rationale:
[`docs/v0.11-scope.md`](docs/v0.11-scope.md).

- **Default-features users are unaffected** — the trait fit is opt-in;
  `generic-array` / `hybrid-array` never enter the default dep graph, and every
  SM2 / SM3 / SM4 / HMAC / AEAD output is byte-identical (validated against the
  full KAT suite + gmssl 3.1.1 interop).
- **BREAKING for trait-fit consumers only:** code enabling
  `digest-traits` / `cipher-traits` must bump its own `digest` / `cipher` deps
  to `0.11` / `0.5`. HMAC construction via the `Mac` trait moves to
  `digest::KeyInit::new_from_slice` (`digest 0.11`'s `Mac` dropped `KeyInit`);
  the `cipher` block traits renamed `BlockEncrypt` / `BlockDecrypt` →
  `BlockCipherEncrypt` / `BlockCipherDecrypt`.
- **MSRV stays 1.85.** The RustCrypto `aead 0.6` trait fit remains deferred
  (still `0.6.0-rc.10`); v0.11 lands the `crypto-common 0.2` line it will need.

**Deferred to v0.12** (per [`docs/v0.11-scope.md`](docs/v0.11-scope.md) §5/§6):
RustCrypto `aead` trait fit, pinned/noise-isolated dudect runner, AVX-512
`sbox_x64`, SM4-XTS, CCM incremental input, Argon2-with-SM3.

## v0.10 scope (shipped)

**Streaming AEAD FFI.** v0.10 exposes the v0.9 incremental-input buffered
SM4-GCM encryptor/decryptor through the `gmcrypto-c` C ABI — the item
v0.9 deferred (Q9.6) now that the Rust streaming API is proven. Additive
behind the existing `sm4-aead` feature. Design rationale:
[`docs/v0.10-scope.md`](docs/v0.10-scope.md).

- **9 streaming AEAD C FFI symbols + 2 opaque handle types** —
  `gmcrypto_sm4_gcm_encryptor_t` (output-streaming: `new` / `update` →
  ciphertext per chunk / `finalize` + `finalize_with_tag_len` → tag /
  `free`) and `gmcrypto_sm4_gcm_decryptor_t` (commit-on-verify: `new` /
  `update` buffers and emits **nothing** / `finalize_verify` releases
  plaintext only after the constant-time tag check / `free`).
  `_finalize*` consume+free the handle; single `GMCRYPTO_ERR` on every
  failure (no tag-/length-oracle across the boundary). Mirrors the v0.5
  CBC-streaming lifecycle. C example:
  [`examples/sm4_gcm_streaming.c`](crates/gmcrypto-c/examples/sm4_gcm_streaming.c).

**No public API breakage — purely additive.** v0.9.0 callers can
`cargo update` to v0.10.0 without migration. No new `gmcrypto-core` API;
no new dudect target (the FFI is a thin wrapper over the v0.9
`ct_sm4_gcm_decrypt_buffered`-gated path).

**Deferred to v0.11** (per [`docs/v0.10-scope.md`](docs/v0.10-scope.md)
§5/§6): streaming/incremental CCM, RustCrypto `aead` trait fit (upstream
still `0.6.0-rc.10`), pinned dudect runner, AVX-512 `sbox_x64`, SM4-XTS,
Argon2-with-SM3.

## v0.9 scope (shipped)

**AEAD ergonomics.** v0.9 extends the v0.8 AEAD core with the three
items v0.8 deferred: GCM tag-length parameterization, incremental-input
buffered SM4-GCM, and single-shot AEAD C FFI. All additive behind the
existing `sm4-aead` flag. Design rationale: [`docs/v0.9-scope.md`](docs/v0.9-scope.md).

- **`sm4::GcmTagLen` + `mode_gcm::encrypt_with_tag_len` /
  `decrypt_with_tag_len`** — W1. Caller-chosen GCM tag length per NIST
  SP 800-38D §5.2.1.2 (`{4, 8, 12, 13, 14, 15, 16}` bytes; truncated
  tag = `MSB_t(full_tag)`). `GcmTagLen::new(usize) -> Option<Self>`
  centralizes the valid-length policy. The fixed-16-byte `encrypt` /
  `decrypt` are unchanged.
- **`sm4::Sm4GcmEncryptor` / `Sm4GcmDecryptor`** — W2. Incremental-
  input buffered SM4-GCM (deliberately NOT "streaming"). The
  **encryptor** is output-streaming: `update(chunk) -> Option<Vec<u8>>`
  emits each chunk's ciphertext (`None` once the cumulative plaintext
  would exceed the NIST §5.2.1.1 ceiling `2^36 − 32` bytes);
  `finalize()` / `finalize_with_tag_len()` emit the tag. The
  **decryptor** is input-incremental but output-BUFFERED:
  `update(chunk)` buffers ciphertext + folds GHASH, and
  `finalize_verify(tag) -> Option<Vec<u8>>` releases the plaintext only
  after the constant-time tag check (commit-on-verify — never leaks
  pre-verify bytes). AAD is supplied at construction. Driven with any
  chunking, both reproduce the single-shot path byte-for-byte.
- **6 single-shot AEAD C FFI entry points** — W4. `gmcrypto_sm4_gcm_
  encrypt` / `_decrypt` / `_encrypt_with_tag_len` / `_decrypt_with_tag_
  len` + `gmcrypto_sm4_ccm_encrypt` / `_decrypt`, behind a new
  forwarding `sm4-aead` feature on `gmcrypto-c`. Every error path
  returns `GMCRYPTO_ERR` (single failure code). Streaming AEAD FFI is
  deferred to v0.10.
- **New dudect target `ct_sm4_gcm_decrypt_buffered`** — W3. Class-split
  by master key, drives `Sm4GcmDecryptor`; `|tau| < 0.20` (5K-sample
  smoke `|τ| ≈ 0.029`). No new CI matrix slot — rides the existing
  `sm4-aead` entries.

**No public API breakage — purely additive.** v0.8.0 callers can
`cargo update` to v0.9.0 without migration; `sm4-aead` is opt-in.

**Deferred to v0.10** (per [`docs/v0.9-scope.md`](docs/v0.9-scope.md)
§5/§6): CCM incremental input, streaming AEAD FFI, RustCrypto `aead`
trait fit (upstream still on `0.6.0-rc`), pinned dudect runner,
AVX-512 `sbox_x64`, SM4-XTS, Argon2-with-SM3.

## v0.8 scope (shipped)

The **AEAD core**. v0.8 cashed in the cipher-mode surface that v0.7
opened up: SM4-GCM and SM4-CCM single-shot, plus a constant-time
GHASH primitive in `gmcrypto-simd`.

- **`sm4::mode_gcm::encrypt` / `decrypt`** — W2. Single-shot SM4-GCM
  per NIST SP 800-38D / GM/T 0009 / RFC 8998. `encrypt(key, nonce,
  aad, pt) -> (Vec<u8>, [u8; 16])` returns `(ciphertext, tag)`.
  `decrypt(key, nonce, aad, ct, tag) -> Option<Vec<u8>>` —
  `Some(plaintext)` only when the tag verifies (constant-time
  compare via `subtle::ConstantTimeEq`). Both 12-byte canonical
  and arbitrary-length nonce paths supported. Tag length fixed at
  128 bits in v0.8 (parameterized in v0.9 via `GcmTagLen`).
  **Byte-identical to gmssl 3.1.1 `sm4 -gcm`** — bidirectional
  interop validated.
- **`sm4::mode_ccm::encrypt` / `decrypt`** — W3. Single-shot SM4-CCM
  per NIST SP 800-38C / RFC 3610 / GM/T 0009 (OID
  `1.2.156.10197.1.104.9`). `encrypt(key, nonce, aad, pt, tag_len)
  -> Option<Vec<u8>>` (output: `ciphertext ‖ tag`). `tag_len ∈
  {4, 6, 8, 10, 12, 14, 16}` per spec, validated at API entry.
  `nonce.len() ∈ [7, 13]`. Pure-Rust CBC-MAC + CTR over the
  existing `Sm4Cipher` path — no GHASH. **Byte-identical to OpenSSL
  3.x EVP `SM4-CCM`** across 8 KAT scenarios (gmssl 3.1.1 doesn't
  ship `sm4 -ccm` so the CCM reference oracle comes from OpenSSL;
  see [`docs/v0.8-ccm-kat-sourcing.md`](docs/v0.8-ccm-kat-sourcing.md)).
- **`gmcrypto_simd::ghash::ghash_mul(h, x) -> [u8; 16]`** — W1.
  Constant-time GHASH multiplication over `GF(2^128) /
  (x^128 + x^7 + x^2 + x + 1)`. Single dispatch entry point:
  - `ghash_mul_clmul` on `x86_64` (PCLMULQDQ + SSE2; runtime
    cpufeatures detect; Intel Westmere+ / AMD Bulldozer+).
  - `ghash_mul_pmull` on `aarch64` (ARMv8.0 AES extension
    `vmull_p64`; runtime cpufeatures detect; Apple Silicon /
    most modern ARM chips).
  - `ghash_mul_software` (bit-serial mask-XOR; constant-time over
    both inputs; available everywhere as fallback).
- **New `sm4-aead` feature flag** — default-off; opt-in.
  `sm4-aead = ["dep:gmcrypto-simd"]` activates `mode_gcm` and
  `mode_ccm`. Additive on the default-features build.
- **New dudect targets `ct_sm4_gcm_decrypt` + `ct_sm4_ccm_decrypt`**
  — W4. Class-split by master key over a fixed 256-byte
  plaintext + 16-byte AAD. Both classes' `(ct, tag)` pairs are
  valid encrypts under their **own** keys, so both decrypt paths
  reach the tag-compare via identical control flow. Same
  `|tau| < 0.20` gate as the rest of the SM4 surface; new CI
  matrix slot `sm4-bitsliced-simd,sm4-aead` exercises the
  most-demanding cipher-stack combination.

**No public API breakage — purely additive.** v0.7.0 callers can
`cargo update` to v0.8.0 without migration; `sm4-aead` is opt-in.

Everything v0.4 shipped (`wasm32-unknown-unknown` build, RustCrypto
trait fit behind `digest-traits` / `cipher-traits`, bitsliced SM4
S-box behind `sm4-bitsliced`, `gmcrypto-c` C ABI crate) is unchanged
— see the Roadmap row for the compact reference and `CHANGELOG.md`
`[0.4.0]` for detail.

Everything v0.3 shipped is unchanged:

- Reusable strict-canonical DER reader / writer subset
  (`gmcrypto_core::asn1::{reader, writer, oid}`).
- PEM + encrypted PKCS#8 + X.509 SPKI + SEC1 codecs
  (`gmcrypto_core::{pem, pkcs8, spki, sec1}`).
- Full bidirectional gmssl 3.1.1 interop (SM2 sign / verify, SM2
  encrypt / decrypt, SM4-CBC). Gated on `GMCRYPTO_GMSSL=1`.
- Raw byte-concat SM2 ciphertext helpers
  (`gmcrypto_core::sm2::raw_ciphertext`): `C1 || C3 || C2`
  emit + decode; legacy `C1 || C2 || C3` decrypt-only.
- Streaming `HmacSm3` + `Sm4Cbc{En,De}cryptor`. In-crate
  `Hash` / `Mac` / `BlockCipher` traits (`gmcrypto_core::traits`).
- Comb-table `mul_g` (~5× sign-side speedup). 64 sub-tables of 16
  entries each, lazily built once per process via `spin::Once`.

Everything v0.2 shipped is unchanged:

- SM3 hash function (`#![no_std]` + `alloc`).
- SM2 sign / verify with custom signer ID (default `1234567812345678` per GM/T 0009).
- SM2 public-key encrypt / decrypt with GM/T 0009-2012 ciphertext DER
  (`SEQUENCE { x, y, hash, ciphertext }`). Invalid-curve attack defense
  via on-curve check on `C1` before scalar mult; non-branching
  KDF-zero detection so a chosen-ciphertext attacker cannot distinguish
  it from a normal MAC failure.
- SM4 block cipher (GB/T 32907-2016) and SM4-CBC (PKCS#7 padding,
  caller-supplied unpredictable IV per NIST SP 800-38A Appendix C).
  Constant-time-designed `subtle` linear-scan S-box (~1-2M blocks/s);
  opt-in bitsliced (table-less, gate-only) S-box via the
  `sm4-bitsliced` feature (v0.4 W3). PKCS#7 strip uses a
  constant-time scan over the final block; `decrypt` collapses every
  failure mode to a single `None` against padding-oracle attacks.
- HMAC-SM3 per RFC 2104, gmssl-cross-validated KAT vectors. Hash-first
  long-key path. v0.3 adds the streaming `HmacSm3` shape alongside
  single-shot `hmac_sm3`.
- PBKDF2-HMAC-SM3 per RFC 8018 §5.2. Caller-supplied output buffer
  (no internal allocation, no iteration-count default).
- Constant-time-designed `Fp` and `Fn` field arithmetic via
  `crypto-bigint = 0.7.3`.
- Renes-Costello-Batina complete addition formulas for the SM2 curve (a=-3 specialized).
- Fixed-base (v0.3 comb-table) and variable-base scalar multiplication,
  both constant-time-designed with `subtle::ConditionallySelectable`
  linear-scan table lookup.
- Fixed-K masked-select signing retry: the retry loop runs `K=2` iterations
  unconditionally, regardless of which iteration produced a valid signature.
  The constant-time contract holds for any RNG that respects `CryptoRng`;
  pathological RNGs cannot leak the secret via observable retry count.
- Strict canonical ASN.1 DER for `SEQUENCE { r, s }` (signatures), the
  GM/T 0009 SM2 ciphertext SEQUENCE, and all v0.3 PEM / PKCS#8 / SPKI
  / SEC1 wire formats. Rejects non-canonical leading-zero padding,
  sign-bit-set first bytes, empty content, and (for ciphertext
  coordinates) values `≥ p`.
- KAT vectors from GB/T 32905-2016 (SM3), GB/T 32918.2-2017 / .5-2017
  (SM2), GB/T 32907-2016 Appendix A.1 (SM4 single-block + 1M-round),
  GM/T 0042-2015 (HMAC-SM3), GM/T 0091-2020 (PBKDF2-HMAC-SM3).
- `gmssl` CLI cross-validation for HMAC-SM3, PBKDF2-HMAC-SM3, and
  (new in v0.3) SM2 sign/verify, SM2 encrypt/decrypt, and SM4-CBC
  in both directions. Gated on `GMCRYPTO_GMSSL=1`.
- `dudect-bencher` harness — 18 real `ct_*` targets (12 always-on + 2
  cfg-gated under `sm4-bitsliced-simd` + 3 cfg-gated under `sm4-aead` + 1
  cfg-gated under `sm4-xts`) plus a deliberately-leaky `negative_control`
  that proves the harness can detect leaks. Matrix-run under
  `features=default`, `sm4-bitsliced`, `sm4-bitsliced-simd`, and
  `sm4-bitsliced-simd,sm4-aead,sm4-xts`
  — PR-smoke 10⁴ samples; nightly 10⁵ samples (more samples = tighter
  empirical confidence at the same threshold). Most real targets gate
  at `|tau| < 0.20`; per-target policy in [`SECURITY.md`](SECURITY.md).
- Failure-mode invariant: every `Result`-returning public API uses
  the workspace-wide `gmcrypto_core::Error` (single `Failed` variant,
  `#[non_exhaustive]`); per-module aliases `sm2::Error`, `pem::Error`,
  `pkcs8::Error` all point at the same type. `verify_with_id` returns
  `bool`; DER decode returns `Option`. Defense against padding-oracle,
  malleability, and invalid-curve attacks.
- Zeroization on private keys, SM4 round keys, HMAC `K'` /
  `K' XOR ipad` / `K' XOR opad`, PBKDF2 intermediates, SM2 KDF
  buffers, and PKCS#8 inner-key scratch.

## Roadmap

| Version | Scope |
|---|---|
| v0.2 (shipped) | SM4 + SM4-CBC, HMAC-SM3, PBKDF2-HMAC-SM3, SM2 encrypt/decrypt + GM/T 0009 ciphertext DER, dudect harness expansion to 11 targets. See [`CHANGELOG.md`](CHANGELOG.md) `[0.2.0]`. |
| v0.3 (shipped) | Reusable ASN.1 reader/writer subset; PEM, encrypted PKCS#8, X.509 SPKI, SEC1; full bidirectional gmssl interop (incl. SM2 sign/verify + SM2 encrypt/decrypt with PEM-wrapped keys + SM4-CBC); raw byte-concat ciphertext helpers (`C1\|\|C3\|\|C2` modern + legacy `C1\|\|C2\|\|C3` decrypt); streaming `HmacSm3` / `Sm4CbcEncryptor` / `Sm4CbcDecryptor` + in-crate `Hash`/`Mac`/`BlockCipher` traits; comb-table `mul_g` (~5× sign-side speedup); dudect harness expanded to 12 targets. See [`CHANGELOG.md`](CHANGELOG.md) `[0.3.0]`. |
| v0.4 (shipped) | `wasm32-unknown-unknown` build target; RustCrypto-trait fit (`digest::Digest` / `digest::Mac` / `cipher::BlockEncrypt`/`BlockDecrypt`) behind opt-in `digest-traits` / `cipher-traits` feature flags; bitsliced (table-less, gate-only) SM4 S-box behind the opt-in `sm4-bitsliced` feature; new `gmcrypto-c` workspace member exposing the SM2/SM3/SM4/HMAC/PBKDF2 surface as a C ABI (cdylib + staticlib + cbindgen-generated header). See [`CHANGELOG.md`](CHANGELOG.md) `[0.4.0]`. |
| v0.5.0 (shipped) | C-ABI completeness (streaming CBC + raw-byte SM2 ciphertext + caller-supplied RNG callback); `sm4-bitsliced-simd` feature-flag scaffolding — v0.5.0 ships no SIMD fast path (the feature transparently delegates to the v0.4 single-block bitslice); BREAKING ergonomic cleanup — workspace-wide `gmcrypto_core::Error`, `Sm2PrivateKey::new(U256)` → `from_scalar(U256)` (gated behind `crypto-bigint-scalar`) + always-on `from_bytes_be(&[u8; 32])` constructor, `std` feature removed. See [`CHANGELOG.md`](CHANGELOG.md) `[0.5.0]`. |
| v0.5.1 (shipped) | W4 phase 2 — new sibling crate `gmcrypto-simd` carrying an **AVX2 8-way packed bitsliced SM4 S-box** behind opt-in `sm4-bitsliced-simd`, with runtime CPU detection (`cpufeatures`) and silent scalar fallback on non-AVX2 hosts. v0.5.1's `tau` dispatch fed the AVX2 path with 7 wasted lanes; production throughput matched v0.4 single-block bitslice. Dudect calibration update — `ct_fn_invert` / `ct_fp_invert` moved to PR-smoke telemetry + 100K nightly gross-regression sentinel after a GH Actions `ubuntu-24.04` runner-image shift on 2026-05-12 raised the empirical noise floor; see `docs/v0.5-dudect-recalibration.md`. See [`CHANGELOG.md`](CHANGELOG.md) `[0.5.1]`. |
| v0.6.0 (shipped) | **W4 milestone close-out — the throughput-win release.** W4 phase 3: NEON 4-way bitsliced SM4 on `aarch64` (compile-time baseline) + AVX2 32-byte full-width packed S-box (`sbox_x32`) + `Sm4CbcDecryptor::process_chunk` SIMD fanout. Per round of the SM4 decrypt, batched blocks' `tau` inputs pack into one SIMD register (32 bytes on x86_64 / 8-block batch, 16 bytes on aarch64 / 4-block batch) — 32× fewer SIMD dispatches per 8-block batch than v0.5.1. CBC encryption stays single-block (chain-of-blocks defeats SIMD packing). New dudect target `ct_sm4_cbc_decrypt_fanout` (Q6.7) gates the fanout path at `\|tau\| < 0.20`. Exhaustive lane-position-shifted SIMD tests (8192 + 4096 cases) per Q6.8. **No public API changes; no breaking changes — additive only.** See [`CHANGELOG.md`](CHANGELOG.md) `[0.6.0]` and `docs/v0.6-scope.md`. |
| v0.7.0 (shipped) | **Cipher-mode surface expansion.** First version where v0.6's SIMD machinery is callable from user code outside the CBC-decrypt internal path. New: public length-flexible `Sm4Cipher::encrypt_blocks` / `decrypt_blocks` (W1; Q7.7); single-shot `sm4::mode_ctr::encrypt` / `decrypt` (W2; GM/T 0002-2012 §5.4); streaming `sm4::ctr_streaming::Sm4CtrCipher` (W3); new dudect target `ct_sm4_ctr_encrypt` (gates `\|tau\| < 0.20` on every cipher path). Plus the v0.8 AEAD scope doc (`docs/v0.7-aead-scope.md`, Q8.1–Q8.8 sign-off + v0.9 candidate Q-list). **No public API breakage — additive only.** See [`CHANGELOG.md`](CHANGELOG.md) `[0.7.0]`. |
| v0.8.0 (shipped) | **AEAD core — SM4-GCM + SM4-CCM.** Per `docs/v0.7-aead-scope.md` Q8.1–Q8.8. New: `gmcrypto_simd::ghash::ghash_mul` constant-time GHASH primitive (CLMUL on `x86_64` / PMULL on `aarch64` / software Karatsuba fallback; W1); `sm4::mode_gcm::encrypt` / `decrypt` byte-identical to gmssl 3.1.1 `sm4 -gcm` with bidirectional interop (W2); `sm4::mode_ccm::encrypt` / `decrypt` byte-identical to OpenSSL 3.x EVP `SM4-CCM` across 8 KAT scenarios (W3; gmssl 3.1.1 lacks `sm4 -ccm` so OpenSSL is the oracle — see `docs/v0.8-ccm-kat-sourcing.md`); new dudect targets `ct_sm4_gcm_decrypt` + `ct_sm4_ccm_decrypt` + new CI matrix slot `sm4-bitsliced-simd,sm4-aead` (W4). Behind opt-in `sm4-aead` feature flag (additive; default-off). **No public API breakage — additive only.** See [`CHANGELOG.md`](CHANGELOG.md) `[0.8.0]`. |
| v0.9.0 (shipped) | **AEAD ergonomics.** Per `docs/v0.9-scope.md` Q9.1–Q9.10. New: `sm4::GcmTagLen` + `mode_gcm::encrypt_with_tag_len` / `decrypt_with_tag_len` (NIST SP 800-38D §5.2.1.2 truncated tags; W1); incremental-input buffered `sm4::Sm4GcmEncryptor` (output-streaming) / `Sm4GcmDecryptor` (output-buffered, commit-on-verify) — differential-KAT-equal to single-shot across arbitrary chunking (W2); new dudect target `ct_sm4_gcm_decrypt_buffered` (W3); 6 single-shot AEAD C FFI symbols (`gmcrypto_sm4_gcm_*` / `gmcrypto_sm4_ccm_*`) behind a forwarding `sm4-aead` feature on `gmcrypto-c` (W4). Behind the existing `sm4-aead` flag. **No public API breakage — additive only.** See [`CHANGELOG.md`](CHANGELOG.md) `[0.9.0]`. |
| v0.10.0 (shipped) | **Streaming AEAD FFI — SM4-GCM.** Per `docs/v0.10-scope.md` Q10.1–Q10.11. New: 9 `gmcrypto-c` FFI symbols + 2 opaque handle types exposing the v0.9 incremental-input buffered SM4-GCM encryptor (output-streaming) / decryptor (commit-on-verify) to C/C++/Go/Zig/Python — `gmcrypto_sm4_gcm_encryptor_{new,update,finalize,finalize_with_tag_len,free}` + `gmcrypto_sm4_gcm_decryptor_{new,update,finalize_verify,free}`, behind the existing `sm4-aead` feature on `gmcrypto-c`; `_finalize*` consume+free, single `GMCRYPTO_ERR`; C example `examples/sm4_gcm_streaming.c`. `regen-header` now implies `sm4-aead` (cbindgen drops cfg-gated opaque structs otherwise). No new `gmcrypto-core` API; no new dudect target. **No public API breakage — additive only.** See [`CHANGELOG.md`](CHANGELOG.md) `[0.10.0]`. |
| v0.11.0 (shipped) | **RustCrypto trait-fit modernization.** Per `docs/v0.11-scope.md` Q11.1–Q11.11. Migrates the opt-in `digest-traits` / `cipher-traits` impls from `digest 0.10` / `cipher 0.4` to `digest 0.11` / `cipher 0.5` (the `crypto-common 0.2` / `hybrid-array` generation), in-place: `cipher` block backend reshaped to cipher 0.5's separate `BlockCipherEncBackend` / `BlockCipherDecBackend`; HMAC construction via `digest::KeyInit::new_from_slice` (`digest 0.11` `Mac` dropped `KeyInit`). **BREAKING for trait-fit consumers only** (bump your own `digest`/`cipher`); default-features users unaffected, output byte-identical (full KAT + gmssl interop). MSRV stays 1.85; no new dudect target. See [`CHANGELOG.md`](CHANGELOG.md) `[0.11.0]`. |
| v0.12.0 (shipped) | **SM4-XTS — tweakable disk/sector mode.** Per `docs/v0.12-scope.md` Q12.1–Q12.13. New: `sm4::mode_xts::{encrypt, decrypt}` + `XTS_KEY_SIZE` behind the opt-in `sm4-xts` feature — GB/T 17964-2021 (GM-T OID `1.2.156.10197.1.104.10`), full ciphertext stealing, byte-identical to OpenSSL 3.x EVP `SM4-XTS` (`xts_standard=GB`; **not** IEEE 1619 — they differ in the GF(2¹²⁸) tweak doubling). 32-byte key (`Key1 ‖ Key2`) + raw 16-byte tweak, lengths `[16 B, 16 MiB]`, single `None` failure mode, confidentiality-only (no auth). Pure-core (**no new dependency**); rides the `Sm4Cipher::encrypt_blocks` batch API + SIMD fanout. New dudect target `ct_sm4_xts_decrypt`. Also fixes a latent CI bug where the feature-conditional dudect gates never fired. C FFI deferred to v0.13. **Additive — no public API breakage.** See [`CHANGELOG.md`](CHANGELOG.md) `[0.12.0]`. |
| v0.13.0 (shipping) | **C ABI for SM4-XTS.** Per `docs/v0.13-scope.md` Q13.1–Q13.12. New: `gmcrypto_sm4_xts_encrypt` / `_decrypt` + `GMCRYPTO_SM4_XTS_KEY_SIZE` in `gmcrypto-c`, behind a forwarding `sm4-xts` feature — single-shot, mirroring the single-shot SM4-GCM FFI shape minus nonce/AAD/tag (32-byte key, 16-byte tweak, length-preserving `(out, out_capacity, out_actual_len)` output), byte-identical to `gmcrypto_core::sm4::mode_xts`, single `GMCRYPTO_ERR`, confidentiality-only. The deferred FFI half of v0.12 (the v0.8-core → v0.10-FFI cadence). 5 new `c_smoke` tests + doc-only C example `examples/sm4_xts_sector.c`; regenerated header (no `regen-header` change needed — free fns + always-on const). No new `gmcrypto-core` API, no new dudect target, **no new dependency**. **Additive — no public API breakage.** See [`CHANGELOG.md`](CHANGELOG.md) `[0.13.0]`. |
| v0.14+ | Per `docs/v0.13-scope.md` §5/§6 (Q14.x): parser fuzzing (`cargo-fuzz` on the DER/PEM/PKCS#8 + AEAD/XTS decode surface — the recommended pre-v1.0 assurance gate); RustCrypto `aead` trait fit (upstream still on `0.6.0-rc.10`); pinned / noise-isolated dudect runner; AVX-512 16-way `sbox_x64`; SM4-XTS streaming / per-sector batch helper (consumer-driven); CCM incremental input; Argon2-with-SM3 (research-only); `wasm-bindgen-test` KAT runner. Each lands behind its own scope-doc cycle. |
| v1.0 | API stabilization. |

## Quick-start

```rust
use gmcrypto_core::sm2::{
    sign_with_id, verify_with_id, Sm2PrivateKey, Sm2PublicKey, DEFAULT_SIGNER_ID,
};
use getrandom::SysRng;
use hex_literal::hex;
use rand_core::UnwrapErr;

// v0.5 W5 — `from_bytes_be` is the recommended public constructor
// (always-on, doesn't expose `crypto_bigint::U256` to callers).
let d_be: [u8; 32] = hex!(
    "3945208F7B2144B13F36E38AC6D39F95889393692860B51A42FB81EF4DF7C5B8"
);
let key = Sm2PrivateKey::from_bytes_be(&d_be).expect("d in [1, n-2]");
let public = Sm2PublicKey::from_point(key.public_key());

let mut rng = UnwrapErr(SysRng);
let sig = sign_with_id(&key, DEFAULT_SIGNER_ID, b"hello", &mut rng).unwrap();
assert!(verify_with_id(&public, DEFAULT_SIGNER_ID, b"hello", &sig));
```

## Threat model

See [`SECURITY.md`](SECURITY.md). Briefly: server-side use, dedicated host,
operator-trusted, network MITM in scope, side-channel attacks beyond what the
dudect harness covers are NOT in scope.

## Build & test

```bash
cargo test --workspace                                                          # unit + integration
cargo bench --bench timing_leaks --features crypto-bigint-scalar                # local timing harness (~75s)
DUDECT_SAMPLES=10000 cargo bench --bench timing_leaks --features crypto-bigint-scalar  # match CI smoke budget
```

`gmssl` interop test (gated; install [`gmssl`](https://github.com/guanzhi/GmSSL)
v3.1.1 to enable):

```bash
GMCRYPTO_GMSSL=1 cargo test --test interop_gmssl
```

## wasm32 support

`gmcrypto-core` builds on `wasm32-unknown-unknown` as of v0.4. CI gates
both stable and MSRV (1.85) builds on the target.

```bash
rustup target add wasm32-unknown-unknown
cargo build -p gmcrypto-core --target wasm32-unknown-unknown --no-default-features
```

The crate is `no_std + alloc` only and does NOT pull `getrandom`'s
`wasm_js` backend or `wasm-bindgen` / `js-sys` into its default dep
graph. Wasm callers wire their own `rand_core::Rng` impl — typically
by enabling `getrandom`'s `wasm_js` feature in *their* `Cargo.toml`:

```toml
[dependencies]
gmcrypto-core = "0.7"
rand_core = { version = "0.10", default-features = false }
getrandom = { version = "0.4", default-features = false, features = ["wasm_js"] }
```

```rust
use gmcrypto_core::sm2::{sign_with_id, Sm2PrivateKey, DEFAULT_SIGNER_ID};
use rand_core::UnwrapErr;
use getrandom::SysRng;

let mut rng = UnwrapErr(SysRng); // wasm_js-backed when targeting wasm32
let sig = sign_with_id(&priv_key, DEFAULT_SIGNER_ID, b"msg", &mut rng).unwrap();
```

A `wasm-bindgen-test`-driven test runner (running KAT vectors under
Node or a headless browser) is post-v0.4 — v0.4 ships the build-target
gate only.

## License

Apache-2.0. See [`LICENSE`](LICENSE).

Some reference outputs use the upstream [`gmssl`](https://github.com/guanzhi/GmSSL)
tool. This project is independent of that project.