gmcrypto-simd 1.2.0

SIMD backends for gmcrypto-core — AVX2 (x86_64) and NEON (aarch64) packed bitsliced SM4 S-box, quarantined to keep `gmcrypto-core` `unsafe_code = forbid`

gm-crypto-rs

Constant-time-designed pure-Rust SM2 / SM3 / SM4 SDK for Chinese national cryptography (GB/T 32905 / 32918 / 32907 / GM/T 0009). Sign / verify, public-key encrypt / decrypt, SM4-CBC, SM4-CTR (single-shot + streaming), length-flexible batched SM4 block encryption, HMAC-SM3, PBKDF2-HMAC-SM3 — all secret-touching paths guarded by an in-CI dudect-bencher detectable-leak regression harness.


Personal project notice: not affiliated with, endorsed by, sponsored by, or certified by any upstream cryptography project, payment gateway, standards body, or vendor.

⚠️ Not independently audited. No third-party / external security audit has been performed. Assurance is internal: a multi-model adversarial pre-publish re-audit (see docs/v1.0-reaudit.md), in-CI KAT vectors, maintainer-run gmssl 3.1.1 interop (11/11, gated on GMCRYPTO_GMSSL — not run in CI), an in-CI dudect timing-leak harness, and a 19-target cargo-fuzz suite. This is a solo-maintained, best-effort open-source project with no support SLA. Review the code and use at your own risk. See SECURITY.md for the threat model and disclosure process.

What this is

A small, auditable, pure-Rust SM2 / SM3 / SM4 SDK whose central differentiating commitment is that secret-touching code paths are constant-time-designed and guarded by an in-CI dudect-bencher detectable-leak regression harness: 19 real ct_* targets (12 always-on + 2 cfg-gated under sm4-bitsliced-simd + 3 cfg-gated under sm4-aead + 1 cfg-gated under sm4-xts + 1 cfg-gated under sm2-key-exchange) plus a deliberately-leaky negative_control that proves the harness can detect leaks. Most real targets gate at |tau| < 0.20; ct_sign_k_class and the direct ct_fn_invert / ct_fp_invert invert diagnostics carry target-specific gate policy after the 2026-05-12 recalibration — see SECURITY.md and docs/v0.5-dudect-recalibration.md.

The harness reports timing-leak detection events. It does not prove constant-time. Low |tau| values mean the test could not detect a leak with the budget given, not that no leak exists. This wording is taken directly from dudect-bencher's own docs.

The harness covers: SM2 sign (split by both private key d and nonce k magnitude, with both retry nonces class-tied), SM2 decrypt (split by recipient d_B), SM4 key schedule + single-block encrypt (split by master key, under default linear-scan and sm4-bitsliced paths), the v0.5 SIMD-packed dispatch (ct_sm4_encrypt_block_bitsliced_simd, cfg-gated), v0.6's batched CBC-decrypt fanout (ct_sm4_cbc_decrypt_fanout, cfg-gated), v0.7's SM4-CTR encrypt (ct_sm4_ctr_encrypt, exercising the public batch path on every cipher matrix entry), v0.8's SM4-GCM + SM4-CCM decrypt (ct_sm4_gcm_decrypt and ct_sm4_ccm_decrypt, cfg-gated on sm4-aead), v0.9's incremental-input buffered SM4-GCM decrypt (ct_sm4_gcm_decrypt_buffered, cfg-gated on sm4-aead), v1.1's full SM2 key-exchange initiator flow (ct_sm2_key_exchange, cfg-gated on sm2-key-exchange — split by static d_A with per-class valid responder transcripts), HMAC-SM3 (split by key), encrypted-PKCS#8 decrypt (split by password bytes — both classes' blobs valid for their class's password so both succeed via identical control flow), plus direct Fn::invert and Fp::invert diagnostics. The ct_sign_k_class target closes v0.1's structural blind spot to nonce-only leaks.

The crypto-bigint 0.6 → 0.7.3 upgrade resolved the v0.1-era ConstMontyForm::invert leak directly: on the v0.2 W0 harness both direct invert diagnostics measured under |tau| ≈ 0.01, two orders of magnitude below the gate. Subsequent GH Actions runner-image drift on 2026-05-12 raised the empirical noise floor on ct_fn_invert / ct_fp_invert — both targets moved to PR-smoke telemetry + a nightly gross-regression sentinel at |tau| ≥ 0.55. See docs/v0.5-dudect-recalibration.md for the data and posture. See SECURITY.md for the full constant-time discipline.

The differentiator vs. existing Rust SM2 crates (notably RustCrypto/sm2, which already aims for constant-time secret-dependent operations in its design) is the in-CI regression gate, not the design intent in isolation.

What this isn't

  • Not a TLS/TLCP implementation.
  • Not SM9, ZUC, post-quantum.
  • Not an HSM/SDF/SKF integration.
  • Not a certified cryptographic module.
  • Not constant-time on CPUs with data-dependent multiply latencies (some older x86, some embedded).
  • Not a comprehensive SM-crypto library yet — see the milestone roadmap.

Stability & SemVer

The line graduated to 1.0 (stable) with the 1.0.0 release; the current release is 1.2.0 (the C FFI for SM2 key exchange; see the v1.2 scope below). crates.io history goes 0.16.0 → 1.0.0 → 1.0.1 → 1.1.0 → 1.2.0, skipping 0.17.0–0.23.0 (those were non-publishing assurance + API-finalization milestones; their changes all shipped together in the first stable 1.0.0). Every post-1.0 release has been additive (SemVer-checked); the only migration ever required is 0.16 → 1.0, a single major bump, so no published 0.x consumer ever saw an intermediate break. The public API had been stable in practice since v0.5; the v1.0 readiness audit (v0.21) froze and tooling-guarded it, the v0.22 API-tightening cycle decoupled it from crypto-bigint 0.7, and the v0.23 pre-1.0 re-audit remediation cycle applied the API/ABI-finality + hardening fixes from a multi-model adversarial re-audit (docs/v1.0-reaudit.md); see docs/v1.0-readiness.md.

From 1.0, SemVer is enforced: breaking changes to the covered surface require a major bump, and cargo-semver-checks runs as the forward breaking-change gate in CI (the three crates always release together at one lockstep version, with intra-workspace deps pinned exactly — =1.2.0). The runtime wire output (SM2 signatures / ciphertexts, SM4 mode bytes) is byte-identical to 0.16.0.

  • What's covered by SemVer: the public Rust API of gmcrypto-core (the surface snapshotted in docs/api-baseline/gmcrypto-core.txt, drift-checked in CI) and the gmcrypto-c C ABI (the committed crates/gmcrypto-c/include/gmcrypto.h, drift-checked in CI).
  • What's NOT covered: anything #[doc(hidden)]: sm2::sign_raw_with_id (the dudect harness hook), Sm4Cbc{Encryptor,Decryptor}::take_output (FFI-shim drains), (v0.22) the low-level SM2 curve arithmetic sm2::curve / sm2::scalar_mul / ProjectivePoint::to_affine, and (v0.23) the raw EC point surface sm2::point / ProjectivePoint (the type + module + re-export) + Sm2PublicKey::{from_point, point}, the low-level asn1::{reader, writer, oid} modules, and the in-crate traits::{Hash, Mac, BlockCipher} module (all kept pub only for in-repo dev crates); and the entire gmcrypto-simd crate, which is an internal acceleration backend with no stable Rust API (use gmcrypto-core from Rust, gmcrypto-c from C). These may change or be removed in any release.
  • High-level key path speaks keys, not points (v0.23). Sm2PrivateKey::public_key() returns Sm2PublicKey (not the now-internal ProjectivePoint); Sm2PublicKey::from_sec1_bytes is the on-curve-checked public point constructor. spki::{encode, decode} and sec1::EcPrivateKey.public speak Sm2PublicKey.
  • RNG bound (v0.23). sm2::{sign_with_id, encrypt} name the fallible rand_core::TryCryptoRng bound — a deliberate, documented ecosystem coupling (rand_core is the RNG interop point, the RustCrypto-wide convention; unlike the v0.22 crypto-bigint decoupling, replacing it would hurt interop). An RNG failure collapses to the single Failed, never a panic.
  • Single-shot SM4-GCM encrypt is fallible (v0.23). mode_gcm::{encrypt, encrypt_with_tag_len} return Option<…>, rejecting plaintext past the 2^36 − 32-byte GCM counter ceiling (matching the streaming path and decrypt).
  • Features are additive (default = []; all 8 are opt-in) and the build is no_std + alloc-only with unsafe_code = "forbid" on the core.
  • MSRV is 1.85 (edition 2024); an MSRV bump is treated as a minor, not a patch.
  • crypto-bigint decoupling (v0.22): the always-on (default-features) public API names no crypto-bigint types — the byte-adjacent types (asn1::{encode,decode}_sig, Sm2Ciphertext::{x,y}) take/return [u8; 32], and the curve/scalar arithmetic is #[doc(hidden)] (above). The only place a crypto-bigint 0.7 type appears in the public API is the opt-in crypto-bigint-scalar feature's Sm2PrivateKey::from_scalar(U256) — enabling that feature is an explicit opt-in to the crypto-bigint 0.7 type contract (a crypto-bigint major bump would be breaking for that feature). The recommended always-on path (Sm2PrivateKey::from_bytes_be) avoids it entirely. See docs/v1.0-readiness.md §3.A.
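The 2^36 − 32-byte ceiling named in the single-shot SM4-GCM item above follows from GCM's 32-bit block counter: one invocation can process at most 2^32 − 2 sixteen-byte blocks. The sketch below works through that arithmetic and shows one way a cumulative-length check could look. The constant name and checked_gcm_total are hypothetical illustrations, not gmcrypto-core items.

```rust
/// Largest plaintext one GCM invocation may process, in bytes.
/// NIST SP 800-38D caps the plaintext at 2^39 - 256 bits = 2^36 - 32 bytes.
const GCM_MAX_PLAINTEXT_BYTES: u64 = (1u64 << 36) - 32;

/// Hypothetical helper (not a gmcrypto-core API): returns the new running
/// total if `chunk_len` more bytes still fit under the ceiling, else `None`.
fn checked_gcm_total(total_so_far: u64, chunk_len: u64) -> Option<u64> {
    let total = total_so_far.checked_add(chunk_len)?;
    if total <= GCM_MAX_PLAINTEXT_BYTES {
        Some(total)
    } else {
        None
    }
}

fn main() {
    // The ceiling equals (2^32 - 2) blocks of 16 bytes each.
    assert_eq!(GCM_MAX_PLAINTEXT_BYTES, ((1u64 << 32) - 2) * 16);

    // Exactly at the ceiling is accepted; one byte past it is rejected.
    assert_eq!(
        checked_gcm_total(0, GCM_MAX_PLAINTEXT_BYTES),
        Some(GCM_MAX_PLAINTEXT_BYTES)
    );
    assert_eq!(checked_gcm_total(GCM_MAX_PLAINTEXT_BYTES, 1), None);
}
```

Returning Option rather than panicking mirrors the fallible shape described for the single-shot and streaming encrypt paths.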

v1.2 scope — C FFI for SM2 key exchange

Completes the core-in-vN / FFI-in-vN+1 cadence for v1.1: the GM/T 0003.3 key exchange is now reachable from C / C++ / Python / Go / Zig through gmcrypto-c (per docs/v1.2-scope.md Q2.1–Q2.10). 9 new symbols, 2 opaque handle types, 1 new const (GMCRYPTO_SM2_KX_CONFIRM_SIZE = 32) — 72 FFI entry points total, always-on per the v0.23 posture (the committed gmcrypto.h == a default build; gmcrypto-core's own sm2-key-exchange feature stays opt-in for Rust callers).

  • Handle shape: the Rust consume-on-transition typestate collapses to two opaque handles. The initiator is born waiting: gmcrypto_sm2_kx_initiator_new samples the ephemeral internally and writes R_A immediately; _confirm verifies S_B, emits K + S_A, and consumes + frees. The responder: _new → _respond (takes R_A, emits R_B + S_B; a failed respond spends the handle, a stray second respond errors without disturbing the in-flight state) → _finish (verifies S_A, releases K, consumes + frees). Misuse ordering collapses to the single GMCRYPTO_ERR.
  • RNG: OS (getrandom::SysRng) by default, plus _with_rng variants taking the v0.5 gmcrypto_rng_callback — which lets the test suite drive fixed standard ephemerals through the ABI.
  • Assurance: the GM/T 0003.5 recommended-curve KAT reproduced byte-for-byte through the C ABI (R_A/R_B/S_B/K/S_A all asserted), FFI↔Rust cross-handshakes in both directions, tamper/misuse/null negative tests (c_smoke 65 → 76); fuzz_c_abi grows a KX op (attacker peer wire bytes, asserted spent-handle semantics) + a valid-transcript seed. No new dudect target (thin shim — core's ct_sm2_key_exchange covers the secret-dependent path; the v0.13/v0.16 precedent). Doc-only example crates/gmcrypto-c/examples/sm2_key_exchange.c (full two-party handshake).
  • The caller owns wiping key_out — the library zeroizes its internal copies only.

v1.1 scope — SM2 key exchange (GM/T 0003.3)

Completes the SM2 family: GM/T 0003.2 sign + 0003.4/.5 encrypt shipped long ago; v1.1 adds the missing third — GM/T 0003.3 ≡ GB/T 32918.3-2016 key agreement with key confirmation — behind the opt-in sm2-key-exchange feature (pure-core, no new dependency; the default-features build is byte-identical).

  • API: two role state-machines: Sm2KxInitiator → produce_ephemeral → confirm → (Sm2SharedKey, S_A), and Sm2KxResponder → respond → finish → Sm2SharedKey. Each step consumes self: an ephemeral cannot be reused, and the key is unreachable before confirmation passes. The agreed key is ZeroizeOnDrop; every failure (off-curve peer R, bad tag, RNG failure, identity U, bad klen/id) collapses to the single Error::Failed.
  • Constant-time posture: ephemeral via the existing fixed-budget masked sampler; t = (d + x̄·r) mod n and the scalar mults branch-free; confirmation tags compared with subtle::ConstantTimeEq only; t, the KDF input, and x_U/y_U wiped after use. New dudect target ct_sm2_key_exchange (10K smoke |tau| ≈ 0.02, gate < 0.20).
  • KAT: byte-identical to the GM/T 0003.5-2012 recommended-curve worked example (K, S_A, S_B, all intermediate points) — note the example uses the default ID 1234567812345678 for both parties; see docs/v1.1-sm2kx-kat-sourcing.md.
  • Assurance: new fuzz target fuzz_sm2_kx (adversarial peer R_B/S_B bytes, no-panic invariant); sm2-key-exchange legs across the clippy/deny/MSRV/wasm32/dudect CI matrices.
  • C FFI shipped in v1.2 (the core-in-vN / FFI-in-vN+1 cadence — see the v1.2 scope section above).
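The "each step consumes self" property in the API item above is the usual Rust typestate pattern. The toy below shows only that ownership shape: every type, method body, and constant in it is hypothetical, it performs no real cryptography, and its tag comparison is a plain == rather than the constant-time compare the crate describes.

```rust
/// Hypothetical stand-ins; none of these are gmcrypto-core types.
struct Initiator {
    secret: u8,
}

struct AwaitingConfirm {
    expected_tag: u8,
    key: u8,
}

struct SharedKey(u8);

impl Initiator {
    fn new(secret: u8) -> Self {
        Self { secret }
    }

    /// Takes `self` by value, so the fresh state cannot produce a second
    /// ephemeral: the compiler rejects any later use of the moved value.
    fn produce_ephemeral(self) -> (AwaitingConfirm, u8) {
        let r_a = self.secret.wrapping_mul(3); // toy "ephemeral", not an EC point
        let next = AwaitingConfirm {
            expected_tag: r_a ^ 0x5a,
            key: self.secret ^ 0xa5,
        };
        (next, r_a)
    }
}

impl AwaitingConfirm {
    /// Takes `self` by value: the key is reachable only through this call,
    /// and only when the peer tag matches.
    fn confirm(self, peer_tag: u8) -> Option<SharedKey> {
        if peer_tag == self.expected_tag {
            Some(SharedKey(self.key))
        } else {
            None
        }
    }
}

fn main() {
    let (waiting, r_a) = Initiator::new(7).produce_ephemeral();
    let key = waiting.confirm(r_a ^ 0x5a).expect("matching tag");
    assert_eq!(key.0, 7 ^ 0xa5);
    // `waiting` has been moved; calling `waiting.confirm(..)` again
    // would be a compile-time error, not a runtime check.
}
```

Because the intermediate state is moved on every transition, misuse such as confirming twice is ruled out by the type system on the Rust side; the v1.2 C ABI has to enforce the same ordering at runtime through handle state.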

v1.0.1 scope (shipped)

Readiness-cleanup patch — the first post-1.0 publish. v1.0.1 ships the GO-WITH-FOLLOWUP cleanup from a release-readiness synthesis of the prior audits (docs/audits/2026-06-02-release-readiness-synthesis.md): 0 blockers, all non-blocking polish.

  • Functional fix (the one behavior change): the gmcrypto-c C ABI gmcrypto_version() returned a hardcoded "0.4.0" regardless of the built version — it now reports the real CARGO_PKG_VERSION (so a C caller linking 1.0.1 reads "1.0.1"). This is the single reason 1.0.1 is a crates.io release rather than a docs-only update.
  • Doc improvements: raw-block "not a cipher mode" ECB warnings on Sm4Cipher::{encrypt,decrypt}_block and the corresponding block FFI; cbindgen header pointer/length preconditions; FFI notes on the fallible RNG path and the XTS start_sector range; pre-1.0-stability caveats on the digest-traits / cipher-traits trait impls; and SECURITY.md / README.md / deny.toml corrections.
  • CI-health fixes: sm4-xts added to the MSRV / wasm32 / cargo deny passes; the dudect path-allowlist gained gmcrypto-simd/src/**; cargo generate-lockfile runs before cargo deny; a new simd-x86 job (cargo test -p gmcrypto-simd on ubuntu-latest) that immediately caught a real latent bug — the x86-only SIMD test files lacked #![allow(unsafe_code)], so they had never compiled under CI's -D warnings (fixed); and the pull_request paths-ignore was removed from ci.yml so docs-only PRs are no longer permanently blocked by branch-protection required checks.

No API or ABI change; runtime crypto wire output is byte-identical to 1.0.0; cargo-semver-checks runs enforced as the patch-non-breaking gate. 6 merged PRs (#87–#92). Consumers move 1.0.0 → 1.0.1 with a plain cargo update.

v0.16 scope (shipped)

C FFI for the SM4-XTS multi-sector helper. v0.16 exposes the v0.15 sm4::mode_xts::{encrypt_sectors, decrypt_sectors} through the gmcrypto-c C ABI (behind the existing forwarding sm4-xts feature): two new symbols gmcrypto_sm4_xts_encrypt_sectors / gmcrypto_sm4_xts_decrypt_sectors that transform a contiguous run of equal-size sectors in place (buf: *mut u8 + buf_len), deriving sector i's tweak as little-endian-128(start_sector + i) — start_sector is a uint64_t LBA. Unlike the single-shot XTS FFI (uniformly out-of-place), these are in-place — mirroring the core's &mut [u8] API so disk callers never double-allocate. Byte-identical to the core helper; single GMCRYPTO_ERR with buf untouched on error; confidentiality only (no auth). The deferred FFI half of v0.15, on the established core-in-vN / FFI-in-vN+1 cadence — every cipher mode is now FFI-complete. No new dependency, no new feature flag, no new gmcrypto-core API, no new dudect target. Design rationale: docs/v0.16-scope.md (Q16.1–Q16.12).

v0.20 scope (infra-assurance, not a crates.io release) — streaming-decryptor differential fuzzing + coverage

Two new differential fuzz targets + cargo fuzz coverage + a codified v1.0 constant-time baseline. fuzz_sm4_cbc_streaming_decrypt and fuzz_sm4_gcm_streaming_decrypt feed the ciphertext to the streaming decryptors (Sm4CbcDecryptor / Sm4GcmDecryptor) in arbitrary chunk boundaries and assert the result is byte-identical to the single-shot mode_{cbc,gcm}::decrypt oracle — a differential invariant (catches the CBC buffer-back-by-one PKCS#7 boundary and the GCM commit-on-verify GHASH accumulator), stronger than v0.14's no-panic property. The nightly fuzz sweep grows to 18 targets (initial sweep: zero crashes, zero divergences) and gains a non-gating cargo fuzz coverage job that renders per-target llvm-cov TOTALS over the committed seed corpus and uploads them (the report is the deliverable, not a coverage-% gate). v0.20 also codifies the settled v1.0 constant-time baseline in SECURITY.md: composite dudect targets stay gated |tau| < 0.20; the two single-inversion micro-diagnostics remain telemetry + a |tau| ≥ 0.55 sentinel (the v0.19 falsification is the evidence), with a narrow revisit door (a class-split-twin without the inversion op, or offline/dedicated hardware — never PR-executing public self-hosted CI). The theme was chosen after a Codex + Grok strategy discussion (one more assurance cycle that feeds v1.0 readiness, over a third dudect cycle or new features). A repository / infra-assurance milestone — only the workspace-excluded fuzz/ crate + fuzz-nightly.yml + docs change (workspace stays 0.16.0; crates.io skips 0.20.0 per the v0.14/v0.17/v0.18/v0.19 precedent). Design + result: docs/v0.20-scope.md (Q20.1–Q20.5). Next: v0.21 = the v1.0 readiness audit, with v0.20's harnesses + coverage as input evidence.

v0.19 scope (infra-assurance, not a crates.io release) — relative gate tested and falsified

Self-calibrating relative dudect gate — TESTED and FALSIFIED → honest fallback. v0.19 set out to re-promote the two direct-invert diagnostics (ct_fn_invert / ct_fp_invert) off the v0.18 telemetry/sentinel posture by adding two fix-vs-fix noise-floor probes (noise_floor_fn_invert / noise_floor_fp_invert — each runs the same Fn/Fp inversion as its suspect but feeds both dudect classes one identical input, so its |tau| is pure measurement noise) and gating each target relatively: median(target) ≤ max(0.20, 4·median(probe)) — a threshold that adapts to the runner's own noise floor.

The 100K calibration on main falsified the matched-sensitivity premise: the probes stay uniformly quiet (~0.005) while the real class-split targets spike intermittently into [0.26–0.32] (ct_fp_invert reached a median of 0.2606 on the sm4-bitsliced-simd leg, ratio 50). The runner noise lives in the two-input class-split difference (z_small vs z_large), not the operation duration a same-input probe can observe — so the probe cannot track it and the relative threshold just pins at the 0.20 the noise already breaks. Per the pre-committed honest-fallback path, the relative gate is demoted to non-blocking telemetry, the two targets revert to telemetry (PR) / gross-regression sentinel @0.55 (nightly), and the probes are kept as telemetry — they are the evidence that the noise is class-split-specific, the input to a v0.20 class-split-aware "noise-twin" reference. A repository / infra-assurance milestone — the only crate change is the dev-only bench harness (published library byte-unchanged; workspace stays 0.16.0; crates.io skips 0.19.0 per the v0.14 / v0.17 / v0.18 precedent). Design + result: docs/v0.19-scope.md (Q19.1–Q19.7) + docs/v0.5-dudect-recalibration.md (v0.19 resolution).
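To make the falsification concrete, here is a minimal sketch of the relative rule median(target) ≤ max(0.20, 4·median(probe)) as stated above. The function names are hypothetical and the sample readings are invented for illustration, apart from the ~0.005 probe level and the 0.2606 target median quoted in the text.

```rust
/// Median of a non-empty slice of |tau| readings.
fn median(xs: &[f64]) -> f64 {
    let mut v = xs.to_vec();
    v.sort_by(f64::total_cmp);
    let mid = v.len() / 2;
    if v.len() % 2 == 1 {
        v[mid]
    } else {
        (v[mid - 1] + v[mid]) / 2.0
    }
}

/// Relative threshold: four times the probe's noise floor, never below 0.20.
fn relative_threshold(probe: &[f64]) -> f64 {
    (4.0 * median(probe)).max(0.20)
}

/// The v0.19 relative gate as described in the text.
fn passes_relative_gate(target: &[f64], probe: &[f64]) -> bool {
    median(target) <= relative_threshold(probe)
}

fn main() {
    // A quiet probe (~0.005) leaves the threshold pinned at the 0.20 floor...
    let probe = [0.004, 0.005, 0.006];
    assert_eq!(relative_threshold(&probe), 0.20);

    // ...so a target whose median spikes to 0.2606 still fails the gate.
    let target = [0.03, 0.2606, 0.31];
    assert!(!passes_relative_gate(&target, &probe));
}
```

The threshold only rises when the probe itself gets noisy. Since the probes stayed quiet while the class-split targets spiked, the adaptive term never engaged, which is the failure mode the calibration exposed.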

Deferred to v0.20+: a class-split-aware "noise-twin" dudect reference (the v0.19 successor that could finally re-promote the invert diagnostics); round-trip / differential + streaming-decryptor parser fuzzing; RustCrypto aead trait fit (still 0.6.0-rc.10); cargo fuzz coverage; AVX-512 sbox_x64; CCM buffered input; a v1.0 readiness pass.

v0.18 scope (shipped — infra-assurance, not a crates.io release)

dudect-gate hardening. v0.18 pins the dudect CI workflows' drift axes (ubuntu-24.04 OS-label + exact dtolnay/rust-toolchain@1.95.0) and gates on a CI-level multi-run median |tau| (PR 3 runs / nightly 5 runs; the required_low gates + the nightly gross-regression sentinel use the median, negative_control uses the min, and any required target not measured on every run fails). The bench harness timing_leaks.rs is byte-unchanged — the loop and median live entirely in CI. A 100K×5 calibration measured the ct_fn_invert/ct_fp_invert diagnostics back near their ~0.006 baseline, but they were kept on the telemetry / sentinel posture (not re-promoted): the noise that demoted them is runner-image-sensitive and would re-flake a tight gate if it returns — robustness over a tighter gate. A repository / infra-assurance milestone — no crate code change (workspace stays 0.16.0; crates.io skips 0.18.0 per the v0.14 / v0.17 precedent). Design rationale: docs/v0.18-scope.md (Q18.1–Q18.7) + docs/v0.5-dudect-recalibration.md (v0.18 resolution).
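The multi-run policy above can be sketched in a few lines of Rust. This is an illustration of the stated rules only (median for the low gates, min for negative_control, any missing measurement fails); the Policy type, the gate function, and the 0.5 floor used for the negative control are assumptions, since the actual logic lives in the CI workflow and no negative-control threshold is given here.

```rust
#[derive(Clone, Copy)]
enum Policy {
    /// Median |tau| across all runs must stay below the bound.
    RequiredLow(f64),
    /// Minimum |tau| across all runs must reach the floor, i.e. the
    /// deliberately leaky target has to be detected on every run.
    NegativeControl(f64),
}

/// Median of a non-empty, already sorted slice.
fn median(sorted: &[f64]) -> f64 {
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    }
}

/// `runs[i]` is `None` when the target was not measured on run `i`.
fn gate(policy: Policy, runs: &[Option<f64>]) -> bool {
    // Any missing measurement fails the target outright.
    let Some(mut taus) = runs.iter().copied().collect::<Option<Vec<f64>>>() else {
        return false;
    };
    if taus.is_empty() {
        return false;
    }
    taus.sort_by(f64::total_cmp);
    match policy {
        Policy::RequiredLow(bound) => median(&taus) < bound,
        Policy::NegativeControl(floor) => taus[0] >= floor,
    }
}

fn main() {
    // One noisy run out of three does not fail a median-based gate.
    assert!(gate(Policy::RequiredLow(0.20), &[Some(0.01), Some(0.35), Some(0.02)]));
    // A run that produced no measurement does.
    assert!(!gate(Policy::RequiredLow(0.20), &[Some(0.01), None, Some(0.02)]));
    // The negative control must be detected on every run (hypothetical floor).
    assert!(!gate(Policy::NegativeControl(0.5), &[Some(0.9), Some(0.1), Some(0.95)]));
}
```

Using the median makes a single runner hiccup harmless for the low gates, while using the min for the negative control keeps the "harness can still see a leak" check strict.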

Deferred to v0.19+ (per docs/v0.18-scope.md §5/§6): a self-calibrating relative dudect gate (the change that could safely re-promote the invert diagnostics); round-trip / differential + streaming-decryptor parser fuzzing; RustCrypto aead trait fit (still 0.6.0-rc.10); cargo fuzz coverage; AVX-512 sbox_x64; CCM buffered input; a v1.0 readiness pass.

v0.15 scope (shipped)

SM4-XTS multi-sector (disk) helper. v0.15 adds sm4::mode_xts::{encrypt_sectors, decrypt_sectors} (opt-in sm4-xts feature): encrypt/decrypt a contiguous run of equal-size disk sectors in place (&mut [u8] -> Option<()>), deriving sector i's tweak as the little-endian 128-bit encoding of start_sector + i (the standard disk-XTS data-unit convention). It owns the sector-number → tweak encoding the single-shot v0.12 API left to the caller, and is byte-identical to looping that API per sector. Single None failure mode (buf untouched on validation failure); confidentiality only (no authentication). Pure-core: no new dependency, no new feature flag, no new SIMD, no new dudect target. Design rationale: docs/v0.15-scope.md (Q15.1–Q15.12). The C FFI for the sector helper shipped in v0.16 (above), on the established core-in-vN / FFI-in-vN+1 cadence.
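The sector-number → tweak encoding described above is small enough to show directly. The sketch below derives the tweak for sector i as the little-endian 128-bit encoding of start_sector + i. The function names, the u64 sector type (borrowed from the C ABI's uint64_t LBA), the widening to u128 before the addition, and the validation rule are all assumptions for illustration, not the crate's code.

```rust
/// Tweak for the sector at `index` within a run starting at `start_sector`.
/// The addition is widened to u128 so it cannot overflow in this sketch;
/// how the library treats the top of the sector range is not stated here.
fn sector_tweak(start_sector: u64, index: u64) -> [u8; 16] {
    (u128::from(start_sector) + u128::from(index)).to_le_bytes()
}

/// Tweaks for every sector of a contiguous buffer, or `None` when the
/// buffer is not a whole number of sectors (hypothetical validation rule).
fn sector_tweaks(buf_len: usize, sector_size: usize, start_sector: u64) -> Option<Vec<[u8; 16]>> {
    if sector_size == 0 || buf_len % sector_size != 0 {
        return None;
    }
    let count = (buf_len / sector_size) as u64;
    Some((0..count).map(|i| sector_tweak(start_sector, i)).collect())
}

fn main() {
    // Sector 0x0102 + 1 = 0x0103, least-significant byte first.
    let t = sector_tweak(0x0102, 1);
    assert_eq!(&t[..2], &[0x03, 0x01]);
    assert!(t[2..].iter().all(|&b| b == 0));

    // Two 512-byte sectors starting at LBA 7 get tweaks 7 and 8.
    let tweaks = sector_tweaks(1024, 512, 7).expect("whole sectors");
    assert_eq!(tweaks.len(), 2);
    assert_eq!(tweaks[0][0], 7);
    assert_eq!(tweaks[1][0], 8);
}
```

Each derived tweak would then be passed to the single-shot XTS call for its sector, which is the per-sector loop the helper is described as being byte-identical to.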

crates.io goes 0.13.0 → 0.15.0: 0.14.0 names the unpublished parser-fuzzing assurance cycle (below) and is intentionally never published.

v0.14 — parser fuzzing (assurance; not a crates.io release)

Pre-v1.0 hardening. v0.14 adds a cargo-fuzz (libFuzzer) harness over the entire untrusted-input decode/decrypt surface of gmcrypto-core — 16 targets covering PEM, PKCS#8 (incl. PBES2 decrypt), SPKI, SEC1, the DER reader primitives, SM2 DER + raw ciphertext, SM2 decrypt + signature-verify, and the SM4-CBC/GCM/CCM/XTS decrypts — proving the failure-mode invariant on adversarial bytes: no panic, no unbounded allocation, no hang. A capped nightly job (.github/workflows/fuzz-nightly.yml) runs them on a schedule.

The initial sweep found zero crashes across all 16 targets, so v0.14 makes no code change to the published crates and is not cut as a crates.io release (publishing byte-identical crypto is release noise) — it lands as an assurance/infra change. The fuzz crate lives in a workspace-excluded fuzz/ (nightly-only; never enters the published dependency graph). Design rationale: docs/v0.14-scope.md. Run it yourself: fuzz/README.md.

Deferred to v0.15+ (per docs/v0.14-scope.md §5/§6): the SM4-XTS per-sector helper (shipped in v0.15, above); round-trip / differential parser fuzzing, streaming-decryptor fuzzing, RustCrypto aead trait fit (still 0.6.0-rc.10), pinned dudect runner, cargo fuzz coverage in CI, AVX-512 sbox_x64, a v1.0 readiness pass (now v0.16+).

v0.13 scope (shipped)

C ABI for SM4-XTS. v0.13 exposes the v0.12 sm4::mode_xts core through the gmcrypto-c C ABI (gmcrypto_sm4_xts_encrypt / _decrypt) behind a new forwarding sm4-xts feature — the deferred FFI half of v0.12, on the established core-then-FFI cadence (SM4-GCM/CCM core in v0.8 → FFI in v0.10). Design rationale: docs/v0.13-scope.md.

  • Additive only — no public API breakage, no new dependency. The default build of both crates is byte-unchanged; sm4-xts forwards to the pure-core gmcrypto-core/sm4-xts.
  • Single-shot, mirroring the single-shot SM4-GCM FFI shape minus nonce/AAD/tag: 32-byte key (Key1 ‖ Key2), 16-byte tweak, length-preserving output via the (out, out_capacity, out_actual_len) convention. Byte-identical to gmcrypto_core::sm4::mode_xts. New GMCRYPTO_SM4_XTS_KEY_SIZE header constant; single GMCRYPTO_ERR failure mode. Confidentiality only.
  • Doc-only C example crates/gmcrypto-c/examples/sm4_xts_sector.c; 5 new c_smoke Rust-equivalence tests. No new gmcrypto-core API, no new dudect target (the FFI is a thin shim over the v0.12 core path).

Followed by v0.14 (per docs/v0.13-scope.md §5/§6): parser fuzzing — the recommended pre-v1.0 assurance gate — landed as the v0.14 assurance cycle above. RustCrypto aead trait fit (still 0.6.0-rc.10), pinned/noise-isolated dudect runner, and AVX-512 sbox_x64 remain deferred.

v0.12 scope (shipped)

SM4-XTS — tweakable mode for disk/sector encryption. v0.12 adds sm4::mode_xts behind the new opt-in sm4-xts feature: single-shot, full ciphertext stealing, GB/T 17964-2021 (GM-T OID 1.2.156.10197.1.104.10), byte-identical to OpenSSL 3.x EVP SM4-XTS (xts_standard=GB). Design rationale: docs/v0.12-scope.md; KAT sourcing: docs/v0.12-xts-kat-sourcing.md.

  • Default-features users are unaffected — additive, opt-in, no new dependency (the XTS tweak doubling is a trivial bit-reflected multiply-by-x, not GHASH, so no gmcrypto-simd dep).
  • GB/T 17964, not IEEE 1619 — the two standards differ in the GF(2¹²⁸) tweak-doubling convention (GB is the bit-reflected / GHASH-style one), so they produce different ciphertext for multi-block / non-aligned data. v0.12 targets GB (the SM4 national standard + OpenSSL's default for SM4-XTS).
  • Confidentiality only — no authentication. XTS has no tag; callers needing integrity use an AEAD mode (GCM/CCM). The per-data-unit tweak-uniqueness contract is the caller's responsibility.
  • 32-byte key (Key1 ‖ Key2) + raw 16-byte tweak; lengths [16 B, 16 MiB]; single None failure mode. New dudect target ct_sm4_xts_decrypt. The whole-block bulk rides the Sm4Cipher::encrypt_blocks batch API (picks up the SIMD fanout under sm4-bitsliced-simd).

Deferred to v0.13 (per docs/v0.12-scope.md §5/§6): C FFI for SM4-XTS, RustCrypto aead trait fit, pinned/noise-isolated dudect runner, AVX-512 sbox_x64, CCM incremental input.

v0.11 scope (shipped)

RustCrypto trait-fit modernization. v0.11 migrates the opt-in digest-traits / cipher-traits impls from digest 0.10 / cipher 0.4 to digest 0.11 / cipher 0.5 (the crypto-common 0.2 / hybrid-array generation), in-place. Design rationale: docs/v0.11-scope.md.

  • Default-features users are unaffected — the trait fit is opt-in; generic-array / hybrid-array never enter the default dep graph, and every SM2 / SM3 / SM4 / HMAC / AEAD output is byte-identical (validated against the full KAT suite + gmssl 3.1.1 interop).
  • BREAKING for trait-fit consumers only: code enabling digest-traits / cipher-traits must bump its own digest / cipher deps to 0.11 / 0.5. HMAC construction via the Mac trait moves to digest::KeyInit::new_from_slice (digest 0.11's Mac dropped KeyInit); the cipher block traits were renamed BlockEncrypt / BlockDecrypt → BlockCipherEncrypt / BlockCipherDecrypt.
  • MSRV stays 1.85. The RustCrypto aead 0.6 trait fit remains deferred (still 0.6.0-rc.10); v0.11 lands the crypto-common 0.2 line it will need.

Deferred to v0.12 (per docs/v0.11-scope.md §5/§6): RustCrypto aead trait fit, pinned/noise-isolated dudect runner, AVX-512 sbox_x64, SM4-XTS, CCM incremental input, Argon2-with-SM3.

v0.10 scope (shipped)

Streaming AEAD FFI. v0.10 exposes the v0.9 incremental-input buffered SM4-GCM encryptor/decryptor through the gmcrypto-c C ABI — the item v0.9 deferred (Q9.6) now that the Rust streaming API is proven. Additive behind the existing sm4-aead feature. Design rationale: docs/v0.10-scope.md.

  • 9 streaming AEAD C FFI symbols + 2 opaque handle types: gmcrypto_sm4_gcm_encryptor_t (output-streaming: new / update → ciphertext per chunk / finalize + finalize_with_tag_len → tag / free) and gmcrypto_sm4_gcm_decryptor_t (commit-on-verify: new / update buffers and emits nothing / finalize_verify releases plaintext only after the constant-time tag check / free). _finalize* consume+free the handle; single GMCRYPTO_ERR on every failure (no tag-/length-oracle across the boundary). Mirrors the v0.5 CBC-streaming lifecycle. C example: examples/sm4_gcm_streaming.c.

No public API breakage — purely additive. v0.9.0 callers can cargo update to v0.10.0 without migration. No new gmcrypto-core API; no new dudect target (the FFI is a thin wrapper over the v0.9 ct_sm4_gcm_decrypt_buffered-gated path).

Deferred to v0.11 (per docs/v0.10-scope.md §5/§6): streaming/incremental CCM, RustCrypto aead trait fit (upstream still 0.6.0-rc.10), pinned dudect runner, AVX-512 sbox_x64, SM4-XTS, Argon2-with-SM3.

v0.9 scope (shipped)

AEAD ergonomics. v0.9 extends the v0.8 AEAD core with the three items v0.8 deferred: GCM tag-length parameterization, incremental-input buffered SM4-GCM, and single-shot AEAD C FFI. All additive behind the existing sm4-aead flag. Design rationale: docs/v0.9-scope.md.

  • sm4::GcmTagLen + mode_gcm::encrypt_with_tag_len / decrypt_with_tag_len — W1. Caller-chosen GCM tag length per NIST SP 800-38D §5.2.1.2 ({4, 8, 12, 13, 14, 15, 16} bytes; truncated tag = MSB_t(full_tag)). GcmTagLen::new(usize) -> Option<Self> centralizes the valid-length policy. The fixed-16-byte encrypt / decrypt are unchanged.
  • sm4::Sm4GcmEncryptor / Sm4GcmDecryptor (W2). Incremental-input buffered SM4-GCM (deliberately NOT "streaming"). The encryptor is output-streaming: update(chunk) -> Option<Vec<u8>> emits each chunk's ciphertext (None once the cumulative plaintext would exceed the NIST §5.2.1.1 ceiling 2^36 − 32 bytes); finalize() / finalize_with_tag_len() emit the tag. The decryptor is input-incremental but output-BUFFERED: update(chunk) buffers ciphertext + folds GHASH, and finalize_verify(tag) -> Option<Vec<u8>> releases the plaintext only after the constant-time tag check (commit-on-verify: never leaks pre-verify bytes). AAD is supplied at construction. Driven with any chunking, both reproduce the single-shot path byte-for-byte.
  • 6 single-shot AEAD C FFI entry points (W4). gmcrypto_sm4_gcm_encrypt / _decrypt / _encrypt_with_tag_len / _decrypt_with_tag_len + gmcrypto_sm4_ccm_encrypt / _decrypt, behind a new forwarding sm4-aead feature on gmcrypto-c. Every error path returns GMCRYPTO_ERR (single failure code). Streaming AEAD FFI is deferred to v0.10.
  • New dudect target ct_sm4_gcm_decrypt_buffered — W3. Class-split by master key, drives Sm4GcmDecryptor; |tau| < 0.20 (5K-sample smoke |τ| ≈ 0.029). No new CI matrix slot — rides the existing sm4-aead entries.
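The tag-length policy in the first item of this list ({4, 8, 12, 13, 14, 15, 16} bytes, truncated tag = MSB_t(full_tag)) can be illustrated with a standalone sketch. TagLen and truncate_tag below are hypothetical stand-ins that mimic the described GcmTagLen::new(usize) -> Option<Self> shape; they are not the crate's types.

```rust
/// Hypothetical stand-in for a validated GCM tag length in bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TagLen(usize);

impl TagLen {
    /// Accepts only the byte lengths NIST SP 800-38D allows:
    /// 4, 8, and 12 through 16.
    fn new(len: usize) -> Option<Self> {
        match len {
            4 | 8 | 12..=16 => Some(Self(len)),
            _ => None,
        }
    }
}

/// MSB_t(full_tag): keep the leading `t` bytes of the 16-byte tag.
fn truncate_tag(full_tag: &[u8; 16], len: TagLen) -> &[u8] {
    &full_tag[..len.0]
}

fn main() {
    let full: [u8; 16] = core::array::from_fn(|i| i as u8);

    assert!(TagLen::new(13).is_some());
    assert!(TagLen::new(5).is_none()); // not in the allowed set

    let short = truncate_tag(&full, TagLen::new(12).expect("12 is allowed"));
    assert_eq!(short, &full[..12]);
}
```

Putting the validity check in the constructor means a length that reaches the encrypt or decrypt path has already been vetted, which matches the "centralizes the valid-length policy" wording above.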

No public API breakage — purely additive. v0.8.0 callers can cargo update to v0.9.0 without migration; sm4-aead is opt-in.

Deferred to v0.10 (per docs/v0.9-scope.md §5/§6): CCM incremental input, streaming AEAD FFI, RustCrypto aead trait fit (upstream still on 0.6.0-rc), pinned dudect runner, AVX-512 sbox_x64, SM4-XTS, Argon2-with-SM3.

v0.8 scope (shipped)

The AEAD core. v0.8 cashed in the cipher-mode surface that v0.7 opened up: SM4-GCM and SM4-CCM single-shot, plus a constant-time GHASH primitive in gmcrypto-simd.

  • sm4::mode_gcm::encrypt / decrypt (W2). Single-shot SM4-GCM per NIST SP 800-38D / GM/T 0009 / RFC 8998. encrypt(key, nonce, aad, pt) -> (Vec<u8>, [u8; 16]) returns (ciphertext, tag). decrypt(key, nonce, aad, ct, tag) -> Option<Vec<u8>> returns Some(plaintext) only when the tag verifies (constant-time compare via subtle::ConstantTimeEq). Both 12-byte canonical and arbitrary-length nonce paths supported. Tag length fixed at 128 bits in v0.8 (parameterized in v0.9 via GcmTagLen). Byte-identical to gmssl 3.1.1 sm4 -gcm, with bidirectional interop validated.
  • sm4::mode_ccm::encrypt / decrypt — W3. Single-shot SM4-CCM per NIST SP 800-38C / RFC 3610 / GM/T 0009 (OID 1.2.156.10197.1.104.9). encrypt(key, nonce, aad, pt, tag_len) -> Option<Vec<u8>> (output: ciphertext ‖ tag). tag_len ∈ {4, 6, 8, 10, 12, 14, 16} per spec, validated at API entry. nonce.len() ∈ [7, 13]. Pure-Rust CBC-MAC + CTR over the existing Sm4Cipher path — no GHASH. Byte-identical to OpenSSL 3.x EVP SM4-CCM across 8 KAT scenarios (gmssl 3.1.1 doesn't ship sm4 -ccm so the CCM reference oracle comes from OpenSSL; see docs/v0.8-ccm-kat-sourcing.md).
  • gmcrypto_simd::ghash::ghash_mul(h, x) -> [u8; 16] — W1. Constant-time GHASH multiplication over GF(2^128) / (x^128 + x^7 + x^2 + x + 1). Single dispatch entry point:
    • ghash_mul_clmul on x86_64 (PCLMULQDQ + SSE2; runtime cpufeatures detect; Intel Westmere+ / AMD Bulldozer+).
    • ghash_mul_pmull on aarch64 (ARMv8.0 AES extension vmull_p64; runtime cpufeatures detect; Apple Silicon / most modern ARM chips).
    • ghash_mul_software (bit-serial mask-XOR; constant-time over both inputs; available everywhere as fallback).
  • New sm4-aead feature flag — default-off; opt-in. sm4-aead = ["dep:gmcrypto-simd"] activates mode_gcm and mode_ccm. Additive on the default-features build.
  • New dudect targets ct_sm4_gcm_decrypt + ct_sm4_ccm_decrypt — W4. Class-split by master key over a fixed 256-byte plaintext + 16-byte AAD. Both classes' (ct, tag) pairs are valid encrypts under their own keys, so both decrypt paths reach the tag-compare via identical control flow. Same |tau| < 0.20 gate as the rest of the SM4 surface; new CI matrix slot sm4-bitsliced-simd,sm4-aead exercises the most-demanding cipher-stack combination.
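
For readers unfamiliar with the bit-serial mask-XOR shape mentioned for the software GHASH fallback, here is a minimal sketch following the multiplication algorithm in NIST SP 800-38D. It is an illustration only, not the code in gmcrypto-simd: the function name ghash_mul_bitserial is hypothetical, and a compiler is free to lower the masks differently, so this shows the data flow rather than a verified constant-time guarantee.

```rust
/// Reduction constant R = 11100001 || 0^120 from NIST SP 800-38D.
const R: u128 = 0xE1 << 120;

/// Hypothetical sketch: multiply two GF(2^128) elements in GCM bit order.
/// Every iteration does the same work; secret bits only feed AND masks.
pub fn ghash_mul_bitserial(h: &[u8; 16], x: &[u8; 16]) -> [u8; 16] {
    let x = u128::from_be_bytes(*x);
    let mut v = u128::from_be_bytes(*h);
    let mut z: u128 = 0;
    for i in 0..128 {
        // Bit i of x, counted from the most significant bit of byte 0.
        let bit = (x >> (127 - i)) & 1;
        // All-ones mask when the bit is set, zero otherwise (no branch).
        let take = 0u128.wrapping_sub(bit);
        z ^= v & take;
        // Multiply v by the field element "x": shift right, then fold in R
        // when the bit shifted out was set.
        let carry = 0u128.wrapping_sub(v & 1);
        v = (v >> 1) ^ (R & carry);
    }
    z.to_be_bytes()
}
```

In this bit order the multiplicative identity is the block whose first bit is set (0x80 followed by fifteen zero bytes), which makes a quick sanity check for any backend: multiplying by it must return the other operand unchanged.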

No public API breakage — purely additive. v0.7.0 callers can cargo update to v0.8.0 without migration; sm4-aead is opt-in.

Everything v0.4 shipped (wasm32-unknown-unknown build, RustCrypto trait fit behind digest-traits / cipher-traits, bitsliced SM4 S-box behind sm4-bitsliced, gmcrypto-c C ABI crate) is unchanged — see the Roadmap row for the compact reference and CHANGELOG.md [0.4.0] for detail.

Everything v0.3 shipped is unchanged:

  • Reusable strict-canonical DER reader / writer subset (gmcrypto_core::asn1::{reader, writer, oid}).
  • PEM + encrypted PKCS#8 + X.509 SPKI + SEC1 codecs (gmcrypto_core::{pem, pkcs8, spki, sec1}).
  • Full bidirectional gmssl 3.1.1 interop (SM2 sign / verify, SM2 encrypt / decrypt, SM4-CBC). Gated on GMCRYPTO_GMSSL=1.
  • Raw byte-concat SM2 ciphertext helpers (gmcrypto_core::sm2::raw_ciphertext): C1 || C3 || C2 emit + decode; legacy C1 || C2 || C3 decrypt-only.
  • Streaming HmacSm3 + Sm4Cbc{En,De}cryptor. In-crate Hash / Mac / BlockCipher traits (gmcrypto_core::traits).
  • Comb-table mul_g (~5× sign-side speedup). 64 sub-tables of 16 entries each, lazily built once per process via spin::Once.
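
To make the comb-table layout concrete, the sketch below applies the same idea (one 16-entry sub-table per 4-bit window, built lazily once, read with a linear scan) to a toy group. Everything here is an assumption for illustration: the group is integers modulo 2^61 − 1 under multiplication rather than the SM2 curve, the scalar is 64 bits (16 windows instead of 64), std::sync::OnceLock stands in for spin::Once, and all names are hypothetical.

```rust
use std::sync::OnceLock;

const P: u64 = (1 << 61) - 1; // toy prime modulus, NOT the SM2 field
const G: u64 = 3; // toy fixed base standing in for the generator
const WINDOWS: usize = 16; // 64-bit toy scalar split into 4-bit windows

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// table[i][j] = G^(j * 16^i) mod P.
fn build_table() -> [[u64; 16]; WINDOWS] {
    let mut table = [[1u64; 16]; WINDOWS];
    let mut base = G; // G^(16^i) for the current window
    for sub in table.iter_mut() {
        for j in 1..16 {
            sub[j] = mul(sub[j - 1], base);
        }
        base = mul(sub[15], base); // advance to G^(16^(i+1))
    }
    table
}

/// Built on first use, then shared for the rest of the process.
fn table() -> &'static [[u64; 16]; WINDOWS] {
    static TABLE: OnceLock<[[u64; 16]; WINDOWS]> = OnceLock::new();
    TABLE.get_or_init(build_table)
}

/// Linear scan: touch all 16 entries and keep the one whose index matches.
fn select(sub: &[u64; 16], idx: u64) -> u64 {
    let mut out = 0u64;
    for (j, &entry) in sub.iter().enumerate() {
        let diff = (j as u64) ^ idx;
        // All ones when diff == 0, zero otherwise.
        let mask = ((diff | diff.wrapping_neg()) >> 63).wrapping_sub(1);
        out |= entry & mask;
    }
    out
}

/// G^k via one table lookup and one group operation per window.
pub fn comb_pow(k: u64) -> u64 {
    let mut acc = 1u64;
    for (i, sub) in table().iter().enumerate() {
        let nibble = (k >> (4 * i)) & 0xF;
        acc = mul(acc, select(sub, nibble));
    }
    acc
}

/// Reference square-and-multiply, used only to cross-check comb_pow.
pub fn naive_pow(mut k: u64) -> u64 {
    let (mut acc, mut base) = (1u64, G);
    while k != 0 {
        if k & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        k >>= 1;
    }
    acc
}
```

The comb path performs no squarings (doublings, in curve terms) at evaluation time, which is where the speedup in a fixed-base multiplication comes from; the price is the one-time table build and the memory for the sub-tables.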

Everything v0.2 shipped is unchanged:

  • SM3 hash function (#![no_std] + alloc).
  • SM2 sign / verify with custom signer ID (default 1234567812345678 per GM/T 0009).
  • SM2 public-key encrypt / decrypt with GM/T 0009-2012 ciphertext DER (SEQUENCE { x, y, hash, ciphertext }). Invalid-curve attack defense via on-curve check on C1 before scalar mult; non-branching KDF-zero detection so a chosen-ciphertext attacker cannot distinguish it from a normal MAC failure.
  • SM4 block cipher (GB/T 32907-2016) and SM4-CBC (PKCS#7 padding, caller-supplied unpredictable IV per NIST SP 800-38A Appendix C). Constant-time-designed subtle linear-scan S-box (~1-2M blocks/s); opt-in bitsliced (table-less, gate-only) S-box via the sm4-bitsliced feature (v0.4 W3). PKCS#7 strip uses a constant-time scan over the final block; decrypt collapses every failure mode to a single None against padding-oracle attacks.
  • HMAC-SM3 per RFC 2104, gmssl-cross-validated KAT vectors. Hash-first long-key path. v0.3 adds the streaming HmacSm3 shape alongside single-shot hmac_sm3.
  • PBKDF2-HMAC-SM3 per RFC 8018 §5.2. Caller-supplied output buffer (no internal allocation, no iteration-count default).
  • Constant-time-designed Fp and Fn field arithmetic via crypto-bigint = 0.7.3.
  • Renes-Costello-Batina complete addition formulas for the SM2 curve (a=-3 specialized).
  • Fixed-base (v0.3 comb-table) and variable-base scalar multiplication, both constant-time-designed with subtle::ConditionallySelectable linear-scan table lookup.
  • Fixed-K masked-select signing retry: the retry loop runs K=2 iterations unconditionally, regardless of which iteration produced a valid signature. The constant-time contract holds for any RNG that respects CryptoRng; pathological RNGs cannot leak the secret via observable retry count.
  • Strict canonical ASN.1 DER for SEQUENCE { r, s } (signatures), the GM/T 0009 SM2 ciphertext SEQUENCE, and all v0.3 PEM / PKCS#8 / SPKI / SEC1 wire formats. Rejects non-canonical leading-zero padding, sign-bit-set first bytes, empty content, and (for ciphertext coordinates) values ≥ p.
  • KAT vectors from GB/T 32905-2016 (SM3), GB/T 32918.2-2017 / .5-2017 (SM2), GB/T 32907-2016 Appendix A.1 (SM4 single-block + 1M-round), GM/T 0042-2015 (HMAC-SM3), GM/T 0091-2020 (PBKDF2-HMAC-SM3).
  • gmssl CLI cross-validation for HMAC-SM3, PBKDF2-HMAC-SM3, and (new in v0.3) SM2 sign/verify, SM2 encrypt/decrypt, and SM4-CBC in both directions. Gated on GMCRYPTO_GMSSL=1.
  • dudect-bencher harness — 19 real ct_* targets (12 always-on + 2 cfg-gated under sm4-bitsliced-simd + 3 cfg-gated under sm4-aead + 1 cfg-gated under sm4-xts + 1 cfg-gated under sm2-key-exchange) plus a deliberately-leaky negative_control that proves the harness can detect leaks. Matrix-run under features=default, sm4-bitsliced, sm4-bitsliced-simd, and sm4-bitsliced-simd,sm4-aead,sm4-xts,sm2-key-exchange — PR-smoke 10⁴ samples; nightly 10⁵ samples (more samples = tighter empirical confidence at the same threshold). Most real targets gate at |tau| < 0.20; per-target policy in SECURITY.md.
  • Failure-mode invariant: every Result-returning public API uses the workspace-wide gmcrypto_core::Error (single Failed variant, #[non_exhaustive]); per-module aliases sm2::Error, pem::Error, pkcs8::Error all point at the same type. verify_with_id returns bool; DER decode returns Option. Defense against padding-oracle, malleability, and invalid-curve attacks.
  • Zeroization on private keys, SM4 round keys, HMAC K' / K' XOR ipad / K' XOR opad, PBKDF2 intermediates, SM2 KDF buffers, and PKCS#8 inner-key scratch.
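
The constant-time PKCS#7 strip in the SM4-CBC item above can be pictured with the following sketch. It is a standalone illustration using hand-rolled masks instead of the subtle crate, the names ct_lt and pkcs7_pad_len are hypothetical, and it is not the crate's implementation; the point is that every byte of the final block is inspected regardless of the padding value and that every malformed case collapses to the same None.

```rust
/// Returns 0xFF when a < b and 0x00 otherwise, without branching on the inputs.
fn ct_lt(a: u8, b: u8) -> u8 {
    ((a as u16).wrapping_sub(b as u16) >> 8) as u8
}

/// Hypothetical sketch: validate PKCS#7 padding on the final 16-byte block.
/// Returns the padding length on success and a single None for any failure.
pub fn pkcs7_pad_len(last_block: &[u8; 16]) -> Option<usize> {
    let pad = last_block[15];
    // Out-of-range padding values (0 or greater than 16) mark the block bad.
    let mut bad = ct_lt(pad, 1) | ct_lt(16, pad);
    for (i, &byte) in last_block.iter().enumerate() {
        // 0xFF when position i lies inside the claimed padding run.
        let in_pad = ct_lt(15 - i as u8, pad);
        // Accumulate any mismatch; bytes outside the run are masked out.
        bad |= in_pad & (byte ^ pad);
    }
    // One data-dependent decision, taken only after the full scan.
    if bad == 0 {
        Some(pad as usize)
    } else {
        None
    }
}
```

A caller would truncate the plaintext by the returned length; because the scan always covers all 16 bytes, the time taken does not depend on where the first bad padding byte sits.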

Roadmap

Version Scope
v0.2 (shipped) SM4 + SM4-CBC, HMAC-SM3, PBKDF2-HMAC-SM3, SM2 encrypt/decrypt + GM/T 0009 ciphertext DER, dudect harness expansion to 11 targets. See CHANGELOG.md [0.2.0].
v0.3 (shipped) Reusable ASN.1 reader/writer subset; PEM, encrypted PKCS#8, X.509 SPKI, SEC1; full bidirectional gmssl interop (incl. SM2 sign/verify + SM2 encrypt/decrypt with PEM-wrapped keys + SM4-CBC); raw byte-concat ciphertext helpers (C1||C3||C2 modern + legacy C1||C2||C3 decrypt); streaming HmacSm3 / Sm4CbcEncryptor / Sm4CbcDecryptor + in-crate Hash/Mac/BlockCipher traits; comb-table mul_g (~5× sign-side speedup); dudect harness expanded to 12 targets. See CHANGELOG.md [0.3.0].
v0.4 (shipped) wasm32-unknown-unknown build target; RustCrypto-trait fit (digest::Digest / digest::Mac / cipher::BlockEncrypt/BlockDecrypt) behind opt-in digest-traits / cipher-traits feature flags; bitsliced (table-less, gate-only) SM4 S-box behind the opt-in sm4-bitsliced feature; new gmcrypto-c workspace member exposing the SM2/SM3/SM4/HMAC/PBKDF2 surface as a C ABI (cdylib + staticlib + cbindgen-generated header). See CHANGELOG.md [0.4.0].
v0.5.0 (shipped) C-ABI completeness (streaming CBC + raw-byte SM2 ciphertext + caller-supplied RNG callback); sm4-bitsliced-simd feature-flag scaffolding — v0.5.0 ships no SIMD fast path (the feature transparently delegates to the v0.4 single-block bitslice); BREAKING ergonomic cleanup — workspace-wide gmcrypto_core::Error, Sm2PrivateKey::new(U256) → from_scalar(U256) (gated behind crypto-bigint-scalar) + always-on from_bytes_be(&[u8; 32]) constructor, std feature removed. See CHANGELOG.md [0.5.0].
v0.5.1 (shipped) W4 phase 2 — new sibling crate gmcrypto-simd carrying an AVX2 8-way packed bitsliced SM4 S-box behind opt-in sm4-bitsliced-simd, with runtime CPU detection (cpufeatures) and silent scalar fallback on non-AVX2 hosts. v0.5.1's tau dispatch fed the AVX2 path with 7 wasted lanes; production throughput matched v0.4 single-block bitslice. Dudect calibration update — ct_fn_invert / ct_fp_invert moved to PR-smoke telemetry + 100K nightly gross-regression sentinel after a GH Actions ubuntu-24.04 runner-image shift on 2026-05-12 raised the empirical noise floor; see docs/v0.5-dudect-recalibration.md. See CHANGELOG.md [0.5.1].
v0.6.0 (shipped) W4 milestone close-out — the throughput-win release. W4 phase 3: NEON 4-way bitsliced SM4 on aarch64 (compile-time baseline) + AVX2 32-byte full-width packed S-box (sbox_x32) + Sm4CbcDecryptor::process_chunk SIMD fanout. Per round of the SM4 decrypt, batched blocks' tau inputs pack into one SIMD register (32 bytes on x86_64 / 8-block batch, 16 bytes on aarch64 / 4-block batch) — 32× fewer SIMD dispatches per 8-block batch than v0.5.1. CBC encryption stays single-block (chain-of-blocks defeats SIMD packing). New dudect target ct_sm4_cbc_decrypt_fanout (Q6.7) gates the fanout path at |tau| < 0.20. Exhaustive lane-position-shifted SIMD tests (8192 + 4096 cases) per Q6.8. No public API changes; no breaking changes — additive only. See CHANGELOG.md [0.6.0] and docs/v0.6-scope.md.
v0.7.0 (shipped) Cipher-mode surface expansion. First version where v0.6's SIMD machinery is callable from user code outside the CBC-decrypt internal path. New: public length-flexible Sm4Cipher::encrypt_blocks / decrypt_blocks (W1; Q7.7); single-shot sm4::mode_ctr::encrypt / decrypt (W2; GM/T 0002-2012 §5.4); streaming sm4::ctr_streaming::Sm4CtrCipher (W3); new dudect target ct_sm4_ctr_encrypt (gates |tau| < 0.20 on every cipher path). Plus the v0.8 AEAD scope doc (docs/v0.7-aead-scope.md, Q8.1–Q8.8 sign-off + v0.9 candidate Q-list). No public API breakage — additive only. See CHANGELOG.md [0.7.0].
v0.8.0 (shipped) AEAD core — SM4-GCM + SM4-CCM. Per docs/v0.7-aead-scope.md Q8.1–Q8.8. New: gmcrypto_simd::ghash::ghash_mul constant-time GHASH primitive (CLMUL on x86_64 / PMULL on aarch64 / software Karatsuba fallback; W1); sm4::mode_gcm::encrypt / decrypt byte-identical to gmssl 3.1.1 sm4 -gcm with bidirectional interop (W2); sm4::mode_ccm::encrypt / decrypt byte-identical to OpenSSL 3.x EVP SM4-CCM across 8 KAT scenarios (W3; gmssl 3.1.1 lacks sm4 -ccm so OpenSSL is the oracle — see docs/v0.8-ccm-kat-sourcing.md); new dudect targets ct_sm4_gcm_decrypt + ct_sm4_ccm_decrypt + new CI matrix slot sm4-bitsliced-simd,sm4-aead (W4). Behind opt-in sm4-aead feature flag (additive; default-off). No public API breakage — additive only. See CHANGELOG.md [0.8.0].
v0.9.0 (shipped) AEAD ergonomics. Per docs/v0.9-scope.md Q9.1–Q9.10. New: sm4::GcmTagLen + mode_gcm::encrypt_with_tag_len / decrypt_with_tag_len (NIST SP 800-38D §5.2.1.2 truncated tags; W1); incremental-input buffered sm4::Sm4GcmEncryptor (output-streaming) / Sm4GcmDecryptor (output-buffered, commit-on-verify) — differential-KAT-equal to single-shot across arbitrary chunking (W2); new dudect target ct_sm4_gcm_decrypt_buffered (W3); 6 single-shot AEAD C FFI symbols (gmcrypto_sm4_gcm_* / gmcrypto_sm4_ccm_*) behind a forwarding sm4-aead feature on gmcrypto-c (W4). Behind the existing sm4-aead flag. No public API breakage — additive only. See CHANGELOG.md [0.9.0].
v0.10.0 (shipped) Streaming AEAD FFI — SM4-GCM. Per docs/v0.10-scope.md Q10.1–Q10.11. New: 9 gmcrypto-c FFI symbols + 2 opaque handle types exposing the v0.9 incremental-input buffered SM4-GCM encryptor (output-streaming) / decryptor (commit-on-verify) to C/C++/Go/Zig/Python — gmcrypto_sm4_gcm_encryptor_{new,update,finalize,finalize_with_tag_len,free} + gmcrypto_sm4_gcm_decryptor_{new,update,finalize_verify,free}, behind the existing sm4-aead feature on gmcrypto-c; _finalize* consume+free, single GMCRYPTO_ERR; C example examples/sm4_gcm_streaming.c. regen-header now implies sm4-aead (cbindgen drops cfg-gated opaque structs otherwise). No new gmcrypto-core API; no new dudect target. No public API breakage — additive only. See CHANGELOG.md [0.10.0].
v0.11.0 (shipped) RustCrypto trait-fit modernization. Per docs/v0.11-scope.md Q11.1–Q11.11. Migrates the opt-in digest-traits / cipher-traits impls from digest 0.10 / cipher 0.4 to digest 0.11 / cipher 0.5 (the crypto-common 0.2 / hybrid-array generation), in-place: cipher block backend reshaped to cipher 0.5's separate BlockCipherEncBackend / BlockCipherDecBackend; HMAC construction via digest::KeyInit::new_from_slice (digest 0.11 Mac dropped KeyInit). BREAKING for trait-fit consumers only (bump your own digest/cipher); default-features users unaffected, output byte-identical (full KAT + gmssl interop). MSRV stays 1.85; no new dudect target. See CHANGELOG.md [0.11.0].
v0.12.0 (shipped) SM4-XTS — tweakable disk/sector mode. Per docs/v0.12-scope.md Q12.1–Q12.13. New: sm4::mode_xts::{encrypt, decrypt} + XTS_KEY_SIZE behind the opt-in sm4-xts feature — GB/T 17964-2021 (GM-T OID 1.2.156.10197.1.104.10), full ciphertext stealing, byte-identical to OpenSSL 3.x EVP SM4-XTS (xts_standard=GB; not IEEE 1619 — they differ in the GF(2¹²⁸) tweak doubling). 32-byte key (Key1 ‖ Key2) + raw 16-byte tweak, lengths [16 B, 16 MiB], single None failure mode, confidentiality-only (no auth). Pure-core (no new dependency); rides the Sm4Cipher::encrypt_blocks batch API + SIMD fanout. New dudect target ct_sm4_xts_decrypt. Also fixes a latent CI bug where the feature-conditional dudect gates never fired. C FFI deferred to v0.13. Additive — no public API breakage. See CHANGELOG.md [0.12.0].
v0.13.0 (shipped) C ABI for SM4-XTS. Per docs/v0.13-scope.md Q13.1–Q13.12. New: gmcrypto_sm4_xts_encrypt / _decrypt + GMCRYPTO_SM4_XTS_KEY_SIZE in gmcrypto-c, behind a forwarding sm4-xts feature — single-shot, mirroring the single-shot SM4-GCM FFI shape minus nonce/AAD/tag (32-byte key, 16-byte tweak, length-preserving (out, out_capacity, out_actual_len) output), byte-identical to gmcrypto_core::sm4::mode_xts, single GMCRYPTO_ERR, confidentiality-only. The deferred FFI half of v0.12 (the v0.8-core → v0.10-FFI cadence). 5 new c_smoke tests + doc-only C example examples/sm4_xts_sector.c; regenerated header (no regen-header change needed — free fns + always-on const). No new gmcrypto-core API, no new dudect target, no new dependency. Additive — no public API breakage. See CHANGELOG.md [0.13.0].
v0.14 (assurance; not published) Parser fuzzing. Per docs/v0.14-scope.md Q14.1–Q14.12. A cargo-fuzz (libFuzzer) harness over the full untrusted-input decode/decrypt surface of gmcrypto-core (16 targets: PEM, PKCS#8 decode/decrypt, SPKI, SEC1, DER reader primitives, SM2 DER + raw ciphertext, SM2 decrypt + verify, SM4-CBC/GCM/CCM/XTS decrypt) proving the failure-mode invariant on adversarial bytes — no panic / no OOM / no hang. Workspace-excluded fuzz/ crate (nightly-only; never in the published dep graph) + a capped nightly CI job (.github/workflows/fuzz-nightly.yml). Initial sweep: zero crashes → no published-crate change, not a crates.io release (assurance/infra only). See fuzz/README.md.
v0.15.0 (shipped) SM4-XTS multi-sector (disk) helper. Per docs/v0.15-scope.md Q15.1–Q15.12. New: sm4::mode_xts::{encrypt_sectors, decrypt_sectors} (opt-in sm4-xts) — encrypt/decrypt a contiguous run of equal-size disk sectors in place (&mut [u8] -> Option<()>), sector i under tweak = little-endian-128(start_sector + i) (the standard disk-XTS data-unit convention; owns the encoding the single-shot v0.12 API left to the caller). Byte-identical to looping the single-shot per sector (transitively OpenSSL xts_standard=GB-pinned); whole-block sectors (no ciphertext stealing); ciphers built once + reused scratch (no per-sector allocation); single None for all validation with buf untouched; confidentiality-only. Pure-core: no new dependency, no new feature flag, no new SIMD, no new dudect target (the existing ct_sm4_xts_decrypt covers it). C FFI deferred to v0.16. crates.io skips 0.14.0 (the unpublished fuzzing cycle). Additive — no public API breakage. See CHANGELOG.md [0.15.0].
v0.16.0 (shipped) C ABI for the SM4-XTS multi-sector helper. Per docs/v0.16-scope.md Q16.1–Q16.12. New: gmcrypto_sm4_xts_encrypt_sectors / _decrypt_sectors in gmcrypto-c, behind the existing forwarding sm4-xts feature — in-place over a contiguous run of equal-size sectors (buf: *mut u8 + buf_len; no out/out_capacity/out_actual_len, mirroring the core's &mut [u8] so disk callers never double-allocate), start_sector: uint64_t, tweak = LE-128(start_sector + i). Byte-identical to gmcrypto_core::sm4::mode_xts::{encrypt,decrypt}_sectors; single GMCRYPTO_ERR with buf untouched on error; confidentiality-only. The deferred FFI half of v0.15 — every cipher mode is now FFI-complete. 11 new c_smoke tests + doc-only C example examples/sm4_xts_multisector.c; regenerated header (no regen-header change — free fns, no new opaque structs). No new gmcrypto-core API, no new dudect target, no new dependency. Additive — no public API breakage. See CHANGELOG.md [0.16.0].
v0.17 (public release; not a crates.io release) Open-sourced the repository. Flipped the GitHub repo private → public on the 0.x line; CI migrated off the self-hosted macOS runner to GitHub-hosted (ci.yml → macos-14, fuzz-nightly.yml → ubuntu-latest). A repository milestone — no crate code changes (workspace stays 0.16.0; crates.io skips 0.17.0 per the v0.14 precedent); v1.0 reserved. Per docs/v0.17-scope.md.
v0.18 (infra-assurance; not a crates.io release) dudect-gate hardening. Per docs/v0.18-scope.md Q18.1–Q18.7. Pinned the dudect CI workflows' drift axes (ubuntu-24.04 OS-label + exact dtolnay/rust-toolchain@1.95.0) and gate on a CI-level multi-run median |tau| (PR 3 runs / nightly 5 runs; required_low + the nightly sentinel on the median, negative_control on the min, completeness gate on < N runs). timing_leaks.rs byte-unchanged — the loop + median live in CI. A 100K×5 calibration showed ct_fn_invert/ct_fp_invert back near baseline (medians 0.006–0.028) but kept on telemetry / sentinel — not re-promoted (the noise is runner-image-sensitive; a tight gate would re-flake if it returns). Also a comma-free rust-cache shared-key. A repository / infra-assurance milestone — no crate code change (workspace stays 0.16.0; crates.io skips 0.18.0 per the v0.14 / v0.17 precedent). See docs/v0.5-dudect-recalibration.md (v0.18 resolution).
v0.19 (infra-assurance; not a crates.io release) Self-calibrating relative dudect gate — TESTED and FALSIFIED → honest fallback. Per docs/v0.19-scope.md Q19.1–Q19.7. Added two fix-vs-fix noise-floor probes (noise_floor_f{n,p}_invert) + a relative gate median(target) ≤ max(0.20, 4·median(probe)) to re-promote ct_fn_invert/ct_fp_invert. The 100K calibration disproved the matched-sensitivity premise: the probes stay quiet (~0.005) while the targets spike to [0.26–0.32] (ct_fp_invert median 0.2606, ratio 50) — the noise is in the two-input class split, not the operation, so a same-input probe can't track it. Reverted to telemetry / sentinel @0.55; probes kept as telemetry (evidence for a v0.21+ class-split-aware "noise-twin"). Only the dev-only bench harness changed (workspace stays 0.16.0; crates.io skips 0.19.0). See docs/v0.5-dudect-recalibration.md (v0.19 resolution).
v0.20 (infra-assurance; not a crates.io release) Streaming-decryptor differential fuzzing + cargo fuzz coverage + codified v1.0 CT baseline. Per docs/v0.20-scope.md Q20.1–Q20.5. Two new differential targets (fuzz_sm4_{cbc,gcm}_streaming_decrypt) assert the streaming decryptors fed in arbitrary chunks equal the single-shot oracle; fuzz sweep → 18 targets (zero crashes, zero divergences); a non-gating cargo fuzz coverage nightly job (llvm-cov TOTALS artifact). Codified the settled v1.0 CT baseline in SECURITY.md (composite targets gated <0.20; the two single-inversion diagnostics on telemetry/sentinel @0.55, narrow revisit door). Theme chosen after a Codex+Grok discussion. Only fuzz/ + fuzz-nightly.yml + docs changed (workspace stays 0.16.0; crates.io skips 0.20.0).
v0.21 (infra-assurance; not a crates.io release) v1.0 readiness audit. Per docs/v0.21-scope.md Q21.1–Q21.9. Froze + tooling-guarded the public API ahead of 1.0: committed cargo-public-api baselines + an enforced drift-check, cargo-semver-checks (informational pre-1.0), a cargo doc -D warnings gate, and a --no-default-features/--all-features matrix (new .github/workflows/api-stability.yml); finalized the #[doc(hidden)] surface (3 core items + the whole gmcrypto-simd internal backend) with "not public / not SemVer" notes + existence tests; froze the docs. Non-publishing (doc-attributes + tests only, no behavior change; workspace stays 0.16.0, crates.io skips 0.21.0). Headline finding: the always-on public API names crypto-bigint 0.7 types — a decision to resolve before 1.0 (docs/v1.0-readiness.md §3.A). Deferred to post-1.0: class-split-aware "noise-twin" dudect reference; round-trip/differential parser fuzzing; aead 0.6 (upstream 0.6.0-rc.10); AVX-512 sbox_x64; CCM buffered input; the dudect-nightly leg-cancellation fix.
v0.22 (infra-assurance; not a crates.io release) API-tightening — decouple crypto-bigint 0.7 from the 1.0 contract. Per docs/v0.22-scope.md Q22.1–Q22.8 (resolves the v0.21 §3.A finding via Option 2). Group A: #[doc(hidden)] (kept pub) the low-level sm2::curve / sm2::scalar_mul / ProjectivePoint::to_affine surface. Group B: reshape asn1::{encode,decode}_sig + Sm2Ciphertext::{x,y} from U256 to [u8; 32], byte-output-identical (KAT + gmssl interop 11/11). Group C: ProjectivePoint stays public + unchanged. The always-on (default-features) public API now names zero crypto-bigint types; only the opt-in crypto-bigint-scalar from_scalar(U256) retains it (documented escape hatch). BREAKING for consumers that named Fn/Fp/encode_sig/Sm2Ciphertext::x; ships with 1.0 (non-publishing — workspace stays 0.16.0, crates.io skips 0.22.0).
v0.23 (infra-assurance; not a crates.io release) Pre-1.0 re-audit remediation. Per docs/v0.23-scope.md Q23.1–Q23.9 + docs/v1.0-reaudit.md. A multi-model adversarial pre-publish re-audit (Codex gpt-5.5 + Grok, source-verified) returned NO-GO as-is — core primitives sound, but 2 API/ABI BLOCKERs + API-finality / zeroize-on-failure / spec-ceiling / doc should-fixes. Remediated: W1 (API) Sm2PrivateKey::public_key() -> Sm2PublicKey, the raw ProjectivePoint surface + asn1::{reader,writer,oid} + traits::* made #[doc(hidden)]; W2 (crypto) single-shot SM4-GCM encrypt made fallible (2^36−32 ceiling), the fallible rand_core::TryCryptoRng bound on SM2 sign/encrypt (no-panic RNG-failure path), a fixed-budget constant-time SM2 nonce sampler, sign-nonce / CCM-tentative-plaintext / Sm3-on-drop zeroization, SM2 KDF wrap guard; W3 (C ABI) the SM4-GCM/CCM/XTS FFI symbols made always-on so gmcrypto.h == the default build. Runtime output byte-identical (gmssl interop 11/11) except the deliberately-changed signatures; the breaking API/ABI changes ship with 1.0 (non-publishing — workspace stays 0.16.0, crates.io skips 0.23.0).
v1.0 API stabilization + crates.io publish (the deliberate cut after the audit + tightening + re-audit: the crypto-bigint-exposure decision is resolved [v0.22] and the pre-publish re-audit findings remediated [v0.23], bump 0.16.0 → 1.0.0 with exact sibling pins, publish gmcrypto-simd → core → c, flip cargo-semver-checks to enforced — see the runbook in docs/v1.0-readiness.md §4).
v1.0.1 (shipped) Readiness-cleanup patch — first post-1.0 publish. Per the release-readiness synthesis docs/audits/2026-06-02-release-readiness-synthesis.md (GO-WITH-FOLLOWUP, 0 blockers). Functional fix: the gmcrypto-c gmcrypto_version() returned a hardcoded "0.4.0" → now the real CARGO_PKG_VERSION (the one behavior change justifying a patch publish). Plus doc improvements (raw-block ECB warnings, cbindgen header preconditions, FFI RNG/XTS notes, trait-stability caveats) + CI-health fixes (sm4-xts in MSRV/wasm/deny; dudect allowlist; generate-lockfile before deny; a new simd-x86 job that caught a latent unsafe_code compile bug; removed pull_request paths-ignore so docs PRs aren't blocked). No API/ABI change; wire output byte-identical to 1.0.0 (enforced cargo-semver-checks). 6 merged PRs (#87–#92). See CHANGELOG.md [1.0.1].
v1.1.0 SM2 key exchange (GM/T 0003.3) with key confirmation. Per docs/v1.1-sm2-key-exchange-design.md + docs/v1.1-scope.md. New sm2::key_exchange module behind the opt-in sm2-key-exchange feature (pure-core, no new dependency): Sm2KxInitiator/Sm2KxResponder role state-machines with typestate-enforced single-use ephemerals and commit-on-confirm key release; byte-identical to the GM/T 0003.5-2012 recommended-curve worked example (K + S_A/S_B); new dudect target ct_sm2_key_exchange + fuzz target fuzz_sm2_kx. C FFI deferred to v1.2. Additive — no public API breakage.
v1.2.0 C FFI for SM2 key exchange. Per docs/v1.2-scope.md Q2.1–Q2.10. 9 new gmcrypto-c symbols + 2 opaque handles (gmcrypto_sm2_kx_{initiator,responder}_t) project the v1.1 typestate into C: initiator born-waiting (_new emits R_A), _confirm/_finish consume + free, failed-respond spends the handle; SysRng defaults + _with_rng variants; always-on per the v0.23 posture (72 FFI entry points). The GM/T 0003.5 recommended-curve KAT reproduced byte-for-byte through the C ABI; FFI↔Rust cross-handshakes both directions; fuzz_c_abi KX op + seed; doc-only sm2_key_exchange.c example. No core API change. Additive — no breakage.
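
The multi-run gates described in the v0.18 and v0.19 rows can be written down compactly. The sketch below is an assumption-laden illustration, not the CI scripts: the function names and the slice-of-f64 input shape are hypothetical, and only the two formulas (median |tau| under a threshold with a completeness requirement, and median(target) ≤ max(0.20, 4·median(probe))) come from the rows above.

```rust
/// Median of the absolute values; None for an empty slice.
fn median_abs(taus: &[f64]) -> Option<f64> {
    if taus.is_empty() {
        return None;
    }
    let mut v: Vec<f64> = taus.iter().map(|t| t.abs()).collect();
    v.sort_by(|a, b| a.total_cmp(b));
    let mid = v.len() / 2;
    Some(if v.len() % 2 == 1 {
        v[mid]
    } else {
        (v[mid - 1] + v[mid]) / 2.0
    })
}

/// v0.18-style gate: fail if fewer than `required_runs` results arrived
/// (completeness), otherwise pass when the median |tau| is under `threshold`.
pub fn absolute_gate(taus: &[f64], required_runs: usize, threshold: f64) -> bool {
    if taus.len() < required_runs {
        return false;
    }
    matches!(median_abs(taus), Some(m) if m < threshold)
}

/// v0.19-style relative gate: median(target) <= max(0.20, 4 * median(probe)).
pub fn relative_gate(target: &[f64], probe: &[f64]) -> bool {
    match (median_abs(target), median_abs(probe)) {
        (Some(t), Some(p)) => t <= f64::max(0.20, 4.0 * p),
        _ => false,
    }
}
```

With three runs, a single noisy outlier no longer fails the absolute gate because the median ignores it. Plugging in the v0.19 calibration figures (probe near 0.005, target median 0.2606) gives a limit of 0.20, so the relative gate fails, which matches the falsification reported in that row.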

Quick-start

use gmcrypto_core::sm2::{
    sign_with_id, verify_with_id, Sm2PrivateKey, DEFAULT_SIGNER_ID,
};
use getrandom::SysRng;
use hex_literal::hex;

// v0.5 W5 — `from_bytes_be` is the recommended public constructor
// (always-on, doesn't expose `crypto_bigint::U256` to callers).
let d_be: [u8; 32] = hex!(
    "3945208F7B2144B13F36E38AC6D39F95889393692860B51A42FB81EF4DF7C5B8"
);
let key = Sm2PrivateKey::from_bytes_be(&d_be).expect("d in [1, n-2]");
// `public_key()` returns an `Sm2PublicKey` directly (v0.23).
let public = key.public_key();

// SM2 sign/encrypt take a fallible `rand_core::TryCryptoRng` (v0.23), so
// `getrandom::SysRng` is passed directly — no `UnwrapErr` wrapper.
let mut rng = SysRng;
let sig = sign_with_id(&key, DEFAULT_SIGNER_ID, b"hello", &mut rng).unwrap();
assert!(verify_with_id(&public, DEFAULT_SIGNER_ID, b"hello", &sig));

SM2 key exchange (v1.1, opt-in — gmcrypto-core = { version = "1.2", features = ["sm2-key-exchange"] }): an authenticated two-party key agreement with mandatory key confirmation. Each step consumes the state machine, so an ephemeral cannot be reused and neither side sees the key before the peer's confirmation tag verifies:

use gmcrypto_core::sm2::key_exchange::{Sm2KxInitiator, Sm2KxResponder};

// A (initiator) and B (responder) hold each other's static public keys.
let init = Sm2KxInitiator::new(&key_a, &pub_b, b"A-id", b"B-id", 32)?;
let (r_a, init_waiting) = init.produce_ephemeral(&mut rng)?; // R_A -> B

let resp = Sm2KxResponder::new(&key_b, &pub_a, b"A-id", b"B-id", 32)?;
let (r_b, s_b, resp_waiting) = resp.respond(&r_a, &mut rng)?; // (R_B, S_B) -> A

let (k_a, s_a) = init_waiting.confirm(&r_b, &s_b)?; // verifies S_B; S_A -> B
let k_b = resp_waiting.finish(&s_a)?;               // verifies S_A
assert_eq!(k_a.as_bytes(), k_b.as_bytes());         // 32-byte agreed key

(From C, the same handshake is gmcrypto_sm2_kx_* — see the v1.2 scope above and crates/gmcrypto-c/examples/sm2_key_exchange.c.)
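
The "each step consumes the state machine" property is ordinary Rust typestate. The sketch below shows the pattern in isolation; it is NOT cryptography and NOT the gmcrypto-core API. All names (ToyInitiator, ToyResponder, ToyKey, mix) are hypothetical, the "key agreement" is a toy integer mixer over a pre-shared value, and the nonces are passed in explicitly instead of coming from an RNG.

```rust
/// Toy mixer standing in for real key derivation. NOT cryptographic.
fn mix(a: u64, b: u64) -> u64 {
    (a ^ b.rotate_left(29))
        .wrapping_mul(0x9E37_79B9_7F4A_7C15)
        .rotate_left(31)
}

fn toy_key(psk: u64, r_a: u64, r_b: u64) -> u64 {
    mix(mix(psk, r_a), r_b)
}

pub struct ToyKey(pub u64);

pub struct ToyInitiator { psk: u64 }
pub struct ToyInitiatorWaiting { psk: u64, r_a: u64 }
pub struct ToyResponder { psk: u64 }
pub struct ToyResponderWaiting { key: u64 }

impl ToyInitiator {
    pub fn new(psk: u64) -> Self {
        Self { psk }
    }
    /// Takes `self` by value: the pre-ephemeral state cannot be used twice.
    pub fn produce_ephemeral(self, r_a: u64) -> (u64, ToyInitiatorWaiting) {
        (r_a, ToyInitiatorWaiting { psk: self.psk, r_a })
    }
}

impl ToyInitiatorWaiting {
    /// Commit-on-confirm: the key is released only if the peer's tag checks out.
    pub fn confirm(self, r_b: u64, s_b: u64) -> Option<(ToyKey, u64)> {
        let key = toy_key(self.psk, self.r_a, r_b);
        if mix(key, 2) != s_b {
            return None; // state is consumed either way; no retry on this value
        }
        Some((ToyKey(key), mix(key, 3)))
    }
}

impl ToyResponder {
    pub fn new(psk: u64) -> Self {
        Self { psk }
    }
    /// Returns (R_B, S_B, waiting state); the key stays inside the state.
    pub fn respond(self, r_a: u64, r_b: u64) -> (u64, u64, ToyResponderWaiting) {
        let key = toy_key(self.psk, r_a, r_b);
        (r_b, mix(key, 2), ToyResponderWaiting { key })
    }
}

impl ToyResponderWaiting {
    pub fn finish(self, s_a: u64) -> Option<ToyKey> {
        if mix(self.key, 3) == s_a {
            Some(ToyKey(self.key))
        } else {
            None
        }
    }
}
```

Because every transition takes self by value, calling confirm or finish a second time on the same state is a compile error rather than a runtime check, and a failed confirmation drops the state without ever handing the key to the caller.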

Threat model

See SECURITY.md. Briefly: server-side use on a dedicated, operator-trusted host. Network MITM is in scope; side-channel attacks beyond what the dudect harness covers are NOT in scope.

Build & test

cargo test --workspace                                                          # unit + integration
cargo bench --bench timing_leaks --features crypto-bigint-scalar                # local timing harness (~75s)
DUDECT_SAMPLES=10000 cargo bench --bench timing_leaks --features crypto-bigint-scalar  # match CI smoke budget

gmssl interop test (gated; install gmssl v3.1.1 to enable):

GMCRYPTO_GMSSL=1 cargo test --test interop_gmssl

wasm32 support

gmcrypto-core builds on wasm32-unknown-unknown as of v0.4. CI gates both stable and MSRV (1.85) builds on the target.

rustup target add wasm32-unknown-unknown
cargo build -p gmcrypto-core --target wasm32-unknown-unknown --no-default-features

The crate is no_std + alloc only and does NOT pull getrandom's wasm_js backend or wasm-bindgen / js-sys into its default dep graph. Wasm callers wire their own rand_core::Rng impl — typically by enabling getrandom's wasm_js feature in their Cargo.toml:

[dependencies]
gmcrypto-core = "1.0"
rand_core = { version = "0.10", default-features = false }
getrandom = { version = "0.4", default-features = false, features = ["wasm_js"] }

use gmcrypto_core::sm2::{sign_with_id, Sm2PrivateKey, DEFAULT_SIGNER_ID};
use getrandom::SysRng;

let mut rng = SysRng; // wasm_js-backed when targeting wasm32
// `priv_key` is an `Sm2PrivateKey`, built as in the Quick-start above.
let sig = sign_with_id(&priv_key, DEFAULT_SIGNER_ID, b"msg", &mut rng).unwrap();

A wasm-bindgen-test-driven test runner (running KAT vectors under Node or a headless browser) is post-v0.4 — v0.4 ships the build-target gate only.

License

Apache-2.0. See LICENSE.

Some reference outputs use the upstream gmssl tool. This project is independent of that project.