gm-crypto-rs
Constant-time-designed pure-Rust SM2 / SM3 / SM4 SDK for Chinese national
cryptography (GB/T 32905 / 32918 / 32907 / GM/T 0009). Sign / verify,
public-key encrypt / decrypt, SM4-CBC, SM4-CTR (single-shot + streaming),
length-flexible batched SM4 block encryption, HMAC-SM3, PBKDF2-HMAC-SM3 —
all secret-touching paths guarded by an in-CI dudect-bencher
detectable-leak regression harness.
Personal project notice: not affiliated with, endorsed by, sponsored by, or certified by any upstream cryptography project, payment gateway, standards body, or vendor.
What this is
A small, auditable, pure-Rust SM2 / SM3 / SM4 SDK whose central
differentiating commitment is that secret-touching code paths are
constant-time-designed and guarded by an in-CI dudect-bencher
detectable-leak regression harness: 14 real ct_* targets (12
always-on + 2 cfg-gated under sm4-bitsliced-simd) plus a
deliberately-leaky negative_control that proves the harness can
detect leaks. Most real targets gate at |tau| < 0.20;
ct_sign_k_class and the direct ct_fn_invert / ct_fp_invert invert
diagnostics carry target-specific gate policy after the 2026-05-12
recalibration — see SECURITY.md and
docs/v0.5-dudect-recalibration.md.
The harness reports timing-leak detection events. It does not prove that
code is constant-time. A low |tau| value means the test could not detect a
leak within the sample budget given, not that no leak exists. This wording
follows dudect-bencher's own documentation.
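To make the |tau| gate concrete, here is a minimal sketch of how a dudect-style statistic can be computed from two classes of timing samples. It assumes Welch's t-test and the sample-count normalization tau = t / sqrt(n) used by dudect-style tools; the function names are hypothetical and this is not the harness code.

```rust
/// Mean and unbiased sample variance of a slice (hypothetical helper).
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic between two timing classes.
fn welch_t(a: &[f64], b: &[f64]) -> f64 {
    let (mean_a, var_a) = mean_var(a);
    let (mean_b, var_b) = mean_var(b);
    (mean_a - mean_b) / (var_a / a.len() as f64 + var_b / b.len() as f64).sqrt()
}

/// Assumed normalization: tau = t / sqrt(total number of samples).
fn tau(a: &[f64], b: &[f64]) -> f64 {
    welch_t(a, b) / ((a.len() + b.len()) as f64).sqrt()
}

fn main() {
    // Two classes with the same timing distribution, and one shifted class.
    let left = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0];
    let right = [100.0, 101.0, 99.0, 102.0, 98.0, 100.0];
    let leaky = [120.0, 122.0, 118.0, 121.0, 119.0, 120.0];

    println!("similar classes: |tau| = {:.3}", tau(&left, &right).abs());
    println!("shifted class:   |tau| = {:.3}", tau(&left, &leaky).abs());
}
```

A small |tau| on the first pair only says that this sample set shows no detectable difference; more samples or a different class split could still expose one.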
The harness covers: SM2 sign (split by both private key d and nonce
k magnitude, with both retry nonces class-tied), SM2 decrypt (split
by recipient d_B), SM4 key schedule + single-block encrypt (split by
master key, under default linear-scan and sm4-bitsliced paths), the
v0.5 SIMD-packed dispatch (ct_sm4_encrypt_block_bitsliced_simd,
cfg-gated), v0.6's batched CBC-decrypt fanout
(ct_sm4_cbc_decrypt_fanout, cfg-gated), v0.7's SM4-CTR encrypt
(ct_sm4_ctr_encrypt, exercising the public batch path on every
cipher matrix entry), HMAC-SM3 (split by key), encrypted-PKCS#8
decrypt (split by password bytes — both classes' blobs valid for their
class's password so both succeed via identical control flow), plus
direct Fn::invert and Fp::invert diagnostics. The ct_sign_k_class
target closes v0.1's structural blind spot to nonce-only leaks.
The crypto-bigint 0.6 → 0.7.3 upgrade resolved the v0.1-era
ConstMontyForm::invert leak directly: on the v0.2 W0 harness both
direct invert diagnostics measured under |tau| ≈ 0.01, two orders of
magnitude below the gate. Subsequent GH Actions runner-image drift on
2026-05-12 raised the empirical noise floor on ct_fn_invert /
ct_fp_invert — both targets moved to PR-smoke telemetry + a nightly
gross-regression sentinel at |tau| ≥ 0.55. See
docs/v0.5-dudect-recalibration.md
for the data and posture. See SECURITY.md for the full
constant-time discipline.
The differentiator vs. existing Rust SM2 crates (notably
RustCrypto/sm2, which already aims for constant-time
secret-dependent operations in its design) is the in-CI regression gate, not
the design intent in isolation.
What this isn't
- Not a TLS/TLCP implementation.
- Not SM9, ZUC, or post-quantum cryptography.
- Not an HSM/SDF/SKF integration.
- Not a certified cryptographic module.
- Not constant-time on CPUs with data-dependent multiply latencies (some older x86, some embedded).
- Not a comprehensive SM-crypto library yet — see the milestone roadmap.
v0.7 scope (shipping)
The cipher-mode surface expansion. v0.7 is the first version where v0.6's SIMD machinery is directly callable from user code outside the CBC-decrypt internal path:
- Public batch API on Sm4Cipher (W1). Sm4Cipher::encrypt_blocks(&mut [[u8; 16]]) and decrypt_blocks(&mut [[u8; 16]]), length-flexible (any N including empty). Internally chunks into SIMD_BATCH (8 on x86_64 AVX2, 4 on aarch64 NEON, 1 elsewhere under sm4-bitsliced-simd; per-block loop without the feature) and routes the aligned middle through the v0.6 W6 crypt_batch_x8 / crypt_batch_x4 helpers. Byte-identical to N calls into Sm4Cipher::encrypt_block; exhaustively verified at lengths 0..=33 in tests/sm4_batch_api.rs.
- sm4::mode_ctr::encrypt / decrypt (W2). Single-shot SM4-CTR per GM/T 0002-2012 §5.4 / NIST SP 800-38A §6.5. Counter encoded big-endian, per-block keystream is SM4_E(key, counter + i), BE add, wrap at 2^128. No padding (output length == input length). Counter contract is unique-per-key (opposite of CBC's unpredictable IV); no Option return, since CTR cannot fail on length/parse like CBC-decrypt can. CTR is unauthenticated; pair with HMAC-SM3 encrypt-then-MAC at the call site if integrity is required (or wait for v0.8 AEAD).
- sm4::ctr_streaming::Sm4CtrCipher (W3). Streaming SM4-CTR. Single struct serves both encrypt and decrypt (CTR is symmetric). State machine: 16-byte leftover-keystream buffer + position cursor in 0..=16 handles unaligned chunk boundaries; the aligned middle of each update() routes through Sm4Cipher::encrypt_blocks for SIMD fanout. Re-exported as sm4::Sm4CtrCipher.
- AEAD scope doc for v0.8 (W4). docs/v0.7-aead-scope.md: design cycle scope doc for SM4-GCM + SM4-CCM (Q8.1–Q8.8 sign-off list + v0.9 candidate Q-list). No code; pure design.
- New dudect target ct_sm4_ctr_encrypt: class-split by master key over a fixed 256-byte plaintext. Dispatches through Sm4Cipher::encrypt_blocks so the gate covers every cipher path: linear-scan default, gate-only sm4-bitsliced, and SIMD-packed batches under sm4-bitsliced-simd. Runs under all three feature matrix entries; gate |tau| < 0.20.
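The CTR counter arithmetic and the streaming state machine described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: toy_block_encrypt is a stand-in for SM4_E (it is not SM4), ToyCtrStream and the other names are hypothetical, and the sketch processes the stream byte by byte rather than routing an aligned middle through a batched path.

```rust
/// Stand-in for SM4_E(key, block). NOT SM4: a toy keyed mixing function
/// so the sketch stays self-contained.
fn toy_block_encrypt(key: &[u8; 16], block: &[u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = block[i].rotate_left(3) ^ key[i] ^ (i as u8).wrapping_mul(0x9d);
    }
    out
}

/// Counter block i: big-endian add of i to the initial counter, wrapping at 2^128.
fn counter_block(initial: &[u8; 16], i: u128) -> [u8; 16] {
    u128::from_be_bytes(*initial).wrapping_add(i).to_be_bytes()
}

/// Single-shot CTR: XOR data with the keystream; output length == input length.
fn ctr_xor(key: &[u8; 16], counter: &[u8; 16], data: &mut [u8]) {
    for (i, chunk) in data.chunks_mut(16).enumerate() {
        let ks = toy_block_encrypt(key, &counter_block(counter, i as u128));
        for (b, k) in chunk.iter_mut().zip(ks.iter()) {
            *b ^= k;
        }
    }
}

/// Streaming CTR: a leftover keystream buffer plus a cursor in 0..=16.
struct ToyCtrStream {
    key: [u8; 16],
    counter: [u8; 16],
    next_block: u128,
    leftover: [u8; 16],
    pos: usize, // 16 means the buffer is exhausted
}

impl ToyCtrStream {
    fn new(key: [u8; 16], counter: [u8; 16]) -> Self {
        Self { key, counter, next_block: 0, leftover: [0; 16], pos: 16 }
    }

    /// Encrypts or decrypts in place; chunk boundaries need not be 16-aligned.
    fn update(&mut self, data: &mut [u8]) {
        for b in data.iter_mut() {
            if self.pos == 16 {
                let ctr = counter_block(&self.counter, self.next_block);
                self.leftover = toy_block_encrypt(&self.key, &ctr);
                self.next_block = self.next_block.wrapping_add(1);
                self.pos = 0;
            }
            *b ^= self.leftover[self.pos];
            self.pos += 1;
        }
    }
}

fn main() {
    let key = [7u8; 16];
    let counter = [0xffu8; 16]; // 2^128 - 1, so the second block wraps to zero
    let msg: Vec<u8> = (0u8..42).collect();

    let mut one_shot = msg.clone();
    ctr_xor(&key, &counter, &mut one_shot);

    // Feed the same message through the streaming shape in unaligned pieces.
    let mut streamed = msg.clone();
    let mut stream = ToyCtrStream::new(key, counter);
    let (a, rest) = streamed.split_at_mut(5);
    let (b, c) = rest.split_at_mut(20);
    stream.update(a);
    stream.update(b);
    stream.update(c);
    assert_eq!(one_shot, streamed);

    // CTR is symmetric: applying the keystream again recovers the plaintext.
    ctr_xor(&key, &counter, &mut one_shot);
    assert_eq!(one_shot, msg);
}
```

Because the keystream depends only on the key and counter, reusing a counter under the same key reuses the keystream, which is why the counter contract is unique-per-key.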
No public API breakage — purely additive. v0.6.0 callers can
cargo update to v0.7.0 without migration.
Everything v0.4 shipped (wasm32-unknown-unknown build, RustCrypto
trait fit behind digest-traits / cipher-traits, bitsliced SM4
S-box behind sm4-bitsliced, gmcrypto-c C ABI crate) is unchanged
— see the Roadmap row for the compact reference and CHANGELOG.md
[0.4.0] for detail.
Everything v0.3 shipped is unchanged:
- Reusable strict-canonical DER reader / writer subset (gmcrypto_core::asn1::{reader, writer, oid}).
- PEM + encrypted PKCS#8 + X.509 SPKI + SEC1 codecs (gmcrypto_core::{pem, pkcs8, spki, sec1}).
- Full bidirectional gmssl 3.1.1 interop (SM2 sign / verify, SM2 encrypt / decrypt, SM4-CBC). Gated on GMCRYPTO_GMSSL=1.
- Raw byte-concat SM2 ciphertext helpers (gmcrypto_core::sm2::raw_ciphertext): C1 || C3 || C2 emit + decode; legacy C1 || C2 || C3 decrypt-only.
- Streaming HmacSm3 + Sm4Cbc{En,De}cryptor. In-crate Hash / Mac / BlockCipher traits (gmcrypto_core::traits).
- Comb-table mul_g (~5× sign-side speedup). 64 sub-tables of 16 entries each, lazily built once per process via spin::Once.
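The fixed-base table idea in the last bullet can be illustrated with a toy group. One plausible reading of "64 sub-tables of 16 entries each" is one sub-table per 4-bit window of a 256-bit scalar (64 × 4 = 256); that reading is an assumption, not a description of the crate's layout. The sketch below uses integers modulo a toy modulus in place of curve points, a 64-bit scalar with 16 windows, and std::sync::OnceLock standing in for spin::Once. All names and constants are hypothetical.

```rust
use std::sync::OnceLock;

const M: u128 = 0xffff_ffff_ffff_ffc5; // toy modulus; stands in for the curve group
const G: u128 = 0x1234_5678_9abc_def1; // toy "base point"
const WINDOWS: usize = 16; // 16 windows x 4 bits = 64-bit toy scalar

/// Sub-table i holds j * 16^i * G for j in 0..16.
fn build_tables() -> [[u128; 16]; WINDOWS] {
    let mut tables = [[0u128; 16]; WINDOWS];
    let mut base = G % M; // 16^i * G for the current window
    for table in tables.iter_mut() {
        for j in 1..16 {
            table[j] = (table[j - 1] + base) % M;
        }
        base = (base * 16) % M;
    }
    tables
}

/// Built lazily, once per process.
fn tables() -> &'static [[u128; 16]; WINDOWS] {
    static TABLES: OnceLock<[[u128; 16]; WINDOWS]> = OnceLock::new();
    TABLES.get_or_init(build_tables)
}

/// k * G as one table entry per window: only additions, no doublings.
/// Direct indexing by a secret nibble is NOT constant-time; a hardened
/// version would select the entry with a linear scan over all 16 entries.
fn mul_g(tables: &[[u128; 16]; WINDOWS], k: u64) -> u128 {
    let mut acc = 0u128;
    for (i, table) in tables.iter().enumerate() {
        let nibble = ((k >> (4 * i)) & 0xf) as usize;
        acc = (acc + table[nibble]) % M;
    }
    acc
}

fn main() {
    let k = 0xdead_beef_u64;
    let fast = mul_g(tables(), k);
    let reference = (k as u128 * G) % M;
    assert_eq!(fast, reference);
    println!("k * G = {fast:#x}");
}
```

The speedup comes from trading the doublings of a generic scalar multiplication for precomputed entries, at the cost of table memory and a one-time build.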
Everything v0.2 shipped is unchanged:
- SM3 hash function (#![no_std] + alloc).
- SM2 sign / verify with custom signer ID (default 1234567812345678 per GM/T 0009).
- SM2 public-key encrypt / decrypt with GM/T 0009-2012 ciphertext DER (SEQUENCE { x, y, hash, ciphertext }). Invalid-curve attack defense via on-curve check on C1 before scalar mult; non-branching KDF-zero detection so a chosen-ciphertext attacker cannot distinguish it from a normal MAC failure.
- SM4 block cipher (GB/T 32907-2016) and SM4-CBC (PKCS#7 padding, caller-supplied unpredictable IV per NIST SP 800-38A Appendix C). Constant-time-designed subtle linear-scan S-box (~1-2M blocks/s); opt-in bitsliced (table-less, gate-only) S-box via the sm4-bitsliced feature (v0.4 W3). PKCS#7 strip uses a constant-time scan over the final block; decrypt collapses every failure mode to a single None against padding-oracle attacks.
- HMAC-SM3 per RFC 2104, gmssl-cross-validated KAT vectors. Hash-first long-key path. v0.3 adds the streaming HmacSm3 shape alongside single-shot hmac_sm3.
- PBKDF2-HMAC-SM3 per RFC 8018 §5.2. Caller-supplied output buffer (no internal allocation, no iteration-count default).
- Constant-time-designed Fp and Fn field arithmetic via crypto-bigint = 0.7.3.
- Renes-Costello-Batina complete addition formulas for the SM2 curve (a=-3 specialized).
- Fixed-base (v0.3 comb-table) and variable-base scalar multiplication, both constant-time-designed with subtle::ConditionallySelectable linear-scan table lookup.
- Fixed-K masked-select signing retry: the retry loop runs K=2 iterations unconditionally, regardless of which iteration produced a valid signature. The constant-time contract holds for any RNG that respects CryptoRng; pathological RNGs cannot leak the secret via observable retry count.
- Strict canonical ASN.1 DER for SEQUENCE { r, s } (signatures), the GM/T 0009 SM2 ciphertext SEQUENCE, and all v0.3 PEM / PKCS#8 / SPKI / SEC1 wire formats. Rejects non-canonical leading-zero padding, sign-bit-set first bytes, empty content, and (for ciphertext coordinates) values ≥ p.
- KAT vectors from GB/T 32905-2016 (SM3), GB/T 32918.2-2017 / .5-2017 (SM2), GB/T 32907-2016 Appendix A.1 (SM4 single-block + 1M-round), GM/T 0042-2015 (HMAC-SM3), GM/T 0091-2020 (PBKDF2-HMAC-SM3).
- gmssl CLI cross-validation for HMAC-SM3, PBKDF2-HMAC-SM3, and (new in v0.3) SM2 sign/verify, SM2 encrypt/decrypt, and SM4-CBC in both directions. Gated on GMCRYPTO_GMSSL=1.
- dudect-bencher harness: 14 real ct_* targets (12 always-on + 2 cfg-gated under sm4-bitsliced-simd) plus a deliberately-leaky negative_control that proves the harness can detect leaks. Matrix-run under features=default, sm4-bitsliced, and sm4-bitsliced-simd: PR-smoke 10⁴ samples; nightly 10⁵ samples (more samples = tighter empirical confidence at the same threshold). Most real targets gate at |tau| < 0.20; per-target policy in SECURITY.md.
- Failure-mode invariant: every Result-returning public API uses the workspace-wide gmcrypto_core::Error (single Failed variant, #[non_exhaustive]); per-module aliases sm2::Error, pem::Error, pkcs8::Error all point at the same type. verify_with_id returns bool; DER decode returns Option. Defense against padding-oracle, malleability, and invalid-curve attacks.
- Zeroization on private keys, SM4 round keys, HMAC K' / K' XOR ipad / K' XOR opad, PBKDF2 intermediates, SM2 KDF buffers, and PKCS#8 inner-key scratch.
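The linear-scan table lookup mentioned for the S-box and the scalar-multiplication tables can be sketched in plain Rust. This is a minimal illustration of the technique with hypothetical names (ct_eq_mask, ct_lookup); it does not use the subtle crate that the list above names, and it carries none of that crate's optimization barriers, so a compiler is free to transform it.

```rust
/// Returns 0xFF if a == b and 0x00 otherwise, without branching on the inputs.
fn ct_eq_mask(a: u8, b: u8) -> u8 {
    let x = a ^ b; // zero iff the inputs are equal
    let nonzero = (x | x.wrapping_neg()) >> 7; // 1 iff x != 0
    nonzero.wrapping_sub(1) // 0xFF iff x == 0
}

/// Reads table[index] by touching every entry, so the memory access
/// pattern does not depend on the (secret) index.
fn ct_lookup(table: &[u8; 256], index: u8) -> u8 {
    let mut out = 0u8;
    for (i, &entry) in table.iter().enumerate() {
        out |= entry & ct_eq_mask(i as u8, index);
    }
    out
}

fn main() {
    // Toy table; a real S-box would hold the cipher's substitution values.
    let mut table = [0u8; 256];
    for (i, entry) in table.iter_mut().enumerate() {
        *entry = (i as u8).wrapping_mul(7).wrapping_add(3);
    }

    let secret_index = 0xa5u8;
    assert_eq!(ct_lookup(&table, secret_index), table[secret_index as usize]);
    println!("lookup ok");
}
```

The cost is a full pass over the table per lookup, which is the trade-off behind the modest blocks-per-second figure quoted for the linear-scan S-box.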
Roadmap
| Version | Scope |
|---|---|
| v0.2 (shipped) | SM4 + SM4-CBC, HMAC-SM3, PBKDF2-HMAC-SM3, SM2 encrypt/decrypt + GM/T 0009 ciphertext DER, dudect harness expansion to 11 targets. See CHANGELOG.md [0.2.0]. |
| v0.3 (shipped) | Reusable ASN.1 reader/writer subset; PEM, encrypted PKCS#8, X.509 SPKI, SEC1; full bidirectional gmssl interop (incl. SM2 sign/verify + SM2 encrypt/decrypt with PEM-wrapped keys + SM4-CBC); raw byte-concat ciphertext helpers (C1||C3||C2 modern + legacy C1||C2||C3 decrypt); streaming HmacSm3 / Sm4CbcEncryptor / Sm4CbcDecryptor + in-crate Hash/Mac/BlockCipher traits; comb-table mul_g (~5× sign-side speedup); dudect harness expanded to 12 targets. See CHANGELOG.md [0.3.0]. |
| v0.4 (shipped) | wasm32-unknown-unknown build target; RustCrypto-trait fit (digest::Digest / digest::Mac / cipher::BlockEncrypt/BlockDecrypt) behind opt-in digest-traits / cipher-traits feature flags; bitsliced (table-less, gate-only) SM4 S-box behind the opt-in sm4-bitsliced feature; new gmcrypto-c workspace member exposing the SM2/SM3/SM4/HMAC/PBKDF2 surface as a C ABI (cdylib + staticlib + cbindgen-generated header). See CHANGELOG.md [0.4.0]. |
| v0.5.0 (shipped) | C-ABI completeness (streaming CBC + raw-byte SM2 ciphertext + caller-supplied RNG callback); sm4-bitsliced-simd feature-flag scaffolding — v0.5.0 ships no SIMD fast path (the feature transparently delegates to the v0.4 single-block bitslice); BREAKING ergonomic cleanup — workspace-wide gmcrypto_core::Error, Sm2PrivateKey::new(U256) → from_scalar(U256) (gated behind crypto-bigint-scalar) + always-on from_bytes_be(&[u8; 32]) constructor, std feature removed. See CHANGELOG.md [0.5.0]. |
| v0.5.1 (shipped) | W4 phase 2 — new sibling crate gmcrypto-simd carrying an AVX2 8-way packed bitsliced SM4 S-box behind opt-in sm4-bitsliced-simd, with runtime CPU detection (cpufeatures) and silent scalar fallback on non-AVX2 hosts. v0.5.1's tau dispatch fed the AVX2 path with 7 wasted lanes; production throughput matched v0.4 single-block bitslice. Dudect calibration update — ct_fn_invert / ct_fp_invert moved to PR-smoke telemetry + 100K nightly gross-regression sentinel after a GH Actions ubuntu-24.04 runner-image shift on 2026-05-12 raised the empirical noise floor; see docs/v0.5-dudect-recalibration.md. See CHANGELOG.md [0.5.1]. |
| v0.6.0 (shipped) | W4 milestone close-out — the throughput-win release. W4 phase 3: NEON 4-way bitsliced SM4 on aarch64 (compile-time baseline) + AVX2 32-byte full-width packed S-box (sbox_x32) + Sm4CbcDecryptor::process_chunk SIMD fanout. Per round of the SM4 decrypt, batched blocks' tau inputs pack into one SIMD register (32 bytes on x86_64 / 8-block batch, 16 bytes on aarch64 / 4-block batch) — 32× fewer SIMD dispatches per 8-block batch than v0.5.1. CBC encryption stays single-block (chain-of-blocks defeats SIMD packing). New dudect target ct_sm4_cbc_decrypt_fanout (Q6.7) gates the fanout path at |tau| < 0.20. Exhaustive lane-position-shifted SIMD tests (8192 + 4096 cases) per Q6.8. No public API changes; no breaking changes — additive only. See CHANGELOG.md [0.6.0] and docs/v0.6-scope.md. |
| v0.7.0 (shipping) | Cipher-mode surface expansion. First version where v0.6's SIMD machinery is callable from user code outside the CBC-decrypt internal path. New: public length-flexible Sm4Cipher::encrypt_blocks / decrypt_blocks (W1; Q7.7); single-shot sm4::mode_ctr::encrypt / decrypt (W2; GM/T 0002-2012 §5.4); streaming sm4::ctr_streaming::Sm4CtrCipher (W3); new dudect target ct_sm4_ctr_encrypt (gates |tau| < 0.20 on every cipher path). Plus the v0.8 AEAD scope doc (docs/v0.7-aead-scope.md, Q8.1–Q8.8 sign-off + v0.9 candidate Q-list). No public API breakage — additive only. See CHANGELOG.md [0.7.0]. |
| v0.8+ | AEAD per docs/v0.7-aead-scope.md — SM4-GCM + SM4-CCM with constant-time GHASH (CLMUL on x86_64 via gmcrypto-simd, NEON pmull on aarch64, Karatsuba software fallback elsewhere) + constant-time tag compare + bidirectional gmssl interop + two new dudect targets. Behind opt-in sm4-aead feature flag. Other v0.8+ candidates: pinned / noise-isolated dudect runner; AVX-512 16-way sbox_x64; RustCrypto digest = 0.11 / cipher = 0.5 / aead = 0.6 migration; wasm-bindgen-test KAT runner; streaming AEAD. Each lands behind its own scope-doc cycle. |
| v1.0 | API stabilization. |
Quick-start
use ;
use SysRng;
use hex;
use UnwrapErr;
// v0.5 W5 — `from_bytes_be` is the recommended public constructor
// (always-on, doesn't expose `crypto_bigint::U256` to callers).
let d_be: = hex!;
let key = from_bytes_be.expect;
let public = from_point;
let mut rng = UnwrapErr;
let sig = sign_with_id.unwrap;
assert!;
Threat model
See SECURITY.md. Briefly: server-side use, dedicated host,
operator-trusted, network MITM in scope, side-channel attacks beyond what the
dudect harness covers are NOT in scope.
Build & test
DUDECT_SAMPLES=10000
gmssl interop test (gated; install gmssl
v3.1.1 to enable):
GMCRYPTO_GMSSL=1
wasm32 support
gmcrypto-core builds on wasm32-unknown-unknown as of v0.4. CI gates
both stable and MSRV (1.85) builds on the target.
The crate is no_std + alloc only and does NOT pull getrandom's
wasm_js backend or wasm-bindgen / js-sys into its default dep
graph. Wasm callers wire their own rand_core::Rng impl — typically
by enabling getrandom's wasm_js feature in their Cargo.toml:
[]
= "0.7"
= { = "0.10", = false }
= { = "0.4", = false, = ["wasm_js"] }
use ;
use UnwrapErr;
use SysRng;
let mut rng = UnwrapErr; // wasm_js-backed when targeting wasm32
let sig = sign_with_id.unwrap;
A wasm-bindgen-test-driven test runner (running KAT vectors under
Node or a headless browser) is post-v0.4 — v0.4 ships the build-target
gate only.
License
Apache-2.0. See LICENSE.
Some reference outputs use the upstream gmssl
tool. This project is independent of that project.