# gm-crypto-rs
Constant-time-designed pure-Rust SM2 / SM3 / SM4 SDK for Chinese national
cryptography (GB/T 32905 / 32918 / 32907 / GM/T 0009). Sign / verify,
public-key encrypt / decrypt, SM4-CBC, SM4-CTR (single-shot + streaming),
length-flexible batched SM4 block encryption, HMAC-SM3, PBKDF2-HMAC-SM3 —
all secret-touching paths guarded by an in-CI `dudect-bencher`
detectable-leak regression harness.
[crates.io](https://crates.io/crates/gmcrypto-core) | [docs.rs](https://docs.rs/gmcrypto-core)
**Personal project notice:** not affiliated with, endorsed by, sponsored by, or
certified by any upstream cryptography project, payment gateway, standards body,
or vendor.
## What this is
A small, auditable, pure-Rust SM2 / SM3 / SM4 SDK whose central
differentiating commitment is that secret-touching code paths are
**constant-time-designed and guarded by an in-CI [`dudect-bencher`](https://docs.rs/dudect-bencher/)
detectable-leak regression harness**: 14 real `ct_*` targets (12
always-on + 2 cfg-gated under `sm4-bitsliced-simd`) plus a
deliberately-leaky `negative_control` that proves the harness can
detect leaks. Most real targets gate at `|tau| < 0.20`;
`ct_sign_k_class` and the direct `ct_fn_invert` / `ct_fp_invert` invert
diagnostics carry target-specific gate policy after the 2026-05-12
recalibration — see [`SECURITY.md`](SECURITY.md) and
[`docs/v0.5-dudect-recalibration.md`](docs/v0.5-dudect-recalibration.md).
The harness reports timing-leak detection events. **It does not prove
constant-time behavior.** A low `|tau|` value means the test could not detect a
leak within the given sample budget, not that no leak exists. This wording
follows `dudect-bencher`'s own documentation.
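
To make the `|tau|` gate concrete, here is a minimal sketch of the statistic behind it. It assumes `tau` is Welch's t statistic divided by the square root of the total sample count, which is how the `dudect-bencher` documentation describes it. The function names are illustrative and are not part of this crate or of `dudect-bencher`. The real harness also does its own sample collection and preprocessing, which this sketch omits.

```rust
/// Arithmetic mean of a timing sample.
fn mean(x: &[f64]) -> f64 {
    x.iter().sum::<f64>() / x.len() as f64
}

/// Unbiased sample variance (divides by n - 1).
fn sample_variance(x: &[f64]) -> f64 {
    let m = mean(x);
    x.iter().map(|v| (v - m).powi(2)).sum::<f64>() / (x.len() as f64 - 1.0)
}

/// Welch's t statistic between the two input classes.
fn welch_t(a: &[f64], b: &[f64]) -> f64 {
    let se2 = sample_variance(a) / a.len() as f64 + sample_variance(b) / b.len() as f64;
    (mean(a) - mean(b)) / se2.sqrt()
}

/// Assumed normalization: t scaled by the square root of the total sample count.
fn tau(a: &[f64], b: &[f64]) -> f64 {
    welch_t(a, b) / ((a.len() + b.len()) as f64).sqrt()
}

/// A gate such as `|tau| < 0.20` passes when no leak was *detected*.
fn gate_passes(a: &[f64], b: &[f64], threshold: f64) -> bool {
    tau(a, b).abs() < threshold
}
```

Two classes with the same timing distribution give a `tau` near zero and pass; a class that is consistently slower pushes `|tau|` far above the threshold. A pass only says that this sample budget showed no detectable difference.
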
The harness covers: SM2 sign (split by both private key `d` and nonce
`k` magnitude, with both retry nonces class-tied), SM2 decrypt (split
by recipient `d_B`), SM4 key schedule + single-block encrypt (split by
master key, under default linear-scan and `sm4-bitsliced` paths), the
v0.5 SIMD-packed dispatch (`ct_sm4_encrypt_block_bitsliced_simd`,
cfg-gated), v0.6's batched CBC-decrypt fanout
(`ct_sm4_cbc_decrypt_fanout`, cfg-gated), v0.7's SM4-CTR encrypt
(`ct_sm4_ctr_encrypt`, exercising the public batch path on every
cipher matrix entry), HMAC-SM3 (split by key), encrypted-PKCS#8
decrypt (split by password bytes — both classes' blobs valid for their
class's password so both succeed via identical control flow), plus
direct `Fn::invert` and `Fp::invert` diagnostics. The `ct_sign_k_class`
target closes v0.1's structural blind spot to nonce-only leaks.
The `crypto-bigint 0.6 → 0.7.3` upgrade resolved the v0.1-era
`ConstMontyForm::invert` leak directly: on the v0.2 W0 harness both
direct invert diagnostics measured under `|tau| ≈ 0.01`, two orders of
magnitude below the gate. Subsequent GH Actions runner-image drift on
2026-05-12 raised the empirical noise floor on `ct_fn_invert` /
`ct_fp_invert` — both targets moved to PR-smoke telemetry + a nightly
gross-regression sentinel at `|tau| ≥ 0.55`. See
[`docs/v0.5-dudect-recalibration.md`](docs/v0.5-dudect-recalibration.md)
for the data and posture. See [`SECURITY.md`](SECURITY.md) for the full
constant-time discipline.
The differentiator vs. existing Rust SM2 crates (notably
[`RustCrypto/sm2`](https://docs.rs/sm2/), which already aims for constant-time
secret-dependent operations in its design) is **the in-CI regression gate**, not
the design intent in isolation.
## What this isn't
- Not a TLS/TLCP implementation.
- Not SM9, ZUC, or post-quantum cryptography.
- Not an HSM/SDF/SKF integration.
- Not a certified cryptographic module.
- Not constant-time on CPUs with data-dependent multiply latencies (some older
x86, some embedded).
- Not a comprehensive SM-crypto library yet — see the milestone roadmap.
## v0.7 scope (shipping)
v0.7 is the cipher-mode surface expansion and the **first version where
v0.6's SIMD machinery is directly callable from user code outside the
CBC-decrypt internal path**:
- **Public batch API on `Sm4Cipher`** — W1.
`Sm4Cipher::encrypt_blocks(&mut [[u8; 16]])` and
`decrypt_blocks(&mut [[u8; 16]])`, length-flexible (any N including
empty). Internally chunks into `SIMD_BATCH` (8 on `x86_64` AVX2, 4
on `aarch64` NEON, 1 elsewhere under `sm4-bitsliced-simd`; per-block
loop without the feature) and routes the aligned middle through the
v0.6 W6 `crypt_batch_x8` / `crypt_batch_x4` helpers. Byte-identical
to N calls into `Sm4Cipher::encrypt_block` — exhaustively verified
at lengths `0..=33` in `tests/sm4_batch_api.rs`.
- **`sm4::mode_ctr::encrypt` / `decrypt`** — W2. Single-shot SM4-CTR
per GM/T 0002-2012 §5.4 / NIST SP 800-38A §6.5. Counter encoded
big-endian, per-block keystream is `SM4_E(key, counter + i)`,
BE add, wrap at `2^128`. No padding (output length == input
length). Counter contract is **unique-per-key** (unlike CBC, which
requires an unpredictable IV). There is no `Option` return, because CTR
cannot fail on length/parse the way CBC-decrypt can. CTR is unauthenticated; pair
with HMAC-SM3 encrypt-then-MAC at the call site if integrity is
required (or wait for v0.8 AEAD).
- **`sm4::ctr_streaming::Sm4CtrCipher`** — W3. Streaming SM4-CTR.
Single struct serves both encrypt and decrypt (CTR is symmetric).
State machine: 16-byte leftover-keystream buffer + position cursor
in `0..=16` handles unaligned chunk boundaries; the aligned middle
of each `update()` routes through `Sm4Cipher::encrypt_blocks` for
SIMD fanout. Re-exported as `sm4::Sm4CtrCipher`.
- **AEAD scope doc for v0.8** — W4.
[`docs/v0.7-aead-scope.md`](docs/v0.7-aead-scope.md) — design cycle
scope doc for SM4-GCM + SM4-CCM (Q8.1–Q8.8 sign-off list + v0.9
candidate Q-list). No code; pure design.
- **New dudect target `ct_sm4_ctr_encrypt`** — class-split by master
key over a fixed 256-byte plaintext. Dispatches through
`Sm4Cipher::encrypt_blocks` so the gate covers every cipher path:
linear-scan default, gate-only `sm4-bitsliced`, and SIMD-packed
batches under `sm4-bitsliced-simd`. Runs under all three feature
matrix entries; gate `|tau| < 0.20`.
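
The CTR bullets above describe a big-endian counter that wraps at `2^128` and a streaming state machine built on a leftover-keystream buffer plus a cursor. The following is a minimal, self-contained sketch of that arithmetic. It does not use this crate: `toy_block_encrypt` is a hypothetical and insecure stand-in for the SM4 block function, and `keystream_block`, `ctr_xor`, and `ToyCtrStream` are illustrative names, not the `gmcrypto-core` API.

```rust
/// Stand-in for one SM4 block encryption. NOT SM4 and NOT secure; it only
/// lets the sketch run without the crate.
fn toy_block_encrypt(key: &[u8; 16], block: &mut [u8; 16]) {
    for (i, b) in block.iter_mut().enumerate() {
        *b = b.wrapping_add(key[i]).rotate_left(3) ^ key[15 - i];
    }
}

/// Keystream block i = E(key, counter + i): big-endian add, wrap at 2^128.
fn keystream_block(key: &[u8; 16], counter: &[u8; 16], i: u128) -> [u8; 16] {
    let mut block = u128::from_be_bytes(*counter).wrapping_add(i).to_be_bytes();
    toy_block_encrypt(key, &mut block);
    block
}

/// Single-shot CTR: XOR data with the keystream. The same call decrypts,
/// and the output length always equals the input length (no padding).
fn ctr_xor(key: &[u8; 16], counter: &[u8; 16], data: &mut [u8]) {
    for (i, chunk) in data.chunks_mut(16).enumerate() {
        let ks = keystream_block(key, counter, i as u128);
        for (d, k) in chunk.iter_mut().zip(ks.iter()) {
            *d ^= k;
        }
    }
}

/// Streaming CTR state: unused keystream from the last block plus a cursor.
struct ToyCtrStream {
    key: [u8; 16],
    counter: [u8; 16],
    next_block: u128,   // index of the next keystream block to generate
    leftover: [u8; 16], // keystream of the most recent partial block
    pos: usize,         // cursor in 0..=16; 16 means leftover is used up
}

impl ToyCtrStream {
    fn new(key: [u8; 16], counter: [u8; 16]) -> Self {
        Self { key, counter, next_block: 0, leftover: [0; 16], pos: 16 }
    }

    fn next_keystream(&mut self) -> [u8; 16] {
        let ks = keystream_block(&self.key, &self.counter, self.next_block);
        self.next_block = self.next_block.wrapping_add(1);
        ks
    }

    fn update(&mut self, data: &mut [u8]) {
        // 1. Drain keystream left over from the previous call.
        let take = (16 - self.pos).min(data.len());
        let (head, rest) = data.split_at_mut(take);
        for (d, k) in head.iter_mut().zip(&self.leftover[self.pos..self.pos + take]) {
            *d ^= k;
        }
        self.pos += take;
        // 2. Aligned middle: whole blocks, where a batched path could apply.
        let mid_len = rest.len() / 16 * 16;
        let (mid, tail) = rest.split_at_mut(mid_len);
        for chunk in mid.chunks_exact_mut(16) {
            let ks = self.next_keystream();
            for (d, k) in chunk.iter_mut().zip(ks.iter()) {
                *d ^= k;
            }
        }
        // 3. Tail: generate one more block and keep the unused keystream.
        if !tail.is_empty() {
            self.leftover = self.next_keystream();
            for (d, k) in tail.iter_mut().zip(self.leftover.iter()) {
                *d ^= k;
            }
            self.pos = tail.len();
        }
    }
}
```

Feeding a message through `update` in unaligned chunks should produce the same bytes as one `ctr_xor` call, and applying `ctr_xor` a second time restores the plaintext. In the crate itself, step 2 is where `Sm4Cipher::encrypt_blocks` is described as taking over for SIMD fanout.
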
**No public API breakage — purely additive.** v0.6.0 callers can
`cargo update` to v0.7.0 without migration.
Everything v0.4 shipped (`wasm32-unknown-unknown` build, RustCrypto
trait fit behind `digest-traits` / `cipher-traits`, bitsliced SM4
S-box behind `sm4-bitsliced`, `gmcrypto-c` C ABI crate) is unchanged
— see the Roadmap row for the compact reference and `CHANGELOG.md`
`[0.4.0]` for detail.
Everything v0.3 shipped is unchanged:
- Reusable strict-canonical DER reader / writer subset
(`gmcrypto_core::asn1::{reader, writer, oid}`).
- PEM + encrypted PKCS#8 + X.509 SPKI + SEC1 codecs
(`gmcrypto_core::{pem, pkcs8, spki, sec1}`).
- Full bidirectional gmssl 3.1.1 interop (SM2 sign / verify, SM2
encrypt / decrypt, SM4-CBC). Gated on `GMCRYPTO_GMSSL=1`.
- Raw byte-concat SM2 ciphertext helpers
(`gmcrypto_core::sm2::raw_ciphertext`): `C1 || C3 || C2`
emit + decode; legacy `C1 || C2 || C3` decrypt-only.
- Streaming `HmacSm3` + `Sm4Cbc{En,De}cryptor`. In-crate
`Hash` / `Mac` / `BlockCipher` traits (`gmcrypto_core::traits`).
- Comb-table `mul_g` (~5× sign-side speedup). 64 sub-tables of 16
entries each, lazily built once per process via `spin::Once`.
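
One plausible reading of "64 sub-tables of 16 entries each" is a fixed 4-bit window over a 256-bit scalar: sub-table `i` holds `j * 16^i * G` for `j` in `0..16`, so a fixed-base multiplication becomes 64 table lookups and 64 additions. The sketch below illustrates that layout under stated assumptions. It uses integers modulo a prime under addition as a toy group in place of SM2 curve points, `std::sync::OnceLock` in place of `spin::Once`, and a mask-based linear scan in place of `subtle`. All names are hypothetical and none of this is the crate's code.

```rust
use std::sync::OnceLock;

const P: u64 = (1 << 61) - 1; // toy group: integers mod P under addition
const G: u64 = 7;             // hypothetical "base point" in the toy group

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

/// Build 64 sub-tables of 16 entries: table[i][j] = j * 16^i * G.
fn build_tables() -> Vec<[u64; 16]> {
    let mut tables = Vec::with_capacity(64);
    let mut base = G % P; // holds 16^i * G
    for _ in 0..64 {
        let mut row = [0u64; 16];
        for j in 1..16 {
            row[j] = add(row[j - 1], base);
        }
        tables.push(row);
        base = add(row[15], base); // 15 * base + base = 16 * base
    }
    tables
}

/// Built lazily, once per process.
fn tables() -> &'static Vec<[u64; 16]> {
    static TABLES: OnceLock<Vec<[u64; 16]>> = OnceLock::new();
    TABLES.get_or_init(build_tables)
}

/// Linear-scan select: touch every entry and keep the one at `idx` by masking,
/// so the memory access pattern does not depend on the secret index.
fn select(row: &[u64; 16], idx: u8) -> u64 {
    let mut out = 0u64;
    for (j, entry) in row.iter().enumerate() {
        let x = ((j as u8) ^ idx) as u64;       // 0 iff j == idx
        let is_eq = x.wrapping_sub(1) >> 63;    // 1 iff x == 0
        out |= entry & 0u64.wrapping_sub(is_eq); // all-ones mask when equal
    }
    out
}

/// Fixed-base multiply: one lookup and one addition per 4-bit window.
fn mul_g(scalar_be: &[u8; 32]) -> u64 {
    let t = tables();
    let mut acc = 0u64;
    for i in 0..64 {
        // Window i covers bits 4i..4i+4, counted from the least significant end.
        let byte = scalar_be[31 - i / 2];
        let nibble = if i % 2 == 0 { byte & 0x0F } else { byte >> 4 };
        acc = add(acc, select(&t[i], nibble));
    }
    acc
}

/// Naive reference for comparison: reduce the scalar mod P, then multiply.
fn mul_g_reference(scalar_be: &[u8; 32]) -> u64 {
    let mut k: u128 = 0;
    for &b in scalar_be {
        k = (k * 256 + b as u128) % P as u128;
    }
    ((k * G as u128) % P as u128) as u64
}
```

The design trade is memory for doublings: the tables cost 1024 precomputed elements, and in exchange the per-call loop has no doubling step at all.
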
Everything v0.2 shipped is unchanged:
- SM3 hash function (`#![no_std]` + `alloc`).
- SM2 sign / verify with custom signer ID (default `1234567812345678` per GM/T 0009).
- SM2 public-key encrypt / decrypt with GM/T 0009-2012 ciphertext DER
(`SEQUENCE { x, y, hash, ciphertext }`). Invalid-curve attack defense
via on-curve check on `C1` before scalar mult; non-branching
KDF-zero detection so a chosen-ciphertext attacker cannot distinguish
it from a normal MAC failure.
- SM4 block cipher (GB/T 32907-2016) and SM4-CBC (PKCS#7 padding,
caller-supplied unpredictable IV per NIST SP 800-38A Appendix C).
Constant-time-designed `subtle` linear-scan S-box (~1-2M blocks/s);
opt-in bitsliced (table-less, gate-only) S-box via the
`sm4-bitsliced` feature (v0.4 W3). PKCS#7 strip uses a
constant-time scan over the final block; `decrypt` collapses every
failure mode to a single `None` against padding-oracle attacks.
- HMAC-SM3 per RFC 2104, gmssl-cross-validated KAT vectors. Hash-first
long-key path. v0.3 adds the streaming `HmacSm3` shape alongside
single-shot `hmac_sm3`.
- PBKDF2-HMAC-SM3 per RFC 8018 §5.2. Caller-supplied output buffer
(no internal allocation, no iteration-count default).
- Constant-time-designed `Fp` and `Fn` field arithmetic via
`crypto-bigint = 0.7.3`.
- Renes-Costello-Batina complete addition formulas for the SM2 curve (a=-3 specialized).
- Fixed-base (v0.3 comb-table) and variable-base scalar multiplication,
both constant-time-designed with `subtle::ConditionallySelectable`
linear-scan table lookup.
- Fixed-K masked-select signing retry: the retry loop runs `K=2` iterations
unconditionally, regardless of which iteration produced a valid signature.
The constant-time contract holds for any RNG that respects `CryptoRng`;
pathological RNGs cannot leak the secret via observable retry count.
- Strict canonical ASN.1 DER for `SEQUENCE { r, s }` (signatures), the
GM/T 0009 SM2 ciphertext SEQUENCE, and all v0.3 PEM / PKCS#8 / SPKI
/ SEC1 wire formats. Rejects non-canonical leading-zero padding,
sign-bit-set first bytes, empty content, and (for ciphertext
coordinates) values `≥ p`.
- KAT vectors from GB/T 32905-2016 (SM3), GB/T 32918.2-2017 / .5-2017
(SM2), GB/T 32907-2016 Appendix A.1 (SM4 single-block + 1M-round),
GM/T 0042-2015 (HMAC-SM3), GM/T 0091-2020 (PBKDF2-HMAC-SM3).
- `gmssl` CLI cross-validation for HMAC-SM3, PBKDF2-HMAC-SM3, and
(new in v0.3) SM2 sign/verify, SM2 encrypt/decrypt, and SM4-CBC
in both directions. Gated on `GMCRYPTO_GMSSL=1`.
- `dudect-bencher` harness — 14 real `ct_*` targets (12 always-on + 2
cfg-gated under `sm4-bitsliced-simd`) plus a deliberately-leaky
`negative_control` that proves the harness can detect leaks.
Matrix-run under `features=default`, `sm4-bitsliced`, and
`sm4-bitsliced-simd` — PR-smoke 10⁴ samples; nightly 10⁵ samples
(more samples = tighter empirical confidence at the same threshold).
Most real targets gate at `|tau| < 0.20`; per-target policy in
[`SECURITY.md`](SECURITY.md).
- Failure-mode invariant: every `Result`-returning public API uses
the workspace-wide `gmcrypto_core::Error` (single `Failed` variant,
`#[non_exhaustive]`); per-module aliases `sm2::Error`, `pem::Error`,
`pkcs8::Error` all point at the same type. `verify_with_id` returns
`bool`; DER decode returns `Option`. Defense against padding-oracle,
malleability, and invalid-curve attacks.
- Zeroization on private keys, SM4 round keys, HMAC `K'` /
`K' XOR ipad` / `K' XOR opad`, PBKDF2 intermediates, SM2 KDF
buffers, and PKCS#8 inner-key scratch.
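
As an illustration of the "constant-time scan over the final block" idea from the SM4-CBC bullet above, here is a minimal std-only sketch of a PKCS#7 padding check that visits all 16 bytes with no early exit. It is not the crate's implementation (which is described as using `subtle`), the function names are hypothetical, and plain Rust gives no guarantee that the compiler preserves the branch-free shape.

```rust
/// 0xFF if a < b, 0x00 otherwise, without branching on the values.
fn lt_mask(a: u8, b: u8) -> u8 {
    // In 16-bit arithmetic a - b borrows into the high byte exactly when a < b.
    ((a as u16).wrapping_sub(b as u16) >> 8) as u8
}

/// Returns the number of plaintext bytes in the final block, or `None` if the
/// PKCS#7 padding is malformed. All failure causes collapse into one `None`.
fn pkcs7_payload_len(block: &[u8; 16]) -> Option<usize> {
    let pad = block[15]; // claimed padding length; valid range is 1..=16
    let mut bad: u8 = 0; // accumulates nonzero bits on any violation

    bad |= lt_mask(pad, 1);  // pad == 0 is invalid
    bad |= lt_mask(16, pad); // pad > 16 is invalid

    for i in 0..16u8 {
        // Counting from the end, position i is padding exactly when i < pad.
        let in_pad = lt_mask(i, pad);
        let byte = block[15 - i as usize];
        // Inside the padding region every byte must equal `pad`.
        bad |= in_pad & (byte ^ pad);
    }

    if bad == 0 { Some(16 - pad as usize) } else { None }
}
```

The final branch depends only on whether the padding was valid, which the caller learns anyway from the single `None` result. What the scan avoids is revealing *which* byte failed or how long the padding claimed to be.
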
## Roadmap
| Version | Scope |
|---|---|
| v0.2 (shipped) | SM4 + SM4-CBC, HMAC-SM3, PBKDF2-HMAC-SM3, SM2 encrypt/decrypt + GM/T 0009 ciphertext DER, dudect harness expansion to 11 targets. See [`CHANGELOG.md`](CHANGELOG.md) `[0.2.0]`. |
| v0.3 (shipped) | Reusable ASN.1 reader/writer subset; PEM, encrypted PKCS#8, X.509 SPKI, SEC1; full bidirectional gmssl interop (incl. SM2 sign/verify + SM2 encrypt/decrypt with PEM-wrapped keys + SM4-CBC); raw byte-concat ciphertext helpers (`C1\|\|C3\|\|C2` modern + legacy `C1\|\|C2\|\|C3` decrypt); streaming `HmacSm3` / `Sm4CbcEncryptor` / `Sm4CbcDecryptor` + in-crate `Hash`/`Mac`/`BlockCipher` traits; comb-table `mul_g` (~5× sign-side speedup); dudect harness expanded to 12 targets. See [`CHANGELOG.md`](CHANGELOG.md) `[0.3.0]`. |
| v0.4 (shipped) | `wasm32-unknown-unknown` build target; RustCrypto-trait fit (`digest::Digest` / `digest::Mac` / `cipher::BlockEncrypt`/`BlockDecrypt`) behind opt-in `digest-traits` / `cipher-traits` feature flags; bitsliced (table-less, gate-only) SM4 S-box behind the opt-in `sm4-bitsliced` feature; new `gmcrypto-c` workspace member exposing the SM2/SM3/SM4/HMAC/PBKDF2 surface as a C ABI (cdylib + staticlib + cbindgen-generated header). See [`CHANGELOG.md`](CHANGELOG.md) `[0.4.0]`. |
| v0.5.0 (shipped) | C-ABI completeness (streaming CBC + raw-byte SM2 ciphertext + caller-supplied RNG callback); `sm4-bitsliced-simd` feature-flag scaffolding — v0.5.0 ships no SIMD fast path (the feature transparently delegates to the v0.4 single-block bitslice); BREAKING ergonomic cleanup — workspace-wide `gmcrypto_core::Error`, `Sm2PrivateKey::new(U256)` → `from_scalar(U256)` (gated behind `crypto-bigint-scalar`) + always-on `from_bytes_be(&[u8; 32])` constructor, `std` feature removed. See [`CHANGELOG.md`](CHANGELOG.md) `[0.5.0]`. |
| v0.5.1 (shipped) | W4 phase 2 — new sibling crate `gmcrypto-simd` carrying an **AVX2 8-way packed bitsliced SM4 S-box** behind opt-in `sm4-bitsliced-simd`, with runtime CPU detection (`cpufeatures`) and silent scalar fallback on non-AVX2 hosts. v0.5.1's `tau` dispatch fed the AVX2 path with 7 wasted lanes; production throughput matched v0.4 single-block bitslice. Dudect calibration update — `ct_fn_invert` / `ct_fp_invert` moved to PR-smoke telemetry + 100K nightly gross-regression sentinel after a GH Actions `ubuntu-24.04` runner-image shift on 2026-05-12 raised the empirical noise floor; see `docs/v0.5-dudect-recalibration.md`. See [`CHANGELOG.md`](CHANGELOG.md) `[0.5.1]`. |
| v0.6.0 (shipped) | **W4 milestone close-out — the throughput-win release.** W4 phase 3: NEON 4-way bitsliced SM4 on `aarch64` (compile-time baseline) + AVX2 32-byte full-width packed S-box (`sbox_x32`) + `Sm4CbcDecryptor::process_chunk` SIMD fanout. Per round of the SM4 decrypt, batched blocks' `tau` inputs pack into one SIMD register (32 bytes on x86_64 / 8-block batch, 16 bytes on aarch64 / 4-block batch) — 32× fewer SIMD dispatches per 8-block batch than v0.5.1. CBC encryption stays single-block (chain-of-blocks defeats SIMD packing). New dudect target `ct_sm4_cbc_decrypt_fanout` (Q6.7) gates the fanout path at `\|tau\| < 0.20`. Exhaustive lane-position-shifted SIMD tests (8192 + 4096 cases) per Q6.8. **No public API changes; no breaking changes — additive only.** See [`CHANGELOG.md`](CHANGELOG.md) `[0.6.0]` and `docs/v0.6-scope.md`. |
| v0.7.0 (shipping) | **Cipher-mode surface expansion.** First version where v0.6's SIMD machinery is callable from user code outside the CBC-decrypt internal path. New: public length-flexible `Sm4Cipher::encrypt_blocks` / `decrypt_blocks` (W1; Q7.7); single-shot `sm4::mode_ctr::encrypt` / `decrypt` (W2; GM/T 0002-2012 §5.4); streaming `sm4::ctr_streaming::Sm4CtrCipher` (W3); new dudect target `ct_sm4_ctr_encrypt` (gates `\|tau\| < 0.20` on every cipher path). Plus the v0.8 AEAD scope doc (`docs/v0.7-aead-scope.md`, Q8.1–Q8.8 sign-off + v0.9 candidate Q-list). **No public API breakage — additive only.** See [`CHANGELOG.md`](CHANGELOG.md) `[0.7.0]`. |
| v0.8+ | AEAD per `docs/v0.7-aead-scope.md` — SM4-GCM + SM4-CCM with constant-time GHASH (CLMUL on `x86_64` via `gmcrypto-simd`, NEON `pmull` on `aarch64`, Karatsuba software fallback elsewhere) + constant-time tag compare + bidirectional gmssl interop + two new dudect targets. Behind opt-in `sm4-aead` feature flag. Other v0.8+ candidates: pinned / noise-isolated dudect runner; AVX-512 16-way `sbox_x64`; RustCrypto `digest = 0.11` / `cipher = 0.5` / `aead = 0.6` migration; `wasm-bindgen-test` KAT runner; streaming AEAD. Each lands behind its own scope-doc cycle. |
| v1.0 | API stabilization. |
## Quick-start
```rust
use gmcrypto_core::sm2::{
sign_with_id, verify_with_id, Sm2PrivateKey, Sm2PublicKey, DEFAULT_SIGNER_ID,
};
use getrandom::SysRng;
use hex_literal::hex;
use rand_core::UnwrapErr;
// v0.5 W5 — `from_bytes_be` is the recommended public constructor
// (always-on, doesn't expose `crypto_bigint::U256` to callers).
let d_be: [u8; 32] = hex!(
"3945208F7B2144B13F36E38AC6D39F95889393692860B51A42FB81EF4DF7C5B8"
);
let key = Sm2PrivateKey::from_bytes_be(&d_be).expect("d in [1, n-2]");
let public = Sm2PublicKey::from_point(key.public_key());
let mut rng = UnwrapErr(SysRng);
let sig = sign_with_id(&key, DEFAULT_SIGNER_ID, b"hello", &mut rng).unwrap();
assert!(verify_with_id(&public, DEFAULT_SIGNER_ID, b"hello", &sig));
```
## Threat model
See [`SECURITY.md`](SECURITY.md). Briefly: server-side use, dedicated host,
operator-trusted, network MITM in scope, side-channel attacks beyond what the
dudect harness covers are NOT in scope.
## Build & test
```bash
cargo test --workspace # unit + integration
cargo bench --bench timing_leaks --features crypto-bigint-scalar # local timing harness (~75s)
DUDECT_SAMPLES=10000 cargo bench --bench timing_leaks --features crypto-bigint-scalar # match CI smoke budget
```
`gmssl` interop test (gated; install [`gmssl`](https://github.com/guanzhi/GmSSL)
v3.1.1 to enable):
```bash
GMCRYPTO_GMSSL=1 cargo test --test interop_gmssl
```
## wasm32 support
`gmcrypto-core` builds on `wasm32-unknown-unknown` as of v0.4. CI gates
both stable and MSRV (1.85) builds on the target.
```bash
rustup target add wasm32-unknown-unknown
cargo build -p gmcrypto-core --target wasm32-unknown-unknown --no-default-features
```
The crate is `no_std + alloc` only and does NOT pull `getrandom`'s
`wasm_js` backend or `wasm-bindgen` / `js-sys` into its default dep
graph. Wasm callers wire their own `rand_core::Rng` impl — typically
by enabling `getrandom`'s `wasm_js` feature in *their* `Cargo.toml`:
```toml
[dependencies]
gmcrypto-core = "0.7"
rand_core = { version = "0.10", default-features = false }
getrandom = { version = "0.4", default-features = false, features = ["wasm_js"] }
```
```rust
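use gmcrypto_core::sm2::{sign_with_id, Sm2PrivateKey, DEFAULT_SIGNER_ID};
use getrandom::SysRng;
use hex_literal::hex;
use rand_core::UnwrapErr;
// Test key from the Quick-start (needs `hex-literal`); load real key material in practice.
let priv_key = Sm2PrivateKey::from_bytes_be(&hex!(
    "3945208F7B2144B13F36E38AC6D39F95889393692860B51A42FB81EF4DF7C5B8"
)).expect("d in [1, n-2]");
let mut rng = UnwrapErr(SysRng); // wasm_js-backed when targeting wasm32
let sig = sign_with_id(&priv_key, DEFAULT_SIGNER_ID, b"msg", &mut rng).unwrap();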
```
A `wasm-bindgen-test`-driven test runner (running KAT vectors under
Node or a headless browser) has not shipped yet; the Roadmap lists it as a
v0.8+ candidate. So far only the build-target gate from v0.4 is in place.
## License
Apache-2.0. See [`LICENSE`](LICENSE).
Some reference outputs use the upstream [`gmssl`](https://github.com/guanzhi/GmSSL)
tool. This project is independent of that project.