gmcrypto-core 0.5.1

Constant-time-designed pure-Rust SM2/SM3 primitives (no_std + alloc) with an in-CI dudect timing-leak regression harness
# gm-crypto-rs

Constant-time-designed pure-Rust SM2 / SM3 / SM4 SDK for Chinese national
cryptography (GB/T 32905 / 32918 / 32907 / GM/T 0009). Sign / verify,
public-key encrypt / decrypt, SM4-CBC, HMAC-SM3, PBKDF2-HMAC-SM3 — all
secret-touching paths guarded by an in-CI `dudect-bencher`
detectable-leak regression harness.

[![Crates.io](https://img.shields.io/crates/v/gmcrypto-core.svg)](https://crates.io/crates/gmcrypto-core)
[![Documentation](https://docs.rs/gmcrypto-core/badge.svg)](https://docs.rs/gmcrypto-core)
[![License](https://img.shields.io/crates/l/gmcrypto-core.svg)](https://crates.io/crates/gmcrypto-core)

**Personal project notice:** not affiliated with, endorsed by, sponsored by, or
certified by any upstream cryptography project, payment gateway, standards body,
or vendor.

## What this is

A small, auditable, pure-Rust SM2 / SM3 / SM4 SDK whose central
differentiating commitment is that secret-touching code paths are
**constant-time-designed and guarded by an in-CI [`dudect-bencher`](https://docs.rs/dudect-bencher/)
detectable-leak regression harness** with 12 gates at `|tau| < 0.20`.

The harness reports timing-leak detection events. **It does not prove
constant-time behavior.** A low `|tau|` value means the test could not detect
a leak within the given sample budget, not that no leak exists. This wording
follows `dudect-bencher`'s own docs.
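
To make the `|tau|` gate concrete, here is a minimal sketch of the kind of statistic involved: Welch's t computed over two timing classes, then scaled by the square root of the total sample count. The function names, the sample data, and the exact scaling are assumptions for illustration; this is not `dudect-bencher`'s API and may differ from its internals.

```rust
/// Mean and unbiased sample variance of a slice of timings.
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t-statistic between two timing classes.
fn welch_t(left: &[f64], right: &[f64]) -> f64 {
    let (m0, v0) = mean_var(left);
    let (m1, v1) = mean_var(right);
    (m0 - m1) / (v0 / left.len() as f64 + v1 / right.len() as f64).sqrt()
}

/// t scaled by the total sample count (assumed definition of tau).
fn tau(left: &[f64], right: &[f64]) -> f64 {
    welch_t(left, right) / ((left.len() + right.len()) as f64).sqrt()
}

fn main() {
    // Hypothetical timings (arbitrary units) for two input classes.
    let class_a = [100.0, 102.0, 98.0, 101.0, 99.0];
    let class_b = [99.0, 101.0, 100.0, 98.0, 102.0];
    let leaky_b = [120.0, 122.0, 118.0, 121.0, 119.0];

    println!("similar classes: |tau| = {:.3}", tau(&class_a, &class_b).abs());
    println!("shifted class:   |tau| = {:.3}", tau(&class_a, &leaky_b).abs());
}
```

A gate such as `|tau| < 0.20` passes the first pair and fails the second. With only a handful of samples, a pass says little, which is the caveat stated above.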

v0.4's harness covers the same 12 secret-touching code paths as v0.3 —
no new dudect targets in v0.4. The W3 bitsliced SM4 S-box runs under
its own matrix entry in CI (`features=sm4-bitsliced`), so the
`ct_sm4_key_schedule` and `ct_sm4_encrypt_block` targets are gated
under both feature configurations. The harness covers: SM2 sign (split
by both private key `d` and nonce `k` magnitude, with both retry
nonces class-tied), SM2 decrypt (split by recipient `d_B`), SM4 key
schedule and encrypt (split by master key, under both the default
linear-scan and v0.4 bitsliced S-box), HMAC-SM3 (split by key),
encrypted-PKCS#8 decrypt (split by password bytes — both classes' blobs
valid for their class's password so both succeed via identical control
flow), plus direct `Fn::invert` and `Fp::invert` diagnostics. The
`ct_sign_k_class` target closes v0.1's structural blind spot to
nonce-only leaks. The `crypto-bigint 0.6 → 0.7.3` upgrade resolved the
0.6-era `ConstMontyForm::invert` leak directly: at 100K samples on
0.7.3 both direct invert diagnostics measure `|tau|` below roughly
0.01, more than an order of magnitude below the `0.20` gate. See
[`SECURITY.md`](SECURITY.md) for the full posture.

The differentiator vs. existing Rust SM2 crates (notably
[`RustCrypto/sm2`](https://docs.rs/sm2/), which already aims for constant-time
secret-dependent operations in its design) is **the in-CI regression gate**, not
the design intent in isolation.

## What this isn't

- Not a TLS/TLCP implementation.
- Not SM9, ZUC, or post-quantum.
- Not an HSM/SDF/SKF integration.
- Not a certified cryptographic module.
- Not constant-time on CPUs with data-dependent multiply latencies (some older
  x86, some embedded).
- Not a comprehensive SM-crypto library yet — see the milestone roadmap.

## v0.4 scope (shipped)

Builds on v0.3 (PEM, encrypted PKCS#8, SPKI, SEC1, bidirectional gmssl
interop, raw byte-concat ciphertext, streaming `HmacSm3` and
`Sm4Cbc{En,De}cryptor`, in-crate `Hash` / `Mac` / `BlockCipher`
traits, comb-table `mul_g`). v0.4 adds the platform-fit and
ergonomics work v0.3 deferred:

- **`wasm32-unknown-unknown` build target** — W1. CI gates both stable
  and MSRV (1.85) on the target. The crate is `no_std + alloc` only
  and does NOT pull `getrandom`'s `wasm_js` backend or
  `wasm-bindgen` / `js-sys` into its default dep graph; wasm callers
  wire their own `rand_core::Rng` impl. See [`wasm32 support`](#wasm32-support)
  below.
- **RustCrypto-trait fit** — W2. Opt-in via the `digest-traits` and
  `cipher-traits` features. Implements `digest::Digest` for `Sm3`,
  `digest::Mac` for `HmacSm3`, and
  `cipher::{BlockEncrypt, BlockDecrypt, KeyInit}` for `Sm4Cipher`.
  Default-features build is unchanged: no extra runtime deps. Two
  separate flags so callers needing only one half don't pay for both.
- **Bitsliced SM4 S-box** — W3. Opt-in via the `sm4-bitsliced`
  feature. Boyar-Peralta-style Itoh-Tsujii inversion in GF(2^8)
  plus two affine transformations — table-less, gate-only,
  constant-time by construction. Byte-identical output to the
  default linear-scan path (exhaustive 256-input equivalence test).
  Single-block only in v0.4; multi-block SIMD-packed bitslicing
  deferred to v0.5+.
- **`gmcrypto-c` C ABI crate** — W4. New workspace member building
  `cdylib` + `staticlib` + `rlib`; cbindgen-generated C header at
  [`crates/gmcrypto-c/include/gmcrypto.h`](crates/gmcrypto-c/include/gmcrypto.h)
  is committed and CI gates header drift via `git diff --exit-code`.
  31 FFI entry points covering SM3 (single-shot + streaming), HMAC-SM3
  (with constant-time `verify`), PBKDF2-HMAC-SM3, SM4 (block + CBC),
  and SM2 (keys + sign/verify + encrypt/decrypt + PKCS#8). Opaque
  handle pattern (`Box::into_raw` + `ZeroizeOnDrop`); every error
  path collapses to a single `GMCRYPTO_FAILED` return code per the
  failure-mode invariant. `unsafe_code = "forbid"` remains in force
  on `gmcrypto-core` (no change); `gmcrypto-c` necessarily uses
  `unsafe` for raw-pointer FFI primitives, with a `// SAFETY:` comment
  on every block.
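
The opaque-handle and single-failure-code pattern described for `gmcrypto-c` can be sketched as follows. Every name here (`DemoKey`, `demo_key_new`, `demo_key_verify`, `demo_key_free`, `DEMO_OK`, `DEMO_FAILED`) is hypothetical and is not the crate's actual C ABI. The sketch uses only `std`, so a manual volatile wipe in `Drop` stands in for `ZeroizeOnDrop`, and the symbol-export attribute a real `cdylib` would need is left out.

```rust
use std::os::raw::c_int;
use std::{ptr, slice};

pub const DEMO_OK: c_int = 0;
pub const DEMO_FAILED: c_int = -1; // the only failure code, whatever went wrong

/// Hypothetical secret-bearing state hidden behind the opaque pointer.
pub struct DemoKey {
    bytes: [u8; 32],
}

impl Drop for DemoKey {
    fn drop(&mut self) {
        for b in self.bytes.iter_mut() {
            // SAFETY: `b` is a valid, aligned, exclusive reference to a `u8`.
            unsafe { ptr::write_volatile(b as *mut u8, 0) };
        }
    }
}

/// Allocates a handle; returns null on any invalid input.
pub unsafe extern "C" fn demo_key_new(src: *const u8, len: usize) -> *mut DemoKey {
    if src.is_null() || len != 32 {
        return ptr::null_mut();
    }
    let mut bytes = [0u8; 32];
    // SAFETY: caller promises `src` points at `len` readable bytes; `len == 32` was checked.
    unsafe { ptr::copy_nonoverlapping(src, bytes.as_mut_ptr(), 32) };
    Box::into_raw(Box::new(DemoKey { bytes }))
}

/// Compares a 32-byte tag against the stored bytes without an early exit.
pub unsafe extern "C" fn demo_key_verify(key: *const DemoKey, tag: *const u8, len: usize) -> c_int {
    if key.is_null() || tag.is_null() || len != 32 {
        return DEMO_FAILED;
    }
    // SAFETY: `key` came from `demo_key_new`; `tag` points at 32 readable bytes.
    let (key, tag) = unsafe { (&*key, slice::from_raw_parts(tag, 32)) };
    // OR together all byte differences; the loop length never depends on the data.
    let diff = key.bytes.iter().zip(tag).fold(0u8, |acc, (a, b)| acc | (a ^ b));
    if diff == 0 { DEMO_OK } else { DEMO_FAILED }
}

/// Frees a handle; null is accepted and ignored.
pub unsafe extern "C" fn demo_key_free(key: *mut DemoKey) {
    if !key.is_null() {
        // SAFETY: `key` came from `Box::into_raw` in `demo_key_new` and is freed once.
        drop(unsafe { Box::from_raw(key) });
    }
}

fn main() {
    let secret = [7u8; 32];
    unsafe {
        let handle = demo_key_new(secret.as_ptr(), secret.len());
        assert!(!handle.is_null());
        assert_eq!(demo_key_verify(handle, secret.as_ptr(), 32), DEMO_OK);
        demo_key_free(handle); // runs `Drop`, which wipes the bytes
    }
}
```

The fold-based comparison shows the shape of a comparison with no early exit; a compiler may still transform it, which is why timing claims need a measurement harness rather than source inspection alone.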

Everything v0.3 shipped is unchanged:

- Reusable strict-canonical DER reader / writer subset
  (`gmcrypto_core::asn1::{reader, writer, oid}`).
- PEM + encrypted PKCS#8 + X.509 SPKI + SEC1 codecs
  (`gmcrypto_core::{pem, pkcs8, spki, sec1}`).
- Full bidirectional gmssl 3.1.1 interop (SM2 sign / verify, SM2
  encrypt / decrypt, SM4-CBC). Gated on `GMCRYPTO_GMSSL=1`.
- Raw byte-concat SM2 ciphertext helpers
  (`gmcrypto_core::sm2::raw_ciphertext`): `C1 || C3 || C2`
  emit + decode; legacy `C1 || C2 || C3` decrypt-only.
- Streaming `HmacSm3` + `Sm4Cbc{En,De}cryptor`. In-crate
  `Hash` / `Mac` / `BlockCipher` traits (`gmcrypto_core::traits`).
- Comb-table `mul_g` (~5× sign-side speedup). 64 sub-tables of 16
  entries each, lazily built once per process via `spin::Once`.
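
The comb-table idea in the last bullet can be illustrated with a toy group. This sketch replaces SM2 curve points with integers under addition modulo a small modulus and uses 64-bit scalars, so it has 16 sub-tables of 16 entries rather than 64. The constants and function names are assumptions for illustration, and `std::sync::OnceLock` stands in for `spin::Once`. It is not the crate's implementation.

```rust
use std::sync::OnceLock;

const P: u64 = 0xFFFF_FFFB; // toy modulus; the "group" is addition mod P
const G: u64 = 5; // toy base element
const WINDOWS: usize = 16; // 64-bit scalar split into 4-bit windows

static TABLE: OnceLock<[[u64; 16]; WINDOWS]> = OnceLock::new();

/// table[i][j] = j * 16^i * G (mod P), so each window needs one lookup.
fn build_table() -> [[u64; 16]; WINDOWS] {
    let mut table = [[0u64; 16]; WINDOWS];
    let mut base = G % P; // 16^i * G for the current window
    for sub in table.iter_mut() {
        for j in 1..16 {
            sub[j] = (sub[j - 1] + base) % P;
        }
        base = (base * 16) % P;
    }
    table
}

/// Built lazily, once per process.
fn table() -> &'static [[u64; 16]; WINDOWS] {
    TABLE.get_or_init(build_table)
}

/// Linear-scan select: reads every entry, keeps the one whose index matches.
fn ct_select(sub: &[u64; 16], idx: u64) -> u64 {
    let mut out = 0u64;
    for (j, &entry) in sub.iter().enumerate() {
        let diff = (j as u64) ^ idx;
        // mask is all ones when j == idx, zero otherwise
        let mask = ((diff | diff.wrapping_neg()) >> 63).wrapping_sub(1);
        out |= entry & mask;
    }
    out
}

/// k * G as a sum of one table entry per 4-bit window of k.
fn mul_g(table: &[[u64; 16]; WINDOWS], k: u64) -> u64 {
    let mut acc = 0u64;
    for (i, sub) in table.iter().enumerate() {
        let nibble = (k >> (4 * i)) & 0xF;
        acc = (acc + ct_select(sub, nibble)) % P;
    }
    acc
}

fn main() {
    let k = 0x0123_4567_89AB_CDEF_u64;
    let expected = ((k as u128 * G as u128) % P as u128) as u64;
    assert_eq!(mul_g(table(), k), expected);
    println!("k*G mod P = {}", mul_g(table(), k));
}
```

The speedup comes from trading doublings for precomputed entries: the table already holds every multiple of `16^i * G` a window can request, so the multiplication is one lookup and one addition per window.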

Everything v0.2 shipped is unchanged:

- SM3 hash function (`#![no_std]` + `alloc`).
- SM2 sign / verify with custom signer ID (default `1234567812345678` per GM/T 0009).
- SM2 public-key encrypt / decrypt with GM/T 0009-2012 ciphertext DER
  (`SEQUENCE { x, y, hash, ciphertext }`). Invalid-curve attack defense
  via on-curve check on `C1` before scalar mult; non-branching
  KDF-zero detection so a chosen-ciphertext attacker cannot distinguish
  it from a normal MAC failure.
- SM4 block cipher (GB/T 32907-2016) and SM4-CBC (PKCS#7 padding,
  caller-supplied unpredictable IV per NIST SP 800-38A Appendix C).
  Constant-time-designed `subtle` linear-scan S-box (~1-2M blocks/s);
  opt-in bitsliced (table-less, gate-only) S-box via the
  `sm4-bitsliced` feature (v0.4 W3). PKCS#7 strip uses a
  constant-time scan over the final block; `decrypt` collapses every
  failure mode to a single `None` against padding-oracle attacks.
- HMAC-SM3 per RFC 2104, gmssl-cross-validated KAT vectors. Hash-first
  long-key path. v0.3 adds the streaming `HmacSm3` shape alongside
  single-shot `hmac_sm3`.
- PBKDF2-HMAC-SM3 per RFC 8018 §5.2. Caller-supplied output buffer
  (no internal allocation, no iteration-count default).
- Constant-time-designed `Fp` and `Fn` field arithmetic via
  `crypto-bigint = 0.7.3`.
- Renes-Costello-Batina complete addition formulas for the SM2 curve (a=-3 specialized).
- Fixed-base (v0.3 comb-table) and variable-base scalar multiplication,
  both constant-time-designed with `subtle::ConditionallySelectable`
  linear-scan table lookup.
- Fixed-K masked-select signing retry: the retry loop runs `K=2` iterations
  unconditionally, regardless of which iteration produced a valid signature.
  The constant-time contract holds for any RNG that respects `CryptoRng`;
  pathological RNGs cannot leak the secret via observable retry count.
- Strict canonical ASN.1 DER for `SEQUENCE { r, s }` (signatures), the
  GM/T 0009 SM2 ciphertext SEQUENCE, and all v0.3 PEM / PKCS#8 / SPKI
  / SEC1 wire formats. Rejects non-canonical leading-zero padding,
  sign-bit-set first bytes, empty content, and (for ciphertext
  coordinates) values `≥ p`.
- KAT vectors from GB/T 32905-2016 (SM3), GB/T 32918.2-2017 / .5-2017
  (SM2), GB/T 32907-2016 Appendix A.1 (SM4 single-block + 1M-round),
  GM/T 0042-2015 (HMAC-SM3), GM/T 0091-2020 (PBKDF2-HMAC-SM3).
- `gmssl` CLI cross-validation for HMAC-SM3, PBKDF2-HMAC-SM3, and
  (new in v0.3) SM2 sign/verify, SM2 encrypt/decrypt, and SM4-CBC
  in both directions. Gated on `GMCRYPTO_GMSSL=1`.
- `dudect-bencher` harness with **12 targets** at `|tau| < 0.20`,
  matrix-run under both `features=default` and `features=sm4-bitsliced`
  in v0.4 — PR-smoke 10⁴ samples; nightly 10⁵ samples (more samples =
  tighter empirical confidence at the same threshold). Plus a
  deliberately-leaky negative control that proves the harness can
  detect leaks.
- Failure-mode invariant: every `Result`-returning public API uses
  the workspace-wide `gmcrypto_core::Error` (single `Failed` variant,
  `#[non_exhaustive]`); per-module aliases `sm2::Error`, `pem::Error`,
  `pkcs8::Error` all point at the same type. `verify_with_id` returns
  `bool`; DER decode returns `Option`. Defense against padding-oracle,
  malleability, and invalid-curve attacks.
- Zeroization on private keys, SM4 round keys, HMAC `K'` /
  `K' XOR ipad` / `K' XOR opad`, PBKDF2 intermediates, SM2 KDF
  buffers, and PKCS#8 inner-key scratch.
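
As an illustration of the PKCS#7 bullet above, here is a minimal sketch of a padding check that scans the whole final block and reports every failure as the same `None`. The function names are hypothetical and the mask arithmetic is hand-rolled with `std` only, whereas the crate is described as using `subtle`. Treat it as a sketch of the technique, not the crate's code.

```rust
const BLOCK: usize = 16;

/// All ones if a < b, else zero (valid for a, b < 2^31).
fn lt_mask(a: u32, b: u32) -> u32 {
    (a.wrapping_sub(b) >> 31).wrapping_neg()
}

/// Returns the unpadded length, or `None` for any malformed input.
fn pkcs7_unpad_len(data: &[u8]) -> Option<usize> {
    // The ciphertext length is public, so branching on it is fine.
    if data.is_empty() || data.len() % BLOCK != 0 {
        return None;
    }
    let last = &data[data.len() - BLOCK..];
    let pad = last[BLOCK - 1] as u32;

    // `bad` collects nonzero bits for every violation; no early exit.
    let mut bad = lt_mask(pad, 1); // pad == 0
    bad |= lt_mask(BLOCK as u32, pad); // pad > 16
    for (i, &b) in last.iter().enumerate() {
        // Byte i belongs to the padding iff i >= 16 - pad, i.e. 15 - i < pad.
        let in_pad = lt_mask((BLOCK - 1 - i) as u32, pad);
        bad |= in_pad & (b as u32 ^ pad);
    }

    // One public outcome: valid or not. All failure causes look the same.
    if bad == 0 {
        Some(data.len() - pad as usize)
    } else {
        None
    }
}

fn main() {
    let mut block = [0x0Bu8; BLOCK]; // 11 bytes of padding value 0x0B
    block[..5].copy_from_slice(b"hello");
    assert_eq!(pkcs7_unpad_len(&block), Some(5));

    block[14] = 0x0A; // corrupt one padding byte
    assert_eq!(pkcs7_unpad_len(&block), None);
}
```

The loop always visits all 16 bytes, so the amount of work does not depend on the padding value. Whether the compiled code stays branch-free is something only measurement can confirm.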

## Roadmap

| Version | Scope |
|---|---|
| v0.2 (shipped) | SM4 + SM4-CBC, HMAC-SM3, PBKDF2-HMAC-SM3, SM2 encrypt/decrypt + GM/T 0009 ciphertext DER, dudect harness expansion to 11 targets. See [`CHANGELOG.md`](CHANGELOG.md) `[0.2.0]`. |
| v0.3 (shipped) | Reusable ASN.1 reader/writer subset; PEM, encrypted PKCS#8, X.509 SPKI, SEC1; full bidirectional gmssl interop (incl. SM2 sign/verify + SM2 encrypt/decrypt with PEM-wrapped keys + SM4-CBC); raw byte-concat ciphertext helpers (`C1\|\|C3\|\|C2` modern + legacy `C1\|\|C2\|\|C3` decrypt); streaming `HmacSm3` / `Sm4CbcEncryptor` / `Sm4CbcDecryptor` + in-crate `Hash`/`Mac`/`BlockCipher` traits; comb-table `mul_g` (~5× sign-side speedup); dudect harness expanded to 12 targets. See [`CHANGELOG.md`](CHANGELOG.md) `[0.3.0]`. |
| v0.4 (shipped) | `wasm32-unknown-unknown` build target; RustCrypto-trait fit (`digest::Digest` / `digest::Mac` / `cipher::BlockEncrypt`/`BlockDecrypt`) behind opt-in `digest-traits` / `cipher-traits` feature flags; bitsliced (table-less, gate-only) SM4 S-box behind the opt-in `sm4-bitsliced` feature; new `gmcrypto-c` workspace member exposing the SM2/SM3/SM4/HMAC/PBKDF2 surface as a C ABI (cdylib + staticlib + cbindgen-generated header). See [`CHANGELOG.md`](CHANGELOG.md) `[0.4.0]`. |
| v0.5.0 (shipped) | C-ABI completeness (streaming CBC + raw-byte SM2 ciphertext + caller-supplied RNG callback); `sm4-bitsliced-simd` feature-flag scaffolding — v0.5.0 ships no SIMD fast path (the feature transparently delegates to the v0.4 single-block bitslice); BREAKING ergonomic cleanup — workspace-wide `gmcrypto_core::Error`, `Sm2PrivateKey::new(U256)` → `from_scalar(U256)` (gated behind `crypto-bigint-scalar`) + always-on `from_bytes_be(&[u8; 32])` constructor, `std` feature removed. See [`CHANGELOG.md`](CHANGELOG.md) `[0.5.0]`. |
| v0.5.1 (shipping) | W4 phase 2 — new sibling crate `gmcrypto-simd` carrying an **AVX2 8-way packed bitsliced SM4 S-box** behind opt-in `sm4-bitsliced-simd`, with runtime CPU detection (`cpufeatures`) and silent scalar fallback on non-AVX2 hosts. **v0.5.1's `tau` dispatch still feeds the AVX2 path with the single input replicated across 8 lanes (7 wasted), so production throughput matches the v0.4 single-block bitslice** until W4 phase 3 (NEON + `Sm4CbcDecryptor::process_chunk` SIMD fanout) widens the call site to real multi-block batches; phase 3 deferred to v0.5.2 or v0.6. Dudect calibration update — `ct_fn_invert` / `ct_fp_invert` moved to PR-smoke telemetry + 100K nightly gross-regression sentinel after a GH Actions `ubuntu-24.04` runner-image shift on 2026-05-12 raised the empirical noise floor; see `docs/v0.5-dudect-recalibration.md`. No public-API changes; no breaking changes — additive only. See [`CHANGELOG.md`](CHANGELOG.md) `[0.5.1]`. |
| v0.6 | W4 phase 3 — NEON 4-way bitsliced SM4 on `aarch64` + `Sm4CbcDecryptor::process_chunk` SIMD fanout (where the 8-way / 4-way packing actually carries real parallel data); RustCrypto `digest` 0.11 / `cipher` 0.5 migration when those lines stabilize; `wasm-bindgen-test`-driven KAT runner. |
| v1.0 | API stabilization. |

## Quick-start

```rust
use gmcrypto_core::sm2::{
    sign_with_id, verify_with_id, Sm2PrivateKey, Sm2PublicKey, DEFAULT_SIGNER_ID,
};
use getrandom::SysRng;
use hex_literal::hex;
use rand_core::UnwrapErr;

// v0.5 W5 — `from_bytes_be` is the recommended public constructor
// (always-on, doesn't expose `crypto_bigint::U256` to callers).
let d_be: [u8; 32] = hex!(
    "3945208F7B2144B13F36E38AC6D39F95889393692860B51A42FB81EF4DF7C5B8"
);
let key = Sm2PrivateKey::from_bytes_be(&d_be).expect("d in [1, n-2]");
let public = Sm2PublicKey::from_point(key.public_key());

let mut rng = UnwrapErr(SysRng);
let sig = sign_with_id(&key, DEFAULT_SIGNER_ID, b"hello", &mut rng).unwrap();
assert!(verify_with_id(&public, DEFAULT_SIGNER_ID, b"hello", &sig));
```

## Threat model

See [`SECURITY.md`](SECURITY.md). Briefly: server-side use, dedicated host,
operator-trusted, network MITM in scope, side-channel attacks beyond what the
dudect harness covers are NOT in scope.

## Build & test

```bash
cargo test --workspace                                                          # unit + integration
cargo bench --bench timing_leaks --features crypto-bigint-scalar                # local timing harness (~75s)
DUDECT_SAMPLES=10000 cargo bench --bench timing_leaks --features crypto-bigint-scalar  # match CI smoke budget
```

`gmssl` interop test (gated; install [`gmssl`](https://github.com/guanzhi/GmSSL)
v3.1.1 to enable):

```bash
GMCRYPTO_GMSSL=1 cargo test --test interop_gmssl
```

## wasm32 support

`gmcrypto-core` builds on `wasm32-unknown-unknown` as of v0.4. CI gates
both stable and MSRV (1.85) builds on the target.

```bash
rustup target add wasm32-unknown-unknown
cargo build -p gmcrypto-core --target wasm32-unknown-unknown --no-default-features
```

The crate is `no_std + alloc` only and does NOT pull `getrandom`'s
`wasm_js` backend or `wasm-bindgen` / `js-sys` into its default dep
graph. Wasm callers wire their own `rand_core::Rng` impl — typically
by enabling `getrandom`'s `wasm_js` feature in *their* `Cargo.toml`:

```toml
[dependencies]
gmcrypto-core = "0.4"
rand_core = { version = "0.10", default-features = false }
getrandom = { version = "0.4", default-features = false, features = ["wasm_js"] }
```

```rust
use gmcrypto_core::sm2::{sign_with_id, Sm2PrivateKey, DEFAULT_SIGNER_ID};
use rand_core::UnwrapErr;
use getrandom::SysRng;

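// `priv_key` is an `Sm2PrivateKey` the caller has already constructed or
// decoded (see Quick-start); it is not defined in this snippet.
let mut rng = UnwrapErr(SysRng); // wasm_js-backed when targeting wasm32
let sig = sign_with_id(&priv_key, DEFAULT_SIGNER_ID, b"msg", &mut rng).unwrap();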
```

A `wasm-bindgen-test`-driven test runner (running KAT vectors under
Node or a headless browser) is post-v0.4 — v0.4 ships the build-target
gate only.

## License

Apache-2.0. See [`LICENSE`](LICENSE).

Some reference outputs use the upstream [`gmssl`](https://github.com/guanzhi/GmSSL)
tool. This project is independent of that project.