
Crate fast_md5


§fast-md5

A small MD5 implementation with hand-written assembly cores for x86_64 and aarch64, plus a portable Rust fallback for every other target. The assembly was ported from animetosho/md5-optimisation (released into the public domain by the author; see discussion #4).

On Apple Silicon and modern x86_64 the per-block compression is within ~1 % of AWS-LC’s hand-tuned C. End-to-end in HMAC-MD5 workloads (e.g. RADIUS Message-Authenticator), the all-Rust call path beats AWS-LC by ~18 % thanks to cross-call inlining and a precomputed-state HMAC structure (see below).

§Security warning

MD5 is cryptographically broken — it is trivially vulnerable to collision attacks and must not be used for digital signatures, certificate fingerprints, or any other security-sensitive integrity check. HMAC-MD5 is also discouraged for new designs; this crate exposes HmacMd5 solely to support legacy protocols (RADIUS, CHAP, certain SASL/SIP digests) and non-cryptographic uses such as deduplication and checksumming.

§Quick start

let digest = fast_md5::digest(b"The quick brown fox jumps over the lazy dog");
assert_eq!(
    digest,
    [
        0x9e, 0x10, 0x7d, 0x9d, 0x37, 0x2b, 0xb6, 0x82,
        0x6b, 0xd8, 0x1d, 0x35, 0x42, 0xa4, 0x19, 0xd6,
    ],
);

Streaming MD5:

let mut h = fast_md5::Md5::new();
h.update(b"The quick brown fox ");
h.update(b"jumps over the lazy dog");
assert_eq!(
    h.finalize(),
    [
        0x9e, 0x10, 0x7d, 0x9d, 0x37, 0x2b, 0xb6, 0x82,
        0x6b, 0xd8, 0x1d, 0x35, 0x42, 0xa4, 0x19, 0xd6,
    ],
);

Streaming HMAC-MD5 (RFC 2104):

let mut h = fast_md5::HmacMd5::new(b"Jefe");
h.update(b"what do ya want ");
h.update(b"for nothing?");
assert_eq!(
    h.finalize(),
    [
        0x75, 0x0c, 0x78, 0x3e, 0x6a, 0xb0, 0xb5, 0x03,
        0xea, 0xa8, 0x6e, 0x31, 0x0a, 0x5d, 0xb7, 0x38,
    ],
);
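The digests above are raw 16-byte arrays, while legacy protocols and command-line tools often expect the familiar 32-character lowercase hex form. The following is a minimal sketch of such a conversion that stays allocation-free; `to_hex` is a hypothetical helper written for illustration and is not part of this crate's API.

```rust
/// Hypothetical helper (not part of fast_md5): encode a 16-byte digest as
/// 32 lowercase ASCII hex characters without allocating.
fn to_hex(digest: &[u8; 16]) -> [u8; 32] {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    let mut out = [0u8; 32];
    for (i, &byte) in digest.iter().enumerate() {
        // High nibble first, then low nibble.
        out[2 * i] = HEX[(byte >> 4) as usize];
        out[2 * i + 1] = HEX[(byte & 0x0f) as usize];
    }
    out
}
```

Every byte of the result is ASCII, so it can be viewed as text with `core::str::from_utf8` when a `&str` is needed.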

§no_std

The crate is #![no_std] and performs no heap allocation. All buffers (the 64-byte block buffer, the 4-word state, the HMAC ipad/opad scratch) live inline in the user’s value or on the caller’s stack.

§Cargo features

  • force-fallback — disable the architecture-specific assembly backends and route transform through the portable Rust fallback implementation on every target. Intended for CI coverage of the fallback on assembly hosts and for downstream debugging; not recommended for production use on x86_64 / aarch64 (the fallback is correct but materially slower).

§Design notes

The crate is small enough to read end-to-end, but a few choices are worth flagging because they’re load-bearing for performance and were arrived at empirically.

§Architecture dispatch

transform is a cfg-dispatched shim that routes to one of three backends, all with identical semantics:

  • x86_64: a single monolithic asm! block per 64-byte block, ported from animetosho’s “NoLEA” sequence. It uses add chains instead of lea to keep the critical path on the integer ALUs, which is faster on every x86_64 microarchitecture from Haswell onward.
  • aarch64: per-round Rust expressions with a one-line asm! block for the rotate. LLVM produces nearly the same code as a monolithic asm block here because AArch64’s three-operand ALU (no destination clobber) and free shifts give the register allocator more freedom; the inline ror exists only to pin a concrete 32-bit register at each round boundary, which guides scheduling.
  • fallback: portable safe Rust used on all other targets and compiled (but not linked) under cfg(test) everywhere, so its transform can be cross-checked against the assembly backends on every supported host.
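The dispatch described above can be pictured roughly as follows. This is only a sketch of the cfg pattern: the module name `backend`, the constant `NAME`, and the function `backend_name` are hypothetical, the real backends expose a block-compression function rather than a name, and the `force-fallback` feature gate is left out for brevity.

```rust
// Exactly one `backend` module is compiled in, selected by target architecture.
#[cfg(target_arch = "x86_64")]
mod backend {
    pub const NAME: &str = "x86_64";
}

#[cfg(target_arch = "aarch64")]
mod backend {
    pub const NAME: &str = "aarch64";
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
mod backend {
    pub const NAME: &str = "fallback";
}

/// The shim: callers see one symbol, and the cfg attributes decide at
/// compile time which backend it forwards to.
fn backend_name() -> &'static str {
    backend::NAME
}
```

Because the selection happens at compile time, the shim adds no runtime branch; the unused backends are simply never compiled into the build.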

§Inlining policy

  • The architecture-specific transform is #[inline(always)] — it must fuse with Md5::update’s per-block loop or LLVM leaves the state in memory between blocks.
  • Md5::update and Md5::finalize are plain #[inline] — their bodies are large once transform inlines (~700 instructions on aarch64), and forcing inline at every call site measurably regresses I-cache-bound workloads (HMAC chains, etc.). Plain #[inline] exposes MIR for cross-crate inlining and lets LLVM’s size heuristic decide.
  • Md5::new and digest are #[inline(always)] — trivial wrappers; forcing inline lets the IV propagate as register immediates and lets known-length one-shots collapse finalize’s padding into constants.

§Block buffer uses MaybeUninit

Md5’s 64-byte partial-block buffer is [MaybeUninit<u8>; 64] rather than [u8; 64]. Construction is therefore a no-op (no 64-byte zero fill on every Md5::new()), and bytes are only read after they have been written. This matters when the caller does many short hashes per second (RADIUS again), because the cost of initialising an Md5 becomes negligible relative to the compression itself.
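To make the idea concrete, here is a minimal sketch of a partial-block buffer built on MaybeUninit. The type `PartialBlock` and its methods are hypothetical and only illustrate the technique; the crate's actual field layout and method names may differ. The sketch uses `std::` paths so it compiles standalone; the same items are available under `core::` in a no_std crate.

```rust
use std::mem::MaybeUninit;

/// Hypothetical partial-block buffer, not the crate's actual type.
struct PartialBlock {
    buf: [MaybeUninit<u8>; 64],
    len: usize, // number of initialised bytes at the front of `buf`
}

impl PartialBlock {
    fn new() -> Self {
        // An array of MaybeUninit needs no initialisation, so there is no
        // 64-byte zero fill here.
        Self { buf: [MaybeUninit::uninit(); 64], len: 0 }
    }

    /// Append as many bytes as fit; returns how many were consumed.
    fn push(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(64 - self.len);
        for (slot, &b) in self.buf[self.len..self.len + n].iter_mut().zip(data) {
            slot.write(b);
        }
        self.len += n;
        n
    }

    /// View only the bytes that have been written.
    fn filled(&self) -> &[u8] {
        // SAFETY: the first `len` slots were initialised by `push`, and
        // MaybeUninit<u8> has the same layout as u8.
        unsafe { std::slice::from_raw_parts(self.buf.as_ptr() as *const u8, self.len) }
    }
}
```

The invariant that makes the unsafe block sound is that `len` never exceeds the number of slots written, so the uninitialised tail is never read.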

§HmacMd5 precomputes ipad/opad states

HmacMd5::new runs the ipad and opad block compressions immediately and stores only the two resulting [u32; 4] states (16 bytes each, 32 bytes total). In steady state, update is pure delegation to the inner Md5, and finalize costs exactly two extra compressions on top of the message work (inner tail + outer tail). This is the same shape as AWS-LC’s HMAC_CTX and is what gives the ~18 % end-to-end win in HMAC-bound workloads.
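The precomputed-state structure can be illustrated with a self-contained sketch. Everything here is hypothetical and written for readability rather than speed: `compress` is a plain portable MD5 round function (RFC 1321) that derives its round constants from `sin` at run time instead of using a table, `finish` applies MD5 padding to a state that has already absorbed some whole blocks, and `HmacSketch` exposes a one-shot `mac` instead of the crate's streaming update/finalize.

```rust
const IV: [u32; 4] = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476];
const S: [u32; 16] = [7, 12, 17, 22, 5, 9, 14, 20, 4, 11, 16, 23, 6, 10, 15, 21];

/// Portable MD5 compression of one 64-byte block.
fn compress(state: &mut [u32; 4], block: &[u8]) {
    debug_assert_eq!(block.len(), 64);
    let mut m = [0u32; 16];
    for (w, c) in m.iter_mut().zip(block.chunks_exact(4)) {
        *w = u32::from_le_bytes([c[0], c[1], c[2], c[3]]);
    }
    let [mut a, mut b, mut c, mut d] = *state;
    for i in 0..64 {
        let (f, g) = match i / 16 {
            0 => ((b & c) | (!b & d), i),
            1 => ((d & b) | (!d & c), (5 * i + 1) % 16),
            2 => (b ^ c ^ d, (3 * i + 5) % 16),
            _ => (c ^ (b | !d), (7 * i) % 16),
        };
        // K[i] = floor(2^32 * |sin(i + 1)|); a real implementation uses a table.
        let k = ((i as f64 + 1.0).sin().abs() * 4294967296.0) as u32;
        let t = a.wrapping_add(f).wrapping_add(k).wrapping_add(m[g]);
        a = d;
        d = c;
        c = b;
        b = b.wrapping_add(t.rotate_left(S[(i / 16) * 4 + i % 4]));
    }
    state[0] = state[0].wrapping_add(a);
    state[1] = state[1].wrapping_add(b);
    state[2] = state[2].wrapping_add(c);
    state[3] = state[3].wrapping_add(d);
}

/// Finish a hash whose `state` already absorbed `prefix_len` bytes
/// (a multiple of 64); `tail` is the rest of the message.
fn finish(mut state: [u32; 4], prefix_len: u64, tail: &[u8]) -> [u8; 16] {
    let full = tail.len() / 64 * 64;
    for blk in tail[..full].chunks_exact(64) {
        compress(&mut state, blk);
    }
    let rem = &tail[full..];
    let mut pad = [0u8; 128];
    pad[..rem.len()].copy_from_slice(rem);
    pad[rem.len()] = 0x80;
    // One padding block if the 8-byte length still fits, otherwise two.
    let end = if rem.len() < 56 { 64 } else { 128 };
    let bits = (prefix_len + tail.len() as u64).wrapping_mul(8);
    pad[end - 8..end].copy_from_slice(&bits.to_le_bytes());
    for blk in pad[..end].chunks_exact(64) {
        compress(&mut state, blk);
    }
    let mut out = [0u8; 16];
    for (o, w) in out.chunks_exact_mut(4).zip(state.iter()) {
        o.copy_from_slice(&w.to_le_bytes());
    }
    out
}

/// Holds only the two post-pad states, never the key itself.
struct HmacSketch {
    inner: [u32; 4],
    outer: [u32; 4],
}

impl HmacSketch {
    fn new(key: &[u8]) -> Self {
        // RFC 2104: keys longer than one block are hashed first.
        let mut k = [0u8; 64];
        if key.len() > 64 {
            k[..16].copy_from_slice(&finish(IV, 0, key));
        } else {
            k[..key.len()].copy_from_slice(key);
        }
        let (mut ipad, mut opad) = ([0x36u8; 64], [0x5cu8; 64]);
        for i in 0..64 {
            ipad[i] ^= k[i];
            opad[i] ^= k[i];
        }
        // The two pad compressions happen once, at construction time.
        let (mut inner, mut outer) = (IV, IV);
        compress(&mut inner, &ipad);
        compress(&mut outer, &opad);
        Self { inner, outer }
    }

    fn mac(&self, msg: &[u8]) -> [u8; 16] {
        // Each hash resumes from a state that has already absorbed 64 bytes.
        let inner_digest = finish(self.inner, 64, msg);
        finish(self.outer, 64, &inner_digest)
    }
}
```

With a short message, `mac` performs one compression for the inner tail and one for the outer tail, which is the "two extra compressions" cost described above. Reusing one `HmacSketch` across many messages amortises the two pad compressions done in `new`.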

§No volatile key zeroization

Neither Md5 nor HmacMd5 performs write_volatile-style scrubbing of stack scratch on drop. The transient ipad/opad XOR blocks inside HmacMd5::new are recoverable only during the lifetime of that call, and the persistent struct holds digests of the key (one-way) rather than the key itself. Callers with stricter threat models (FIPS, processes that emit core dumps, protocols where memory-disclosure bugs are realistic) should wrap their key in zeroize::Zeroizing at the call site; this crate cannot meaningfully protect a key that the caller is already holding in long-lived memory.

Skipping the volatile writes also keeps the hot path free of optimization barriers, which is part of why the all-Rust HMAC path beats the FFI’d C implementations.
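For readers unfamiliar with the technique being skipped, this is roughly what write_volatile-style scrubbing looks like. It is an illustrative sketch only: `scrub` is a hypothetical function, not something this crate provides, and callers who need this guarantee are better served by a maintained crate such as zeroize.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical scrubber: overwrite `buf` with zeros using volatile stores
/// so the compiler cannot elide them as dead writes.
fn scrub(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, aligned, exclusive reference to a u8.
        unsafe { ptr::write_volatile(b, 0) };
    }
    // Keep the compiler from reordering later memory accesses before the
    // scrub; this is the optimization barrier the hot path avoids.
    compiler_fence(Ordering::SeqCst);
}
```

The volatile stores and the fence are what make such a routine costly inside a tight hashing loop, which is the trade-off the section above describes.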

Structs§

HmacMd5
Streaming HMAC-MD5 (RFC 2104).
Md5
Streaming MD5 hasher.

Constants§

BLOCK_SIZE
MD5 compresses 64-byte blocks.
DIGEST_LENGTH
MD5 produces a 16-byte digest.

Functions§

digest
One-shot convenience: hash data and return the digest.
transform
Compress one 64-byte block into the running MD5 state.