llm-message-hash
Stable canonical hash of LLM request/message structures.
Two semantically-identical Anthropic requests can produce different
sha256(serde_json::to_string(&req)) results because JSON key order
isn't guaranteed and fields like cache_control change bytes without
changing semantics. This crate walks the value tree, sorts keys
recursively, drops configurable fields, and sha256s the canonical bytes.
Useful for prompt-cache lookups, idempotency keys, and dedupe.
Install
[]
= "0.1"
= "1"
Use
Default (no fields dropped):
use json;
use hash_canonical_hex;
let a = json!;
let b = json!;
assert_eq!;
Per-provider preset (drops cache_control, response-only fields, etc.):
use ;
let with_cc = json!;
let without_cc = json!;
let h1 = hash_canonical_hex_with;
let h2 = hash_canonical_hex_with;
assert_eq!;
Built-in presets: HashOpts::anthropic(), HashOpts::openai(),
HashOpts::bedrock(), HashOpts::gemini(). Each drops the response-side
metadata that varies per call (e.g. id, created, usage,
finish_reason) plus provider-specific request fields that don't change
semantics (e.g. cache_control for Anthropic).
Extend any preset fluently:
let opts = anthropic.ignore;
What it does NOT do
- No tokenization. Hash is over the structure, not the token count.
- No semantic equivalence.
"hi"and"Hi"hash differently. So do42and42.0. - No streaming hash. Pass a complete
serde_json::Value.
Output
hash_canonical(v) and hash_canonical_with(v, opts) return [u8; 32].
The _hex variants return a 64-char lowercase hex String.
The canonical bytes are emitted via a private writer; you can't extract
them directly. If you need to debug, parse the output of
serde_json::to_string after manually sorting keys.
License
MIT OR Apache-2.0
Composes with agentidemp (idempotency keys)
and cachebench (prompt-cache
hit-ratio measurement).