llm-message-hash

Stable canonical hash of LLM request/message structures.

Two semantically identical Anthropic requests can produce different sha256(serde_json::to_string(&req)) results, because JSON key order isn't guaranteed and fields like cache_control change the bytes without changing the semantics. This crate walks the value tree, sorts keys recursively, drops configurable fields, and computes the SHA-256 of the resulting canonical bytes.

Useful for prompt-cache lookups, idempotency keys, and dedupe.
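
The walk described above (sort keys, drop ignored fields, emit canonical bytes) can be sketched with the standard library alone. This is an illustration only, not the crate's implementation: the Value enum, write_canonical, write_string, and canonical_string are hypothetical names, the real crate operates on serde_json::Value, string escaping is reduced to quotes and backslashes, and the final SHA-256 step is left out because std does not provide one.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like value used only in this sketch (hypothetical; the crate
/// itself takes a serde_json::Value).
#[derive(Debug, Clone)]
enum Value {
    Null,
    Bool(bool),
    /// Numbers are kept as their source text, so 42 and 42.0 stay distinct.
    Num(String),
    Str(String),
    Array(Vec<Value>),
    /// Insertion-ordered fields, like a JSON object as it was written.
    Object(Vec<(String, Value)>),
}

/// Write a quoted string. Only `"` and `\` are escaped in this sketch.
fn write_string(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            other => out.push(other),
        }
    }
    out.push('"');
}

/// Append the canonical form of `v` to `out`: object keys are sorted and any
/// key listed in `ignore` is dropped at every nesting level.
fn write_canonical(v: &Value, ignore: &[&str], out: &mut String) {
    match v {
        Value::Null => out.push_str("null"),
        Value::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Value::Num(n) => out.push_str(n),
        Value::Str(s) => write_string(s, out),
        Value::Array(items) => {
            // Array order is meaningful, so it is preserved.
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_canonical(item, ignore, out);
            }
            out.push(']');
        }
        Value::Object(fields) => {
            // A BTreeMap iterates in sorted key order, whatever the input order was.
            let sorted: BTreeMap<&str, &Value> = fields
                .iter()
                .filter(|(k, _)| !ignore.contains(&k.as_str()))
                .map(|(k, v)| (k.as_str(), v))
                .collect();
            out.push('{');
            for (i, (k, val)) in sorted.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_string(k, out);
                out.push(':');
                write_canonical(val, ignore, out);
            }
            out.push('}');
        }
    }
}

/// Convenience wrapper that returns the canonical text as a String.
fn canonical_string(v: &Value, ignore: &[&str]) -> String {
    let mut out = String::new();
    write_canonical(v, ignore, &mut out);
    out
}
```

In this sketch, two objects that differ only in key order, or only in a field named in ignore, yield the same canonical string. Hashing that string's bytes would then give the same digest for both.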

Install

[dependencies]
llm-message-hash = "0.1"
serde_json = "1"

Use

Default (no fields dropped):

use serde_json::json;
use llm_message_hash::hash_canonical_hex;

let a = json!({"model": "claude", "messages": [{"role": "user", "content": "hi"}]});
let b = json!({"messages": [{"content": "hi", "role": "user"}], "model": "claude"});

assert_eq!(hash_canonical_hex(&a), hash_canonical_hex(&b));

Per-provider preset (drops cache_control, response-only fields, etc.):

use serde_json::json;
use llm_message_hash::{hash_canonical_hex_with, HashOpts};

let with_cc = json!({
    "messages": [{
        "role": "user",
        "content": [{"type": "text", "text": "hi", "cache_control": {"type": "ephemeral"}}],
    }],
});
let without_cc = json!({
    "messages": [{
        "role": "user",
        "content": [{"type": "text", "text": "hi"}],
    }],
});

let h1 = hash_canonical_hex_with(&with_cc, &HashOpts::anthropic());
let h2 = hash_canonical_hex_with(&without_cc, &HashOpts::anthropic());
assert_eq!(h1, h2);

Built-in presets: HashOpts::anthropic(), HashOpts::openai(), HashOpts::bedrock(), HashOpts::gemini(). Each drops the response-side metadata that varies per call (e.g. id, created, usage, finish_reason) plus provider-specific request fields that don't change semantics (e.g. cache_control for Anthropic).

Extend any preset fluently:

let opts = HashOpts::anthropic().ignore("metadata");
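
For readers unfamiliar with the fluent style, the chaining above works because each call takes the options by value and returns them. A minimal sketch of that pattern follows, assuming nothing about how HashOpts is actually built: IgnoreSet, anthropic_like, and is_ignored are hypothetical names, and the field list is taken from the examples in this README rather than from the crate's source.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an options type that tracks field names to drop.
#[derive(Debug, Clone, Default)]
struct IgnoreSet {
    fields: BTreeSet<String>,
}

impl IgnoreSet {
    /// Illustrative preset; the field names here are assumptions based on
    /// the examples in this README, not the crate's real preset.
    fn anthropic_like() -> Self {
        Self::default()
            .ignore("cache_control")
            .ignore("id")
            .ignore("usage")
    }

    /// Takes `self` by value and returns it, which is what makes chaining work.
    fn ignore(mut self, field: &str) -> Self {
        self.fields.insert(field.to_string());
        self
    }

    /// True if `field` should be dropped before hashing.
    fn is_ignored(&self, field: &str) -> bool {
        self.fields.contains(field)
    }
}
```

With this shape, IgnoreSet::anthropic_like().ignore("metadata") keeps everything the preset already ignores and adds one more field.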

What it does NOT do

No tokenization. Hash is over the structure, not the token count.
No semantic equivalence. "hi" and "Hi" hash differently. So do 42 and 42.0.
No streaming hash. Pass a complete serde_json::Value.

Output

hash_canonical(v) and hash_canonical_with(v, opts) return [u8; 32]. The _hex variants return a 64-char lowercase hex String.
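
The relationship between the two return types is plain hex encoding: each of the 32 bytes becomes two lowercase hex digits. A small sketch using only the standard library is below; to_hex is a hypothetical helper written for illustration and is not part of the crate's API.

```rust
/// Hypothetical helper: render a 32-byte digest as 64 lowercase hex characters.
fn to_hex(digest: &[u8; 32]) -> String {
    use std::fmt::Write;

    let mut s = String::with_capacity(64);
    for b in digest {
        // {:02x} pads each byte to two digits and uses lowercase a-f.
        write!(s, "{:02x}", b).expect("writing to a String cannot fail");
    }
    s
}
```

If you store or compare digests yourself, the raw [u8; 32] form is more compact, while the hex form is convenient as a cache key or log field.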

The canonical bytes are emitted via a private writer, so you can't extract them directly. If you need to debug, sort the keys manually and inspect the output of serde_json::to_string.

License

MIT OR Apache-2.0

Composes with agentidemp (idempotency keys) and cachebench (prompt-cache hit-ratio measurement).

llm-message-hash 0.1.0
