llm-message-hash 0.1.0

Stable canonical hash of LLM request/message structures. Canonicalizes JSON by recursively sorting keys, then applies sha256. Per-provider ignore-lists drop fields that do not affect meaning, so two semantically equal requests to the same provider (Anthropic, OpenAI, or Bedrock) produce the same hash. Useful for cache keys and idempotency.

Repository: MukundaKatta/llm-message-hash
Owner: MukundaKatta

llm-message-hash


Stable canonical hash of LLM request/message structures.

Two semantically-identical Anthropic requests can produce different sha256(serde_json::to_string(&req)) results because JSON key order isn't guaranteed and fields like cache_control change bytes without changing semantics. This crate walks the value tree, sorts keys recursively, drops configurable fields, and sha256s the canonical bytes.
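
The walk described above can be sketched without any dependencies. This is an illustration only, not the crate's implementation: the `Json` enum is a hypothetical stand-in for `serde_json::Value`, the function names are invented for the example, and the sketch stops at the canonical text. The final step would feed those bytes to a SHA-256 implementation.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like tree; a hypothetical stand-in for `serde_json::Value`
/// so the sketch needs no external crates.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(String), // number kept as its literal text, so 42 and 42.0 stay distinct
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>), // insertion-ordered, like a request body as written
}

/// Write `s` as a JSON string literal with minimal escaping.
fn write_string(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
}

/// Append the canonical form of `v` to `out`: object keys sorted, ignored
/// keys dropped at every depth, no insignificant whitespace.
fn write_canonical(v: &Json, ignored: &[&str], out: &mut String) {
    match v {
        Json::Null => out.push_str("null"),
        Json::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Json::Num(n) => out.push_str(n),
        Json::Str(s) => write_string(s, out),
        Json::Arr(items) => {
            // Array order is meaningful, so it is preserved.
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_canonical(item, ignored, out);
            }
            out.push(']');
        }
        Json::Obj(fields) => {
            // A BTreeMap iterates in sorted key order.
            let sorted: BTreeMap<&str, &Json> = fields
                .iter()
                .filter(|(k, _)| !ignored.contains(&k.as_str()))
                .map(|(k, val)| (k.as_str(), val))
                .collect();
            out.push('{');
            for (i, (k, val)) in sorted.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_string(k, out);
                out.push(':');
                write_canonical(val, ignored, out);
            }
            out.push('}');
        }
    }
}

/// Canonical text for `v`; hashing these bytes would give the final digest.
fn canonical(v: &Json, ignored: &[&str]) -> String {
    let mut out = String::new();
    write_canonical(v, ignored, &mut out);
    out
}
```

With this sketch, two objects that differ only in key order, or only in an ignored key such as `cache_control`, yield the same canonical text and would therefore hash identically.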

Useful for prompt-cache lookups, idempotency keys, and dedupe.
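
As a rough illustration of the cache-lookup use case, the sketch below keys an in-memory map by a 32-byte digest. `ResponseCache` and `get_or_insert_with` are hypothetical names invented for this example and are not part of the crate. In real use the key would presumably be the `[u8; 32]` returned by `hash_canonical` or `hash_canonical_with`.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory response cache keyed by a 32-byte request digest.
struct ResponseCache {
    entries: HashMap<[u8; 32], String>,
}

impl ResponseCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Return the cached response for `key`, or compute, store, and return it.
    /// The bool is true on a cache hit.
    fn get_or_insert_with<F>(&mut self, key: [u8; 32], compute: F) -> (String, bool)
    where
        F: FnOnce() -> String,
    {
        if let Some(hit) = self.entries.get(&key) {
            return (hit.clone(), true);
        }
        // Miss: run the (possibly expensive) call once and remember the result.
        let value = compute();
        self.entries.insert(key, value.clone());
        (value, false)
    }
}
```

The same shape works for dedupe or idempotency checks: a `HashSet<[u8; 32]>` of digests already seen is enough when no response needs to be stored.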

Install

[dependencies]
llm-message-hash = "0.1"
serde_json = "1"

Use

Default (no fields dropped):

use serde_json::json;
use llm_message_hash::hash_canonical_hex;

let a = json!({"model": "claude", "messages": [{"role": "user", "content": "hi"}]});
let b = json!({"messages": [{"content": "hi", "role": "user"}], "model": "claude"});

assert_eq!(hash_canonical_hex(&a), hash_canonical_hex(&b));

Per-provider preset (drops cache_control, response-only fields, etc.):

use serde_json::json;
use llm_message_hash::{hash_canonical_hex_with, HashOpts};

let with_cc = json!({
    "messages": [{
        "role": "user",
        "content": [{"type": "text", "text": "hi", "cache_control": {"type": "ephemeral"}}],
    }],
});
let without_cc = json!({
    "messages": [{
        "role": "user",
        "content": [{"type": "text", "text": "hi"}],
    }],
});

let h1 = hash_canonical_hex_with(&with_cc, &HashOpts::anthropic());
let h2 = hash_canonical_hex_with(&without_cc, &HashOpts::anthropic());
assert_eq!(h1, h2);

Built-in presets: HashOpts::anthropic(), HashOpts::openai(), HashOpts::bedrock(), HashOpts::gemini(). Each drops the response-side metadata that varies per call (e.g. id, created, usage, finish_reason) plus provider-specific request fields that don't change semantics (e.g. cache_control for Anthropic).

Extend any preset fluently:

// Start from the Anthropic preset, then also ignore the "metadata" key.
let opts = HashOpts::anthropic().ignore("metadata");

What it does NOT do

  • No tokenization. Hash is over the structure, not the token count.
  • No semantic equivalence. "hi" and "Hi" hash differently. So do 42 and 42.0.
  • No streaming hash. Pass a complete serde_json::Value.

Output

hash_canonical(v) and hash_canonical_with(v, opts) return [u8; 32]. The _hex variants return a 64-char lowercase hex String.
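
For reference, the relationship between the two output forms can be sketched as follows. `to_hex` is a hypothetical helper written for this example. The crate's own `_hex` functions may be implemented differently.

```rust
/// Encode a 32-byte digest as 64 lowercase hex characters.
/// Hypothetical helper, shown only to illustrate the output format.
fn to_hex(digest: &[u8; 32]) -> String {
    let mut s = String::with_capacity(64);
    for byte in digest {
        // Two lowercase hex digits per byte, zero-padded.
        s.push_str(&format!("{:02x}", byte));
    }
    s
}
```

The raw `[u8; 32]` form suits in-memory map keys, while the hex string is easier to log or store in a text column.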

The canonical bytes are emitted via a private writer, so you can't extract them directly. To debug a hash mismatch, approximate them: manually sort the keys of each value (and remove any ignored fields), serialize with serde_json::to_string, and compare the resulting strings.

License

MIT OR Apache-2.0

Composes with agentidemp (idempotency keys) and cachebench (prompt-cache hit-ratio measurement).