# llm-message-hash
[](https://crates.io/crates/llm-message-hash)
[](https://docs.rs/llm-message-hash)
[](https://github.com/MukundaKatta/llm-message-hash/actions/workflows/ci.yml)
[](https://crates.io/crates/llm-message-hash)
**Stable canonical hash of LLM request/message structures.**
Two semantically-identical Anthropic requests can produce different
`sha256(serde_json::to_string(&req))` results because JSON key order
isn't guaranteed and fields like `cache_control` change bytes without
changing semantics. This crate walks the value tree, sorts keys
recursively, drops configurable fields, and sha256s the canonical bytes.
Useful for prompt-cache lookups, idempotency keys, and dedupe.
## Install
```toml
[dependencies]
llm-message-hash = "0.1"
serde_json = "1"
```
## Use
Default (no fields dropped):
```rust
use serde_json::json;
use llm_message_hash::hash_canonical_hex;
let a = json!({"model": "claude", "messages": [{"role": "user", "content": "hi"}]});
let b = json!({"messages": [{"content": "hi", "role": "user"}], "model": "claude"});
assert_eq!(hash_canonical_hex(&a), hash_canonical_hex(&b));
```
Per-provider preset (drops cache_control, response-only fields, etc.):
```rust
use llm_message_hash::{hash_canonical_hex_with, HashOpts};
let with_cc = json!({
"messages": [{
"role": "user",
"content": [{"type": "text", "text": "hi", "cache_control": {"type": "ephemeral"}}],
}],
});
let without_cc = json!({
"messages": [{
"role": "user",
"content": [{"type": "text", "text": "hi"}],
}],
});
let h1 = hash_canonical_hex_with(&with_cc, &HashOpts::anthropic());
let h2 = hash_canonical_hex_with(&without_cc, &HashOpts::anthropic());
assert_eq!(h1, h2);
```
Built-in presets: `HashOpts::anthropic()`, `HashOpts::openai()`,
`HashOpts::bedrock()`, `HashOpts::gemini()`. Each drops the response-side
metadata that varies per call (e.g. `id`, `created`, `usage`,
`finish_reason`) plus provider-specific request fields that don't change
semantics (e.g. `cache_control` for Anthropic).
Extend any preset fluently:
```rust
let opts = HashOpts::anthropic().ignore("metadata");
```
## What it does NOT do
- No tokenization. Hash is over the *structure*, not the token count.
- No semantic equivalence. `"hi"` and `"Hi"` hash differently. So do
`42` and `42.0`.
- No streaming hash. Pass a complete `serde_json::Value`.
## Output
`hash_canonical(v)` and `hash_canonical_with(v, opts)` return `[u8; 32]`.
The `_hex` variants return a 64-char lowercase hex `String`.
The canonical bytes are emitted via a private writer; you can't extract
them directly. If you need to debug, parse the output of
`serde_json::to_string` after manually sorting keys.
## License
MIT OR Apache-2.0
Composes with [`agentidemp`](https://crates.io/crates/agentidemp) (idempotency keys)
and [`cachebench`](https://crates.io/crates/cachebench) (prompt-cache
hit-ratio measurement).