llm-message-hash 0.1.0

Stable canonical hash of LLM request/message structures. Recursive key-sorting JSON canonicalization + sha256, with per-provider ignore-lists so semantically-equal Anthropic/OpenAI/Bedrock requests produce the same hash. Useful for cache keys and idempotency.
# llm-message-hash

[![Crates.io](https://img.shields.io/crates/v/llm-message-hash.svg)](https://crates.io/crates/llm-message-hash)
[![Documentation](https://docs.rs/llm-message-hash/badge.svg)](https://docs.rs/llm-message-hash)
[![CI](https://github.com/MukundaKatta/llm-message-hash/actions/workflows/ci.yml/badge.svg)](https://github.com/MukundaKatta/llm-message-hash/actions/workflows/ci.yml)
[![License](https://img.shields.io/crates/l/llm-message-hash.svg)](https://crates.io/crates/llm-message-hash)

**Stable canonical hash of LLM request/message structures.**

Two semantically-identical Anthropic requests can produce different
`sha256(serde_json::to_string(&req))` results because JSON key order
isn't guaranteed and fields like `cache_control` change bytes without
changing semantics. This crate walks the value tree, sorts keys
recursively, drops configurable fields, and sha256s the canonical bytes.

Useful for prompt-cache lookups, idempotency keys, and dedupe.
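
The walk described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the crate's implementation: the `Json` enum, `write_canonical`, and `canonical_string` are hypothetical names, string escaping is simplified, and the final SHA-256 step is left out because `std` does not provide one.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON value (the crate itself takes `serde_json::Value`).
#[allow(dead_code)] // not every variant is exercised in this sketch
#[derive(Clone, Debug)]
enum Json {
    Null,
    Bool(bool),
    Num(String),              // number kept as its literal text
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>), // insertion-ordered, to model unstable key order
}

/// Write a quoted string. Only `"` and `\` are escaped here; full JSON
/// escaping would also cover control characters.
fn write_str(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            _ => out.push(c),
        }
    }
    out.push('"');
}

/// Emit `v` with object keys sorted, ignored keys skipped, and no whitespace.
fn write_canonical(v: &Json, ignore: &[&str], out: &mut String) {
    match v {
        Json::Null => out.push_str("null"),
        Json::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Json::Num(n) => out.push_str(n),
        Json::Str(s) => write_str(s, out),
        Json::Arr(items) => {
            // Array order is meaningful, so it is preserved.
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_canonical(item, ignore, out);
            }
            out.push(']');
        }
        Json::Obj(fields) => {
            // A BTreeMap iterates in sorted key order, whatever the input order was.
            let sorted: BTreeMap<&str, &Json> = fields
                .iter()
                .filter(|(k, _)| !ignore.iter().any(|ig| *ig == k.as_str()))
                .map(|(k, val)| (k.as_str(), val))
                .collect();
            out.push('{');
            for (i, (k, val)) in sorted.into_iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_str(k, out);
                out.push(':');
                write_canonical(val, ignore, out);
            }
            out.push('}');
        }
    }
}

/// Canonical text for `v`; a real implementation would feed these bytes to SHA-256.
fn canonical_string(v: &Json, ignore: &[&str]) -> String {
    let mut out = String::new();
    write_canonical(v, ignore, &mut out);
    out
}
```

In this sketch, two objects that hold the same fields in a different order yield the same canonical string, so any digest computed over that string is independent of key order.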

## Install

```toml
[dependencies]
llm-message-hash = "0.1"
serde_json = "1"
```

## Use

Default (no fields dropped):

```rust
use serde_json::json;
use llm_message_hash::hash_canonical_hex;

let a = json!({"model": "claude", "messages": [{"role": "user", "content": "hi"}]});
let b = json!({"messages": [{"content": "hi", "role": "user"}], "model": "claude"});

assert_eq!(hash_canonical_hex(&a), hash_canonical_hex(&b));
```

Per-provider preset (drops `cache_control`, response-only fields, etc.):

```rust
use serde_json::json;
use llm_message_hash::{hash_canonical_hex_with, HashOpts};

let with_cc = json!({
    "messages": [{
        "role": "user",
        "content": [{"type": "text", "text": "hi", "cache_control": {"type": "ephemeral"}}],
    }],
});
let without_cc = json!({
    "messages": [{
        "role": "user",
        "content": [{"type": "text", "text": "hi"}],
    }],
});

let h1 = hash_canonical_hex_with(&with_cc, &HashOpts::anthropic());
let h2 = hash_canonical_hex_with(&without_cc, &HashOpts::anthropic());
assert_eq!(h1, h2);
```

Built-in presets: `HashOpts::anthropic()`, `HashOpts::openai()`,
`HashOpts::bedrock()`, `HashOpts::gemini()`. Each drops the response-side
metadata that varies per call (e.g. `id`, `created`, `usage`,
`finish_reason`) plus provider-specific request fields that don't change
semantics (e.g. `cache_control` for Anthropic).

Extend any preset fluently:

```rust
use llm_message_hash::HashOpts;
let opts = HashOpts::anthropic().ignore("metadata");
```
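
The fluent style above is the usual by-value builder pattern: each call consumes the options and returns them with one more key recorded. The sketch below shows that pattern with a hypothetical `IgnoreList` type. It is a stand-in, not the crate's `HashOpts`, and its preset contains only the field names mentioned in this README, which may not match the crate's actual lists.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an ignore-list; not the crate's `HashOpts`.
#[derive(Clone, Debug, Default)]
struct IgnoreList {
    keys: BTreeSet<String>,
}

impl IgnoreList {
    /// Illustrative preset built from the field names this README mentions.
    fn anthropic_like() -> Self {
        let mut list = Self::default();
        for key in ["id", "created", "usage", "finish_reason", "cache_control"] {
            list = list.ignore(key);
        }
        list
    }

    /// Consume `self`, record one more key, and return the list so calls chain.
    fn ignore(mut self, key: &str) -> Self {
        self.keys.insert(key.to_owned());
        self
    }

    /// True if an object key should be skipped while hashing.
    fn is_ignored(&self, key: &str) -> bool {
        self.keys.contains(key)
    }
}
```

Taking `self` by value keeps a preset reusable: calling `IgnoreList::anthropic_like()` again returns a fresh list, so extending one copy never affects another.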

## What it does NOT do

- No tokenization. Hash is over the *structure*, not the token count.
- No semantic equivalence. `"hi"` and `"Hi"` hash differently. So do
  `42` and `42.0`.
- No streaming hash. Pass a complete `serde_json::Value`.

## Output

`hash_canonical(v)` and `hash_canonical_with(v, opts)` return `[u8; 32]`.
The `_hex` variants return a 64-char lowercase hex `String`.
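
The relationship between the two return forms can be sketched as follows, assuming the `_hex` variants are a plain lowercase hex encoding of the 32 digest bytes. `to_hex` is a hypothetical helper for illustration, not a function exported by the crate.

```rust
use std::fmt::Write;

/// Encode a 32-byte digest as 64 lowercase hex characters.
fn to_hex(digest: &[u8; 32]) -> String {
    let mut s = String::with_capacity(64);
    for b in digest {
        // `{:02x}` pads each byte to two digits and uses lowercase a-f.
        write!(s, "{:02x}", b).expect("writing to a String cannot fail");
    }
    s
}
```

The raw `[u8; 32]` form suits in-memory map keys, while the hex form is convenient for logs and for string-keyed stores.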

The canonical bytes are written by a private writer, so you can't extract
them directly. To debug a mismatch between two hashes, sort the keys of
each value manually and compare the `serde_json::to_string` output.

## License

MIT OR Apache-2.0

## Related crates

Composes with [`agentidemp`](https://crates.io/crates/agentidemp) (idempotency keys)
and [`cachebench`](https://crates.io/crates/cachebench) (prompt-cache
hit-ratio measurement).