agentsnap 0.1.0

Snapshot tests for AI agent traces. Record once, replay-and-compare on every run; the agent equivalent of Jest snapshots.
Documentation

agentsnap

crates.io docs.rs License: MIT

Snapshot tests for AI agent traces. The Jest-snapshot pattern, applied to LLM agents.

[dev-dependencies]
agentsnap = "0.1"

Why

The agent regression problem: today the agent answers correctly; tomorrow a small change to the system prompt or tool list silently breaks it. Manual eyeballing scales poorly. agentsnap is the smallest possible primitive that catches it: record a trace once, fail loud on any future divergence, refresh with one env var.

Quick start

use agentsnap::{assert_matches, Trace, TraceCall};
use serde_json::json;

#[test]
fn answer_question() {
    let mut trace = Trace::new("answer_question");
    trace.push(TraceCall::llm(
        "planner",
        json!({"q": "what is 2+2?"}),
        json!({"plan": ["compute"]}),
    ));
    trace.push(TraceCall::tool(
        "compute",
        json!({"expr": "2+2"}),
        json!(4),
    ));
    trace.push(TraceCall::llm(
        "responder",
        json!({"result": 4}),
        json!("The answer is 4."),
    ));

    assert_matches(&trace, "tests/snapshots/answer_question.snap.json");
}

First run: writes the snapshot. Subsequent runs: compare. On mismatch, panics with a unified diff:

agentsnap mismatch at tests/snapshots/answer_question.snap.json:
@@ -... @@
-      "output": 4
+      "output": "four"
Run with AGENTSNAP_UPDATE=1 to refresh.

Updating snapshots

Intentional change? Refresh:

AGENTSNAP_UPDATE=1 cargo test

Then commit the updated snapshot files.

API

Trace::new(name)
trace.push(TraceCall::llm(name, input, output))
trace.push(TraceCall::tool(name, input, output))

assert_matches(&trace, path)        // panics on mismatch (test helper)
compare(&trace, path)               // returns MatchResult::{Recorded,Match,Mismatch(diff)}
record(&trace, path)                // unconditionally write the snapshot

Trace and TraceCall derive Serialize/Deserialize — snapshots are plain JSON, easy to grep, easy to review in PRs.

What it doesn't do

  • Doesn't capture traces automatically. You build the Trace from your test (calling whatever your agent code returns).
  • Doesn't tolerate non-determinism. Stub timestamps, IDs, and any clock-dependent output before snapshotting.
  • Doesn't compare semantically — exact JSON equality. For LLM-output-as-ground-truth, layer an LLM-as-judge on top.

Sibling: JS @mukundakatta/agentsnap

JS users: see @mukundakatta/agentsnap on npm.

License

MIT