# agentsnap
Snapshot tests for AI agent traces. The Jest-snapshot pattern, applied to LLM agents.
```toml
[dev-dependencies]
agentsnap = "0.1"
```
## Why
The agent regression problem: today the agent answers correctly; tomorrow a small change to the system prompt or tool list silently breaks it. Manual eyeballing scales poorly. agentsnap is the smallest possible primitive that catches the regression: record a trace once, fail loudly on any future divergence, refresh with one env var.
## Quick start
```rust
use agentsnap::Trace;
use serde_json::json;
```
First run: writes the snapshot. Subsequent runs: compare. On mismatch, panics with a unified diff:
```
agentsnap mismatch at tests/snapshots/answer_question.snap.json:
@@ -... @@
- "output": 4
+ "output": "four"
```

Run with `AGENTSNAP_UPDATE=1` to refresh.
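The record-or-compare loop is small enough to sketch with the standard library alone. This is an illustrative re-implementation, not agentsnap's actual code; every name below except the `AGENTSNAP_UPDATE` variable is made up:

```rust
use std::env;
use std::fs;
use std::path::Path;

/// Result of comparing against a stored snapshot.
/// Mirrors the shape of MatchResult::{Recorded, Match, Mismatch(diff)}.
#[derive(Debug, PartialEq)]
enum MatchResult {
    Recorded,         // no snapshot existed (or refresh forced); we wrote one
    Match,            // snapshot exists and is identical
    Mismatch(String), // snapshot exists and differs; carries a diff
}

/// Record-or-compare: the whole primitive in one function (illustrative).
fn check(path: &Path, current: &str) -> std::io::Result<MatchResult> {
    let update = env::var("AGENTSNAP_UPDATE").ok().as_deref() == Some("1");
    if update || !path.exists() {
        fs::write(path, current)?; // first run (or forced refresh): record
        return Ok(MatchResult::Recorded);
    }
    let stored = fs::read_to_string(path)?; // later runs: exact comparison
    if stored == current {
        Ok(MatchResult::Match)
    } else {
        Ok(MatchResult::Mismatch(format!("- {stored}\n+ {current}")))
    }
}

fn main() -> std::io::Result<()> {
    let path = Path::new("answer_question.snap.json");
    let trace = r#"{"output": 4}"#;
    println!("{:?}", check(path, trace)?); // Recorded on a fresh checkout
    fs::remove_file(path)?;
    Ok(())
}
```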
## Updating snapshots
Intentional change? Refresh:
```sh
AGENTSNAP_UPDATE=1 cargo test
```
Then commit the updated snapshot files.
## API
```
Trace::new
trace.push
assert_matches  // panics on mismatch (test helper)
compare         // returns MatchResult::{Recorded, Match, Mismatch(diff)}
record          // unconditionally writes the snapshot
```
Trace and TraceCall derive Serialize/Deserialize — snapshots are plain JSON, easy to grep, easy to review in PRs.
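For illustration, a snapshot file might look roughly like this; the field names here are assumptions for the example, not the crate's actual schema:

```json
{
  "calls": [
    { "tool": "calculator", "input": { "expr": "2+2" }, "output": 4 }
  ]
}
```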
## What it doesn't do
- Doesn't capture traces automatically. You build the `Trace` in your test, from whatever your agent code returns.
- Doesn't tolerate non-determinism. Stub timestamps, IDs, and any clock-dependent output before snapshotting.
- Doesn't compare semantically: it's exact JSON equality. For LLM-output-as-ground-truth, layer an LLM-as-judge on top.
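"Stub timestamps" usually means injecting the clock rather than scrubbing output after the fact. A minimal sketch, assuming nothing about agentsnap's API (`run_agent` is a made-up stand-in for your agent code):

```rust
// The agent takes its clock as a parameter instead of calling
// SystemTime::now() directly, so tests can pin it to a constant.
fn run_agent(question: &str, now: fn() -> u64) -> String {
    // Hypothetical agent output: the timestamp would otherwise differ per run.
    format!(r#"{{"asked_at": {}, "question": "{}"}}"#, now(), question)
}

fn main() {
    let fixed_clock = || 0u64; // test stub: the clock always reads t = 0
    let trace = run_agent("2+2?", fixed_clock);
    println!("{trace}"); // deterministic, safe to snapshot
}
```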
## Sibling: JS @mukundakatta/agentsnap
JS users: see `@mukundakatta/agentsnap` on npm.
## License
MIT