determs

Replay any automated decision. Prove it to anyone.

Turn any automated decision into a record anyone can verify — with maths and public infrastructure, never by trusting a vendor. Today, that starts with AI agents.

Your agent did something yesterday at 14:23. Today, can you reproduce it? Prove to an auditor what it did, and why? Tell a real regression from noise after a prompt change?

Observability tools (Langfuse, Helicone, Phoenix) show you traces. They don't give you replay or proof. That gap is what determs fills.

┌─ verifiable decision record ────────────────────────────────┐
│  profile        ai.agent.action                              │
│  subject        { model, params, input, output }             │
│  record_digest  44f2b549…c4b581fe4   ← sha-256, reproducible │
│  verify         ✓ trustless — maths + public infra           │
└──────────────────────────────────────────────────────────────┘

The idea

determs is two things: an open standard, the Verifiable Decision Record, and the deterministic replay and proof engine that implements it.

A log your own system wrote about itself is not evidence — a counterparty, an auditor, or a regulator has no reason to trust it. determs turns each automated decision into a Verifiable Decision Record (VDR): a portable, canonical object whose integrity anyone can check, using only mathematics and (optionally) public infrastructure.

Three primitives:

Capsule — a unit of execution: typed input, deterministic logic, typed output.
Receipt — a stable, hash-anchored proof that a capsule ran on a given input and produced a given output.
Replay — re-running a capsule from a recorded input produces the same output, byte for byte.
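
The three primitives can be sketched in a few lines of Python. This is an illustration of the idea, not the determs API: the names Receipt, run, replay and triage are hypothetical, and canonical() is a simplified stand-in for RFC 8785 that is only adequate for the strings and integers used here.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable


def canonical(value) -> bytes:
    # Simplified stand-in for RFC 8785 (JCS): sorted keys, no whitespace.
    # Not a full JCS implementation (floats and some escapes differ).
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(value) -> str:
    return hashlib.sha256(canonical(value)).hexdigest()


@dataclass(frozen=True)
class Receipt:
    # Hypothetical receipt layout: which capsule ran, on what, producing what.
    capsule: str
    input_digest: str
    output_digest: str


def run(name: str, logic: Callable[[dict], dict], payload: dict) -> tuple[dict, Receipt]:
    # Capsule: typed input in, deterministic logic, typed output out.
    output = logic(payload)
    return output, Receipt(name, digest(payload), digest(output))


def replay(name: str, logic: Callable[[dict], dict], payload: dict, receipt: Receipt) -> bool:
    # Replay: re-run from the recorded input and compare against the receipt.
    _, fresh = run(name, logic, payload)
    return fresh == receipt


def triage(payload: dict) -> dict:
    # Example deterministic logic: same input always yields the same output.
    urgent = "refund" in payload["text"].lower()
    return {"queue": "billing" if urgent else "general"}


output, receipt = run("triage", triage, {"text": "Refund please"})
print(output, replay("triage", triage, {"text": "Refund please"}, receipt))
```

The receipt holds only digests, so it can be shared without exposing the input or the output.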

"But the LLM is stochastic." Right — and that's not the obstacle people assume. determs records the output that did occur and binds it verifiably to the inputs that produced it. Determinism is a property of the record and its verification, not of the model.

Quickstart

Wrap your LLM client. Every call becomes a record.

pip install "determs[anthropic]"

import anthropic
from determs.anthropic import wrap
from determs.storage import FileStorage

client = wrap(
    anthropic.Anthropic(),
    agent_id="support-triage",
    storage=FileStorage("./records"),
)

resp = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=[{"role": "user", "content": "My order #1234 hasn't shipped."}],
)
# → ./records/<action_id>.json  (a VDR subject)

Then capture, verify, and replay with the engine (cargo install determs):

determs capture --input ./records/act-xxx.json --output ./record.json
determs verify  --record ./record.json     # exits 0 if untampered, 1 otherwise
determs replay  --record ./record.json     # rebuilds & compares, bit-exact

verify recomputes every digest from the stored record. Change one character — a content string, a tool argument, a token count — and the digest diverges and verify exits non-zero. No silent corruption.
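
A minimal sketch of why a one-character edit is detectable. Everything here is an assumption made for illustration: the record layout (a subject plus a stored record_digest), the sample output text, and the canonical form, which approximates RFC 8785 with sorted keys and no whitespace. The real digest rules are defined by the spec, not by this snippet.

```python
import copy
import hashlib
import json


def record_digest(subject: dict) -> str:
    # Approximate canonical form: sorted keys, compact separators, UTF-8.
    data = json.dumps(subject, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def verify(record: dict) -> int:
    """Return 0 if the recomputed digest matches the stored one, 1 otherwise."""
    return 0 if record_digest(record["subject"]) == record["record_digest"] else 1


# Made-up subject, shaped like the { model, params, input, output } summary above.
subject = {
    "model": "claude-3-5-sonnet-20241022",
    "params": {"max_tokens": 512},
    "input": "My order #1234 hasn't shipped.",
    "output": "Sorry about that. Let me check the shipment.",
}
record = {"subject": subject, "record_digest": record_digest(subject)}

# Change a single character of a parameter: 512 becomes 513.
tampered = copy.deepcopy(record)
tampered["subject"]["params"]["max_tokens"] = 513

print(verify(record))    # status for the untouched record
print(verify(tampered))  # status after the edit
```

Because the digest covers the whole canonical subject, it does not matter which field was edited: any change produces a different hash.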

Verify it yourself

You don't have to trust us — that's the point. Any SHA-256 implementation reproduces a determs digest. Even the project's website is itself a record:

printf '%s' '{"site":"determs.com","standard":"verifiable-decision-record/0","claim":"every automated decision can become a verifiable, replayable record"}' | sha256sum
# 33a0bfac250e303a6765456366dd3e8d19709f4adf5b5b8c4541bcca95f2f8b1
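
The same check works without a shell. A sketch using only the Python standard library; sha256_hex is a helper name chosen here, not part of determs. Copy the string byte for byte and compare the printed value with the digest above.

```python
import hashlib


def sha256_hex(text: str) -> str:
    # Hash the exact UTF-8 bytes, with no trailing newline (like printf '%s').
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


payload = (
    '{"site":"determs.com","standard":"verifiable-decision-record/0",'
    '"claim":"every automated decision can become a verifiable, replayable record"}'
)

print(sha256_hex(payload))
```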

To prove when a record existed, not just its integrity, anchor its digest to public infrastructure: the optional anchor extra (pip install "determs[anchor]") commits the digest to the Bitcoin blockchain via OpenTimestamps. Only the digest leaves your environment; verification needs no one's permission.

What determs is — and is not

  determs is                           determs is not
  ──────────                           ──────────────
  a replay & verifiable-audit layer    a tracing/observability tool
  an open record format (VDR)          a prompt evaluator or LLM gateway
  a neutral verification primitive     an orchestration framework
  trustless by construction            a system you have to trust

Neutral by construction. Tracing tools and agent-governance toolkits keep their own records, in their own systems, attested with their own keys — you verify by trusting them. A VDR is checked against public infrastructure, never a vendor's word, ours included: a signature proves who; a public anchor proves what and when. The format is domain-agnostic and portable, and only the digest is ever published — the decision payload never leaves your environment.

The standard

The record format is an open specification — the Verifiable Decision Record (CC-BY-4.0). It defines the canonical form (RFC 8785 / JCS), the digests, the optional public-log anchoring, and a verification procedure any vendor can implement independently. determs (this repo) is the reference implementation.

Conformance test vectors pin the exact digests a conforming implementation must reproduce — so independent implementations interoperate, not just ours.
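
How a vector run might look, as a rough sketch. The vector layout, the names VECTORS, expected_digest and failures, and the simplified canonicalisation are all assumptions for illustration; the real vectors live with the spec. To avoid quoting digests that were not taken from the spec, each expected digest is derived here from a hand-written canonical string.

```python
import hashlib
import json
from typing import Callable

# Hypothetical vectors: a subject plus the canonical text it must serialise to.
VECTORS = [
    {"name": "key-order", "subject": {"b": "x", "a": 1},
     "canonical": '{"a":1,"b":"x"}'},
    {"name": "nested", "subject": {"params": {"t": 0, "k": 2}, "model": "m"},
     "canonical": '{"model":"m","params":{"k":2,"t":0}}'},
]


def expected_digest(vector: dict) -> str:
    # The digest a conforming implementation must reproduce for this vector.
    return hashlib.sha256(vector["canonical"].encode("utf-8")).hexdigest()


def conforming_digest(subject) -> str:
    # Sorted keys and compact separators: matches JCS for these simple inputs.
    text = json.dumps(subject, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def naive_digest(subject) -> str:
    # Non-conforming: depends on key insertion order and default whitespace.
    return hashlib.sha256(json.dumps(subject).encode("utf-8")).hexdigest()


def failures(digest_fn: Callable[[dict], str]) -> list[str]:
    # Names of the vectors an implementation fails to reproduce.
    return [v["name"] for v in VECTORS if digest_fn(v["subject"]) != expected_digest(v)]


print(failures(conforming_digest))
print(failures(naive_digest))
```

An implementation interoperates when its failure list is empty for every published vector, whatever language it is written in.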

A neutral registry index of anchored record_digests is published as a static file. It exists for discovery, never as a source of trust: each entry self-verifies against its anchor, and only digests are indexed, never the subject.

An evidence pack bundles anchored records for one system over one period and maps them to a regulation's record-keeping duties (EU AI Act Art. 12 & 19) — a self-verifying artefact an auditor checks against public infrastructure, not against your word or ours. It is the open, trustless primitive of the compliance layer (determs.compliance).

Architecture

  SDK (Python)            Engine (Rust)                 Spec
  ───────────             ─────────────                 ────
  wrap(client)   ──→      capture  → VDR record         docs/spec/
  emits a subject         verify   → recompute digests  (vendor-neutral,
  (the action)            replay   → rebuild & compare    CC-BY-4.0)

src/ — Rust engine + CLI (Apache-2.0)
sdk/python/ — Python SDK: Anthropic & OpenAI wrappers, sync/async, streaming
docs/spec/ — the Verifiable Decision Record specification
examples/ — sample records and runnable demos

Status

Pre-1.0. The engine, the agent.action.replay.v1 capsule, the CLI (capture/verify/replay), and the Python SDK are working today. The VDR spec is at draft v0, with a neutral registry index and an open, self-verifying compliance evidence pack (EU AI Act Art. 12 & 19) alongside it. A managed evidence layer — hosted retention and auditor-ready export over those same packs — is the active commercial build. See ROADMAP.md.

Model & license

Open-core. The specification and the reference implementation (engine, CLI, SDK) are open source:

Code — Apache-2.0
Specification (docs/spec/) — CC-BY-4.0
A managed registry and a compliance layer are the commercial surface.

Contributing & security

CONTRIBUTING.md — best-effort, no CLA (inbound = outbound).
SECURITY.md — report integrity/verification issues privately.

The one invariant: verification must depend on maths and public infrastructure, never on trusting a brand. Contributions that break that are out of scope.

determs 0.1.0