Use Assay if you already have machine-readable AI outcomes or agent tool-call tests and want a small reviewable artifact boundary in CI.
Start with the path that matches what you already have:
| You have | Use this when | What you get | Next click |
|---|---|---|---|
| Promptfoo JSONL from CI evals | You want smaller PR evidence than a full eval export | Eval outcome receipts, verified bundle, Trust Basis diff | Promptfoo JSONL |
| OpenFeature boolean EvaluationDetails | You want CI evidence for a runtime flag decision boundary | Decision receipt, verified bundle, Trust Basis diff | OpenFeature EvaluationDetails |
| CycloneDX ML-BOM model component | You want CI evidence for the model inventory/provenance boundary that existed | Inventory receipt, verified bundle, Trust Basis diff | CycloneDX ML-BOM |
| MCP tool calls | You are ready to put a policy file around tool execution | Allow/deny audit trail and evidence for observed tool behavior | MCP Quick Start |
| A GitHub PR gate | You want CI to block regressions from checked artifacts | Trust Basis diff, gate status, SARIF/JUnit-ready output | CI Guide |
The core workflow is intentionally small: import or record a bounded outcome, bundle and verify it, compile trust-basis.json, then gate the Trust Basis diff. Assay does not make the upstream tool the source of truth; it makes the evidence boundary inspectable.
Trust Basis Gate
Status: OK
Bundles verified: 1
Regressed claims: 0
Assay is not a trust-score engine, a generic eval dashboard, or a hosted observability product. See What Assay is and is not for the boundary.
Is This For Me?
Yes, if you:
- already have eval output, runtime decisions, inventory artifacts, or MCP tool-call tests
- want a CI review artifact instead of a dashboard-only result
- need bounded auditability, not a scalar trust badge
Not yet, if you:
- need Assay to judge model correctness or policy quality for you
- want a hosted dashboard as the primary product
- want a compliance claim instead of a bounded evidence boundary
Install
CI: GitHub Action. Python SDK: pip install assay-it.
No hosted backend. No API keys for core flows. Deterministic: same input, same decision.
Trust claims use explicit epistemology, not a single “safety score”:
| Level | Meaning |
|---|---|
| `verified` | Backed by direct evidence or offline verification in the bundle/path |
| `self_reported` | Emitted by the system without stronger independent corroboration |
| `inferred` | Derived from bounded, documented rules |
| `absent` | No trustworthy evidence supports the claim |
Assay does not ship a primary aggregate trust score or a safe/unsafe badge as the main output. See ADR-033.
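As a minimal sketch of what "levels, not a score" can look like to a consumer, the Python below models the four levels as an enum and reports a per-level count instead of collapsing claims into one number. The claim rows and their `id` / `level` field names are assumptions for illustration, not the documented `trust-basis.json` schema.

```python
from collections import Counter
from enum import Enum


class ClaimLevel(str, Enum):
    """The four levels from the table above."""
    VERIFIED = "verified"
    SELF_REPORTED = "self_reported"
    INFERRED = "inferred"
    ABSENT = "absent"


def summarize_levels(claims):
    """Count claims per level; deliberately returns no aggregate score."""
    # ClaimLevel(...) raises ValueError on an unknown level string.
    counts = Counter(ClaimLevel(claim["level"]) for claim in claims)
    return {level.value: counts.get(level, 0) for level in ClaimLevel}


# Hypothetical claim rows; field names are assumed for this sketch.
claims = [
    {"id": "claim.a", "level": "verified"},
    {"id": "claim.b", "level": "absent"},
    {"id": "claim.c", "level": "verified"},
]
summary = summarize_levels(claims)
print(summary)
```

Keeping the output as a per-level breakdown leaves the judgment with the reviewer, which matches the stance that no single badge is the main result.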
What ships today
| Output | Role |
|---|---|
| Policy gate | MCP wrap — deterministic allow/deny before tools run (see CLI note below the diagram). |
| Evidence bundle | Offline-verifiable, tamper-evident archive for audit and replay. |
| External receipts | Selected eval outcomes, runtime decision details, and inventory/provenance surfaces as bounded evidence receipts with JSON Schema contracts. |
| Trust Basis | Canonical trust-basis.json — bounded claim classification from verified bundles. |
| Trust Card | trustcard.json / trustcard.md / trustcard.html — same claims, review-friendly artifacts. |
| SARIF / CI | GitHub Action, Security tab integration, policy gates on PRs. |
Repository truth: release notes and CHANGELOG.md remain the authority for what is actually public.
`main` may carry release-prep commits before a tag is cut; crates.io publication is separate from repository merge state.
Agent ──► Assay ──► MCP Server
│
├─ ✅ ALLOW / ❌ DENY (policy)
├─► 📋 Evidence bundle (verifiable)
└─► 📊 Trust Basis → Trust Card → SARIF / CI
CLI: The `mcp` command group is hidden from top-level `assay --help` while the surface stabilizes; it is supported. Use `assay mcp --help`, `assay mcp wrap …`, or follow the MCP Quickstart.
Wedge, not category. “MCP firewall” describes the control plane; trust compilation describes the outcome: reviewable claims backed by evidence. See ADR-033 and RFC-005.
See It Work
✅ ALLOW read_file path=/tmp/assay-demo/safe.txt reason=policy_allow
✅ ALLOW list_dir path=/tmp/assay-demo/ reason=policy_allow
❌ DENY read_file path=/tmp/outside-demo.txt reason=path_constraint_violation
❌ DENY exec cmd=ls reason=tool_denied
Inspect the audit artifact:
The bundle is tamper-evident and cryptographically verifiable. Signed mandate events can include an Ed25519-backed authorization trail for high-risk actions.
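To illustrate the general idea behind "tamper-evident", here is a small Python sketch that hashes each file in a bundle and then hashes the sorted manifest into a single root digest. This is not Assay's bundle format or verification code; the file names and manifest layout are hypothetical, and signing (for example Ed25519) is left out.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """Hex SHA-256 of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> dict:
    """Hash every file, then hash the sorted entry map into a root digest."""
    entries = {name: digest(content) for name, content in sorted(files.items())}
    root = digest(json.dumps(entries, sort_keys=True).encode("utf-8"))
    return {"entries": entries, "root": root}


def verify(files: dict, manifest: dict) -> bool:
    """Recompute the manifest offline and compare it to the recorded one."""
    return build_manifest(files) == manifest


# Hypothetical bundle contents, for illustration only.
files = {
    "events.jsonl": b'{"tool": "read_file", "decision": "allow"}\n',
    "meta.json": b'{"run": "demo"}',
}
manifest = build_manifest(files)

tampered = dict(files)
tampered["events.jsonl"] = b'{"tool": "exec", "decision": "allow"}\n'

print(verify(files, manifest))     # unchanged bundle
print(verify(tampered, manifest))  # one byte range altered
```

Any change to a file changes its digest and therefore the root, so verification needs only the bundle itself and no backend.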
Trust artifacts from a verified bundle
After a bundle verifies, compile the claim artifact:
# Machine-readable claim basis (deterministic, claim-first)
trust-basis.json is the canonical output for CI and review. Claim id values are stable across runs; consumers should key by id, not row count or order. It is not a scalar trust score.
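A consumer that keys by `id` might diff two Trust Basis documents roughly as sketched below. The top-level `claims` list, the `level` field, and the ranking of levels are all assumptions made for this example; the source lists the levels but does not define an ordering or the file schema.

```python
# Assumed ordering, strongest first. The document does not rank the levels.
RANK = {"verified": 3, "self_reported": 2, "inferred": 1, "absent": 0}


def index_by_id(doc):
    """Key claim rows by their stable id, ignoring row order."""
    return {claim["id"]: claim for claim in doc["claims"]}


def regressed_claims(baseline, candidate):
    """Return ids whose level got weaker, treating a missing claim as absent."""
    base = index_by_id(baseline)
    cand = index_by_id(candidate)
    regressed = []
    for claim_id, old in base.items():
        new = cand.get(claim_id)
        new_level = new["level"] if new else "absent"
        if RANK[new_level] < RANK[old["level"]]:
            regressed.append(claim_id)
    return sorted(regressed)


# Hypothetical documents; the candidate reorders rows and drops one claim.
baseline = {"claims": [
    {"id": "claim.a", "level": "verified"},
    {"id": "claim.b", "level": "self_reported"},
    {"id": "claim.c", "level": "inferred"},
]}
candidate = {"claims": [
    {"id": "claim.c", "level": "inferred"},
    {"id": "claim.a", "level": "self_reported"},
]}

regressed = regressed_claims(baseline, candidate)
print(regressed)
```

Because lookup is by `id`, the reordered rows in the candidate do not affect the result; only the weakened and missing claims are reported.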
The current claim-visible receipt families are Promptfoo assertion-component results, OpenFeature boolean EvaluationDetails, and CycloneDX ML-BOM model components. See the receipt-family matrix, the three-family note, and Evidence Receipts in Action.
# -> trust-out/trustcard.json, trust-out/trustcard.md, trust-out/trustcard.html
The Trust Card is a deterministic render of the same claim rows plus frozen non-goals; trustcard.json is canonical, while Markdown and static HTML are reviewer projections. Contract versions, pack floors, and release checklist: MIGRATION — Trust Compiler 3.2, receipt-family matrix. Release history belongs in CHANGELOG.md.
Add to Cursor in 30 Seconds
Assay ships a helper that finds your local Cursor MCP config path and prints a ready-to-paste entry:
It generates JSON like:
The same wrapped command works in other MCP clients — see MCP Quick Start.
Policy Is Simple
version: "2.0"
name: "my-policy"
tools:
  allow:
  deny:
schemas:
  read_file:
    type: object
    additionalProperties: false
    properties:
      path:
        type: string
        pattern: "^/app/.*"
        minLength: 1
    required:
Legacy `constraints:` policies still work. Use `assay policy migrate` to convert them to the v2 JSON Schema form, or `assay init --from-trace trace.jsonl` to generate a policy from observed behavior.
See Policy Files.
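The decision logic such a policy implies can be sketched in a few lines of Python. This is an illustration of allow/deny lists plus a JSON Schema style argument check, not Assay's engine: the allow, deny, and required entries are hypothetical placeholders, the reason `schema_violation` is invented for the sketch, and the other reason strings are borrowed from the demo output earlier in this README.

```python
import re

# Hypothetical policy mirroring the shape above; list entries are placeholders.
POLICY = {
    "tools": {"allow": ["read_file", "list_dir"], "deny": ["exec"]},
    "schemas": {
        "read_file": {
            "additionalProperties": False,
            "properties": {
                "path": {"type": "string", "pattern": "^/app/.*", "minLength": 1}
            },
            "required": ["path"],
        }
    },
}


def decide(tool, args, policy=POLICY):
    """Return (decision, reason) for one tool call. Deterministic by design."""
    tools = policy["tools"]
    if tool in tools["deny"] or tool not in tools["allow"]:
        return ("DENY", "tool_denied")

    schema = policy["schemas"].get(tool, {})
    props = schema.get("properties", {})
    if any(key not in args for key in schema.get("required", [])):
        return ("DENY", "schema_violation")
    if schema.get("additionalProperties") is False and set(args) - set(props):
        return ("DENY", "schema_violation")

    for key, rules in props.items():
        if key not in args:
            continue
        value = args[key]
        if not isinstance(value, str) or len(value) < rules.get("minLength", 0):
            return ("DENY", "schema_violation")
        # JSON Schema patterns are unanchored searches, hence re.search.
        if "pattern" in rules and not re.search(rules["pattern"], value):
            return ("DENY", "path_constraint_violation")
    return ("ALLOW", "policy_allow")


print(decide("read_file", {"path": "/app/data.txt"}))
print(decide("read_file", {"path": "/etc/passwd"}))
print(decide("exec", {"cmd": "ls"}))
```

The function has no clock, randomness, or network access, so the same call and the same policy always produce the same decision.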
OpenTelemetry in, canonical evidence out
Assay ingests OpenTelemetry JSONL, builds replayable traces, and exports canonical evidence — OTel is a bridge, not the sole semantic authority.
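As a rough picture of the first step, the sketch below reads span records from JSONL and groups them into ordered traces. The field names `trace_id`, `name`, and `start_time_unix_nano` are assumptions for the example; the export shape that Assay actually expects is not specified in this section.

```python
import io
import json
from collections import defaultdict


def group_spans_by_trace(lines):
    """Parse one JSON object per line and group spans by trace, oldest first."""
    traces = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        span = json.loads(line)
        traces[span["trace_id"]].append(span)
    for spans in traces.values():
        spans.sort(key=lambda s: s["start_time_unix_nano"])
    return dict(traces)


# Hypothetical export with spans out of order and a blank line.
sample = io.StringIO(
    '{"trace_id": "t1", "name": "tool.call", "start_time_unix_nano": 20}\n'
    '{"trace_id": "t2", "name": "tool.call", "start_time_unix_nano": 5}\n'
    "\n"
    '{"trace_id": "t1", "name": "agent.plan", "start_time_unix_nano": 10}\n'
)
traces = group_spans_by_trace(sample)
print([span["name"] for span in traces["t1"]])
```

Sorting within each trace gives a stable replay order regardless of how the exporter interleaved the lines.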
Protocol adapters
Assay ships adapters that map protocol events into canonical evidence:
| Protocol | Adapter | What it maps |
|---|---|---|
| ACP (OpenAI/Stripe) | `assay-adapter-acp` | Checkout events, payment intents, tool calls |
| A2A (Google) | `assay-adapter-a2a` | Agent capabilities, task delegation, artifacts |
| UCP (Google/Shopify) | `assay-adapter-ucp` | Discover/buy/post-purchase state transitions |
Adapter crates are workspace / binary-driven, not published as separate crates.io packages.
Add to CI
# .github/workflows/assay.yml
name: Assay Gate
on:
permissions:
  contents: read
  security-events: write
jobs:
  assay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Rul1an/assay-action@v2
PRs that violate policy get blocked; SARIF can surface in the Security tab.
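For readers unfamiliar with SARIF, the sketch below shows the minimal shape of a SARIF 2.1.0 log built from a list of denied tool calls. It is not the output of the Assay action: the driver name, the use of the denial reason as `ruleId`, and the input records are assumptions for illustration, and code-scanning uploads generally also expect location data that is omitted here.

```python
import json


def denials_to_sarif(denials):
    """Build a minimal SARIF 2.1.0 log with one result per denied call."""
    rule_ids = sorted({d["reason"] for d in denials})
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                "tool": {
                    "driver": {
                        "name": "example-policy-gate",  # hypothetical tool name
                        "rules": [{"id": rule_id} for rule_id in rule_ids],
                    }
                },
                "results": [
                    {
                        "ruleId": d["reason"],
                        "level": "error",
                        "message": {"text": f"{d['tool']} denied: {d['reason']}"},
                    }
                    for d in denials
                ],
            }
        ],
    }


# Hypothetical denials, shaped after the demo output earlier in this README.
denials = [
    {"tool": "read_file", "reason": "path_constraint_violation"},
    {"tool": "exec", "reason": "tool_denied"},
]
sarif = denials_to_sarif(denials)
print(json.dumps(sarif, indent=2))
```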
Why Assay
| | |
|---|---|
| Canonical evidence | Assay’s evidence model is the stable contract; OTel and adapters map into it. |
| Deterministic | Same input, same decision — not probabilistic. |
| Portable artifacts | Bundles, Trust Basis, Trust Card, SARIF — for CI, review, audit. |
| Bounded claims | Explicit about what is verified vs visible vs absent — no score-first UX. |
| MCP-native wedge | assay mcp wrap is the fast path (the mcp group is hidden from assay --help; use assay mcp --help). Adapters extend the same engine. |
| Offline-first | No backend required for core enforcement and bundle verification. |
On the M1 Pro/macOS fragmented-IPI harness, protected tool-decision path:
- Main protection run: `0.771ms` p50 / `1.913ms` p95
- Fast-path scenario: `0.345ms` p50 / `1.145ms` p95
These are tool-decision timings, not end-to-end model latency. (See Research & experiments for methodology context.)
Learn More
- Promptfoo JSONL to Evidence Receipts — smallest adoption path for existing eval artifacts
- OpenFeature EvaluationDetails to CI Review Artifact — runtime decision receipt path
- CycloneDX ML-BOM Model to Inventory Receipt — model inventory/provenance receipt path
- MCP Quickstart — filesystem server walkthrough
- Policy Files: YAML schema for `assay mcp wrap`
- OpenTelemetry & Langfuse: traces → replay and evidence
- CI Guide — GitHub Action
- Evidence Store — S3, B2, MinIO
- ADR-033: Trust compiler positioning
- RFC-005: Trust compiler MVP & Trust Card
Research, mappings & experiments
Bounded context: numbers below support mapping and experiments, not a product “security score.”
- OWASP MCP Top 10 Mapping — how Assay relates to each risk category (coverage is not a scalar guarantee).
- Third-party survey: popular MCP servers often show weak defaults — Assay adds policy + evidence; see discussion in the mapping doc.
- Security experiments — attack vectors and harness notes (methodology matters more than headline counts).
Contributing
See CONTRIBUTING.md. Discussions: GitHub Discussions — seed topics for pinned threads live in docs/community/DISCUSSIONS.md.