assay-sim 3.10.2

Use Assay if you already have machine-readable AI outcomes or agent tool-call tests and want a small reviewable artifact boundary in CI.

Start with the path that matches what you already have:

You have	Use this when	What you get	Next click
Promptfoo JSONL from CI evals	You want smaller PR evidence than a full eval export	Eval outcome receipts, verified bundle, Trust Basis diff	Promptfoo JSONL
OpenFeature boolean `EvaluationDetails`	You want CI evidence for a runtime flag decision boundary	Decision receipt, verified bundle, Trust Basis diff	OpenFeature EvaluationDetails
CycloneDX ML-BOM model component	You want CI evidence for the model inventory/provenance boundary that existed	Inventory receipt, verified bundle, Trust Basis diff	CycloneDX ML-BOM
MCP tool calls	You are ready to put a policy file around tool execution	Allow/deny audit trail and evidence for observed tool behavior	MCP Quickstart
A GitHub PR gate	You want CI to block regressions from checked artifacts	Trust Basis diff, gate status, SARIF/JUnit-ready output	CI Guide

The core workflow is intentionally small: import or record a bounded outcome, bundle and verify it, compile trust-basis.json, then gate the Trust Basis diff. Assay does not make the upstream tool the source of truth; it makes the evidence boundary inspectable.

Trust Basis Gate
Status: OK
Bundles verified: 1
Regressed claims: 0

Assay is not a trust-score engine, a generic eval dashboard, or a hosted observability product. See What Assay is and is not for the boundary.

Is This For Me?

Yes, if you:

already have eval output, runtime decisions, inventory artifacts, or MCP tool-call tests
want a CI review artifact instead of a dashboard-only result
need bounded auditability, not a scalar trust badge

Not yet, if you:

need Assay to judge model correctness or policy quality for you
want a hosted dashboard as the primary product
want a compliance claim instead of a bounded evidence boundary

Install

cargo install assay-cli

CI: GitHub Action. Python SDK: pip install assay-it.

No hosted backend. No API keys for core flows. Deterministic: same input, same decision.

Trust claims use explicit epistemology, not a single “safety score”:

Level	Meaning
`verified`	Backed by direct evidence or offline verification in the bundle/path
`self_reported`	Emitted by the system without stronger independent corroboration
`inferred`	Derived from bounded, documented rules
`absent`	No trustworthy evidence supports the claim

Assay does not ship a primary aggregate trust score or a safe/unsafe badge as the main output. See ADR-033.

What ships today

Output	Role
Policy gate	MCP `wrap` — deterministic allow/deny before tools run (see CLI note below the diagram).
Evidence bundle	Offline-verifiable, tamper-evident archive for audit and replay.
External receipts	Selected eval outcomes, runtime decision details, and inventory/provenance surfaces as bounded evidence receipts with JSON Schema contracts.
Trust Basis	Canonical `trust-basis.json` — bounded claim classification from verified bundles.
Trust Card	`trustcard.json` / `trustcard.md` / `trustcard.html` — same claims, review-friendly artifacts.
SARIF / CI	GitHub Action, Security tab integration, policy gates on PRs.

Repository truth: release notes and CHANGELOG.md remain the authority for what is actually public. main may carry release-prep commits before a tag is cut; crates.io publication is separate from repository merge state.

  Agent ──► Assay ──► MCP Server
              │
              ├─ ✅ ALLOW / ❌ DENY  (policy)
              ├─► 📋 Evidence bundle (verifiable)
              └─► 📊 Trust Basis → Trust Card → SARIF / CI

CLI: The mcp command group is hidden from top-level assay --help while the surface stabilizes; it is supported. Use assay mcp --help, assay mcp wrap …, or follow the MCP Quickstart.

Wedge, not category. “MCP firewall” describes the control plane; trust compilation describes the outcome: reviewable claims backed by evidence. See ADR-033 and RFC-005.

See It Work

cargo install assay-cli

mkdir -p /tmp/assay-demo && echo "safe content" > /tmp/assay-demo/safe.txt

assay mcp wrap --policy examples/mcp-quickstart/policy.yaml \
  -- npx @modelcontextprotocol/server-filesystem /tmp/assay-demo

✅ ALLOW  read_file  path=/tmp/assay-demo/safe.txt  reason=policy_allow
✅ ALLOW  list_dir   path=/tmp/assay-demo/           reason=policy_allow
❌ DENY   read_file  path=/tmp/outside-demo.txt      reason=path_constraint_violation
❌ DENY   exec       cmd=ls                          reason=tool_denied

Inspect the audit artifact:

assay evidence show demo/fixtures/bundle.tar.gz


The bundle is tamper-evident and cryptographically verifiable. Signed mandate events can include an Ed25519-backed authorization trail for high-risk actions.
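To illustrate what "tamper-evident" means in general, here is a minimal sketch of a digest manifest check. It is not Assay's bundle format or verification code: the entry names, the manifest shape, and the `build_manifest` / `verify` helpers are all hypothetical and exist only for this example.

```python
import hashlib


def digest(payload: bytes) -> str:
    """SHA-256 hex digest of one entry's bytes."""
    return hashlib.sha256(payload).hexdigest()


def build_manifest(entries: dict) -> dict:
    """Map each entry name to the digest of its content."""
    return {name: digest(data) for name, data in sorted(entries.items())}


def verify(entries: dict, manifest: dict) -> list:
    """Return names that are missing, unexpected, or whose digest changed."""
    problems = []
    for name in sorted(set(entries) | set(manifest)):
        if name not in entries or name not in manifest:
            problems.append(name)
        elif digest(entries[name]) != manifest[name]:
            problems.append(name)
    return problems


# Hypothetical bundle contents, used only to exercise the check.
entries = {
    "events.jsonl": b'{"tool": "read_file", "decision": "allow"}\n',
    "meta.json": b'{"schema": "example"}',
}
manifest = build_manifest(entries)

tampered = dict(entries)
tampered["events.jsonl"] = b'{"tool": "read_file", "decision": "deny"}\n'

print(verify(entries, manifest))   # no problems for untouched entries
print(verify(tampered, manifest))  # the edited entry is reported
```

A manifest like this only detects tampering if the manifest itself is protected, for example by a signature; otherwise an attacker can rewrite both the entry and its digest.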

Trust artifacts from a verified bundle

After a bundle verifies, compile the claim artifact:

# Machine-readable claim basis (deterministic, claim-first)
assay trust-basis generate demo/fixtures/bundle.tar.gz > trust-basis.json

trust-basis.json is the canonical output for CI and review. Claim id values are stable across runs; consumers should key by id, not row count or order. It is not a scalar trust score.
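As a rough sketch of what "key by id" means for a consumer, the snippet below compares two Trust Basis documents by claim id and reports claims whose level dropped. The JSON shape (`claims`, `id`, `level`), the claim ids, and the strongest-to-weakest ordering of levels are assumptions made for illustration; check the published schema before relying on any of them.

```python
# Assumed ordering of claim levels, strongest first. This ordering is an
# illustration only; Assay's own gate may compare levels differently.
LEVEL_RANK = {"verified": 3, "self_reported": 2, "inferred": 1, "absent": 0}


def index_claims(trust_basis: dict) -> dict:
    """Key claims by id (assumed shape: {"claims": [{"id": ..., "level": ...}]})."""
    return {claim["id"]: claim["level"] for claim in trust_basis["claims"]}


def regressed_claims(baseline: dict, candidate: dict) -> list:
    """Return ids whose level dropped, or that disappeared, in the candidate."""
    before = index_claims(baseline)
    after = index_claims(candidate)
    regressed = []
    for claim_id, old_level in before.items():
        # A claim missing from the candidate is treated as "absent".
        new_level = after.get(claim_id, "absent")
        if LEVEL_RANK[new_level] < LEVEL_RANK[old_level]:
            regressed.append(claim_id)
    return sorted(regressed)


# Hypothetical claim ids, used only to exercise the comparison.
baseline = {"claims": [
    {"id": "claim.a", "level": "verified"},
    {"id": "claim.b", "level": "self_reported"},
]}
candidate = {"claims": [
    {"id": "claim.b", "level": "self_reported"},  # row order changed, level unchanged
    {"id": "claim.a", "level": "inferred"},       # level dropped
]}

print(regressed_claims(baseline, candidate))
```

Because the lookup is by id, reordering rows in the candidate does not register as a change; only the drop on `claim.a` is reported.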

The current claim-visible receipt families are Promptfoo assertion-component results, OpenFeature boolean EvaluationDetails, and CycloneDX ML-BOM model components. See the receipt-family matrix, the three-family note, and Evidence Receipts in Action.

assay trustcard generate demo/fixtures/bundle.tar.gz --out-dir ./trust-out
# -> trust-out/trustcard.json , trust-out/trustcard.md , trust-out/trustcard.html

The Trust Card is a deterministic render of the same claim rows plus frozen non-goals; trustcard.json is canonical, while Markdown and static HTML are reviewer projections. Contract versions, pack floors, and release checklist: MIGRATION — Trust Compiler 3.2, receipt-family matrix. Release history belongs in CHANGELOG.md.

Add to Cursor in 30 Seconds

Assay ships a helper that finds your local Cursor MCP config path and prints a ready-to-paste entry:

assay mcp config-path cursor

It generates JSON like:

{
  "filesystem-secure": {
    "command": "assay",
    "args": [
      "mcp",
      "wrap",
      "--policy",
      "/path/to/policy.yaml",
      "--",
      "npx",
      "-y",
      "@modelcontextprotocol/server-filesystem",
      "/Users/you"
    ]
  }
}

The same wrapped command works in other MCP clients; see the MCP Quickstart.
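If you want to produce the same entry for a different server or client, the structure is easy to build by hand. The sketch below only reproduces the JSON shape shown above; `wrapped_server_entry` is a hypothetical helper, not part of Assay or its Python SDK.

```python
import json


def wrapped_server_entry(name, policy_path, server_command):
    """Build one MCP client entry that launches a server through `assay mcp wrap`."""
    return {
        name: {
            "command": "assay",
            # Everything after "--" is the wrapped server's own command line.
            "args": ["mcp", "wrap", "--policy", policy_path, "--", *server_command],
        }
    }


entry = wrapped_server_entry(
    "filesystem-secure",
    "/path/to/policy.yaml",
    ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/Users/you"],
)
print(json.dumps(entry, indent=2))
```

Where the entry goes inside a client's config file depends on that client, so this sketch stops at the entry itself.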

Policy Is Simple

version: "2.0"
name: "my-policy"

tools:
  allow: ["read_file", "list_dir"]
  deny: ["exec", "shell", "write_file"]

schemas:
  read_file:
    type: object
    additionalProperties: false
    properties:
      path:
        type: string
        pattern: "^/app/.*"
        minLength: 1
    required: ["path"]

Policies that use the legacy constraints: key still work. Use assay policy migrate to convert them to the v2 JSON Schema form, or assay init --from-trace trace.jsonl to generate a policy from observed behavior.

See Policy Files.
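To make the example policy's intent concrete, here is a small Python sketch of how an allow/deny list plus a per-tool argument schema could be evaluated. It is not Assay's engine. The precedence rule (deny wins, unlisted tools are denied) and the `schema_violation` reason string are assumptions; `policy_allow` and `tool_denied` are borrowed from the demo output earlier in this README.

```python
import re

# The example policy above, restated as plain Python data.
POLICY = {
    "tools": {
        "allow": ["read_file", "list_dir"],
        "deny": ["exec", "shell", "write_file"],
    },
    "schemas": {
        "read_file": {
            "required": ["path"],
            "properties": {"path": {"pattern": "^/app/.*", "minLength": 1}},
            "additionalProperties": False,
        }
    },
}


def decide(tool: str, args: dict, policy: dict = POLICY) -> tuple:
    """Return ("ALLOW" | "DENY", reason) for one tool call."""
    tools = policy["tools"]
    # Assumption: deny wins over allow, and unlisted tools are denied.
    if tool in tools["deny"] or tool not in tools["allow"]:
        return ("DENY", "tool_denied")

    schema = policy["schemas"].get(tool)
    if schema is None:
        return ("ALLOW", "policy_allow")

    props = schema["properties"]
    if any(name not in args for name in schema["required"]):
        return ("DENY", "schema_violation")
    if not schema["additionalProperties"] and any(k not in props for k in args):
        return ("DENY", "schema_violation")
    for name, rules in props.items():
        if name not in args:
            continue
        value = args[name]
        if not isinstance(value, str) or len(value) < rules["minLength"]:
            return ("DENY", "schema_violation")
        # JSON Schema "pattern" is an unanchored search, hence re.search.
        if re.search(rules["pattern"], value) is None:
            return ("DENY", "schema_violation")
    return ("ALLOW", "policy_allow")


print(decide("read_file", {"path": "/app/data.txt"}))
print(decide("read_file", {"path": "/etc/passwd"}))
print(decide("exec", {"cmd": "ls"}))
```

Note that a pattern check is purely textual: this sketch does no path normalization, so a value such as /app/../etc/passwd would still match ^/app/.* here.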

OpenTelemetry in, canonical evidence out

Assay ingests OpenTelemetry JSONL, builds replayable traces, and exports canonical evidence — OTel is a bridge, not the sole semantic authority.

assay trace ingest-otel \
  --input otel-export.jsonl \
  --db .eval/eval.db \
  --out-trace traces/otel.v2.jsonl

See OpenTelemetry & Langfuse.

Protocol adapters

Assay ships adapters that map protocol events into canonical evidence:

Protocol	Adapter	What it maps
ACP (OpenAI/Stripe)	`assay-adapter-acp`	Checkout events, payment intents, tool calls
A2A (Google)	`assay-adapter-a2a`	Agent capabilities, task delegation, artifacts
UCP (Google/Shopify)	`assay-adapter-ucp`	Discover/buy/post-purchase state transitions

Adapter crates are workspace / binary-driven, not published as separate crates.io packages.

Add to CI

# .github/workflows/assay.yml
name: Assay Gate
on: [push, pull_request]
permissions:
  contents: read
  security-events: write
jobs:
  assay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Rul1an/assay-action@v2

PRs that violate policy get blocked; SARIF can surface in the Security tab.

Why Assay


Canonical evidence	Assay’s evidence model is the stable contract; OTel and adapters map into it.
Deterministic	Same input, same decision — not probabilistic.
Portable artifacts	Bundles, Trust Basis, Trust Card, SARIF — for CI, review, audit.
Bounded claims	Explicit about what is verified vs visible vs absent — no score-first UX.
MCP-native wedge	`assay mcp wrap` is the fast path (the `mcp` group is hidden from `assay --help`; use `assay mcp --help`). Adapters extend the same engine.
Offline-first	No backend required for core enforcement and bundle verification.

Measured on the M1 Pro/macOS fragmented-IPI harness, for the protected tool-decision path:

Main protection run: 0.771ms p50 / 1.913ms p95
Fast-path scenario: 0.345ms p50 / 1.145ms p95

These are tool-decision timings, not end-to-end model latency. (See Research & experiments for methodology context.)
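For readers less used to p50/p95 figures, the sketch below shows one common way to compute them from raw per-call timings, using the nearest-rank method. The sample values are invented for the example and are not Assay measurements, and the harness may use a different percentile definition.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical per-call decision latencies in milliseconds (not real results).
latencies_ms = [0.31, 0.35, 0.33, 0.40, 0.36, 0.34, 1.20, 0.32, 0.37, 0.38]

print(f"p50={percentile(latencies_ms, 50)}ms  p95={percentile(latencies_ms, 95)}ms")
```

A single slow call barely moves p50 but dominates p95, which is why the two numbers are reported together.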

Learn More

Promptfoo JSONL to Evidence Receipts — smallest adoption path for existing eval artifacts
OpenFeature EvaluationDetails to CI Review Artifact — runtime decision receipt path
CycloneDX ML-BOM Model to Inventory Receipt — model inventory/provenance receipt path
MCP Quickstart — filesystem server walkthrough
Policy Files — YAML schema for assay mcp wrap
OpenTelemetry & Langfuse — traces → replay and evidence
CI Guide — GitHub Action
Evidence Store — S3, B2, MinIO
ADR-033: Trust compiler positioning
RFC-005: Trust compiler MVP & Trust Card

Research, mappings & experiments

Bounded context: numbers below support mapping and experiments, not a product “security score.”

OWASP MCP Top 10 Mapping — how Assay relates to each risk category (coverage is not a scalar guarantee).
Third-party survey: popular MCP servers often show weak defaults — Assay adds policy + evidence; see discussion in the mapping doc.
Security experiments — attack vectors and harness notes (methodology matters more than headline counts).

Contributing

cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings

See CONTRIBUTING.md. Discussions: GitHub Discussions — seed topics for pinned threads live in docs/community/DISCUSSIONS.md.

License

MIT