assay-runner-spike 3.12.0

Internal/experimental substrate for Assay measured-run workflows. Compatibility wrapper crate that re-exports the Assay-Runner candidate surface from assay-runner-schema (Slice 1) and assay-runner-core (Slice 2). No standalone product guarantee; legacy alias retained for readers of pre-extraction history; semver tracks the Assay workspace.

Use Assay if you already have machine-readable AI outcomes or agent tool-call tests and want a small reviewable artifact boundary in CI.

Start with the path that matches what you already have:

You have | Use this when | What you get | Next click
Promptfoo JSONL from CI evals | You want smaller PR evidence than a full eval export | Eval outcome receipts, verified bundle, Trust Basis diff | Promptfoo JSONL
OpenFeature boolean EvaluationDetails | You want CI evidence for a runtime flag decision boundary | Decision receipt, verified bundle, Trust Basis diff | OpenFeature EvaluationDetails
CycloneDX ML-BOM model component | You want CI evidence for the model inventory/provenance boundary that existed | Inventory receipt, verified bundle, Trust Basis diff | CycloneDX ML-BOM
MCP tool calls | You are ready to put a policy file around tool execution | Allow/deny audit trail and evidence for observed tool behavior | MCP Quick Start
A GitHub PR gate | You want CI to block regressions from checked artifacts | Trust Basis diff, gate status, SARIF/JUnit-ready output | CI Guide

The core workflow is intentionally small: import or record a bounded outcome, bundle and verify it, compile trust-basis.json, then gate the Trust Basis diff. Assay does not make the upstream tool the source of truth; it makes the evidence boundary inspectable.

Trust Basis Gate
Status: OK
Bundles verified: 1
Regressed claims: 0

Assay is not a trust-score engine, a generic eval dashboard, or a hosted observability product. See What Assay is and is not for the boundary.

Is This For Me?

Yes, if you:

  • already have eval output, runtime decisions, inventory artifacts, or MCP tool-call tests
  • want a CI review artifact instead of a dashboard-only result
  • need bounded auditability, not a scalar trust badge

Not yet, if you:

  • need Assay to judge model correctness or policy quality for you
  • want a hosted dashboard as the primary product
  • want a compliance claim instead of a bounded evidence boundary

Install

cargo install assay-cli

CI: GitHub Action. Python SDK: pip install assay-it.

No hosted backend. No API keys for core flows. Deterministic: same input, same decision.

Trust claims use explicit epistemology, not a single “safety score”:

Level | Meaning
verified | Backed by direct evidence or offline verification in the bundle/path
self_reported | Emitted by the system without stronger independent corroboration
inferred | Derived from bounded, documented rules
absent | No trustworthy evidence supports the claim

Assay does not ship a primary aggregate trust score or a safe/unsafe badge as the main output. See ADR-033.

What ships today

Output | Role
Policy gate | MCP wrap: deterministic allow/deny before tools run (see CLI note below the diagram).
Evidence bundle | Offline-verifiable, tamper-evident archive for audit and replay.
External receipts | Selected eval outcomes, runtime decision details, and inventory/provenance surfaces as bounded evidence receipts with JSON Schema contracts.
Trust Basis | Canonical trust-basis.json: bounded claim classification from verified bundles.
Trust Card | trustcard.json / trustcard.md / trustcard.html: same claims, review-friendly artifacts.
SARIF / CI | GitHub Action, Security tab integration, policy gates on PRs.

Repository truth: release notes and CHANGELOG.md remain the authority for what is actually public. main may carry release-prep commits before a tag is cut; crates.io publication is separate from repository merge state.

  Agent ──► Assay ──► MCP Server
              │
              ├─ ✅ ALLOW / ❌ DENY  (policy)
              ├─► 📋 Evidence bundle (verifiable)
              └─► 📊 Trust Basis → Trust Card → SARIF / CI

CLI: The mcp command group is hidden from top-level assay --help while the surface stabilizes, but it is supported. Use assay mcp --help, assay mcp wrap …, or follow the MCP Quick Start.

Wedge, not category. “MCP firewall” describes the control plane; trust compilation describes the outcome: reviewable claims backed by evidence. See ADR-033 and RFC-005.

See It Work

cargo install assay-cli

mkdir -p /tmp/assay-demo && echo "safe content" > /tmp/assay-demo/safe.txt

assay mcp wrap --policy examples/mcp-quickstart/policy.yaml \
  -- npx @modelcontextprotocol/server-filesystem /tmp/assay-demo
✅ ALLOW  read_file  path=/tmp/assay-demo/safe.txt  reason=policy_allow
✅ ALLOW  list_dir   path=/tmp/assay-demo/           reason=policy_allow
❌ DENY   read_file  path=/tmp/outside-demo.txt      reason=path_constraint_violation
❌ DENY   exec       cmd=ls                          reason=tool_denied

Inspect the audit artifact:

assay evidence show demo/fixtures/bundle.tar.gz

Evidence Bundle Inspector

The bundle is tamper-evident and cryptographically verifiable. Signed mandate events can include an Ed25519-backed authorization trail for high-risk actions.
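
To make "tamper-evident" concrete, here is a minimal sketch of the general idea: record a SHA-256 digest per payload in a manifest, then re-check the payloads against it later. This is not Assay's bundle format. The file name events.ndjson, the event fields, and both helper functions are hypothetical and exist only for illustration.

```python
import hashlib


def build_manifest(files: dict) -> dict:
    """Record a SHA-256 hex digest for each named payload (bytes)."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def verify_manifest(files: dict, manifest: dict) -> list:
    """Return the names that are missing, unexpected, or no longer match."""
    problems = []
    for name in sorted(set(files) | set(manifest)):
        if name not in files or name not in manifest:
            problems.append(name)
        elif hashlib.sha256(files[name]).hexdigest() != manifest[name]:
            problems.append(name)
    return problems


# Hypothetical payload; real bundles define their own layout.
events = {"events.ndjson": b'{"tool":"read_file","decision":"allow"}\n'}
manifest = build_manifest(events)
print(verify_manifest(events, manifest))

# Flipping one decision changes the digest, so the edit is detected.
tampered = {"events.ndjson": b'{"tool":"read_file","decision":"deny"}\n'}
print(verify_manifest(tampered, manifest))
```

A digest manifest only detects edits if the manifest itself is protected, which is the role a signature such as Ed25519 plays.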

Trust artifacts from a verified bundle

After a bundle verifies, compile the claim artifact:

# Machine-readable claim basis (deterministic, claim-first)
assay trust-basis generate demo/fixtures/bundle.tar.gz > trust-basis.json

trust-basis.json is the canonical output for CI and review. Claim id values are stable across runs; consumers should key by id, not row count or order. It is not a scalar trust score.
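
A minimal sketch of why keying by id matters when comparing two Trust Basis files. It assumes a hypothetical shape: a top-level "claims" list whose rows carry "id" and "level". It also assumes the four levels rank in the order the level table lists them. Neither assumption is confirmed by the trust-basis.json contract, so check the published schema before relying on either.

```python
import json

# Assumed ranking, strongest first; the real contract may not define an order.
LEVEL_RANK = {"verified": 3, "self_reported": 2, "inferred": 1, "absent": 0}


def index_claims(trust_basis: dict) -> dict:
    """Map claim id -> level, ignoring row order (hypothetical 'claims' shape)."""
    return {row["id"]: row["level"] for row in trust_basis["claims"]}


def regressed_claims(baseline: dict, candidate: dict) -> list:
    """Return ids whose level dropped, or that disappeared, in the candidate."""
    before = index_claims(baseline)
    after = index_claims(candidate)
    regressed = []
    for claim_id, old_level in before.items():
        # A claim missing from the candidate is treated as 'absent'.
        new_level = after.get(claim_id, "absent")
        if LEVEL_RANK[new_level] < LEVEL_RANK[old_level]:
            regressed.append(claim_id)
    return sorted(regressed)


# Claim ids "a" and "b" are placeholders; rows are deliberately reordered.
baseline = json.loads(
    '{"claims": [{"id": "a", "level": "verified"}, {"id": "b", "level": "inferred"}]}'
)
candidate = json.loads(
    '{"claims": [{"id": "b", "level": "inferred"}, {"id": "a", "level": "self_reported"}]}'
)
print(regressed_claims(baseline, candidate))
```

Because the comparison goes through an id-keyed map, reordering rows does not produce a false regression; only the drop on claim "a" is reported.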

The current claim-visible receipt families are Promptfoo assertion-component results, OpenFeature boolean EvaluationDetails, and CycloneDX ML-BOM model components. See the receipt-family matrix, the three-family note, and Evidence Receipts in Action.

assay trustcard generate demo/fixtures/bundle.tar.gz --out-dir ./trust-out
# -> trust-out/trustcard.json , trust-out/trustcard.md , trust-out/trustcard.html

The Trust Card is a deterministic render of the same claim rows plus frozen non-goals; trustcard.json is canonical, while Markdown and static HTML are reviewer projections. Contract versions, pack floors, and release checklist: MIGRATION — Trust Compiler 3.2, receipt-family matrix. Release history belongs in CHANGELOG.md.

Add to Cursor in 30 Seconds

Assay ships a helper that finds your local Cursor MCP config path and prints a ready-to-paste entry:

assay mcp config-path cursor

It generates JSON like:

{
  "filesystem-secure": {
    "command": "assay",
    "args": [
      "mcp",
      "wrap",
      "--policy",
      "/path/to/policy.yaml",
      "--",
      "npx",
      "-y",
      "@modelcontextprotocol/server-filesystem",
      "/Users/you"
    ]
  }
}

The same wrapped command works in other MCP clients — see MCP Quick Start.
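
The shape of that entry can be sketched as a small helper: everything after "--" is the original server command, unchanged. The function name wrapped_server_entry is hypothetical; only the JSON layout comes from the example above.

```python
import json


def wrapped_server_entry(name, policy_path, server_command):
    """Build an MCP client entry that runs server_command behind `assay mcp wrap`."""
    return {
        name: {
            "command": "assay",
            # Arguments before "--" belong to assay; the rest is the wrapped server.
            "args": ["mcp", "wrap", "--policy", policy_path, "--", *server_command],
        }
    }


entry = wrapped_server_entry(
    "filesystem-secure",
    "/path/to/policy.yaml",
    ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/Users/you"],
)
print(json.dumps(entry, indent=2))
```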

Policy Is Simple

version: "2.0"
name: "my-policy"

tools:
  allow: ["read_file", "list_dir"]
  deny: ["exec", "shell", "write_file"]

schemas:
  read_file:
    type: object
    additionalProperties: false
    properties:
      path:
        type: string
        pattern: "^/app/.*"
        minLength: 1
    required: ["path"]

Legacy policies that use the constraints: key still work. Use assay policy migrate to convert them to the v2 JSON Schema form, or assay init --from-trace trace.jsonl to generate a policy from observed behavior.

See Policy Files.
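
To show how such a policy can yield a deterministic decision, here is a minimal sketch that evaluates the example policy against a tool call. It is not Assay's engine. The precedence (deny list, then allow list, then argument schema) and the default-deny for unlisted tools are assumptions, and args_match_schema covers only the JSON Schema keywords the example uses.

```python
import re

# The example policy above, written as a plain dict.
POLICY = {
    "tools": {"allow": ["read_file", "list_dir"], "deny": ["exec", "shell", "write_file"]},
    "schemas": {
        "read_file": {
            "type": "object",
            "additionalProperties": False,
            "properties": {"path": {"type": "string", "pattern": "^/app/.*", "minLength": 1}},
            "required": ["path"],
        }
    },
}


def args_match_schema(args: dict, schema: dict) -> bool:
    """Check only the keywords the example uses; not a full JSON Schema validator."""
    props = schema.get("properties", {})
    if any(key not in args for key in schema.get("required", [])):
        return False
    if schema.get("additionalProperties") is False and any(k not in props for k in args):
        return False
    for key, rule in props.items():
        if key not in args:
            continue
        value = args[key]
        if rule.get("type") == "string":
            if not isinstance(value, str):
                return False
            if len(value) < rule.get("minLength", 0):
                return False
            # JSON Schema 'pattern' is an unanchored search, hence re.search.
            if "pattern" in rule and re.search(rule["pattern"], value) is None:
                return False
    return True


def decide(tool: str, args: dict, policy: dict = POLICY) -> str:
    """Return 'allow' or 'deny' (assumed order: deny list, allow list, schema)."""
    tools = policy["tools"]
    if tool in tools["deny"] or tool not in tools["allow"]:
        return "deny"
    schema = policy["schemas"].get(tool)
    if schema is not None and not args_match_schema(args, schema):
        return "deny"
    return "allow"


print(decide("read_file", {"path": "/app/data.txt"}))
print(decide("read_file", {"path": "/etc/passwd"}))
print(decide("exec", {"cmd": "ls"}))
```

The function has no hidden state, so the same tool name and arguments always produce the same decision.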

OpenTelemetry in, canonical evidence out

Assay ingests OpenTelemetry JSONL, builds replayable traces, and exports canonical evidence — OTel is a bridge, not the sole semantic authority.

assay trace ingest-otel \
  --input otel-export.jsonl \
  --db .eval/eval.db \
  --out-trace traces/otel.v2.jsonl

See OpenTelemetry & Langfuse.

Protocol adapters

Assay ships adapters that map protocol events into canonical evidence:

Protocol | Adapter | What it maps
ACP (OpenAI/Stripe) | assay-adapter-acp | Checkout events, payment intents, tool calls
A2A (Google) | assay-adapter-a2a | Agent capabilities, task delegation, artifacts
UCP (Google/Shopify) | assay-adapter-ucp | Discover/buy/post-purchase state transitions

Adapter crates are workspace / binary-driven, not published as separate crates.io packages.

Add to CI

# .github/workflows/assay.yml
name: Assay Gate
on: [push, pull_request]
permissions:
  contents: read
  security-events: write
jobs:
  assay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Rul1an/assay-action@v2

PRs that violate policy get blocked; SARIF can surface in the Security tab.

Why Assay

Canonical evidence | Assay’s evidence model is the stable contract; OTel and adapters map into it.
Deterministic | Same input, same decision; not probabilistic.
Portable artifacts | Bundles, Trust Basis, Trust Card, SARIF: for CI, review, audit.
Bounded claims | Explicit about what is verified vs visible vs absent; no score-first UX.
MCP-native wedge | assay mcp wrap is the fast path (the mcp group is hidden from assay --help; use assay mcp --help). Adapters extend the same engine.
Offline-first | No backend required for core enforcement and bundle verification.

Measured on the M1 Pro/macOS fragmented-IPI harness, protected tool-decision path:

  • Main protection run: 0.771ms p50 / 1.913ms p95
  • Fast-path scenario: 0.345ms p50 / 1.145ms p95

These are tool-decision timings, not end-to-end model latency. (See Research & experiments for methodology context.)
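
For readers who want comparable figures from their own setup, here is a minimal sketch of how p50/p95 values for a decision function can be computed. It is not the harness used for the numbers above. The helper names and the stand-in decision lambda are hypothetical, and results will vary by machine.

```python
import statistics
import time


def percentile_ms(samples_ms, pct):
    """Return the pct-th percentile (1..99) via statistics.quantiles with n=100."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[pct - 1]


def time_decisions(decide, calls, repeats=200):
    """Time each call to decide() in milliseconds with a monotonic clock."""
    samples = []
    for _ in range(repeats):
        for call in calls:
            start = time.perf_counter()
            decide(*call)
            samples.append((time.perf_counter() - start) * 1000.0)
    return samples


# Stand-in decision function; replace with the code path you want to measure.
samples = time_decisions(
    lambda tool: tool in {"read_file", "list_dir"},
    [("read_file",), ("exec",)],
)
print(f"p50={percentile_ms(samples, 50):.4f}ms p95={percentile_ms(samples, 95):.4f}ms")
```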

Learn More

Internal: Assay-Runner

Assay-Runner is an internal measured-run subsystem used by Assay's delegated Linux/eBPF acceptance path. It is not a standalone product. As of Phase 2D, the runner candidate is split into extraction-ready Rust crates (assay-runner-schema, assay-runner-core, assay-runner-linux) — all publish = false — plus the runner-fixtures/ package tree (Node fixture marked "private": true; Python fixture has no distribution surface). Everything stays inside this repository.

No release commitment. No timeline. No external demand has been measured.

Research, mappings & experiments

Bounded context: numbers below support mapping and experiments, not a product “security score.”

  • OWASP MCP Top 10 Mapping — how Assay relates to each risk category (coverage is not a scalar guarantee).
  • Third-party survey: popular MCP servers often show weak defaults — Assay adds policy + evidence; see discussion in the mapping doc.
  • Security experiments — attack vectors and harness notes (methodology matters more than headline counts).

Contributing

cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings

See CONTRIBUTING.md. Discussions: GitHub Discussions — seed topics for pinned threads live in docs/community/DISCUSSIONS.md.

License

MIT