Assay
Policy-as-Code for AI agents.
Deterministic MCP governance, CI gates, and verifiable evidence bundles. Runs offline-first with no required hosted backend.
Assay validates tool-call behavior against explicit policy, records auditable decisions, and produces replayable evidence. It is built for teams that want hard gates and reviewable artifacts.
Why Assay
- Deterministic gates for MCP-compatible agents in local runs and CI
- Auditable evidence with export, verify, lint, diff, and replay flows
- Runtime control on the tool-call path via `assay mcp wrap`
- Offline-first workflow with portable outputs
- DX-first CLI with SARIF, JUnit, PR-comment, and markdown outputs
Security Model (Bounded Claims)
Assay’s strongest wedge is deterministic governance on the tool-call route.
In the MCP fragmented-IPI experiment line, stateful sequence policy remained effective across payload fragmentation, tool-hopping, sink-failure pressure, and delayed cross-session sink attempts, where wrap-only lexical checks failed.
Assay does not claim to solve semantic hijacking in general, and it does not claim to block raw outbound network bytes by itself. The bounded claim is narrower: Assay governs sink-call routes with explicit policy decisions, audit-grade evidence, and low single-digit millisecond overhead in the published experiment line.
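To make the contrast between stateful sequence policy and wrap-only lexical checks concrete, here is a minimal Python sketch. It is not Assay's implementation: the tool names (`http_post`, `send_email`), the `SECRET` marker, and the `SequencePolicy` class are hypothetical and exist only to illustrate the idea that a decision can depend on earlier calls in the session rather than on the text of a single payload.

```python
from dataclasses import dataclass, field

# Hypothetical tool names, chosen only for illustration.
SENSITIVE_SOURCES = {"read_file"}
SINKS = {"http_post", "send_email"}


@dataclass
class SequencePolicy:
    """Deny sink calls once a sensitive source has been read in this session."""

    tainted: bool = False
    decisions: list = field(default_factory=list)

    def decide(self, tool: str) -> str:
        if tool in SENSITIVE_SOURCES:
            self.tainted = True
        verdict = "deny" if (tool in SINKS and self.tainted) else "allow"
        # Record every decision so the run can be audited afterwards.
        self.decisions.append((tool, verdict))
        return verdict


def lexical_check(payload: str, needle: str = "SECRET") -> str:
    """Stateless check: deny only if the marker appears in this one payload."""
    return "deny" if needle in payload else "allow"


if __name__ == "__main__":
    policy = SequencePolicy()
    policy.decide("read_file")
    # A payload split into fragments slips past the lexical check...
    print([lexical_check(part) for part in ("SEC", "RET")])
    # ...while the sequence policy denies the sink call regardless of payload text.
    print(policy.decide("http_post"))
```

In this sketch the lexical check sees each fragment in isolation, so fragmentation defeats it, while the sequence policy keys on the route (source read followed by sink call) and is unaffected by how the payload is split.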
Results and rerun docs:
- Fragmented IPI results
- Wrap-bypass results
- Second-sink results
- Cross-session decay results
- Sink-failure results
Open Core Boundary
Open core covers the engine, CLI, runtime governance, evidence flows, and baseline packs.
Compliance packs and organization-specific governance packs can be commercial. See ADR-016.
Quickstart
Install
From scratch
# Scaffold config + policy + CI
# Run an offline smoke gate
From an existing trace
# Generate policy from recorded behavior
# Validate trace against config + policy
From an MCP Inspector session
# Import Inspector session to Assay trace format
# Run policy checks
Demo
Core Commands
Testing and validation
| Command | What it does |
|---|---|
| `assay run` | Execute a test suite against a trace and write run outputs. |
| `assay ci` | CI-mode run with SARIF, JUnit, and PR-comment outputs. |
| `assay validate` | Stateless policy validation with text, JSON, or SARIF output. |
| `assay replay` | Replay from a self-contained offline bundle. |
Policy and config
| Command | What it does |
|---|---|
| `assay init` | Scaffold policy, config, and CI workflow. |
| `assay generate` | Generate policy from traces or profiles. |
| `assay profile` | Multi-run stability profiling. |
| `assay doctor` | Diagnose config, trace, baseline, and runtime issues. |
| `assay explain` | Explain policy behavior against a trace. |
Evidence and compliance
| Command | What it does |
|---|---|
| `assay evidence export` | Create an evidence bundle. |
| `assay evidence verify` | Verify bundle integrity. |
| `assay evidence lint` | Lint evidence with optional packs and SARIF output. |
| `assay evidence diff` | Diff two verified bundles. |
| `assay evidence push/pull/list` | BYOS object storage flows. |
Runtime
| Command | What it does |
|---|---|
| `assay mcp wrap` | Wrap an MCP process with policy enforcement. |
| `assay sandbox` | Rootless Landlock sandbox execution on Linux. |
| `assay monitor` | eBPF/LSM runtime enforcement on Linux. |
Misc
| Command | What it does |
|---|---|
| `assay import` | Import traces from Inspector or JSON-RPC logs. |
| `assay tool sign/verify/keygen` | Local-key tool signing and verification. |
| `assay fix` | Interactive policy fix suggestions. |
CI Integration
GitHub Actions
name: Assay Gate
on:
permissions:
  contents: read
  pull-requests: write
  security-events: write
jobs:
  assay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@<PINNED_SHA>
      - uses: Rul1an/assay-action@v2
Assay Action installs Assay, runs the gate, uploads SARIF, and can publish PR-friendly outputs.
You can also generate a starter workflow:
Or run manually:
Exit codes:
- `0`: pass
- `1`: test failure
- `2`: config or measurement error
- `3`: infra error
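A CI wrapper script may want to treat these codes differently, for example blocking a merge on a test failure but flagging config or infra errors for investigation. The following Python sketch maps the four documented codes to a handling decision. The `GateExit` enum, the `classify` function, and the action labels it returns are hypothetical and not part of Assay; only the numeric codes come from the list above.

```python
from enum import IntEnum


class GateExit(IntEnum):
    """The four exit codes listed above."""

    PASS = 0
    TEST_FAILURE = 1
    CONFIG_OR_MEASUREMENT_ERROR = 2
    INFRA_ERROR = 3


def classify(returncode: int) -> str:
    """Map a process return code to a (hypothetical) CI handling decision."""
    try:
        code = GateExit(returncode)
    except ValueError:
        # Anything outside the documented range is left for a human to inspect.
        return "unknown"
    if code is GateExit.PASS:
        return "pass"
    if code is GateExit.TEST_FAILURE:
        # A genuine gate failure: the agent behavior violated the suite.
        return "block"
    # Codes 2 and 3 point at the setup rather than at agent behavior.
    return "investigate"


if __name__ == "__main__":
    for rc in (0, 1, 2, 3, 42):
        print(rc, classify(rc))
```

Separating code `1` from codes `2` and `3` keeps a broken config or a flaky runner from being reported as a policy violation.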
Configuration
Assay usually works with two files:
- `eval.yaml` for the test suite
- `policy.yaml` for the allowed behavior
eval.yaml:
version: 1
suite: "my_agent"
model: "trace"
tests:
  - id: "deploy_args"
    input:
      prompt: "deploy_staging"
    expected:
      type: args_valid
      schema:
        deploy_service:
          type: object
          required:
          properties:
            env:
              type: string
              enum:
policy.yaml:
version: "1.0"
name: "my-policy"
allow:
deny:
  - "exec"
  - "shell"
  - "bash"
constraints:
  - tool: "read_file"
    params:
      path:
        matches: "^/app/.*|^/data/.*"
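To illustrate how a policy of this shape could be read, here is a minimal Python sketch of a deny list plus a regex constraint on a tool parameter. It is an assumption about the semantics, not Assay's evaluator. In particular, it assumes exact tool-name matching on `deny`, treats any tool not on the deny list as allowed (the `allow` values are not shown above, so the sketch omits them), and treats a missing parameter as a violation.

```python
import re

# Mirrors the policy.yaml example; the allow list is omitted because the
# example above does not show its values.
POLICY = {
    "deny": ["exec", "shell", "bash"],
    "constraints": [
        {
            "tool": "read_file",
            "params": {"path": {"matches": r"^/app/.*|^/data/.*"}},
        },
    ],
}


def evaluate(tool: str, args: dict, policy: dict = POLICY) -> tuple:
    """Return (verdict, reason) for one tool call under the assumed semantics."""
    if tool in policy["deny"]:
        return "deny", f"tool '{tool}' is on the deny list"
    for constraint in policy["constraints"]:
        if constraint["tool"] != tool:
            continue
        for param, rule in constraint["params"].items():
            # A missing parameter becomes "" and fails the pattern (assumption).
            value = str(args.get(param, ""))
            if not re.search(rule["matches"], value):
                return "deny", f"param '{param}' does not match {rule['matches']}"
    return "allow", "no rule violated"


if __name__ == "__main__":
    print(evaluate("bash", {}))
    print(evaluate("read_file", {"path": "/app/config.json"}))
    print(evaluate("read_file", {"path": "/etc/passwd"}))
```

Under this sketch a value such as `/app/../etc/passwd` also matches the pattern, because the check is purely textual. Whether paths are normalized before matching is not stated in this section.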
Starter presets:
Evidence Bundles
Assay produces tamper-evident .tar.gz bundles with manifests, hashes, and event streams.
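The general idea behind a tamper-evident bundle can be sketched in a few lines of Python: store a manifest of content hashes next to the files, then recompute and compare on verification. The layout below (a `manifest.json` member mapping file names to SHA-256 digests) and the function names are hypothetical and are not Assay's bundle format. For the real flow, use `assay evidence export` and `assay evidence verify`.

```python
import hashlib
import io
import json
import tarfile


def build_bundle(files: dict) -> bytes:
    """Pack files plus a manifest of their SHA-256 digests into a .tar.gz."""
    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    members = dict(files)
    members["manifest.json"] = json.dumps(manifest, sort_keys=True).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in sorted(members):
            info = tarfile.TarInfo(name=name)
            info.size = len(members[name])
            tar.addfile(info, io.BytesIO(members[name]))
    return buf.getvalue()


def verify_bundle(blob: bytes) -> bool:
    """Recompute each digest and compare it with the manifest entry."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        contents = {m.name: tar.extractfile(m).read() for m in tar.getmembers()}
    manifest = json.loads(contents.pop("manifest.json"))
    # Extra or missing files count as tampering, not just changed bytes.
    if set(manifest) != set(contents):
        return False
    return all(
        hashlib.sha256(contents[name]).hexdigest() == digest
        for name, digest in manifest.items()
    )


if __name__ == "__main__":
    bundle = build_bundle({"events.ndjson": b'{"tool": "read_file"}\n'})
    print(verify_bundle(bundle))
```

A manifest stored inside the bundle only detects changes if the manifest itself is anchored somewhere, for example by signing it or recording its digest outside the bundle. This sketch does not cover that step.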
Python Package
The Python package is published as assay-it:
Standards and Related Projects
Assay is easier to evaluate when mapped to established specs and ecosystems:
- Model Context Protocol (MCP)
- OpenTelemetry specification
- CloudEvents specification
- SARIF specification
- JSON Schema specification
These are interoperability references, not claims of full feature parity with each project.
Documentation
- Getting started: `docs/getting-started/quickstart.md`
- CI guide: `docs/guides/github-action.md`
- MCP quickstart: `docs/mcp/quickstart.md`
- Use cases: `docs/use-cases/index.md`
- Experiment runbooks/results: `docs/ops/`
- Architecture index: `docs/architecture/index.md`
- ADR index: `docs/architecture/adrs.md`
- Roadmap: `docs/ROADMAP.md`
- Contributing docs: `docs/contributing/index.md`
Contributing
See CONTRIBUTING.md.