mcp-wallfacer

Runtime fuzzing & invariant testing for MCP servers — catch crashes, hangs, schema drift, race conditions, and state leaks before they ship.

mcp-wallfacer is the only runtime testing harness purpose-built for Model Context Protocol servers. It connects over stdio or Streamable HTTP, fuzzes tools with schema-driven adversarial payloads, validates responses against declared output schemas, evaluates user-defined YAML invariants, and stress-tests for concurrency races and session-state leaks — then stores every finding as a reproducible JSON record under .wallfacer/corpus/.

It complements static scanners (Snyk Agent Scan, Cisco MCP Scanner, Enkrypt) by exercising observable runtime behavior instead of inspecting source code or tool descriptions. Run it in CI as a branch-protection gate, or locally before publishing your server.

What it catches

Crash — server process dies on a tool call.
Hang — call exceeds its timeout.
SchemaViolation — response drifts from declared output schema.
PropertyFailure — user-declared YAML invariant fails.
ProtocolError — server returns malformed JSON-RPC.
StateLeak — session state visible across the wrong boundary.

A six-bug demo server is included at examples/python_server/. Running the four wallfacer modes (fuzz, differential, property, torture) against it surfaces every finding kind listed above.

Install

Requires Rust 1.88 or newer. The original 1.83 target is not compatible with the current official rmcp SDK, which uses Rust features stabilized after 1.83.

cargo install mcp-wallfacer

The crates.io package is mcp-wallfacer; the installed binary is wallfacer.

Quickstart

wallfacer init                       # writes wallfacer.toml + invariants.yaml
wallfacer doctor                     # lists tools/resources/prompts
wallfacer fuzz --seed 42 --iterations 200
wallfacer differential --learn       # snapshot declared output schemas
wallfacer differential               # check responses against the snapshot
wallfacer property invariants.yaml   # YAML invariants
wallfacer property --pack auth       # built-in rule pack (Phase F4)
wallfacer torture --concurrency 100
wallfacer ci --format sarif > wallfacer.sarif

wallfacer corpus list                # browse stored findings
wallfacer replay <finding-id>        # rerun a finding (redacted fields restored from env vars)
wallfacer diff baseline/ candidate/  # regressions vs fixes between two runs
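
The wallfacer.sarif file written by the ci command above can be summarised before it is uploaded. The sketch below relies only on the generic SARIF 2.1.0 layout (a top-level runs array, each with a results array whose entries may carry a level). The helper name count_sarif_levels is hypothetical, and how wallfacer maps its own severities onto SARIF levels is not covered in this README, so treat this as a starting point rather than a description of the tool's output.

```python
import json
from collections import Counter


def count_sarif_levels(sarif_text: str) -> Counter:
    """Count results per SARIF level across all runs.

    A result without an explicit level is counted as "warning" here.
    """
    document = json.loads(sarif_text)
    levels = Counter()
    for run in document.get("runs", []):
        for result in run.get("results", []):
            levels[result.get("level", "warning")] += 1
    return levels
```

A local gate could, for example, exit non-zero when the count for "error" is greater than zero; inspect a real report first to see which levels actually appear.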

Findings are serialized as JSON under .wallfacer/corpus/<tool>/<finding_id>.json with the seed and the exact tool call needed for reproduction. Sensitive headers, environment variables, and payload fields (Authorization, Cookie, *-token, password, api_key, ...) are redacted on persistence — see docs/security.md.
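
To make the redaction rule concrete, here is a minimal sketch of key-based masking over a JSON-like record. It is not wallfacer's implementation: the pattern list contains only the names mentioned above, while the helper names (is_sensitive, redact) and the case-insensitive glob matching are assumptions made for illustration. docs/security.md remains the authoritative description.

```python
from fnmatch import fnmatchcase

# Key patterns named in this README; the real list is longer (see docs/security.md).
SENSITIVE_KEY_PATTERNS = ["authorization", "cookie", "*-token", "password", "api_key"]
PLACEHOLDER = "<redacted>"


def is_sensitive(key: str) -> bool:
    """Case-insensitive glob match of a key against the patterns (assumed semantics)."""
    lowered = key.lower()
    return any(fnmatchcase(lowered, pattern) for pattern in SENSITIVE_KEY_PATTERNS)


def redact(value):
    """Return a copy of a JSON-like value with sensitive fields masked."""
    if isinstance(value, dict):
        return {
            key: PLACEHOLDER if is_sensitive(key) else redact(inner)
            for key, inner in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

The function builds a new structure instead of mutating its input, so the in-memory call data can still be used for the live request while only the masked copy is written to disk.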

Commands

init [--http|--stdio] [--ci] [--skip-invariants]: write wallfacer.toml + a starter invariants.yaml, optionally with the GitHub Actions workflow.
doctor: connect and list tools, resources, and prompts.
fuzz [--coverage-strict] [--include glob] [--exclude glob]: generate adversarial tool inputs and detect crashes, hangs, and protocol errors. --include and --exclude take globset patterns (**/foo, tools.{a,b}).
differential [--learn]: compare runtime responses with declared or learned output schemas.
property <file.yaml> | --pack <name>: evaluate YAML invariants over generated or fixed cases. Built-in packs include auth, path-traversal, and error-shape; the Rule packs section lists all of them.
torture [--mode parallel|state-leak] [--concurrency N] [--duration <span>]: concurrency and state-boundary checks under a global cancellation deadline.
corpus {list, show <id>, replay <id>, minimize <id>}: inspect, replay, and minimize stored findings.
replay <id> [--show-payload]: rerun a stored finding, filling <redacted> payload fields from local WALLFACER_REPLAY_<KEY> environment variables. The substituted values are never logged.
diff <baseline> <candidate> [--fail-on-regression]: compare two corpus directories; reports new findings (regressions) and resolved ones (fixes).
ci [--format sarif|json|human] [--severity-threshold low|medium|high|critical]: short, deterministic boundary-payload pass; emits SARIF for branch protection.
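
The idea behind the diff command can be sketched as a set comparison over two corpus directories laid out as <tool>/<finding_id>.json. This is an illustration only: it assumes finding IDs are stable across runs, and the helper names finding_keys and diff_corpora are hypothetical. The real command may compare more than file names.

```python
from pathlib import Path


def finding_keys(corpus_dir: str) -> set[str]:
    """Collect "<tool>/<finding_id>" keys from <corpus_dir>/<tool>/<finding_id>.json."""
    root = Path(corpus_dir)
    return {f"{path.parent.name}/{path.stem}" for path in root.glob("*/*.json")}


def diff_corpora(baseline: str, candidate: str) -> dict[str, list[str]]:
    """Findings only in candidate are regressions; findings only in baseline are fixes."""
    before, after = finding_keys(baseline), finding_keys(candidate)
    return {
        "regressions": sorted(after - before),
        "fixes": sorted(before - after),
    }
```

A script using this sketch could fail a build when the "regressions" list is non-empty, which mirrors what --fail-on-regression is described as doing.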

Configuration

[target]
kind = "stdio"
command = "python3"
args = ["server.py"]
timeout_ms = 5000

[output]
corpus_dir = ".wallfacer/corpus"
lock_timeout_ms = 30000   # Phase E3, default 30s

[allow_destructive]
# Regex patterns matched against tool name; matching tools bypass the
# destructive classifier. Phase C5.
tools = ["^logs_.*$"]

[destructive]
# Replace the default keyword detector (delete/drop/destroy/...) with
# custom regexes. Empty = use defaults. Phase C5.
patterns = []

HTTP targets use:

[target]
kind = "http"
url = "http://localhost:8000/mcp"
headers = { Authorization = "Bearer xxx" }
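
The interaction between the [allow_destructive] and [destructive] tables shown earlier can be read as a small decision procedure. The sketch below follows the config comments: allow-listed tools bypass the classifier, and a non-empty patterns list replaces the defaults. The function name is_destructive, the three-keyword default list (the comment elides the rest), and the case-insensitive substring matching for defaults are all assumptions. What wallfacer does with a tool once it is classified as destructive is not covered here.

```python
import re

# Only the keywords named in the config comment; the real default list is longer.
DEFAULT_KEYWORDS = ["delete", "drop", "destroy"]


def is_destructive(tool_name: str, allow_patterns: list[str], custom_patterns: list[str]) -> bool:
    """Classify a tool name using the precedence described in the config comments."""
    # Allow-listed tools bypass the classifier entirely.
    if any(re.search(pattern, tool_name) for pattern in allow_patterns):
        return False
    # Non-empty custom patterns replace the defaults; an empty list falls back to them.
    if custom_patterns:
        return any(re.search(pattern, tool_name) for pattern in custom_patterns)
    # Assumed default behaviour: case-insensitive substring match on keywords.
    lowered = tool_name.lower()
    return any(keyword in lowered for keyword in DEFAULT_KEYWORDS)
```

With the sample config above, a tool named logs_delete matches ^logs_.*$ and is therefore not treated as destructive, even though its name contains a default keyword.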

Example

examples/python_server/ ships a six-bug Python MCP server that exercises every FindingKind (Crash, Hang, SchemaViolation, PropertyFailure, ProtocolError, StateLeak). It is also the Phase F acceptance fixture for the e2e suite.

cd examples/python_server
wallfacer fuzz
wallfacer differential --learn && wallfacer differential
wallfacer property invariants.yaml
wallfacer torture --mode state-leak
wallfacer corpus list

Rule packs

15 invariant packs ship embedded in the binary. List them with wallfacer pack list; generate the human-readable reference with cargo run -p wallfacer-tools -- gen-pack-docs (written to docs/packs/).

When to use which pack

| If your server… | Pack | Catches |
| --- | --- | --- |
| has any user-facing tool | `secrets-leakage` | bearer/api-key/secret strings echoed in responses |
| has any user-facing tool | `unicode` | RTL override, ZWJ, escape-sequence echoes |
| has any user-facing tool | `large-payload` | graceful handling of 10 MB strings / 1M items |
| has any user-facing tool | `error-shape` | envelope shape, no stack traces, no internal paths |
| has authentication (whoami/login) | `auth` | anonymous rejection, bearer echo, session cookies |
| has RBAC | `authorization` | role filtering, escalation, ACL on resources |
| bridges to a filesystem | `path-traversal` | `../`, absolute, UNC, URL-encoded, symlink escapes |
| bridges to a database | `injection-sql` | `'; DROP`, UNION SELECT, comment bypass |
| spawns processes | `injection-shell` | `;`, `&&`, backticks, `$(...)` expansion |
| proxies LLM completions | `prompt-injection` | "ignore previous", role override, jailbreak markers |
| paginates lists | `pagination` | limit honored, cursor stable, no leak across pages |
| declares `idempotentHint: true` | `idempotency` | envelope stability under repeated calls |
| declares any MCP annotations | `tool-annotations` | hints match observable behaviour |
| bridges to a rate-limited API | `rate-limit` | quota envelope shape, 429 with Retry-After |
| wants a security baseline | `security` | meta-pack: auth + authorization + path-traversal + injection-* + prompt-injection + secrets-leakage |
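
The security meta-pack is written with an injection-* shorthand. Reading that as a glob over pack names, the sketch below expands it into the concrete packs from the table. The glob interpretation and the expand helper are assumptions; the README does not say how the meta-pack is resolved internally.

```python
from fnmatch import fnmatchcase

# Pack names from the table above, excluding the `security` meta-pack itself.
PACKS = [
    "secrets-leakage", "unicode", "large-payload", "error-shape", "auth",
    "authorization", "path-traversal", "injection-sql", "injection-shell",
    "prompt-injection", "pagination", "idempotency", "tool-annotations", "rate-limit",
]

# Composition of the `security` meta-pack as listed in the table.
SECURITY_MEMBERS = [
    "auth", "authorization", "path-traversal", "injection-*",
    "prompt-injection", "secrets-leakage",
]


def expand(members: list[str], available: list[str]) -> list[str]:
    """Expand glob entries such as "injection-*" into concrete pack names, keeping order."""
    resolved = []
    for member in members:
        for name in available:
            if fnmatchcase(name, member) and name not in resolved:
                resolved.append(name)
    return resolved
```

Under this reading, expand(SECURITY_MEMBERS, PACKS) yields seven packs, with injection-* covering injection-sql and injection-shell but not prompt-injection, which is listed separately.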

# Run a single pack against your server (after `wallfacer init`):
wallfacer property --pack secrets-leakage

# Compose multiple:
wallfacer property --pack auth --pack error-shape

# Run every embedded pack:
wallfacer property --pack-all

# Override a pack's tool-name parameter for your codebase:
wallfacer property --pack auth --param whoami_tool=getCurrentUser

To persist parameter overrides, set them in wallfacer.toml:

[packs.auth]
whoami_tool = "getCurrentUser"
list_resources_tool = "myListResources"

Customise a pack: wallfacer pack init <name> copies the embedded YAML to packs/<name>.yaml, where you can edit it freely (workspace copy shadows the embedded one).
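
The shadowing rule (a workspace copy takes precedence over the embedded one) amounts to a two-step lookup. The sketch below illustrates that order only; resolve_pack, the embedded dictionary, and the workspace parameter are hypothetical stand-ins, not wallfacer's internals.

```python
from pathlib import Path


def resolve_pack(name: str, embedded: dict[str, str], workspace: str = ".") -> tuple[str, str]:
    """Return (source, yaml_text), preferring packs/<name>.yaml over the embedded copy."""
    local = Path(workspace) / "packs" / f"{name}.yaml"
    if local.is_file():
        return ("workspace", local.read_text(encoding="utf-8"))
    if name in embedded:
        return ("embedded", embedded[name])
    raise KeyError(f"unknown pack: {name}")
```

One consequence of this order is that deleting packs/<name>.yaml reverts to the embedded version without any other change.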

Documentation

docs/architecture.md — workspace layout, plan lifecycle, reproducibility contract.
docs/security.md — redaction model, file permissions, replay unredaction, threat model.
docs/real-world.md — running packs against external MCP servers, reporting upstream.
docs/packs/ — auto-generated reference for every embedded pack.
API: https://docs.rs/wallfacer-core.

Roadmap

v0.2: Phases A–F — workspace hardening, full JSON Schema generation, plan layer, property DSL v2, robustness pass, DX & docs. ✅ shipped.
v0.3: rule packs for common MCP security and reliability issues; reusable invariant libraries. in progress (Phases G–K).
v0.4: shared corpus workflows and reporting; remote pack registries.

wallfacer-core 0.4.0