mcp-wallfacer
Runtime fuzzing & invariant testing for MCP servers — catch crashes, hangs, schema drift, race conditions, and state leaks before they ship.
mcp-wallfacer is the only runtime testing harness purpose-built for Model Context Protocol servers. It connects over stdio or Streamable HTTP, fuzzes tools with schema-driven adversarial payloads, validates responses against declared output schemas, evaluates user-defined YAML invariants and multi-step sequences, and stress-tests for concurrency races and session-state leaks — then stores every finding as a reproducible JSON record under .wallfacer/corpus/.
It complements static scanners (Snyk Agent Scan, Cisco MCP Scanner, Enkrypt) by exercising observable runtime behaviour instead of inspecting source code or tool descriptions. Run it in CI as a branch-protection gate, or locally before publishing your server.
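The schema-driven fuzzing described above can be pictured with a toy generator. This is a minimal sketch, not wallfacer's generator. The `ADVERSARIAL` value table and the `generate` / `payload_for` helpers are hypothetical, and only a small subset of JSON Schema (`type`, `properties`, `items`) is handled. The property it illustrates is that a fixed seed yields a fixed payload, which is what lets a stored finding be replayed.

```python
import random
from typing import Any

# Hypothetical boundary values per JSON Schema type (illustration only).
ADVERSARIAL: dict[str, list[Any]] = {
    "string": ["", " ", "\u202e", "../../etc/passwd", "'; DROP TABLE t; --", "A" * 10_000],
    "integer": [0, -1, 2**31 - 1, -(2**31), 2**63],
    "number": [0.0, -0.0, 1e308, -1e308],
    "boolean": [True, False],
}


def generate(schema: dict[str, Any], rng: random.Random) -> Any:
    """Build one adversarial value for a small subset of JSON Schema."""
    kind = schema.get("type")
    if kind == "object":
        # Recurse into every declared property.
        props = schema.get("properties", {})
        return {name: generate(sub, rng) for name, sub in props.items()}
    if kind == "array":
        # Short arrays (including the empty array) of adversarial items.
        items = schema.get("items", {"type": "string"})
        return [generate(items, rng) for _ in range(rng.randint(0, 3))]
    if kind in ADVERSARIAL:
        return rng.choice(ADVERSARIAL[kind])
    # Missing or unsupported type: send null to probe type handling.
    return None


def payload_for(schema: dict[str, Any], seed: int) -> Any:
    """Same schema + same seed -> same payload, so a finding can be replayed."""
    return generate(schema, random.Random(seed))
```

For example, `payload_for({"type": "object", "properties": {"path": {"type": "string"}}}, seed=42)` always returns the same `{"path": ...}` dict, so storing the seed alongside the tool call is enough to regenerate the input.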
What it catches
| Finding kind | Trigger |
|---|---|
| Crash | server process dies on a tool call |
| Hang | call exceeds its timeout |
| SchemaViolation | response drifts from declared output schema |
| PropertyFailure | user-declared YAML invariant fails |
| ProtocolError | server returns malformed JSON-RPC |
| StateLeak | session state visible across the wrong boundary |
| SequenceFailure | multi-step invariant breaks (e.g. delete-then-read finds the deleted record) |
A seven-bug demo server is included at examples/python_server/ — running every wallfacer mode against it surfaces every kind above.
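Three of the kinds in the table (Crash, Hang, ProtocolError) can be told apart from the outcome of a single call. The sketch below is a hypothetical classifier, not wallfacer's code. It assumes the caller reports the server's exit code as `None` while the process is still alive (as `subprocess.Popen.poll()` does), and it applies the JSON-RPC 2.0 rule that a response carries `"jsonrpc": "2.0"`, an `id`, and exactly one of `result` or `error`.

```python
import json
from enum import Enum
from typing import Optional


class FindingKind(Enum):
    CRASH = "Crash"
    HANG = "Hang"
    PROTOCOL_ERROR = "ProtocolError"


def classify_call(exit_code: Optional[int], timed_out: bool,
                  raw_response: Optional[str]) -> Optional[FindingKind]:
    """Map one tool call's outcome to a transport-level finding, or None."""
    if timed_out:
        return FindingKind.HANG
    if exit_code is not None:
        # The server process exited while we were waiting on it.
        return FindingKind.CRASH
    try:
        msg = json.loads(raw_response or "")
    except json.JSONDecodeError:
        return FindingKind.PROTOCOL_ERROR
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0" or "id" not in msg:
        return FindingKind.PROTOCOL_ERROR
    # JSON-RPC 2.0: a response carries exactly one of "result" or "error".
    if ("result" in msg) == ("error" in msg):
        return FindingKind.PROTOCOL_ERROR
    return None
```

The remaining kinds need more context than one response (a declared schema, an invariant, or several calls in order), which is why they come from the `differential`, `property`, and `torture` modes rather than from a single call.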
Quickstart
# 1. Install (pick any of the five paths — they all serve the same binary)
# 2. Generate a config + sample invariants in your project
# 3. Verify the connection
# 4. Run the security baseline (auth + authorization + path-traversal +
# injection-sql/shell + prompt-injection + secrets-leakage)
Findings stream to stdout (Human / JSON / SARIF) and persist as JSON under .wallfacer/corpus/<tool>/<finding-id>.json with the seed and exact tool call needed for reproduction. Sensitive headers, environment variables, and payload fields (Authorization, Cookie, *-token, password, api_key, ...) are redacted on persistence — see docs/security.md.
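The redact-on-persist and unredact-on-replay round trip can be sketched as follows. This is an illustration under assumptions, not wallfacer's implementation: the key patterns are only the examples listed above (docs/security.md has the authoritative list), matching is assumed to be case-insensitive, and the mapping from a field name to its `WALLFACER_REPLAY_<KEY>` variable (uppercased, non-alphanumerics turned into `_`) is a guess.

```python
import fnmatch
import os
import re
from typing import Any, Mapping

REDACTED = "<redacted>"
# Example patterns from this README; the authoritative list is in docs/security.md.
SENSITIVE = ["authorization", "cookie", "*-token", "password", "api_key"]


def is_sensitive(key: str) -> bool:
    # Case-insensitive glob match (an assumption of this sketch).
    return any(fnmatch.fnmatchcase(key.lower(), pat) for pat in SENSITIVE)


def redact(value: Any) -> Any:
    """Recursively blank sensitive fields before a finding is persisted."""
    if isinstance(value, dict):
        return {k: REDACTED if is_sensitive(k) else redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def replay_var(key: str) -> str:
    # Assumed naming: "x-auth-token" -> "WALLFACER_REPLAY_X_AUTH_TOKEN".
    return "WALLFACER_REPLAY_" + re.sub(r"[^A-Za-z0-9]", "_", key).upper()


def unredact(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Refill <redacted> fields from the environment at replay time."""
    if isinstance(value, dict):
        return {
            k: env.get(replay_var(k), REDACTED) if v == REDACTED else unredact(v, env)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [unredact(v, env) for v in value]
    return value
```

In this sketch a field with no matching variable simply stays `<redacted>`, so the secret never has to live in the corpus file itself.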
Install
Five canonical channels, one binary. Full details in docs/install.md.
| Channel | Command | Best for |
|---|---|---|
| Cargo | `cargo install mcp-wallfacer` | Rust toolchain present (MSRV 1.88) |
| GitHub release | download tarball | air-gapped servers, no toolchain |
| npm | `npm install -g mcp-wallfacer` | TypeScript / Node MCP authors |
| pip | `pip install mcp-wallfacer` | Python MCP authors |
| GitHub Action | `uses: lacausecrypto/mcp-wallfacer@v0.8.1` | CI gating with caching |
The npm and pip wrappers are thin launchers that download the matching prebuilt binary from the GitHub release at install / first-run time; the underlying CLI is byte-identical to a cargo install build of the same version. The crates.io package is mcp-wallfacer; the installed binary is wallfacer.
CI usage
# .github/workflows/wallfacer.yml
name: Wallfacer
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - id: run  # referenced below as steps.run
        uses: lacausecrypto/mcp-wallfacer@v0.8.1
        with:
          pack-all: "true" # or pack: "security\nstateful"
          config: wallfacer.toml
          format: sarif
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: ${{ steps.run.outputs.findings-sarif }}
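For orientation, a SARIF log like the one uploaded above has roughly the shape below. The sketch builds a minimal SARIF 2.1.0 document from hypothetical finding dicts. wallfacer's own `--format sarif` output may carry more fields, the severity-to-level mapping is an assumption, and GitHub code scanning may also expect `locations` on each result, which are omitted here for brevity.

```python
import json
from typing import Any

# Assumed mapping from wallfacer severities to SARIF 2.1.0 levels
# ("none", "note", "warning", "error" are the values SARIF allows).
LEVEL = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def to_sarif(findings: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap hypothetical finding dicts in a minimal SARIF 2.1.0 log."""
    rule_ids = sorted({item["kind"] for item in findings})
    results = [
        {
            "ruleId": item["kind"],
            "level": LEVEL.get(item.get("severity", "medium"), "warning"),
            "message": {"text": f"{item['tool']}: {item['message']}"},
        }
        for item in findings
    ]
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "wallfacer", "rules": [{"id": r} for r in rule_ids]}},
            "results": results,
        }],
    }


if __name__ == "__main__":
    sample = [{"kind": "Crash", "severity": "high", "tool": "read_file", "message": "process exited"}]
    print(json.dumps(to_sarif(sample), indent=2))
```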
Commands
| Command | Purpose |
|---|---|
| `init [--http \| --stdio] [--ci]` | scaffold wallfacer.toml + starter invariants.yaml |
| `doctor` | connect, list tools / resources / prompts (capability-aware: shows n/a when the server doesn't declare a capability) |
| `fuzz [--coverage-strict]` | adversarial schema-driven inputs; catches Crash / Hang / ProtocolError |
| `differential [--learn]` | compare runtime responses against declared / learned output schemas |
| `property <file.yaml> \| --pack <name> \| --pack-all` | evaluate YAML invariants + multi-step sequences |
| `torture [--mode parallel\|state-leak]` | concurrency + session-boundary stress |
| `pack {list, show, init, test, params}` | inspect / scaffold / offline-test the embedded rule pack library |
| `corpus {list, show, replay, minimize}` | inspect, re-run, and shrink stored findings |
| `replay <id> [--show-payload]` | rerun a finding; substitutes `<redacted>` payload fields from `WALLFACER_REPLAY_<KEY>` env vars |
| `diff <baseline> <candidate> [--fail-on-regression]` | compare two corpus runs; reports new / resolved findings |
| `ci [--format sarif\|json\|human]` | short, deterministic boundary pass for branch protection |
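The new / resolved split reported by `diff` comes down to two set differences. A minimal sketch, assuming findings are compared by the `<finding-id>` file stem under `<corpus>/<tool>/` and that IDs are stable across runs (the real command may fingerprint findings differently). `--fail-on-regression` is modelled here as a non-zero exit status when anything new appears; the function names are hypothetical.

```python
from pathlib import Path


def finding_ids(corpus: Path) -> set[str]:
    """Collect IDs from <corpus>/<tool>/<finding-id>.json (assumed stable across runs)."""
    return {p.stem for p in corpus.glob("*/*.json")}


def diff_runs(baseline: set[str], candidate: set[str]) -> tuple[set[str], set[str]]:
    """Return (new, resolved): only in candidate, only in baseline."""
    return candidate - baseline, baseline - candidate


def exit_status(new: set[str], fail_on_regression: bool) -> int:
    """Non-zero only when regressions should fail the build."""
    return 1 if fail_on_regression and new else 0
```

Usage would look like `new, resolved = diff_runs(finding_ids(Path("baseline/corpus")), finding_ids(Path(".wallfacer/corpus")))`, where `baseline/corpus` is a hypothetical saved copy of an earlier run.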
Rule packs
17 invariant packs ship embedded in the binary. Discover them with wallfacer pack list; render the auto-generated reference into docs/packs/ with cargo run -p wallfacer-tools -- gen-pack-docs.
When to use which pack
| If your server… | Pack | Catches |
|---|---|---|
| has any user-facing tool | secrets-leakage | bearer / api-key / secret strings echoed in responses |
| has any user-facing tool | unicode | RTL override, ZWJ, escape-sequence echoes |
| has any user-facing tool | large-payload | graceful handling of 10 MB strings / 1M items |
| has any user-facing tool | error-shape | envelope shape, no stack traces, no internal paths |
| has authentication (whoami / login) | auth | anonymous rejection, bearer echo, session cookies |
| has RBAC | authorization | role filtering, escalation, ACL on resources |
| bridges to a filesystem | path-traversal | ../, absolute, UNC, URL-encoded, symlink escapes |
| bridges to a database | injection-sql | '; DROP, UNION SELECT, comment bypass |
| spawns processes | injection-shell | ;, &&, backticks, $(...) expansion |
| proxies LLM completions | prompt-injection | "ignore previous", role override, jailbreak markers |
| paginates lists | pagination | limit honoured, cursor stable, no leak across pages |
| declares idempotentHint: true | idempotency | envelope stability under repeated calls |
| declares any MCP annotations | tool-annotations | hints match observable behaviour |
| bridges to a rate-limited API | rate-limit | quota envelope shape, 429 with Retry-After |
| has create/read/delete tools | stateful | multi-step state-leak: delete-then-read finds the deleted record |
| has login/logout flow | auth-flow | multi-step: token revoked after logout |
| wants a security baseline | security | meta-pack: auth + authorization + path-traversal + injection-* + prompt-injection + secrets-leakage |
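The table above reads as a lookup from tool surface to packs. The heuristic below is hypothetical and far cruder than `wallfacer suggest`: it looks only at tool names, the regexes are illustrative guesses that will produce false positives, and packs keyed to annotations or rate limits are left out.

```python
import re

# Hypothetical name heuristics mirroring the "If your server…" column.
RULES: list[tuple[str, str]] = [
    (r"whoami|login|current_?user", "auth"),
    (r"logout", "auth-flow"),
    (r"role|permission|acl", "authorization"),
    (r"file|path|dir", "path-traversal"),
    (r"sql|query", "injection-sql"),
    (r"exec|shell|spawn", "injection-shell"),
    (r"complet|chat|prompt", "prompt-injection"),
    (r"^list", "pagination"),
]
# Packs the table recommends for any user-facing tool.
BASELINE = ["secrets-leakage", "unicode", "large-payload", "error-shape"]


def suggest(tool_names: list[str]) -> list[str]:
    """Return candidate packs for a server, based only on its tool names."""
    packs = list(BASELINE)
    for pattern, pack in RULES:
        if any(re.search(pattern, n, re.IGNORECASE) for n in tool_names):
            packs.append(pack)
    # "has create/read/delete tools" -> stateful (verb taken before the first "_").
    verbs = {n.lower().split("_")[0] for n in tool_names}
    if {"create", "read", "delete"} <= verbs:
        packs.append("stateful")
    return packs
```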
# Single pack
# Multiple packs (deduped by canonical invariant name)
# Every embedded pack
# Override a pack's tool-name parameter for your codebase
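Deduplication matters when packs overlap: the `security` meta-pack already bundles `auth`, so selecting both would otherwise evaluate the same invariants twice. A sketch of first-occurrence-wins merging by name; the `canonical` normalisation (trim + lowercase) and the invariant names in the example are assumptions, not wallfacer's definitions.

```python
from typing import Any


def canonical(name: str) -> str:
    # Assumed normalisation; wallfacer's canonical form may differ.
    return name.strip().lower()


def compose(packs: dict[str, list[dict[str, Any]]], selected: list[str]) -> list[dict[str, Any]]:
    """Concatenate invariants from the selected packs, keeping the first per canonical name."""
    seen: set[str] = set()
    merged: list[dict[str, Any]] = []
    for pack in selected:
        for invariant in packs[pack]:
            key = canonical(invariant["name"])
            if key not in seen:
                seen.add(key)
                merged.append(invariant)
    return merged
```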
Persist overrides in wallfacer.toml:
[]
= "getCurrentUser"
= "myListResources"
[]
= "create_record"
= "delete_record"
= "read_record"
Customise a pack: wallfacer pack init <name> copies the embedded YAML to packs/<name>.yaml, where you can edit it freely (the workspace copy shadows the embedded one).
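The shadowing rule amounts to a two-step lookup: a workspace file wins, otherwise the embedded copy is used. A sketch, with `EMBEDDED` as a hypothetical stand-in for the packs compiled into the binary:

```python
from pathlib import Path
from typing import Optional

# Hypothetical stand-in for the packs compiled into the binary.
EMBEDDED = {"auth": "# embedded auth pack YAML ..."}


def load_pack(name: str, workspace: Path) -> Optional[str]:
    """Return pack YAML, preferring <workspace>/packs/<name>.yaml over the embedded copy."""
    local = workspace / "packs" / f"{name}.yaml"
    if local.is_file():
        return local.read_text()  # the workspace copy shadows the embedded one
    return EMBEDDED.get(name)
```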
Configuration
[]
= "stdio" # or "http"
= "python3"
= ["server.py"]
= 5000
# HTTP target — ${VAR} is expanded against the process env at load
# time (use $$ to keep a literal $).
# kind = "http"
# url = "http://localhost:8000/mcp"
# [target.headers]
# Authorization = "Bearer ${WALLFACER_BEARER}"
[]
= ".wallfacer/corpus"
= 30000
[]
# Regex allowlist for tools the destructive classifier would
# otherwise refuse to invoke (matched against tool name).
= ["^logs_.*$"]
[]
# Add custom destructive patterns on top of the built-in keyword
# detector (delete / drop / destroy / ...). Set
# `replace_defaults = true` to opt out of the built-ins.
= ["^remove_.*$"]
= false
[]
# Override the default per-kind severity. Useful when concurrency
# races are not security-critical for your tool surface.
= "medium"
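Two behaviours described in the comments above can be sketched in Python under stated assumptions. The first is `${VAR}` expansion where `$$` yields a literal `$`; treating an unset variable as an error is an assumption. The second is the destructive-tool gate combining built-in keywords, custom patterns, `replace_defaults`, and the allowlist. The built-in list here holds only the three keywords the comment names, keyword matching is assumed case-insensitive, and the allowlist is assumed to take precedence. None of these function names are wallfacer APIs.

```python
import re
from typing import Mapping

_TOKEN = re.compile(r"\$\$|\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(value: str, env: Mapping[str, str]) -> str:
    """Expand ${VAR} from env; "$$" becomes a literal "$"."""
    def sub(m: "re.Match[str]") -> str:
        if m.group(0) == "$$":
            return "$"
        name = m.group(1)
        if name not in env:
            raise KeyError(f"unset environment variable: {name}")
        return env[name]
    return _TOKEN.sub(sub, value)


# Only the keywords the config comment names; the real detector has more.
BUILTIN_DESTRUCTIVE = ["delete", "drop", "destroy"]


def should_refuse(tool: str, patterns: list[str], replace_defaults: bool,
                  allowlist: list[str]) -> bool:
    """True if the destructive gate would skip this tool."""
    if any(re.search(p, tool) for p in allowlist):
        return False  # explicitly allowed: invoke anyway
    active = list(patterns) if replace_defaults else BUILTIN_DESTRUCTIVE + list(patterns)
    return any(re.search(p, tool, re.IGNORECASE) for p in active)
```

Because the regex consumes `$$` before it can start a `${...}` match, `$${TOKEN}` expands to the literal text `${TOKEN}`, which is the escape the config comment describes.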
Example
examples/python_server/ ships a seven-bug Python MCP server that exercises every FindingKind (Crash, Hang, SchemaViolation, PropertyFailure, ProtocolError, StateLeak, SequenceFailure). The Phase F + L acceptance suite gates CI against this fixture.
A parallel HTTP fixture lives at examples/python_server/server_http.py — same buggy tools served over POST /mcp, used by the Phase M end-to-end test.
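A SequenceFailure such as "delete-then-read finds the deleted record" is a bug that no single call exposes. The toy store below is hypothetical and is not the code in examples/python_server/; it only illustrates the class of bug the `stateful` pack targets, reusing the create_record / read_record / delete_record names from the pack-parameter example above.

```python
from typing import Optional


class BuggyStore:
    """Toy record store whose read cache is not evicted on delete."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}
        self._cache: dict[str, str] = {}

    def create_record(self, key: str, value: str) -> None:
        self._rows[key] = value

    def read_record(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]
        value = self._rows.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def delete_record(self, key: str) -> None:
        self._rows.pop(key, None)  # BUG: forgets to evict self._cache[key]


def delete_then_read_holds(store: BuggyStore, key: str = "k") -> bool:
    """Sequence invariant: after create -> read -> delete, a read finds nothing."""
    store.create_record(key, "v")
    store.read_record(key)  # warms the cache
    store.delete_record(key)
    return store.read_record(key) is None
```

Each call on its own returns a plausible value; only the ordered sequence reveals the stale record, which is why this kind is reported by multi-step sequences rather than by single-call fuzzing.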
Documentation
- docs/architecture.md: workspace layout, plan lifecycle, reproducibility contract.
- docs/security.md: redaction model, file permissions, replay unredaction, threat model.
- docs/sequences.md: multi-step DSL, substitution rules, reconnect policy.
- docs/http-target.md: Streamable HTTP transport, env-var headers, fixture.
- docs/install.md: every install path, with troubleshooting.
- docs/real-world.md: running packs against external MCP servers, reporting upstream.
- docs/packs/: auto-generated reference for every embedded pack.
- API: https://docs.rs/wallfacer-core.
Roadmap
- v0.2 ✅: workspace hardening, full JSON Schema generation, plan layer, property DSL v2, robustness pass, DX & docs.
- v0.3 ✅: embedded rule pack library (15 packs), `for_each_tool` directive, multi-pack composition, real-world validation methodology.
- v0.4 ✅: sequence-aware property testing (`stateful`, `auth-flow` packs), HTTP transport CI-gated, distribution to npm + pip + GitHub Action Marketplace.
- v0.5 ✅: `wallfacer suggest` (auto-detect which packs apply), `wallfacer coverage` (tool × pack matrix + `--strict` CI gate), `wallfacer report --html` (self-contained dashboard).
- v0.6 ✅: stateful fuzzing with persistent corpus + 90/10 mutate-vs-random (`fuzz --corpus-feedback`), `mcp-spec-conformance` pack (validates the MCP wire format itself), `context-poisoning` pack (detects malicious servers planting prompt injections), `$.tool.{name,description,annotations}` DSL extension.
- v0.7 ✅: sequence corpus seeding (cross-pollinates fuzz + sequences), HTTP fault injection fixture (502 / 504 / FIN-empty / FIN-mid / slow), real input shrinker (`corpus minimize --replay`, delta-debug), real-world campaign across 6 popular OSS MCPs (clean bill of health, methodology in `docs/real-world-findings.md`).
- v0.8 ✅: `wallfacer property --max-tools / --include / --exclude` (scales packs to large servers), torture mode confirmed under HTTP faults, per-invariant shrinking (`corpus minimize --replay --invariants <path>`), flakiness tracker (`fuzz --runs N --aggregate` tags `stable` / `flaky` / `one-shot`), `prompt-injection-v2` pack (50 variants spanning jailbreak / chain-of-thought / multilingual / encoded-payload / formatting-trick attacks).
- v0.9: continued real-world campaign on large MCPs (now unblocked by `--max-tools`), grammar DSL for user-defined prompt-injection variants, sequence-aware shrinker (delta-debug across sequence steps).
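Delta debugging, named in the v0.7 entry, is classically implemented as Zeller's ddmin algorithm. Below is a textbook version over a list, offered as a sketch rather than wallfacer's shrinker. In a shrinker, `fails` would be a predicate along the lines of "replaying this reduced input still reproduces the finding"; that wiring is an assumption about how the pieces connect.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def _split(items: list[T], n: int) -> list[list[T]]:
    """Cut items into n contiguous, non-empty chunks (requires n <= len(items))."""
    size, extra = divmod(len(items), n)
    chunks: list[list[T]] = []
    start = 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def ddmin(items: Sequence[T], fails: Callable[[list[T]], bool]) -> list[T]:
    """Shrink items to a 1-minimal list on which fails() still returns True."""
    current = list(items)
    if not fails(current):
        raise ValueError("the original input must reproduce the failure")
    n = 2
    while len(current) >= 2:
        chunks = _split(current, n)
        # 1) Does some single chunk still fail on its own?
        hit = next((c for c in chunks if fails(c)), None)
        if hit is not None:
            current, n = hit, 2
            continue
        # 2) Does removing some chunk still fail?
        complements = [[x for j, c in enumerate(chunks) if j != i for x in c]
                       for i in range(n)]
        hit = next((c for c in complements if fails(c)), None)
        if hit is not None:
            current, n = hit, max(n - 1, 2)
            continue
        # 3) Neither: refine the granularity, or stop at single elements.
        if n >= len(current):
            break
        n = min(2 * n, len(current))
    return current
```

For example, `ddmin(list(range(10)), lambda xs: 3 in xs and 7 in xs)` shrinks ten elements down to `[3, 7]`: the result is 1-minimal, meaning that removing any single remaining element makes the failure disappear.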
Contributing
Issues, PRs, and pack contributions are welcome. See CONTRIBUTING.md if the repository has one; otherwise, open a discussion on the issues page.
License
Dual-licensed under MIT or Apache-2.0, at your option.