wallfacer-core 0.4.0

Runtime fuzzing and invariant-testing harness for MCP servers — catch crashes, hangs, schema drift, and state leaks before they ship.

mcp-wallfacer

Runtime fuzzing & invariant testing for MCP servers — catch crashes, hangs, schema drift, race conditions, and state leaks before they ship.


mcp-wallfacer is the only runtime testing harness purpose-built for Model Context Protocol servers. It connects over stdio or Streamable HTTP, fuzzes tools with schema-driven adversarial payloads, validates responses against declared output schemas, evaluates user-defined YAML invariants, and stress-tests for concurrency races and session-state leaks — then stores every finding as a reproducible JSON record under .wallfacer/corpus/.

It complements static scanners (Snyk Agent Scan, Cisco MCP Scanner, Enkrypt) by exercising observable runtime behavior instead of inspecting source code or tool descriptions. Run it in CI as a branch-protection gate, or locally before publishing your server.

What it catches

  • Crash — server process dies on a tool call.
  • Hang — call exceeds its timeout.
  • SchemaViolation — response drifts from declared output schema.
  • PropertyFailure — user-declared YAML invariant fails.
  • ProtocolError — server returns malformed JSON-RPC.
  • StateLeak — session state visible across the wrong boundary.

A six-bug demo server is included at examples/python_server/; running the four wallfacer modes (fuzz, differential, property, torture) against it surfaces every kind above.
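To make the ProtocolError kind concrete, here is a minimal sketch of a well-formedness check for a single JSON-RPC 2.0 response. It follows the JSON-RPC 2.0 rules (version string, an id, exactly one of result or error). The function name check_jsonrpc_response is hypothetical, and this is not wallfacer's actual detector.

```python
import json


def check_jsonrpc_response(raw: str):
    """Return None if `raw` is a well-formed single JSON-RPC 2.0 response,
    otherwise a short reason string. Batch (array) responses are not handled."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(msg, dict):
        return "response is not a JSON object"
    if msg.get("jsonrpc") != "2.0":
        return 'missing or wrong "jsonrpc" version'
    if "id" not in msg:
        return 'missing "id"'
    # A response carries either a result or an error, never both or neither.
    if ("result" in msg) == ("error" in msg):
        return 'exactly one of "result" or "error" is required'
    if "error" in msg:
        err = msg["error"]
        if not (
            isinstance(err, dict)
            and isinstance(err.get("code"), int)
            and isinstance(err.get("message"), str)
        ):
            return '"error" must carry an integer code and a string message'
    return None
```

A harness could record a ProtocolError finding whenever a check like this returns a reason instead of None.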

Install

Requires Rust 1.88 or newer. The original 1.83 target is not compatible with the current official rmcp SDK, which uses Rust features stabilized after 1.83.

cargo install mcp-wallfacer

The crates.io package is mcp-wallfacer; the installed binary is wallfacer.

Quickstart

wallfacer init                       # writes wallfacer.toml + invariants.yaml
wallfacer doctor                     # lists tools/resources/prompts
wallfacer fuzz --seed 42 --iterations 200
wallfacer differential --learn       # snapshot declared output schemas
wallfacer differential               # check responses against the snapshot
wallfacer property invariants.yaml   # YAML invariants
wallfacer property --pack auth       # built-in rule pack (Phase F4)
wallfacer torture --concurrency 100
wallfacer ci --format sarif > wallfacer.sarif

wallfacer corpus list                # browse stored findings
wallfacer replay <finding-id>        # rerun a finding (redacted fields restored from env vars)
wallfacer diff baseline/ candidate/  # regressions vs fixes between two runs

Findings are serialized as JSON under .wallfacer/corpus/<tool>/<finding_id>.json with the seed and the exact tool call needed for reproduction. Sensitive headers, environment variables, and payload fields (Authorization, Cookie, *-token, password, api_key, ...) are redacted on persistence — see docs/security.md.
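The redaction and replay behaviour described above can be sketched as follows. This is an illustration, not the crate's implementation: the pattern list covers only the names quoted above, the glob-style matching is an assumption, and the mapping from a field name to its WALLFACER_REPLAY_<KEY> variable (upper-case, "-" replaced by "_") is also an assumption.

```python
import fnmatch
import os

# Patterns quoted from the README; the real list is longer ("...").
SENSITIVE_PATTERNS = ["authorization", "cookie", "*-token", "password", "api_key"]
REDACTED = "<redacted>"


def is_sensitive(key: str) -> bool:
    """Case-insensitive glob match of a field name against the patterns."""
    lowered = key.lower()
    return any(fnmatch.fnmatchcase(lowered, pat) for pat in SENSITIVE_PATTERNS)


def redact(value):
    """Return a copy of a JSON-like value with sensitive fields masked."""
    if isinstance(value, dict):
        return {
            k: REDACTED if is_sensitive(k) else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def unredact(value, env=os.environ, prefix="WALLFACER_REPLAY_"):
    """Restore masked fields from env vars named <prefix><KEY>.

    Fields with no matching variable stay masked.
    """
    if isinstance(value, dict):
        out = {}
        for k, v in value.items():
            var = prefix + k.upper().replace("-", "_")  # assumed naming rule
            if v == REDACTED and var in env:
                out[k] = env[var]
            else:
                out[k] = unredact(v, env, prefix)
        return out
    if isinstance(value, list):
        return [unredact(v, env, prefix) for v in value]
    return value
```

With this sketch, a stored call such as {"arguments": {"password": "<redacted>"}} would be replayed with the value of WALLFACER_REPLAY_PASSWORD if that variable is set locally.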

Commands

  • init [--http|--stdio] [--ci] [--skip-invariants]: write wallfacer.toml + a starter invariants.yaml, optionally with the GitHub Actions workflow.
  • doctor: connect and list tools, resources, and prompts.
  • fuzz [--coverage-strict] [--include glob] [--exclude glob]: generate adversarial tool inputs and detect crashes, hangs, and protocol errors. Honors globset patterns (**/foo, tools.{a,b}).
  • differential [--learn]: compare runtime responses with declared or learned output schemas.
  • property <file.yaml> | --pack <name>: evaluate YAML invariants over generated or fixed cases. Built-in packs include auth, path-traversal, and error-shape; see Rule packs for the full list.
  • torture [--mode parallel|state-leak] [--concurrency N] [--duration <span>]: concurrency and state-boundary checks under a global cancellation deadline.
  • corpus {list, show <id>, replay <id>, minimize <id>}: inspect, replay, and minimize stored findings.
  • replay <id> [--show-payload]: rerun a stored finding, substituting <redacted> payload fields from WALLFACER_REPLAY_<KEY> env vars locally (never logged).
  • diff <baseline> <candidate> [--fail-on-regression]: compare two corpus directories; reports new findings (regressions) and resolved ones (fixes).
  • ci [--format sarif|json|human] [--severity-threshold low|medium|high|critical]: short, deterministic boundary-payload pass; emits SARIF for branch protection.
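The regression/fix split reported by diff amounts to a set difference between two corpus directories. The sketch below assumes a finding is identified by its <tool>/<finding_id> path, following the corpus layout given earlier; the real command may compare findings differently, and the function names are hypothetical.

```python
from pathlib import Path


def finding_ids(corpus_dir: Path) -> set[str]:
    """Collect '<tool>/<finding_id>' keys from <corpus>/<tool>/<finding_id>.json."""
    return {f"{p.parent.name}/{p.stem}" for p in corpus_dir.glob("*/*.json")}


def diff_corpora(baseline: Path, candidate: Path) -> dict[str, list[str]]:
    """Findings only in the candidate are regressions; only in the baseline, fixes."""
    base, cand = finding_ids(baseline), finding_ids(candidate)
    return {
        "regressions": sorted(cand - base),
        "fixes": sorted(base - cand),
    }
```

A CI gate in the spirit of --fail-on-regression would then fail the job whenever the "regressions" list is non-empty.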

Configuration

[target]
kind = "stdio"
command = "python3"
args = ["server.py"]
timeout_ms = 5000

[output]
corpus_dir = ".wallfacer/corpus"
lock_timeout_ms = 30000   # Phase E3, default 30s

[allow_destructive]
# Regex patterns matched against tool name; matching tools bypass the
# destructive classifier. Phase C5.
tools = ["^logs_.*$"]

[destructive]
# Replace the default keyword detector (delete/drop/destroy/...) with
# custom regexes. Empty = use defaults. Phase C5.
patterns = []

HTTP targets use:

[target]
kind = "http"
url = "http://localhost:8000/mcp"
headers = { Authorization = "Bearer xxx" }
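How the [allow_destructive] and [destructive] tables interact can be sketched as below. This is one reading of the config comments, not the crate's code: the keyword list contains only the three defaults named in the comment (the rest are elided there), substring matching for keywords is an assumption, and is_destructive is a hypothetical name.

```python
import re

# Only the keywords quoted in the config comment; the full default list is elided.
DEFAULT_KEYWORDS = ["delete", "drop", "destroy"]


def is_destructive(tool_name, patterns=(), allow=()):
    """Classify a tool name as the two config tables describe.

    - `allow` regexes ([allow_destructive].tools) bypass the classifier.
    - `patterns` ([destructive].patterns) replace the keyword defaults;
      an empty list falls back to the defaults.
    """
    if any(re.search(rx, tool_name) for rx in allow):
        return False
    if patterns:
        return any(re.search(rx, tool_name) for rx in patterns)
    lowered = tool_name.lower()
    return any(word in lowered for word in DEFAULT_KEYWORDS)
```

Under this reading, tools = ["^logs_.*$"] keeps a tool named logs_delete out of the destructive set even though its name contains a default keyword.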

Example

examples/python_server/ ships a six-bug Python MCP server that exercises every FindingKind (Crash, Hang, SchemaViolation, PropertyFailure, ProtocolError, StateLeak). It is also the Phase F acceptance fixture for the e2e suite.

cd examples/python_server
wallfacer fuzz
wallfacer differential --learn && wallfacer differential
wallfacer property invariants.yaml
wallfacer torture --mode state-leak
wallfacer corpus list

Rule packs

15 invariant packs ship embedded in the binary. Discover them with wallfacer pack list; render the human reference with cargo run -p wallfacer-tools -- gen-pack-docs (output under docs/packs/).

When to use which pack

If your server…                   | Pack             | Catches
----------------------------------+------------------+------------------------------------------------------------
has any user-facing tool          | secrets-leakage  | bearer/api-key/secret strings echoed in responses
has any user-facing tool          | unicode          | RTL override, ZWJ, escape-sequence echoes
has any user-facing tool          | large-payload    | graceful handling of 10 MB strings / 1M items
has any user-facing tool          | error-shape      | envelope shape, no stack traces, no internal paths
has authentication (whoami/login) | auth             | anonymous rejection, bearer echo, session cookies
has RBAC                          | authorization    | role filtering, escalation, ACL on resources
bridges to a filesystem           | path-traversal   | ../, absolute, UNC, URL-encoded, symlink escapes
bridges to a database             | injection-sql    | '; DROP, UNION SELECT, comment bypass
spawns processes                  | injection-shell  | ;, &&, backticks, $(...) expansion
proxies LLM completions           | prompt-injection | "ignore previous", role override, jailbreak markers
paginates lists                   | pagination       | limit honored, cursor stable, no leak across pages
declares idempotentHint: true     | idempotency      | envelope stability under repeated calls
declares any MCP annotations      | tool-annotations | hints match observable behaviour
bridges to a rate-limited API     | rate-limit       | quota envelope shape, 429 with Retry-After
wants a security baseline         | security         | meta-pack: auth + authorization + path-traversal + injection-* + prompt-injection + secrets-leakage

# Run a single pack against your server (after `wallfacer init`):
wallfacer property --pack secrets-leakage

# Compose multiple:
wallfacer property --pack auth --pack error-shape

# Run every embedded pack:
wallfacer property --pack-all

# Override a pack's tool-name parameter for your codebase:
wallfacer property --pack auth --param whoami_tool=getCurrentUser

To persist overrides, add them to wallfacer.toml:

[packs.auth]
whoami_tool = "getCurrentUser"
list_resources_tool = "myListResources"

Customise a pack: wallfacer pack init <name> copies the embedded YAML to packs/<name>.yaml, where you can edit it freely (workspace copy shadows the embedded one).
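The shadowing rule can be pictured as a two-step lookup: the workspace file wins if it exists, otherwise the embedded copy is used. In this sketch the embedded packs are modelled as a plain dict, which is purely a stand-in for whatever the binary actually embeds, and resolve_pack is a hypothetical name.

```python
from pathlib import Path


def resolve_pack(name, workspace, embedded):
    """Return (source, yaml_text) for a pack, preferring the workspace copy.

    `embedded` is a hypothetical dict of pack name -> YAML text.
    """
    local = Path(workspace) / "packs" / f"{name}.yaml"
    if local.is_file():
        return "workspace", local.read_text()
    if name in embedded:
        return "embedded", embedded[name]
    raise KeyError(f"unknown pack: {name}")
```

Deleting packs/<name>.yaml would, under this model, fall back to the embedded pack again.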


Roadmap

  • v0.2: Phases A–F — workspace hardening, full JSON Schema generation, plan layer, property DSL v2, robustness pass, DX & docs. ✅ shipped.
  • v0.3: rule packs for common MCP security and reliability issues; reusable invariant libraries. in progress (Phases G–K).
  • v0.4: shared corpus workflows and reporting; remote pack registries.