tooltest
Conformance testing for MCP servers — fast enough for the CLI, solid enough for CI.
tooltest runs your MCP server like a real client/agent would (connect → list tools → call tools) and reports protocol / schema / runtime issues in a way that’s easy to act on.
Use it to:
- sanity-check locally while you’re developing
- gate releases in CI with repeatable integration tests
- turn failures into fixes by handing the report to a coding agent
Quick start
Install
Latest prebuilt binaries (auto-detect OS/arch):
|
The installer verifies downloaded binaries against the *.sha256 checksums when sha256sum or shasum is available.
Optionally set an install directory (default /usr/local/bin, fallback to ~/.local/bin):
INSTALL_DIR="/.local/bin" \
|
Direct downloads (stable URLs):
- https://github.com/lambdamechanic/tooltest/releases/download/latest/tooltest-linux-x86_64
- https://github.com/lambdamechanic/tooltest/releases/download/latest/tooltest-linux-aarch64
- https://github.com/lambdamechanic/tooltest/releases/download/latest/tooltest-macos-arm64
- https://github.com/lambdamechanic/tooltest/releases/download/latest/tooltest-windows-x86_64.exe
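If you script the direct download yourself, a small helper can pick the right URL from the list above. This is only a sketch: the function name `tooltest_download_url` is hypothetical, it is not the installer's own detection logic, and it assumes the usual `uname -s` / `uname -m` values on Linux and macOS. The Windows asset is left out because `uname` output varies between Windows shells.

```shell
# Hypothetical helper: pick the direct-download URL for a given OS and CPU.
# Arguments are the values printed by `uname -s` and `uname -m`.
tooltest_download_url() {
  base="https://github.com/lambdamechanic/tooltest/releases/download/latest"
  case "$1/$2" in
    Linux/x86_64)  echo "$base/tooltest-linux-x86_64" ;;
    Linux/aarch64) echo "$base/tooltest-linux-aarch64" ;;
    Darwin/arm64)  echo "$base/tooltest-macos-arm64" ;;
    *)
      # No matching asset appears in the list of stable URLs.
      echo "no prebuilt binary listed for $1/$2" >&2
      return 1
      ;;
  esac
}
```

A typical call would be `tooltest_download_url "$(uname -s)" "$(uname -m)"`, with the result passed to your download tool of choice.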
Install from crates.io:
Profiling (debug)
tooltest-prof is an optional wrapper that runs the installed tooltest binary under flamegraph.
It is a debugging tool and is not included in release artifacts or cargo install by default.
Install the wrapper via the install script:
TOOLTEST_INSTALL_DEBUG_TOOLS=1 \
|
Prerequisites:
- flamegraph in your PATH (from cargo install flamegraph)
- perf/DTrace permissions for your platform (see flamegraph --help)
Usage (writes SVG output to TOOLTEST_PROFILE_PATH when set):
TOOLTEST_PROFILE_PATH="/tooltest.svg" \
For MCP usage, configure your launcher to invoke tooltest-prof instead of tooltest.
If your flamegraph is mostly "unknown", rebuild tooltest with symbols + frame pointers and point the wrapper at the new binary:
TOOLTEST_PROFILE_TOOLTEST_PATH="/target/release/tooltest" \
TOOLTEST_PROFILE_PATH="/tooltest.svg" \
Manual rebuild (if you prefer):
RUSTFLAGS="-C force-frame-pointers=yes" \
CARGO_PROFILE_RELEASE_DEBUG=1 \
CARGO_PROFILE_RELEASE_STRIP=none \
CARGO_PROFILE_RELEASE_LTO=false \
Test a stdio MCP server
# optional: --arg ..., --env KEY=VALUE, --cwd /somewhere
Test a Streamable HTTP MCP endpoint
# optional: --auth-token "Bearer …"
Output
Human-readable output on stdout by default; pass --json for JSON output (including error messages).
JSON Schema patterns
Tooltest treats JSON Schema pattern values as ECMAScript regexes and relies on rslint_regex plus
regex_syntax/proptest to keep generation aligned with ECMA-262 semantics (e.g., ASCII-only \d, \w).
Exit codes
- 0 = success
- 1 = run failure
- 2 = argument/validation error
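As a small illustration of how a wrapper script might branch on these codes, the sketch below maps each documented exit code to a label. The function name `tooltest_exit_label` is hypothetical and is not part of tooltest.

```shell
# Hypothetical helper: translate a tooltest exit code into a readable label.
# Only the codes 0, 1, and 2 are documented; anything else is reported as unknown.
tooltest_exit_label() {
  case "$1" in
    0) echo "success" ;;
    1) echo "run failure" ;;
    2) echo "argument/validation error" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}
```

In a script, call it right after the tooltest invocation, for example `tooltest_exit_label "$?"`.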
Use it in CI / tests
Treat tooltest as an integration test: run it against your server build, and fail the job if it reports problems.
Example (shell):
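One way to write such a gate is sketched below, assuming a POSIX shell. The function name `tooltest_gate` and the report file name are illustrative. The tooltest command line is passed in as arguments, so the wrapper itself makes no assumptions about flags.

```shell
# Hypothetical CI wrapper: run the given command, keep its output as a report,
# and return the command's exit status so the job fails when it is non-zero.
tooltest_gate() {
  report=$1
  shift
  # Capture stdout and stderr together so the report can be attached to the job.
  "$@" >"$report" 2>&1
  gate_status=$?
  if [ "$gate_status" -ne 0 ]; then
    echo "tooltest gate failed with exit code $gate_status; see $report" >&2
  fi
  return "$gate_status"
}

# Example call (commented out so the snippet is safe to source):
# tooltest_gate tooltest-report.txt \
#   tooltest stdio --command "<command that starts your MCP server>" || exit 1
```

Keeping the report on disk also makes it easy to hand a failing run to a coding agent afterwards.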
CLI examples
Migration note (legacy generator removal)
Tooltest now runs only the state-machine generator. The --generator-mode flag and
GeneratorMode API are removed. If you previously relied on legacy random generation,
expect stricter input sourcing by default. To allow schema-based generation for required
fields when the corpus is empty, set --lenient-sourcing or provide
--state-machine-config '{"lenient_sourcing":true}'.
Simple run against a hosted MCP endpoint:
Simple run against a local stdio MCP server:
State-machine sourcing
State-machine mode is strict by default: it only uses values mined from the corpus when satisfying required schema fields. If your server needs schema-based generation (for example, on the very first call), enable lenient sourcing.
If you see state-machine generator failed to reach minimum sequence length, it usually means no eligible tool call could be generated (often because required fields had no corpus values). Re-run with --lenient-sourcing or seed values via --state-machine-config to unblock generation.
State-machine runs always track a corpus and coverage counts; this adds overhead compared to the legacy generator and can grow with response size.
You can set this in the JSON config:
Or override it on the CLI:
CLI flags take precedence over the JSON config.
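The difference between strict and lenient sourcing described above can be pictured with a toy model. This is a conceptual sketch only, not tooltest's implementation: the function name `source_required_value`, the space-separated corpus, and the placeholder string are all invented for illustration.

```shell
# Conceptual model of sourcing a value for one required field.
# Usage: source_required_value "CORPUS VALUES..." LENIENT(yes|no)
source_required_value() {
  corpus=$1
  lenient=$2
  for value in $corpus; do
    # A value mined from the corpus satisfies the field in either mode.
    echo "$value"
    return 0
  done
  if [ "$lenient" = "yes" ]; then
    # Placeholder standing in for a schema-generated value.
    echo "<generated-from-schema>"
    return 0
  fi
  # Strict mode with an empty corpus: no eligible call can be built.
  return 1
}
```

In this model the strict-mode failure on an empty corpus corresponds to the "failed to reach minimum sequence length" situation: nothing can be called until the corpus is seeded or lenient sourcing is enabled.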
Coverage validation output
When coverage validation fails without a positive error, you can include uncallable tool traces in
the output with --show-uncallable. Use --uncallable-limit <N> to control how many calls per
tool are included (default: 1).
Tool filters
Filter eligible tools by name (exact, case-sensitive) using --tool-allowlist and
--tool-blocklist. These flags only affect invocation generation and are separate from
coverage_allowlist/coverage_blocklist in the state-machine config, which only affect
coverage warnings and validation.
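To make "exact, case-sensitive" concrete, here is a sketch of the matching rule in POSIX shell. It is a conceptual model rather than tooltest's implementation, and it assumes two things the text above does not state: that an empty allowlist admits every tool, and that the blocklist wins when a name appears in both lists. The function name `tool_is_eligible` is hypothetical.

```shell
# Conceptual model of name filtering (not tooltest's actual code).
# Usage: tool_is_eligible NAME "ALLOW1 ALLOW2 ..." "BLOCK1 BLOCK2 ..."
# Lists are space-separated purely for illustration; names are assumed to
# contain no whitespace or glob characters.
tool_is_eligible() {
  name=$1
  allowlist=$2
  blocklist=$3
  # Assumption: a blocked name is never eligible.
  for blocked in $blocklist; do
    if [ "$name" = "$blocked" ]; then
      return 1
    fi
  done
  # Assumption: an empty allowlist means every remaining tool is eligible.
  if [ -z "$allowlist" ]; then
    return 0
  fi
  for allowed in $allowlist; do
    # String equality is exact and case-sensitive: "Foo" does not match "foo".
    if [ "$name" = "$allowed" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, `tool_is_eligible Foo "foo bar" ""` fails because the case differs, while `tool_is_eligible foo "foo bar" ""` succeeds.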
In-band tool errors
Tool responses with isError = true are allowed by default and do not fail the run. To preserve
the previous behavior, pass --in-band-error-forbidden. MCP protocol errors (JSON-RPC errors) and
schema-invalid responses still fail the run.
Static checks (like output schema validation) always apply. If a tool advertises an output schema,
error responses are expected to include structuredContent that conforms to that schema. The MCP
spec describes CallToolResult.structuredContent as optional and says that if an output schema is
defined it SHOULD conform to the schema; tooltest treats invalid structured content as schema-invalid
and emits a warning when structuredContent is missing (even for isError results; see
docs/mcp-spec/2025-11-25/schema.mdx).
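The rules in this section can be summarised as a small decision function. The sketch below is a reading aid, not tooltest's code: the function name `classify_result`, the yes/no arguments, the order of the checks, and the output labels are all assumptions made for illustration.

```shell
# Conceptual summary of how one tool response is judged.
# Usage: classify_result IS_ERROR FORBID_IN_BAND HAS_OUTPUT_SCHEMA STRUCTURED
#   IS_ERROR, FORBID_IN_BAND, HAS_OUTPUT_SCHEMA: yes | no
#   STRUCTURED (state of structuredContent):     missing | valid | invalid
classify_result() {
  is_error=$1
  forbid=$2
  has_schema=$3
  structured=$4
  # Structured content that violates the advertised output schema fails the run.
  if [ "$has_schema" = "yes" ] && [ "$structured" = "invalid" ]; then
    echo "fail: schema-invalid"
    return 0
  fi
  # In-band errors only fail when --in-band-error-forbidden is passed.
  if [ "$is_error" = "yes" ] && [ "$forbid" = "yes" ]; then
    echo "fail: in-band error forbidden"
    return 0
  fi
  # Missing structured content yields a warning, even for isError results.
  if [ "$has_schema" = "yes" ] && [ "$structured" = "missing" ]; then
    echo "warn: structuredContent missing"
    return 0
  fi
  echo "pass"
}
```

For instance, `classify_result yes no yes missing` prints the warning label: the error itself is tolerated by default, but the absent `structuredContent` is still flagged.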
Seed data
Seed the corpus with known values (strings or numbers) using inline JSON:
Text mining
If your MCP server only emits textual content, you can mine whitespace-delimited tokens into the corpus:
Or override it on the CLI:
Corpus debugging
Dump the final corpus as JSON (stderr in human mode, inline in --json mode):
Log newly mined corpus values after each tool response (stderr):
Pre-run command hook
Run a command after the initial tools/list (before tool schema validation) and before each
proptest case to reset external state.
The hook is a shell command string executed via sh -c. On Windows, this is unsupported unless a
compatible sh is available (for example via Git Bash/MSYS2); patches to add native Windows support
are welcome.
If the hook exits non-zero, the run fails with code: pre_run_hook_failed and structured details
(exit code, stdout, stderr, signal). For stdio runs, the hook uses the same --env and --cwd
settings as the MCP server process.
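The hook behaviour described above can be modelled in a few lines of shell. This is a sketch of the semantics only, not tooltest's implementation: the function name `run_pre_run_hook` is hypothetical, and the `exit_code:`, `stdout:`, and `stderr:` labels are illustrative rather than tooltest's report format. Signal reporting is omitted.

```shell
# Conceptual model of a pre-run hook runner (not tooltest's actual code).
# Runs the hook string via `sh -c`, captures stdout and stderr, and prints
# failure details when the exit status is non-zero.
run_pre_run_hook() {
  hook=$1
  out_file=$(mktemp)
  err_file=$(mktemp)
  sh -c "$hook" >"$out_file" 2>"$err_file"
  hook_status=$?
  if [ "$hook_status" -ne 0 ]; then
    echo "code: pre_run_hook_failed"
    echo "exit_code: $hook_status"
    echo "stdout: $(cat "$out_file")"
    echo "stderr: $(cat "$err_file")"
  fi
  rm -f "$out_file" "$err_file"
  return "$hook_status"
}
```

Because the hook is a single string handed to `sh -c`, it can contain pipelines or several commands separated by `;`, which is convenient for resetting a scratch database or data directory between cases.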
Lint configuration
Tooltest loads lint configuration from tooltest.toml. It searches upward from the current working
directory to the git root; if found, that config is used. Otherwise it falls back to
~/.config/tooltest.toml, and if no file exists it uses the built-in defaults.
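The lookup order can be sketched as follows. This is a model of the description above, not tooltest's code: the function name `find_tooltest_config` is hypothetical, the git root is assumed to be the nearest ancestor containing a `.git` entry, and the search is assumed to stop at the filesystem root when no repository is found.

```shell
# Conceptual model of the config lookup order.
# Prints the path of the config that would be used, or nothing when the
# built-in defaults apply.
find_tooltest_config() {
  dir=$(cd "${1:-.}" && pwd) || return 1
  while :; do
    if [ -f "$dir/tooltest.toml" ]; then
      echo "$dir/tooltest.toml"
      return 0
    fi
    # Stop at the git root, or at the filesystem root if no repository is found.
    if [ -e "$dir/.git" ] || [ "$dir" = "/" ]; then
      break
    fi
    dir=$(dirname "$dir")
  done
  # Fall back to the per-user config when nothing was found in the project.
  if [ -f "$HOME/.config/tooltest.toml" ]; then
    echo "$HOME/.config/tooltest.toml"
  fi
  return 0
}
```

Running `find_tooltest_config` from a subdirectory of your project is a quick way to check which file a nested config would shadow.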
To emit the default config (including comments) run:
Each lint entry has an id, level (error, warning, disabled), and optional parameters. For
example, to enable the max-tools lint:
[[]]
= "max_tools"
= "error"
[]
= 200
Tool enumeration (tooltest-core)
The tooltest-core crate exposes helper APIs for listing tools with schema validation.
use ;
# async
External Rust test example
This is an example integration test in another crate that uses tooltest-core to exercise a hosted MCP HTTP endpoint.
use StreamableHttpClientTransportConfig;
use StreamableHttpClientTransport;
use ;
async
Agent-assisted “fix loop” prompt
Paste this into your coding agent (with repo access) and let it iterate until tooltest is clean.
You have access to this repository and can run commands.
Goal: make the repository's MCP server(s) conform to the MCP spec as exercised by tooltest.
Figure out how to start the MCP server from this repo (stdio or streamable HTTP).
Select a small, related subset of tools intended to be used together. Default to testing at most 50 tools at a time, and strongly prefer a smaller group. Use `--tool-allowlist` (or `tool_allowlist` in MCP input) to enforce this.
Run tooltest against it and fix failures until it exits 0.
If you see "state-machine generator failed to reach minimum sequence length", re-run with `--lenient-sourcing` or seed values in `--state-machine-config`.
CLI usage (preferred when you can run commands):
- Use CLI-only flags for debugging, e.g. `--trace-all /tmp/tooltest-traces.jsonl`.
- Examples:
CLI stdio (allowlist example): tooltest stdio --command "<command that starts the repo's MCP server>" --tool-allowlist foo --tool-allowlist bar
CLI http (allowlist example): tooltest http --url "http://127.0.0.1:9000/mcp" --tool-allowlist foo --tool-allowlist bar
MCP tool usage (when you must call via MCP):
- Call the `tooltest` tool with the shared input schema.
- Only fields in the MCP input schema are accepted (CLI-only flags like `--json` and `--trace-all` are not supported).
- Example (allowlist):
{
"target": { "stdio": { "command": "<command that starts the repo's MCP server>" } },
"tool_allowlist": ["foo", "bar"]
}
Don't rename tools or change schemas unless required; prefer backward-compatible fixes.
Add/adjust tests if needed.
Return a short summary of what you changed and why, plus the final passing tooltest output snippet.
Tips
- If you want deeper coverage, increase the number of generated cases / run modes (when available).
- If a failure is intermittent, keep the smallest reproduction from the report and turn it into a regression test.
- You can install the tooltest agent skill with
sk install lambdamechanic/tooltest tooltest-fix-loop.