# tooltest
**Conformance testing for MCP servers — fast enough for the CLI, solid enough for CI.**
`tooltest` runs your MCP server like a real client/agent would (connect → list tools → call tools) and reports **protocol / schema / runtime** issues in a way that’s easy to act on.
Use it to:
- **sanity-check locally** while you’re developing
- **gate releases in CI** with repeatable integration tests
- **turn failures into fixes** by handing the report to a coding agent
---
## Quick start
### Install
Latest prebuilt binaries (auto-detect OS/arch):
```bash
The installer verifies downloaded binaries against the `*.sha256` checksums when `sha256sum` or `shasum` is available.
Optionally set an install directory (default `/usr/local/bin`, fallback to `~/.local/bin`):
```bash
INSTALL_DIR="$HOME/.local/bin" \
Direct downloads (stable URLs):
- `https://github.com/lambdamechanic/tooltest/releases/download/latest/tooltest-linux-x86_64`
- `https://github.com/lambdamechanic/tooltest/releases/download/latest/tooltest-linux-aarch64`
- `https://github.com/lambdamechanic/tooltest/releases/download/latest/tooltest-macos-arm64`
- `https://github.com/lambdamechanic/tooltest/releases/download/latest/tooltest-windows-x86_64.exe`
Install from crates.io:
```bash
cargo install tooltest
```
### Profiling (debug)
`tooltest-prof` is an optional wrapper that runs the installed `tooltest` binary under `flamegraph`.
It is a debugging tool and is not included in release artifacts or `cargo install` by default.
Install the wrapper via the install script:
```bash
TOOLTEST_INSTALL_DEBUG_TOOLS=1 \
Prerequisites:
- `flamegraph` in your `PATH` (from `cargo install flamegraph`)
- `perf`/DTrace permissions for your platform (see `flamegraph --help`)
Usage (writes SVG output to `TOOLTEST_PROFILE_PATH` when set):
```bash
TOOLTEST_PROFILE_PATH="$PWD/tooltest.svg" \
tooltest-prof stdio --command ./path/to/your-mcp-server
```
For MCP usage, configure your launcher to invoke `tooltest-prof` instead of `tooltest`.
If your flamegraph is mostly "unknown", rebuild tooltest with symbols + frame pointers and
point the wrapper at the new binary:
```bash
./scripts/tooltest-prof-build
TOOLTEST_PROFILE_TOOLTEST_PATH="$PWD/target/release/tooltest" \
TOOLTEST_PROFILE_PATH="$PWD/tooltest.svg" \
tooltest-prof stdio --command ./path/to/your-mcp-server
```
Manual rebuild (if you prefer):
```bash
RUSTFLAGS="-C force-frame-pointers=yes" \
CARGO_PROFILE_RELEASE_DEBUG=1 \
CARGO_PROFILE_RELEASE_STRIP=none \
CARGO_PROFILE_RELEASE_LTO=false \
cargo build -p tooltest --bin tooltest --release
```
### Test a stdio MCP server
```bash
tooltest stdio --command ./path/to/your-mcp-server
# optional: --arg ..., --env KEY=VALUE, --cwd /somewhere
```
### Test a Streamable HTTP MCP endpoint
```bash
tooltest http --url http://127.0.0.1:8080/mcp
# optional: --auth-token "Bearer …"
```
### Output
Human-readable output on stdout by default; pass `--json` for JSON output (including error messages).
### JSON Schema patterns
Tooltest treats JSON Schema `pattern` values as ECMAScript regexes and relies on `rslint_regex` plus
`regex_syntax`/`proptest` to keep generation aligned with ECMA-262 semantics (e.g., ASCII-only `\d`, `\w`).
### Exit codes
- `0` = success
- `1` = run failure
- `2` = argument/validation error
### Use it in CI / tests
Treat tooltest as an integration test: run it against your server build, and fail the job if it reports problems.
Example (shell):
```bash
set -euo pipefail
tooltest stdio --command ./target/release/my-mcp-server
```
---
## CLI examples
### Migration note (legacy generator removal)
Tooltest now runs only the state-machine generator. The `--generator-mode` flag and
`GeneratorMode` API are removed. If you previously relied on legacy random generation,
expect stricter input sourcing by default. To allow schema-based generation for required
fields when the corpus is empty, set `--lenient-sourcing` or provide
`--state-machine-config '{"lenient_sourcing":true}'`.
Simple run against a hosted MCP endpoint:
```bash
cargo run -p tooltest --bin tooltest -- \
--cases 100 \
http --url https://pymcp.app.lambdamechanic.com/kev/mcp
```
Simple run against a local stdio MCP server:
```bash
cargo run -p tooltest --bin tooltest -- \
--cases 100 \
stdio --command ./target/debug/my-mcp-server
```
### State-machine sourcing
State-machine mode is strict by default: it only uses values mined from the corpus when satisfying required schema fields. If your server needs schema-based generation (for example, on the very first call), enable lenient sourcing.
If you see `state-machine generator failed to reach minimum sequence length`, it usually means no eligible tool call could be generated (often because required fields had no corpus values). Re-run with `--lenient-sourcing` or seed values via `--state-machine-config` to unblock generation.
State-machine runs always track a corpus and coverage counts; this adds overhead compared to the legacy generator and can grow with response size.
You can set this in the JSON config:
```bash
--state-machine-config '{"lenient_sourcing":true}'
```
Or override it on the CLI:
```bash
--lenient-sourcing
--no-lenient-sourcing
```
CLI flags take precedence over the JSON config.
### Coverage validation output
When coverage validation fails without a positive error, you can include uncallable tool traces in
the output with `--show-uncallable`. Use `--uncallable-limit <N>` to control how many calls per
tool are included (default: 1).
### Tool filters
Filter eligible tools by name (exact, case-sensitive) using `--tool-allowlist` and
`--tool-blocklist`. These flags only affect invocation generation and are separate from
`coverage_allowlist`/`coverage_blocklist` in the state-machine config, which only affect
coverage warnings and validation.
### In-band tool errors
Tool responses with `isError = true` are allowed by default and do not fail the run. To preserve
the previous behavior, pass `--in-band-error-forbidden`. MCP protocol errors (JSON-RPC errors) and
schema-invalid responses still fail the run.
Static checks (like output schema validation) always apply. If a tool advertises an output schema,
error responses are expected to include `structuredContent` that conforms to that schema. The MCP
spec describes `CallToolResult.structuredContent` as optional and says that if an output schema is
defined it SHOULD conform to the schema; tooltest treats invalid structured content as schema-invalid
and emits a warning when `structuredContent` is missing (even for `isError` results; see
`docs/mcp-spec/2025-11-25/schema.mdx`).
### Seed data
Seed the corpus with known values (strings or numbers) using inline JSON:
```bash
tooltest stdio --command ./target/debug/my-mcp-server \
--state-machine-config '{"seed_strings":["alpha"],"seed_numbers":[42]}'
```
### Text mining
If your MCP server only emits textual content, you can mine whitespace-delimited tokens into the corpus:
```bash
--state-machine-config '{"mine_text":true}'
```
Or override it on the CLI:
```bash
--mine-text
```
### Corpus debugging
Dump the final corpus as JSON (stderr in human mode, inline in --json mode):
```bash
--dump-corpus
```
Log newly mined corpus values after each tool response (stderr):
```bash
--log-corpus-deltas
```
### Pre-run command hook
Run a command after the initial `tools/list` (before tool schema validation) and before each
proptest case to reset external state.
The hook is a shell command string executed via `sh -c`. On Windows, this is unsupported unless a
compatible `sh` is available (for example via Git Bash/MSYS2); patches to add native Windows support
are welcome.
If the hook exits non-zero, the run fails with `code: pre_run_hook_failed` and structured details
(exit code, stdout, stderr, signal). For stdio runs, the hook uses the same `--env` and `--cwd`
settings as the MCP server process.
```bash
--pre-run-hook "./scripts/reset-state.sh"
```
---
## Lint configuration
Tooltest loads lint configuration from `tooltest.toml`. It searches upward from the current working
directory to the git root; if found, that config is used. Otherwise it falls back to
`~/.config/tooltest.toml`, and if no file exists it uses the built-in defaults.
To emit the default config (including comments) run:
```bash
tooltest config default
```
Each lint entry has an `id`, `level` (`error`, `warning`, `disabled`), and optional parameters. For
example, to enable the max-tools lint:
```toml
[[lints]]
id = "max_tools"
level = "error"
[lints.params]
max = 200
```
---
## Tool enumeration (tooltest-core)
The `tooltest-core` crate exposes helper APIs for listing tools with schema validation.
```rust
use tooltest_core::{list_tools_http, HttpConfig, SchemaConfig};
# async fn run() {
let endpoint = HttpConfig::new("http://localhost:3000/mcp").expect("valid http config");
let tools = list_tools_http(
&endpoint,
&SchemaConfig::default(),
)
.await
.expect("list tools");
println!("found {} tools", tools.len());
# }
```
---
## External Rust test example
This is an example integration test in another crate that uses `tooltest-core` to exercise a hosted MCP HTTP endpoint.
```rust
use rmcp::transport::streamable_http_client::StreamableHttpClientTransportConfig;
use rmcp::transport::StreamableHttpClientTransport;
use tooltest_core::{SessionDriver, ToolInvocation};
#[tokio::test]
async fn calls_hosted_tool() {
let transport = StreamableHttpClientTransport::from_config(
StreamableHttpClientTransportConfig::with_uri("https://pymcp.app.lambdamechanic.com/kev/mcp"),
);
let driver = SessionDriver::connect_with_transport(transport)
.await
.expect("connect");
let invocation = ToolInvocation {
name: "some_tool".into(),
arguments: None,
};
let response = driver.call_tool(invocation).await.expect("call tool");
assert_eq!(response.is_error, Some(false));
}
```
---
## Agent-assisted “fix loop” prompt
Paste this into your coding agent (with repo access) and let it iterate until tooltest is clean.
```text
You have access to this repository and can run commands.
Goal: make the repository's MCP server(s) conform to the MCP spec as exercised by tooltest.
Figure out how to start the MCP server from this repo (stdio or streamable HTTP).
Select a small, related subset of tools intended to be used together. Default to testing at most 50 tools at a time, and strongly prefer a smaller group. Use `--tool-allowlist` (or `tool_allowlist` in MCP input) to enforce this.
Run tooltest against it and fix failures until it exits 0.
If you see "state-machine generator failed to reach minimum sequence length", re-run with `--lenient-sourcing` or seed values in `--state-machine-config`.
CLI usage (preferred when you can run commands):
- Use CLI-only flags for debugging, e.g. `--trace-all /tmp/tooltest-traces.jsonl`.
- Examples:
CLI stdio (allowlist example): tooltest stdio --command "<command that starts the repo's MCP server>" --tool-allowlist foo --tool-allowlist bar
CLI http (allowlist example): tooltest http --url "http://127.0.0.1:9000/mcp" --tool-allowlist foo --tool-allowlist bar
MCP tool usage (when you must call via MCP):
- Call the `tooltest` tool with the shared input schema.
- Only fields in the MCP input schema are accepted (CLI-only flags like `--json` and `--trace-all` are not supported).
- Example (allowlist):
{
"target": { "stdio": { "command": "<command that starts the repo's MCP server>" } },
"tool_allowlist": ["foo", "bar"]
}
Don't rename tools or change schemas unless required; prefer backward-compatible fixes.
Add/adjust tests if needed.
Return a short summary of what you changed and why, plus the final passing tooltest output snippet.
```
---
## Tips
- If you want deeper coverage, increase the number of generated cases / run modes (when available).
- If a failure is intermittent, keep the smallest reproduction from the report and turn it into a regression test.
- You can install the tooltest agent skill with `sk install lambdamechanic/tooltest tooltest-fix-loop`.