vibe-tests 0.0.1

Integration test framework for MCP servers with LLM-powered tool calling.

AI models test your MCP tools automatically — they decide which tool to call and how. Isolated Docker environment, structured tracing, and JSON reports out of the box.

Quick Start

1. Add dependency

[dev-dependencies]
vibe-tests = "0.0.1"
# Required by engine_config! macro for pre-test initialization
ctor = "1.0.1"

2. Configure test engine

vibe_tests::engine_config! {
    EngineTests::builder()
        .mcp_host("http://localhost:9021/mcp/v1")
        .ollama_host("http://localhost:11434")
        .ollama_models(&["qwen2.5-coder:3b-instruct"])
        .on_start(|env| async move {
            // Start your MCP server
            std::process::Command::new("my-mcp-server")
                .stdout(env.tee.clone())
                .stderr(env.tee.clone())
                .spawn()?;
            Ok(())
        })
        .on_stop(|env| {
            // Cleanup and save reports
        })
        .build()
        .expect("Failed to build engine")
}

3. Write tests

#[tokio::test]
async fn test_my_tool() {
    // Arrange
    let engine = vibe_tests::engine().await;
    // Act
    let result = engine.test("Show available projects").await;
    // Assert
    assert!(result.success);
    assert!(result.models.iter().all(|m| m.tool.as_deref() == Some("show_projects")));
}
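
When the `all(...)` assertion above fails, it does not say which model picked a different tool. The following is a minimal sketch of a helper that lists the offending models. `ModelOutcome` and its `model` field are hypothetical stand-ins, not the crate's own types; only the optional `tool` value comes from the example above.

```rust
/// Hypothetical stand-in for one per-model entry of a test result.
/// Only `tool` mirrors the example above; `model` is an assumed field name.
#[derive(Debug, Clone)]
struct ModelOutcome {
    model: String,
    tool: Option<String>,
}

/// Returns the names of models that did not call `expected`,
/// including models that called no tool at all.
fn models_with_wrong_tool<'a>(outcomes: &'a [ModelOutcome], expected: &str) -> Vec<&'a str> {
    outcomes
        .iter()
        .filter(|m| m.tool.as_deref() != Some(expected))
        .map(|m| m.model.as_str())
        .collect()
}

#[test]
fn reports_which_model_picked_the_wrong_tool() {
    let outcomes = vec![
        ModelOutcome { model: "model-a".into(), tool: Some("show_projects".into()) },
        ModelOutcome { model: "model-b".into(), tool: None },
    ];
    let wrong = models_with_wrong_tool(&outcomes, "show_projects");
    assert!(wrong == vec!["model-b"], "models with unexpected tool: {wrong:?}");
}
```

Asserting on the returned list rather than on a single boolean puts the names of the failing models into the test output, which helps when several Ollama models are configured.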

Features

  • Agentic testing: AI models call MCP tools based on natural language queries
  • Isolated environment: Docker Compose with automatic up/down, temp directories
  • Structured tracing: file + real-time callback, parseable log format
  • JSON reports: per-query details (model, tool, args, response, duration, error codes)
  • Multi-model: test the same queries against multiple Ollama models
  • Zero-config defaults: sensible defaults, minimal setup

How it works

  1. engine_config!: one-time setup before all tests
  2. on_start: launch your MCP server (Docker container, local process, or anything else)
  3. engine.test("query"): the LLM receives the query and the available tools, calls one, and returns the result
  4. on_stop: clean up, save logs, parse the JSON report
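
Step 3 can be pictured with a toy model. The sketch below is not how vibe-tests or an LLM works internally: `Tool`, `pick_tool`, and `run_query` are hypothetical names, and a simple keyword match stands in for the model's decision. It only illustrates the shape of the flow, where a query and a tool list go in and a single tool call (or an error) comes out.

```rust
/// Hypothetical description of a tool advertised by an MCP server.
struct Tool {
    name: &'static str,
    keywords: &'static [&'static str],
}

/// Stand-in for the LLM: picks the first tool with a keyword found in the query.
/// A real model chooses from the tool schemas; this is a toy rule only.
fn pick_tool<'a>(query: &str, tools: &'a [Tool]) -> Option<&'a Tool> {
    let query = query.to_lowercase();
    tools
        .iter()
        .find(|t| t.keywords.iter().any(|k| query.contains(*k)))
}

/// Simulates one query: choose a tool, "call" it, and return the outcome.
fn run_query(query: &str, tools: &[Tool]) -> Result<String, String> {
    match pick_tool(query, tools) {
        Some(tool) => Ok(format!("called {}", tool.name)),
        None => Err(format!("no tool matched query: {query}")),
    }
}

#[test]
fn query_is_routed_to_a_single_tool() {
    let tools = [
        Tool { name: "list_files", keywords: &["files"] },
        Tool { name: "show_projects", keywords: &["projects"] },
    ];
    let result = run_query("Show available projects", &tools);
    assert_eq!(result, Ok("called show_projects".to_string()));
}
```

In the real framework the choice is made by each configured Ollama model, so the same query can produce a different tool per model. That is why the example test checks every entry in `result.models`.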

Why

Testing MCP servers manually is slow and brittle. Vibe-tests lets AI models test your tools automatically — they understand natural language queries, decide which tool to call, and verify responses.

Catch regressions, validate tool schemas, and ensure your MCP server works before users notice.


Part of Vibe tools for Agentic RAG — Vibe Analyzer.