WayDriver
A Rust library for headless GUI application testing on Wayland. Launches apps in isolated compositor sessions, interacts with them via AT-SPI accessibility APIs, and captures screenshots via PipeWire.
The repo also contains waydriver-mcp, a standalone MCP server binary built on top of the library that lets AI assistants drive GTK4 apps directly — see MCP server below.
How it works
Each test session creates an isolated environment with a headless compositor, input injection, and screen capture:
graph TD
subgraph Session["Per-session processes"]
dbus["dbus-daemon (private)"]
dbus --- mutter["Mutter --headless --wayland"]
mutter --- screencast["ScreenCast API (screenshots)"]
mutter --- remotedesktop["RemoteDesktop API (input)"]
dbus --- pipewire["PipeWire (frame capture)"]
dbus --- wireplumber["WirePlumber (PipeWire graph manager)"]
app["Your app (on Mutter's Wayland display)"]
app --- atspi["AT-SPI (accessibility tree, actions)"]
end
The library is backend-agnostic. Three traits define the interface:
- CompositorRuntime: lifecycle of a headless compositor (start, stop, expose Wayland display)
- InputBackend: keyboard and pointer injection
- CaptureBackend: screen capture (start/stop PipeWire streams, grab PNG frames)
Concrete implementations are separate crates. The trait-based design allows backends to be added as sibling crates without changing the core.
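To make the split concrete, here is a minimal sketch of what such a trait trio could look like. It is synchronous and std-only for brevity; the method signatures, the `FakeBackend` type and the `type_keysyms` helper are assumptions made for illustration, not WayDriver's actual API (only the trait names and the `press_keysym` / `pointer_button` names come from this README).

```rust
// Hypothetical, synchronous simplification of the three backend traits.
// Signatures are illustrative assumptions, not WayDriver's actual API.

/// Lifecycle of a headless compositor.
pub trait CompositorRuntime {
    fn start(&mut self) -> Result<(), String>;
    fn stop(&mut self) -> Result<(), String>;
    /// Wayland display name for launched apps; `None` until started.
    fn wayland_display(&self) -> Option<&str>;
}

/// Keyboard and pointer injection.
pub trait InputBackend {
    fn press_keysym(&mut self, keysym: u32) -> Result<(), String>;
    fn pointer_button(&mut self, button: u32) -> Result<(), String>;
}

/// Screen capture.
pub trait CaptureBackend {
    fn grab_png(&mut self) -> Result<Vec<u8>, String>;
}

/// In-memory stand-in implementing all three traits (no real compositor).
#[derive(Default)]
pub struct FakeBackend {
    display: Option<String>,
    pub keys: Vec<u32>,
    pub buttons: Vec<u32>,
}

impl CompositorRuntime for FakeBackend {
    fn start(&mut self) -> Result<(), String> {
        self.display = Some("wayland-fake-0".to_string());
        Ok(())
    }
    fn stop(&mut self) -> Result<(), String> {
        self.display = None;
        Ok(())
    }
    fn wayland_display(&self) -> Option<&str> {
        self.display.as_deref()
    }
}

impl InputBackend for FakeBackend {
    fn press_keysym(&mut self, keysym: u32) -> Result<(), String> {
        self.keys.push(keysym);
        Ok(())
    }
    fn pointer_button(&mut self, button: u32) -> Result<(), String> {
        self.buttons.push(button);
        Ok(())
    }
}

impl CaptureBackend for FakeBackend {
    fn grab_png(&mut self) -> Result<Vec<u8>, String> {
        // Only the 8-byte PNG signature, enough for a smoke test.
        Ok(vec![0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A])
    }
}

/// Core-side code depends only on the trait, never on a concrete backend.
pub fn type_keysyms(input: &mut dyn InputBackend, keysyms: &[u32]) -> Result<(), String> {
    for &k in keysyms {
        input.press_keysym(k)?;
    }
    Ok(())
}
```

Because `type_keysyms` only sees `dyn InputBackend`, a KWin or Sway backend could be dropped in as a sibling crate without touching it, which is the property the trait-based design is after.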
Backend support
| Feature | Mutter | KWin | Sway |
|---|---|---|---|
| Headless compositor | Yes | — | — |
| Keyboard input | Yes (RemoteDesktop) | — | — |
| Pointer input | Yes (RemoteDesktop) | — | — |
| Screenshots | Yes (ScreenCast + PipeWire) | — | — |
| AT-SPI (UI inspection, clicks) | Yes | — | — |
Currently only Mutter is implemented (waydriver-compositor-mutter, waydriver-input-mutter, waydriver-capture-mutter). Each compositor has its own APIs (Mutter uses org.gnome.Mutter.* D-Bus interfaces, KWin has org.kde.KWin.*, Sway uses wlroots Wayland protocols), so each would need its own set of backend crates.
Crate structure
| Crate | Purpose |
|---|---|
| waydriver | Trait definitions, Session, AT-SPI client, keysym helpers, shared GStreamer capture helper |
| waydriver-compositor-mutter | CompositorRuntime impl: manages Mutter, PipeWire, WirePlumber, private D-Bus |
| waydriver-input-mutter | InputBackend impl: keyboard/pointer via Mutter RemoteDesktop |
| waydriver-capture-mutter | CaptureBackend impl: screenshots via Mutter ScreenCast + PipeWire |
| waydriver-mcp | Binary: MCP JSON-RPC server over stdio that exposes the library to AI assistants |
Usage
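use waydriver::Session;
use waydriver_compositor_mutter::MutterCompositor;
use waydriver_input_mutter::MutterInput;
use waydriver_capture_mutter::MutterCapture;

// Start the headless compositor
let mut compositor = MutterCompositor::new(/* ... */);
compositor.start(/* ... */).await?;
let state = compositor.state();

// Build the input and capture backends
let input = MutterInput::new(/* ... */);
let capture = MutterCapture::new(/* ... */);

let session = Session::start(/* ... */).await?;

// Take a screenshot (returns PNG bytes)
let png = session.take_screenshot().await?;

// Interact via AT-SPI
click_element(/* ... */).await?;

session.kill().await?;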
MCP server
waydriver-mcp is a standalone binary that exposes the library over the Model Context Protocol, letting AI assistants (Claude Desktop, Claude Code, etc.) drive GTK4 apps in isolated headless sessions. It speaks JSON-RPC over stdio and constructs the Mutter backends internally — clients only see the high-level tools below.
| Tool | Purpose |
|---|---|
| start_session | Spawn a headless Mutter session and launch a command inside it |
| list_sessions | List active session ids, app names, and Wayland displays |
| kill_session | Tear down a session and clean up all child processes |
| inspect_ui | Dump the AT-SPI accessibility tree of the running app |
| click_element | Click a widget by its accessible name (via AT-SPI action) |
| type_text | Type a string into a focused element through the input backend |
| press_key | Press a named key (Return, Tab, Escape, letters, …) |
| find_element | Find a widget by accessible name and return its role and path |
| move_pointer | Move the pointer by a relative offset in logical pixels |
| pointer_click | Press and release a pointer button (defaults to left click) |
| take_screenshot | Capture a PNG via the keepalive ScreenCast stream and return its path |
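A tool like press_key has to turn a key name into a keysym before it can be injected. The sketch below shows one way that lookup could work for a small subset of keys. The function name `keysym_for` is hypothetical and this is not the library's keysym helper; the numeric values are the standard X11 keysyms, where printable ASCII characters map to their own code points.

```rust
/// Map a key name to its X11 keysym value (small illustrative subset).
/// `keysym_for` is a hypothetical helper, not part of WayDriver's API.
pub fn keysym_for(name: &str) -> Option<u32> {
    match name {
        "Return" => Some(0xff0d),
        "Tab" => Some(0xff09),
        "Escape" => Some(0xff1b),
        "BackSpace" => Some(0xff08),
        "space" => Some(0x0020),
        _ => {
            // A single printable ASCII character: the keysym equals the code point.
            let mut chars = name.chars();
            match (chars.next(), chars.next()) {
                (Some(c), None) if c.is_ascii_graphic() => Some(c as u32),
                _ => None,
            }
        }
    }
}
```

Unknown names return `None` rather than guessing, so a caller can report the bad key name back to the MCP client.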
Why Docker?
waydriver-mcp needs about eight system services at runtime (among them mutter, pipewire, wireplumber, dbus, AT-SPI, and gstreamer). Installing these manually is fragile and distro-specific. Docker solves four problems:
- Security: the MCP server spawns arbitrary processes, interacts with them via D-Bus, and captures their screen. Running this on your host session gives it access to everything your user can do. Inside a container, it only sees what you explicitly mount, with no access to your files, browser sessions, or credentials. Add --network none to block network access entirely.
- Zero-setup distribution: docker pull and you're running, no system packages to install.
- D-Bus isolation: each container gets its own dbus-daemon, so apps like gnome-calculator don't interfere across concurrent test sessions (the singleton D-Bus activation problem).
- ABI compatibility: apps built inside the container are guaranteed to link against the same libraries the MCP runtime uses.
Running with Docker (recommended)
Prebuilt images are published to GitHub Container Registry for each release:
| Image | Purpose |
|---|---|
| ghcr.io/bohdantkachenko/waydriver-mcp | Runtime: MCP server with all system deps |
| ghcr.io/bohdantkachenko/waydriver-mcp-builder | Build env: Fedora 42 + Rust + gcc/g++ + meson + cmake + GTK4/GLib dev headers |
Use the builder image to compile your app in a Fedora environment that matches the runtime. The resulting binary is ABI-compatible with the runtime image. See Testing your app below for language-specific build examples.
MCP client config (e.g. .mcp.json for Claude Code):
- $PWD:/workspace:ro mounts the project directory so the MCP can launch your app binaries from /workspace/
- /tmp/waydriver:/tmp makes screenshots accessible on the host at /tmp/waydriver/
- --network none is set because the MCP server doesn't need internet access
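If you launch the container from your own tooling rather than from a client config file, the same options can be assembled programmatically. The following is a sketch only: `docker_run_args` is a hypothetical helper, and the `--rm` and `-i` flags are assumptions (a stdio server needs stdin kept open), while the two mounts and `--network none` are the ones described above.

```rust
/// Build the argument list for `docker` to run the MCP server container.
/// Hypothetical helper; `--rm` and `-i` are assumptions, the mounts and
/// `--network none` mirror the options described in this README.
pub fn docker_run_args(project_dir: &str, image: &str) -> Vec<String> {
    vec![
        "run".to_string(),
        "--rm".to_string(),
        // Keep stdin open so JSON-RPC can flow over stdio.
        "-i".to_string(),
        // Read-only mount of the project so app binaries are visible in /workspace.
        "-v".to_string(),
        format!("{}:/workspace:ro", project_dir),
        // Screenshots written to /tmp in the container land in /tmp/waydriver on the host.
        "-v".to_string(),
        "/tmp/waydriver:/tmp".to_string(),
        // The server does not need internet access.
        "--network".to_string(),
        "none".to_string(),
        image.to_string(),
    ]
}
```

The resulting vector can be handed to `std::process::Command::new("docker").args(...)`. Note that `$PWD` is a shell variable, so a caller outside a shell has to pass the resolved project path explicitly.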
For NixOS users, also mount the Nix store so Nix-built binaries work inside the container:
Or build from source:
Testing your app with waydriver-mcp
The MCP server is persistent — it stays up for the entire AI assistant session. You rebuild your app independently, and each start_session call picks up the latest binary from the volume. No MCP restart needed between iterations.
Rust apps — build with the builder image, volume-mount the binary:
Then call start_session with command: "/workspace/myapp".
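For reference, an MCP tool call is a JSON-RPC 2.0 request with method `tools/call`. The sketch below builds such a request for start_session by hand, using only the standard library. The helper names are hypothetical, and `command` is the only argument this README mentions; the tool may accept others. A real client would also complete the MCP initialize handshake first and would normally use a JSON library instead of manual formatting.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC `tools/call` request for the `start_session` tool.
/// Hypothetical helper; only the `command` argument is taken from the README.
pub fn start_session_request(id: u64, command: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"start_session","arguments":{{"command":"{}"}}}}}}"#,
        id,
        json_escape(command)
    )
}
```

Writing the returned string followed by a newline to the server's stdin is how a stdio client would typically send it.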
C/C++ apps — the builder image includes gcc, g++, meson, ninja-build, cmake, and GTK4/GLib dev headers:
For extra deps (e.g. libadwaita-devel), extend the builder:
FROM ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest
RUN dnf install -y libadwaita-devel
Node/Python apps — extend the runtime image to add the interpreter, use a named volume for deps:
FROM ghcr.io/bohdantkachenko/waydriver-mcp:latest
RUN dnf install -y nodejs && dnf clean all
Install deps into a named volume (re-run only when lockfile changes):
Mount source + deps — edit source freely, MCP picks up changes on next start_session:
"args":
NixOS users — mount /nix/store so Nix-built binaries just work:
"args":
Running with Nix
For local development without Docker, the Nix app wraps the binary with the required runtime env vars:
Sessions are kept in an in-memory HashMap keyed by id, so multiple apps can run concurrently within one server process.
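A minimal sketch of such a registry, assuming string ids and a small per-session record, might look like the following. The type names, field names and id format are hypothetical; a real server would hold the live session handles and would likely guard the map with a mutex.

```rust
use std::collections::HashMap;

/// Per-session metadata (hypothetical fields, mirroring what list_sessions reports).
#[derive(Debug, Clone, PartialEq)]
pub struct SessionInfo {
    pub app_name: String,
    pub wayland_display: String,
}

/// In-memory session table keyed by session id.
#[derive(Default)]
pub struct SessionRegistry {
    sessions: HashMap<String, SessionInfo>,
    next_id: u64,
}

impl SessionRegistry {
    /// Register a session and return its freshly allocated id.
    pub fn insert(&mut self, info: SessionInfo) -> String {
        self.next_id += 1;
        let id = format!("session-{}", self.next_id);
        self.sessions.insert(id.clone(), info);
        id
    }

    /// List all sessions, sorted by id for stable output.
    pub fn list(&self) -> Vec<(&str, &SessionInfo)> {
        let mut out: Vec<_> = self
            .sessions
            .iter()
            .map(|(id, s)| (id.as_str(), s))
            .collect();
        out.sort_by(|a, b| a.0.cmp(b.0));
        out
    }

    /// Remove a session; returns `None` if the id is unknown.
    pub fn remove(&mut self, id: &str) -> Option<SessionInfo> {
        self.sessions.remove(id)
    }
}
```

Because each entry is independent, killing one session leaves the others untouched, which is what allows several apps to run side by side in one server process.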
Requirements
All dependencies are provided by the Nix flake (nix develop). If not using Nix, you need the following system packages.
Build dependencies
| Debian/Ubuntu | Fedora | Arch |
|---|---|---|
| pkg-config | pkg-config | pkg-config |
| libglib2.0-dev | glib2-devel | glib2 |
| libgstreamer1.0-dev | gstreamer1-devel | gstreamer |
| libgstreamer-plugins-base1.0-dev | gstreamer1-plugins-base-devel | gst-plugins-base |
Runtime dependencies
| Debian/Ubuntu | Fedora | Arch |
|---|---|---|
| mutter | mutter | mutter |
| pipewire | pipewire | pipewire |
| wireplumber | wireplumber | wireplumber |
| gstreamer1.0-plugins-base | gstreamer1-plugins-base | gst-plugins-base |
| gstreamer1.0-plugins-good | gstreamer1-plugins-good | gst-plugins-good |
| gstreamer1.0-pipewire | gstreamer1-plugins-pipewire | gst-plugin-pipewire |
| at-spi2-core | at-spi2-core | at-spi2-core |
| dbus | dbus | dbus |
Quick install:
# Debian/Ubuntu
# Fedora
# Arch
Architecture notes
Keepalive ScreenCast stream
In headless mode, Mutter only composites (and delivers Wayland frame callbacks) when a ScreenCast consumer is pulling frames. Without an active stream, GTK4 apps render their first frame but never repaint — the frame clock never ticks.
Session::start opens a persistent ScreenCast stream that stays alive for the session's lifetime. This keeps Mutter compositing continuously so frame callbacks flow and GTK4 apps repaint normally.
Input: RemoteDesktop vs AT-SPI
Two input paths are available, with different trade-offs:
- RemoteDesktop keyboard/pointer (press_keysym, pointer_button): events go through the full Wayland input pipeline (Mutter -> Wayland protocol -> GDK -> GTK event loop). GTK4 processes them normally and repaints. Use this for interactions that need to produce visible changes.
- AT-SPI actions (click_element): directly invoke widget signal handlers by accessible name. Accurate and name-based, but they update GTK4's internal model without triggering compositor redraws. Useful for reading the accessibility tree and programmatic activation, but screenshots taken after AT-SPI-only interactions may show stale frames.
App isolation
Apps are launched with GSETTINGS_BACKEND=keyfile and XDG_CONFIG_HOME pointing to the per-session runtime directory. This bypasses the host dconf daemon entirely, so each session starts with default app state and never reads or writes the user's settings.
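The environment setup can be sketched with `std::process::Command`. This is an illustration under assumptions, not the library's launch code: `isolated_command` and `configured_env` are hypothetical helpers, and setting `WAYLAND_DISPLAY` is an assumption about how the app is pointed at the session's compositor. The two settings-related variables are the ones named above.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Prepare (but do not spawn) a command whose settings are isolated from the host.
/// Hypothetical helper; WAYLAND_DISPLAY handling is an assumption.
pub fn isolated_command(program: &str, runtime_dir: &Path, wayland_display: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env("GSETTINGS_BACKEND", "keyfile") // bypass the host dconf daemon
        .env("XDG_CONFIG_HOME", runtime_dir) // keyfile settings live in the session dir
        .env("WAYLAND_DISPLAY", wayland_display);
    cmd
}

/// Look up an environment variable explicitly set on a `Command`.
pub fn configured_env<'a>(cmd: &'a Command, key: &str) -> Option<&'a OsStr> {
    cmd.get_envs()
        .find(|(k, _)| *k == OsStr::new(key))
        .and_then(|(_, v)| v)
}
```

Since the runtime directory is per-session, deleting it on teardown also discards any settings the app wrote during the test.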
Dual D-Bus
GTK4's built-in AT-SPI backend only registers on the host session bus — it ignores custom DBUS_SESSION_BUS_ADDRESS. So each session uses two D-Bus connections:
- Host session bus: AT-SPI communication with the app
- Private D-Bus: Mutter's ScreenCast and RemoteDesktop APIs (isolated from the host compositor)
graph LR
subgraph Host
host_dbus["Host session bus"]
end
subgraph Session["Per-session"]
private_dbus["Private D-Bus"]
mutter["Mutter"]
app["Your app"]
waydriver["WayDriver"]
end
waydriver -- "AT-SPI" --> host_dbus
app -- "AT-SPI register" --> host_dbus
waydriver -- "ScreenCast\nRemoteDesktop" --> private_dbus
mutter -- "org.gnome.Mutter.*" --> private_dbus
Screenshot pipeline
graph LR
screencast["Mutter ScreenCast API"]
monitor["RecordMonitor\n(virtual monitor)"]
pipewire["PipeWire stream\n(keepalive)"]
gst["GStreamer pipeline\n(in-process)"]
png["PNG bytes"]
screencast --> monitor --> pipewire --> gst --> png
The keepalive stream doubles as the capture source — take_screenshot reads frames directly from it via the GStreamer Rust bindings (gstreamer + gstreamer-app crates).
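When asserting on screenshots in tests, a cheap sanity check is to confirm the returned bytes really are a PNG and read the image size from the header. The helpers below are a std-only sketch with hypothetical names, not part of the library. They rely on the PNG file layout: an 8-byte signature followed by the IHDR chunk, which stores width and height as big-endian 32-bit integers.

```rust
/// The fixed 8-byte signature that starts every PNG file.
pub const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Cheap check that a buffer starts with the PNG signature.
pub fn looks_like_png(bytes: &[u8]) -> bool {
    bytes.starts_with(&PNG_SIGNATURE)
}

/// Read (width, height) from the IHDR chunk, if the buffer is a PNG.
pub fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    // Layout: signature (8) + chunk length (4) + "IHDR" (4) + width (4) + height (4).
    if !looks_like_png(bytes) || bytes.len() < 24 || &bytes[12..16] != b"IHDR" {
        return None;
    }
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}
```

A test could call `png_dimensions` on the bytes returned by take_screenshot and compare the result with the expected virtual monitor size before doing any pixel-level comparison.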