pub enum Action {
Show 17 variants
Screenshot,
CursorPosition,
MouseMove {
x: i32,
y: i32,
},
LeftClick,
RightClick,
MiddleClick,
DoubleClick,
LeftClickDrag {
x: i32,
y: i32,
},
Type {
text: String,
},
Key {
keys: String,
},
Scroll {
x: i32,
y: i32,
direction: ScrollDirection,
amount: i32,
},
Wait {
ms: u64,
},
Exec {
command: String,
},
ExecCapture {
command: String,
timeout_ms: Option<u64>,
},
Navigate {
url: String,
},
SetClipboard {
text: String,
},
GetClipboard,
}Expand description
A single computer-use action sent host → guest. Serialized as the JSON header of a request frame (no payload). Variants mirror the Anthropic computer-use tool so the MCP layer maps 1:1.
Variants§
Screenshot
Capture the framebuffer; response carries a PNG payload. The PNG’s pixel dimensions are the coordinate space every pointer action targets (top-left origin), so a caller can map a downscaled rendering back to true coordinates.
CursorPosition
Report the current pointer position in the response header (x,y).
MouseMove
Absolute pointer move to (x, y). The response header echoes the
resulting pointer position (x,y) — a window manager can constrain
the pointer, so the echo is the ground truth of where it landed.
LeftClick
Left button click at the current pointer position. The response header
echoes the pointer position the click fired at (x,y).
RightClick
Right button click at the current pointer position. The response header
echoes the pointer position the click fired at (x,y).
MiddleClick
Middle button click at the current pointer position. The response header
echoes the pointer position the click fired at (x,y).
DoubleClick
Double left click at the current pointer position. The response header
echoes the pointer position the click fired at (x,y).
LeftClickDrag
Press-move-release: drag from the current position to (x, y). The
response header echoes the resulting pointer position (x,y).
Type
Type a UTF-8 string via synthetic key events.
Key
Press a key chord, e.g. "ctrl+c", "Return", "alt+Tab".
Scroll
Scroll amount clicks in direction at (x, y). The response header
echoes the resulting pointer position (x,y).
Wait
Sleep ms milliseconds guest-side (lets UI settle).
Exec
Launch a shell command in the desktop session (e.g. "chromium &").
ExecCapture
Run command (via /bin/sh -c) to completion synchronously,
returning its combined stdout/stderr as the response payload (UTF-8)
and its exit status in the header’s exit_code (None ⇒ it did not
exit cleanly — killed by the timeout_ms guard or a signal). Stdin is
/dev/null. The in-guest agent is single-threaded, so a long command
blocks every other action until it returns — intended for short,
terminating commands (read a file, run a probe), not GUI apps (use
Action::Exec/Action::Navigate for those). timeout_ms defaults
guest-side and is clamped below the host vsock read timeout.
Open url in the desktop’s browser. The guest hands the URL to a
fixed launcher (vmette-open) without a shell, so the URL is never
word-split or interpreted — a deterministic, injection-safe alternative
to driving the address bar with synthetic keystrokes. Fire-and-forget:
returns a bare ok once the launcher is spawned, not when the page loads
(pair with a settle screenshot to wait for paint).
SetClipboard
Replace the X clipboard (the CLIPBOARD and PRIMARY selections) with
text, so a subsequent paste (Ctrl+V in GUI apps, Shift+Insert /
middle-click in terminals) inserts it. Pairs with Action::Key.
GetClipboard
Read the X CLIPBOARD selection; the text is returned as the response
payload (UTF-8), not in the header — so arbitrary content needs no
JSON escaping. Empty when the clipboard is unset.