car-desktop
OS-level screen capture, accessibility inspection, and input synthesis for the Common Agent Runtime.
Sibling of car-browser. Same UiMap output, same
async tool-verb shape, same place in the executor's OPA loop. The
difference is reach: car-desktop sees every window the host OS
renders, not only the pages Chromium serves.
Status
v1 ships with a complete macOS backend (ScreenCaptureKit +
AXUIElement + CGEvent) and stub backends for Windows and Linux that
return CarDesktopError::PlatformUnsupported from every method.
Windows lands in Q2, Linux in Q3.
See docs/CAR_DESKTOP.md in the tokhn repo for the full plan, sprint breakdown, and safety model.
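To make the stub contract concrete, here is a minimal sketch of what a backend trait and a stub implementation could look like. Only CarDesktopError::PlatformUnsupported comes from this README. The DesktopBackend trait, StubBackend, and the list_windows / click methods are hypothetical, and the real crate's tool verbs are async rather than synchronous.

```rust
use std::fmt;

// Hypothetical mirror of the crate's error type; only the
// PlatformUnsupported variant is taken from the README.
#[derive(Debug, PartialEq)]
enum CarDesktopError {
    PlatformUnsupported,
}

impl fmt::Display for CarDesktopError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CarDesktopError::PlatformUnsupported => write!(f, "platform unsupported"),
        }
    }
}

impl std::error::Error for CarDesktopError {}

// Hypothetical, synchronous stand-in for the backend interface.
trait DesktopBackend {
    fn list_windows(&self) -> Result<Vec<String>, CarDesktopError>;
    fn click(&self, window_id: u64, x: f64, y: f64) -> Result<(), CarDesktopError>;
}

// A stub in the style described for Windows and Linux:
// every method fails with PlatformUnsupported.
struct StubBackend;

impl DesktopBackend for StubBackend {
    fn list_windows(&self) -> Result<Vec<String>, CarDesktopError> {
        Err(CarDesktopError::PlatformUnsupported)
    }

    fn click(&self, _window_id: u64, _x: f64, _y: f64) -> Result<(), CarDesktopError> {
        Err(CarDesktopError::PlatformUnsupported)
    }
}

fn main() {
    let backend = StubBackend;
    match backend.list_windows() {
        Ok(windows) => println!("{} windows", windows.len()),
        Err(e) => println!("error: {e}"), // prints "error: platform unsupported"
    }
}
```

Returning a typed error from every method, rather than panicking, lets callers on unsupported platforms detect the gap and degrade gracefully.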
Safety
Input synthesis is the highest-consequence primitive in this crate.
Six hard-coded rules apply to every click / type / keypress
request. They are always active and cannot be disabled by configuration:
- Every request carries a WindowHandle; there are no absolute-point clicks without a target.
- Clicks are clamped to the target window's frame (OutOfTargetWindow on a miss).
- Clicks on buttons whose labels match the destructive-word list
  (delete / quit / remove / discard / drop / erase / send /
  publish / submit / buy / pay / confirm) require an explicit
  unsafe_ok: true (DestructiveActionGated otherwise).
- Per-window rate limit of 8 events/sec via a token bucket (RateLimited).
- A per-request dry_run flag short-circuits before the OS event is posted.
- A global Esc-Esc kill switch aborts in-flight missions.
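The rules above can be sketched as a single gate that every request passes through before any OS event is posted. This is a simplified, hypothetical model, not the crate's implementation. ClickRequest, Frame, InputGate, and TokenBucket are illustrative names, and a plain window id stands in for WindowHandle. Two details are assumptions: destructive labels are matched as whole words, case-insensitively, and the bucket's burst capacity is 8. The error names and the 8 events/sec rate come from the list. The Esc-Esc kill switch is not modeled.

```rust
use std::collections::HashMap;

// Error names follow the README; every other name here is hypothetical.
#[derive(Debug, PartialEq)]
enum GateError { OutOfTargetWindow, DestructiveActionGated, RateLimited }

#[derive(Debug, PartialEq)]
enum Outcome { Posted, DryRun }

#[derive(Clone, Copy)]
struct Frame { x: f64, y: f64, w: f64, h: f64 }

impl Frame {
    fn contains(&self, px: f64, py: f64) -> bool {
        px >= self.x && px < self.x + self.w && py >= self.y && py < self.y + self.h
    }
}

struct ClickRequest<'a> {
    window_id: u64, // stand-in for WindowHandle
    frame: Frame,   // target window's frame at request time
    x: f64,
    y: f64,
    label: &'a str, // accessibility label of the element being clicked
    unsafe_ok: bool,
    dry_run: bool,
}

const DESTRUCTIVE: [&str; 12] = [
    "delete", "quit", "remove", "discard", "drop", "erase",
    "send", "publish", "submit", "buy", "pay", "confirm",
];

// Assumption: a label matches if any whole word equals a list entry, ignoring case.
fn is_destructive(label: &str) -> bool {
    label
        .split(|c: char| !c.is_alphanumeric())
        .any(|w| DESTRUCTIVE.contains(&w.to_lowercase().as_str()))
}

const RATE: f64 = 8.0; // tokens refilled per second (8 events/sec)
const CAPACITY: f64 = 8.0; // assumed burst size

struct TokenBucket { tokens: f64, last: f64 } // `last` is in seconds

impl TokenBucket {
    fn try_take(&mut self, now: f64) -> bool {
        // Refill in proportion to elapsed time, capped at CAPACITY.
        let elapsed = (now - self.last).max(0.0);
        self.tokens = (self.tokens + elapsed * RATE).min(CAPACITY);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

#[derive(Default)]
struct InputGate {
    buckets: HashMap<u64, TokenBucket>, // one bucket per window
}

impl InputGate {
    // `now` is seconds on a monotonic clock, passed in to keep the sketch deterministic.
    fn check(&mut self, req: &ClickRequest, now: f64) -> Result<Outcome, GateError> {
        if !req.frame.contains(req.x, req.y) {
            return Err(GateError::OutOfTargetWindow);
        }
        if is_destructive(req.label) && !req.unsafe_ok {
            return Err(GateError::DestructiveActionGated);
        }
        let bucket = self.buckets.entry(req.window_id)
            .or_insert(TokenBucket { tokens: CAPACITY, last: now });
        if !bucket.try_take(now) {
            return Err(GateError::RateLimited);
        }
        if req.dry_run {
            return Ok(Outcome::DryRun); // all gates passed; nothing is posted
        }
        // A real backend would post the OS event (e.g. a CGEvent on macOS) here.
        Ok(Outcome::Posted)
    }
}

fn main() {
    let mut gate = InputGate::default();
    let frame = Frame { x: 0.0, y: 0.0, w: 800.0, h: 600.0 };
    let req = ClickRequest {
        window_id: 1, frame, x: 100.0, y: 50.0,
        label: "Delete file", unsafe_ok: false, dry_run: false,
    };
    println!("{:?}", gate.check(&req, 0.0)); // Err(DestructiveActionGated)
}
```

In this sketch a dry run still passes through every gate, including the rate limiter, so it consumes a token. That ordering is a choice made for the sketch, not documented behavior. Passing `now` in explicitly keeps the token bucket deterministic and easy to test. A real implementation would read a monotonic clock such as std::time::Instant.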
Dependencies
The macOS feature pulls in the Apple framework bindings
(objc2-*, core-graphics, core-foundation, accessibility-sys).
Without the feature, MacBackend is not compiled and
default_backend() returns a Linux backend even on macOS targets.
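Because the macOS backend is feature-gated, backend selection happens at compile time. Here is a minimal sketch of that selection using #[cfg]. It rests on three assumptions: a Cargo feature named macos (the actual feature name is not given here), hypothetical backend types with a single name() method, and Windows targets picking the Windows stub.

```rust
// Hypothetical sketch: the feature name "macos" and these types are assumptions.
trait DesktopBackend {
    fn name(&self) -> &'static str;
}

// Compiled only when targeting macOS *and* the (assumed) "macos" feature is on.
#[cfg(all(target_os = "macos", feature = "macos"))]
struct MacBackend;
#[cfg(all(target_os = "macos", feature = "macos"))]
impl DesktopBackend for MacBackend {
    fn name(&self) -> &'static str { "macos" }
}

#[allow(dead_code)]
struct WindowsBackend; // stub: every real method would return PlatformUnsupported
impl DesktopBackend for WindowsBackend {
    fn name(&self) -> &'static str { "windows" }
}

#[allow(dead_code)]
struct LinuxBackend; // stub: every real method would return PlatformUnsupported
impl DesktopBackend for LinuxBackend {
    fn name(&self) -> &'static str { "linux" }
}

#[cfg(all(target_os = "macos", feature = "macos"))]
fn default_backend() -> Box<dyn DesktopBackend> {
    Box::new(MacBackend)
}

// Assumption: Windows targets get the Windows stub.
#[cfg(target_os = "windows")]
fn default_backend() -> Box<dyn DesktopBackend> {
    Box::new(WindowsBackend)
}

// Everything else, including macOS without the feature, gets the Linux backend.
#[cfg(not(any(all(target_os = "macos", feature = "macos"), target_os = "windows")))]
fn default_backend() -> Box<dyn DesktopBackend> {
    Box::new(LinuxBackend)
}

fn main() {
    println!("selected backend: {}", default_backend().name());
}
```

Resolving the backend with cfg rather than a runtime check means a build without the feature never compiles MacBackend or its Apple bindings. The trade-off is the one described above: such a build on macOS gets the Linux backend instead of the macOS one.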
License
Apache-2.0 — same as the rest of CAR.