car-desktop 0.7.0

OS-level screen capture, accessibility inspection, and input synthesis for Common Agent Runtime

Sibling of car-browser. Same UiMap output, same async tool-verb shape, same place in the executor's OPA loop. The difference is reach: car-desktop sees every window the host OS renders, not only the pages Chromium serves.

Status

v1 ships with a complete macOS backend (ScreenCaptureKit + AXUIElement + CGEvent) and stub backends for Windows and Linux that return CarDesktopError::PlatformUnsupported from every method. Windows lands in Q2, Linux in Q3.
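As an illustration of what "stub backend" means here, the sketch below assumes a hypothetical, simplified `DesktopBackend` trait with two synchronous methods. Only the `CarDesktopError::PlatformUnsupported` name comes from this README. The trait, its method names, the `LinuxBackend` type name, and the signatures are illustrative assumptions, and the real crate's verbs are async.

```rust
use std::fmt;

/// Hypothetical error enum; only the `PlatformUnsupported` variant name is from the README.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CarDesktopError {
    PlatformUnsupported,
}

impl fmt::Display for CarDesktopError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CarDesktopError::PlatformUnsupported => {
                write!(f, "desktop backend not implemented on this platform")
            }
        }
    }
}

impl std::error::Error for CarDesktopError {}

/// Hypothetical, synchronous stand-in for the crate's backend trait.
pub trait DesktopBackend {
    fn list_windows(&self) -> Result<Vec<String>, CarDesktopError>;
    fn capture_window(&self, window_id: u32) -> Result<Vec<u8>, CarDesktopError>;
}

/// Stub backend: every method fails with PlatformUnsupported.
pub struct LinuxBackend;

impl DesktopBackend for LinuxBackend {
    fn list_windows(&self) -> Result<Vec<String>, CarDesktopError> {
        Err(CarDesktopError::PlatformUnsupported)
    }

    fn capture_window(&self, _window_id: u32) -> Result<Vec<u8>, CarDesktopError> {
        Err(CarDesktopError::PlatformUnsupported)
    }
}
```

Because the stubs fail loudly instead of returning empty results, a caller on an unsupported OS gets an explicit error rather than a silently empty window list.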

See docs/CAR_DESKTOP.md in the tokhn repo for the full plan, sprint breakdown, and safety model.

Safety

Input synthesis is the highest-consequence primitive in this crate. Six hard-coded rules apply to every click, type, and keypress request. They are always active and cannot be disabled through configuration:

  1. Every request carries a WindowHandle — no absolute-point clicks without a target.
  2. Clicks must land inside the target window's frame; a point outside it fails with OutOfTargetWindow.
  3. Clicks on buttons whose labels match the destructive-word list (delete / quit / remove / discard / drop / erase / send / publish / submit / buy / pay / confirm) require an explicit unsafe_ok: true (DestructiveActionGated otherwise).
  4. Per-window rate limit of 8 events/sec via token bucket (RateLimited).
  5. Per-request dry_run flag short-circuits before posting the OS event.
  6. Global Esc-Esc kill switch aborts in-flight missions.
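Rules 1 to 5 can be sketched as a single gate that runs before any OS event is posted. This is a minimal illustration, not the crate's implementation: `SafetyError`, `Frame`, `ClickRequest`, `RateLimiter`, and `gate_click` are hypothetical names, a plain `u64` stands in for `WindowHandle`, and the whole-word, case-insensitive label match is an assumed reading of "match". Rule 1 is enforced by making the window a required field. Rule 6 (the kill switch) lives outside the per-request path and is not shown.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Variant names from the README; the enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SafetyError {
    OutOfTargetWindow,
    DestructiveActionGated,
    RateLimited,
}

/// Destructive-word list from the README.
const DESTRUCTIVE_WORDS: [&str; 12] = [
    "delete", "quit", "remove", "discard", "drop", "erase",
    "send", "publish", "submit", "buy", "pay", "confirm",
];

/// Hypothetical window frame in screen points.
#[derive(Debug, Clone, Copy)]
pub struct Frame { pub x: f64, pub y: f64, pub w: f64, pub h: f64 }

/// Hypothetical click request; `window` (a u64 here) stands in for WindowHandle.
pub struct ClickRequest<'a> {
    pub window: u64,
    pub x: f64,
    pub y: f64,
    pub label: &'a str,
    pub unsafe_ok: bool,
    pub dry_run: bool,
}

/// One token bucket per window.
struct TokenBucket { tokens: f64, last: Instant }

pub struct RateLimiter { capacity: f64, rate: f64, buckets: HashMap<u64, TokenBucket> }

impl RateLimiter {
    /// Bucket capacity and refill rate both equal `events_per_sec`.
    pub fn new(events_per_sec: f64) -> Self {
        RateLimiter { capacity: events_per_sec, rate: events_per_sec, buckets: HashMap::new() }
    }

    /// Takes one token for `window` at time `now`, or reports RateLimited.
    pub fn try_acquire(&mut self, window: u64, now: Instant) -> Result<(), SafetyError> {
        let (cap, rate) = (self.capacity, self.rate);
        // A window seen for the first time starts with a full bucket.
        let bucket = self.buckets.entry(window).or_insert(TokenBucket { tokens: cap, last: now });
        // Refill in proportion to elapsed time, never beyond capacity.
        let elapsed = now.saturating_duration_since(bucket.last).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * rate).min(cap);
        bucket.last = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            Ok(())
        } else {
            Err(SafetyError::RateLimited)
        }
    }
}

/// Assumed matching rule: any whole word of the label, lowercased, is in the list.
pub fn is_destructive(label: &str) -> bool {
    label
        .split(|c: char| !c.is_alphanumeric())
        .any(|w| DESTRUCTIVE_WORDS.contains(&w.to_lowercase().as_str()))
}

/// Applies rules 2 to 5 in order. Ok(true): post the event. Ok(false): dry run, do not post.
pub fn gate_click(
    req: &ClickRequest<'_>,
    frame: Frame,
    limiter: &mut RateLimiter,
    now: Instant,
) -> Result<bool, SafetyError> {
    // Rule 2: reject points outside the target window's frame.
    let inside = req.x >= frame.x && req.x < frame.x + frame.w
        && req.y >= frame.y && req.y < frame.y + frame.h;
    if !inside {
        return Err(SafetyError::OutOfTargetWindow);
    }
    // Rule 3: destructive labels need an explicit opt-in.
    if is_destructive(req.label) && !req.unsafe_ok {
        return Err(SafetyError::DestructiveActionGated);
    }
    // Rule 4: per-window token bucket.
    limiter.try_acquire(req.window, now)?;
    // Rule 5: a dry run passes validation but never posts.
    Ok(!req.dry_run)
}
```

In this sketch a dry run still passes through the rate limiter and consumes a token, and a label such as "Sender settings" is not gated because only whole words count. Whether the real crate makes the same choices is not stated in this README.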

Dependencies

The macOS feature pulls in the Apple framework bindings (objc2-*, core-graphics, core-foundation, accessibility-sys). Without it, MacBackend is not compiled and default_backend() returns the Linux stub backend even on macOS targets, so every call fails with CarDesktopError::PlatformUnsupported.
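The fallback described above maps naturally onto `cfg` attributes. A sketch, assuming a Cargo feature literally named `macos` and a hypothetical one-method `DesktopBackend` trait; the real feature name, trait, and Linux backend type name are not given in this README, and Windows selection is omitted.

```rust
/// Hypothetical stand-in for the crate's backend trait.
pub trait DesktopBackend {
    fn name(&self) -> &'static str;
}

// Compiled only when targeting macOS *and* the (assumed) `macos` feature is on.
#[cfg(all(target_os = "macos", feature = "macos"))]
pub struct MacBackend;

#[cfg(all(target_os = "macos", feature = "macos"))]
impl DesktopBackend for MacBackend {
    fn name(&self) -> &'static str {
        "macos"
    }
}

/// Hypothetical name for the Linux stub backend.
pub struct LinuxBackend;

impl DesktopBackend for LinuxBackend {
    fn name(&self) -> &'static str {
        "linux-stub"
    }
}

#[allow(unreachable_code)]
pub fn default_backend() -> Box<dyn DesktopBackend> {
    #[cfg(all(target_os = "macos", feature = "macos"))]
    {
        return Box::new(MacBackend);
    }
    // Without the feature, even a macOS build falls through to the Linux stub.
    Box::new(LinuxBackend)
}
```

Gating on both `target_os` and the feature keeps the Apple bindings out of non-Apple builds, at the cost that a macOS build without the feature quietly gets the stub.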

License

Apache-2.0 — same as the rest of CAR.