nils-macos-agent 0.4.2

CLI crate for nils-macos-agent in the nils-cli workspace.
# macos-agent

`macos-agent` is a macOS-oriented CLI for agent desktop automation.
It provides parseable primitives for discovery, observation, and input actions:
window/app listing, window activation, click, type, hotkey, AX (Accessibility) actions,
input-source switching, screenshot, and wait helpers.

## Quick Start

```bash
# readiness check
macos-agent preflight --format json

# list targets
macos-agent windows list --format tsv
macos-agent apps list --format json

# activate + input
macos-agent window activate --app Terminal --wait-ms 1500
macos-agent input click --x 200 --y 160
macos-agent input type --text "hello world"
macos-agent input hotkey --mods cmd,shift --key 4
macos-agent input-source switch --id abc

# ax-first interaction
macos-agent ax list --app Arc --role AXButton --title-contains "New"
macos-agent ax click --app Arc --role AXLink --title-contains "YouTube" --nth 1 --allow-coordinate-fallback
macos-agent ax type --app Arc --role AXTextField --title-contains "Search" --text "lofi" --submit --allow-keyboard-fallback
macos-agent ax attr get --app Arc --role AXTextField --name AXValue
macos-agent ax action perform --app Arc --role AXButton --title-contains "Submit" --name AXPress
macos-agent ax session start --app Arc --session-id arc-main
macos-agent ax watch start --session-id arc-main --events AXTitleChanged,AXFocusedUIElementChanged

# observation
macos-agent observe screenshot --active-window --path ./tmp/macos-agent.png

# stabilization waits
macos-agent wait app-active --app Terminal --timeout-ms 1500
macos-agent wait window-present --app Terminal --window-title-contains Inbox --timeout-ms 1500
macos-agent wait ax-present --app Arc --role AXButton --title-contains "New" --timeout-ms 2000
macos-agent wait ax-unique --app Arc --role AXTextField --title-contains "Search" --timeout-ms 2000

# gate + postcondition for mutating AX actions
macos-agent ax click --app Arc --role AXButton --title-contains "Submit" \
  --gate-app-active --gate-window-present --gate-ax-unique \
  --postcondition-focused true --wait-timeout-ms 2000 --wait-poll-ms 75

# selector-frame screenshot
macos-agent observe screenshot --active-window --role AXButton --title-contains "Play" \
  --selector-padding 12 --path ./tmp/macos-agent-selector.png

# diff-aware screenshot publish
macos-agent observe screenshot --active-window --path ./tmp/macos-agent.png \
  --if-changed --if-changed-threshold 2

# one-shot debug bundle
macos-agent debug bundle --active-window --format json
```

## Command Surface

- `preflight`
  - `macos-agent preflight [--strict] [--include-probes]`
  - JSON output includes `result.permissions` with unified fields:
    `screen_recording`, `accessibility`, `automation`, `ready`, `hints`.
- `windows`
  - `macos-agent windows list [--app <name>] [--window-title-contains <name>] [--on-screen-only]`
- `apps`
  - `macos-agent apps list`
- `window`
  - `macos-agent window activate (--window-id <id> | --active-window | --app <name> [--window-title-contains <name>] | --bundle-id <bundle_id>) [--wait-ms <ms>]`
- `input`
  - `macos-agent input click --x <px> --y <px> [--button <left|right|middle>] [--count <n>] [--pre-wait-ms <ms>] [--post-wait-ms <ms>]`
  - `macos-agent input type --text <text> [--delay-ms <ms>] [--submit]`
  - `macos-agent input hotkey --mods <cmd,ctrl,alt,shift,fn> --key <key>`
- `input-source`
  - `macos-agent input-source current`
  - `macos-agent input-source switch --id <source_id|abc|us>`
- `ax`
  - `macos-agent ax list [--session-id <id> | --app <name> | --bundle-id <bundle_id>] [--window-title-contains <text>] [--role <AXRole>] [--title-contains <text>] [--identifier-contains <text>] [--value-contains <text>] [--subrole <AXSubrole>] [--focused <bool>] [--enabled <bool>] [--max-depth <n>] [--limit <n>]`
  - `macos-agent ax click [selector flags...] [target flags...] [--match-strategy <contains|exact|prefix|suffix|regex>] [--selector-explain] [--reselect-before-click] [--allow-coordinate-fallback] [--fallback-order <ax-press,ax-confirm,frame-center,coordinate>] [--gate-app-active] [--gate-window-present] [--gate-ax-present] [--gate-ax-unique] [--wait-timeout-ms <ms>] [--wait-poll-ms <ms>] [--gate-timeout-ms <ms>] [--gate-poll-ms <ms>] [--postcondition-focused <bool>] [--postcondition-attribute <AXAttr>] [--postcondition-attribute-value <value>] [--postcondition-timeout-ms <ms>] [--postcondition-poll-ms <ms>]`
  - `macos-agent ax type [selector flags...] [target flags...] --text <text> [--match-strategy <contains|exact|prefix|suffix|regex>] [--selector-explain] [--clear-first] [--submit] [--paste] [--allow-keyboard-fallback] [--gate-app-active] [--gate-window-present] [--gate-ax-present] [--gate-ax-unique] [--wait-timeout-ms <ms>] [--wait-poll-ms <ms>] [--gate-timeout-ms <ms>] [--gate-poll-ms <ms>] [--postcondition-focused <bool>] [--postcondition-attribute <AXAttr>] [--postcondition-attribute-value <value>] [--postcondition-timeout-ms <ms>] [--postcondition-poll-ms <ms>]`
  - `macos-agent ax attr get [selector flags...] [target flags...] --name <AXAttribute>`
  - `macos-agent ax attr set [selector flags...] [target flags...] --name <AXAttribute> --value <value> [--value-type <string|number|bool|json|null>]`
  - `macos-agent ax action perform [selector flags...] [target flags...] --name <AXAction>`
  - `macos-agent ax session start [--session-id <id>] [--app <name> | --bundle-id <bundle_id>] [--window-title-contains <text>]`
  - `macos-agent ax session list`
  - `macos-agent ax session stop --session-id <id>`
  - `macos-agent ax watch start --session-id <id> [--watch-id <id>] [--events <comma-separated-AX-notifications>] [--max-buffer <n>]`
  - `macos-agent ax watch poll --watch-id <id> [--limit <n>] [--drain|--no-drain]`
  - `macos-agent ax watch stop --watch-id <id>`
- `observe`
  - `macos-agent observe screenshot (--window-id <id> | --active-window | --app <name> [--window-title-contains <name>]) [--path <file>] [--image-format <png|jpg|webp>] [--if-changed] [--if-changed-baseline <path>] [--if-changed-threshold <bits>] [selector flags...] [--selector-padding <px>]`
- `debug`
  - `macos-agent debug bundle [--window-id <id> | --active-window | --app <name> [--window-title-contains <name>]] [--output-dir <path>]`
- `wait`
  - `macos-agent wait sleep --ms <ms>`
  - `macos-agent wait app-active (--app <name> | --bundle-id <bundle_id>) [--timeout-ms <ms>] [--poll-ms <ms>]`
  - `macos-agent wait window-present (--window-id <id> | --active-window | --app <name> [--window-title-contains <name>]) [--timeout-ms <ms>] [--poll-ms <ms>]`
  - `macos-agent wait ax-present [selector flags...] [target flags...] [--timeout-ms <ms>] [--poll-ms <ms>]`
  - `macos-agent wait ax-unique [selector flags...] [target flags...] [--timeout-ms <ms>] [--poll-ms <ms>]`
- `scenario`
  - `macos-agent scenario run --file <scenario.json>`
- `profile`
  - `macos-agent profile validate --file <profile.json>`
  - `macos-agent profile init [--name <profile-name>] [--path <output.json>]`
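
The `ax session` and `ax watch` subcommands above form a start/poll/stop lifecycle. The sketch below strings them together using only the flags listed in this section. The function name `watch_app_events` and the buffer/limit values are illustrative assumptions, not part of the CLI.

```bash
# Hypothetical helper: start a session and a watch, poll once, then always clean up.
# Arguments: <app name> <session id> <watch id>
watch_app_events() {
  app="$1"; session_id="$2"; watch_id="$3"
  macos-agent ax session start --app "$app" --session-id "$session_id" || return $?
  status=0
  if macos-agent ax watch start --session-id "$session_id" --watch-id "$watch_id" \
       --events AXTitleChanged,AXFocusedUIElementChanged --max-buffer 100; then
    # Poll once and drain the buffer; a real loop would repeat this step.
    macos-agent --format json ax watch poll --watch-id "$watch_id" --limit 20 --drain || status=$?
    macos-agent ax watch stop --watch-id "$watch_id" || status=$?
  else
    status=$?
  fi
  # Stop the session even if the watch could not be started.
  macos-agent ax session stop --session-id "$session_id" || status=$?
  return "$status"
}
```

For example, `watch_app_events Arc arc-main arc-watch` mirrors the session and event names used in Quick Start.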

## Global Flags

- `--format <text|json|tsv>`
- `--error-format <text|json>`
- `--dry-run`
- `--retries <n>`
- `--retry-delay-ms <ms>`
- `--timeout-ms <ms>`
- `--trace`
- `--trace-dir <path>`

Notes:
- `--format tsv` is only supported by `windows list` and `apps list`.
- Canonical flags: use `--window-title-contains` and `input type --submit`.
- Backward-compatible aliases are still accepted: `--window-name`, `input type --enter`.
- `--dry-run` guarantees no OS automation command execution for mutating actions.
- `--error-format json` emits machine-parseable error payloads on `stderr`.
- `--trace` writes per-command trace artifacts to `$AGENT_HOME/out/macos-agent-trace/`.
- `--trace-dir` overrides trace artifact output directory.
- When trace mode is enabled, `macos-agent` verifies trace directory writability before running actions.
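
Because `--dry-run` guarantees that no OS automation command runs, it can serve as a rehearsal step before a mutating action. The sketch below assumes that a dry run exits `0` when the invocation is valid; the function name `rehearse_then_click` is hypothetical.

```bash
# Hypothetical helper: rehearse a click with --dry-run, then run it for real.
# Assumption: a valid dry run exits 0 and a rejected one exits non-zero.
rehearse_then_click() {
  x="$1"; y="$2"
  macos-agent --dry-run --format json input click --x "$x" --y "$y" >/dev/null || return $?
  macos-agent --format json --retries 2 --retry-delay-ms 150 input click --x "$x" --y "$y"
}
```

Calling `rehearse_then_click 200 160` stops before touching the desktop if the rehearsal fails.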

## Output Contract

- Success:
  - Writes payload to `stdout` only.
  - `stderr` remains empty.
- Error:
  - Writes message to `stderr` only.
  - `stdout` remains empty.
  - Messages start with `error:`.

JSON envelope (`--format json`):

```json
{
  "schema_version": 1,
  "ok": true,
  "command": "input.click",
  "result": {
    "policy": {
      "dry_run": false,
      "retries": 1,
      "retry_delay_ms": 150,
      "timeout_ms": 4000
    },
    "meta": {
      "action_id": "input.click-20260101-000000-7",
      "elapsed_ms": 12
    }
  }
}
```

Preflight permission contract (`macos-agent --format json preflight`):

```json
{
  "result": {
    "permissions": {
      "screen_recording": "unknown",
      "accessibility": "ready",
      "automation": "ready",
      "ready": true,
      "hints": []
    }
  }
}
```
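
A script can gate on this contract before running any action. The following is a minimal sketch that reads the preflight envelope on stdin; it requires `jq`, and the function name `check_preflight` is an assumption made for illustration.

```bash
# Hypothetical helper: print permission hints to stderr and succeed only when
# result.permissions.ready is true. Reads a preflight JSON envelope on stdin.
check_preflight() {
  report="$(cat)"
  # The trailing `?` keeps jq quiet if `hints` is missing.
  printf '%s\n' "$report" | jq -r '.result.permissions.hints[]?' >&2
  printf '%s\n' "$report" | jq -e '.result.permissions.ready == true' >/dev/null
}

# Usage on a machine with macos-agent installed:
#   macos-agent --format json preflight | check_preflight || exit 1
```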

Mutating action commands (`window activate`, `input click`, `input type`, `input hotkey`, `ax click`, `ax type`) always
include `result.policy` in JSON output so agent-side retry and timeout policy can be parsed without
guessing defaults.
These action results also include `result.meta.attempts_used` so flaky steps can be detected quickly.
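
One way to surface flaky steps is to compare `result.meta.attempts_used` against `1`. The sketch below assumes the field is an integer attempt count (so a value above `1` means at least one retry happened); it requires `jq`, and `warn_if_retried` is a hypothetical name.

```bash
# Hypothetical helper: read an action result envelope on stdin and print a
# warning line only when more than one attempt was needed.
# Assumption: attempts_used is an integer; a missing field is treated as 1.
warn_if_retried() {
  jq -r 'select((.result.meta.attempts_used // 1) > 1)
         | "flaky: \(.command) used \(.result.meta.attempts_used) attempts; retries allowed: \(.result.policy.retries)"'
}

# Usage:
#   macos-agent --format json --retries 2 input click --x 200 --y 160 | warn_if_retried
```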

Exit codes:
- `0`: success
- `1`: runtime failure
- `2`: usage error
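
Since success payloads and error messages never share a stream, a wrapper can capture both separately and branch on the exit code. This is a sketch rather than a recommended implementation; `run_agent` is a hypothetical name, and the "retry vs. fix the invocation" wording in the messages is an interpretation of the codes above.

```bash
# Hypothetical wrapper: run any macos-agent command, keep stdout and stderr
# apart, and branch on the documented exit codes.
run_agent() {
  out_file="$(mktemp)"; err_file="$(mktemp)"
  status=0
  macos-agent "$@" >"$out_file" 2>"$err_file" || status=$?
  case "$status" in
    0) cat "$out_file" ;;  # success: payload arrives on stdout only
    1) echo "runtime failure (may be worth retrying):" >&2; cat "$err_file" >&2 ;;
    2) echo "usage error (fix the invocation before retrying):" >&2; cat "$err_file" >&2 ;;
    *) echo "unexpected exit code $status:" >&2; cat "$err_file" >&2 ;;
  esac
  rm -f "$out_file" "$err_file"
  return "$status"
}
```

For example, `run_agent --format json apps list` prints the payload on success and returns the original exit code either way.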

Error envelope (`--error-format json`):

```json
{
  "schema_version": 1,
  "ok": false,
  "error": {
    "category": "runtime",
    "operation": "input.click",
    "message": "input.click failed via `cliclick` (exit 2): cliclick failed",
    "hints": [
      "Check macOS Accessibility/Automation permissions if this action controls System Events."
    ]
  }
}
```
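
The error envelope can be condensed into a short triage summary. The sketch below reads the payload on stdin and prints the category, operation, message, and one line per hint; it requires `jq`, and `summarize_error` is a hypothetical name.

```bash
# Hypothetical helper: summarize an --error-format json payload read on stdin.
summarize_error() {
  jq -r '"[\(.error.category)] \(.error.operation): \(.error.message)",
         (.error.hints[]? | "  hint: \(.)")'
}

# Usage: route stderr into the pipe and discard stdout.
#   macos-agent --error-format json input click --x 200 --y 160 2>&1 >/dev/null | summarize_error
```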

## Permission Matrix

| Capability | Required setup | Typical failure symptom | Mitigation |
| --- | --- | --- | --- |
| Accessibility | Terminal host allowed in **System Settings > Privacy & Security > Accessibility** | click/type/hotkey fail | Enable the shell host app (Terminal/iTerm/etc.) and retry |
| Automation (Apple Events) | Terminal host allowed in **System Settings > Privacy & Security > Automation** | activation / System Events probe fails | Allow the terminal app to control System Events |
| Screen Recording | Terminal host allowed in **System Settings > Privacy & Security > Screen Recording** | observe screenshot fails | Enable Screen Recording for terminal host |
| `cliclick` binary | Installed and on `PATH` | preflight reports missing `cliclick` | `brew install cliclick` |

## AX Backend Capability Matrix

| Backend preference | `ax list/click/type` | `ax attr/action/session/watch` | Notes |
| --- | --- | --- | --- |
| `auto` (default) | Hammerspoon first, fallback to AppleScript (JXA) when Hammerspoon is unavailable | Hammerspoon-only | Best default for resilience; fallback does not apply to extended AX commands |
| `hammerspoon` | Supported | Supported | Full AX surface; requires `hs` CLI and `hs.ipc` enabled |
| `applescript` | Supported (JXA) | Not supported directly | Extended AX commands still depend on Hammerspoon runtime |

Preflight emits an `ax_backend_capabilities` row so operators can verify the backend mode and fallback expectations before a failure occurs.

## Reliability Boundaries and Practices

Desktop UI automation is inherently brittle due to animation timing, focus drift, and app responsiveness.
Use these defaults for better stability:

- Always activate context before input:
  - `window activate ... --wait-ms 1000`
- Add small waits around click chains:
  - `input click ... --pre-wait-ms 100 --post-wait-ms 100`
- Enable retries for transient failures:
  - `--retries 2 --retry-delay-ms 150`
- Keep timeouts explicit for slow apps:
  - `--timeout-ms 5000`
- Use `wait app-active` / `wait window-present` before mutating actions.
- Prefer `ax click/type` first, then opt in to fallback flags when app AX trees are unstable.
- AX backend selection defaults to `auto` (Hammerspoon first, JXA fallback).
  - Override with `AGENTS_MACOS_AGENT_AX_BACKEND=hammerspoon|applescript|auto`.
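
Taken together, the practices above can be combined into a single guarded click. This is a sketch using the example values from this section; `click_in_app` is a hypothetical helper name.

```bash
# Hypothetical helper: activate, confirm focus, then click with retries and
# small waits. Arguments: <app name> <x> <y>
click_in_app() {
  app="$1"; x="$2"; y="$3"
  # Activate context first and stop early if it fails.
  macos-agent window activate --app "$app" --wait-ms 1000 || return $?
  # Confirm the app is frontmost before mutating anything.
  macos-agent wait app-active --app "$app" --timeout-ms 1500 || return $?
  macos-agent --retries 2 --retry-delay-ms 150 --timeout-ms 5000 \
    input click --x "$x" --y "$y" --pre-wait-ms 100 --post-wait-ms 100
}
```

For example, `click_in_app Terminal 200 160` only reaches the click once activation and the focus check have both succeeded.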

## Command Decision Matrix (AX/Input/Wait/Fallback/Backend)

Use this matrix to pick commands consistently. Start from the decision row, then use the mapped troubleshooting row on failure.

| Decision ID | When | Command choice (`ax`/`input`/`wait`) | Fallback policy | Backend policy | Troubleshooting row |
| --- | --- | --- | --- | --- | --- |
| `D1` | Target element is discoverable in AX tree | `ax list` -> `ax click` / `ax type`; gate with `wait app-active` and (if needed) `wait window-present` | Keep fallback flags off first | `auto` default is preferred; see `AX Backend Capability Matrix` | `T3`, `T5` |
| `D2` | AX selector exists but can be unstable across reruns | Same as `D1`, plus `--allow-coordinate-fallback` or `--allow-keyboard-fallback`; keep wait gates explicit | Opt in per command (`ax click/type` only) | Keep `auto` so `ax click/type` can fall back to JXA when Hammerspoon is unavailable | `T4`, `T5` |
| `D3` | AX path is unavailable for the target app | `window activate` + `input click` / `input type` / `input hotkey`; use `wait app-active/window-present` before mutation | No AX fallback path; use coordinate/keyboard input directly | Backend-independent for pure `input` flow | `T1`, `T2` |
| `D4` | Need extended AX operations (`attr`, `action`, `session`, `watch`) | Use `ax attr/action/session/watch` commands; add wait gate before mutating action | No fallback support for extended AX commands | Requires Hammerspoon runtime support (see `AX Backend Capability Matrix`) | `T5` |
| `D5` | Text entry depends on deterministic keyboard layout | `input-source current` -> `input-source switch --id <id>` -> `ax type` or `input type` | Prefer paste/submit flow when IME variance is high | Backend-independent for `input-source`; AX typing still follows `D1`/`D2` backend rules | `T6` |

This AX-first + fallback policy avoids brittle coordinate-only flows while keeping a reliable escape hatch.
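
The escalation from `D1` to `D2` can be sketched as a two-step attempt: first with fallback flags off, then with `--allow-coordinate-fallback` opted in. The function name `ax_click_with_escalation` is hypothetical, and whether a second attempt is appropriate depends on why the first one failed (see `T3` and `T4`).

```bash
# Hypothetical helper: D1 first (no fallback), then D2 (coordinate fallback).
# Arguments: <app name> <AX role> <title substring>
ax_click_with_escalation() {
  app="$1"; role="$2"; title="$3"
  if macos-agent --format json ax click --app "$app" --role "$role" \
       --title-contains "$title" --gate-app-active --gate-ax-unique; then
    return 0
  fi
  echo "AX click failed; retrying with --allow-coordinate-fallback" >&2
  macos-agent --format json ax click --app "$app" --role "$role" \
    --title-contains "$title" --gate-app-active --gate-ax-unique \
    --allow-coordinate-fallback
}
```

Keeping the gates identical in both attempts means only the fallback policy changes between `D1` and `D2`.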

## Debug Bundle Triage Flow

Use this copy-paste triage flow to collect deterministic artifacts after a flaky or failed run:

```bash
OUT="${AGENT_HOME:-$HOME/.agents}/out/macos-agent-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$OUT"

# 1) capture debug bundle + artifact index
macos-agent debug bundle --active-window --output-dir "$OUT" --format json > "$OUT/debug-bundle.json"

# 2) inspect artifact index and partial failures
jq '.result.artifact_index_path, .result.partial_failure, (.result.artifacts[] | {id, ok, path, error})' \
  "$OUT/debug-bundle.json"

# 3) optional selector-frame screenshot for visual targeting proof
macos-agent observe screenshot --active-window --role AXButton --title-contains "Play" \
  --selector-padding 12 --path "$OUT/selector-frame.png" --format json > "$OUT/selector-frame.json"
```

Artifact index notes:
- `result.artifact_index_path` points to the canonical artifact index JSON.
- `result.partial_failure=true` means some artifacts failed but bundle capture still completed.
- Each artifact entry records `id`, `ok`, `path`, and `error` for fast triage routing.
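
When `partial_failure` is true, it helps to list only the artifacts that failed. The sketch below filters a saved `debug-bundle.json` using the `id`, `ok`, and `error` fields described above; it requires `jq`, and `failed_artifacts` is a hypothetical name. The shape of `error` is not specified here, so the filter converts it to a string.

```bash
# Hypothetical helper: print "<id><TAB><error>" for each artifact whose `ok`
# is not true. Argument: path to a saved debug-bundle JSON file.
failed_artifacts() {
  jq -r '.result.artifacts[]
         | select(.ok != true)
         | [(.id | tostring), (.error // "unknown error" | tostring)]
         | @tsv' "$1"
}

# Usage: failed_artifacts "$OUT/debug-bundle.json"
```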

## Deterministic Test Mode

Set `AGENTS_MACOS_AGENT_TEST_MODE=1` to run with deterministic fixtures and without controlling the real desktop.
This mode is used by CI-safe integration tests.
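
A small wrapper keeps the variable scoped to a single invocation rather than exporting it for the whole shell. The function name `agent_test_mode` is a hypothetical convenience, not part of the CLI.

```bash
# Hypothetical helper: run one macos-agent command in deterministic test mode
# without leaving the variable set in the calling shell.
agent_test_mode() {
  AGENTS_MACOS_AGENT_TEST_MODE=1 macos-agent "$@"
}

# Usage: agent_test_mode --format json input click --x 200 --y 160
```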

## Opt-in Real macOS E2E Checks

`crates/macos-agent/tests/e2e_real_macos.rs` contains real-desktop checks for:
- TCC signal quality in `preflight` (Accessibility/Automation statuses + hints)
- focus drift detection path for activation + `wait app-active`

`crates/macos-agent/tests/e2e_real_apps.rs` contains app workflow checks for:
- Finder activation + window presence + navigation hotkeys + screenshot evidence
- Arc YouTube flow (open home, click 3 videos, play/pause, comment checkpoint)
- Spotify flow (UI track click, play/pause toggles, player-state probe)
- Cross-app Arc↔Spotify focus recovery and matrix artifact index output

These checks are disabled by default and require explicit opt-in:

```bash
MACOS_AGENT_REAL_E2E=1 cargo test -p nils-macos-agent --test e2e_real_macos
MACOS_AGENT_REAL_E2E=1 MACOS_AGENT_REAL_E2E_MUTATING=1 MACOS_AGENT_REAL_E2E_APP=Finder \
  cargo test -p nils-macos-agent --test e2e_real_macos
MACOS_AGENT_REAL_E2E=1 MACOS_AGENT_REAL_E2E_MUTATING=1 MACOS_AGENT_REAL_E2E_APPS=finder \
  cargo test -p nils-macos-agent --test e2e_real_apps -- finder_navigation_and_state_checks --nocapture
MACOS_AGENT_REAL_E2E=1 MACOS_AGENT_REAL_E2E_MUTATING=1 MACOS_AGENT_REAL_E2E_APPS=arc,spotify,finder \
  MACOS_AGENT_REAL_E2E_PROFILE=default-1440p \
  cargo test -p nils-macos-agent --test e2e_real_apps -- matrix_runner_supports_app_subset_selection_real --nocapture
```

Real-app E2E environment variables:
- `MACOS_AGENT_REAL_E2E=1`: enable real desktop tests.
- `MACOS_AGENT_REAL_E2E_MUTATING=1`: allow mutating desktop actions (click/type/hotkey).
- `MACOS_AGENT_REAL_E2E_APPS=arc,spotify,finder`: select app subset in deterministic order.
  - Unsupported app names are treated as configuration errors (fail fast).
- `MACOS_AGENT_REAL_E2E_PROFILE=default-1440p`: choose coordinate profile fixture.
- `MACOS_AGENT_REAL_E2E_INPUT_SOURCE=com.apple.keylayout.ABC` (or `abc`): optional; if set, tests switch to the target input source once via `im-select` before text-entry flows.
- `MACOS_AGENT_REAL_E2E_STEP_TIMEOUT_MS=15000`: optional per-step timeout guard for real-app helper commands.
- `MACOS_AGENT_REAL_E2E_ITERATIONS=5`: optional short-loop repetition count for matrix runs.

Input-method notes for reliability:
- Arc YouTube navigation uses address-bar focus + clipboard paste + `Return` (not per-key character typing), then verifies the active URL contains `youtube.com` and is not a Google search URL.
- Spotify search input uses clipboard paste (`Cmd+A` + `Cmd+V`) and then `Return`, avoiding IME-dependent character typing.
- If you want deterministic keyboard layout, install `im-select` (`brew install im-select`) and set `MACOS_AGENT_REAL_E2E_INPUT_SOURCE=abc`.
- You can verify/switch layout directly with:
  - `macos-agent --format json input-source current`
  - `macos-agent --format json input-source switch --id abc`

Real-app artifact notes:
- Every real-app scenario writes `steps.jsonl` and `step-summary.json` under its artifact directory.
- `artifact-index.json` includes per-scenario `step_ledger_path`, `failing_step_id`, and `last_successful_step_id`.
- Real-app checks are manual/local validation flows and should not be included in default CI jobs.
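
To jump straight to the failing step after a run, the per-scenario fields in `artifact-index.json` can be extracted with `jq`. The exact nesting of scenarios inside the index is not documented here, so this sketch searches recursively for any object carrying `failing_step_id`; `failing_steps` is a hypothetical name.

```bash
# Hypothetical helper: print one tab-separated line per scenario that has a
# non-null failing_step_id. Argument: path to artifact-index.json.
# Assumption: the three fields sit together in one object per scenario.
failing_steps() {
  jq -r '.. | objects
         | select(has("failing_step_id") and .failing_step_id != null)
         | "\(.failing_step_id)\tlast ok: \(.last_successful_step_id)\tledger: \(.step_ledger_path)"' "$1"
}

# Usage: failing_steps path/to/artifact-index.json
```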

## Immediate Feedback Loop

### Workflow 1: readiness then action probe

```bash
macos-agent --format json preflight --include-probes
macos-agent --format json window activate --app Terminal --wait-ms 1200 --retries 1
macos-agent --format json wait app-active --app Terminal --timeout-ms 1500
```

### Workflow 2: machine-parseable failure triage

```bash
macos-agent --error-format json --trace input click --x 200 --y 160
# Read the latest trace in $AGENT_HOME/out/macos-agent-trace/
```

### Workflow 3: iterate with scenario file + profile checks

```bash
macos-agent profile validate --file crates/macos-agent/tests/fixtures/real_e2e_profile_default_1440p.json
macos-agent --format json scenario run --file crates/macos-agent/tests/fixtures/scenario-basic.json
macos-agent profile init --name local-1440p --path "$AGENT_HOME/out/local-profile.json"
```

## Troubleshooting Matrix

Use the `Decision ID` from `Command Decision Matrix` to choose the row quickly.

| ID | Symptom | Next command | What to inspect | Decision row |
| --- | --- | --- | --- | --- |
| `T1` | `not authorized` or Apple Events failures | `macos-agent --format json preflight --include-probes` | `error.hints`, Automation/Accessibility rows | `D3` |
| `T2` | Flaky click/input behavior | `macos-agent --trace --error-format json input click ...` | latest trace JSON (`attempts_used`, timeout/retry policy) | `D3` |
| `T3` | AX selector no match / ambiguous match | `macos-agent --format json ax list --app <name> --role <AXRole> --title-contains <text>` | node candidates (`node_id`, `role`, `title`, `identifier`) and refine selector / `--nth` | `D1` |
| `T4` | AX press/type fails but coordinate/keyboard path should continue | rerun with `ax click --allow-coordinate-fallback` or `ax type --allow-keyboard-fallback` | whether `used_coordinate_fallback` / `used_keyboard_fallback` is true in JSON result | `D2` |
| `T5` | Hammerspoon AX backend unavailable | `hs -t 1 -q -c 'return "ok"'` | ensure Hammerspoon is running and `require('hs.ipc')` is enabled, or keep backend `auto` for JXA fallback | `D1`, `D2`, `D4` |
| `T6` | Input source mismatch before typing | `macos-agent --format json input-source current` then `... switch --id abc` | current source id and switch result (`switched=true`) | `D5` |
| `T7` | Trace enabled but command does not start | `macos-agent --trace --trace-dir <path> --error-format json preflight` | `trace.write` error and writable-path hint | `D3` |
| `T8` | Real-app scenario failed mid-flow | run target `e2e_real_apps` command with `--nocapture` | `steps.jsonl`, `step-summary.json`, `artifact-index.json` | `D1`, `D2`, `D3` |
| `T9` | Profile coordinate drift | `macos-agent profile validate --file <profile.json>` | key-path validation errors and bounds issues | `D3` |

## Docs

- [Docs index](docs/README.md)