ferrisgrid-capture 0.2.0

Screen capture backends and screenshot processing for FerrisGrid.

FerrisGrid captures the current screen, maps it to deterministic coordinates, returns compact Markdown to an agent, executes one constrained action, captures the result, and exits. The agent does the reasoning. FerrisGrid handles the screen, coordinates, input, and local trace.

┌──────────────┐    observe     ┌──────────────┐
│ Agent / LLM  │ ─────────────> │ FerrisGrid   │
│              │ <───────────── │ screenshot + │
│ choose one   │   Markdown     │ coordinates  │
│ action       │                └──────────────┘
│              │      act       ┌──────────────┐
│              │ ─────────────> │ validate +   │
│              │ <───────────── │ execute one  │
└──────────────┘   screenshot   └──────────────┘

Why FerrisGrid?

  • Eyes plus a map: screenshots become coordinate-backed observations an LLM can reason over.
  • Single-step by default: every call performs one observation or one action.
  • Deterministic coordinates: screenshots map cleanly back to native screen pixels.
  • Local-first traces: screenshots, metadata, action requests, and results stay under .ferrisgrid/.
  • Cross-platform shape: the agent-facing protocol is the same across macOS, Linux, and Windows where the platform allows it.
  • Container-friendly: run a Linux desktop workspace in Docker and watch it through noVNC while the agent works away from your main screen.
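
To make the "deterministic coordinates" point concrete, here is a minimal sketch of one way a point in a screenshot can be mapped back to native screen pixels. It assumes plain proportional scaling between the screenshot size and the native screen size (for example a downscaled HiDPI capture). The function name map_coord and the example dimensions are hypothetical, and FerrisGrid's actual mapping may differ.

```shell
#!/bin/sh
# map_coord VALUE SHOT_SIZE NATIVE_SIZE
# Hypothetical helper, not part of FerrisGrid: scales one coordinate from
# screenshot space to native space, rounding to the nearest pixel with
# integer arithmetic only.
map_coord() {
    echo $(( ($1 * $3 + $2 / 2) / $2 ))
}

# Example (assumed sizes): a 1280-pixel-wide screenshot of a 2560-pixel-wide screen.
# map_coord 500 1280 2560   prints 1000
```

Because the mapping is a pure function of the two sizes, the same screenshot point always resolves to the same native pixel, which is the property the bullet above describes.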

Installation

For normal use, install the published CLI with Cargo:

cargo install ferrisgrid-cli
ferrisgrid doctor

The package is ferrisgrid-cli on crates.io and installs the ferrisgrid command.

Quick Start

Capture the current screen:

ferrisgrid observe

Run one action from a Markdown action file:

mkdir -p .ferrisgrid
cat > .ferrisgrid/action.md <<'EOF'
status: action
action: click
screen_id: screen-1
x: 500
y: 500
button: left
wait_after_ms: 500
EOF

ferrisgrid act --file .ferrisgrid/action.md
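
When an agent issues many clicks, it can be convenient to generate the action file rather than type it out each time. The following is a small sketch of such a wrapper. The function name write_click_action and its argument order are hypothetical, and it writes only the fields shown in the example above; it makes no claim about other fields or actions FerrisGrid may accept.

```shell
#!/bin/sh
# write_click_action FILE X Y [SCREEN_ID] [BUTTON] [WAIT_MS]
# Hypothetical helper, not part of FerrisGrid. Defaults mirror the
# Quick Start example. Variables are global, as POSIX sh has no `local`.
write_click_action() {
    file=$1
    x=$2
    y=$3
    screen_id=${4:-screen-1}
    button=${5:-left}
    wait_ms=${6:-500}

    # Reject anything that is not a non-negative integer before writing.
    for n in "$x" "$y" "$wait_ms"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "x, y, and wait_after_ms must be non-negative integers" >&2
                return 1
                ;;
        esac
    done

    # Create the target directory, then write the action file.
    mkdir -p "$(dirname "$file")" || return 1
    cat > "$file" <<EOF
status: action
action: click
screen_id: $screen_id
x: $x
y: $y
button: $button
wait_after_ms: $wait_ms
EOF
}
```

A call such as write_click_action .ferrisgrid/action.md 500 500 reproduces the file above, after which ferrisgrid act --file .ferrisgrid/action.md runs it as before.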

Development from Source

Use a local checkout when you want to build, test, or modify FerrisGrid:

git clone https://github.com/BrunoV21/FerrisGrid-CLI.git
cd FerrisGrid-CLI
cargo build
cargo test --workspace
cargo run -q -p ferrisgrid-cli -- doctor
cargo run -q -p ferrisgrid-cli -- observe

Run one action with the source-built binary, using the same Markdown action file format:

mkdir -p .ferrisgrid
cat > .ferrisgrid/action.md <<'EOF'
status: action
action: click
screen_id: screen-1
x: 500
y: 500
button: left
wait_after_ms: 500
EOF

cargo run -q -p ferrisgrid-cli -- act --file .ferrisgrid/action.md

Agent Skills

If you are an agent, use this script to download and install the FerrisGrid skills:

curl -fsSL https://raw.githubusercontent.com/BrunoV21/FerrisGrid-CLI/main/scripts/install-ferrisgrid-skills.sh | sh

Run it from the directory that should receive the skill folders. The script downloads the repository zip, extracts the contents of .agents/skills, and installs those skill directories into the current directory.

Docker Workspace

FerrisGrid can run inside a Linux container with its own X11 display. The agent calls FerrisGrid with docker exec; input happens inside the container, not on your main desktop.

The Docker image installs the published ferrisgrid-cli package from crates.io. By default it installs the latest published version; release builds pass the tagged version explicitly.

docker build -f docker/linux-workspace.Dockerfile -t ferrisgrid-linux-workspace .

Build a specific published version:

docker build \
  --build-arg FERRISGRID_VERSION=0.2.0 \
  -f docker/linux-workspace.Dockerfile \
  -t ferrisgrid-linux-workspace .

Run the workspace:

docker run --rm -d \
  --name ferrisgrid-workspace \
  -p 6080:6080 \
  -v "$PWD:/workspace" \
  ferrisgrid-linux-workspace

Open the viewer:

http://127.0.0.1:6080/vnc.html?autoconnect=1&resize=scale

Then run FerrisGrid inside the container:

docker exec ferrisgrid-workspace ferrisgrid doctor
docker exec ferrisgrid-workspace ferrisgrid observe
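
Since the run command above mounts the current directory at /workspace, an action file written on the host is also visible inside the container. The wrapper below is a sketch of how an agent might run any ferrisgrid subcommand there. The function name fg_workspace and the DOCKER and WORKSPACE_CONTAINER variables are hypothetical, and passing -w /workspace is an assumption: the image may already set a suitable working directory.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of FerrisGrid.
# DOCKER can be overridden (for example with `echo`) for a dry run.
DOCKER="${DOCKER:-docker}"
WORKSPACE_CONTAINER="${WORKSPACE_CONTAINER:-ferrisgrid-workspace}"

# Run a ferrisgrid subcommand inside the workspace container, with
# /workspace as the working directory so relative paths such as
# .ferrisgrid/action.md resolve against the mounted host directory.
fg_workspace() {
    "$DOCKER" exec -w /workspace "$WORKSPACE_CONTAINER" ferrisgrid "$@"
}
```

With this in place, fg_workspace observe and fg_workspace act --file .ferrisgrid/action.md mirror the Quick Start commands. When finished, docker stop ferrisgrid-workspace stops the container, and the --rm flag used at startup removes it.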

Documentation

Official docs live in docs/official. Brand positioning and story notes live in docs/branding.

cd docs/official
npm install
npm run docs:dev

The docs use a terminal-brutalist Terminal Violet palette: black surfaces, violet brand/action states, cyan coordinate accents, and status colors for execution feedback.

Project Status

FerrisGrid is early, local-first infrastructure for agent-facing visual control. The current focus is reliable observe/act behavior, local traces, recap output, and containerized Linux workspaces.

License

MIT