FerrisGrid captures the current screen, maps it to deterministic coordinates, returns compact Markdown to an agent, executes one constrained action, captures the result, and exits. The agent does the reasoning. FerrisGrid handles the screen, coordinates, input, and local trace.
┌──────────────┐ observe ┌──────────────┐
│ Agent / LLM │ ─────────────> │ FerrisGrid │
│ │ <───────────── │ screenshot + │
│ choose one │ Markdown │ coordinates │
│ action │ └──────────────┘
│ │ act ┌──────────────┐
│ │ ─────────────> │ validate + │
│ │ <───────────── │ execute one │
└──────────────┘ screenshot └──────────────┘
Why FerrisGrid?
- Eyes plus a map: screenshots become coordinate-backed observations an LLM can reason over.
- Single-step by default: every call performs one observation or one action.
- Deterministic coordinates: screenshots map cleanly back to native screen pixels.
- Local-first traces: screenshots, metadata, action requests, and results stay under
.ferrisgrid/. - Cross-platform shape: the agent-facing protocol is the same across macOS, Linux, and Windows where the platform allows it.
- Container-friendly: run a Linux desktop workspace in Docker and watch it through noVNC while the agent works away from your main screen.
Quick Start
Build and check the CLI from source:
Capture the current screen:
Run one action from a Markdown action file:
Agent Skills
If you are an agent, use this script to download and install the FerrisGrid skills:
|
Run it from the directory that should receive the skill folders. The script downloads the repository zip, extracts the contents of .agents/skills, and installs those skill directories into the current directory.
Docker Workspace
FerrisGrid can run inside a Linux container with its own X11 display. The agent calls FerrisGrid with docker exec; input happens inside the container, not on your main desktop.
Open the viewer:
http://127.0.0.1:6080/vnc.html?autoconnect=1&resize=scale
Then run:
Documentation
Official docs live in docs/official.
Brand positioning and story notes live in docs/branding.
The docs use a terminal-brutalist Terminal Violet palette: black surfaces, violet brand/action states, cyan coordinate accents, and status colors for execution feedback.
Project Status
FerrisGrid is early, local-first infrastructure for agent-facing visual control. The current focus is reliable observe/act behavior, local traces, recap output, and containerized Linux workspaces.
License
MIT