visual-rubric 0.2.0

AI-assisted screenshot rubric runner for local visual UX review

visual-rubric


visual-rubric runs AI-assisted rubric checks against screenshots through codex-acp. It is intended for local visual UX review loops: deterministic tests prove structure, while screenshots catch layout, hierarchy, and readability regressions.

The crate exposes:

  • evaluate_image_rubric_with_options for one-off screenshot checks.
  • evaluate_image_rubric_with_config when callers need a custom codex-acp binary, environment, or working directory.
  • RubricPool for repeated checks with process reuse, retry backoff, quota detection, and worker recycling.
  • BatchRubricRun for caller-provided asset batches with changed-file selection, partial-error reports, ACP log capture, and optional issue classification hooks.
  • visual-rubric CLI for image checks, local static hosting, screenshot capture, and advisory audit reports.

Feature flags:

  • fake-codex-acp builds the fake-codex-acp test helper binary. It is off by default and does not change the library API.

Project-specific judgment belongs in the caller-provided system_prompt; the default prompt only covers generic screenshot breakage such as clipped text, overlapping controls, blank regions, illegible contrast, and visibly broken layout.

visual-rubric \
  --image site-desktop.png \
  --question "Does the install flow make prerequisites, commands, and next steps clear?" \
  --system-prompt "You are auditing a software project website install section."

The same form is available as an explicit subcommand:

visual-rubric image \
  --image site-desktop.png \
  --question "Does the install section stay readable?"

For generated assets, callers can keep project-specific discovery downstream and let the crate own generic batch mechanics:

use visual_rubric::{
    AssetSnapshot, BatchRubricConfig, BatchRubricRun, PoolConfig, SelectionMode,
    diff_snapshots,
};

// Placeholder digest for illustration only: it keys on byte length, so
// same-size edits go undetected. Use a real content hash in practice.
let stable_hash = |bytes: &[u8]| format!("{:x}", bytes.len());

// `?` assumes an enclosing function that returns a compatible Result.
let before = AssetSnapshot::capture(["before.png"], stable_hash)?;
let after = AssetSnapshot::capture(["after.png"], stable_hash)?;
let changes = diff_snapshots(&before, &after);

// ChangedOnly restricts the run to assets that differ between snapshots.
let report = BatchRubricRun::new(BatchRubricConfig {
    pool: PoolConfig::default(),
    question: "Does this image pass visual QA?".to_owned(),
    selection_mode: SelectionMode::ChangedOnly,
    classifier: None,
})
.run(&changes);

For local website iteration, serve a static directory, capture browser screenshots, and write a report:

visual-rubric audit \
  --root website/public \
  --path __audit/install.html \
  --browser chromium \
  --browser-arg=--disable-dev-shm-usage \
  --wait-ms 250 \
  --capture-retries 1 \
  --viewport desktop=1440x1100 \
  --viewport mobile=390x1800 \
  --question "Does this install section make the next action obvious?" \
  --report target/visual-rubric/report.json
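
The --viewport values in the command above follow a NAME=WIDTHxHEIGHT shape. The sketch below shows one way to parse that shape, assuming both dimensions are positive integers in pixels. Viewport and parse_viewport are hypothetical names used for illustration and are not taken from the crate's API.

```rust
/// Hypothetical viewport type for illustration; the crate's own types may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Viewport {
    pub name: String,
    pub width: u32,
    pub height: u32,
}

/// Parses a `NAME=WIDTHxHEIGHT` spec such as `desktop=1440x1100`.
/// Assumption: empty names and zero-sized dimensions are rejected.
pub fn parse_viewport(spec: &str) -> Result<Viewport, String> {
    // Split the label from the size at the first `=`.
    let (name, size) = spec
        .split_once('=')
        .ok_or_else(|| format!("expected NAME=WIDTHxHEIGHT, got `{spec}`"))?;
    if name.is_empty() {
        return Err(format!("viewport name is empty in `{spec}`"));
    }
    // Split the size into width and height at the first `x`.
    let (width, height) = size
        .split_once('x')
        .ok_or_else(|| format!("expected WIDTHxHEIGHT, got `{size}`"))?;
    let width: u32 = width
        .parse()
        .map_err(|e| format!("invalid width `{width}`: {e}"))?;
    let height: u32 = height
        .parse()
        .map_err(|e| format!("invalid height `{height}`: {e}"))?;
    if width == 0 || height == 0 {
        return Err(format!("viewport dimensions must be non-zero in `{spec}`"));
    }
    Ok(Viewport {
        name: name.to_owned(),
        width,
        height,
    })
}
```

Under these assumptions, parse_viewport("mobile=390x1800") yields a Viewport named mobile, and a spec without a height such as mobile=390 returns an error rather than a default size.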

Audit reports are versioned JSON. They include an aggregate status, capture URL, elapsed time, effective high-level options, and one rubric result per screenshot. Use --fail-on-rubric when CI should fail on rubric failures or rubric errors.
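
As a rough illustration of how an aggregate status could relate to per-screenshot results and to --fail-on-rubric, here is a minimal sketch. The status names, the precedence (error over fail over pass), the treatment of an empty result list, and the exit code values are all assumptions made for this example; the crate's actual report schema and exit codes may differ.

```rust
/// Hypothetical per-screenshot outcome; not the crate's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RubricStatus {
    Pass,
    Fail,
    Error,
}

/// Folds per-screenshot outcomes into one status.
/// Assumed precedence: any Error wins, then any Fail, otherwise Pass.
/// An empty slice is treated as Pass in this sketch.
pub fn aggregate_status(results: &[RubricStatus]) -> RubricStatus {
    if results.contains(&RubricStatus::Error) {
        RubricStatus::Error
    } else if results.contains(&RubricStatus::Fail) {
        RubricStatus::Fail
    } else {
        RubricStatus::Pass
    }
}

/// Maps the aggregate status to a process exit code.
/// Assumption: without the flag the audit is advisory and exits 0;
/// with it, both rubric failures and rubric errors exit 1.
pub fn exit_code(aggregate: RubricStatus, fail_on_rubric: bool) -> i32 {
    match (aggregate, fail_on_rubric) {
        (RubricStatus::Pass, _) | (_, false) => 0,
        (_, true) => 1,
    }
}
```

Keeping the aggregation separate from the exit-code decision lets the report record the real outcome even when the run is advisory.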

For Home Manager-managed setups, keep backend selection in ~/.config/visual-rubric/config.toml and run:

visual-rubric configured \
  --image site-desktop.png \
  --question "Does this page stay readable?"

Set mode = "direct" for direct codex-acp screenshot review, or mode = "pipeline" for Qwen3-VL extraction followed by ACP rubric scoring. --mode direct and --mode pipeline override the TOML mode for one run.
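
The override rule described above (a --mode flag wins over the TOML mode for one run) can be sketched as follows. Mode and effective_mode are hypothetical names for illustration, and the sketch assumes "direct" and "pipeline" are the only accepted values.

```rust
use std::str::FromStr;

/// Hypothetical backend mode; mirrors the two values named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Direct,
    Pipeline,
}

impl FromStr for Mode {
    type Err = String;

    /// Accepts exactly "direct" or "pipeline"; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "direct" => Ok(Mode::Direct),
            "pipeline" => Ok(Mode::Pipeline),
            other => Err(format!(
                "unknown mode `{other}`, expected `direct` or `pipeline`"
            )),
        }
    }
}

/// Resolves the mode for a single run: a CLI value, when present,
/// takes precedence over the value read from the config file.
pub fn effective_mode(cli_mode: Option<Mode>, config_mode: Mode) -> Mode {
    cli_mode.unwrap_or(config_mode)
}
```

With mode = "pipeline" in the config, effective_mode(Some(Mode::Direct), Mode::Pipeline) resolves to Direct for that run, while passing None leaves the configured Pipeline in effect.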

For manual inspection without rubric evaluation:

visual-rubric serve --root website/public --port 1111

The model must return strict JSON:

{ "verdict": "pass", "reason": "short reason", "anomalies": [] }
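
For callers who want a typed view of that shape, the sketch below models the three fields after JSON decoding. It rests on several assumptions: that "fail" is the counterpart to "pass", that anomalies are plain strings, and that any other verdict string should be rejected. RubricVerdict, Verdict, and from_fields are hypothetical names, not the crate's types, and the JSON decoding step itself is left out.

```rust
/// Hypothetical verdict values; "fail" is assumed as the counterpart of "pass".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// Hypothetical typed form of the model's JSON reply.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RubricVerdict {
    pub verdict: Verdict,
    pub reason: String,
    /// Assumption: each anomaly is a short free-text string.
    pub anomalies: Vec<String>,
}

impl RubricVerdict {
    /// Builds a verdict from already-decoded fields, rejecting any
    /// verdict string other than the exact lowercase values.
    pub fn from_fields(
        verdict: &str,
        reason: &str,
        anomalies: &[&str],
    ) -> Result<Self, String> {
        let verdict = match verdict {
            "pass" => Verdict::Pass,
            "fail" => Verdict::Fail,
            other => return Err(format!("unexpected verdict `{other}`")),
        };
        Ok(RubricVerdict {
            verdict,
            reason: reason.to_owned(),
            anomalies: anomalies.iter().map(|a| (*a).to_owned()).collect(),
        })
    }
}
```

Rejecting unknown verdict strings up front keeps a loosely worded reply such as "mostly pass" from being counted as a pass.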