studio-worker 0.4.7

# Dev loop (PM2 + cargo-watch)

How to run the worker locally in the iterate-watch-restart loop you
want during development.  Production workers run the release binary
under systemd / launchd, not this dev loop; see
[`docs/architecture/overview.md`](../architecture/overview.md#service--autostart).

## Why PM2

PM2 owns the process tree and the log files.  We use it for every
long-running thing on the dev box (assistant agents, llama servers,
the worker itself) so the lifecycle story is uniform.  Per the
software-factory rules (`~/Repositories/software-factory/instructions/rules/process-management/RULES.md`):

> Always use PM2 to manage long-running processes. Never use bare
> `nohup`, `&`, or `screen`. Anything taking more than ~10 seconds
> MUST run via PM2 with logs to file.

## Two flavours of dev process

### Watching variant (good for source-iteration)

```bash
pm2 start /tmp/studio-worker-ui-dev.sh --name studio-worker-ui-dev --no-autorestart
```

Wrapper script:

```bash
#!/usr/bin/env bash
set -euo pipefail
cd /home/webber/Repositories/studio-worker
export RUST_LOG="${RUST_LOG:-studio_worker=debug,info}"
export RUST_BACKTRACE=1
export DISPLAY="${DISPLAY:-:0}"
exec cargo watch \
  --why -w src -w Cargo.toml -w Cargo.lock -i target \
  -x 'run -- ui'
```

(The UI is the default build now — no `--features ui` and no
`PKG_CONFIG_PATH` dance: the GTK-free stack needs neither.)

`cargo-watch` rebuilds + restarts the worker on every source change.
Great for iterating on Rust code.  **Terrible** for letting a
long-running job complete: any agent (or you) touching a watched path
(`src/`, `Cargo.toml`, `Cargo.lock`) mid-job kills the child process,
the WS session dies, and the job goes to terminal `failed`.

### Stable variant (good for chewing through a queue)

```bash
pm2 start /tmp/studio-worker-ui-stable.sh --name studio-worker-ui-stable --no-autorestart
```

Wrapper script:

```bash
#!/usr/bin/env bash
set -euo pipefail
cd /home/webber/Repositories/studio-worker
export RUST_LOG="${RUST_LOG:-studio_worker=info,warn}"
export RUST_BACKTRACE=1
export DISPLAY="${DISPLAY:-:0}"
exec ./target/debug/studio-worker ui
```

No `cargo watch`.  Runs the binary you've already built with
`cargo build` (UI is default).  Source-tree edits don't restart it.
This is what you want when:

- You need the worker to complete a multi-hour batch (e.g. the
  ~1000 z-image-turbo runs we did to backfill assets).
- Another agent is editing source files in the same checkout.
- You're testing the auto-update flow.
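
Because the stable wrapper runs whatever is already in `target/debug`, a quick pre-flight check can catch a missing or outdated binary before a long batch starts. The sketch below is only an illustration: `stable_binary_status` is a hypothetical helper (not part of the repo), and it assumes that comparing the binary's mtime against files under `src/` is a good enough staleness signal. It does not look at `Cargo.toml` or `Cargo.lock`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper; not part of the repository.
# Usage: stable_binary_status <repo_dir>
# Prints one of: missing | stale | ready
stable_binary_status() {
  local repo="$1"
  local bin="$repo/target/debug/studio-worker"

  # No executable at the path the stable wrapper execs.
  if [ ! -x "$bin" ]; then
    echo "missing"
    return 0
  fi

  # Any regular file under src/ that is newer than the binary suggests
  # a `cargo build` is due before starting a long batch.
  if [ -n "$(find "$repo/src" -type f -newer "$bin" 2>/dev/null)" ]; then
    echo "stale"
  else
    echo "ready"
  fi
}
```

For example, `stable_binary_status /home/webber/Repositories/studio-worker` prints `ready` when it is safe to `pm2 start` the stable wrapper; on `missing` or `stale`, run `cargo build` first.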

## Gotchas

- **One worker per `worker_id`**.  Don't run both flavours
  simultaneously — the studio's DO closes the older session with
  `4003 duplicate_worker` and both workers thrash trying to
  reconnect.  `pm2 stop` one before starting the other.
- **Orphan child after cargo-watch restart**.  Killing the
  watcher's `cargo run` doesn't always reap the
  `target/debug/studio-worker` child.  Symptom: `pm2 stop` reports
  the process as down but `pgrep -af target/debug/studio-worker`
  shows it's still alive (and still claiming jobs!).  Find it with
  `pgrep -af` and `kill <pid>` directly.
- **`PKG_CONFIG_PATH` on Linuxbrew machines** (no longer an issue).
  When `/home/linuxbrew/.linuxbrew/bin/pkg-config` was first on PATH it
  couldn't see system `.pc` files (cairo, gtk-3), and the UI build
  failed.  No workaround is needed now: the UI stack is GTK-free
  (eframe/glow via dlopen, ksni tray, rustls), so the Linuxbrew
  `pkg-config` ordering doesn't matter.  For the in-process llama.cpp
  backend (`--features all`) you only need `cmake` + a C/C++ compiler
  on PATH.
- **DISPLAY**.  The UI needs an X server.  Export
  `DISPLAY=:0` (or your session's display).  Headless workers run
  `studio-worker run` instead of `ui`; same wrapper minus the
  `ui` arg and the DISPLAY export.
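
The "same wrapper minus the `ui` arg and the DISPLAY export" rule for headless workers can be sketched as a small generator that prints either flavour of the stable wrapper. This is an illustration only: `emit_worker_wrapper` is a hypothetical helper, and the headless script path and PM2 name mentioned afterwards are made up for the example.

```bash
#!/usr/bin/env bash
# Hypothetical generator for the stable-variant wrapper; not part of the repo.
# Usage: emit_worker_wrapper ui|run   (prints the wrapper script to stdout)
emit_worker_wrapper() {
  local mode="$1"
  case "$mode" in
    ui|run) ;;
    *) echo "usage: emit_worker_wrapper ui|run" >&2; return 2 ;;
  esac

  # Lines shared by both flavours, copied from the stable wrapper.
  cat <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
cd /home/webber/Repositories/studio-worker
export RUST_LOG="${RUST_LOG:-studio_worker=info,warn}"
export RUST_BACKTRACE=1
EOF

  # Only the UI needs an X display; headless `run` skips the export.
  if [ "$mode" = "ui" ]; then
    echo 'export DISPLAY="${DISPLAY:-:0}"'
  fi
  echo "exec ./target/debug/studio-worker $mode"
}
```

Assuming a hypothetical path and process name, `emit_worker_wrapper run > /tmp/studio-worker-headless.sh`, followed by `chmod +x` and `pm2 start /tmp/studio-worker-headless.sh --name studio-worker-headless --no-autorestart`, would give a headless equivalent of the stable variant.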

## Tailing logs

```bash
tail -f /home/webber/.pm2/logs/studio-worker-ui-stable-out.log
tail -f /home/webber/.pm2/logs/studio-worker-ui-stable-error.log
```

PM2's own `pm2 logs --lines 50` works, but for long greps without
TUI interference, tailing the files directly is cheaper.
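
For one-off searches rather than live tailing, a plain `grep` over both log files is often enough. A minimal sketch, assuming the logs live under `${PM2_HOME:-$HOME/.pm2}/logs` with the `<name>-out.log` / `<name>-error.log` naming shown above; `worker_log_grep` is a hypothetical helper, and whether a given string (such as the close reason) actually appears in the worker's logs is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of the repository.
# Usage: worker_log_grep <pattern> [pm2-process-name]
worker_log_grep() {
  local pattern="$1"
  local name="${2:-studio-worker-ui-stable}"
  local dir="${PM2_HOME:-$HOME/.pm2}/logs"

  # -H prefixes each hit with its file name, -n with its line number,
  # so stdout and stderr hits stay distinguishable.
  grep -Hn -e "$pattern" "$dir/$name-out.log" "$dir/$name-error.log"
}
```

`worker_log_grep duplicate_worker` would then list every matching line in the stable variant's logs; pass `studio-worker-ui-dev` as the second argument to search the watching variant instead.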

## Clean shutdown

```bash
pm2 stop studio-worker-ui-stable
pm2 delete studio-worker-ui-stable    # if you want it gone from `pm2 list`
```

The worker handles SIGTERM gracefully: it gives the current job up
to ~5 s to finish, then exits.
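
Given the orphan-child gotcha, it can be worth confirming after `pm2 stop` that no worker binary survived. The sketch below is an illustration, not a repo script: `worker_pids` is a hypothetical filter over `pgrep -af` output that keeps only processes whose executable is the debug worker binary, so a `tail` or editor that merely mentions the path is not killed by accident.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of the repository.
# Reads `pgrep -af` lines ("PID COMMAND ARGS...") on stdin and prints the
# PIDs whose executable (second field) is the debug worker binary.
# Lines where the path only appears as an argument are ignored.
worker_pids() {
  awk '$2 ~ /(^|\/)target\/debug\/studio-worker$/ { print $1 }'
}
```

After stopping the PM2 process, `pgrep -af target/debug/studio-worker | worker_pids` should print nothing. Any PID it does print is an orphan; a plain `kill <pid>` sends SIGTERM, so the worker still gets its graceful-shutdown window.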

## Where this came from

We discovered the cargo-watch-kills-the-job problem ~10 minutes into
the 1000-job z-image-turbo backfill.  Switching to the stable
variant kept the WS session alive for the full ~3 hours.  See
[LESSONS_LEARNED](../../LESSONS_LEARNED.md) for the timeline.