looop 0.15.0

A tiny, portable, Kubernetes-shaped control loop for your work
# TODO — design/structure fixes (non-security)

Tracked from a critical design review. Security items (run_shell gating,
guardrail self-modification, permission bypass) are deliberately **out of scope**
here and tracked separately.

Each box is meant to land as its own commit.

## Quick wins (docs / config honesty)

- [x] **Stale version reference.** `README.md` still says `looop version # -> looop 0.1.0`
  while `Cargo.toml` is at 0.13.0. Don't hard-code a version that drifts.
- [x] **Model-allocation comment overgeneralizes.** `config.rs` M4 note claims "the
  default tick uses the same strong model as the worker (claude-opus-4-8)". That
  is only literally true for the `pi` runner; the `claude` runner pins no model on
  either `tick` or `interactive` (both inherit the CLI default — equal, which still
  satisfies the principle). Make the comment say what the code actually does.

## Observability

- [x] **Sensor failures are invisible to the decider.** A user `sensors/*.sh` that
  exits non-zero (or times out, rc 124) leaves whatever partial/empty stdout it
  wrote as the snapshot; stderr goes to a `.err` file that the prompt never reads.
  The decider then reasons over a blank/garbage world and may `noop`. Replace a
  failed sensor's snapshot with a normalized error object
  (`{"error":…,"exit_code":…,"stderr":<tail>}`) so the failure (a) reaches the
  prompt and (b) participates in the world hash — which also means *fixing* a
  broken sensor wakes the loop next beat (addresses the M3 "broken sensor never
  wakes" sharp edge).

## Architecture / behavior

- [x] **Time-driven goals are impossible; a `noop` silences the loop.** A successful
  decision (incl. `noop`) commits the world hash, so the loop won't call the AI
  again until the world changes externally. The only timer is `today.sh` (daily).
  A goal like "re-check in 5 min" cannot fire: `next_interval_s` only changes the
  sleep, then the unchanged-hash gate skips the AI anyway. Fix: when the decider
  sets `next_interval_s`, treat the next beat as a **forced re-decide** that
  bypasses the unchanged-hash skip exactly once. The AI opts in, so the
  level-triggered default is preserved.

## Deferred — needs design, not a quick patch

- [x] **No fairness across goals (starvation).** One move per tick + "most
  important move" means a perpetually-changing high-priority goal can starve the
  rest. *Done (visibility, not a hard scheduler):* a `sys-goals` system sensor now
  surfaces per-goal staleness (`.detail.goals[id].age_s`), stamped by the executor
  whenever a move targets a goal. The prompt tells the decider to prefer the
  longest-neglected of comparable ready goals. Ages live in `.detail` only, so
  they never wake the loop. This keeps RULE 1 (one move/tick) and "the AI judges
  importance" intact — it just gives the AI the data to avoid starving a goal
  (note: workers run in parallel, so dispatching a neglected goal doesn't block
  the others).
- [x] **Budget breaker fails open for non-pi/claude runners.** Cost metering was
  hard-wired to pi/claude NDJSON shapes; a custom runner produced no metered cost,
  so `max_daily_usd` silently never tripped. *Done:* (1) a custom runner declares
  its cost shape via a per-runner `"cost":{type,pointer,mode}` spec, so any runner
  can be metered; (2) fail-closed — if a budget is set but the runner stays
  unmetered for `UNMETERED_LIMIT` consecutive runs, the breaker opens and skips
  the AI (self-heals when the runner/spec signature changes).
- [x] **Claims are advisory only.** Per-goal duplication was already prevented by
  the same-id alive guard in `cmd_start_session`; the real gap was the
  resource-level lease being a non-atomic `printf > claims/<name>.json` (a
  last-writer-wins TOCTOU race a worker could also just skip). *Done:* added
  `looop _ claim <name>` / `looop _ unclaim <name>` — an atomic (`O_EXCL`),
  liveness-aware test-and-set that exits non-zero when a LIVE session holds the
  lease and reclaims a stale one. The worker CONTRACT now uses it instead of raw
  file ops, so the lease is a real mutex. (looop still can't pre-know an
  arbitrary worker-chosen resource name, so enforcement is at the claim primitive,
  not at spawn — which is the right layer.)
- [x] **`current_exe` re-exec fragility.** The detached supervisor re-execs the
  running binary; an upgrade / `nix gc` / move during a long-lived pulse could
  leave `current_exe` pointing at a stale or deleted inode. Fixed upstream in
  babysit 0.12.0 (yusukeshib/babysit#30): the supervisor re-exec prefers
  `/proc/self/exe` on Linux (the kernel keeps it valid even after the binary is
  replaced/unlinked) and exposes `Babysit::with_supervisor_exe` for an embedder
  to pin a stable path. looop consumes it by bumping the dependency; on Linux the
  pulse now survives its own binary being upgraded mid-run. (macOS has no
  `/proc/self/exe` equivalent, so it keeps `current_exe()` — an inherent OS limit,
  mitigated in practice by nix GC rooting live processes' executables.)
- [x] **Single-instance lock liveness.** Replaced the mkdir-lock + PID-liveness
  check with a `flock(2)` on `<data>/.lock/lock`: the kernel releases it when the
  pulse dies for any reason, so there is no stale lock to reclaim and no PID-reuse
  false positive that could wedge the next start. `looop status` reads liveness
  the same way. (NFS, where flock is unreliable, remains a caveat.)
- [x] **`content_hash` is a hand-rolled FNV-style hash***won't fix (the current
  choice is correct).* It's used only for change-detection (`world_hash` vs the
  last beat), not security; a 128-bit FNV-1a collision is astronomically unlikely
  and self-heals on the next real change. Crucially it must stay STABLE across
  restarts/upgrades because `.last-tick-hash` is persisted — std `DefaultHasher`
  (SipHash) has no cross-version stability guarantee, and pulling in `sha2`/etc.
  is an unnecessary dependency for zero benefit. The bespoke FNV is the right call.

---

## Out of scope (deliberately not addressed here)

The original review's **security** findings were set aside by request and remain
open: `run_shell` executes an LLM-chosen command with no gate; the decider can
rewrite its own guardrails via `write_playbook`; the default runner bypasses
permission prompts. Track these separately if/when security is in scope.