Module daemon_supervisor

wire daemon --all-sessions — multi-session supervisor.

§Why

honey-pine’s 2026-06-01 dogfood (#162) surfaced a launchd-vs-session isolation gap: the sh.slancha.wire.daemon launchd unit invokes wire daemon --interval 5 with no cwd context. With WIRE_HOME unset, the daemon resolves to the default session WIRE_HOME and silently skips every other initialized session. Operators with multiple per-project sessions (slancha-mesh, wire, etc.) saw their shell wire status report running:false even with the launchd daemon perfectly alive — same daemon, different state tree.

Her working remedy was launchctl bootout + nohup wire daemon from the project cwd. That works for one session but doesn’t scale to N. The architectural fix is a supervisor that owns the multi-session orchestration: one supervisor process per launchd unit, N child wire daemon --session <name> processes — each with its own pinned WIRE_HOME and its own pidfile under that session’s state dir. wire status from any cwd then sees its session’s child pid and reports truthfully.

§Model

  • Fork-exec, not threads. Each session’s daemon needs its own WIRE_HOME. We set it via the child process env so the daemon code path stays unchanged. Threads would mean global mutable WIRE_HOME and cross-session races.
  • Idempotent spawn. Before spawning a child for session S, check daemon_singleton_holder() on that session’s home. If a live daemon already exists (operator ran wire daemon directly in S’s cwd, or supervisor restarted and the old child is still alive), leave it alone.
  • Reap via polling, not SIGCHLD. macOS launchd-supervised processes already get SIGCHLD overhead; polling try_wait on a short interval is simpler and behaves the same on every platform.
  • Backoff on rapid failure. A child that exits within 10s of spawn doubles its respawn delay (starting at 1s, capped at 60s). This prevents a broken session (corrupt key, missing relay) from fork-bombing.
  • Don’t exit on zero sessions. Sleep and re-poll the registry — new sessions get picked up without supervisor restart.
  • Adopt orphaned children on supervisor restart. When launchd relaunches the supervisor, the previous supervisor’s children keep running (correct: they’re still syncing). New supervisor sees their pidfiles, skips re-spawning, and lets them keep going until their next natural exit (then it spawns a fresh child).
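
The backoff rule above can be sketched as a small piece of per-session state. This is an illustrative sketch only: the `Backoff` type, its method names, and the constants are hypothetical and not taken from this module. The reset to 1s after a child survives past the 10s window is also an assumption, since the text only describes the doubling.

```rust
use std::time::Duration;

// Thresholds taken from the description above; the constant names are hypothetical.
const RAPID_EXIT: Duration = Duration::from_secs(10);
const INITIAL_DELAY: Duration = Duration::from_secs(1);
const MAX_DELAY: Duration = Duration::from_secs(60);

/// Hypothetical per-session backoff state (not the module's actual type).
#[derive(Debug, Clone)]
struct Backoff {
    /// Delay to apply before the next respawn if the child fails rapidly.
    delay: Duration,
}

impl Backoff {
    fn new() -> Self {
        Backoff { delay: INITIAL_DELAY }
    }

    /// Record a child exit and return how long to wait before respawning.
    /// `lifetime` is how long the child ran between spawn and exit.
    fn on_exit(&mut self, lifetime: Duration) -> Duration {
        if lifetime < RAPID_EXIT {
            // Rapid failure: wait the current delay, then double it up to the cap.
            let wait = self.delay;
            self.delay = (self.delay * 2).min(MAX_DELAY);
            wait
        } else {
            // Assumption: a child that ran past the window resets the delay.
            self.delay = INITIAL_DELAY;
            INITIAL_DELAY
        }
    }
}
```

Under these assumptions, a session that keeps crashing immediately is retried after 1s, 2s, 4s, 8s, 16s, 32s, and then every 60s, so a broken session costs at most one spawn per minute.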

§Invariants

  • One supervisor per launchd unit per machine. Singleton guard on sessions_root()/supervisor.pid (separate from per-session daemon pidfiles).
  • Child env contains exactly one wire-relevant variable: WIRE_HOME=<session-home>. Any other inherited WIRE_* vars are stripped so the operator’s shell config doesn’t leak in.
  • Per-session daemon code is unchanged — supervisor is a pure orchestrator.
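
The child-environment invariant can be illustrated with a pure helper that filters the parent environment, plus a builder for the child command. This is a sketch under assumptions: `child_env` and `session_command` are hypothetical names, and the real supervisor may build its environment differently. The `wire daemon --session <name>` invocation is the one described above.

```rust
use std::ffi::OsString;
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: drop every inherited WIRE_* variable, then pin
/// WIRE_HOME to the session's home so exactly one wire-relevant variable remains.
fn child_env<I>(parent: I, session_home: &Path) -> Vec<(OsString, OsString)>
where
    I: IntoIterator<Item = (OsString, OsString)>,
{
    let mut env: Vec<(OsString, OsString)> = parent
        .into_iter()
        // Keep everything that is not wire-specific (PATH, HOME, and so on).
        .filter(|(key, _)| !key.to_string_lossy().starts_with("WIRE_"))
        .collect();
    env.push((
        OsString::from("WIRE_HOME"),
        session_home.as_os_str().to_os_string(),
    ));
    env
}

/// Hypothetical builder for one child daemon command. The caller decides
/// when to call `spawn()` on the result.
fn session_command(name: &str, session_home: &Path) -> Command {
    let mut cmd = Command::new("wire");
    cmd.arg("daemon").arg("--session").arg(name);
    // Start from an empty environment so nothing leaks in implicitly.
    cmd.env_clear();
    cmd.envs(child_env(std::env::vars_os(), session_home));
    cmd
}
```

Keeping the filtering in a pure function makes the invariant easy to check in a unit test without spawning a process.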

Structs§

SupervisedSession
One session as seen by the supervisor.
SupervisorState
Read-only snapshot of the supervisor’s current topology — supervisor liveness + per-session daemon liveness + orphan pids the supervisor is not currently managing. Used by wire supervisor (the CLI counterpart to single-session wire status) so operators can ask “what is the multi-session supervisor doing?” in one command instead of cross-referencing pgrep against per-session pidfiles by hand.

Functions§

read_supervisor_state
Build a SupervisorState snapshot. Pure read; no fork / no pidfile mutation. Best-effort on every component (filesystem errors yield None / empty rather than failing the whole call).
run_supervisor
Entrypoint for wire daemon --all-sessions. Loops forever; only returns Err on a setup error (e.g. cannot resolve sessions_root).