agent-doc 0.28.2

Interactive document sessions with AI agents
# Race Condition Analysis

Concurrency hazards in agent-doc and tmux-router, with mitigations.

## Protected (Resolved)

### Registry read-modify-write

**Location:** `tmux-router/src/registry.rs`

`sessions.json` is shared state between concurrent `agent-doc` processes (claim, route, sync, resync). Without protection, two processes could read the same registry, each make a change, and one would clobber the other.

**Mitigation:** `RegistryLock` (flock-based advisory lock via `fs2`). All mutating operations acquire the lock: `register_full()`, `prune()`, `update_window_for_entry()`, `with_registry()`, `with_registry_val()`.
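
The hazard can be replayed deterministically. The sketch below is illustrative only: the pane names and the `save` helper are hypothetical, and a plain `Vec<String>` stands in for `sessions.json`. It compares an interleaving where both writers start from the same stale read with one where the second writer reads only after the first has saved, which is the ordering the lock enforces.

```rust
/// Stand-in for writing the registry back to disk (hypothetical helper).
fn save(on_disk: &mut Vec<String>, updated: Vec<String>) {
    *on_disk = updated;
}

/// Both writers read before either one saves: the classic lost update.
pub fn unprotected_interleaving() -> Vec<String> {
    let mut on_disk = vec!["pane-0".to_string()];

    let mut seen_by_a = on_disk.clone();
    let mut seen_by_b = on_disk.clone();

    seen_by_a.push("pane-a".to_string());
    seen_by_b.push("pane-b".to_string());

    save(&mut on_disk, seen_by_a);
    save(&mut on_disk, seen_by_b); // overwrites A's entry
    on_disk
}

/// With a lock held across read-modify-write, B cannot read until A has saved.
pub fn locked_interleaving() -> Vec<String> {
    let mut on_disk = vec!["pane-0".to_string()];

    let mut seen_by_a = on_disk.clone();
    seen_by_a.push("pane-a".to_string());
    save(&mut on_disk, seen_by_a);

    let mut seen_by_b = on_disk.clone();
    seen_by_b.push("pane-b".to_string());
    save(&mut on_disk, seen_by_b);
    on_disk
}
```

In the unprotected ordering the entry written by A disappears; in the locked ordering both entries survive.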

**Since:** v0.2.1 / v0.9.5

### Snapshot read-modify-write

**Location:** `agent-doc/src/snapshot.rs`

The snapshot file tracks the last-submitted document state for diff computation. Concurrent submits (e.g., `agent-doc run` + `agent-doc watch`) could read the same snapshot, both compute diffs, and one would overwrite the other's updated snapshot.

**Mitigation:** `SnapshotLock` (flock-based, same pattern as `RegistryLock`). `load()`, `save()`, and `delete()` all acquire the lock. `with_snapshot()` provides a transactional read-modify-write helper.
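
A transactional helper of this kind might look roughly like the sketch below. It is not the crate's code: the signature of `with_snapshot` is an assumption, and a process-wide `Mutex` stands in for the flock-based `SnapshotLock` so the example needs only the standard library. A `Mutex` excludes other threads only, whereas the real lock also excludes other processes.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::sync::Mutex;

// In-process stand-in for `SnapshotLock` (assumption: the real lock is an
// flock on a lock file and therefore works across processes).
static SNAPSHOT_LOCK: Mutex<()> = Mutex::new(());

/// Read, modify, and write the snapshot under a single lock acquisition.
/// A missing snapshot file is treated as an empty snapshot.
pub fn with_snapshot<T>(
    path: &Path,
    f: impl FnOnce(&mut String) -> T,
) -> io::Result<T> {
    // Held until the function returns, so no other caller can interleave.
    let _guard = SNAPSHOT_LOCK.lock().unwrap_or_else(|e| e.into_inner());

    let mut contents = match fs::read_to_string(path) {
        Ok(text) => text,
        Err(e) if e.kind() == io::ErrorKind::NotFound => String::new(),
        Err(e) => return Err(e),
    };

    let out = f(&mut contents);
    fs::write(path, &contents)?;
    Ok(out)
}
```

Because the closure runs while the lock is held, a caller cannot read a snapshot, release the lock, and later write back a stale copy.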

**Since:** v0.9.5

### load()/save() API footgun

**Location:** `agent-doc/src/sessions.rs`

The raw `load()` and `save()` functions do not acquire locks internally. Callers must remember to hold `RegistryLock`. If exposed publicly, new code could easily introduce unprotected mutations.

**Mitigation:** Changed visibility to `pub(crate)`. External callers must use `tmux_router::with_registry()` which enforces locking.
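
The shape of that API can be sketched in a single file, with a module boundary standing in for the crate boundary. Everything here is illustrative: the registry is a `Vec<String>` held in memory rather than `sessions.json`, and a `Mutex` plays the role of `RegistryLock`.

```rust
mod registry {
    use std::sync::Mutex;

    // Stand-ins (assumptions): LOCK for `RegistryLock`, FILE for sessions.json.
    static LOCK: Mutex<()> = Mutex::new(());
    static FILE: Mutex<Vec<String>> = Mutex::new(Vec::new());

    // Private to the module: outside code cannot call the unlocked primitives.
    fn load() -> Vec<String> {
        FILE.lock().unwrap().clone()
    }

    fn save(entries: Vec<String>) {
        *FILE.lock().unwrap() = entries;
    }

    /// The only public entry point; it always takes the lock first.
    pub fn with_registry<T>(f: impl FnOnce(&mut Vec<String>) -> T) -> T {
        let _guard = LOCK.lock().unwrap();
        let mut entries = load();
        let out = f(&mut entries);
        save(entries);
        out
    }
}
```

Code outside the module can write `registry::with_registry(|r| r.push("notes.md".to_string()))`, but a direct call to `registry::load()` fails to compile, so an unlocked mutation cannot be introduced by accident.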

**Since:** v0.9.5

### Nested lock deadlock (flock non-reentrancy)

**Location:** `tmux-router/src/registry.rs`

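`flock` locks are not reentrant on Linux: a lock belongs to the open file description, so a second acquisition through a freshly opened descriptor blocks even when the same thread already holds the lock. If `prune()` were called from a context that already holds `RegistryLock`, it would deadlock.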

**Mitigation:** Thread-local `REGISTRY_LOCK_HELD` flag. `RegistryLock::acquire()` checks the flag and returns an error if already held. `acquire_or_skip()` returns `None` with a warning instead. `prune()` uses `acquire_or_skip()` so it becomes a no-op when called from within a locked context.
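
A minimal sketch of the guard logic follows. The names `REGISTRY_LOCK_HELD`, `acquire()`, `acquire_or_skip()`, and `prune()` come from the description above; the error type, the return value of `prune()`, and the omission of the actual flock call are simplifications made for illustration.

```rust
use std::cell::Cell;

thread_local! {
    // True while the current thread holds the registry lock.
    static REGISTRY_LOCK_HELD: Cell<bool> = Cell::new(false);
}

/// Stand-in guard; the real type would also own the flock'd file handle.
pub struct RegistryLock {
    _private: (),
}

impl RegistryLock {
    /// Fails fast instead of blocking when this thread already holds the lock.
    pub fn acquire() -> Result<RegistryLock, String> {
        REGISTRY_LOCK_HELD.with(|held| {
            if held.get() {
                return Err("RegistryLock already held on this thread".to_string());
            }
            // A real implementation would take the flock here.
            held.set(true);
            Ok(RegistryLock { _private: () })
        })
    }

    /// Returns `None` with a warning when the lock is already held.
    pub fn acquire_or_skip() -> Option<RegistryLock> {
        match Self::acquire() {
            Ok(lock) => Some(lock),
            Err(reason) => {
                eprintln!("warning: skipping locked operation: {reason}");
                None
            }
        }
    }
}

impl Drop for RegistryLock {
    fn drop(&mut self) {
        // Clear the flag so the next acquire on this thread can succeed.
        REGISTRY_LOCK_HELD.with(|held| held.set(false));
    }
}

/// Hypothetical `prune()`: returns whether pruning actually ran.
pub fn prune() -> bool {
    match RegistryLock::acquire_or_skip() {
        Some(_lock) => true, // pruning would happen here, under the lock
        None => false,       // already inside a locked context: no-op
    }
}
```

Clearing the flag in `Drop` keeps it in sync with the guard's lifetime, including on early returns and panics that unwind.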

**Since:** v0.9.6

### Lazy parallelization (skill-level)

**Location:** `.claude/skills/agent-doc/SKILL.md`, `agent-doc/src/submit.rs`

When the user submits multiple documents (`/agent-doc A`, `/agent-doc B`), the skill must decide whether to process them sequentially or in parallel. Claude Code processes messages sequentially in the main context, so document B blocks until A completes.

**Execution model:**
- **Single document (default):** Process directly in the main agent context. No subagent, no token overhead.
- **Parallel documents:** Use `Agent(run_in_background: true)` for the 2nd+ document. Each background subagent processes its document cycle independently.
- **Same document re-submit:** Filesystem locks serialize access. The second invocation blocks until the first completes.

**Safety guarantees (tested):**
1. Different files: Independent flock + atomic rename — no shared lock contention, no interference (`parallel_different_files_no_interference`)
2. Same file: flock serializes the read-modify-write cycle — both writes land in order (`same_file_serialized_by_flock`)
3. No partial reads: flock prevents a reader from seeing a half-written document (`flock_prevents_partial_read_during_write`)

**Since:** v0.9.6

## Mitigated (Low Residual Risk)

### Watch daemon write window

**Location:** `agent-doc/src/submit.rs`, `agent-doc/src/watch.rs`

If the user saves a file at the exact moment the daemon is writing back the agent response, the write could clobber the user's save. The 3-way merge re-reads the file before writing, but there is a micro-window between re-read and write.

**Mitigations (two layers):**
1. **Atomic rename:** Document writes use `tempfile::NamedTempFile` + `persist()` (rename). The rename is atomic from the filesystem's perspective: readers see either the old file or the new one, never a mix, which eliminates partial-read hazards.
2. **Advisory flock:** `submit::run()` acquires an advisory lock on `<file>.md.agent-doc.lock` before the re-read/write/snapshot sequence. This serializes concurrent `agent-doc` processes writing to the same document (e.g., watch daemon vs. manual run). Editors do not respect advisory locks, so this only protects agent-doc-vs-agent-doc races.
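
The first layer can be approximated with the standard library alone. The sketch below is not the crate's implementation, which uses `tempfile::NamedTempFile` and `persist()`; it writes a sibling temp file by hand and renames it into place. The function name `write_atomic` and the temp-file naming scheme are assumptions.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to a sibling temp file, then rename it over `path`.
pub fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    let file_name = path.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "path has no file name")
    })?;

    // Keep the temp file in the same directory so the rename never has to
    // cross a filesystem boundary.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".tmp.{}", std::process::id()));
    let tmp_path = path.with_file_name(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(contents)?;
    tmp.sync_all()?; // flush data before the rename publishes it
    drop(tmp);

    // Publish the new content in one step; clean up the temp file on failure.
    fs::rename(&tmp_path, path).map_err(|err| {
        let _ = fs::remove_file(&tmp_path);
        err
    })
}
```

A concurrent reader that opens `path` sees either the previous content or the new content in full. The rename does nothing to stop an editor from saving over the file a moment later, which is why the advisory lock is only the second of two layers and the residual risk remains.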

**Since:** v0.9.6

### claim validate-to-register TOCTOU gap

**Location:** `agent-doc/src/claim.rs`

`validate_file_claim()` acquires `RegistryLock`, removes stale claims, and releases the lock. `register_full()` then acquires the lock again and inserts the new claim. Another process could claim the same file in the gap between the two lock acquisitions.

**Residual risk:** Negligible. `register_full()` deduplicates entries pointing to the same pane, so the system self-heals. In the rare case that the race occurs, the stale-claim removal in `validate_file_claim()` may be wasted work, but no data is lost.

### Snapshot write atomicity

**Location:** `agent-doc/src/snapshot.rs`

Snapshot writes now use atomic rename (tempfile + persist) in addition to flock, ensuring that concurrent readers never see a partial snapshot file.

**Since:** v0.9.6