zynk 1.0.0

Portable protocol and helper CLI for multi-agent collaboration.

# ADR 014: Debug Workflow

**Status:** accepted
**Date:** 2026-05-28
**Participants:** Claude (Opus 4.7), Codex (gpt-5.5 xhigh)
**Operator gate:** Zevs

## Context

ADR 010 defined Brainstorm mode. ADR 011 defined Decide mode. ADR 012 defined Review mode. ADR 013 defined Validate mode. Debug mode is the collaborative troubleshooting workflow for cases where observed behavior diverges from expected behavior and the cause is not yet known.

Debug mode must answer:

- How should agents state the symptom and expected behavior?
- How are facts frozen before speculation?
- How do agents propose, rank, test, and retire hypotheses?
- How should Codex and Claude divide investigation work without contaminating state?
- How are findings shared?
- When does debugging stop?
- How are workaround, fix, and root cause distinguished?
- When should the agents escalate to Zevs?
- How does the mode exit?

This ADR defines the local Debug workflow. Global mode-switching syntax remains deferred to ADR 020.

## Options considered

### Option A: Freeze facts, test hypotheses, avoid premature fixes

Enter Debug mode with:

- Symptom.
- Expected behavior.
- Observed behavior.
- Reproduction steps.
- Environment.
- Recent changes.
- Impact.

Workflow:

1. Freeze known facts before proposing fixes.
2. Create a hypothesis ledger with owner, variable, test, evidence, and status.
3. Split work by independent variables or layers.
4. Test one hypothesis at a time.
5. Share findings with evidence references.
6. Avoid fixes until a cause is supported.

Escalate to Zevs when:

- Reproduction is unavailable.
- Permissions or data are missing.
- Business preference is needed.
- Two evidence-backed hypotheses conflict.

Exit:

- `root-cause-found`
- `workaround-found`
- `inconclusive`
- `escalated`
- `mode-switch` to Validate, Review, or Decide

Strengths:

- Systematic and bias-resistant.
- Keeps fixes tied to evidence.
- Uses shared ledgers for auditability.

Trade-offs:

- Hypothesis states are underspecified.
- Parallelization can contaminate shared state if not controlled.
- Stop conditions depend on judgment.
- Workaround and fix can be conflated.

### Option B: Structured debugging with facts ledger, hypothesis states, and stop triggers

Use an explicit facts ledger, hypothesis state machine, parallelization taxonomy, hypothesis ranking, stop triggers, and workaround-vs-fix distinction.

Facts ledger:

- `symptom_transcript`
- `reproduction_state`: `reliable | flaky | one-off | from-logs-only`
- `env_snapshot`
- `version_refs`
- `recent_changes_list`
- `impact_scope`

Facts are versioned if they change during debugging. For example, reproduction state may change from `flaky` to `reliable` after isolation.

Hypothesis states:

- `proposed`: stated but not tested.
- `testing`: actively being probed.
- `supported`: the test result is consistent with the hypothesis, and the hypothesis still stands.
- `refuted`: the test result contradicts the hypothesis and eliminates it.
- `inconclusive`: test was ambiguous, flaky, or incomplete.
- `parked`: deprioritized but not eliminated, with reason.

State transitions:

- `proposed -> testing -> supported | refuted | inconclusive`
- any state -> `parked`
- `supported -> refuted` if later evidence overturns it

`escalated` is not a hypothesis state; it is a debug-session disposition.
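
The transition rules above can be sketched as a small lookup table. This is only an illustration, not part of the protocol: the names `ALLOWED` and `can_transition` are hypothetical. Transitions out of `parked`, `refuted`, or `inconclusive` (other than into `parked`) are rejected here because this option does not list them.

```python
# Hypothetical encoding of the hypothesis state transitions listed above.
STATES = {"proposed", "testing", "supported", "refuted", "inconclusive", "parked"}

# Explicitly listed transitions; "any state -> parked" is handled separately.
ALLOWED = {
    "proposed": {"testing"},
    "testing": {"supported", "refuted", "inconclusive"},
    "supported": {"refuted"},  # later evidence may overturn a supported hypothesis
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the listed rules permit moving from current to target."""
    if current not in STATES or target not in STATES:
        # "escalated" lands here: it is a session disposition, not a hypothesis state.
        return False
    if target == "parked":
        return True  # any state may be parked
    return target in ALLOWED.get(current, set())
```

For example, `can_transition("proposed", "supported")` is rejected because a hypothesis has to pass through `testing` first.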

Parallelization patterns:

- `sequential-single`: one hypothesis at a time; both agents work on it.
- `parallel-independent-layers`: agents test different confirmed-independent layers.
- `parallel-same-target-different-angle`: agents test the same hypothesis with different probes.
- `pipeline`: one agent tests the current hypothesis while the other prepares the next.

Defaults:

- Use `sequential-single` for standard debugging.
- Use `parallel-independent-layers` for deep debugging when layers are confirmed independent.
- Use `pipeline` for quick low-stakes debugging.

Hypothesis ranking:

- `likelihood`: how plausible the hypothesis is.
- `cost_to_test`: how expensive or risky it is to test.
- `impact_if_true`: whether it explains the root cause or only a side issue.

Test order favors high-likelihood, low-cost hypotheses first, then moves down the cost gradient.
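
One possible reading of this ordering is a lexicographic sort: likelihood first (high to low), then cost (low to high), with impact as a tie-breaker. The sketch below assumes integer scores from 1 to 5, which this ADR does not define. The names `RankedHypothesis` and `rank_for_testing` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RankedHypothesis:
    id: str
    likelihood: int      # assumed scale: 1 (unlikely) to 5 (very likely)
    cost_to_test: int    # assumed scale: 1 (cheap) to 5 (expensive or risky)
    impact_if_true: int  # assumed scale: 1 (side issue) to 5 (explains root cause)


def rank_for_testing(hypotheses):
    """Sort by likelihood (high first), then cost (low first), then impact (high first)."""
    return sorted(
        hypotheses,
        key=lambda h: (-h.likelihood, h.cost_to_test, -h.impact_if_true),
    )


candidates = [
    RankedHypothesis("H1", likelihood=2, cost_to_test=1, impact_if_true=5),
    RankedHypothesis("H2", likelihood=5, cost_to_test=4, impact_if_true=5),
    RankedHypothesis("H3", likelihood=5, cost_to_test=1, impact_if_true=3),
]
print([h.id for h in rank_for_testing(candidates)])  # ['H3', 'H2', 'H1']
```

A weighted score would be an equally valid choice; the lexicographic key is used here only because it keeps the ordering easy to audit.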

Stop triggers:

- Root cause found and verified through Validate mode.
- All plausible hypotheses are refuted with evidence and no new candidates emerge after one fresh-thinking round.
- The time or cost cap is reached, or investigation cost exceeds the bug's impact.
- Workaround exists and remaining investigation cost exceeds impact value, with operator agreement.
- Operator says stop or accepts current state.
- The issue is not a bug but a design choice, so switch to Decide mode.

Workaround vs fix:

- A workaround mitigates symptoms without addressing the root cause.
- A fix addresses the root cause.
- `workaround-found` does not mean `root-cause-found`.

Strengths:

- Makes hypothesis tracking explicit.
- Reduces endless debugging loops.
- Controls unsafe parallelism.
- Keeps workaround claims honest.
- Preserves changing facts as the investigation evolves.

Trade-offs:

- More ledger work than quick debugging.
- Ranking adds subjective scoring.
- Parallelization choice requires upfront thought.
- Full discipline may be heavy for small bugs.

### Option C: Scaled debug workflow with facts, hypothesis ledger, and stop discipline

Combine Option A's clean freeze-hypothesize-test flow with Option B's facts ledger, hypothesis states, parallelization patterns, ranking, stop triggers, and workaround-vs-fix distinction. Scale by rigor level.

Rigor levels:

- `quick`: minimal facts, `sequential-single` or `pipeline`, three states (`proposed`, `supported`, `refuted`), soft-stop only. If a quick-rigor test is ambiguous, mark the quick session `inconclusive` and either stop or upgrade to `standard`; do not force the result into `supported` or `refuted`.
- `standard`: facts ledger, default `sequential-single`, six hypothesis states, basic ranking, evidence references, explicit stop triggers.
- `deep`: versioned facts ledger, required parallelization choice, required ranking, full six-state hypothesis ledger, stop triggers, workaround-vs-fix distinction, and Validate-mode confirmation before claiming root cause.

Default:

- Use `standard` for most collaborative debugging.
- Use `quick` for low-risk local issues with reliable reproduction.
- Use `deep` for production incidents, data loss, security-sensitive issues, flaky bugs, race conditions, high-impact failures, or bugs with shared-state contamination risk.
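
The defaults above can be sketched as a small selection helper. The trait labels and the function name `default_rigor` are hypothetical; the ADR lists the conditions in prose but does not assign identifiers to them.

```python
# Hypothetical trait labels for the conditions that call for deep rigor.
DEEP_TRAITS = {
    "production-incident",
    "data-loss",
    "security-sensitive",
    "flaky",
    "race-condition",
    "high-impact",
    "shared-state-contamination-risk",
}


def default_rigor(traits, low_risk_local=False, reliable_repro=False):
    """Pick a default rigor level; agents may still override it."""
    if set(traits) & DEEP_TRAITS:
        return "deep"
    if low_risk_local and reliable_repro:
        return "quick"
    return "standard"
```

Deep-rigor traits are checked first, so a low-risk local issue that turns out to be flaky still defaults to `deep`.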

Debug frame:

- `MODE: debug`
- `rigor: quick | standard | deep`
- symptom
- expected behavior
- observed behavior
- reproduction steps or reproduction state
- environment
- recent changes
- impact
- initial constraints
- time or cost cap, if known

Workflow:

1. Frame the symptom and expected behavior.
2. Freeze the facts ledger before proposing fixes.
3. Identify reproduction state.
4. Draft and rank hypotheses.
5. Declare parallelization pattern.
6. Test hypotheses in ranked order, one variable per test where possible.
7. Share evidence references and update hypothesis states.
8. When a cause appears supported, switch to Validate mode to confirm it.
9. Only after validation, proceed to fix planning, Review, or Decide as appropriate.
10. Stop or escalate when a stop trigger fires.

Messages exchanged:

- `type=propose` opens the debug frame.
- `type=status-update` shares facts, hypothesis state changes, or findings.
- `type=request-action` asks another agent to run a specific probe by a due condition.
- `type=answer` reports a probe result or answers a debug question.
- `type=disagree` records conflicting interpretations of evidence.
- `type=request-review` asks for review of the hypothesis ledger, proposed fix, or evidence.
- `type=error` reports malformed debug frames or unsafe missing fields.

Facts ledger fields:

- `symptom_transcript`
- `expected_behavior`
- `observed_behavior`
- `reproduction_steps`
- `reproduction_state`
- `env_snapshot`
- `version_refs`
- `recent_changes_list`
- `impact_scope`
- `constraints`
- `fact_version`
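
A minimal sketch of a versioned facts ledger follows. The field names come from the list above, but the field types, the integer `fact_version` counter, and the `revise` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Tuple

REPRO_STATES = {"reliable", "flaky", "one-off", "from-logs-only"}


@dataclass(frozen=True)
class FactsLedger:
    symptom_transcript: str
    expected_behavior: str
    observed_behavior: str
    reproduction_steps: str
    reproduction_state: str
    env_snapshot: str
    version_refs: str
    recent_changes_list: Tuple[str, ...]
    impact_scope: str
    constraints: str
    fact_version: int = 1  # assumed to be a simple counter

    def __post_init__(self):
        if self.reproduction_state not in REPRO_STATES:
            raise ValueError(f"unknown reproduction_state: {self.reproduction_state}")


def revise(ledger: FactsLedger, **changes) -> FactsLedger:
    """Return a new ledger version; the earlier version stays untouched."""
    return replace(ledger, **changes, fact_version=ledger.fact_version + 1)
```

Because the dataclass is frozen, a change such as `revise(v1, reproduction_state="reliable")` yields a second version while `v1` remains available for audit.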

Hypothesis ledger fields:

- `id`
- `hypothesis`
- `owner`
- `variable`
- `rank_likelihood`
- `rank_cost_to_test`
- `rank_impact_if_true`
- `test`
- `evidence_ref`
- `state`
- `state_reason`
- `next_action`
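
The hypothesis ledger fields could be carried in a record like the one below. This is a sketch under assumptions: the integer rank fields, the default values, and the `record` method are hypothetical, and transition checking is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HypothesisEntry:
    id: str
    hypothesis: str
    owner: str
    variable: str
    rank_likelihood: int       # assumed numeric score
    rank_cost_to_test: int     # assumed numeric score
    rank_impact_if_true: int   # assumed numeric score
    test: str
    evidence_ref: Optional[str] = None
    state: str = "proposed"
    state_reason: str = ""
    next_action: str = ""

    def record(self, state, state_reason, next_action, evidence_ref=None):
        """Update the entry; a state change without a reason is rejected."""
        if not state_reason.strip():
            raise ValueError("a state change needs a reason")
        self.state = state
        self.state_reason = state_reason
        self.next_action = next_action
        if evidence_ref is not None:
            self.evidence_ref = evidence_ref
```

Keeping `state_reason` mandatory mirrors the requirement that a parked or refuted hypothesis always explains why it left active testing.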

Parallelization guardrails:

- Use `sequential-single` when tests mutate shared state, the bug is a race condition, or repro is flaky.
- Use `parallel-independent-layers` only after layers are confirmed independent.
- Use `parallel-same-target-different-angle` only for different read paths or independent observations with no shared mutable state and no side effects on each other.
- Use `pipeline` when preparation work can happen without touching the system under test.
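
The guardrails can be read as a filter over the four patterns. The sketch below is one strict interpretation, not a rule stated by the ADR: when any `sequential-single` condition holds, it rules out every other pattern, including `pipeline`. The function and parameter names are hypothetical.

```python
def allowed_patterns(
    *,
    mutates_shared_state=False,
    race_condition=False,
    flaky_repro=False,
    layers_confirmed_independent=False,
    probes_are_isolated_reads=False,
    prep_avoids_system_under_test=False,
):
    """Return the set of parallelization patterns the guardrails permit."""
    patterns = {"sequential-single"}  # always available
    if mutates_shared_state or race_condition or flaky_repro:
        return patterns  # strict reading: nothing else is safe
    if layers_confirmed_independent:
        patterns.add("parallel-independent-layers")
    if probes_are_isolated_reads:
        # different read paths, no shared mutable state, no mutual side effects
        patterns.add("parallel-same-target-different-angle")
    if prep_avoids_system_under_test:
        patterns.add("pipeline")
    return patterns
```

Agents would still choose one pattern from the returned set and declare it before testing.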

Disposition:

- `root-cause-found`: cause is supported and validated.
- `workaround-found`: symptom mitigation exists, but root cause remains unresolved or unvalidated.
- `fix-ready`: root cause is validated and a fix path is clear. This implies `root-cause-found` as a prerequisite; the two dispositions are sequential, not parallel.
- `inconclusive`: no supported cause after stop triggers or evidence exhaustion.
- `escalated`: Zevs must provide missing data, preference, access, or stop/continue decision.
- `mode-switch`: move to Validate for confirmation, Review for proposed fix quality, Decide for design/preference choices, or Brainstorm for missing hypotheses.
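
The preconditions for the evidence-dependent dispositions can be sketched as a guard. The function name and boolean parameters are hypothetical. `inconclusive`, `escalated`, and `mode-switch` are accepted without checks here because their conditions depend on judgment that a boolean flag would not capture.

```python
def disposition_allowed(
    disposition,
    *,
    cause_validated=False,
    fix_path_clear=False,
    mitigation_exists=False,
):
    """Check whether a session may claim the given disposition."""
    if disposition == "root-cause-found":
        return cause_validated
    if disposition == "fix-ready":
        # root-cause-found is a prerequisite for fix-ready
        return cause_validated and fix_path_clear
    if disposition == "workaround-found":
        # mitigation exists, but the root cause is unresolved or unvalidated
        return mitigation_exists and not cause_validated
    return disposition in {"inconclusive", "escalated", "mode-switch"}
```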

Strengths:

- Keeps debugging evidence-driven.
- Makes hypothesis status and stop rules explicit.
- Prevents unsafe parallel debugging from contaminating shared state.
- Preserves the distinction between mitigation and root-cause fix.
- Continues the `quick | standard | deep` pattern from ADR 010-013.

Trade-offs:

- More bookkeeping than ad hoc debugging.
- Hypothesis ranking and impact scoring require judgment.
- Deep debugging can slow down urgent mitigations.
- Validate-mode confirmation adds a workflow step before claiming success.

## Decision

Accepted decision: Option C.

Current positions:

- Codex initial proposal: Option A.
- Claude counterproposal: Option B.
- Convergence candidate: Option C.
- Claude's provisional vote: Option C because the rigor-tier pattern continues from ADR 010-013 and deep production debugging needs full discipline.
- Codex currently prefers Option C because it keeps quick debugging possible while making serious debugging systematic, evidence-led, and stop-bounded.
- Claude approved the draft and confirmed its preferred final option is Option C.
- Codex folded in Claude's non-blocking polish on `fix-ready` ordering, ambiguous quick-rigor results, and concrete guardrails for `parallel-same-target-different-angle`.
- Zevs's standing approval applies once Codex and Claude agree.

Rationale: Option C preserves a fast path for small debugging while requiring serious debugging to be fact-led, hypothesis-tracked, stop-bounded, and validated before claiming root cause or fix readiness.

## Consequences

- Positive: Debug mode separates facts, hypotheses, tests, evidence, and fixes.
- Positive: Explicit stop triggers reduce endless debugging churn.
- Positive: Parallelization guardrails reduce state contamination.
- Positive: Workaround and fix are no longer conflated.
- Trade-off: Debug sessions require ledger maintenance.
- Negative: Deep debugging can add ceremony before a fix is attempted.

## How to apply

To enter Debug mode before ADR 020 finalizes global mode syntax, send a short `type=propose` message:

```text
[from-codex via herdr]
[herdr from=codex:w652dc9b3ded432-2 to=claude:w652dc9b3ded432-1 mid=<mid> type=propose]
BODY: MODE: debug; rigor=standard; symptom=<symptom>; expected=<expected>; observed=<observed>; repro=<steps-or-state>; env=<environment>; impact=<scope>; recent_changes=<summary>.
```
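
As an illustration, the `BODY` line above could be assembled by a helper like the following. The function `debug_frame_body` is hypothetical and not part of any existing CLI; it reproduces only the key order shown in the example and does not escape `;` inside values.

```python
FRAME_KEYS = ("symptom", "expected", "observed", "repro", "env", "impact", "recent_changes")
RIGOR_LEVELS = ("quick", "standard", "deep")


def debug_frame_body(rigor, **fields):
    """Build the BODY line of a Debug-mode propose message."""
    if rigor not in RIGOR_LEVELS:
        raise ValueError(f"unknown rigor: {rigor}")
    missing = [key for key in FRAME_KEYS if not fields.get(key)]
    if missing:
        raise ValueError(f"missing frame fields: {', '.join(missing)}")
    parts = [f"{key}={fields[key]}" for key in FRAME_KEYS]
    return f"BODY: MODE: debug; rigor={rigor}; " + "; ".join(parts) + "."
```

Rejecting a frame with missing fields at build time keeps the receiving agent from having to reply with `type=error`.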

Before testing:

- Freeze the facts ledger.
- Declare reproduction state.
- Draft hypotheses.
- Choose parallelization pattern.
- Rank hypotheses by likelihood, cost to test, and impact if true.

When sharing findings:

- Reference the hypothesis id.
- State the probe or observation.
- Include evidence reference.
- Update hypothesis state with reason.
- State next action.
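
A finding can be checked against this list before it is sent. The sketch below assumes a finding is passed around as a plain dictionary; the key names and the helper `missing_finding_fields` are hypothetical.

```python
# Hypothetical keys mirroring the checklist for sharing findings.
REQUIRED_FINDING_FIELDS = (
    "hypothesis_id",
    "probe",
    "evidence_ref",
    "state",
    "state_reason",
    "next_action",
)


def missing_finding_fields(finding):
    """Return the required keys that are absent or blank in a finding."""
    return [
        key
        for key in REQUIRED_FINDING_FIELDS
        if not str(finding.get(key) or "").strip()
    ]
```

An empty result means the finding carries everything the checklist asks for; otherwise the returned keys say what to add before sending the `type=status-update` message.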

Before claiming a fix:

- Distinguish workaround from fix.
- Validate the supported root cause through Validate mode.
- Record residual risk and follow-up work.

## Related

- `outputs/decisions/001-provenance.md`
- `outputs/decisions/002-agent-identity.md`
- `outputs/decisions/003-message-structure.md`
- `outputs/decisions/004-identity-and-message-profile.md`
- `outputs/decisions/010-brainstorm.md`
- `outputs/decisions/011-decide.md`
- `outputs/decisions/012-review.md`
- `outputs/decisions/013-validate.md`
- Future ADR 020: Mode Switching
- Future ADR 021: Conflict Resolution