# Worked Example: Debugging Loop in the Reactive Semantic Runtime
## Context
The proposal at `/home/hetoku/Downloads/reactive_semantic_runtime_proposal.md` is at the manifesto stage — sharp diagnosis, evocative vocabulary ("molecules", "reactions"), but the *how* is mostly metaphor. The two questions left unanswered are:
1. **Schema problem.** Where do molecule types come from, and is the runtime expressive enough once they're fixed?
2. **Determinism boundary.** Where does non-determinism (LLM calls) sit relative to the incremental-recomputation machinery? This determines whether differential-dataflow (DD) style incrementality delivers anything.
Rather than refining the manifesto in the abstract, the most useful next step is to **force the abstraction to meet a concrete task end-to-end**. If the worked example is clean, the framework has legs. If it hits friction at every join, the proposal needs revision before any code is written.
This plan is the worked example. It picks one task — **a regression-debugging loop** — and writes it out as molecule schemas, reactions, and LLM call points, then explicitly calls out what the exercise reveals about the proposal's gaps.
The output is a design document, not code. No implementation files are touched; the only writes are to this plan file.
---
## The Task
**Scenario.** A developer pushes a code change. A test that previously passed now fails. The runtime should:
1. Detect the regression and collect surrounding context (recent edits, failure trace, blame info).
2. Synthesize one or more hypotheses about the cause.
3. Verify each hypothesis in parallel (static analysis, targeted re-runs, git archaeology).
4. Rank hypotheses by accumulated evidence.
5. Generate a patch candidate for the top-ranked hypothesis.
6. Re-evaluate incrementally when new evidence arrives (a new test run, a new edit, a new file save) — without re-doing settled work.
This task is well-chosen because it exercises every claim in the proposal: joins (failure × edits), fanouts (hypothesis → verifications), aggregations (evidence → ranking), parallel reasoning (concurrent verification), incremental recomputation (new event → minimal redo), and the LLM-as-operator boundary (synthesis & patching are LLM-driven; everything else is deterministic).
---
## Molecule Schemas
```text
// Inputs (produced by external sensors: file watcher, test runner, git hook)
TestFailure { id, test_name, message, stack_trace, ts }
TestPass { id, test_name, ts }
FileChange { id, path, diff_summary, author, ts }
// Derived deterministic views
RecentEdit { path, ts, change_id } // windowed projection of FileChange
SuspectedRegression { test_id, change_id, ts } // join: failure ⋈ edit-within-window
// LLM-produced
Hypothesis { id, regression_id, description,
             suspected_cause, suspected_files: [path],
             prior_confidence: float }
// Verification fanout
VerificationTask { id, hypothesis_id, kind, status }
    // kind ∈ {static_analysis, targeted_test, git_blame, dep_diff}
Evidence { id, hypothesis_id, kind, supports: bool, weight: float, detail }
// Aggregation
RankedHypothesis { hypothesis_id, score, evidence_ids: [id], rank: int }
// Output
PatchCandidate { id, hypothesis_id, files: [{path, diff}], rationale }
```
**Note on the schema.** None of these are auto-discovered. They are designed up-front by the human author of the reaction set. This is the proposal's first hidden cost made explicit (see Findings §1).
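As a minimal sketch of what "designed up-front" means in practice, a few of the molecule types above could be written as immutable records. The choice of Python dataclasses, the concrete field types (`str`, `float`), and the use of a tuple for `suspected_files` are assumptions made for illustration; the proposal does not prescribe a representation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FileChange:
    id: str
    path: str
    diff_summary: str
    author: str
    ts: float


@dataclass(frozen=True)
class Hypothesis:
    id: str
    regression_id: str
    description: str
    suspected_cause: str
    suspected_files: Tuple[str, ...]  # a tuple keeps the record hashable
    prior_confidence: float


@dataclass(frozen=True)
class Evidence:
    id: str
    hypothesis_id: str
    kind: str          # one of the four verification kinds
    supports: bool
    weight: float
    detail: str
```

Frozen records are hashable and compare by value, which is what set-based joins and content-addressed caching need.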
---
## Reactions
### R1 — windowed projection
```text
when FileChange(c) where c.ts > now() - 1h
emit RecentEdit { path: c.path, ts: c.ts, change_id: c.id }
```
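Deterministic, provided `now()` reads the reactor's logical clock rather than the wall clock. Pure projection with a temporal window; a `RecentEdit` retracts once its `FileChange` slides out of the window.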
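The retract-on-slide behavior can be sketched by diffing the window contents at two clock readings. This recomputes both sets, which an incremental engine would avoid; it only illustrates what the insertions and retractions are. The tuple layout and the function names are hypothetical.

```python
def recent_edits(changes, now, window=3600.0):
    """Project (change_id, path, ts) tuples to RecentEdit tuples inside the window."""
    return {(path, ts, cid) for (cid, path, ts) in changes if ts > now - window}


def window_delta(changes, prev_now, now, window=3600.0):
    """Return (insertions, retractions) as the clock moves from prev_now to now."""
    before = recent_edits(changes, prev_now, window)
    after = recent_edits(changes, now, window)
    return after - before, before - after


changes = [("c1", "src/a.py", 1000.0), ("c2", "src/b.py", 4000.0)]
# Between t=4100 and t=4700 the edit at t=1000 leaves the one-hour window.
added, retracted = window_delta(changes, prev_now=4100.0, now=4700.0)
```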
### R2 — suspect join
```text
when TestFailure(f), RecentEdit(e)
where any_overlap(stack_files(f), e.path) AND e.ts ∈ [f.ts - 1h, f.ts]
emit SuspectedRegression { test_id: f.id, change_id: e.change_id, ts: f.ts }
```
Deterministic. Classic differential-dataflow join. The window means SuspectedRegression molecules retract automatically when their inputs age out.
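A naive, non-incremental version of the R2 predicate is sketched below as a nested loop. `stack_files` is a hypothetical helper that assumes Python-style tracebacks, and the dictionary shapes are illustrative; a differential-dataflow engine would maintain the same result set through indexed deltas instead of rescanning.

```python
import re


def stack_files(stack_trace):
    """Extract file paths from a Python-style traceback (hypothetical helper)."""
    return set(re.findall(r'File "([^"]+)"', stack_trace))


def suspect_join(failures, edits, window=3600.0):
    """Pair each failure with edits that touch its stack and precede it within the window."""
    out = set()
    for f in failures:
        files = stack_files(f["stack_trace"])
        for e in edits:
            in_window = f["ts"] - window <= e["ts"] <= f["ts"]
            if e["path"] in files and in_window:
                out.add((f["id"], e["change_id"], f["ts"]))
    return out
```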
### R3 — hypothesis synthesis (LLM operator)
```text
when SuspectedRegression(s),
TestFailure(f) where f.id == s.test_id,
FileChange(c) where c.id == s.change_id
emit Hypothesis(...)
via LLM(prompt = synthesize_hypothesis_template(f, c),
cache_key = hash(f, c))
```
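**Non-deterministic**, but cached by content hash of inputs. If the same `(f, c)` content appears again, no LLM call is made. For that to hold, `hash(f, c)` must cover content fields only (`test_name`, `message`, `stack_trace`, `path`, `diff_summary`) and exclude `id` and `ts`; otherwise a re-run of the same failing test would miss the cache. Deterministic cache keys are the *only* way incrementality survives the LLM operator.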
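A minimal sketch of a content-addressed wrapper around a non-deterministic operator follows. Which fields enter the key is an assumption (content fields in, `id` and `ts` out), and `CachedOperator` and `content_key` are hypothetical names. The wrapped function stands in for the LLM call; no real model API is used.

```python
import hashlib
import json


def content_key(failure, change):
    """Hash content fields only; ids and timestamps are deliberately excluded."""
    payload = {
        "test_name": failure["test_name"],
        "message": failure["message"],
        "stack_trace": failure["stack_trace"],
        "path": change["path"],
        "diff_summary": change["diff_summary"],
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class CachedOperator:
    """Wraps a non-deterministic function so identical content is computed once."""

    def __init__(self, fn):
        self.fn = fn
        self.cache = {}
        self.calls = 0  # number of times the wrapped function actually ran

    def __call__(self, failure, change):
        key = content_key(failure, change)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.fn(failure, change)
        return self.cache[key]
```

With this shape, a second `TestFailure` that differs only in `id` and `ts` reuses the stored output rather than triggering another call.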
### R4 — verification fanout
```text
when Hypothesis(h)
emit VerificationTask { id: hash(h.id, "static_analysis"), hypothesis_id: h.id, kind: "static_analysis", status: "pending" },
VerificationTask { id: hash(h.id, "git_blame"), hypothesis_id: h.id, kind: "git_blame", status: "pending" },
VerificationTask { id: hash(h.id, "targeted_test"), hypothesis_id: h.id, kind: "targeted_test", status: "pending" },
VerificationTask { id: hash(h.id, "dep_diff"), hypothesis_id: h.id, kind: "dep_diff", status: "pending" }
```
Deterministic fanout: four verifications per hypothesis. Task ids are derived from `(h.id, kind)` rather than generated randomly, so re-firing R4 for the same hypothesis reproduces the same molecules instead of minting new ones.
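The fanout can be sketched with ids derived from `(hypothesis_id, kind)`, under the assumption that stable ids are wanted so that a replayed fanout produces identical molecules. The 16-character truncation and the names `task_id` and `fan_out` are illustrative choices.

```python
import hashlib

KINDS = ("static_analysis", "git_blame", "targeted_test", "dep_diff")


def task_id(hypothesis_id, kind):
    """Derive a stable id from the hypothesis id and verification kind."""
    digest = hashlib.sha256(f"{hypothesis_id}:{kind}".encode("utf-8")).hexdigest()
    return digest[:16]


def fan_out(hypothesis_id):
    """Emit one pending VerificationTask per kind."""
    return [
        {
            "id": task_id(hypothesis_id, kind),
            "hypothesis_id": hypothesis_id,
            "kind": kind,
            "status": "pending",
        }
        for kind in KINDS
    ]
```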
### R5 — verification execution (tool operator, parallel)
```text
when VerificationTask(v) where v.status == "pending"
emit Evidence(...)
via tool(v.kind)(v.hypothesis_id)
```
Parallel execution: the scheduler dispatches all pending VerificationTasks concurrently. Tools (static analysis, git, test runner) are assumed deterministic given their inputs, which a flaky targeted test would violate, and can be cached by content hash of `(kind, hypothesis_inputs)`.
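Concurrent dispatch of pending tasks might look like the sketch below, which uses `asyncio.gather` purely for illustration. `run_tool` is a hypothetical stand-in that sleeps instead of invoking a real analyzer, and the evidence it returns is placeholder data.

```python
import asyncio


async def run_tool(kind, hypothesis_id):
    """Stand-in for a real tool; sleeps briefly to simulate I/O."""
    await asyncio.sleep(0.01)
    return {"hypothesis_id": hypothesis_id, "kind": kind, "supports": True, "weight": 1.0}


async def execute_pending(tasks):
    """Run every pending task concurrently and collect Evidence records."""
    pending = [t for t in tasks if t["status"] == "pending"]
    results = await asyncio.gather(
        *(run_tool(t["kind"], t["hypothesis_id"]) for t in pending)
    )
    return list(results)


tasks = [
    {"hypothesis_id": "h1", "kind": "static_analysis", "status": "pending"},
    {"hypothesis_id": "h1", "kind": "git_blame", "status": "pending"},
    {"hypothesis_id": "h1", "kind": "dep_diff", "status": "done"},
]
evidence = asyncio.run(execute_pending(tasks))
```

`asyncio.gather` returns results in the order the awaitables were passed, so evidence lines up with the pending tasks even when tools finish out of order.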
### R6 — ranking aggregation
```text
when Hypothesis(h), Evidence*(e where e.hypothesis_id == h.id)
emit RankedHypothesis {
hypothesis_id: h.id,
score: weighted_sum(e.weight * (e.supports ? 1 : -1) for e),
evidence_ids: [e.id ...],
rank: dense_rank() over score desc
}
```
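Deterministic aggregation. Re-fires incrementally as new Evidence arrives: only the affected hypothesis's score is recomputed. Because `rank` is relative, other hypotheses' ranks can still shift even though their scores and evidence are untouched.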
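The scoring and dense-rank step can be sketched in a few lines. This is a batch recomputation over all hypotheses, shown only to pin down the arithmetic; the input shape (a dict from hypothesis id to evidence dicts) is an assumption.

```python
def score(evidence):
    """Signed weighted sum: supporting evidence adds weight, refuting evidence subtracts it."""
    return sum(e["weight"] * (1 if e["supports"] else -1) for e in evidence)


def rank_hypotheses(evidence_by_hypothesis):
    """Dense rank by score, descending: ties share a rank and no rank is skipped."""
    scores = {h: score(ev) for h, ev in evidence_by_hypothesis.items()}
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of = {s: i + 1 for i, s in enumerate(distinct)}
    return {h: {"score": s, "rank": rank_of[s]} for h, s in scores.items()}
```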
### R7 — patch generation (LLM operator, top-1 only)
```text
when RankedHypothesis(r) where r.rank == 1 AND r.score > τ
emit PatchCandidate(...)
via LLM(prompt = patch_template(r), cache_key = hash(r.hypothesis_id, r.evidence_ids))
```
**Non-deterministic, cached.** Only the top-ranked hypothesis triggers a patch attempt. If a later piece of evidence demotes it, the PatchCandidate retracts (this is DD-native behavior, not extra logic).
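The gating and retraction logic of R7 can be made concrete with a small delta function. The tie-break by lexicographic id is an assumption that the reaction text does not specify, and `patch_delta` is a hypothetical name.

```python
def patch_delta(prev_top, ranked, tau):
    """Return (new_top, retract, generate) given the previously patched hypothesis id.

    ranked maps hypothesis id -> {"score": float, "rank": int}.
    """
    eligible = [h for h, r in ranked.items() if r["rank"] == 1 and r["score"] > tau]
    new_top = min(eligible) if eligible else None  # tie-break by id (assumption)
    if new_top == prev_top:
        return new_top, None, None   # patch survives; no LLM call needed
    return new_top, prev_top, new_top  # retract the old patch, generate for the new top
```

When the top hypothesis is unchanged the function returns no work, which corresponds to the case where the existing PatchCandidate survives.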
---
## Where the LLM Lives
Two and only two LLM call points: **R3** (hypothesis synthesis) and **R7** (patch generation). Everything else is deterministic dataflow over typed records. This is the proposal's central thesis given concrete shape — and the worked example confirms it's *expressible*. The LLM is genuinely a leaf operator.
**Cache keys** are the load-bearing mechanism: both LLM reactions key on a content hash of their input molecules. As long as upstream deterministic reactions produce stable molecules, LLM outputs are reused. This is how incrementality survives non-determinism.
---
## Concurrency Model (made explicit)
The proposal hand-waves concurrency. The worked example forces a commitment:
- **Single-writer reactor with logical timestamps** (Differential Dataflow style). Each molecule carries a logical timestamp; reactions are operators over timestamped streams.
- **Arrangements** (indexed views) for the molecule store, keyed by molecule id and by join keys (`test_id`, `hypothesis_id`).
- **Parallel reaction execution** is achieved by Tokio tasks per reaction *worker*, but commits to the molecule store go through the reactor's logical timeline. This avoids write skew without distributed-consensus machinery.
- Tool/LLM calls are async and may complete out of order; their results are stamped with the *input's* logical timestamp, so downstream aggregations see a consistent view.
This is the model the proposal needs to commit to. It follows Differential Dataflow, the engine Materialize is built on, and it makes "incremental" and "parallel" coherent claims at the same time.
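The commitments above can be sketched with a toy reactor, using `asyncio` in place of Tokio purely for illustration. Workers finish out of order, each result is stamped with its input's logical timestamp, and a single consumer performs every commit. The `Reactor` class and its methods are hypothetical; arrangements and real DD timestamps are not modeled.

```python
import asyncio


class Reactor:
    """Single writer: all commits pass through one queue and one consumer."""

    def __init__(self):
        self.clock = 0
        self.store = []  # committed (logical_ts, molecule) pairs, in arrival order
        self.queue = asyncio.Queue()

    def tick(self):
        self.clock += 1
        return self.clock

    async def commit_loop(self, expected):
        for _ in range(expected):
            self.store.append(await self.queue.get())


async def worker(reactor, input_ts, name, delay):
    await asyncio.sleep(delay)                 # simulated tool or LLM latency
    await reactor.queue.put((input_ts, name))  # stamped with the input's timestamp


async def main():
    reactor = Reactor()
    t1, t2 = reactor.tick(), reactor.tick()
    await asyncio.gather(
        worker(reactor, t1, "slow", 0.03),
        worker(reactor, t2, "fast", 0.0),
        reactor.commit_loop(expected=2),
    )
    return reactor.store


store = asyncio.run(main())
```

Sorting `store` by timestamp recovers the logical order regardless of completion order, which is the property downstream aggregations rely on.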
---
## Incremental Recomputation: A Concrete Trace
Initial state: one TestFailure `f1`, one FileChange `c1` matching it. The pipeline runs end-to-end and produces `PatchCandidate p1`.
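**Event A: a new file change `c2` arrives, also overlapping `f1`'s stack.** For R2 to match, `c2.ts` must fall inside `[f1.ts - 1h, f1.ts]`, so `c2` is an edit made before the failure whose event is delivered late.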
- R1 emits `RecentEdit(c2)`.
- R2 emits new `SuspectedRegression(f1, c2)`. The existing `SuspectedRegression(f1, c1)` is unaffected.
- R3 fires for the new pair, generating `Hypothesis h2`. `h1` is untouched (cache hit on `(f1, c1)`).
- R4 fans out 4 new verifications for `h2`.
- R5 runs them in parallel. The 4 verifications for `h1` are not re-run (cached).
- R6 incrementally updates ranks: now both `h1` and `h2` are scored. If `h2`'s score exceeds `h1`'s, the rank flips.
- R7: if `h1` was top and is no longer, `PatchCandidate p1` retracts and `p2` is generated. If `h1` is still top, no LLM call is made — `p1` survives.
**Cost of event A:** 1 LLM call (R3 for the new hypothesis) + 4 tool calls (R5 for the new verifications) + 1 LLM call only if rank flipped (R7). Compare to a chat-loop agent, which would re-feed the entire transcript and re-derive everything from scratch.
This is what "incremental reasoning" buys you, made concrete.
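The cost accounting for Event A generalizes to a small formula, sketched here under the assumptions of this document: one R3 call per new hypothesis, four tool calls per hypothesis, and one R7 call only when the top rank changes. The function name is hypothetical.

```python
def incremental_cost(new_hypotheses, rank_flipped, tools_per_hypothesis=4):
    """Count calls triggered by an event that adds new_hypotheses hypotheses."""
    llm_calls = new_hypotheses + (1 if rank_flipped else 0)
    tool_calls = new_hypotheses * tools_per_hypothesis
    return {"llm_calls": llm_calls, "tool_calls": tool_calls}
```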
---
## Findings: What This Exercise Reveals About the Proposal
These are the items the proposal must address before moving to implementation:
**1. Schemas are human-designed, full stop.** The worked example required an author to pick `Hypothesis`, `VerificationTask`, `Evidence`, `RankedHypothesis` etc. and decide their fields. This is *not* automatic. The runtime is only as expressive as its schema set, and changing schemas is a breaking change to all downstream reactions. The proposal should rename "Molecule Space" to "Schema-Designed Molecule Space" and treat schema design as a first-class authoring activity, like defining a database schema.
**2. The determinism boundary is now visible.** Reactions split cleanly into deterministic (R1, R2, R4, R5, R6) and non-deterministic (R3, R7). The non-deterministic ones must be content-cached. The proposal should add a section: *"LLM operators are leaves of the reaction DAG. They are non-deterministic but content-addressable. Caching by input hash is mandatory, not optional, for incrementality to survive."*
**3. Self-modifying reactions are not needed for this task.** The worked example is fully expressible with a static reaction set. Self-modification should be marked as v2 and clearly bracketed as an experimental capability, not a core feature — it's where decidability and debuggability die.
**4. Differential-dataflow semantics solve the concurrency question.** Single-writer reactor + logical timestamps + arrangements. The proposal should commit to this explicitly. Without it, "parallel reactions over a shared molecule store" is hand-waving.
**5. Incrementality only pays off when upstream molecules are stable.** If every keystroke produced a new `FileChange` molecule that invalidated everything downstream, you'd be back to chat-loop semantics with extra steps. The proposal needs a story about *coalescing* (debouncing) input molecules — the file watcher should produce one `FileChange` per save, not per character. Same for editor cursor movements, partial test outputs, etc. This is mundane but load-bearing.
**6. There's a benchmark candidate now.** Run this debugging loop against the same task expressed as (a) a LangGraph agent, (b) a single-shot Claude call with the whole transcript, (c) this runtime. Measure: tokens consumed per `FileChange` event, latency to PatchCandidate, correctness on a held-out set of regressions. This is the concrete evaluation the proposal lacked.
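The coalescing requirement from finding 5 can be sketched as a per-path burst collapse over a time-sorted event list. The half-second quiet period and the `(ts, path)` event shape are assumptions; a real file watcher would apply the same idea to a live stream.

```python
def coalesce(events, quiet=0.5):
    """Collapse bursts of (ts, path) events into one event per burst per path.

    events must be sorted by ts. An event within `quiet` seconds of the previous
    event for the same path extends that burst; only the latest event is kept.
    """
    bursts = {}  # path -> index in `out` of that path's current burst
    out = []
    for ts, path in events:
        i = bursts.get(path)
        if i is not None and ts - out[i][0] <= quiet:
            out[i] = (ts, path)
        else:
            bursts[path] = len(out)
            out.append((ts, path))
    return out
```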
---
## Critical Files
- `/home/hetoku/Downloads/reactive_semantic_runtime_proposal.md` — the proposal under examination.
- This plan — the worked example. No code is written.
The intent is to read this plan, decide which findings warrant proposal revisions, and then either:
- iterate on the proposal text, or
- start the implementation skeleton in `/home/hetoku/data/work/thalamus` with the determinism boundary and concurrency model committed up-front.
---
## Verification
This is a design exercise, so "verification" means a structured review rather than running tests:
1. **Coverage check.** For each capability claimed in the proposal (joins, fanouts, aggregations, parallel reasoning, incremental recomputation, LLM-as-operator), point to the reaction in this document that exercises it. If a claim doesn't show up in the worked example, it's either not core or the example is too small.
2. **Schema sufficiency.** Walk through Event A (the new-file-change trace above). Does every step have a defined molecule type? If a step needs a new molecule, the schema set is incomplete.
3. **Cache-key soundness.** For both LLM reactions (R3, R7), confirm the cache key is a function of input molecule content only, not of wall-clock time, randomly generated IDs, or other unstable fields. R7 keys on `hypothesis_id` and `evidence_ids`, so those IDs must themselves be content-derived. If a cache key is unstable, incrementality is broken.
4. **Concurrency invariant.** Confirm that no reaction in the set requires reading molecules that may not yet have been written by an in-flight reaction. The DAG must be acyclic over molecule types (Hypothesis can produce VerificationTask, but VerificationTask must not produce Hypothesis). If a cycle is needed, that is recursive dataflow rather than a static pipeline; like self-modifying reactions, treat it as a v2 concern.
If all four checks pass, the worked example is internally coherent and the proposal is ready for either revision or implementation. If any fails, that's the next thing to fix in the design.
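Check 4 lends itself to mechanical verification. The sketch below encodes each reaction's input and output molecule types as read from R1 through R7 and uses the standard library's `graphlib` (Python 3.9 or later) to test for cycles. The `REACTIONS` table and `type_order` function are hypothetical names.

```python
from graphlib import CycleError, TopologicalSorter

# reaction -> (input molecule types, output molecule type), transcribed from R1..R7
REACTIONS = {
    "R1": (["FileChange"], "RecentEdit"),
    "R2": (["TestFailure", "RecentEdit"], "SuspectedRegression"),
    "R3": (["SuspectedRegression", "TestFailure", "FileChange"], "Hypothesis"),
    "R4": (["Hypothesis"], "VerificationTask"),
    "R5": (["VerificationTask"], "Evidence"),
    "R6": (["Hypothesis", "Evidence"], "RankedHypothesis"),
    "R7": (["RankedHypothesis"], "PatchCandidate"),
}


def type_order(reactions):
    """Return a topological order of molecule types, or None if the graph has a cycle."""
    deps = {}
    for inputs, output in reactions.values():
        deps.setdefault(output, set()).update(inputs)
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None
```

Adding a hypothetical reaction that turns `VerificationTask` back into `Hypothesis` makes `type_order` return `None`, which is the failure this check is meant to catch.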