# Yardlet v0.8 — Loop Engineering: Decisions & Tracks
> Session record, 2026-06-22/23. Companion to the planning draft
> (`yardlet-v0.8-loop-engineering-plan.md`). Captures what was decided, what is
> still open, the evidence behind it, and current build status.
## Framing: two parallel tracks on one foundation
- **v0.8 keeps its theme**: Project Memory + Trust Report (the plan). Not re-themed.
- **Reliability fixes** surfaced from a real multi-task drain run are a **parallel
track** (several are v0.7.x-patch-worthy). They do not hijack the v0.8 theme.
- **`finalize_run` is the shared foundation** both tracks build on, so it is built first.
```
┌─ Slice 1: finalize_run (shared foundation, behavior-preserving) ─┐
v0.8 main ──► Project Memory (2-4) ──► Trust Report (5) ──► Mining (6)
reliability ─► capability ──► review remediation ──► auto_commit/push ──► gaps + UI
```
## Decided
- **T1 — memory placement**: fold project memory into the existing harness
discovery pass and de-duplicate against native sources (CLAUDE.md / AGENTS.md /
.cursor / .claude-skills); use progressive disclosure for bulk (short index +
anchors always; bodies read on demand). Reject "separate section + tolerate
duplication." Research-backed (Claude Code memory docs, agents.md, Cursor,
Aider, Cline Memory Bank, Anthropic Agent Skills, Hermes Agent).
- **T2 — memory git placement**: human-authored memory docs are **git-tracked**
(shared, present in worktrees); the generated `index.yaml` is **gitignored**.
Runtime state (runs/checkpoints/handoffs/telemetry) stays gitignored as today.
- **Review-loop autonomy = full auto**: a review that fails a criterion auto-links
its remediation follow-up, the fix auto-runs, the review auto re-verifies, then
proceeds. Cap = **2** remediation cycles, then park as needs_user.
- **finalize_run shape**: one shared finalization pipeline with per-path,
behavior-preserving flags (parallel skips hooks/validation/conversation/learned;
recovery is minimal salvage). Recovery stays minimal in Slice 1.
- **auto_commit**: a single `yardlet.yaml` boolean (like `auto_skill`). **Push is
NOT auto** — gated via the existing approval-policy bucket (`deploy_publish_send`),
default off / explicit. Push is outward-facing (invariant I4) and matches the
owner's standing "push only on explicit request" stance. Note: approval today is
a *pre-run task gate*; auto-push happens at *finalize time*, so a new finalize-time
push hook is needed (policy vocabulary reused, enforcement point is new).
## Slices (revised after review)
- **Slice 1** — extract `finalize_run`, wire serial. **DONE** (commit `33950d1`,
115 tests green, fmt/clippy clean). Next: wire parallel + recovery (needs the
worktree-merge step + a queue-commit mode: serial=defensive-reload,
parallel=in-memory-index).
- **Slice 1b** — capability handling: graceful park (no hard error) at ingestion
AND runtime; validate/normalize `required_capabilities` against declared worker
capabilities (alias vs warn — undecided).
- **Slice 1c** — review auto-remediation (the autonomy decision above).
- **Slice 1d** — auto_commit (`yardlet.yaml` bool) + push policy hook.
- **Slice 1e** — close the gaps the finalize map revealed: parallel runs no
validation; recovery records no telemetry/handoff; **run.yaml is never sealed**.
- **Run-record seal — DONE** (commit `e9620b2`): `run.yaml` was written
`running` at spawn and never updated, so every record looked in-flight
forever — a Trust Report or run-dir scan could not tell a finished run from a
stranded one. `finalize_run` (shared by serial/parallel/recovery) now seals
it to the real terminal state + a `completed_at`, preserving `started_at`.
Surfaced by dogfooding on a live workspace (every `run.yaml` stuck at
`running`; 200+ run dirs). This is a prerequisite for the Trust
Report's "scan run dirs" metric-source option (Slice 5).
- **Parallel-path validation — DONE** (commit `6f78257`): `finalize_run` now
runs validation in the worktree (`merge.wt_path`) for a worktree run, ws.root
for the serial in-place path — BEFORE the merge, so a parallel task that
fails validation never reaches the workspace (stays Partial, worktree kept).
`FinalizeFlags::parallel()` flipped `validation: true`. Test drives
`finalize_run` with a worktree + a marker-file command that passes only from
the worktree, proving the cwd. (Not live-dogfoodable: a parallel batch needs
a clean tree, which the at-rest dogfood workspace lacks — covered by the unit
test instead.)
- **Recovery telemetry — DONE** (commit `e4ef54e`): `FinalizeFlags::recovery()`
now emits telemetry (was off, so the trust report silently undercounted every
salvaged task). `recover_orphans` reads the original worker from the stranded
`run.yaml` (new `run_worker` helper) and passes it as `worker_id`, so the row
is attributed (not "") and labeled `reason=recovery`. Artifacts/hooks stay off
(checkpoint/handoff were the dead session's job); `wall_seconds` stays 0
(original duration unknown). Test asserts one attributed+labeled row.
- Still open in 1e: **finalize-time auto-push hook** (the one remaining 1d/1e
piece: `auto_commit` lands locally; auto-push stays gated via the approval
policy `deploy_publish_send` bucket, default off — needs a finalize-time push
enforcement point since approval today is a pre-run task gate).
- **v0.8 main** (after foundation): Project Memory MVP (2-4), Trust Report (5),
Outcome Mining (6).
- **Project Memory MVP — DONE** (commit `7f55e13`): `discover_harness` now
folds `.agents/memory/*.md` docs into the harness next to rules/skills,
always on. Each doc → ONE index line (title + one-line summary + anchor,
parsed from `name:`/`description:` frontmatter or first heading/prose line);
`push_harness_sections` emits a "Project memory (read on demand)" packet
section. Bodies are NEVER inlined (T1 progressive disclosure). New `yardlet
memory` lists the index. Location = `.agents/memory/` (git-tracked harness
asset, T2); generated `index.yaml` out of scope (only `.md` read). Memory is
native to no worker → always projected (no dedup yet). Dogfood-verified
(empty workspace → "no memory yet"; populated temp → frontmatter + heading
docs parsed into the index).
- **Stale detection — DONE** (commit `2526e4e`, slice 3): a memory doc may
declare `look_at:` landmark paths (frontmatter, inline or list); `yardlet
memory` flags it "⚠ possibly stale" when a landmark changed in git AFTER
the doc did (committer-time compare). `HarnessMemory.look_at` + extended
`parse_memory_doc`; staleness computed only in the command, so the packet
hot path stays git-free. Dogfood-verified with controlled commit dates.
- **Deferred (later 2-4 slices):** native-source dedup (T1 against
CLAUDE.md/AGENTS.md — currently those are RULES, no memory overlap yet);
persisted/generated `index.yaml` cache; landmark↔task mapping (match a
task's scope paths to memory `look_at` globs); T3 out-of-repo memory; T4
existing-repo consolidation.
- **Trust Report MVP — DONE** (commit `5351352`): read-only `yardlet trust`
folds run telemetry into first-pass-Done vs Done-after-retry vs never-Done,
per-worker reliability (done-rate, partial/failed/no-result, wall, user
overrides), and the heavy-retry task list. Pure `src/trust.rs`
(`summarize`+`render`, 2 tests); REPORTS only, never edits policy.
**Trust-metric source DECIDED = telemetry** (not run-dir scan): historical
`run.yaml` predates the seal, so the append-only `telemetry/runs.jsonl` is
the reliable cumulative source. Dogfooded on a live workspace (163 runs / 49
tasks): one worker 73% done with 12 no-result vs the other 85% / 1 — a real
trust differential the routing review never surfaced.
- **Intent-scoped telemetry — DONE** (commit `e93e003`): `RunTelemetry` now
records `intent_id` (set from `queue.intent_id` in `finalize_run`'s telemetry
block), so a task id reused across intents no longer folds into one inflated
attempt count. `yardlet trust` scopes to the active intent's runs (pure
`scope_runs` helper + intent-aware `render`), falling back to the cumulative
view — with its folding caveat — when no telemetry carries the active intent
yet. Legacy records (no `intent_id`) degrade cleanly to cumulative; scoping
engages automatically as new runs accrue.
- **Outcome Mining MVP — DONE** (commit `ab3ba8a`): `trust::mine` folds
telemetry into threshold-crossing, human-applicable observations, surfaced in
`yardlet harness review` next to learned rules/skills. Two deterministic
signals: a worker with a high **no-result** rate (≥10% over ≥6 runs — an
output-contract problem, not a routing one) and a **task kind that averages
many attempts** to Done (≥2.5 over ≥3 tasks — wants a skill or sharper
acceptance). Distinct from routing review (selection) and the trust dashboard
(raw stats); SUGGESTS only. Dogfooded on 163 runs → 4 observations incl.
"kind 'review' averaged 4.2 attempts" (reviews loop the most) + a worker
no-result hotspot. **Candidate-pipeline merge** (mined ↔ worker-proposed
rules/skills) and **structured-validation evidence mining** remain later.
**v0.8 main is MVP-complete:** Slice 1 foundation (seal/parallel-validation/
recovery-telemetry), Project Memory (2-4 MVP), Trust Report (5), Outcome Mining
(6) have all landed. Only the deliberately-deferred finalize-time auto-push hook
(1e) and the later refinements (native dedup, stale detection, candidate merge)
remain.
## Open (not yet decided)
- **auto_commit default ON vs OFF.** Review flags: it touches the user's real git
history (higher surprise than `.agents/`-only auto behaviors); it triggers the
repo's commit hooks (run vs `--no-verify` — decide); commit failure must be
**non-fatal** to the task; evidence-only staging does NOT guarantee a clean tree,
so it is not a guaranteed parallel-enabler. Scope auto_commit to the serial
in-place path (parallel already commits via worktree merge).
- **T3 — out-of-repo memory** (`~/.claude/CLAUDE.md`, `~/.codex`, parent CLAUDE.md):
surface (read-only diagnostic) / ingest / ignore. Parked "needs more thought."
Optional research: do tool-native "disable user memory" switches exist that do
NOT break auth (HOME redirect does)?
- **T4 — existing-repo consolidation**: does memory init synthesize/dedup existing
AGENTS.md / CLAUDE.md / .cursor, or just layer navigation on top?
- **Capability UI** (workers panel to set capabilities), **error-truncation TUI fix**,
**parallel preflight** ("clean tree needed; N uncommitted").
- **Tier-2 per-slice**: stale detection (git-diff on landmark `look_at` paths),
landmark↔task mapping (match task scope paths to landmark globs), ~~trust-metric
source~~ (DONE = telemetry, intent-scoped via `RunTelemetry.intent_id`; see
Trust Report above), mining structured-validation
evidence, candidate pipeline merge (worker-proposed + Yardlet-mined), and
existing-workspace migration.
## Evidence / findings
- **Memory architecture research** (cited, mostly primary): A+C is the cross-tool
consensus; keep the always-loaded part small (Claude Code <200 lines, Cursor <500,
Aider repo map ~1k tokens); `@`-imports do NOT save context; out-of-repo memory is
the single most important correctness caveat for a "consistent memory" claim; the
"AGENTS.md backed by 25+ tools" marketing claim was refuted.
- **2026 tool landscape** (see memory `agent-tool-landscape-2026`): Jan-2026 adoption
Copilot 29 / Claude Code 18 / Cursor 18 / Codex 3; mid-2026 shift = **OpenCode** is
the #1 OSS coding agent and the fork-base (MiMo Code forks it); no newer hard
survey. **Hermes** and **Mimo** are model+tool naming collisions — Hermes Agent and
Xiaomi MiMo Code are real H1-2026 peer CLIs, but their "adoption" is buzz, not
measured. OpenCode is a stronger worker candidate than either.
- **Bugs surfaced by a live drain run** — one root cause: *worker-proposed free-form
data is not validated/normalized at ingestion*:
1. A review returns `partial` (not `needs_user`), so the drain auto-retries it from
checkpoint and re-reviews unchanged code forever; and `yardlet answer <task>` is
target-pinned (`run.rs` builds `target: Some(task)`), so answering a review never
advances to the fix task.
2. Capability mismatch: serial **hard-errors** on unmet `required_capabilities`;
parallel **skips**. `norm_cap` is syntactic only (case + space/hyphen→underscore),
so `imagegen` ≠ `image_generation`. YARD-011 is deliberately human-gated.
3. Parallel needs a **clean git tree** (worktree base); a live run's hundreds of
dirty asset files block it. Not a bug — needs a preflight message + truncation fix.
4. The TUI truncates long error messages (panel width), hiding the real reason.
- **finalize_run map**: serial = full pipeline; parallel skips
hooks/validation/conversation/learned; recovery is minimal salvage (also no
telemetry / handoff / follow-up ingestion).
## Hardening review (2026-06-26/27) + the two problem features
After the v0.8 feature work, FOUR workflow-backed high-effort code reviews of
`origin/main..HEAD` were run (each: 8 finder angles + an independent verifier).
~38 findings were fixed across four commits (`f0fea1d`, `e2c8af9`, `3054bfb`,
plus the first batch in the earlier fix commits). Findings dropped in
correctness over the first three rounds (R1≈9, R2≈7, R3=4) — BUT the bugs kept
clustering in the SAME two "smart automation" features, and round 4 confirmed
they do not converge by patching:
- **auto-commit** (1d): flagged every round with a new hole; the root one
(`run.rs` `git_commit_worker_changes` / `worker_touched`) is **fundamental and
unfixable in the in-place design**: a serial run edits the SHARED tree, and a
before/after fingerprint diff CANNOT tell the worker's changes apart from a
concurrent user/other-session edit. The original design already flagged this
("evidence-only staging does NOT guarantee a clean tree") and scoped it to
serial in-place — but never resolved it.
- **review auto-remediation** (1c): flagged every round with a new deadlock; the
root one is the **hard `depends_on` edge** — a failed review is re-queued to
depend on the fix, but `deps_met` only clears on Done, so any fix that fails /
is deferred / is title-deduped strands the review forever.
**Convention check (Claude Code / Codex), to ground the fix direction:**
- Neither auto-commits an in-place working tree. In-place tools (Claude Code,
Codex CLI) leave changes for the human to commit; isolated/cloud agents (Codex
cloud) run each task in an isolated container/branch and THEN auto-commit + PR
(safe BECAUSE isolated). yardlet's parallel path already does the isolated
version (worktree + merge); only serial-in-place is the unsafe outlier.
- Neither chains a review to a fix via a blocking cross-task dependency. They
either self-correct IN one session (Claude Code/Codex iterate in-context) or
surface findings (PR comments) for a separate step. No `deps_met`-style edge.
**DECIDED DIRECTION (do NOT drop the autonomy — yardlet's whole point is to push
toward the goal further than the base tools; fix the fragile MECHANISM, keep the
ambition; both align with the ORIGINAL design intent):**
1. **review (1c)** — keep the bounded autonomous loop the original design already
specified ("auto-fix, auto re-verify, cap = 2 cycles, THEN park needs_user"),
but REMOVE the hard `depends_on` edge (the deadlock source). Instead: ingest
the fix at higher priority so it runs first, and re-queue the review with NO
blocking dependency; the drain's attempt cap bounds the fix+re-review loop and
surfaces needs_user only after it is exhausted. = "try hard, then ask",
deadlock-free. Small change.
2. **auto-commit (1d)** — the worktree IS the answer (matches Codex-cloud +
resolves the original flagged concern). Run the serial path in an isolated
worktree too (the parallel infra — create/merge — already exists), so
everything in the worktree is provably the worker's and auto-commit is safe.
Interim option if deferring: ship auto-commit ONLY for the worktree/parallel
path for v0.8 and add serial-in-worktree next.
**Remaining targeted bugs from round 4 (fix alongside, NOT part of the two-feature
rework):** memory staleness compares repo-root-relative git paths against
ws-root-relative `look_at` (breaks in a subdirectory workspace) and does not
normalize `./`-prefixed landmarks (`cli.rs`); `cmd_defer`'s stranded-dependent
warning is direct-only while drain/status are transitive (`cli.rs:941`); the
review `runnable` filter excludes only `Blocked`, not `Deferred` (`run.rs:1963`);
telemetry `intent_id` is read after `finalize_on_latest_queue` reloads the queue
(`run.rs:1092`); `changed_paths` lossily decodes non-UTF-8 paths (subsumed by the
worktree rework). 13 lower-value cleanup/efficiency nits were refuted/deferred.
**IMPLEMENTED (2026-06-28, branch `v0.8-finalize-run`):** both reworks + the
targeted bugs landed.
- **review (1c):** `requeue_review` now drops the hard `depends_on` edge and
soft-sequences the review behind its remediation fixes by PRIORITY (pull the
fixes to `front - 20`, slot the review at `front - 10`). A fix that
fails/defers/dedups simply leaves the Queued set and the review re-verifies
anyway; the drain's per-task 2-cap bounds the loop and surfaces needs_user only
when exhausted. The `runnable` filter now keeps `Queued`-only (excludes
`Deferred`, targeted bug #3).
- **auto-commit (1d):** shipped the **worktree-only interim**. The unsafe serial
in-place commit (`git_commit_worker_changes`) is **removed**; an opted-in serial
Done run that produced worker changes now reports "auto-commit deferred …
commit manually". The parallel path still commits via its isolated worktree +
merge (that path is intrinsically safe and unaffected by the flag). **Full
serial-in-worktree auto-commit is the next slice** (it needs a new execution
path: worker in a worktree but finalize writing canonical `.agents/` to the main
tree + commit via `integrate_worktree`, keeping the in-place gates the parallel
path currently skips — too much new surface to rush into this release).
- **targeted bugs:** #1 memory staleness now prefixes ws-relative `look_at` with
`git rev-parse --show-prefix` before matching the repo-root-relative porcelain
set + strips `./` (`cli.rs`); #2 `cmd_defer` stranded warning is now a
transitive closure mirroring drain/status; #4 telemetry `intent_id` is captured
in `finalize_run` BEFORE `finalize_on_latest_queue` reloads the queue. #5
`changed_paths` non-UTF-8 lossy decode is moot for the interim (no in-place
evidence commit) and folds into the serial-in-worktree slice.
**5TH REVIEW (2026-06-28, workflow `wsj4w73mj`, 6 finder angles + adversarial
verifier per finding, 19 agents): CONVERGED — zero release blockers.** Trajectory
R1≈9 → R2≈7 → R3=4 → R4(2 root causes) → R5=**0 surviving correctness/regression**.
11 verified-real findings, ALL non-blocking (robustness/cleanup/doc/test). The two
problem features are genuinely fixed: the 1c permanent deadlock is gone, and 1d
serial never commits while the parallel worktree path is unaffected. Six residuals
were surfaced and **all fixed in the same session before tagging** (none deferred):
1. **1c ordering gap (highest value):** the `runnable` filter admitted a fix on
`state==Queued` alone, so a reviewer-proposed fix carrying its OWN unmet
`depends_on` was counted runnable, the dep-free review (sequenced ahead) could
be picked by `select_next` before that still-gated fix, and re-verify unchanged
code. Bounded (the 2-cap force-surfaces NeedsUser) but it defeated the "fixes
run first" guarantee. FIX: `runnable_fix_ids` now requires `Queued && deps_met`,
mirroring `select_next` eligibility exactly (extracted + unit-tested).
2. **stale gated-note (reported 4×):** after dropping the hard dep, the >2-cap
`gated_note`'s `depends_on`-based filter can't match a soft-sequenced review;
comment corrected (the review is Queued and re-runs next drain, not stranded).
3. **cmd_defer over-report:** the stranded closure grew over `!= Done`; the
drain/status closure it mirrors grows over `== Queued` only — fixed so an
already-terminal dependent isn't falsely listed as "now cannot run".
4. **telemetry intent_id second site:** `seal_run_record` still passed the
post-reload `queue.intent_id`; now uses the captured spawn-time `intent_id`
like the telemetry block (same bug #4 class).
5. **CHANGELOG conflation:** the auto-commit entry implied the flag drives the
parallel commit; reworded so the always-on worktree commit and the serial
opt-in (which defers) are not conflated, matching the schemas.rs doc.
6. **missing serial auto-commit coverage:** extracted `worker_changed_outside_agents`
+ unit test (fires the deferred guidance only on real non-`.agents/` changes).
129 tests green, fmt + clippy `-D warnings` clean.
**NEXT:** release v0.8.0 (main merge + crates.io + GitHub release w/ 3 binaries) —
outward/irreversible, needs explicit user ok.
## Status
- **Branch** `v0.8-finalize-run`, **PUSHED to origin** through `ccb813d`. v0.8
feature scope (finalize_run foundation 1b–1e, Project Memory 2, Trust Report 5,
Outcome Mining 6, `Deferred` state + `yardlet defer`) + version bump to 0.8.0 +
CHANGELOG all landed. Then four hardening passes (`f0fea1d`, `e2c8af9`,
`3054bfb`) fixed ~38 review findings, and a fifth commit (the two-feature
rework above) plus the 5th-review residual fixes are committed locally but **not
yet pushed**. **129 tests green** (dropped the obsolete `git_commit_worker_changes`
test with the function; added `runnable_fix_ids`, `serial_auto_commit_guidance`,
and the soft-priority 1c test), fmt + clippy `-D warnings` clean.
- **v0.8.0 NOT released** (no main merge / crates.io / GitHub release yet). The
two release-gating reworks are IMPLEMENTED and the **5th review CONFIRMED
CONVERGENCE (zero release blockers); all six non-blocking residuals were fixed
in-session** — see the hardening-review section above. Remaining gate: the
release itself (outward/irreversible), which needs explicit user ok.
- Dogfood loop (2026-06-23): rebuilt + installed the v0.8 binary, ran it against
a live dogfood workspace (16 done / 1 blocked / 17 total). `status` and
`recover` both clean — `recover` is a correct no-op (no Running/Failed tasks;
the user-gated blocked task is untouched). The run dirs all showed `run.yaml`
stuck at `running` → motivated the seal fix above.
- The session's live workspace state (per-task status, uncommitted deliverables) is
recorded in the local session memory, not in this committed doc.