yardlet 0.8.0 - Docs.rs

# Yardlet v0.8 — Loop Engineering: Decisions & Tracks

> Session record, 2026-06-22/23. Companion to the planning draft
> (`yardlet-v0.8-loop-engineering-plan.md`). Captures what was decided, what is
> still open, the evidence behind it, and current build status.

## Framing: two parallel tracks on one foundation

- **v0.8 keeps its theme**: Project Memory + Trust Report (the plan). Not re-themed.
- **Reliability fixes** surfaced from a real multi-task drain run are a **parallel
  track** (several are v0.7.x-patch-worthy). They do not hijack the v0.8 theme.
- **`finalize_run` is the shared foundation** both tracks build on, so it is built first.

```
        ┌─ Slice 1: finalize_run (shared foundation, behavior-preserving) ─┐
 v0.8 main ──► Project Memory (2-4) ──► Trust Report (5) ──► Mining (6)
 reliability ─► capability ──► review remediation ──► auto_commit/push ──► gaps + UI
```

## Decided

- **T1 — memory placement**: fold project memory into the existing harness
  discovery pass and de-duplicate against native sources (CLAUDE.md / AGENTS.md /
  .cursor / .claude-skills); use progressive disclosure for bulk (short index +
  anchors always; bodies read on demand). Reject "separate section + tolerate
  duplication." Research-backed (Claude Code memory docs, agents.md, Cursor,
  Aider, Cline Memory Bank, Anthropic Agent Skills, Hermes Agent).
- **T2 — memory git placement**: human-authored memory docs are **git-tracked**
  (shared, present in worktrees); the generated `index.yaml` is **gitignored**.
  Runtime state (runs/checkpoints/handoffs/telemetry) stays gitignored as today.
- **Review-loop autonomy = full auto**: a review that fails a criterion auto-links
  its remediation follow-up, the fix auto-runs, the review auto re-verifies, then
  proceeds. Cap = **2** remediation cycles, then park as needs_user.
- **finalize_run shape**: one shared finalization pipeline with per-path,
  behavior-preserving flags (parallel skips hooks/validation/conversation/learned;
  recovery is minimal salvage). Recovery stays minimal in Slice 1.
- **auto_commit**: a single `yardlet.yaml` boolean (like `auto_skill`). **Push is
  NOT auto** — gated via the existing approval-policy bucket (`deploy_publish_send`),
  default off / explicit. Push is outward-facing (invariant I4) and matches the
  owner's standing "push only on explicit request" stance. Note: approval today is
  a *pre-run task gate*; auto-push happens at *finalize time*, so a new finalize-time
  push hook is needed (policy vocabulary reused, enforcement point is new).

## Slices (revised after review)

- **Slice 1** — extract `finalize_run`, wire serial. **DONE** (commit `33950d1`,
  115 tests green, fmt/clippy clean). Next: wire parallel + recovery (needs the
  worktree-merge step + a queue-commit mode: serial=defensive-reload,
  parallel=in-memory-index).
- **Slice 1b** — capability handling: graceful park (no hard error) at ingestion
  AND runtime; validate/normalize `required_capabilities` against declared worker
  capabilities (alias vs warn — undecided).
- **Slice 1c** — review auto-remediation (the autonomy decision above).
- **Slice 1d** — auto_commit (`yardlet.yaml` bool) + push policy hook.
- **Slice 1e** — close the gaps the finalize map revealed: parallel runs no
  validation; recovery records no telemetry/handoff; **run.yaml is never sealed**.
  - **Run-record seal — DONE** (commit `e9620b2`): `run.yaml` was written
    `running` at spawn and never updated, so every record looked in-flight
    forever — a Trust Report or run-dir scan could not tell a finished run from a
    stranded one. `finalize_run` (shared by serial/parallel/recovery) now seals
    it to the real terminal state + a `completed_at`, preserving `started_at`.
    Surfaced by dogfooding on a live workspace (every `run.yaml` stuck at
    `running`; 200+ run dirs). This is a prerequisite for the Trust
    Report's "scan run dirs" metric-source option (Slice 5).
  - **Parallel-path validation — DONE** (commit `6f78257`): `finalize_run` now
    runs validation in the worktree (`merge.wt_path`) for a worktree run, ws.root
    for the serial in-place path — BEFORE the merge, so a parallel task that
    fails validation never reaches the workspace (stays Partial, worktree kept).
    `FinalizeFlags::parallel()` flipped `validation: true`. Test drives
    `finalize_run` with a worktree + a marker-file command that passes only from
    the worktree, proving the cwd. (Not live-dogfoodable: a parallel batch needs
    a clean tree, which the at-rest dogfood workspace lacks — covered by the unit
    test instead.)
  - **Recovery telemetry — DONE** (commit `e4ef54e`): `FinalizeFlags::recovery()`
    now emits telemetry (was off, so the trust report silently undercounted every
    salvaged task). `recover_orphans` reads the original worker from the stranded
    `run.yaml` (new `run_worker` helper) and passes it as `worker_id`, so the row
    is attributed (not "") and labeled `reason=recovery`. Artifacts/hooks stay off
    (checkpoint/handoff were the dead session's job); `wall_seconds` stays 0
    (original duration unknown). Test asserts one attributed+labeled row.
  - Still open in 1e: **finalize-time auto-push hook** (the one remaining 1d/1e
    piece: `auto_commit` lands locally; auto-push stays gated via the approval
    policy `deploy_publish_send` bucket, default off — needs a finalize-time push
    enforcement point since approval today is a pre-run task gate).
- **v0.8 main** (after foundation): Project Memory MVP (2-4), Trust Report (5),
  Outcome Mining (6).
  - **Project Memory MVP — DONE** (commit `7f55e13`): `discover_harness` now
    folds `.agents/memory/*.md` docs into the harness next to rules/skills,
    always on. Each doc → ONE index line (title + one-line summary + anchor,
    parsed from `name:`/`description:` frontmatter or first heading/prose line);
    `push_harness_sections` emits a "Project memory (read on demand)" packet
    section. Bodies are NEVER inlined (T1 progressive disclosure). New `yardlet
    memory` lists the index. Location = `.agents/memory/` (git-tracked harness
    asset, T2); generated `index.yaml` out of scope (only `.md` read). Memory is
    native to no worker → always projected (no dedup yet). Dogfood-verified
    (empty workspace → "no memory yet"; populated temp → frontmatter + heading
    docs parsed into the index).
    - **Stale detection — DONE** (commit `2526e4e`, slice 3): a memory doc may
      declare `look_at:` landmark paths (frontmatter, inline or list); `yardlet
      memory` flags it "⚠ possibly stale" when a landmark changed in git AFTER
      the doc did (committer-time compare). `HarnessMemory.look_at` + extended
      `parse_memory_doc`; staleness computed only in the command, so the packet
      hot path stays git-free. Dogfood-verified with controlled commit dates.
    - **Deferred (later 2-4 slices):** native-source dedup (T1 against
      CLAUDE.md/AGENTS.md — currently those are RULES, no memory overlap yet);
      persisted/generated `index.yaml` cache; landmark↔task mapping (match a
      task's scope paths to memory `look_at` globs); T3 out-of-repo memory; T4
      existing-repo consolidation.
  - **Trust Report MVP — DONE** (commit `5351352`): read-only `yardlet trust`
    folds run telemetry into first-pass-Done vs Done-after-retry vs never-Done,
    per-worker reliability (done-rate, partial/failed/no-result, wall, user
    overrides), and the heavy-retry task list. Pure `src/trust.rs`
    (`summarize`+`render`, 2 tests); REPORTS only, never edits policy.
    **Trust-metric source DECIDED = telemetry** (not run-dir scan): historical
    `run.yaml` predates the seal, so the append-only `telemetry/runs.jsonl` is
    the reliable cumulative source. Dogfooded on a live workspace (163 runs / 49
    tasks): one worker 73% done with 12 no-result vs the other 85% / 1 — a real
    trust differential the routing review never surfaced.
  - **Intent-scoped telemetry — DONE** (commit `e93e003`): `RunTelemetry` now
    records `intent_id` (set from `queue.intent_id` in `finalize_run`'s telemetry
    block), so a task id reused across intents no longer folds into one inflated
    attempt count. `yardlet trust` scopes to the active intent's runs (pure
    `scope_runs` helper + intent-aware `render`), falling back to the cumulative
    view — with its folding caveat — when no telemetry carries the active intent
    yet. Legacy records (no `intent_id`) degrade cleanly to cumulative; scoping
    engages automatically as new runs accrue.
  - **Outcome Mining MVP — DONE** (commit `ab3ba8a`): `trust::mine` folds
    telemetry into threshold-crossing, human-applicable observations, surfaced in
    `yardlet harness review` next to learned rules/skills. Two deterministic
    signals: a worker with a high **no-result** rate (≥10% over ≥6 runs — an
    output-contract problem, not a routing one) and a **task kind that averages
    many attempts** to Done (≥2.5 over ≥3 tasks — wants a skill or sharper
    acceptance). Distinct from routing review (selection) and the trust dashboard
    (raw stats); SUGGESTS only. Dogfooded on 163 runs → 4 observations incl.
    "kind 'review' averaged 4.2 attempts" (reviews loop the most) + a worker
    no-result hotspot. **Candidate-pipeline merge** (mined ↔ worker-proposed
    rules/skills) and **structured-validation evidence mining** remain later.

**v0.8 main is MVP-complete:** Slice 1 foundation (seal/parallel-validation/
recovery-telemetry), Project Memory (2-4 MVP), Trust Report (5), Outcome Mining
(6) have all landed. Only the deliberately-deferred finalize-time auto-push hook
(1e) and the later refinements (native dedup, stale detection, candidate merge)
remain.

## Open (not yet decided)

- **auto_commit default ON vs OFF.** Review flags: it touches the user's real git
  history (higher surprise than `.agents/`-only auto behaviors); it triggers the
  repo's commit hooks (run vs `--no-verify` — decide); commit failure must be
  **non-fatal** to the task; evidence-only staging does NOT guarantee a clean tree,
  so it is not a guaranteed parallel-enabler. Scope auto_commit to the serial
  in-place path (parallel already commits via worktree merge).
- **T3 — out-of-repo memory** (`~/.claude/CLAUDE.md`, `~/.codex`, parent CLAUDE.md):
  surface (read-only diagnostic) / ingest / ignore. Parked "needs more thought."
  Optional research: do tool-native "disable user memory" switches exist that do
  NOT break auth (HOME redirect does)?
- **T4 — existing-repo consolidation**: does memory init synthesize/dedup existing
  AGENTS.md / CLAUDE.md / .cursor, or just layer navigation on top?
- **Capability UI** (workers panel to set capabilities), **error-truncation TUI fix**,
  **parallel preflight** ("clean tree needed; N uncommitted").
- **Tier-2 per-slice**: stale detection (git-diff on landmark `look_at` paths),
  landmark↔task mapping (match task scope paths to landmark globs), ~~trust-metric
  source~~ (DONE = telemetry, intent-scoped via `RunTelemetry.intent_id`; see
  Trust Report above), mining structured-validation
  evidence, candidate pipeline merge (worker-proposed + Yardlet-mined), and
  existing-workspace migration.

## Evidence / findings

- **Memory architecture research** (cited, mostly primary): A+C is the cross-tool
  consensus; keep the always-loaded part small (Claude Code <200 lines, Cursor <500,
  Aider repo map ~1k tokens); `@`-imports do NOT save context; out-of-repo memory is
  the single most important correctness caveat for a "consistent memory" claim; the
  "AGENTS.md backed by 25+ tools" marketing claim was refuted.
- **2026 tool landscape** (see memory `agent-tool-landscape-2026`): Jan-2026 adoption
  Copilot 29 / Claude Code 18 / Cursor 18 / Codex 3; mid-2026 shift = **OpenCode** is
  the #1 OSS coding agent and the fork-base (MiMo Code forks it); no newer hard
  survey. **Hermes** and **Mimo** are model+tool naming collisions — Hermes Agent and
  Xiaomi MiMo Code are real H1-2026 peer CLIs, but their "adoption" is buzz, not
  measured. OpenCode is a stronger worker candidate than either.
- **Bugs surfaced by a live drain run** — one root cause: *worker-proposed free-form
  data is not validated/normalized at ingestion*:
  1. A review returns `partial` (not `needs_user`), so the drain auto-retries it from
     checkpoint and re-reviews unchanged code forever; and `yardlet answer <task>` is
     target-pinned (`run.rs` builds `target: Some(task)`), so answering a review never
     advances to the fix task.
  2. Capability mismatch: serial **hard-errors** on unmet `required_capabilities`;
     parallel **skips**. `norm_cap` is syntactic only (case + space/hyphen→underscore),
     so `imagegen` ≠ `image_generation`. YARD-011 is deliberately human-gated.
  3. Parallel needs a **clean git tree** (worktree base); a live run's hundreds of
     dirty asset files block it. Not a bug — needs a preflight message + truncation fix.
  4. The TUI truncates long error messages (panel width), hiding the real reason.
- **finalize_run map**: serial = full pipeline; parallel skips
  hooks/validation/conversation/learned; recovery is minimal salvage (also no
  telemetry / handoff / follow-up ingestion).

## Hardening review (2026-06-26/27) + the two problem features

After the v0.8 feature work, FOUR workflow-backed high-effort code reviews of
`origin/main..HEAD` were run (each: 8 finder angles + an independent verifier).
~38 findings were fixed across four commits (`f0fea1d`, `e2c8af9`, `3054bfb`,
plus the first batch in the earlier fix commits). Findings dropped in
correctness over the first three rounds (R1≈9, R2≈7, R3=4) — BUT the bugs kept
clustering in the SAME two "smart automation" features, and round 4 confirmed
they do not converge by patching:

- **auto-commit** (1d): flagged every round with a new hole; the root one
  (`run.rs` `git_commit_worker_changes` / `worker_touched`) is **fundamental and
  unfixable in the in-place design**: a serial run edits the SHARED tree, and a
  before/after fingerprint diff CANNOT tell the worker's changes apart from a
  concurrent user/other-session edit. The original design already flagged this
  ("evidence-only staging does NOT guarantee a clean tree") and scoped it to
  serial in-place — but never resolved it.
- **review auto-remediation** (1c): flagged every round with a new deadlock; the
  root one is the **hard `depends_on` edge** — a failed review is re-queued to
  depend on the fix, but `deps_met` only clears on Done, so any fix that fails /
  is deferred / is title-deduped strands the review forever.

**Convention check (Claude Code / Codex), to ground the fix direction:**
- Neither auto-commits an in-place working tree. In-place tools (Claude Code,
  Codex CLI) leave changes for the human to commit; isolated/cloud agents (Codex
  cloud) run each task in an isolated container/branch and THEN auto-commit + PR
  (safe BECAUSE isolated). yardlet's parallel path already does the isolated
  version (worktree + merge); only serial-in-place is the unsafe outlier.
- Neither chains a review to a fix via a blocking cross-task dependency. They
  either self-correct IN one session (Claude Code/Codex iterate in-context) or
  surface findings (PR comments) for a separate step. No `deps_met`-style edge.

**DECIDED DIRECTION (do NOT drop the autonomy — yardlet's whole point is to push
toward the goal further than the base tools; fix the fragile MECHANISM, keep the
ambition; both align with the ORIGINAL design intent):**

1. **review (1c)** — keep the bounded autonomous loop the original design already
   specified ("auto-fix, auto re-verify, cap = 2 cycles, THEN park needs_user"),
   but REMOVE the hard `depends_on` edge (the deadlock source). Instead: ingest
   the fix at higher priority so it runs first, and re-queue the review with NO
   blocking dependency; the drain's attempt cap bounds the fix+re-review loop and
   surfaces needs_user only after it is exhausted. = "try hard, then ask",
   deadlock-free. Small change.
2. **auto-commit (1d)** — the worktree IS the answer (matches Codex-cloud +
   resolves the original flagged concern). Run the serial path in an isolated
   worktree too (the parallel infra — create/merge — already exists), so
   everything in the worktree is provably the worker's and auto-commit is safe.
   Interim option if deferring: ship auto-commit ONLY for the worktree/parallel
   path for v0.8 and add serial-in-worktree next.

**Remaining targeted bugs from round 4 (fix alongside, NOT part of the two-feature
rework):** memory staleness compares repo-root-relative git paths against
ws-root-relative `look_at` (breaks in a subdirectory workspace) and does not
normalize `./`-prefixed landmarks (`cli.rs`); `cmd_defer`'s stranded-dependent
warning is direct-only while drain/status are transitive (`cli.rs:941`); the
review `runnable` filter excludes only `Blocked`, not `Deferred` (`run.rs:1963`);
telemetry `intent_id` is read after `finalize_on_latest_queue` reloads the queue
(`run.rs:1092`); `changed_paths` lossily decodes non-UTF-8 paths (subsumed by the
worktree rework). 13 lower-value cleanup/efficiency nits were refuted/deferred.

**IMPLEMENTED (2026-06-28, branch `v0.8-finalize-run`):** both reworks + the
targeted bugs landed.
- **review (1c):** `requeue_review` now drops the hard `depends_on` edge and
  soft-sequences the review behind its remediation fixes by PRIORITY (pull the
  fixes to `front - 20`, slot the review at `front - 10`). A fix that
  fails/defers/dedups simply leaves the Queued set and the review re-verifies
  anyway; the drain's per-task 2-cap bounds the loop and surfaces needs_user only
  when exhausted. The `runnable` filter now keeps `Queued`-only (excludes
  `Deferred`, targeted bug #3).
- **auto-commit (1d):** shipped the **worktree-only interim**. The unsafe serial
  in-place commit (`git_commit_worker_changes`) is **removed**; an opted-in serial
  Done run that produced worker changes now reports "auto-commit deferred …
  commit manually". The parallel path still commits via its isolated worktree +
  merge (that path is intrinsically safe and unaffected by the flag). **Full
  serial-in-worktree auto-commit is the next slice** (it needs a new execution
  path: worker in a worktree but finalize writing canonical `.agents/` to the main
  tree + commit via `integrate_worktree`, keeping the in-place gates the parallel
  path currently skips — too much new surface to rush into this release).
- **targeted bugs:** #1 memory staleness now prefixes ws-relative `look_at` with
  `git rev-parse --show-prefix` before matching the repo-root-relative porcelain
  set + strips `./` (`cli.rs`); #2 `cmd_defer` stranded warning is now a
  transitive closure mirroring drain/status; #4 telemetry `intent_id` is captured
  in `finalize_run` BEFORE `finalize_on_latest_queue` reloads the queue. #5
  `changed_paths` non-UTF-8 lossy decode is moot for the interim (no in-place
  evidence commit) and folds into the serial-in-worktree slice.

**5TH REVIEW (2026-06-28, workflow `wsj4w73mj`, 6 finder angles + adversarial
verifier per finding, 19 agents): CONVERGED — zero release blockers.** Trajectory
R1≈9 → R2≈7 → R3=4 → R4(2 root causes) → R5=**0 surviving correctness/regression**.
11 verified-real findings, ALL non-blocking (robustness/cleanup/doc/test). The two
problem features are genuinely fixed: the 1c permanent deadlock is gone, and 1d
serial never commits while the parallel worktree path is unaffected. Six residuals
were surfaced and **all fixed in the same session before tagging** (none deferred):
1. **1c ordering gap (highest value):** the `runnable` filter admitted a fix on
   `state==Queued` alone, so a reviewer-proposed fix carrying its OWN unmet
   `depends_on` was counted runnable, the dep-free review (sequenced ahead) could
   be picked by `select_next` before that still-gated fix, and re-verify unchanged
   code. Bounded (the 2-cap force-surfaces NeedsUser) but it defeated the "fixes
   run first" guarantee. FIX: `runnable_fix_ids` now requires `Queued && deps_met`,
   mirroring `select_next` eligibility exactly (extracted + unit-tested).
2. **stale gated-note (reported 4×):** after dropping the hard dep, the >2-cap
   `gated_note`'s `depends_on`-based filter can't match a soft-sequenced review;
   comment corrected (the review is Queued and re-runs next drain, not stranded).
3. **cmd_defer over-report:** the stranded closure grew over `!= Done`; the
   drain/status closure it mirrors grows over `== Queued` only — fixed so an
   already-terminal dependent isn't falsely listed as "now cannot run".
4. **telemetry intent_id second site:** `seal_run_record` still passed the
   post-reload `queue.intent_id`; now uses the captured spawn-time `intent_id`
   like the telemetry block (same bug #4 class).
5. **CHANGELOG conflation:** the auto-commit entry implied the flag drives the
   parallel commit; reworded so the always-on worktree commit and the serial
   opt-in (which defers) are not conflated, matching the schemas.rs doc.
6. **missing serial auto-commit coverage:** extracted `worker_changed_outside_agents`
   + unit test (fires the deferred guidance only on real non-`.agents/` changes).

129 tests green, fmt + clippy `-D warnings` clean.

**NEXT:** release v0.8.0 (main merge + crates.io + GitHub release w/ 3 binaries) —
outward/irreversible, needs explicit user ok.

## Status

- **Branch** `v0.8-finalize-run`, **PUSHED to origin** through `ccb813d`. v0.8
  feature scope (finalize_run foundation 1b–1e, Project Memory 2, Trust Report 5,
  Outcome Mining 6, `Deferred` state + `yardlet defer`) + version bump to 0.8.0 +
  CHANGELOG all landed. Then four hardening passes (`f0fea1d`, `e2c8af9`,
  `3054bfb`) fixed ~38 review findings, and a fifth commit (the two-feature
  rework above) plus the 5th-review residual fixes are committed locally but **not
  yet pushed**. **129 tests green** (dropped the obsolete `git_commit_worker_changes`
  test with the function; added `runnable_fix_ids`, `serial_auto_commit_guidance`,
  and the soft-priority 1c test), fmt + clippy `-D warnings` clean.
- **v0.8.0 NOT released** (no main merge / crates.io / GitHub release yet). The
  two release-gating reworks are IMPLEMENTED and the **5th review CONFIRMED
  CONVERGENCE (zero release blockers); all six non-blocking residuals were fixed
  in-session** — see the hardening-review section above. Remaining gate: the
  release itself (outward/irreversible), which needs explicit user ok.
- Dogfood loop (2026-06-23): rebuilt + installed the v0.8 binary, ran it against
  a live dogfood workspace (16 done / 1 blocked / 17 total). `status` and
  `recover` both clean — `recover` is a correct no-op (no Running/Failed tasks;
  the user-gated blocked task is untouched). The run dirs all showed `run.yaml`
  stuck at `running` → motivated the seal fix above.
- The session's live workspace state (per-task status, uncommitted deliverables) is
  recorded in the local session memory, not in this committed doc.