
# Yardlet — Local AI Workbench Final Build Plan

> Status: final working plan for implementation  
> Target repo path: `docs/designs/local-autonomous-agent-operating-layer.md`  
> Product name: **Yardlet**  
> Primary interface: **local terminal UI / operating console**  
> Canonical repo state: `.agents/`  
> Worker engines: subscription-backed Codex CLI / Claude Code CLI, hidden behind Yardlet  
> Hard constraint: Yardlet itself must not require, request, store, or call AI provider API keys

---

## 0. One-sentence definition

**Yardlet is a local AI workbench that lets a user describe work in a few natural-language sentences, then manages planning, queued execution, worker routing, validation, compacting, handoff, and safety inside the local workspace while using Codex CLI and Claude Code CLI as hidden subscription-backed workers.**

Shorter:

> Yardlet is the local operating console where AI coding workers plan, build, verify, and hand off long-running work inside your workspace.

The user should normally open **Yardlet**, not Codex or Claude Code directly.

```txt
User
  -> Yardlet UI
    -> planning gate
    -> intent / scope / acceptance contract
    -> queue / state / ledger
    -> worker packet compiler
      -> Codex CLI or Claude Code CLI as hidden worker
    -> validation / evaluation
    -> checkpoint / handoff
```

---

## 1. Final product stance

### 1.1 Yardlet is UI-first

Yardlet is not primarily a command collection. It should feel like a local workbench opened from the terminal.

The normal entrypoint is:

```bash
yard
```

That opens a terminal UI where the user can:

- create or refine a work request;
- inspect the current intent, scope, and acceptance criteria;
- see the queue;
- start the next bounded worker run;
- pause/stop work;
- approve or deny gated actions;
- inspect run evidence;
- read or export a handoff;
- see worker readiness and billing-safety status.

CLI commands still exist, but they are mostly for automation, scripting, debugging, and the UI implementation itself.

```bash
yardlet init
yardlet status --json
yardlet worker status
yardlet inspect repo --json
yardlet packet --task YARD-001 --worker codex --dry-run
yardlet run --next --headless
```

A teammate should not need to learn Codex CLI flags, Claude Code CLI flags, or host-specific prompt tricks. Yardlet owns that.

### 1.2 Yardlet replaces the user-facing Codex/Claude Code workflow, not the worker engines yet

Yardlet should replace the day-to-day user experience of manually running:

```bash
codex
codex exec ...
claude
claude -p ...
```

But Yardlet does **not** immediately replace Codex CLI and Claude Code as internal engines. Those tools remain valuable hidden workers, especially because API-based usage can be more expensive than subscription-backed usage.

The relationship is:

```txt
Yardlet = controller, workbench, scheduler, guard, state owner
Codex CLI = hidden implementation/review worker
Claude Code CLI = hidden planning/review/implementation worker
```

Yardlet should treat workers as interchangeable engines behind a contract:

```txt
Task contract in
  -> host-specific packet
  -> worker subprocess
  -> structured result files out
  -> Yardlet evaluator decides next state
```

### 1.3 Yardlet must not require AI API keys

This is a product rule, not just an implementation preference.

Yardlet core must not:

- require an OpenAI, Anthropic, or other AI provider API key;
- ask the user to paste an AI API key;
- store AI provider API keys;
- silently fall back to an API/SDK call;
- call OpenAI/Anthropic AI APIs directly;
- treat missing API keys as a setup failure.

Yardlet may call already-installed local worker CLIs after guard checks:

```txt
Codex CLI, already logged in through a subscription-backed account
Claude Code CLI, already logged in through a Pro/Max-style subscription-backed account
```

If no safe local worker is available, Yardlet must stop with a clear local-worker readiness message. It must not ask for an API key.

### 1.4 Yardlet is standalone, not Orcar Brain local mode

Yardlet can absorb operating patterns from Orcar Brain, but it must not depend on Brain, Slack, Orcar membership, Orcar-specific Work Packages, or internal team context.

```txt
Orcar Brain
  = team agent, Slack-centered, organization/context heavy

Yardlet
  = local workbench, repo/workspace-centered, standalone, usable by non-Orcar users
```

Brain may later trigger Yardlet or consume Yardlet handoffs, but that is an optional bridge.

### 1.5 Build the final shape now

Do not build a throwaway wrapper and later hope it becomes a workbench.

The first implementation should include the final architecture’s essential surfaces, even if each component is shallow:

- terminal UI shell;
- repo initialization;
- planning gate;
- intent/scope/acceptance contract;
- structured queue;
- worker profiles;
- worker packet compiler;
- zero-key worker guard;
- run ledger;
- result schema;
- evaluator;
- compact checkpoint;
- handoff;
- tool, approval, interaction, research, and billing policies.

The point is not to make every algorithm perfect immediately. The point is to avoid the wrong shape.

---

## 2. What Yardlet is for

Yardlet should support three main situations.

### 2.1 Existing repo work

A user opens Yardlet in an existing repo and enters a short request:

```txt
Add admin order search with status, email, and date filters.
```

Yardlet should:

1. inspect the repo enough to understand the local environment;
2. run a planning gate through a safe local worker;
3. produce an intent contract and initial queue;
4. route bounded tasks to Codex/Claude workers;
5. validate locally;
6. record what happened;
7. compact context;
8. show the next step or handoff in the UI.

### 2.2 New idea to local project

A user opens Yardlet in an empty or new workspace and enters an idea:

```txt
I want a small local app for searching and summarizing team knowledge docs.
```

Yardlet should not immediately create a giant app from vibes. It should:

1. compress the idea into product intent, boundaries, and success criteria;
2. ask at most a small number of high-level natural-language questions if the idea is under-specified;
3. create a starter project plan;
4. seed a queue;
5. run bounded creation tasks;
6. keep the user in the workbench loop.

### 2.3 Team-consistent AI work

A team can adopt Yardlet so that AI-assisted work produces consistent records:

```txt
.agents/intent-contract.yaml
.agents/work-queue.yaml
.agents/runs/<run-id>/result.json
.agents/runs/<run-id>/validation.log
.agents/runs/<run-id>/checkpoint.md
.agents/runs/<run-id>/handoff.md
```

This turns “I asked an agent and it changed some files” into an auditable work unit that a teammate can inspect and resume.

---

## 3. Goals and non-goals

### 3.1 Goals

1. **Minimal input, long local work**  
   A user should be able to give a short natural-language request and let Yardlet carry the work through planning, execution, validation, compacting, and handoff.

2. **UI-first local workbench**  
   Yardlet should provide a practical terminal UI that becomes the normal place to manage local agent work.

3. **Subscription-backed worker usage**  
   Yardlet should use local Codex/Claude Code workers through their existing subscription-backed sessions and prevent accidental AI API billing.

4. **Worker-specific optimization**  
   Codex and Claude Code should receive different task packets. Yardlet compiles the same canonical task contract into worker-specific instructions.

5. **Intent and scope stability**  
   Intent lock is not the product identity, but it is a core safety rule. Worker freedom must stay inside the user’s intended work.

6. **Low user interruption**  
   Yardlet should avoid asking users for code review, architecture review, diff review, or low-level implementation choices. When it asks, questions should be few and natural-language product/scope/approval questions.

7. **Local evidence freedom**  
   Yardlet should freely gather local evidence inside policy: repo inspection, tests, read-only dev/test/fixture DB queries, browser/devtools, emulator/simulator, and sandboxed computer-use.

8. **Intent-locked research**  
   Research is allowed when needed, but research is evidence gathering, not permission to change the product goal.

9. **Token/context economy**  
   Long-running work must not mean stuffing everything into context. Yardlet should rely on anchors, progressive disclosure, compact checkpoints, result schemas, and durable state.

10. **Auditable handoff**  
    Every bounded run should leave enough evidence for a teammate or future agent to understand what happened and resume safely.

11. **Pattern absorption**  
    Yardlet should absorb useful patterns from Hermes, Ouroboros, oh-my/OMC, KAIROS, and Orcar Brain without becoming dependent on those systems.

### 3.2 Non-goals

Yardlet should not be:

- a hosted service;
- a Slack bot;
- Orcar Brain running locally;
- a mandatory AI API wrapper;
- a tool that asks the user to paste AI provider API keys;
- a skill marketplace;
- a 24-hour unbounded daemon;
- a self-modifying skill system without review;
- a generic chat app;
- a thin alias over `codex` and `claude`.

---

## 4. Core design principles

### 4.1 The workbench owns state

Codex and Claude Code are workers. They do not own canonical task state.

Canonical state lives under `.agents/` in the repo/workspace:

```txt
.agents/
  yardlet.yaml
  intent-contract.yaml
  work-queue.yaml
  tool-policy.yaml
  approval-policy.yaml
  interaction-policy.yaml
  research-policy.yaml
  billing-policy.yaml
  workers.yaml
  runs/
  checkpoints/
  handoffs/
```

Yardlet state must be durable and readable without previous chat context.

### 4.2 UI is the normal interface; CLI is a supporting surface

The primary user workflow happens inside the Yardlet terminal UI.

Commands should exist, but the user should rarely need to manually chain them.

```txt
Normal:
  yard -> UI -> New Work / Queue / Run / Handoff screens

Advanced:
  yardlet run --next --headless
  yardlet status --json
  yardlet packet --dry-run
```

### 4.3 Yardlet is deterministic until it invokes a worker

Before worker invocation, Yardlet should behave like a local controller, not like a hidden AI.

It may:

- read `.agents` files;
- inspect repo metadata;
- inspect git status;
- detect package managers and test commands;
- check worker readiness;
- enforce policies;
- select eligible tasks;
- compile packets;
- show dry-run packets;
- create run directories.

It must not:

- call AI provider APIs;
- do hidden AI reasoning through an SDK;
- ask for AI API keys;
- silently fall back to a provider endpoint.

Any AI reasoning step, including planning-gate work, must be executed through an explicitly selected local worker after the same guard checks.

### 4.4 Tool freedom is separate from intent freedom

Workers may use local tools freely inside the policy, but they may not redefine the job.

Allowed by default inside a suitable sandbox:

- repo inspection;
- local search;
- local tests and linters;
- read-only dev/test/fixture DB queries;
- local browser/devtools;
- emulator/simulator;
- screenshots of local UI;
- intent-locked research.

Gated or blocked:

- production DB access;
- writes to shared or remote systems;
- deploy/publish/send;
- purchases/account changes;
- credential extraction;
- destructive file operations outside workspace;
- scope expansion.

### 4.5 Ask fewer, higher-level questions

Yardlet should not ask users to review code, architecture, or diffs as the normal path.

Allowed user questions:

- product intent;
- scope boundary;
- acceptance priority;
- high-risk approval;
- blocked external dependency;
- whether to stop, continue, or split work after a genuine blocker.

Disallowed habitual questions:

- “Should I edit this file?”
- “Is this architecture okay?”
- “Can you review this diff?”
- “Which exact internal helper should I use?”
- “Should I run the next obvious test?”

### 4.6 Compact is a first-class operation

Yardlet must not rely on chat history as memory.

At task/cycle boundaries it should compact into durable artifacts:

```txt
checkpoint.md
result.json
validation.log
handoff.md
work-queue.yaml state update
```

Next runs should start from compact capsules and file anchors, not the previous long conversation.

### 4.7 Research is allowed but must be intent-locked

Research is an evidence tool, not a product manager.

Each research event should record:

```yaml
research_question: "..."
source_or_anchor: "..."
used_for: "..."
decision_impact: "..."
scope_impact: none | planning_update_required | approval_required
drift_detected: false
```

If research suggests a valuable adjacent idea, Yardlet should record it as a candidate queue item or note, not silently include it in the current task.
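
A minimal sketch of how a research event could be checked before it is written to the run ledger. The `ResearchEvent` class and the `requires_stop` rule are hypothetical names used for illustration; only the field names and the three `scope_impact` values come from the record above.

```python
from dataclasses import dataclass

# Allowed values, taken from the record format above.
SCOPE_IMPACTS = {"none", "planning_update_required", "approval_required"}


@dataclass(frozen=True)
class ResearchEvent:
    research_question: str
    source_or_anchor: str
    used_for: str
    decision_impact: str
    scope_impact: str = "none"
    drift_detected: bool = False

    def __post_init__(self) -> None:
        # Reject values outside the documented enum.
        if self.scope_impact not in SCOPE_IMPACTS:
            raise ValueError(f"unknown scope_impact: {self.scope_impact!r}")


def requires_stop(event: ResearchEvent) -> bool:
    """Assumed rule: the current task continues only when research
    neither changes scope nor reports drift."""
    return event.drift_detected or event.scope_impact != "none"
```

An event that returns `True` here would be surfaced as a planning update or approval request rather than folded into the running task.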

### 4.8 Zero-key AI policy

Yardlet core requires zero AI API keys.

Worker subprocesses should run in an environment sanitized of AI-billing variables. Depending on the configured strictness, Yardlet should either scrub provider-billing environment variables before worker invocation or block the invocation when they are present.

Important nuance: a repository may contain API keys for its own tests or application runtime. Yardlet must not read, display, or store secret values. The worker guard is about preventing **Yardlet’s AI worker execution** from using provider API billing by accident. Repo validation that calls external APIs should be separately governed by tool policy and approval policy.

---

## 5. UX: Yardlet terminal UI

### 5.1 Default launch

```bash
yard
```

Expected first screen:

```txt
┌ Yardlet ─ Local AI Workbench ──────────────────────────────────┐
│ Repo: acme-storefront                         Workers: 2 ready │
│ Intent: Admin order search                                     │
│ Scope: admin orders UI, order API, local tests                 │
│ Status: 1 running, 2 queued, 0 blocked                         │
├ Queue ─────────────────────────────────────────────────────────┤
│ ✓ YARD-001 Inspect current order flow              claude      │
│ ▶ YARD-002 Implement search API                    codex       │
│ · YARD-003 Add UI filters                          codex       │
│ · YARD-004 Validate and hand off                   claude      │
├ Run ───────────────────────────────────────────────────────────┤
│ Worker: codex                                                  │
│ Validation: pending                                            │
│ Drift: none                                                    │
│ Approval: none required                                        │
├ Actions ───────────────────────────────────────────────────────┤
│ n new work   r run next   p pause    a approvals   h handoff   │
│ q quit       ? help       d details  w workers     s settings  │
└────────────────────────────────────────────────────────────────┘
```

### 5.2 Main screens

#### Home

Shows the current workspace state:

- active intent;
- queue summary;
- current run;
- worker readiness;
- pending approvals;
- latest validation status;
- latest handoff.

#### New Work

A small natural-language input area.

The UI should encourage concise input:

```txt
What do you want Yardlet to work on?
[ Improve admin order search with status/email/date filters. ]
```

Optional fields:

- must include;
- must not touch;
- preferred validation;
- risk tolerance.

#### Planning Gate

Shows a human-readable product/scope contract, not implementation internals.

The user can accept, edit in natural language, or ask Yardlet to refine.

Example:

```txt
Goal
  Admin users can search orders by status, customer email, and date range.

Allowed scope
  - Admin order list UI
  - Order search API/query layer
  - Local tests and fixtures

Out of scope
  - Payment logic
  - Auth/role redesign
  - Production DB changes

Acceptance
  - UI exposes the filters
  - API handles filter params correctly
  - Existing order list behavior still works
  - Listed validation commands pass
```

#### Queue

Shows work items and lets the user reorder, pause, block, or split at a high level.

The UI should not expose raw internal YAML by default.

#### Run Monitor

Shows one bounded worker run:

- selected worker;
- task packet summary;
- live event stream;
- changed files;
- validation commands;
- drift/approval warnings;
- result status.

Raw worker logs should be available, but collapsed by default.

#### Approvals

Shows only gated actions:

- production-like access;
- destructive command;
- network mutation;
- deploy/publish/send;
- scope expansion;
- real external API call for validation;
- secret access.

The user can approve once, deny, or convert the action into a new queue item.

#### Handoff

Shows the compact summary:

- completed work;
- acceptance status;
- validation evidence;
- changed files;
- remaining blockers;
- next recommended slice;
- must-read anchors.

### 5.3 UI implementation choice

Initial implementation should be a terminal UI, not a web dashboard.

Good properties:

- runs in the same repo terminal;
- works over SSH;
- simple to ship open source;
- easy to pair with worker subprocesses;
- naturally local-first.

The UI must not become the canonical state store. It reads and writes through Yardlet’s state layer.

---

## 6. Architecture

### 6.1 High-level layers

```txt
Yardlet UI
  -> state/service layer
    -> planning gate
    -> queue manager
    -> policy engine
    -> worker router
    -> packet compiler
    -> worker runner
    -> evaluator
    -> compact/handoff writer
      -> .agents files
      -> Codex/Claude worker subprocesses
```

### 6.2 Repo-local state

```txt
.agents/
  yardlet.yaml
  intent-contract.yaml
  work-queue.yaml
  tool-policy.yaml
  approval-policy.yaml
  interaction-policy.yaml
  research-policy.yaml
  billing-policy.yaml
  workers.yaml
  skills/
    planning-gate/SKILL.md
    delivery-cycle/SKILL.md
    autonomous-work-loop/SKILL.md
  runs/
    <run-id>/
      run.yaml
      task-packet.md
      worker-output.log
      result.json
      validation.log
      evaluation.json
      checkpoint.md
      handoff.md
      evidence/
  checkpoints/
    latest.md
  handoffs/
```

### 6.3 User-level config

Use user-level config for non-secret preferences and worker discovery.

```txt
~/.yard/
  config.yaml
  workers.yaml
  cache/
  templates/
```

Do not store AI provider API keys here.

### 6.4 Packaged core

Final distribution should package Yardlet core code and templates as normal application resources. During local development, a source checkout can provide the templates.

Do not publicly expose `.orcar-core` as the product identity. Existing local prototypes can be migrated, but final user-facing names should be Yardlet-oriented.

### 6.5 State hierarchy

Use this durable state model:

```txt
workspace
  -> intent contract
    -> task
      -> run
        -> cycle
          -> checkpoint
```

Definitions:

- **workspace**: local repo or project directory;
- **intent contract**: current user goal, allowed scope, out-of-scope, acceptance;
- **task**: queue item that can be selected and executed;
- **run**: one bounded worker invocation plus validation/evaluation;
- **cycle**: execution sub-loop inside a run;
- **checkpoint**: compact durable resume point.

---

## 7. Core schemas

### 7.1 `.agents/yardlet.yaml`

```yaml
schema_version: 1
product: yardlet
workspace_id: "auto-generated-stable-id"
created_at: "2026-06-02T00:00:00Z"
state_dir: ".agents"
default_interface: tui
canonical_queue: ".agents/work-queue.yaml"
current_intent: ".agents/intent-contract.yaml"
```

### 7.2 Intent contract

```yaml
schema_version: 1
id: intent-2026-06-02-001
source: user
raw_request: "Improve admin order search with status, email, and date filters."
summary: "Admin users can search orders by status, customer email, and date range."

allowed_scope:
  - "Admin order list UI"
  - "Order search API/query layer"
  - "Local tests and fixtures"

out_of_scope:
  - "Payment logic"
  - "Auth/role redesign"
  - "Production database changes"
  - "Deployment"

acceptance:
  - id: AC-001
    text: "Admin order list exposes status/email/date filters."
    validation: "UI or component test evidence"
  - id: AC-002
    text: "Order query/API layer handles filters correctly."
    validation: "unit/integration tests"
  - id: AC-003
    text: "Existing order list behavior remains intact."
    validation: "listed regression tests pass"

ambiguity:
  score: low
  open_questions: []

interaction:
  question_budget: 2
  user_review_mode: natural_language_scope_and_approval_only
  do_not_ask_for:
    - code_review
    - architecture_review
    - diff_review
    - low_level_file_choice

research_policy:
  allowed: when_needed
  mode: intent_locked

created_by_worker: claude-code
status: accepted
```

### 7.3 Work queue

```yaml
schema_version: 1
queue_id: queue-2026-06-02-001
intent_id: intent-2026-06-02-001
selection_policy:
  default_order: priority_then_created_at
  require_planning_gate: true
  skip_if_approval_required: true
  skip_if_blocked: true

tasks:
  - id: YARD-001
    title: "Inspect current admin order flow"
    state: done
    priority: 10
    risk: low
    kind: research
    preferred_worker: claude-code
    allowed_scope:
      - "Read admin order UI/API/tests"
    validation:
      type: evidence_summary
    interaction:
      may_ask_user: false

  - id: YARD-002
    title: "Implement order search API/query handling"
    state: queued
    priority: 20
    risk: medium
    kind: implementation
    preferred_worker: codex
    allowed_scope:
      - "Order query/API layer"
      - "Related tests"
    validation:
      commands:
        - "pnpm test"
    approval:
      required: false
```
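
The `selection_policy` above can be read as a small deterministic function. The sketch below is one possible reading, not a fixed algorithm: it assumes that a lower `priority` number runs first and that list order stands in for `created_at`, since the example tasks carry no timestamp. `select_next_task` is a hypothetical name.

```python
from typing import Optional


def select_next_task(queue: dict) -> Optional[dict]:
    """Pick the next runnable task from a parsed work-queue document."""
    policy = queue.get("selection_policy", {})
    candidates = []
    for task in queue.get("tasks", []):
        # Only queued tasks are eligible; done and blocked tasks drop out here,
        # which also covers skip_if_blocked.
        if task.get("state") != "queued":
            continue
        needs_approval = task.get("approval", {}).get("required", False)
        if policy.get("skip_if_approval_required") and needs_approval:
            continue
        candidates.append(task)
    # Assumption: lower priority number first. list.sort() is stable, so
    # queue order breaks ties in place of created_at.
    candidates.sort(key=lambda t: t.get("priority", 0))
    return candidates[0] if candidates else None
```

Returning `None` when nothing is eligible lets the UI show an idle or waiting-for-approval state instead of starting a worker.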

### 7.4 Run record

```yaml
schema_version: 1
run_id: run-2026-06-02-001
task_id: YARD-002
intent_id: intent-2026-06-02-001
worker: codex
state: running
started_at: "2026-06-02T00:00:00Z"
worktree: "."
packet: ".agents/runs/run-2026-06-02-001/task-packet.md"
result: ".agents/runs/run-2026-06-02-001/result.json"
validation: ".agents/runs/run-2026-06-02-001/validation.log"
evaluation: ".agents/runs/run-2026-06-02-001/evaluation.json"
checkpoint: ".agents/runs/run-2026-06-02-001/checkpoint.md"
```

### 7.5 Result schema

```json
{
  "schema_version": 1,
  "run_id": "run-2026-06-02-001",
  "task_id": "YARD-002",
  "status": "done | partial | blocked | failed | needs_user",
  "intent_adherence": {
    "drift_detected": false,
    "notes": "Stayed within allowed scope."
  },
  "changes": {
    "files_modified": [],
    "files_created": [],
    "files_deleted": []
  },
  "validation": {
    "commands_run": [],
    "passed": true,
    "failures": []
  },
  "approval": {
    "required": false,
    "reason": null
  },
  "question_for_user": null,
  "compact_summary": "Short resume summary for the next run."
}
```
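
A shallow structural check of `result.json` might look like the following. This is a sketch under stated assumptions: `check_result` is a hypothetical helper, and treating every top-level key except `question_for_user` as required is an assumption drawn from the example rather than a stated rule.

```python
import json

# Status values listed in the schema above.
RESULT_STATUSES = {"done", "partial", "blocked", "failed", "needs_user"}

# Assumption: these top-level keys are mandatory.
REQUIRED_KEYS = {
    "schema_version", "run_id", "task_id", "status", "intent_adherence",
    "changes", "validation", "approval", "compact_summary",
}


def check_result(raw: str, run_id: str, task_id: str) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(result, dict):
        return ["result must be a JSON object"]
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - result.keys())]
    if result.get("status") not in RESULT_STATUSES:
        problems.append(f"unknown status: {result.get('status')!r}")
    if result.get("run_id") != run_id:
        problems.append("run_id does not match the run record")
    if result.get("task_id") != task_id:
        problems.append("task_id does not match the run record")
    return problems
```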

### 7.6 Billing policy

```yaml
schema_version: 1
mode: zero_key_subscription_workers

yard_core:
  require_ai_api_key: false
  ask_for_ai_api_key: false
  store_ai_api_key: false
  call_provider_api: false
  auto_api_fallback: false

worker_invocation:
  require_local_cli: true
  require_subscription_backed_auth: true
  if_auth_ambiguous: stop
  if_no_worker_ready: stop
  ai_billing_env_policy: scrub_or_block
  never_print_secret_values: true

blocked_worker_env_names:
  - OPENAI_API_KEY
  - ANTHROPIC_API_KEY
  - OPENAI_BASE_URL
  - ANTHROPIC_BASE_URL
  - OPENAI_ORGANIZATION
  - OPENAI_PROJECT
```

### 7.7 Tool policy

```yaml
schema_version: 1
local_tools:
  shell:
    allowed: true
    approval_required_patterns:
      - "rm -rf"
      - "sudo"
      - "git push"
      - "deploy"
      - "terraform apply"

  local_db:
    allowed: true
    default_mode: read_only
    allowed_environments:
      - dev
      - test
      - fixture
    forbidden_environments:
      - prod

  browser:
    allowed: true
    mode: local_or_sandbox

  computer_use:
    allowed: true
    mode: local_sandbox
    forbidden_actions:
      - purchase
      - account_change
      - external_submit_without_approval

network:
  default: restricted
  allowed_for:
    - official_docs
    - dependency_docs
    - issue_research
  approval_required_for:
    - external_mutation
    - real_api_validation
```
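
As an illustration of how `approval_required_patterns` could be applied, the sketch below uses plain substring matching on a whitespace-normalized command. The matching strategy and the function name are assumptions; the policy above only lists the patterns.

```python
# Patterns copied from the shell section of the tool policy above.
APPROVAL_REQUIRED_PATTERNS = ("rm -rf", "sudo", "git push", "deploy", "terraform apply")


def shell_command_needs_approval(command: str,
                                 patterns: tuple = APPROVAL_REQUIRED_PATTERNS) -> bool:
    """Assumed rule: any pattern occurring as a substring gates the command."""
    # Collapse runs of whitespace so "git   push" still matches "git push".
    normalized = " ".join(command.split())
    return any(pattern in normalized for pattern in patterns)
```

Substring matching over-matches (a file named `deploy.md` would trigger it), which errs toward asking for approval rather than skipping it.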

---

## 8. Worker system

### 8.1 Worker profiles

Yardlet should model each worker with a profile:

```yaml
id: codex
kind: cli_worker
role_strengths:
  - focused_implementation
  - test_driven_bugfix
  - shell_heavy_repo_changes
  - local_code_review
billing:
  mode: subscription_backed_only
invocation:
  command: codex
  supports_noninteractive: true
  output_contract: files
limits:
  max_wall_minutes: 45
  max_retries: 1
```

```yaml
id: claude-code
kind: cli_worker
role_strengths:
  - planning_gate
  - ambiguity_reduction
  - acceptance_criteria
  - review
  - handoff_quality
billing:
  mode: subscription_backed_only
invocation:
  command: claude
  supports_noninteractive: true
  output_contract: json_or_files
limits:
  max_wall_minutes: 45
  max_retries: 1
```

Exact CLI flags should be adapter-owned and discovered through installed versions where possible. Yardlet should avoid hard-coding brittle host assumptions in business logic.

### 8.2 Worker readiness checks

Before a worker can run, Yardlet checks:

- binary exists;
- version can be read;
- auth status can be probed or is configured as trusted by the local user;
- no known API-billing env leak is passed to the worker process;
- worker is allowed by current policy;
- task risk is compatible with worker policy.

If readiness is ambiguous, Yardlet stops and shows a clear UI state:

```txt
Worker not ready: Claude Code auth mode is ambiguous.
Yardlet did not call an AI API and did not ask for an API key.
Open the worker directly to fix subscription login, then retry.
```
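
The first two readiness checks (binary exists, version can be read) can be sketched with the standard library. This assumes the worker CLI accepts a `--version` flag, which should be confirmed per installed worker; auth and billing checks are deliberately left out. `Readiness` and `check_binary` are hypothetical names.

```python
import shutil
import subprocess
from dataclasses import dataclass


@dataclass
class Readiness:
    worker_id: str
    ready: bool
    reason: str


def check_binary(worker_id: str, command: str) -> Readiness:
    """Check that the worker binary is on PATH and reports a version."""
    path = shutil.which(command)
    if path is None:
        return Readiness(worker_id, False, f"binary not found on PATH: {command}")
    try:
        # Assumption: the CLI supports --version and exits 0 when it works.
        proc = subprocess.run([path, "--version"], capture_output=True,
                              text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return Readiness(worker_id, False, f"version probe failed: {exc}")
    if proc.returncode != 0:
        return Readiness(worker_id, False, "version probe returned a non-zero exit code")
    return Readiness(worker_id, True, proc.stdout.strip())
```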

### 8.3 AI-billing environment handling

Yardlet must distinguish:

1. **Yardlet worker execution env** — must not use AI provider API keys.
2. **Project/test runtime env** — may contain app secrets, but use is governed by tool policy and approval policy.

Default worker invocation should sanitize AI billing variables before spawning Codex/Claude workers.

Strict mode can block if such variables are present in the parent process.

Yardlet must never print secret values.
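
A minimal sketch of the scrub-or-block behavior, assuming the blocked names from the billing policy in section 7.6. `build_worker_env` and `BillingEnvError` are hypothetical names.

```python
import os
import subprocess

# Names copied from blocked_worker_env_names in the billing policy.
BLOCKED_WORKER_ENV_NAMES = frozenset({
    "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_BASE_URL",
    "ANTHROPIC_BASE_URL", "OPENAI_ORGANIZATION", "OPENAI_PROJECT",
})


class BillingEnvError(RuntimeError):
    """Raised in strict mode when blocked variables are present."""


def build_worker_env(parent_env: dict, strict: bool = False) -> dict:
    """Return a copy of parent_env without AI-billing variables."""
    present = sorted(name for name in BLOCKED_WORKER_ENV_NAMES if name in parent_env)
    if strict and present:
        # Report variable names only, never their values.
        raise BillingEnvError("blocked AI billing variables present: " + ", ".join(present))
    return {k: v for k, v in parent_env.items() if k not in BLOCKED_WORKER_ENV_NAMES}
```

A worker would then be spawned with something like `subprocess.run(argv, env=build_worker_env(os.environ))`, so the parent process environment is never modified.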

### 8.4 Worker routing

Default routing:

```yaml
planning_gate:
  primary: claude-code
  fallback: codex

implementation:
  primary: codex
  fallback: claude-code

review_or_handoff:
  primary: claude-code
  fallback: codex

failed_validation_repair:
  primary: codex
  fallback: claude-code

ambiguous_scope:
  primary: claude-code
  fallback: none
```

Use only one primary worker per normal task. Add a second worker only when risk or repeated failure justifies the usage cost.
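
The routing table can be applied with a simple primary-then-fallback lookup. The sketch below mirrors the table above, representing `fallback: none` as Python `None`; the `route` function and the idea of passing in a set of ready worker ids are assumptions.

```python
from typing import Optional

# Mirrors the default routing table above.
ROUTING = {
    "planning_gate": {"primary": "claude-code", "fallback": "codex"},
    "implementation": {"primary": "codex", "fallback": "claude-code"},
    "review_or_handoff": {"primary": "claude-code", "fallback": "codex"},
    "failed_validation_repair": {"primary": "codex", "fallback": "claude-code"},
    "ambiguous_scope": {"primary": "claude-code", "fallback": None},
}


def route(task_kind: str, ready_workers: set) -> Optional[str]:
    """Return the first ready worker for this kind of work, or None to stop."""
    rule = ROUTING.get(task_kind)
    if rule is None:
        return None
    for candidate in (rule["primary"], rule["fallback"]):
        if candidate is not None and candidate in ready_workers:
            return candidate
    return None
```

A `None` result corresponds to the "no safe worker is available" hard stop rather than to any API fallback.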

### 8.5 Packet compiler

Yardlet compiles canonical task state into worker-specific packets.

Shared inputs:

- intent contract summary;
- task title and allowed scope;
- out-of-scope items;
- local evidence anchors;
- validation commands;
- output schema;
- interaction policy;
- approval policy;
- compact requirements.

Codex packet style should be focused and execution-oriented.

Claude Code packet style can be more planning/review-oriented, but still bounded.

Packets should prefer anchors over full pasted content:

```txt
Read anchors:
- .agents/intent-contract.yaml
- .agents/work-queue.yaml
- .agents/runs/<run-id>/evidence/repo-summary.md
- src/admin/orders/...

Do not load unrelated docs unless needed for this task.
```
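
One way to picture the compiler: a fixed worker-specific prefix followed by the variable task content, with anchors listed instead of pasted file bodies. The prefix wording, the `compile_packet` name, and the section labels are all hypothetical; real packets would be owned by the worker adapters.

```python
# Hypothetical per-worker prefixes, kept constant so every packet starts the same way.
WORKER_PREFIX = {
    "codex": "Mode: focused execution. Make the smallest change that satisfies the task.",
    "claude-code": "Mode: bounded planning and review. Stay inside the contract.",
}


def compile_packet(worker: str, intent_summary: str, task: dict,
                   out_of_scope: list, anchors: list, run_id: str) -> str:
    """Render one task packet as plain text: stable prefix first, task content last."""
    lines = [
        WORKER_PREFIX[worker],
        "Write result.json and handoff.md into the run directory named below.",
        "",
        f"Intent: {intent_summary}",
        f"Task: {task['id']} {task['title']}",
        f"Run directory: .agents/runs/{run_id}",
        "Allowed scope:",
        *[f"- {item}" for item in task.get("allowed_scope", [])],
        "Out of scope:",
        *[f"- {item}" for item in out_of_scope],
        "Read anchors:",
        *[f"- {anchor}" for anchor in anchors],
        "Validation commands:",
        *[f"- {cmd}" for cmd in task.get("validation", {}).get("commands", [])],
    ]
    return "\n".join(lines) + "\n"
```

Both workers receive the same contract section; only the leading mode line differs, which keeps the canonical task state in one place.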

### 8.6 Worker output contract

Every worker run must leave structured artifacts. Natural-language console output is not enough.

Required:

```txt
.agents/runs/<run-id>/result.json
.agents/runs/<run-id>/handoff.md
```

When validation runs:

```txt
.agents/runs/<run-id>/validation.log
```

When evidence is collected:

```txt
.agents/runs/<run-id>/evidence/*
```

---

## 9. Planning gate

### 9.1 Role

The planning gate turns a small natural-language request into a bounded work contract.

It should produce:

- goal summary;
- allowed scope;
- out-of-scope;
- acceptance criteria;
- ambiguity score;
- limited user questions if needed;
- initial queue;
- validation strategy;
- worker routing hints;
- risk classification.

### 9.2 User interaction budget

The planning gate should ask zero or few questions.

Default:

```yaml
question_budget: 2
question_type: natural_language_product_scope_or_approval_only
```

If questions are not essential, Yardlet should proceed with explicit assumptions and record them.

### 9.3 Acceptance criteria tree

Borrow the useful Ouroboros-style idea: acceptance should be structured enough to evaluate, not just a vague sentence.

Example:

```yaml
acceptance:
  - id: AC-001
    statement: "Admin can filter by status."
    evidence:
      - "UI filter exists"
      - "API/query handles status"
      - "test covers status filter"
  - id: AC-002
    statement: "Admin can filter by customer email."
    evidence:
      - "UI filter exists"
      - "API/query handles email"
```

### 9.4 Drift detection

At evaluation time, Yardlet checks whether changes exceeded the contract.

Drift signs:

- files outside allowed scope changed;
- acceptance criteria silently changed;
- adjacent feature added;
- task became broader than requested;
- worker asked user to approve architecture/code instead of resolving locally;
- research was used to redefine product direction.
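
The first drift sign, files changed outside the allowed scope, is the easiest to check deterministically. The sketch below assumes the natural-language `allowed_scope` entries have already been mapped to workspace-relative path prefixes (the `allowed_paths` argument is hypothetical), and it requires Python 3.9 or later for `is_relative_to`.

```python
from pathlib import PurePosixPath


def files_outside_scope(changed_files: list, allowed_paths: list) -> list:
    """Return changed files that do not sit under any allowed path prefix."""
    allowed = [PurePosixPath(p) for p in allowed_paths]
    outside = []
    for name in changed_files:
        path = PurePosixPath(name)
        # is_relative_to compares whole path components, so
        # "src/admin/orders-legacy" does not count as inside "src/admin/orders".
        if not any(path.is_relative_to(root) for root in allowed):
            outside.append(name)
    return outside
```

A non-empty result does not have to fail the run on its own; the evaluator in section 10.1 allows changes that are "in allowed areas or justified".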

---

## 10. Evaluator and compact

### 10.1 Initial evaluator

The first evaluator can be deterministic and shallow. It should check:

- result file exists;
- result schema valid;
- task id/run id match;
- worker did not report uncontrolled drift;
- changed files are in allowed areas or justified;
- forbidden paths untouched;
- validation commands were run or a reason was recorded;
- approvals were not bypassed;
- handoff exists;
- checkpoint exists;
- queue update is coherent.
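
A shallow evaluator along these lines could start with file presence and the worker's own drift flag. This sketch covers only a subset of the checks above; `evaluate_run` and the shape of its return value are assumptions, and it expects `checkpoint.md` inside the run directory as shown in section 6.2.

```python
import json
from pathlib import Path

# Artifacts the checks above expect inside .agents/runs/<run-id>/.
REQUIRED_ARTIFACTS = ("result.json", "handoff.md", "checkpoint.md")


def evaluate_run(run_dir: Path) -> dict:
    """Collect findings for one run directory; no findings means these checks passed."""
    findings = [f"missing artifact: {name}" for name in REQUIRED_ARTIFACTS
                if not (run_dir / name).is_file()]
    result_path = run_dir / "result.json"
    if result_path.is_file():
        try:
            result = json.loads(result_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            result = None
            findings.append("result.json is not valid JSON")
        if isinstance(result, dict):
            adherence = result.get("intent_adherence")
            if isinstance(adherence, dict) and adherence.get("drift_detected"):
                findings.append("worker reported drift")
    return {"passed": not findings, "findings": findings}
```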

### 10.2 Validation

Validation can include:

- tests;
- lint/typecheck;
- local UI verification;
- local DB read-only queries;
- screenshot evidence;
- manual approval where policy requires it.

Validation evidence should be compact but traceable.

### 10.3 Compact checkpoint

A checkpoint should be short enough to feed into the next cycle.

Required fields:

```md
# Checkpoint

- Intent:
- Task:
- Completed:
- Changed files:
- Validation:
- Blockers:
- Next recommended action:
- Must-read anchors:
```
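
Rendering the checkpoint from a fixed field list keeps it short and makes a missing field an error rather than a silent gap. `render_checkpoint` is a hypothetical helper; the field names are the ones listed above.

```python
# Field order follows the checkpoint template above.
CHECKPOINT_FIELDS = (
    "Intent", "Task", "Completed", "Changed files", "Validation",
    "Blockers", "Next recommended action", "Must-read anchors",
)


def render_checkpoint(values: dict) -> str:
    """Render checkpoint.md text; raise if any required field is absent."""
    missing = [field for field in CHECKPOINT_FIELDS if field not in values]
    if missing:
        raise ValueError("checkpoint is missing fields: " + ", ".join(missing))
    lines = ["# Checkpoint", ""]
    lines += [f"- {field}: {values[field]}" for field in CHECKPOINT_FIELDS]
    return "\n".join(lines) + "\n"
```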

### 10.4 Handoff

Handoff is for humans and future workers.

It should answer:

- What was attempted?
- What changed?
- What passed/failed?
- What remains?
- What should the next worker read?
- Is user input needed?

---

## 11. Safety and approval

### 11.1 Approval state machine

Task/run approval states:

```txt
not_required
required
requested
approved_once
denied
expired
```
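
The plan lists the states but not the transitions between them. The table below is one plausible reading, offered only as a sketch: in particular, moving `approved_once` to `expired` after the approved action has been used, and allowing a re-request after expiry, are assumptions.

```python
from enum import Enum


class Approval(str, Enum):
    NOT_REQUIRED = "not_required"
    REQUIRED = "required"
    REQUESTED = "requested"
    APPROVED_ONCE = "approved_once"
    DENIED = "denied"
    EXPIRED = "expired"


# Assumed transition table; not specified by the plan.
ALLOWED = {
    Approval.NOT_REQUIRED: {Approval.REQUIRED},
    Approval.REQUIRED: {Approval.REQUESTED},
    Approval.REQUESTED: {Approval.APPROVED_ONCE, Approval.DENIED, Approval.EXPIRED},
    Approval.APPROVED_ONCE: {Approval.EXPIRED},
    Approval.DENIED: set(),
    Approval.EXPIRED: {Approval.REQUESTED},
}


def transition(current: Approval, target: Approval) -> Approval:
    """Move to target if the table allows it, otherwise raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal approval transition: {current.value} -> {target.value}")
    return target


def may_execute(state: Approval) -> bool:
    """A gated action may run only when no approval is needed or one was granted."""
    return state in (Approval.NOT_REQUIRED, Approval.APPROVED_ONCE)
```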

### 11.2 Hard stops

Yardlet must stop if:

- no safe worker is available;
- worker auth/billing mode is ambiguous;
- task needs approval and approval is missing;
- intent drift is detected and cannot be corrected locally;
- production access is requested;
- destructive command is requested;
- validation would call real external services without approval;
- queue state is corrupt;
- run ledger cannot be written.

### 11.3 Safe local freedoms

Yardlet should not be overprotective about safe local evidence. The worker should be allowed to inspect and verify locally without asking constantly.

Examples:

```txt
Allowed without user prompt:
- read local source files
- grep/search repo
- inspect package scripts
- run local unit tests
- query fixture/dev DB read-only
- open local dev UI in sandbox/browser
- inspect local logs
```

---

## 12. Token and usage economy

Yardlet should conserve both model context and subscription usage.

### 12.1 Context economy

Rules:

- do not paste whole repos;
- do not load every skill;
- do not replay full prior chat;
- use anchors;
- use compact checkpoints;
- keep stable worker packet prefixes stable;
- put variable task content late in packets;
- progressively disclose skills and docs.
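
As an illustration of the last few rules, the sketch below orders packet sections from most stable to most variable. `STABLE_PREFIX`, `compile_packet`, and the section headings are hypothetical; the real packet format is defined by the packet compiler, not by this example.

```python
# Assumed wording; the point is only that this text never varies between runs.
STABLE_PREFIX = "\n".join([
    "# Yardlet worker packet",
    "Follow the intent contract. Stay inside the declared scope.",
    "Write results using the output contract at the end of the run.",
])


def compile_packet(anchors: list[str], checkpoint: str, task: str) -> str:
    """Assemble a packet with stable content first and task content last."""
    parts = [
        STABLE_PREFIX,                          # identical across runs
        "## Must-read anchors",
        "\n".join(f"- {a}" for a in anchors),   # pointers, not pasted files
        "## Latest checkpoint",
        checkpoint.strip(),                     # compact summary, not prior chat
        "## Task",
        task.strip(),                           # most variable content goes last
    ]
    return "\n\n".join(parts) + "\n"


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters two packets have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

With this ordering, two packets for different tasks against the same checkpoint differ only in their final section, which `shared_prefix_len` makes easy to verify in a dry run.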

### 12.2 Worker usage economy

Rules:

- one primary worker per normal task;
- one retry by default;
- second-worker review only for high risk, repeated failure, or important handoff;
- no background swarm by default;
- no indefinite research;
- no unbounded loop;
- show usage-risk state in the UI.
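
The "one retry by default" and "no unbounded loop" rules can be sketched as a fixed attempt budget. The `run_with_retry_budget` helper and the outcome strings `"done"`, `"failed"`, and `"blocked"` are assumptions for illustration.

```python
from typing import Callable


def run_with_retry_budget(attempt: Callable[[int], str], max_retries: int = 1) -> list[str]:
    """Call `attempt` at most 1 + max_retries times and return every outcome.

    `attempt` receives the zero-based attempt number and returns an outcome
    string. Only "failed" triggers a retry.
    """
    outcomes: list[str] = []
    for attempt_no in range(1 + max_retries):
        outcome = attempt(attempt_no)
        outcomes.append(outcome)
        if outcome != "failed":
            break  # done or blocked: retrying would only spend more usage
    return outcomes
```

Because the loop is a `range`, the number of worker invocations is bounded by construction rather than by a runtime check.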

### 12.3 Local pre-inspection

Yardlet should gather cheap deterministic evidence before invoking a worker:

```txt
repo tree summary
git status
package manager
available scripts
test command candidates
changed files
recent validation logs
schema summaries
```

This reduces the amount of information the worker must discover through token-heavy reasoning.
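
A minimal sketch of such a collector, covering only part of the list above. The function names and the lockfile table are illustrative assumptions; the only external command used is `git status --porcelain`, and the collector degrades to `None` when git is missing or the directory is not a repository.

```python
import json
import subprocess
from pathlib import Path
from typing import Optional

# Lockfile -> package manager, checked in this order (illustrative subset).
LOCKFILES = [
    ("pnpm-lock.yaml", "pnpm"),
    ("yarn.lock", "yarn"),
    ("package-lock.json", "npm"),
    ("uv.lock", "uv"),
    ("poetry.lock", "poetry"),
]


def detect_package_manager(root: Path) -> Optional[str]:
    for filename, manager in LOCKFILES:
        if (root / filename).is_file():
            return manager
    return None


def available_scripts(root: Path) -> list[str]:
    """Return sorted script names from package.json, or [] if unavailable."""
    manifest = root / "package.json"
    if not manifest.is_file():
        return []
    try:
        data = json.loads(manifest.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return []
    scripts = data.get("scripts", {}) if isinstance(data, dict) else {}
    return sorted(scripts) if isinstance(scripts, dict) else []


def git_status(root: Path) -> Optional[list[str]]:
    """Return porcelain status lines, or None if git cannot report them."""
    try:
        proc = subprocess.run(
            ["git", "status", "--porcelain"],
            cwd=root, capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return proc.stdout.splitlines() if proc.returncode == 0 else None


def collect_evidence(root: Path) -> dict:
    scripts = available_scripts(root)
    return {
        "package_manager": detect_package_manager(root),
        "available_scripts": scripts,
        "test_command_candidates": [s for s in scripts if "test" in s],
        "git_status": git_status(root),
    }
```

Everything here is a file read or a single short-lived subprocess, so it costs no worker usage and can run before every packet is compiled.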

---

## 13. Pattern absorption

Yardlet should absorb patterns, not dependencies.

### 13.1 From Orcar Brain

Absorb:

- natural-language work intake;
- work-package-like structure;
- durable decisions;
- handoff discipline;
- long-running delivery mindset;
- teammate-readable work records.

Do not absorb:

- Slack dependency;
- Orcar-only context;
- central Brain runtime;
- team-specific assumptions.

### 13.2 From Hermes

Absorb:

- progressive skill loading;
- procedural memory mindset;
- trace-based improvement;
- skill/prompt improvement candidates.

Do not absorb blindly:

- automatic self-patching of skills;
- unreviewed self-evolution;
- external orchestrator dependency.

Yardlet learning lifecycle:

```txt
observation -> candidate -> evaluation -> review -> promotion -> deprecation
```

### 13.3 From Ouroboros

Absorb:

- planning gate;
- ambiguity scoring;
- acceptance criteria tree;
- drift measurement;
- multi-stage evaluation.

### 13.4 From oh-my / OMC

Absorb:

- worker/role routing;
- tool permission matrix;
- hook-style deterministic guards;
- reviewer/security/writer profiles.

Start simple: profiles can be prompt modes over Codex/Claude workers, not a full multi-agent swarm.

### 13.5 From KAIROS/heartbeat systems

Absorb later, carefully:

- bounded loop;
- heartbeat events;
- crash resume;
- watchdog stops.

Do not begin with a 24-hour daemon. Begin with bounded UI-managed runs.

---

## 14. Implementation structure

Recommended source layout:

```txt
yard/
  pyproject.toml or package.json
  src/yard/
    app.py                  # TUI entry
    cli.py                  # support commands
    state.py
    schemas.py
    queue.py
    planner.py
    workers/
      base.py
      codex.py
      claude_code.py
      profiles/
        codex.yaml
        claude-code.yaml
    packets/
      compiler.py
      templates/
        codex-task.md
        claude-task.md
    guard.py
    inspect.py
    ledger.py
    evaluator.py
    compact.py
    handoff.py
    policies.py
    ui/
      home.py
      new_work.py
      planning_gate.py
      queue.py
      run_monitor.py
      approvals.py
      handoff.py
    templates/
      agents/
        yardlet.yaml
        intent-contract.yaml
        work-queue.yaml
        tool-policy.yaml
        approval-policy.yaml
        interaction-policy.yaml
        research-policy.yaml
        billing-policy.yaml
        workers.yaml
        skills/
          planning-gate/SKILL.md
          delivery-cycle/SKILL.md
          autonomous-work-loop/SKILL.md
```

Repo-local installed state:

```txt
.agents/
  yardlet.yaml
  intent-contract.yaml
  work-queue.yaml
  policies...
  runs/
  checkpoints/
  handoffs/
```

### 14.1 Technology choice

Use whichever stack fits the existing repo, but the UI should be terminal-first.

Good options:

- Python + Textual/Rich;
- TypeScript + Ink/Blessed;
- Go + Bubble Tea.

Selection criteria:

- easy subprocess control;
- good TUI ergonomics;
- easy packaging;
- good YAML/schema support;
- easy cross-platform local use;
- low dependency burden.

Do not let framework choice delay core state/worker design. The UI can be simple at first.

---

## 15. Essential workflows

### 15.1 First install/init

```txt
User runs: yardlet
Yardlet sees no .agents state
Yardlet opens setup screen
Yardlet creates .agents templates
Yardlet checks workers
Yardlet shows ready/not-ready state
```

Headless equivalent:

```bash
yardlet init
```

### 15.2 New work

```txt
User opens New Work screen
User types short request
Yardlet saves raw request
Yardlet runs subscription guard
Yardlet routes planning gate to worker
Worker writes intent contract + initial queue
Yardlet shows planning result for high-level accept/edit
User accepts or edits natural-language scope
```

### 15.3 Run next task

```txt
User presses Run Next
Yardlet checks queue eligibility
Yardlet checks policies
Yardlet creates run dir
Yardlet collects deterministic local evidence
Yardlet compiles worker packet
Yardlet invokes worker subprocess with sanitized env
Worker writes result/handoff
Yardlet validates/evaluates
Yardlet updates queue
Yardlet writes checkpoint
Yardlet shows summary
```

### 15.4 Approval

```txt
Worker requests gated action
Yardlet stops worker or marks run needs_user
Yardlet shows approval card
User approves/denies in UI
Yardlet resumes or records blocker
```

### 15.5 Handoff

```txt
User opens Handoff screen
Yardlet shows latest compact summary and evidence
User copies/exports handoff
Future worker can resume from checkpoint and queue state
```

---

## 16. Initial implementation slices

These are build slices for the same final product, not separate product versions.

### Slice A — Rename and product surface

Goal: establish Yardlet as the product identity.

Deliverables:

- new final design doc using Yardlet terminology;
- no user-facing `orcar` naming in new files;
- keep `.agents` as canonical repo state;
- define migration note from `.orcar-core` prototypes to Yardlet internals;
- package/app placeholder named `yardlet`.

Acceptance:

- `yardlet` is the documented entrypoint;
- Codex/Claude are described as hidden workers;
- AI API key requirement is explicitly forbidden.

### Slice B — Schemas and repo state

Goal: create the durable file surface.

Deliverables:

- `.agents/yardlet.yaml` template;
- `.agents/intent-contract.yaml` template;
- `.agents/work-queue.yaml` template;
- policy templates;
- run directory schema;
- result schema;
- schema validation command.

Acceptance:

- running init creates valid `.agents` state;
- schema validator catches broken queue/result files.
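
As a rough illustration of the second acceptance point, the sketch below validates an already-parsed queue mapping. The required-key lists are inferred from the queue seed in this plan and are an assumption, not a finished schema; `validate_queue` is a hypothetical name.

```python
# Assumed required keys, inferred from the bootstrap queue seed.
REQUIRED_QUEUE_KEYS = ("schema_version", "queue_id", "intent_id", "tasks")
REQUIRED_TASK_KEYS = ("id", "title", "state", "priority", "acceptance")


def validate_queue(queue: dict) -> list[str]:
    """Return human-readable errors; an empty list means the queue looks valid."""
    errors = [f"queue: missing '{k}'" for k in REQUIRED_QUEUE_KEYS if k not in queue]
    tasks = queue.get("tasks", [])
    if not isinstance(tasks, list):
        return errors + ["queue: 'tasks' must be a list"]

    seen_ids = set()
    for index, task in enumerate(tasks):
        if not isinstance(task, dict):
            errors.append(f"tasks[{index}]: must be a mapping")
            continue
        task_id = task.get("id")
        label = task_id if task_id is not None else f"tasks[{index}]"
        errors += [f"{label}: missing '{k}'" for k in REQUIRED_TASK_KEYS if k not in task]
        if "priority" in task and not isinstance(task["priority"], int):
            errors.append(f"{label}: 'priority' must be an integer")
        if task_id is not None:
            if task_id in seen_ids:
                errors.append(f"{label}: duplicate id")
            seen_ids.add(task_id)
    return errors
```

Collecting all errors in one pass, instead of failing on the first, gives the user a single complete report to fix.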

### Slice C — Basic TUI shell

Goal: make `yardlet` open a usable local workbench.

Deliverables:

- Home screen;
- Workers screen;
- Queue screen;
- Handoff screen placeholder;
- New Work input placeholder;
- status indicators from `.agents` state.

Acceptance:

- user can inspect state without running workers;
- UI does not require API keys;
- UI can run in a local terminal.

### Slice D — Zero-key worker guard

Goal: prevent accidental API billing.

Deliverables:

- worker discovery;
- worker status probes;
- AI-billing env scrub/block;
- clear UI states for unavailable/ambiguous workers;
- no API-key prompt anywhere.

Acceptance:

- missing workers show actionable local-worker setup message;
- detected API-billing env does not get passed silently to workers;
- Yardlet never asks for OpenAI/Anthropic API keys.
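
A minimal sketch of the env scrub/block step, assuming two modes named `strict` and `scrub`. The variable names in `BILLING_ENV_VARS` are an illustrative assumption about which variables matter; the real list would need to be confirmed per worker.

```python
# Illustrative list only: names conventionally read by provider SDKs.
BILLING_ENV_VARS = frozenset({"OPENAI_API_KEY", "ANTHROPIC_API_KEY"})


class BillingEnvError(RuntimeError):
    """Raised in strict mode when an AI-billing variable is present."""


def sanitized_env(env: dict, mode: str = "strict") -> dict:
    """Return a copy of `env` that is safe to pass to a worker subprocess.

    strict: refuse to continue if any billing variable is set.
    scrub:  drop billing variables and continue.
    """
    if mode not in ("strict", "scrub"):
        raise ValueError(f"unknown mode: {mode!r}")
    found = sorted(k for k in env if k in BILLING_ENV_VARS)
    if found and mode == "strict":
        raise BillingEnvError("refusing to start worker; found: " + ", ".join(found))
    # The input mapping is never mutated; the worker gets a filtered copy.
    return {k: v for k, v in env.items() if k not in BILLING_ENV_VARS}
```

A caller would pass the result as the `env` argument of `subprocess.run`, for example `subprocess.run(worker_cmd, env=sanitized_env(dict(os.environ), mode="scrub"))`, where `worker_cmd` is a hypothetical worker command line. In both modes the key never reaches the worker; the modes differ only in whether the run proceeds.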

### Slice E — Planning gate

Goal: turn a short request into an intent contract and initial queue.

Deliverables:

- New Work screen submits request;
- planning worker packet compiler;
- worker result parser for intent/queue;
- Planning Gate screen for high-level accept/edit;
- question budget enforcement.

Acceptance:

- user can create a new intent from UI;
- planning output is structured and saved;
- questions are high-level and limited.

### Slice F — Worker packet compiler

Goal: compile canonical task state into Codex/Claude-specific packets.

Deliverables:

- Codex task template;
- Claude Code task template;
- packet dry-run view;
- anchors/evidence insertion;
- output contract instructions.

Acceptance:

- same task produces different worker packets;
- packets include intent/scope/validation/output schema;
- packets avoid dumping unrelated context.

### Slice G — Run ledger and worker invocation

Goal: execute one bounded task and record it.

Deliverables:

- run directory creation;
- subprocess invocation;
- worker output log capture;
- result file parsing;
- queue state update;
- run monitor UI.

Acceptance:

- pressing Run Next executes one task;
- artifacts are written under `.agents/runs/<run-id>/`;
- failure/blocked/done states are visible.

### Slice H — Evaluator and compact

Goal: do not trust worker claims blindly.

Deliverables:

- deterministic evaluator;
- validation log capture;
- changed-file scope check;
- checkpoint writer;
- handoff writer;
- Handoff UI populated from artifacts.

Acceptance:

- Yardlet can mark done/failed/blocked based on evidence;
- checkpoint is enough to resume;
- handoff is teammate-readable.
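
The changed-file scope check is one of the simpler deterministic evaluator pieces. The sketch below assumes scope is expressed as shell-style glob patterns, which is an assumption about the intent contract format; `out_of_scope` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files: list[str], allowed_globs: list[str]) -> list[str]:
    """Return changed paths that match none of the allowed patterns.

    fnmatchcase is case-sensitive on every platform, and its `*` also
    matches `/`, so "src/yard/*" covers nested paths as well.
    """
    return [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_globs)
    ]
```

A non-empty result is evidence against a worker's "done" claim, regardless of what the worker reported.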

### Slice I — Dogfood and trace-based improvement candidates

Goal: use Yardlet on the local Yardlet repo/infrastructure.

Deliverables:

- seed queue for Yardlet itself;
- run traces;
- prompt/packet improvement observations;
- no automatic self-promotion.

Acceptance:

- at least one real implementation task is completed through Yardlet;
- trace yields improvement candidates;
- no self-patch without review.

---

## 17. Initial queue seed

```yaml
schema_version: 1
queue_id: yardlet-bootstrap-queue
intent_id: yardlet-final-plan
selection_policy:
  default_order: priority_then_created_at
  require_planning_gate: false
  skip_if_blocked: true
  skip_if_approval_required: true

tasks:
  - id: YARD-001
    title: "Create Yardlet product surface and remove user-facing Orcar naming"
    state: queued
    priority: 10
    risk: low
    kind: design_refactor
    preferred_worker: codex
    acceptance:
      - "New files use Yardlet as product name."
      - "Orcar Brain remains reference-only, not dependency."
      - ".agents remains canonical repo state."

  - id: YARD-002
    title: "Add canonical .agents schemas and templates"
    state: queued
    priority: 20
    risk: medium
    kind: implementation
    preferred_worker: codex
    acceptance:
      - ".agents/yardlet.yaml template exists."
      - "intent, queue, policy, worker, run, and result schemas exist."
      - "A schema validation command exists."

  - id: YARD-003
    title: "Build minimal Yardlet terminal UI shell"
    state: queued
    priority: 30
    risk: medium
    kind: implementation
    preferred_worker: codex
    acceptance:
      - "Running yardlet opens a local terminal UI."
      - "Home, Workers, Queue, New Work, Run, Handoff placeholders exist."
      - "UI reads .agents state and does not require workers to render."

  - id: YARD-004
    title: "Implement zero-key worker guard"
    state: queued
    priority: 40
    risk: high
    kind: safety
    preferred_worker: claude-code
    acceptance:
      - "Yardlet never asks for AI provider API keys."
      - "Worker invocation uses sanitized AI-billing environment."
      - "Ambiguous worker auth stops safely."
      - "UI shows worker readiness and billing safety state."

  - id: YARD-005
    title: "Implement planning gate through hidden worker"
    state: queued
    priority: 50
    risk: medium
    kind: implementation
    preferred_worker: claude-code
    acceptance:
      - "New Work request creates an intent contract and initial queue."
      - "Planning questions are limited and high-level."
      - "Planning result can be accepted/edited in the UI."

  - id: YARD-006
    title: "Implement worker packet compiler"
    state: queued
    priority: 60
    risk: medium
    kind: implementation
    preferred_worker: codex
    acceptance:
      - "Codex and Claude Code packets differ by worker profile."
      - "Packets include scope, validation, interaction policy, and output schema."
      - "Dry-run packet preview is available."

  - id: YARD-007
    title: "Implement one-task run ledger and evaluator"
    state: queued
    priority: 70
    risk: medium
    kind: implementation
    preferred_worker: codex
    acceptance:
      - "Run Next creates .agents/runs/<run-id>."
      - "Worker output, result, validation, evaluation, checkpoint, and handoff are recorded."
      - "Queue state updates based on evaluator result."
```
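
A sketch of how `selection_policy` might drive Run Next, operating on the parsed queue as a plain mapping. Several points are assumptions: lower `priority` numbers run first (consistent with the seed order), `created_at` and a per-task `approval` field are hypothetical, and a `false` skip flag is read as "halt at that task rather than pass over it".

```python
from typing import Optional

# Assumed approval values that make a task ineligible to run right now.
NEEDS_APPROVAL = {"required", "requested"}


def select_next(queue: dict) -> Optional[dict]:
    """Pick the next runnable task, or None if nothing may run."""
    policy = queue.get("selection_policy", {})
    ordered = sorted(
        queue.get("tasks", []),
        key=lambda t: (t.get("priority", 0), t.get("created_at", "")),
    )
    for task in ordered:
        state = task.get("state")
        if state == "blocked":
            if policy.get("skip_if_blocked", True):
                continue
            return None  # policy says wait at a blocked task
        if state != "queued":
            continue  # finished or otherwise not runnable
        if task.get("approval") in NEEDS_APPROVAL:
            if policy.get("skip_if_approval_required", True):
                continue
            return None  # policy says wait for the approval
        return task
    return None
```

Under the seed's policy, a blocked or approval-gated task never stalls the queue; later eligible tasks still run.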

---

## 18. Open decisions

Keep these open, but do not block implementation on them.

1. **Implementation language/framework**  
   Python/Textual, TypeScript/Ink, or Go/Bubble Tea.

2. **Exact worker probe commands**  
   The adapter should discover installed worker behavior and keep probes isolated.

3. **Strict vs scrub env default**  
   Strict blocking is safest. Scrubbing is more convenient. The UI can expose the selected policy, but the default should never leak billing keys to worker subprocesses.

4. **Core distribution model**  
   Source checkout vs packaged app vs Homebrew/npm/pipx. The final product should not expose `.orcar-core` as identity.

5. **Web UI later**  
   Terminal UI first. A web UI can be considered only after local state and worker contracts are stable.

6. **Brain bridge later**  
   An optional bridge from Orcar Brain/Slack to Yardlet handoffs can exist later, but Yardlet must remain standalone.

---

## 19. Final implementation guidance for agents

When an implementation agent reads this document, it should preserve these decisions:

```txt
Product name: Yardlet
Primary UX: terminal UI / local workbench
User-facing CLI replacement: yes
Internal worker replacement: not yet
Workers: Codex CLI and Claude Code CLI, hidden behind Yardlet
AI API keys: forbidden as a Yardlet requirement
Canonical state: .agents
Brain dependency: no
Orcar Brain patterns: yes, as reference
Intent lock: safety rule, not product identity
Research: allowed, intent-locked
Local tools: free inside sandbox/policy
User questions: few and high-level
Token economy: required
Compact/handoff: required
```

The first useful experience should be:

```txt
1. User runs `yardlet`.
2. Yardlet opens a local terminal workbench.
3. User enters a short work request.
4. Yardlet safely invokes a subscription-backed planning worker.
5. Yardlet shows a product-level planning result.
6. User accepts or edits in natural language.
7. Yardlet queues work.
8. User presses Run Next.
9. Yardlet invokes a hidden worker with a bounded packet.
10. Yardlet validates, records, compacts, and shows handoff.
```

That is the product.