# Yardlet — Local AI Workbench Final Build Plan
> Status: final working plan for implementation
> Target repo path: `docs/designs/local-autonomous-agent-operating-layer.md`
> Product name: **Yardlet**
> Primary interface: **local terminal UI / operating console**
> Canonical repo state: `.agents/`
> Worker engines: subscription-backed Codex CLI / Claude Code CLI, hidden behind Yardlet
> Hard constraint: Yardlet itself must not require, request, store, or call AI provider API keys
---
## 0. One-sentence definition
**Yardlet is a local AI workbench that lets a user describe work in a few natural-language sentences, then manages planning, queued execution, worker routing, validation, compacting, handoff, and safety inside the local workspace while using Codex CLI and Claude Code CLI as hidden subscription-backed workers.**
Shorter:
> Yardlet is the local operating console where AI coding workers plan, build, verify, and hand off long-running work inside your workspace.
The user should normally open **Yardlet**, not Codex or Claude Code directly.
```txt
User
-> Yardlet UI
-> planning gate
-> intent / scope / acceptance contract
-> queue / state / ledger
-> worker packet compiler
-> Codex CLI or Claude Code CLI as hidden worker
-> validation / evaluation
-> checkpoint / handoff
```
---
## 1. Final product stance
### 1.1 Yardlet is UI-first
Yardlet is not primarily a command collection. It should feel like a local workbench opened from the terminal.
The normal entrypoint is:
```bash
yard
```
That opens a terminal UI where the user can:
- create or refine a work request;
- inspect the current intent, scope, and acceptance criteria;
- see the queue;
- start the next bounded worker run;
- pause/stop work;
- approve or deny gated actions;
- inspect run evidence;
- read or export a handoff;
- see worker readiness and billing-safety status.
CLI commands still exist, but they are mostly for automation, scripting, debugging, and the UI implementation itself.
```bash
yardlet init
yardlet status --json
yardlet worker status
yardlet inspect repo --json
yardlet packet --task YARD-001 --worker codex --dry-run
yardlet run --next --headless
```
A teammate should not need to learn Codex CLI flags, Claude Code CLI flags, or host-specific prompt tricks. Yardlet owns that.
### 1.2 Yardlet replaces the user-facing Codex/Claude Code workflow, not the worker engines yet
Yardlet should replace the day-to-day user experience of manually running:
```bash
codex
codex exec ...
claude
claude -p ...
```
But Yardlet does **not** immediately replace Codex CLI and Claude Code as internal engines. Those tools remain valuable hidden workers, especially because API-based usage can be more expensive than subscription-backed usage.
The relationship is:
```txt
Yardlet = controller, workbench, scheduler, guard, state owner
Codex CLI = hidden implementation/review worker
Claude Code CLI = hidden planning/review/implementation worker
```
Yardlet should treat workers as interchangeable engines behind a contract:
```txt
Task contract in
-> host-specific packet
-> worker subprocess
-> structured result files out
-> Yardlet evaluator decides next state
```
### 1.3 Yardlet must not require AI API keys
This is a product rule, not just an implementation preference.
Yardlet core must not:
- require an OpenAI, Anthropic, or other AI provider API key;
- ask the user to paste an AI API key;
- store AI provider API keys;
- silently fall back to an API/SDK call;
- call OpenAI/Anthropic AI APIs directly;
- treat missing API keys as a setup failure.
Yardlet may call already-installed local worker CLIs after guard checks:
```txt
Codex CLI, already logged in through a subscription-backed account
Claude Code CLI, already logged in through a Pro/Max-style subscription-backed account
```
If no safe local worker is available, Yardlet must stop with a clear local-worker readiness message. It must not ask for an API key.
### 1.4 Yardlet is standalone, not Orcar Brain local mode
Yardlet can absorb operating patterns from Orcar Brain, but it must not depend on Brain, Slack, Orcar membership, Orcar-specific Work Packages, or internal team context.
```txt
Orcar Brain
= team agent, Slack-centered, organization/context heavy
Yardlet
= local workbench, repo/workspace-centered, standalone, usable by non-Orcar users
```
Brain may later trigger Yardlet or consume Yardlet handoffs, but that is an optional bridge.
### 1.5 Build the final shape now
Do not build a throwaway wrapper and later hope it becomes a workbench.
The first implementation should include the final architecture’s essential surfaces, even if each component is shallow:
- terminal UI shell;
- repo initialization;
- planning gate;
- intent/scope/acceptance contract;
- structured queue;
- worker profiles;
- worker packet compiler;
- zero-key worker guard;
- run ledger;
- result schema;
- evaluator;
- compact checkpoint;
- handoff;
- tool, approval, interaction, research, and billing policies.
The point is not to make every algorithm perfect immediately. The point is to avoid the wrong shape.
---
## 2. What Yardlet is for
Yardlet should support three main situations.
### 2.1 Existing repo work
A user opens Yardlet in an existing repo and enters a short request:
```txt
Add admin order search with status, email, and date filters.
```
Yardlet should:
1. inspect the repo enough to understand the local environment;
2. run a planning gate through a safe local worker;
3. produce an intent contract and initial queue;
4. route bounded tasks to Codex/Claude workers;
5. validate locally;
6. record what happened;
7. compact context;
8. show the next step or handoff in the UI.
### 2.2 New idea to local project
A user opens Yardlet in an empty or new workspace and enters an idea:
```txt
I want a small local app for searching and summarizing team knowledge docs.
```
Yardlet should not immediately create a giant app from vibes. It should:
1. compress the idea into product intent, boundaries, and success criteria;
2. ask at most a few high-level natural-language questions if the idea is under-specified;
3. create a starter project plan;
4. seed a queue;
5. run bounded creation tasks;
6. keep the user in the workbench loop.
### 2.3 Team-consistent AI work
A team can adopt Yardlet so that AI-assisted work produces consistent records:
```txt
.agents/intent-contract.yaml
.agents/work-queue.yaml
.agents/runs/<run-id>/result.json
.agents/runs/<run-id>/validation.log
.agents/runs/<run-id>/checkpoint.md
.agents/runs/<run-id>/handoff.md
```
This turns “I asked an agent and it changed some files” into an auditable work unit that a teammate can inspect and resume.
---
## 3. Goals and non-goals
### 3.1 Goals
1. **Minimal input, long local work**
A user should be able to give a short natural-language request and let Yardlet carry the work through planning, execution, validation, compacting, and handoff.
2. **UI-first local workbench**
Yardlet should provide a practical terminal UI that becomes the normal place to manage local agent work.
3. **Subscription-backed worker usage**
Yardlet should use local Codex/Claude Code workers through their existing subscription-backed sessions and prevent accidental AI API billing.
4. **Worker-specific optimization**
Codex and Claude Code should receive different task packets. Yardlet compiles the same canonical task contract into worker-specific instructions.
5. **Intent and scope stability**
Intent lock is not the product identity, but it is a core safety rule. Worker freedom must stay inside the user’s intended work.
6. **Low user interruption**
Yardlet should avoid asking users for code review, architecture review, diff review, or low-level implementation choices. When it asks, questions should be few and natural-language product/scope/approval questions.
7. **Local evidence freedom**
Yardlet should freely gather local evidence inside policy: repo inspection, tests, read-only dev/test/fixture DB queries, browser/devtools, emulator/simulator, and sandboxed computer-use.
8. **Intent-locked research**
Research is allowed when needed, but research is evidence gathering, not permission to change the product goal.
9. **Token/context economy**
Long-running work must not mean stuffing everything into context. Yardlet should rely on anchors, progressive disclosure, compact checkpoints, result schemas, and durable state.
10. **Auditable handoff**
Every bounded run should leave enough evidence for a teammate or future agent to understand what happened and resume safely.
11. **Pattern absorption**
Yardlet should absorb useful patterns from Hermes, Ouroboros, oh-my/OMC, KAIROS, and Orcar Brain without becoming dependent on those systems.
### 3.2 Non-goals
Yardlet should not be:
- a hosted service;
- a Slack bot;
- Orcar Brain running locally;
- a mandatory AI API wrapper;
- a tool that asks the user to paste AI provider API keys;
- a skill marketplace;
- a 24-hour unbounded daemon;
- a self-modifying skill system without review;
- a generic chat app;
- a thin alias over `codex` and `claude`.
---
## 4. Core design principles
### 4.1 The workbench owns state
Codex and Claude Code are workers. They do not own canonical task state.
Canonical state lives under `.agents/` in the repo/workspace:
```txt
.agents/
  yardlet.yaml
  intent-contract.yaml
  work-queue.yaml
  tool-policy.yaml
  approval-policy.yaml
  interaction-policy.yaml
  research-policy.yaml
  billing-policy.yaml
  workers.yaml
  runs/
  checkpoints/
  handoffs/
```
Yardlet state must be durable and readable without previous chat context.
### 4.2 UI is the normal interface; CLI is a supporting surface
The primary user workflow happens inside the Yardlet terminal UI.
Commands should exist, but the user should rarely need to manually chain them.
```txt
Normal:
  yard -> UI -> New Work / Queue / Run / Handoff screens
Advanced:
  yardlet run --next --headless
  yardlet status --json
  yardlet packet --dry-run
```
### 4.3 Yardlet is deterministic until it invokes a worker
Before worker invocation, Yardlet should behave like a local controller, not like a hidden AI.
It may:
- read `.agents` files;
- inspect repo metadata;
- inspect git status;
- detect package managers and test commands;
- check worker readiness;
- enforce policies;
- select eligible tasks;
- compile packets;
- show dry-run packets;
- create run directories.
It must not:
- call AI provider APIs;
- do hidden AI reasoning through an SDK;
- ask for AI API keys;
- silently fall back to a provider endpoint.
Any AI reasoning step, including planning-gate work, must be executed through an explicitly selected local worker after the same guard checks.
### 4.4 Tool freedom is separate from intent freedom
Workers may use local tools freely inside the policy, but they may not redefine the job.
Allowed by default inside a suitable sandbox:
- repo inspection;
- local search;
- local tests and linters;
- read-only dev/test/fixture DB queries;
- local browser/devtools;
- emulator/simulator;
- screenshots of local UI;
- intent-locked research.
Gated or blocked:
- production DB access;
- writes to shared or remote systems;
- deploy/publish/send;
- purchases/account changes;
- credential extraction;
- destructive file operations outside workspace;
- scope expansion.
### 4.5 Ask fewer, higher-level questions
Yardlet should not ask users to review code, architecture, or diffs as the normal path.
Allowed user questions:
- product intent;
- scope boundary;
- acceptance priority;
- high-risk approval;
- blocked external dependency;
- whether to stop, continue, or split work after a genuine blocker.
Disallowed habitual questions:
- “Should I edit this file?”
- “Is this architecture okay?”
- “Can you review this diff?”
- “Which exact internal helper should I use?”
- “Should I run the next obvious test?”
### 4.6 Compact is a first-class operation
Yardlet must not rely on chat history as memory.
At task/cycle boundaries it should compact into durable artifacts:
```txt
checkpoint.md
result.json
validation.log
handoff.md
work-queue.yaml state update
```
Next runs should start from compact capsules and file anchors, not the previous long conversation.
### 4.7 Research is allowed but must be intent-locked
Research is an evidence tool, not a product manager.
Each research event should record:
```yaml
research_question: "..."
source_or_anchor: "..."
used_for: "..."
decision_impact: "..."
scope_impact: none | planning_update_required | approval_required
drift_detected: false
```
If research suggests a valuable adjacent idea, Yardlet should record it as a candidate queue item or note, not silently include it in the current task.
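As a rough illustration of how a research event could be turned into a next step, the sketch below maps the `scope_impact` and `drift_detected` fields to a disposition. The disposition labels (`continue`, `update_plan`, `request_approval`, `stop_and_record_candidate`) and the function name are hypothetical and do not come from this plan.

```python
# The three scope_impact values listed in the research event record.
SCOPE_IMPACTS = {"none", "planning_update_required", "approval_required"}


def research_disposition(event):
    """Map a research event to a next step (labels are illustrative only)."""
    impact = event["scope_impact"]
    if impact not in SCOPE_IMPACTS:
        raise ValueError(f"unknown scope_impact: {impact!r}")
    if event.get("drift_detected", False):
        # Drift is never folded into the current task; it becomes a candidate.
        return "stop_and_record_candidate"
    if impact == "approval_required":
        return "request_approval"
    if impact == "planning_update_required":
        return "update_plan"
    return "continue"
```

Keeping this mapping as a pure function means the same rule can be applied in the UI and in headless runs.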
### 4.8 Zero-key AI policy
Yardlet core requires zero AI API keys.
Worker subprocesses should run in a sanitized AI-billing environment. Yardlet should scrub or block provider-billing environment variables for worker invocation, depending on configured strictness.
Important nuance: a repository may contain API keys for its own tests or application runtime. Yardlet must not read, display, or store secret values. The worker guard is about preventing **Yardlet’s AI worker execution** from using provider API billing by accident. Repo validation that calls external APIs should be separately governed by tool policy and approval policy.
---
## 5. UX: Yardlet terminal UI
### 5.1 Default launch
```bash
yard
```
Expected first screen:
```txt
┌ Yardlet ─ Local AI Workbench ──────────────────────────────────┐
│ Repo: acme-storefront                         Workers: 2 ready │
│ Intent: Admin order search                                     │
│ Scope: admin orders UI, order API, local tests                 │
│ Status: 1 running, 3 queued, 0 blocked                         │
├ Queue ─────────────────────────────────────────────────────────┤
│ ✓ YARD-001  Inspect current order flow          claude         │
│ ▶ YARD-002  Implement search API                codex          │
│ · YARD-003  Add UI filters                      codex          │
│ · YARD-004  Validate and hand off               claude         │
├ Run ───────────────────────────────────────────────────────────┤
│ Worker: codex                                                  │
│ Validation: pending                                            │
│ Drift: none                                                    │
│ Approval: none required                                        │
├ Actions ───────────────────────────────────────────────────────┤
│ n new work  r run next  p pause  a approvals  h handoff        │
│ q quit  ? help  d details  w workers  s settings               │
└────────────────────────────────────────────────────────────────┘
```
### 5.2 Main screens
#### Home
Shows the current workspace state:
- active intent;
- queue summary;
- current run;
- worker readiness;
- pending approvals;
- latest validation status;
- latest handoff.
#### New Work
A small natural-language input area.
The UI should encourage concise input:
```txt
What do you want Yardlet to work on?
[ Improve admin order search with status/email/date filters. ]
```
Optional fields:
- must include;
- must not touch;
- preferred validation;
- risk tolerance.
#### Planning Gate
Shows a human-readable product/scope contract, not implementation internals.
The user can accept, edit in natural language, or ask Yardlet to refine.
Example:
```txt
Goal
Admin users can search orders by status, customer email, and date range.
Allowed scope
- Admin order list UI
- Order search API/query layer
- Local tests and fixtures
Out of scope
- Payment logic
- Auth/role redesign
- Production DB changes
Acceptance
- UI exposes the filters
- API handles filter params correctly
- Existing order list behavior still works
- Listed validation commands pass
```
#### Queue
Shows work items and lets the user reorder, pause, block, or split at a high level.
The UI should not expose too much internal YAML by default.
#### Run Monitor
Shows one bounded worker run:
- selected worker;
- task packet summary;
- live event stream;
- changed files;
- validation commands;
- drift/approval warnings;
- result status.
Raw worker logs should be available, but collapsed by default.
#### Approvals
Shows only gated actions:
- production-like access;
- destructive command;
- network mutation;
- deploy/publish/send;
- scope expansion;
- real external API call for validation;
- secret access.
The user can approve once, deny, or convert the action into a new queue item.
#### Handoff
Shows the compact summary:
- completed work;
- acceptance status;
- validation evidence;
- changed files;
- remaining blockers;
- next recommended slice;
- must-read anchors.
### 5.3 UI implementation choice
Initial implementation should be a terminal UI, not a web dashboard.
Good properties:
- runs in the same repo terminal;
- works over SSH;
- simple to ship open source;
- easy to pair with worker subprocesses;
- naturally local-first.
The UI must not become the canonical state store. It reads and writes through Yardlet’s state layer.
---
## 6. Architecture
### 6.1 High-level layers
```txt
Yardlet UI
-> state/service layer
-> planning gate
-> queue manager
-> policy engine
-> worker router
-> packet compiler
-> worker runner
-> evaluator
-> compact/handoff writer
-> .agents files
-> Codex/Claude worker subprocesses
```
### 6.2 Repo-local state
```txt
.agents/
  yardlet.yaml
  intent-contract.yaml
  work-queue.yaml
  tool-policy.yaml
  approval-policy.yaml
  interaction-policy.yaml
  research-policy.yaml
  billing-policy.yaml
  workers.yaml
  skills/
    planning-gate/SKILL.md
    delivery-cycle/SKILL.md
    autonomous-work-loop/SKILL.md
  runs/
    <run-id>/
      run.yaml
      task-packet.md
      worker-output.log
      result.json
      validation.log
      evaluation.json
      checkpoint.md
      handoff.md
      evidence/
  checkpoints/
    latest.md
  handoffs/
```
### 6.3 User-level config
Use user-level config for non-secret preferences and worker discovery.
```txt
~/.yard/
  config.yaml
  workers.yaml
  cache/
  templates/
```
Do not store AI provider API keys here.
### 6.4 Packaged core
Final distribution should package Yardlet core code and templates as normal application resources. During local development, a source checkout can provide the templates.
Do not publicly expose `.orcar-core` as the product identity. Existing local prototypes can be migrated, but final user-facing names should be Yardlet-oriented.
### 6.5 State hierarchy
Use this durable state model:
```txt
workspace
-> intent contract
-> task
-> run
-> cycle
-> checkpoint
```
Definitions:
- **workspace**: local repo or project directory;
- **intent contract**: current user goal, allowed scope, out-of-scope, acceptance;
- **task**: queue item that can be selected and executed;
- **run**: one bounded worker invocation plus validation/evaluation;
- **cycle**: execution sub-loop inside a run;
- **checkpoint**: compact durable resume point.
---
## 7. Core schemas
### 7.1 `.agents/yardlet.yaml`
```yaml
schema_version: 1
product: yardlet
workspace_id: "auto-generated-stable-id"
created_at: "2026-06-02T00:00:00Z"
state_dir: ".agents"
default_interface: tui
canonical_queue: ".agents/work-queue.yaml"
current_intent: ".agents/intent-contract.yaml"
```
### 7.2 Intent contract
```yaml
schema_version: 1
id: intent-2026-06-02-001
source: user
raw_request: "Improve admin order search with status, email, and date filters."
summary: "Admin users can search orders by status, customer email, and date range."
allowed_scope:
  - "Admin order list UI"
  - "Order search API/query layer"
  - "Local tests and fixtures"
out_of_scope:
  - "Payment logic"
  - "Auth/role redesign"
  - "Production database changes"
  - "Deployment"
acceptance:
  - id: AC-001
    text: "Admin order list exposes status/email/date filters."
    validation: "UI or component test evidence"
  - id: AC-002
    text: "Order query/API layer handles filters correctly."
    validation: "unit/integration tests"
  - id: AC-003
    text: "Existing order list behavior remains intact."
    validation: "listed regression tests pass"
ambiguity:
  score: low
  open_questions: []
interaction:
  question_budget: 2
  user_review_mode: natural_language_scope_and_approval_only
  do_not_ask_for:
    - code_review
    - architecture_review
    - diff_review
    - low_level_file_choice
research_policy:
  allowed: when_needed
  mode: intent_locked
created_by_worker: claude-code
status: accepted
```
### 7.3 Work queue
```yaml
schema_version: 1
queue_id: queue-2026-06-02-001
intent_id: intent-2026-06-02-001
selection_policy:
  default_order: priority_then_created_at
  require_planning_gate: true
  skip_if_approval_required: true
  skip_if_blocked: true
tasks:
  - id: YARD-001
    title: "Inspect current admin order flow"
    state: done
    priority: 10
    risk: low
    kind: research
    preferred_worker: claude-code
    allowed_scope:
      - "Read admin order UI/API/tests"
    validation:
      type: evidence_summary
    interaction:
      may_ask_user: false
  - id: YARD-002
    title: "Implement order search API/query handling"
    state: queued
    priority: 20
    risk: medium
    kind: implementation
    preferred_worker: codex
    allowed_scope:
      - "Order query/API layer"
      - "Related tests"
    validation:
      commands:
        - "pnpm test"
    approval:
      required: false
```
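The selection policy above can be sketched as a small deterministic function. This is only an illustration under assumptions: lower `priority` numbers are taken to run first, a `created_at` field (not shown in the example queue) is used as the tie-breaker, and blocked tasks are assumed to carry a state other than `queued`. The name `select_next_task` is hypothetical.

```python
def select_next_task(tasks, policy):
    """Pick the next eligible task from a parsed work queue, or None."""
    eligible = []
    for task in tasks:
        # Only queued tasks are candidates; done or blocked tasks are skipped.
        if task.get("state") != "queued":
            continue
        needs_approval = task.get("approval", {}).get("required", False)
        if needs_approval and policy.get("skip_if_approval_required", True):
            continue
        eligible.append(task)
    # priority_then_created_at: assumed ascending priority, then oldest first.
    eligible.sort(key=lambda t: (t.get("priority", 0), t.get("created_at", "")))
    return eligible[0] if eligible else None
```

Because the function reads only queue state, it fits the rule that Yardlet stays deterministic until it invokes a worker.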
### 7.4 Run record
```yaml
schema_version: 1
run_id: run-2026-06-02-001
task_id: YARD-002
intent_id: intent-2026-06-02-001
worker: codex
state: running
started_at: "2026-06-02T00:00:00Z"
worktree: "."
packet: ".agents/runs/run-2026-06-02-001/task-packet.md"
result: ".agents/runs/run-2026-06-02-001/result.json"
validation: ".agents/runs/run-2026-06-02-001/validation.log"
evaluation: ".agents/runs/run-2026-06-02-001/evaluation.json"
checkpoint: ".agents/runs/run-2026-06-02-001/checkpoint.md"
```
### 7.5 Result schema
```json
{
  "schema_version": 1,
  "run_id": "run-2026-06-02-001",
  "task_id": "YARD-002",
  "drift_detected": false,
  "notes": "Stayed within allowed scope.",
  "changes": {
    "files_modified": [],
    "files_created": [],
    "files_deleted": []
  },
  "validation": {
    "commands_run": [],
    "passed": true,
    "failures": []
  },
  "approval": {
    "required": false,
    "reason": null
  },
  "question_for_user": null,
  "compact_summary": "Short resume summary for the next run."
}
```
### 7.6 Billing policy
```yaml
schema_version: 1
mode: zero_key_subscription_workers
yard_core:
  require_ai_api_key: false
  ask_for_ai_api_key: false
  store_ai_api_key: false
  call_provider_api: false
  auto_api_fallback: false
worker_invocation:
  require_local_cli: true
  require_subscription_backed_auth: true
  if_auth_ambiguous: stop
  if_no_worker_ready: stop
  ai_billing_env_policy: scrub_or_block
  never_print_secret_values: true
  blocked_worker_env_names:
    - OPENAI_API_KEY
    - ANTHROPIC_API_KEY
    - OPENAI_BASE_URL
    - ANTHROPIC_BASE_URL
    - OPENAI_ORGANIZATION
    - OPENAI_PROJECT
```
### 7.7 Tool policy
```yaml
schema_version: 1
local_tools:
  shell:
    allowed: true
    approval_required_patterns:
      - "rm -rf"
      - "sudo"
      - "git push"
      - "deploy"
      - "terraform apply"
  local_db:
    allowed: true
    default_mode: read_only
    allowed_environments:
      - dev
      - test
      - fixture
    forbidden_environments:
      - prod
  browser:
    allowed: true
    mode: local_or_sandbox
  computer_use:
    allowed: true
    mode: local_sandbox
    forbidden_actions:
      - purchase
      - account_change
      - external_submit_without_approval
network:
  default: restricted
  allowed_for:
    - official_docs
    - dependency_docs
    - issue_research
  approval_required_for:
    - external_mutation
    - real_api_validation
```
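One simple way to apply `approval_required_patterns` is a substring check on the whitespace-normalized command. The sketch below assumes substring matching, which the plan does not specify; it is deliberately conservative and can over-match (for example, a file name containing `deploy`). The function name is hypothetical.

```python
# Patterns copied from the tool-policy example above.
APPROVAL_REQUIRED_PATTERNS = ["rm -rf", "sudo", "git push", "deploy", "terraform apply"]


def approval_patterns_hit(command, patterns=APPROVAL_REQUIRED_PATTERNS):
    """Return the patterns that a shell command matches (empty list if none)."""
    # Collapse repeated whitespace so "git   push" still matches "git push".
    normalized = " ".join(command.split())
    return [pattern for pattern in patterns if pattern in normalized]
```

A non-empty result would move the action to the Approvals screen instead of running it.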
---
## 8. Worker system
### 8.1 Worker profiles
Yardlet should model each worker with a profile:
```yaml
id: codex
kind: cli_worker
role_strengths:
  - focused_implementation
  - test_driven_bugfix
  - shell_heavy_repo_changes
  - local_code_review
billing:
  mode: subscription_backed_only
invocation:
  command: codex
  supports_noninteractive: true
  output_contract: files
limits:
  max_wall_minutes: 45
  max_retries: 1
```
```yaml
id: claude-code
kind: cli_worker
role_strengths:
  - planning_gate
  - ambiguity_reduction
  - acceptance_criteria
  - review
  - handoff_quality
billing:
  mode: subscription_backed_only
invocation:
  command: claude
  supports_noninteractive: true
  output_contract: json_or_files
limits:
  max_wall_minutes: 45
  max_retries: 1
```
Exact CLI flags should be adapter-owned and discovered through installed versions where possible. Yardlet should avoid hard-coding brittle host assumptions in business logic.
### 8.2 Worker readiness checks
Before a worker can run, Yardlet checks:
- binary exists;
- version can be read;
- auth status can be probed or is configured as trusted by the local user;
- no known API-billing env leak is passed to the worker process;
- worker is allowed by current policy;
- task risk is compatible with worker policy.
If readiness is ambiguous, Yardlet stops and shows a clear UI state:
```txt
Worker not ready: Claude Code auth mode is ambiguous.
Yardlet did not call an AI API and did not ask for an API key.
Open the worker directly to fix subscription login, then retry.
```
### 8.3 AI-billing environment handling
Yardlet must distinguish:
1. **Yardlet worker execution env** — must not use AI provider API keys.
2. **Project/test runtime env** — may contain app secrets, but use is governed by tool policy and approval policy.
Default worker invocation should sanitize AI billing variables before spawning Codex/Claude workers.
Strict mode can block if such variables are present in the parent process.
Yardlet must never print secret values.
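A minimal sketch of the scrub-or-block behavior, assuming the blocked names from the billing policy example. The function and exception names are hypothetical. Default mode returns a scrubbed copy of the environment; strict mode raises, and the error message lists variable names only, never values.

```python
# Names taken from the billing-policy example; the real list would be configurable.
BLOCKED_WORKER_ENV_NAMES = frozenset({
    "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_BASE_URL",
    "ANTHROPIC_BASE_URL", "OPENAI_ORGANIZATION", "OPENAI_PROJECT",
})


class BillingEnvLeak(RuntimeError):
    """Raised in strict mode when a blocked variable is set in the parent env."""


def build_worker_env(parent_env, strict=False):
    """Return a copy of parent_env that is safe to hand to a worker subprocess."""
    leaked = sorted(name for name in parent_env if name in BLOCKED_WORKER_ENV_NAMES)
    if strict and leaked:
        # Only names are reported; secret values never appear in the message.
        raise BillingEnvLeak("blocked AI billing variables are set: " + ", ".join(leaked))
    return {name: value for name, value in parent_env.items()
            if name not in BLOCKED_WORKER_ENV_NAMES}
```

The result can be passed as the `env` argument of `subprocess.run`, for example `build_worker_env(dict(os.environ))`, so the worker never inherits the blocked variables.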
### 8.4 Worker routing
Default routing:
```yaml
planning_gate:
  primary: claude-code
  fallback: codex
implementation:
  primary: codex
  fallback: claude-code
review_or_handoff:
  primary: claude-code
  fallback: codex
failed_validation_repair:
  primary: codex
  fallback: claude-code
ambiguous_scope:
  primary: claude-code
  fallback: none
```
Use only one primary worker per normal task. Add a second worker only when risk or repeated failure justifies the usage cost.
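The default routing table can be expressed as a lookup with a single fallback. This is a sketch only: `route` and the `ready_workers` argument are hypothetical names, and readiness is assumed to have been established by the checks in 8.2.

```python
# (primary, fallback) pairs mirroring the default routing table.
ROUTING = {
    "planning_gate": ("claude-code", "codex"),
    "implementation": ("codex", "claude-code"),
    "review_or_handoff": ("claude-code", "codex"),
    "failed_validation_repair": ("codex", "claude-code"),
    "ambiguous_scope": ("claude-code", None),
}


def route(kind, ready_workers):
    """Return the worker id to use for a task kind, or None if Yardlet must stop."""
    primary, fallback = ROUTING[kind]
    if primary in ready_workers:
        return primary
    if fallback is not None and fallback in ready_workers:
        return fallback
    # No safe worker: the caller stops with a readiness message, never an API call.
    return None
```

Returning `None` rather than raising keeps the "no safe worker" case an ordinary UI state.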
### 8.5 Packet compiler
Yardlet compiles canonical task state into worker-specific packets.
Shared inputs:
- intent contract summary;
- task title and allowed scope;
- out-of-scope items;
- local evidence anchors;
- validation commands;
- output schema;
- interaction policy;
- approval policy;
- compact requirements.
Codex packet style should be focused and execution-oriented.
Claude Code packet style can be more planning/review-oriented, but still bounded.
Packets should prefer anchors over full pasted content:
```txt
Read anchors:
- .agents/intent-contract.yaml
- .agents/work-queue.yaml
- .agents/runs/<run-id>/evidence/repo-summary.md
- src/admin/orders/...
Do not load unrelated docs unless needed for this task.
```
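A packet compiler along these lines might keep a fixed instruction prefix first and append task-specific content afterwards, passing anchors as paths rather than file contents. The prefix wording, function name, and section labels below are assumptions for illustration, not the actual packet templates.

```python
# Fixed text shared by every packet; kept identical across tasks.
STABLE_PREFIX = (
    "You are a bounded worker invoked by Yardlet.\n"
    "Stay inside the allowed scope.\n"
    "Write result.json and handoff.md to the run directory."
)


def _bullets(items):
    # Render a list as markdown bullets, with a placeholder for empty lists.
    return "\n".join(f"- {item}" for item in items) or "- (none)"


def compile_packet(intent_summary, task_title, allowed_scope, out_of_scope,
                   anchors, validation_commands):
    """Build a worker packet: stable prefix first, variable task content after."""
    sections = [
        STABLE_PREFIX,
        "Intent:\n" + intent_summary,
        "Task:\n" + task_title,
        "Allowed scope:\n" + _bullets(allowed_scope),
        "Out of scope:\n" + _bullets(out_of_scope),
        # Anchors are file paths for the worker to read, not pasted content.
        "Read anchors:\n" + _bullets(anchors),
        "Validation commands:\n" + _bullets(validation_commands),
    ]
    return "\n\n".join(sections) + "\n"
```

Worker-specific styles (Codex versus Claude Code) could then be separate prefixes over the same canonical inputs.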
### 8.6 Worker output contract
Every worker run must leave structured artifacts. Natural-language console output is not enough.
Required:
```txt
.agents/runs/<run-id>/result.json
.agents/runs/<run-id>/handoff.md
```
When validation runs:
```txt
.agents/runs/<run-id>/validation.log
```
When evidence is collected:
```txt
.agents/runs/<run-id>/evidence/*
```
---
## 9. Planning gate
### 9.1 Role
The planning gate turns a small natural-language request into a bounded work contract.
It should produce:
- goal summary;
- allowed scope;
- out-of-scope;
- acceptance criteria;
- ambiguity score;
- limited user questions if needed;
- initial queue;
- validation strategy;
- worker routing hints;
- risk classification.
### 9.2 User interaction budget
The planning gate should ask zero or few questions.
Default:
```yaml
question_budget: 2
question_type: natural_language_product_scope_or_approval_only
```
If questions are not essential, Yardlet should proceed with explicit assumptions and record them.
### 9.3 Acceptance criteria tree
Borrow the useful Ouroboros-style idea: acceptance should be structured enough to evaluate, not just a vague sentence.
Example:
```yaml
acceptance:
  - id: AC-001
    statement: "Admin can filter by status."
    evidence:
      - "UI filter exists"
      - "API/query handles status"
      - "test covers status filter"
  - id: AC-002
    statement: "Admin can filter by customer email."
    evidence:
      - "UI filter exists"
      - "API/query handles email"
```
### 9.4 Drift detection
At evaluation time, Yardlet checks whether changes exceeded the contract.
Drift signs:
- files outside allowed scope changed;
- acceptance criteria silently changed;
- adjacent feature added;
- task became broader than requested;
- worker asked user to approve architecture/code instead of resolving locally;
- research was used to redefine product direction.
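The first drift sign, files changed outside the allowed scope, is the easiest to check mechanically. The sketch below assumes the natural-language scope has already been mapped to path globs such as `src/admin/orders/*`; that mapping and the function name are hypothetical and not defined in this plan.

```python
from fnmatch import fnmatchcase


def files_outside_scope(changed_files, allowed_globs):
    """Return changed paths that match none of the allowed glob patterns."""
    return [
        path for path in changed_files
        # fnmatchcase is case-sensitive on every platform; "*" also spans "/".
        if not any(fnmatchcase(path, pattern) for pattern in allowed_globs)
    ]
```

A non-empty result is a signal for the evaluator, not an automatic failure, since a change may still be justified in the result notes.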
---
## 10. Evaluator and compact
### 10.1 Initial evaluator
The first evaluator can be deterministic and shallow. It should check:
- result file exists;
- result schema valid;
- task id/run id match;
- worker did not report uncontrolled drift;
- changed files are in allowed areas or justified;
- forbidden paths untouched;
- validation commands were run or a reason was recorded;
- approvals were not bypassed;
- handoff exists;
- checkpoint exists;
- queue update is coherent.
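A shallow evaluator covering the first few checks could look like the sketch below. It is an assumption-laden illustration: the required keys are taken from the result schema example, the problem messages are invented, and drift, approval, and queue checks are left out.

```python
import json
from pathlib import Path

# Top-level keys expected in result.json, per the result schema example.
REQUIRED_KEYS = ("schema_version", "run_id", "task_id", "changes", "validation", "approval")


def evaluate_run(run_dir, run_id, task_id):
    """Return a list of problems found in a run directory (empty means pass)."""
    run_dir = Path(run_dir)
    result_path = run_dir / "result.json"
    if not result_path.is_file():
        return ["result.json is missing"]
    try:
        result = json.loads(result_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return ["result.json is not valid JSON"]
    if not isinstance(result, dict):
        return ["result.json is not a JSON object"]
    problems = [f"result.json lacks key: {key}" for key in REQUIRED_KEYS if key not in result]
    if result.get("run_id") != run_id:
        problems.append("run_id mismatch")
    if result.get("task_id") != task_id:
        problems.append("task_id mismatch")
    # Handoff and checkpoint must exist next to the result file.
    for name in ("handoff.md", "checkpoint.md"):
        if not (run_dir / name).is_file():
            problems.append(f"{name} is missing")
    return problems
```

Returning a list of problems instead of a boolean lets the Run Monitor show every failed check at once.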
### 10.2 Validation
Validation can include:
- tests;
- lint/typecheck;
- local UI verification;
- local DB read-only queries;
- screenshot evidence;
- manual approval where policy requires it.
Validation evidence should be compact but traceable.
### 10.3 Compact checkpoint
A checkpoint should be short enough to feed into the next cycle.
Required fields:
```md
# Checkpoint
- Intent:
- Task:
- Completed:
- Changed files:
- Validation:
- Blockers:
- Next recommended action:
- Must-read anchors:
```
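Writing the checkpoint from a fixed field list keeps every checkpoint the same shape. The sketch below is one possible renderer; the function name and the choice to join list values with commas are assumptions.

```python
# Field names copied from the required checkpoint fields above.
CHECKPOINT_FIELDS = (
    "Intent", "Task", "Completed", "Changed files",
    "Validation", "Blockers", "Next recommended action", "Must-read anchors",
)


def render_checkpoint(values):
    """Render checkpoint.md text; every required field must be provided."""
    missing = [field for field in CHECKPOINT_FIELDS if field not in values]
    if missing:
        raise ValueError("missing checkpoint fields: " + ", ".join(missing))
    lines = ["# Checkpoint"]
    for field in CHECKPOINT_FIELDS:
        value = values[field]
        if isinstance(value, (list, tuple)):
            # Lists are flattened to one line to keep the checkpoint compact.
            value = ", ".join(value) if value else "none"
        lines.append(f"- {field}: {value}")
    return "\n".join(lines) + "\n"
```

Failing on a missing field is intentional here: an incomplete checkpoint is a poor resume point for the next run.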
### 10.4 Handoff
Handoff is for humans and future workers.
It should answer:
- What was attempted?
- What changed?
- What passed/failed?
- What remains?
- What should the next worker read?
- Is user input needed?
---
## 11. Safety and approval
### 11.1 Approval state machine
Task/run approval states:
```txt
not_required
required
requested
approved_once
denied
expired
```
### 11.2 Hard stops
Yardlet must stop if:
- no safe worker is available;
- worker auth/billing mode is ambiguous;
- task needs approval and approval is missing;
- intent drift is detected and cannot be corrected locally;
- production access is requested;
- destructive command is requested;
- validation would call real external services without approval;
- queue state is corrupt;
- run ledger cannot be written.
### 11.3 Safe local freedoms
Yardlet should not be overprotective about safe local evidence. The worker should be allowed to inspect and verify locally without asking constantly.
Examples:
```txt
Allowed without user prompt:
- read local source files
- grep/search repo
- inspect package scripts
- run local unit tests
- query fixture/dev DB read-only
- open local dev UI in sandbox/browser
- inspect local logs
```
---
## 12. Token and usage economy
Yardlet should save both model context and subscription usage.
### 12.1 Context economy
Rules:
- do not paste whole repos;
- do not load every skill;
- do not replay full prior chat;
- use anchors;
- use compact checkpoints;
- keep the shared prefix of worker packets unchanged across runs;
- put variable task content late in packets;
- progressively disclose skills and docs.
### 12.2 Worker usage economy
Rules:
- one primary worker per normal task;
- one retry by default;
- second-worker review only for high risk, repeated failure, or important handoff;
- no background swarm by default;
- no indefinite research;
- no unbounded loop;
- show usage-risk state in the UI.
### 12.3 Local pre-inspection
Yardlet should gather cheap deterministic evidence before invoking a worker:
```txt
repo tree summary
git status
package manager
available scripts
test command candidates
changed files
recent validation logs
schema summaries
```
This reduces the amount of information the worker must discover through token-heavy reasoning.
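As one example of cheap deterministic evidence, the package manager can often be inferred from lockfiles in the workspace root. The sketch below uses common lockfile names; the table is not exhaustive, and the function name is hypothetical.

```python
from pathlib import Path

# Well-known lockfile names and the tool that writes them.
LOCKFILE_TO_MANAGER = {
    "pnpm-lock.yaml": "pnpm",
    "yarn.lock": "yarn",
    "package-lock.json": "npm",
    "poetry.lock": "poetry",
    "uv.lock": "uv",
    "Cargo.lock": "cargo",
    "go.sum": "go",
}


def detect_package_managers(root):
    """Return a sorted list of package managers inferred from lockfiles in root."""
    root = Path(root)
    return sorted({
        manager for lockfile, manager in LOCKFILE_TO_MANAGER.items()
        if (root / lockfile).is_file()
    })
```

The result can be written into the repo summary under the run's `evidence/` directory so the worker does not need to rediscover it.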
---
## 13. Pattern absorption
Yardlet should absorb patterns, not dependencies.
### 13.1 From Orcar Brain
Absorb:
- natural-language work intake;
- work-package-like structure;
- durable decisions;
- handoff discipline;
- long-running delivery mindset;
- teammate-readable work records.
Do not absorb:
- Slack dependency;
- Orcar-only context;
- central Brain runtime;
- team-specific assumptions.
### 13.2 From Hermes
Absorb:
- progressive skill loading;
- procedural memory mindset;
- trace-based improvement;
- skill/prompt improvement candidates.
Do not absorb blindly:
- automatic self-patching of skills;
- unreviewed self-evolution;
- external orchestrator dependency.
Yardlet learning lifecycle:
```txt
observation -> candidate -> evaluation -> review -> promotion -> deprecation
```
### 13.3 From Ouroboros
Absorb:
- planning gate;
- ambiguity scoring;
- acceptance criteria tree;
- drift measurement;
- multi-stage evaluation.
### 13.4 From oh-my / OMC
Absorb:
- worker/role routing;
- tool permission matrix;
- hook-style deterministic guards;
- reviewer/security/writer profiles.
Start simple: profiles can be prompt modes over Codex/Claude workers, not a full multi-agent swarm.
### 13.5 From KAIROS/heartbeat systems
Absorb later, carefully:
- bounded loop;
- heartbeat events;
- crash resume;
- watchdog stops.
Do not begin with a 24-hour daemon. Begin with bounded UI-managed runs.
---
## 14. Implementation structure
Recommended source layout:
```txt
yard/
  pyproject.toml or package.json
  src/yard/
    app.py                 # TUI entry
    cli.py                 # support commands
    state.py
    schemas.py
    queue.py
    planner.py
    workers/
      base.py
      codex.py
      claude_code.py
      profiles/
        codex.yaml
        claude-code.yaml
    packets/
      compiler.py
      templates/
        codex-task.md
        claude-task.md
    guard.py
    inspect.py
    ledger.py
    evaluator.py
    compact.py
    handoff.py
    policies.py
    ui/
      home.py
      new_work.py
      planning_gate.py
      queue.py
      run_monitor.py
      approvals.py
      handoff.py
  templates/
    agents/
      yardlet.yaml
      intent-contract.yaml
      work-queue.yaml
      tool-policy.yaml
      approval-policy.yaml
      interaction-policy.yaml
      research-policy.yaml
      billing-policy.yaml
      workers.yaml
    skills/
      planning-gate/SKILL.md
      delivery-cycle/SKILL.md
      autonomous-work-loop/SKILL.md
```
Repo-local installed state:
```txt
.agents/
  yardlet.yaml
  intent-contract.yaml
  work-queue.yaml
  policies...
  runs/
  checkpoints/
  handoffs/
```
### 14.1 Technology choice
Use whichever stack fits the existing repo, but the UI should be terminal-first.
Good options:
- Python + Textual/Rich;
- TypeScript + Ink/Blessed;
- Go + Bubble Tea.
Selection criteria:
- easy subprocess control;
- good TUI ergonomics;
- easy packaging;
- good YAML/schema support;
- easy cross-platform local use;
- low dependency burden.
Do not let framework choice delay core state/worker design. The UI can be simple at first.
---
## 15. Essential workflows
### 15.1 First install/init
```txt
User runs: yardlet
Yardlet sees no .agents state
Yardlet opens setup screen
Yardlet creates .agents templates
Yardlet checks workers
Yardlet shows ready/not-ready state
```
Headless equivalent:
```bash
yardlet init
```
### 15.2 New work
```txt
User opens New Work screen
User types short request
Yardlet saves raw request
Yardlet runs subscription guard
Yardlet routes planning gate to worker
Worker writes intent contract + initial queue
Yardlet shows planning result for high-level accept/edit
User accepts the scope or edits it in natural language
```
### 15.3 Run next task
```txt
User presses Run Next
Yardlet checks queue eligibility
Yardlet checks policies
Yardlet creates run dir
Yardlet collects deterministic local evidence
Yardlet compiles worker packet
Yardlet invokes worker subprocess with sanitized env
Worker writes result/handoff
Yardlet validates/evaluates
Yardlet updates queue
Yardlet writes checkpoint
Yardlet shows summary
```
### 15.4 Approval
```txt
Worker requests gated action
Yardlet stops worker or marks run needs_user
Yardlet shows approval card
User approves/denies in UI
Yardlet resumes or records blocker
```
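The approval flow amounts to a few state transitions on a run. A minimal sketch, assuming a run object with a `state` field: `needs_user` and `blocked` come from this plan, while `running`, the `Run` class, and the card shape are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    run_id: str
    state: str = "running"
    blockers: list = field(default_factory=list)


def request_approval(run, action):
    """Worker asked for a gated action: park the run and build an approval card."""
    run.state = "needs_user"
    return {"run_id": run.run_id, "action": action}


def resolve_approval(run, card, approved):
    """Apply the user's decision from the UI: resume, or record a blocker."""
    if run.state != "needs_user":
        raise ValueError("run is not waiting for approval")
    if approved:
        run.state = "running"
    else:
        run.state = "blocked"
        run.blockers.append(f"denied: {card['action']}")
    return run
```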
### 15.5 Handoff
```txt
User opens Handoff screen
Yardlet shows latest compact summary and evidence
User copies/exports handoff
Future worker can resume from checkpoint and queue state
```
---
## 16. Initial implementation slices
These are build slices for the same final product, not separate product versions.
### Slice A — Rename and product surface
Goal: establish Yardlet as the product identity.
Deliverables:
- new final design doc using Yardlet terminology;
- no user-facing `orcar` naming in new files;
- keep `.agents` as canonical repo state;
- define migration note from `.orcar-core` prototypes to Yardlet internals;
- package/app placeholder named `yardlet`.
Acceptance:
- `yardlet` is the documented entrypoint;
- Codex/Claude are described as hidden workers;
- AI API key requirement is explicitly forbidden.
### Slice B — Schemas and repo state
Goal: create the durable file surface.
Deliverables:
- `.agents/yardlet.yaml` template;
- `.agents/intent-contract.yaml` template;
- `.agents/work-queue.yaml` template;
- policy templates;
- run directory schema;
- result schema;
- schema validation command.
Acceptance:
- running init creates valid `.agents` state;
- schema validator catches broken queue/result files.
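A dependency-free sketch of what "catches broken queue files" could mean in practice. It assumes the queue has already been parsed into a dictionary and uses the field names from the queue seed in section 17; the required-field list and the allowed risk values are assumptions rather than the final schema.

```python
REQUIRED_TASK_FIELDS = ("id", "title", "state", "priority", "risk", "acceptance")
RISKS = {"low", "medium", "high"}  # assumption: the values used in the queue seed


def validate_queue(queue):
    """Return a list of human-readable errors; an empty list means valid."""
    if not isinstance(queue, dict):
        return ["queue: expected a mapping"]
    errors = [
        f"queue: missing '{key}'"
        for key in ("schema_version", "queue_id", "tasks")
        if key not in queue
    ]
    tasks = queue.get("tasks", [])
    if not isinstance(tasks, list):
        return errors + ["tasks: expected a list"]
    seen = set()
    for i, task in enumerate(tasks):
        where = f"tasks[{i}]"
        if not isinstance(task, dict):
            errors.append(f"{where}: expected a mapping")
            continue
        for name in REQUIRED_TASK_FIELDS:
            if name not in task:
                errors.append(f"{where}: missing '{name}'")
        task_id = task.get("id")
        if isinstance(task_id, str):
            if task_id in seen:
                errors.append(f"{where}: duplicate id {task_id!r}")
            seen.add(task_id)
        if "risk" in task and task["risk"] not in RISKS:
            errors.append(f"{where}: unknown risk {task['risk']!r}")
        if "priority" in task and not isinstance(task["priority"], int):
            errors.append(f"{where}: priority must be an integer")
    return errors
```

Returning every error at once, rather than failing on the first, lets the UI show a complete repair list for a hand-edited file.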
### Slice C — Basic TUI shell
Goal: make `yardlet` open a usable local workbench.
Deliverables:
- Home screen;
- Workers screen;
- Queue screen;
- Handoff screen placeholder;
- New Work input placeholder;
- status indicators from `.agents` state.
Acceptance:
- user can inspect state without running workers;
- UI does not require API keys;
- UI can run in a local terminal.
### Slice D — Zero-key worker guard
Goal: prevent accidental API billing.
Deliverables:
- worker discovery;
- worker status probes;
- AI-billing env scrub/block;
- clear UI states for unavailable/ambiguous workers;
- no API-key prompt anywhere.
Acceptance:
- missing workers show an actionable local-worker setup message;
- detected API-billing environment variables are never passed silently to workers;
- Yardlet never asks for OpenAI/Anthropic API keys.
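The env scrub/block deliverable can be sketched as a function that prepares the environment for a worker subprocess. The variable list below is an assumption and deliberately incomplete; `worker_env`, `BillingEnvError`, and the `strict`/`scrub` mode names are hypothetical, mirroring the two options discussed under open decisions.

```python
import os

# Assumption: a starting list of AI-billing variables; a real list would be longer.
BILLING_ENV_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


class BillingEnvError(RuntimeError):
    """Raised in strict mode when billing variables are present."""


def worker_env(base_env=None, mode="strict"):
    """Return (env, removed_names) for a worker subprocess.

    strict: refuse to run if any billing variable is set.
    scrub:  drop billing variables and report which names were removed.
    """
    env = dict(os.environ if base_env is None else base_env)
    found = sorted(name for name in BILLING_ENV_VARS if name in env)
    if mode == "strict":
        if found:
            # Report variable names only, never their values.
            raise BillingEnvError("billing env present: " + ", ".join(found))
        return env, []
    if mode == "scrub":
        for name in found:
            del env[name]
        return env, found
    raise ValueError(f"unknown mode: {mode!r}")
```

The returned mapping would be passed as the `env` argument of `subprocess.run`, so the worker never inherits the parent environment implicitly, and the list of removed names can feed the billing-safety indicator in the UI.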
### Slice E — Planning gate
Goal: turn a short request into an intent contract and initial queue.
Deliverables:
- New Work screen submits request;
- planning worker packet compiler;
- worker result parser for intent/queue;
- Planning Gate screen for high-level accept/edit;
- question budget enforcement.
Acceptance:
- user can create a new intent from UI;
- planning output is structured and saved;
- questions are high-level and limited.
### Slice F — Worker packet compiler
Goal: compile canonical task state into Codex/Claude-specific packets.
Deliverables:
- Codex task template;
- Claude Code task template;
- packet dry-run view;
- anchors/evidence insertion;
- output contract instructions.
Acceptance:
- same task produces different worker packets;
- packets include intent/scope/validation/output schema;
- packets avoid dumping unrelated context.
### Slice G — Run ledger and worker invocation
Goal: execute one bounded task and record it.
Deliverables:
- run directory creation;
- subprocess invocation;
- worker output log capture;
- result file parsing;
- queue state update;
- run monitor UI.
Acceptance:
- pressing Run Next executes one task;
- artifacts are written under `.agents/runs/<run-id>/`;
- failure/blocked/done states are visible.
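Run directory creation might look like the following sketch. The run-id format, the `run.json` file name, and the use of JSON (chosen here only to stay within the standard library) are assumptions; only the `.agents/runs/<run-id>/` location comes from this plan.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(repo_root, task_id):
    """Create .agents/runs/<run-id>/ and write an initial metadata file."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Timestamp keeps runs sortable; the random suffix avoids collisions.
    run_id = f"{stamp}-{task_id}-{uuid.uuid4().hex[:6]}"
    run_dir = Path(repo_root) / ".agents" / "runs" / run_id
    # exist_ok=False: an existing run directory is never silently reused.
    run_dir.mkdir(parents=True, exist_ok=False)
    meta = {"run_id": run_id, "task_id": task_id, "state": "running", "started_at": stamp}
    (run_dir / "run.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return run_dir
```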
### Slice H — Evaluator and compact
Goal: do not trust worker claims blindly.
Deliverables:
- deterministic evaluator;
- validation log capture;
- changed-file scope check;
- checkpoint writer;
- handoff writer;
- Handoff UI populated from artifacts.
Acceptance:
- Yardlet can mark done/failed/blocked based on evidence;
- checkpoint is enough to resume;
- handoff is teammate-readable.
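The changed-file scope check is a good candidate for a purely deterministic evaluator step. A minimal sketch, assuming the task's scope is expressed as glob patterns and validation reports an exit code; the function names and the mapping of outcomes to `blocked`/`failed`/`done` are illustrative.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files, allowed_globs):
    """Return changed paths that match none of the allowed glob patterns."""
    return sorted(
        path for path in changed_files
        if not any(fnmatchcase(path, glob) for glob in allowed_globs)
    )


def evaluate(changed_files, allowed_globs, validation_exit_code):
    """Decide a run outcome from evidence, not from the worker's own claim."""
    stray = out_of_scope(changed_files, allowed_globs)
    if stray:
        return "blocked", [f"out of scope: {p}" for p in stray]
    if validation_exit_code != 0:
        return "failed", [f"validation exited with {validation_exit_code}"]
    return "done", []
```

Note that `fnmatchcase` lets `*` match across `/`, so a pattern such as `src/yard/*` covers nested paths as well.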
### Slice I — Dogfood and trace-based improvement candidates
Goal: use Yardlet on the local Yardlet repo/infrastructure.
Deliverables:
- seed queue for Yardlet itself;
- run traces;
- prompt/packet improvement observations;
- no automatic self-promotion.
Acceptance:
- at least one real implementation task is completed through Yardlet;
- trace yields improvement candidates;
- no self-patch without review.
---
## 17. Initial queue seed
```yaml
schema_version: 1
queue_id: yardlet-bootstrap-queue
intent_id: yardlet-final-plan
selection_policy:
  default_order: priority_then_created_at
  require_planning_gate: false
  skip_if_blocked: true
  skip_if_approval_required: true
tasks:
  - id: YARD-001
    title: "Create Yardlet product surface and remove user-facing Orcar naming"
    state: queued
    priority: 10
    risk: low
    kind: design_refactor
    preferred_worker: codex
    acceptance:
      - "New files use Yardlet as product name."
      - "Orcar Brain remains reference-only, not dependency."
      - ".agents remains canonical repo state."
  - id: YARD-002
    title: "Add canonical .agents schemas and templates"
    state: queued
    priority: 20
    risk: medium
    kind: implementation
    preferred_worker: codex
    acceptance:
      - ".agents/yardlet.yaml template exists."
      - "intent, queue, policy, worker, run, and result schemas exist."
      - "A schema validation command exists."
  - id: YARD-003
    title: "Build minimal Yardlet terminal UI shell"
    state: queued
    priority: 30
    risk: medium
    kind: implementation
    preferred_worker: codex
    acceptance:
      - "Running yardlet opens a local terminal UI."
      - "Home, Workers, Queue, New Work, Run, Handoff placeholders exist."
      - "UI reads .agents state and does not require workers to render."
  - id: YARD-004
    title: "Implement zero-key worker guard"
    state: queued
    priority: 40
    risk: high
    kind: safety
    preferred_worker: claude-code
    acceptance:
      - "Yardlet never asks for AI provider API keys."
      - "Worker invocation uses sanitized AI-billing environment."
      - "Ambiguous worker auth stops safely."
      - "UI shows worker readiness and billing safety state."
  - id: YARD-005
    title: "Implement planning gate through hidden worker"
    state: queued
    priority: 50
    risk: medium
    kind: implementation
    preferred_worker: claude-code
    acceptance:
      - "New Work request creates an intent contract and initial queue."
      - "Planning questions are limited and high-level."
      - "Planning result can be accepted/edited in the UI."
  - id: YARD-006
    title: "Implement worker packet compiler"
    state: queued
    priority: 60
    risk: medium
    kind: implementation
    preferred_worker: codex
    acceptance:
      - "Codex and Claude Code packets differ by worker profile."
      - "Packets include scope, validation, interaction policy, and output schema."
      - "Dry-run packet preview is available."
  - id: YARD-007
    title: "Implement one-task run ledger and evaluator"
    state: queued
    priority: 70
    risk: medium
    kind: implementation
    preferred_worker: codex
    acceptance:
      - "Run Next creates .agents/runs/<run-id>."
      - "Worker output, result, validation, evaluation, checkpoint, and handoff are recorded."
      - "Queue state updates based on evaluator result."
```
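One possible reading of `selection_policy`, sketched against a queue that has already been parsed into a dictionary. It assumes that a lower `priority` number runs first, that `created_at` is an optional string tie-breaker, and that a blocked or approval-gated task stalls the queue when the matching `skip_if_*` flag is false. These are interpretations, not settled behavior, and `next_task` is a hypothetical helper.

```python
def next_task(queue, needs_approval=frozenset()):
    """Pick the next runnable task under priority_then_created_at ordering."""
    policy = queue.get("selection_policy", {})
    skip_blocked = policy.get("skip_if_blocked", True)
    skip_approval = policy.get("skip_if_approval_required", True)
    # Assumption: lower priority number first, then earlier created_at.
    ordered = sorted(
        queue.get("tasks", []),
        key=lambda t: (t["priority"], t.get("created_at", "")),
    )
    for task in ordered:
        if task["state"] == "blocked":
            if skip_blocked:
                continue
            return None  # queue stalls behind the blocked task
        if task["state"] != "queued":
            continue  # done, failed, or running tasks are not candidates
        if task["id"] in needs_approval:
            if skip_approval:
                continue
            return None  # queue stalls until the user decides
        return task
    return None
```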
---
## 18. Open decisions
Keep these open, but do not block implementation on them.
1. **Implementation language/framework**
Python/Textual, TypeScript/Ink, or Go/Bubble Tea.
2. **Exact worker probe commands**
Adapter should discover installed worker behavior and keep probes isolated.
3. **Strict vs scrub env default**
Strict blocking is safest. Scrubbing is more convenient. The UI can expose the selected policy, but the default should never leak billing keys to worker subprocesses.
4. **Core distribution model**
Source checkout vs packaged app vs Homebrew/npm/pipx. The final product should not expose `.orcar-core` as its identity.
5. **Web UI later**
Terminal UI first. A web UI can be considered only after local state and worker contracts are stable.
6. **Brain bridge later**
Optional bridge from Orcar Brain/Slack to Yardlet handoffs can exist later, but Yardlet must remain standalone.
---
## 19. Final implementation guidance for agents
When an implementation agent reads this document, it should preserve these decisions:
```txt
Product name: Yardlet
Primary UX: terminal UI / local workbench
User-facing CLI replacement: yes
Internal worker replacement: not yet
Workers: Codex CLI and Claude Code CLI, hidden behind Yardlet
AI API keys: forbidden as a Yardlet requirement
Canonical state: .agents
Brain dependency: no
Orcar Brain patterns: yes, as reference
Intent lock: safety rule, not product identity
Research: allowed, intent-locked
Local tools: free inside sandbox/policy
User questions: few and high-level
Token economy: required
Compact/handoff: required
```
The first useful experience should be:
```txt
1. User runs `yardlet`.
2. Yardlet opens a local terminal workbench.
3. User enters a short work request.
4. Yardlet safely invokes a subscription-backed planning worker.
5. Yardlet shows a product-level planning result.
6. User accepts or edits in natural language.
7. Yardlet queues work.
8. User presses Run Next.
9. Yardlet invokes a hidden worker with a bounded packet.
10. Yardlet validates, records, compacts, and shows handoff.
```
That is the product.