omk 0.5.0

A Rust runtime for Kimi CLI. Turns prompts into proof-backed engineering runs with gates, worktrees, and replay.
# OMK Goal Product Spec

`omk goal` is the north-star feature for oh-my-kimi.

It turns OMK from a set of useful orchestration commands into a long-running,
proof-driven engineering runtime. The command accepts a high-level outcome,
builds an evidence-backed plan, launches agents and subagents under policy, and
keeps working until the goal is ready, blocked, or out of budget.

The end-to-end delivery contract is defined in `ROADMAP.md` Stage 7 and enforced by the
`omk goal` CLI surface (`run`, `status`, `verify`, `execute`, `review`, `open-pr`, `proof`).

## Product Thesis

Progress is powered by laziness: users should be able to express intent once and
let OMK do the tedious engineering work.

The product promise is not "generate lots of code." The product promise is:

> Work autonomously until the requested engineering goal is proof-backed ready,
> or produce a precise, actionable reason why it is not ready.

`omk goal` must be allowed to run for hours or days, but it must not be allowed
to claim success without evidence.

## End-to-End Delivery Contract

The north-star `goal` flow is end-to-end delivery, not only local proof. Under
an explicit delivery/merge policy, the controller may decompose work and create
task-scoped teams, branches, worktrees, commits, PRs, review/fix loops, and
integrator PRs under the hood. The primary UX remains one lazy command and a
terminal-native/TUI-first view of the orchestrator's progress.

A goal is not fully `ready` in this mode until required slices are reviewed,
audited, verified, integrated, and merged into `main`/`master`, or stopped with
precise blocker evidence. Dry-run proof, unmerged branch evidence, and PR drafts
are useful intermediate output, not the complete end-to-end promise.

## Must-Have Positioning

Canonical market map: `docs/COMPETITIVE_POSITIONING.md`.

`omk goal` must be positioned as a local, repo-native, proof-driven autonomous
software engineering runtime. It is not a generic AI app builder, IDE
autocomplete product, hosted coding-agent clone, or unbounded recursive agent
launcher.

The direct competitive set is Devin, OpenHands, and Claude Code. Aider, Dify,
and Cody are adjacent benchmarks. OMK should learn from the category while
competing on durable goal state, explicit task graphs, verification gates, and
proof artifacts.

## Example Commands

```bash
omk goal run "Build a production-ready CLI for managing local LLM costs" --until-ready
omk goal run "Rewrite this Python project in Rust" --until-ready --budget-time 7d --budget-tokens 2000000 --budget-usd 25
omk goal status
omk goal show latest
omk goal verify latest
omk goal execute latest
omk goal review latest
omk goal open-pr latest --dry-run --format markdown
omk goal replay latest
omk goal budget latest
omk goal pause latest
omk goal resume latest
omk goal cancel latest
```

## Scope

`omk goal` covers large engineering outcomes:

- greenfield products;
- rewrites and migrations;
- large refactors;
- bug-fix campaigns;
- security hardening;
- performance work;
- documentation and release readiness.

It does not replace human product judgment. When a goal depends on taste,
business strategy, credentials, paid APIs, or ambiguous acceptance criteria, the
correct outcome is `blocked_on_human`, not a fake success.

## Current Foundation

`omk goal` now has a controller scaffold. It should reuse the current beta MVP
rather than invent a parallel runtime:

- durable `goals/<goal-id>/goal.json` creation under the OMK state directory;
- backward-compatible `goal.json` loading with safe defaults for newer fields
  and `state_dir` rehoming from the actual goal directory;
- `omk goal plan/run/list/status/show/proof/open-pr/replay/budget/budget-add/verify/execute/review/accept/reject/pause/resume/cancel`;
- scaffold `prd.md`, `technical-plan.md`, `test-spec.md`,
  `task-graph.json`, and `decisions.jsonl`;
- human-blocked oracle guard that stops vague goals as `blocked_on_human` when
  success criteria cannot be made testable without a human decision;
- controller-owned planning task completion evidence in the task graph and
  goal event log;
- durable task graph retry/lease metadata through `retry_count`, `max_retries`,
  and `lease_expires_at`, with backward-compatible defaults for older graphs;
- controller-owned decision records in `decisions.jsonl` for planning,
  decomposition, and execution-boundary rationale;
- honest goal-level `proof.json` with `ready` only after gates, execution,
  review, explicit integration acceptance, and oracle evidence pass;
- local verification gate execution through `omk goal verify`, with gate output
  artifacts and gate results embedded in the goal proof;
- local controller execution through `omk goal execute`, which:
  - marks `goal-local-verify` done when required gates pass;
  - launches policy-validated bounded Wire-backed agent task waves with
    mutation diff and changed-file evidence;
  - dispatches accepted agent-proposed follow-up tasks on later invocations;
  - enforces `max_agents` as the worker pool cap;
  - recovers expired task leases with `retry_scheduled` evidence, preferring a
    different available worker over the stale owner;
  - quarantines stale workers with `worker_dead` evidence and durable
    `stale-worker-cleanup.json` markers, ignoring late stale-worker
    outbox/heartbeat updates;
  - reruns verification gates against the mutated tree when agent work changes
    project files;
- active operator interruption during Wire-backed goal execution: `pause` or
  `cancel` updates durable goal state, the active execute process observes the
  state change, cancels workers, prevents additional task dispatch, and
  preserves the interrupted goal/proof status;
- first-class `task_graph_mutated` events for accepted agent-proposed graph
  additions, including the task id, source, proposal artifact, graph path, and
  resulting task count;
- load-time task graph validation for duplicate task ids, missing dependencies,
  self-dependencies, empty required task fields, and dependency cycles;
- controller policy checks that reject unordered agent-proposed follow-up tasks
  with conflicting normalized, alias-equivalent, parent/child, or read/write
  access sets while accepting dependency-serialized follow-ups;
- controller review through `omk goal review`, which marks `goal-review` and
  `goal-security-review` done only when execution evidence exists and the
  bounded changed-file secret scan finds no high-confidence findings;
- structured `proof.json.review_artifacts` with deterministic architect, code,
  test, security, performance, and anti-slop sections; each section carries
  status, evidence, risks, known gaps, and a recommended next step for PR
  readiness;
- explicit local integrator acceptance/rejection through `omk goal accept` and
  `omk goal reject`, recorded as `proof.json.integration_evidence`;
- oracle evidence in `proof.json.oracle_evidence`; current greenfield ready
  paths require acceptance/smoke/demo gates, and rewrite/refactor ready paths
  require compatibility/golden gates before local integrator acceptance can
  produce `ready`;
- GitHub PR draft rendering through `omk goal open-pr`, which turns existing
  proof evidence into Markdown, JSON, or text title/body output without network
  access and blocks scaffold-only proofs with an actionable next step;
- best-effort git branch, HEAD commit, and dirty-state capture in goal proofs;
- bounded agent wave evidence under `artifacts/agent-runs/`;
- structured per-task budgets carried into Wire worker inboxes and enforced as
  task timeout hard stops with failed-result evidence;
- Wire-derived token usage and estimated USD cost budget accounting, with
  `--budget-tokens` / `--budget-usd` hard stops before the next controller step
  and `budget-add --tokens` / `--usd` recovery;
- goal-level `events.jsonl` plus deterministic `omk goal replay` output derived
  from persisted event/task state instead of the current process clock;
- cancellation `failure.json` artifacts;
- Kimi-native asset sync, doctor, install, and rollback;
- scheduler-backed `omk team run`;
- Wire worker control through `kimi --wire`;
- task claims, leases, retries, and write-set conflict checks;
- append-only event logs;
- verification gates;
- run/proof/HUD inspection;
- `proof.json` and `failure.json` artifacts.

The current foundation is documented in `README.md`, `docs/ARCHITECTURE.md`,
and `docs/PROJECT_MAP.md`.
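
The load-time task graph validation listed above (duplicate task ids, missing dependencies, self-dependencies, empty required fields, and dependency cycles) can be sketched as follows. This is a minimal illustration, not the OMK implementation: `Task`, `GraphError`, and `validate_graph` are hypothetical names, and the real `task-graph.json` schema carries more fields than the two shown here.

```rust
use std::collections::HashSet;

/// Hypothetical minimal task shape (assumption: the real schema has more fields).
#[derive(Debug, Clone)]
pub struct Task {
    pub id: String,
    pub depends_on: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GraphError {
    EmptyId,
    DuplicateId(String),
    SelfDependency(String),
    MissingDependency { task: String, dependency: String },
    /// Names the first task that could not be ordered.
    Cycle(String),
}

pub fn validate_graph(tasks: &[Task]) -> Result<(), GraphError> {
    // Pass 1: ids must be non-empty and unique.
    let mut ids: HashSet<&str> = HashSet::new();
    for task in tasks {
        if task.id.trim().is_empty() {
            return Err(GraphError::EmptyId);
        }
        if !ids.insert(task.id.as_str()) {
            return Err(GraphError::DuplicateId(task.id.clone()));
        }
    }
    // Pass 2: every dependency must name another known task.
    for task in tasks {
        for dep in &task.depends_on {
            if dep == &task.id {
                return Err(GraphError::SelfDependency(task.id.clone()));
            }
            if !ids.contains(dep.as_str()) {
                return Err(GraphError::MissingDependency {
                    task: task.id.clone(),
                    dependency: dep.clone(),
                });
            }
        }
    }
    // Pass 3: repeatedly resolve tasks whose dependencies are all resolved.
    // Any task left over sits on, or downstream of, a cycle.
    let mut done: HashSet<&str> = HashSet::new();
    loop {
        let ready: Vec<&str> = tasks
            .iter()
            .filter(|t| !done.contains(t.id.as_str()))
            .filter(|t| t.depends_on.iter().all(|d| done.contains(d.as_str())))
            .map(|t| t.id.as_str())
            .collect();
        if ready.is_empty() {
            break;
        }
        done.extend(ready);
    }
    match tasks.iter().find(|t| !done.contains(t.id.as_str())) {
        Some(stuck) => Err(GraphError::Cycle(stuck.id.clone())),
        None => Ok(()),
    }
}
```

The cycle pass is quadratic in the number of tasks, which keeps the sketch short; a production validator would more likely use an in-degree queue.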

## Core Outcomes

Every goal run ends in exactly one terminal status:

| Status | Meaning |
| --- | --- |
| `ready` | Required gates passed and the proof bundle supports the readiness claim. |
| `not_ready` | Work was attempted, but required proof or gates did not pass. |
| `blocked_on_human` | A human decision is required before progress can continue safely. |
| `blocked_on_external` | External access, credentials, APIs, or services are missing. |
| `needs_more_budget` | Time, token, cost, or compute budget was exhausted. Current runtime enforces exhausted wall-clock `--budget-time`, Wire-derived `--budget-tokens`, and estimated `--budget-usd` before `verify`, `execute`, or `review`; Wire workers enforce per-task budget timeouts; `omk goal budget-add` records operator-approved recovery for time, tokens, and USD. |
| `failed_infra` | OMK infrastructure failed in a way the run could not recover from. |
| `cancelled` | User cancelled the goal. |
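
One way to keep "exactly one terminal status" honest in code is a closed enum whose string forms match the table. The sketch below assumes the status strings are persisted verbatim; the `GoalStatus` type and its methods are hypothetical names, not confirmed parts of the crate.

```rust
use std::fmt;
use std::str::FromStr;

/// Terminal statuses from the table above (type and method names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoalStatus {
    Ready,
    NotReady,
    BlockedOnHuman,
    BlockedOnExternal,
    NeedsMoreBudget,
    FailedInfra,
    Cancelled,
}

impl GoalStatus {
    pub const ALL: [GoalStatus; 7] = [
        GoalStatus::Ready,
        GoalStatus::NotReady,
        GoalStatus::BlockedOnHuman,
        GoalStatus::BlockedOnExternal,
        GoalStatus::NeedsMoreBudget,
        GoalStatus::FailedInfra,
        GoalStatus::Cancelled,
    ];

    /// The snake_case form used in the status table.
    pub fn as_str(self) -> &'static str {
        match self {
            GoalStatus::Ready => "ready",
            GoalStatus::NotReady => "not_ready",
            GoalStatus::BlockedOnHuman => "blocked_on_human",
            GoalStatus::BlockedOnExternal => "blocked_on_external",
            GoalStatus::NeedsMoreBudget => "needs_more_budget",
            GoalStatus::FailedInfra => "failed_infra",
            GoalStatus::Cancelled => "cancelled",
        }
    }
}

impl fmt::Display for GoalStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for GoalStatus {
    type Err = String;

    /// Rejects anything that is not one of the seven known statuses.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        GoalStatus::ALL
            .iter()
            .copied()
            .find(|status| status.as_str() == s)
            .ok_or_else(|| format!("unknown goal status: {s}"))
    }
}
```

Because the `match` in `as_str` is exhaustive, adding an eighth status without giving it a string form becomes a compile error.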

## Non-Negotiable Principles

1. **Proof over confidence.** Agents may propose completion; only verifiers and
   gates can accept completion.
2. **Oracle first.** Rewrites need compatibility tests. Greenfield work needs
   acceptance tests. Security/performance work needs explicit gates.
3. **Bounded autonomy.** Agents can request tasks and subagents, but the goal
   controller enforces policy, budgets, write scopes, and concurrency limits.
4. **No silent branching.** Material product or architecture choices are logged
   as decisions. Human-blocking decisions stop with `blocked_on_human`.
5. **Recoverable by default.** Goal state, task graph, messages, heartbeats,
   artifacts, and proofs must survive process crashes and context compaction.
6. **Small accepted increments.** Long goals are completed through accepted
   subgoals, not one giant unreviewable diff.
7. **Local-first.** OMK owns local state and execution; GitHub integration is an
   output surface, not the source of truth.
8. **PR-first integration.** Repository changes are integrated through
   task-scoped branches/worktrees and PRs; `master` / `main` are read-only
   execution baselines.
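
Principle 1 can be read as a pure function over recorded evidence: `ready` is derived, never asserted. The sketch below uses the five evidence kinds this spec names for the local ready path (gates, execution, review, integration acceptance, oracle). The `Evidence` struct, its field names, and the labels returned by `missing` are assumptions for illustration only.

```rust
/// Hypothetical summary of the evidence a proof bundle has recorded.
#[derive(Debug, Clone, Copy, Default)]
pub struct Evidence {
    pub gates_passed: bool,
    pub execution_recorded: bool,
    pub review_passed: bool,
    pub integration_accepted: bool,
    pub oracle_passed: bool,
}

impl Evidence {
    /// Labels of the evidence kinds still missing, in a fixed order.
    pub fn missing(&self) -> Vec<&'static str> {
        let checks = [
            ("gates", self.gates_passed),
            ("execution", self.execution_recorded),
            ("review", self.review_passed),
            ("integration acceptance", self.integration_accepted),
            ("oracle", self.oracle_passed),
        ];
        checks
            .iter()
            .filter(|(_, ok)| !*ok)
            .map(|(name, _)| *name)
            .collect()
    }

    /// Ready only when nothing is missing; an agent's claim is not an input.
    pub fn is_ready(&self) -> bool {
        self.missing().is_empty()
    }
}
```

Returning the missing labels rather than a bare boolean supports the other half of the product promise: a precise, actionable reason when the goal is not ready.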

## Functional Requirements

### Goal Intake

- Accept a natural-language goal and optional constraints.
- Classify the goal as greenfield, rewrite, migration, refactor, audit, bugfix,
  performance, documentation, or mixed.
- Inspect the repository before planning.
- Create a goal directory under `.omk/goals/<goal-id>/`.
- Persist the original user request, normalized goal, assumptions, constraints,
  budgets, and terminal criteria.

### Planning

- Produce a PRD or goal brief.
- Produce a technical plan.
- Produce a test specification.
- Build a task graph with dependencies, read sets, write sets, risk level,
  acceptance criteria, retry counts, retry policy, and lease expiration for
  each task.
- Identify the oracle that will decide whether the goal is done.
- Stop early with `blocked_on_human` if the oracle cannot be defined.

### Agent Orchestration

- Launch role-specific agents and subagents through the existing OMK team/runtime
  surfaces.
- Assign bounded tasks with explicit ownership.
- Allow agents to propose follow-up tasks.
- Require the goal controller to approve task graph mutations.
- Track worker leases, heartbeats, retries, and failure evidence.
- Support long-running execution with resume after process and active-worker
  interruption.
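
Lease tracking and retry can be illustrated with a small decision function. It follows the behaviour described elsewhere in this spec (expired leases are rescheduled, preferring a different available worker over the stale owner) and reuses the `retry_count`, `max_retries`, and `lease_expires_at` field names. Everything else is an assumption: the `Lease` and `LeaseAction` types, the use of Unix seconds, and the fallback to the stale owner when no other worker is available.

```rust
/// Hypothetical in-memory view of a task lease (timestamps assumed to be Unix seconds).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Lease {
    pub owner: String,
    pub lease_expires_at: u64,
    pub retry_count: u32,
    pub max_retries: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LeaseAction {
    /// The lease is still valid; leave the task with its owner.
    Keep,
    /// The lease expired; reschedule on `worker` with the bumped retry count.
    Retry { worker: String, retry_count: u32 },
    /// The lease expired and no retries remain.
    Exhausted,
}

pub fn recover_lease(lease: &Lease, now: u64, available: &[&str]) -> LeaseAction {
    if now < lease.lease_expires_at {
        return LeaseAction::Keep;
    }
    if lease.retry_count >= lease.max_retries {
        return LeaseAction::Exhausted;
    }
    // Prefer any available worker other than the stale owner;
    // fall back to the owner only when nobody else is available.
    let worker = available
        .iter()
        .copied()
        .find(|candidate| *candidate != lease.owner)
        .unwrap_or(lease.owner.as_str());
    LeaseAction::Retry {
        worker: worker.to_string(),
        retry_count: lease.retry_count + 1,
    }
}
```

Passing `now` in as a parameter, instead of reading the clock inside the function, keeps the decision reproducible from persisted state.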

### Research

- Search official docs and relevant repositories when the goal depends on
  libraries, frameworks, APIs, security practices, or migration strategies.
- Record sources and decisions in the goal decision log.
- Prefer official documentation and primary sources for implementation facts.

### Implementation

- Use isolated worktrees or branches for independent slices.
- Tie each independent slice to a goal task with owner, write scope,
  dependencies, verification gates, branch, and PR link.
- Keep write scopes explicit.
- Merge accepted slices through an integrator step.
- Preserve changelog, docs, migration notes, and release notes as part of done.
- Avoid new dependencies unless a recorded decision justifies them.

### Collaboration and Delivery

- Treat the local goal task graph plus GitHub PRs as the durable collaboration
  surface for humans and agents.
- Require every agent to record its task ownership and write scope before
  editing files.
- Block or serialize overlapping write scopes through task dependencies or an
  integrator PR.
- Open PRs from task-scoped branches; include proof, gates, known gaps, and
  decision artifacts in the PR body.
- Render PR drafts from local proof evidence first with `omk goal open-pr
  latest --dry-run --format markdown|json|text`; GitHub creation must remain an
  explicit integration action, never an implicit side effect of proof rendering.
- Mark task slices integrated only after the PR is merged or explicitly
  rejected.
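
Detecting overlapping write scopes comes down to comparing normalized paths by component, so that a parent directory conflicts with its children while `src/goal` and `src/goal2` do not. The sketch below covers only lexical normalization and write/write overlap. Alias-equivalent paths such as symlinks and read/write conflicts are out of scope here, and all function names are hypothetical.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalize a path: drop `.` and resolve `..` without touching the filesystem.
pub fn normalize(path: &str) -> PathBuf {
    let mut out = PathBuf::new();
    for component in Path::new(path).components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Two scopes overlap when they are equal or one is an ancestor of the other.
/// `Path::starts_with` compares whole components, not string prefixes.
pub fn scopes_overlap(a: &str, b: &str) -> bool {
    let (a, b) = (normalize(a), normalize(b));
    a.starts_with(&b) || b.starts_with(&a)
}

/// True when any scope in `left` overlaps any scope in `right`.
pub fn write_sets_conflict(left: &[&str], right: &[&str]) -> bool {
    left.iter()
        .any(|l| right.iter().any(|r| scopes_overlap(l, r)))
}
```

A controller could run this check on two tasks that have no dependency edge between them and then either serialize them or route the work through an integrator PR, as the list above requires.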

### Operator Experience

The happy path is one command, `omk goal run "<task>" --until-ready`, or a
future equivalent shorthand. Prefer TUI/terminal output before graphical UI.
Show a live orchestrator narrative: implemented work, current verification,
next step, blockers, and material tradeoffs. A glance should reveal whether the
goal is planning, implementing, reviewing, fixing, merging, blocked, or ready.

### Verification

The verification wall is configurable, but the default Rust profile includes:

- `cargo fmt -- --check`
- `cargo check --all-targets`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test --all-features`
- `cargo doc --no-deps`
- dependency and license audit when configured

Additional gates are selected by goal type:

- rewrite: compatibility and golden tests against the original implementation;
- greenfield: acceptance, smoke, and demo tests;
- security: threat model, secret scan, dependency audit, abuse-case checks;
- performance: baseline and regression benchmarks;
- frontend: browser QA, responsive screenshots, accessibility checks.
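
The gate selection above is essentially a lookup from goal type to extra gates, appended to the default profile. In the sketch below the default commands are copied from the Rust profile list, while the `GoalKind` enum and the short gate labels (`"golden"`, `"secret-scan"`, and so on) are hypothetical identifiers chosen for illustration, not names OMK is known to use.

```rust
/// Goal types that select additional gates (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoalKind {
    Rewrite,
    Greenfield,
    Security,
    Performance,
    Frontend,
}

/// Default Rust profile commands, as listed in this section.
pub const DEFAULT_RUST_GATES: [&str; 5] = [
    "cargo fmt -- --check",
    "cargo check --all-targets",
    "cargo clippy --all-targets --all-features -- -D warnings",
    "cargo test --all-features",
    "cargo doc --no-deps",
];

/// Extra gates per goal type; the labels are illustrative placeholders.
pub fn additional_gates(kind: GoalKind) -> &'static [&'static str] {
    match kind {
        GoalKind::Rewrite => &["compatibility", "golden"],
        GoalKind::Greenfield => &["acceptance", "smoke", "demo"],
        GoalKind::Security => &["threat-model", "secret-scan", "dependency-audit", "abuse-cases"],
        GoalKind::Performance => &["baseline-benchmark", "regression-benchmark"],
        GoalKind::Frontend => &["browser-qa", "responsive-screenshots", "accessibility"],
    }
}

/// Default gates first, then the gates selected by goal type.
pub fn gate_plan(kind: GoalKind) -> Vec<&'static str> {
    DEFAULT_RUST_GATES
        .iter()
        .copied()
        .chain(additional_gates(kind).iter().copied())
        .collect()
}
```

The optional dependency and license audit is left out of the default list because this section says it runs only when configured.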

### Proof Bundle

Each goal writes `.omk/goals/<goal-id>/proof.json` with:

- terminal status;
- goal summary;
- accepted and rejected assumptions;
- task graph summary;
- controller-owned task evidence and bounded agent execution evidence;
- changed files;
- commits or branches produced;
- current git HEAD, branch, and dirty state when available;
- gates run and outputs;
- test results;
- reviews performed;
- security/performance notes;
- structured specialist review wall sections for architect, code, test,
  security, performance, and anti-slop evidence;
- oracle evidence and integration acceptance/rejection evidence;
- known gaps;
- human decisions required;
- links to artifacts.

### Commands

Initial command surface:

```bash
omk goal run <goal> [--until-ready] [--budget-time <duration>] [--budget-tokens <n>] [--budget-usd <usd>] [--max-agents <n>]
omk goal status [goal-id|latest]
omk goal show [goal-id|latest] [--format text|json|md]
omk goal list
omk goal pause [goal-id|latest]
omk goal resume [goal-id|latest]
omk goal cancel [goal-id|latest]
omk goal proof [goal-id|latest]
omk goal open-pr [goal-id|latest] --dry-run [--draft] [--format markdown|json|text]
omk goal replay [goal-id|latest] [--format text|json|md]
omk goal budget [goal-id|latest] [--format text|json|md]
omk goal budget-add [goal-id|latest] [--time <duration>] [--tokens <n>] [--usd <usd>]
omk goal verify [goal-id|latest]
omk goal execute [goal-id|latest]
omk goal review [goal-id|latest]
omk goal accept [goal-id|latest] --summary <text>
omk goal reject [goal-id|latest] --reason <text>
omk goal merge [goal-id|latest]
```
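
The `--budget-tokens` and `--budget-usd` flags and `budget-add` in the command surface above imply a check that runs before each controller step and a way to raise limits afterwards. The following is a rough sketch under stated assumptions: the `Budget` struct is hypothetical, "exhausted" is taken to mean usage at or above the limit, and the time budget is omitted for brevity.

```rust
/// Hypothetical budget record; `None` means no limit was set for that dimension.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Budget {
    pub tokens_limit: Option<u64>,
    pub usd_limit: Option<f64>,
    pub tokens_used: u64,
    pub usd_used: f64,
}

impl Budget {
    /// Returns the first exhausted dimension, if any.
    /// Assumption: usage at or above the limit counts as exhausted.
    pub fn exhausted(&self) -> Option<&'static str> {
        if matches!(self.tokens_limit, Some(limit) if self.tokens_used >= limit) {
            return Some("tokens");
        }
        if matches!(self.usd_limit, Some(limit) if self.usd_used >= limit) {
            return Some("usd");
        }
        None
    }

    /// Operator-approved recovery: raise existing limits, leave unset limits unset.
    pub fn add(&mut self, tokens: u64, usd: f64) {
        if let Some(limit) = self.tokens_limit.as_mut() {
            *limit = limit.saturating_add(tokens);
        }
        if let Some(limit) = self.usd_limit.as_mut() {
            *limit += usd;
        }
    }
}
```

A controller following this shape would call `exhausted()` before `verify`, `execute`, or `review` and stop with `needs_more_budget` when it returns a dimension.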

Future command extensions are defined by the end-to-end delivery contract.

## MVP Definition

The first usable `omk goal` MVP is not "rewrite any 200k-line project."

It is:

- one durable goal state directory;
- PRD, technical plan, and test spec artifacts;
- task graph persisted as JSON;
- limited agent execution through the existing team runner;
- status/pause/resume/cancel/budget;
- proof bundle;
- one greenfield demo;
- one rewrite/refactor demo using a small fixture;
- CI coverage for state transitions and proof statuses.

## State Layout

```text
.omk/goals/<goal-id>/
  goal.json
  prd.md
  technical-plan.md
  test-spec.md
  task-graph.json
  decisions.jsonl
  events.jsonl
  heartbeats/
  artifacts/
    gates/
    agent-runs/
  reviews/
  proof.json
  failure.json
```
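
Code that reads or writes this layout is easier to keep consistent when the paths are derived in one place. The helper below is a minimal sketch: `GoalPaths` and its methods are hypothetical, and it assumes the state directory is `.omk` under the project root, as shown in the tree above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper that derives artifact paths for one goal.
pub struct GoalPaths {
    root: PathBuf,
}

impl GoalPaths {
    /// Assumes state lives at `<project_root>/.omk/goals/<goal-id>/`.
    pub fn new(project_root: &Path, goal_id: &str) -> Self {
        Self {
            root: project_root.join(".omk").join("goals").join(goal_id),
        }
    }

    pub fn root(&self) -> &Path {
        &self.root
    }

    pub fn goal_json(&self) -> PathBuf {
        self.root.join("goal.json")
    }

    pub fn task_graph(&self) -> PathBuf {
        self.root.join("task-graph.json")
    }

    pub fn events(&self) -> PathBuf {
        self.root.join("events.jsonl")
    }

    pub fn proof(&self) -> PathBuf {
        self.root.join("proof.json")
    }

    pub fn failure(&self) -> PathBuf {
        self.root.join("failure.json")
    }

    pub fn gate_artifacts(&self) -> PathBuf {
        self.root.join("artifacts").join("gates")
    }

    pub fn agent_runs(&self) -> PathBuf {
        self.root.join("artifacts").join("agent-runs")
    }
}
```

For example, `GoalPaths::new(Path::new("."), "latest").proof()` yields `./.omk/goals/latest/proof.json` on Unix-like systems. Resolving `latest` to a concrete goal id is a separate step that this sketch does not cover.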

## Open Risks

- Agents can produce plausible but wrong code when the oracle is weak.
- Long-running goals can waste budget if task graph mutation is unconstrained.
- Parallel work can conflict without strong write-set enforcement.
- Security work needs explicit threat modeling, not only dependency scans.
- Product correctness cannot be fully automated without real-world feedback.

## Success Criteria

`omk goal` is successful when a user can start a large goal, leave the machine,
return later, and inspect a trustworthy state:

- what was attempted;
- what changed;
- what passed;
- what failed;
- what was not tested;
- what needs a human decision;
- whether the result is ready to merge or release.