omk 0.5.0

A Rust runtime for Kimi CLI. Turns prompts into proof-backed engineering runs with gates, worktrees, and replay.
# North Star Tutorial

This tutorial walks through the **North Star Demo** — the target oh-my-kimi workflow that shows Kimi agents fixing code, producing a proof, and reporting status.

> **Maturity note:** `omk kimi sync`, `omk team run`, `omk hud`, `omk run show`, `omk proof show`, and `omk chat` are in the CLI today. The remaining work is proof/HUD polish and hardening, not command invention. For the tutorial covering only today's CLI surface, see [TUTORIAL.md](TUTORIAL.md).

You can also start from the chat REPL (`omk`) and let the classifier escalate into a full goal run when the request is large enough. The engine pane (`Tab`) shows the same progress that the headless `omk goal run` surface records.

## North Star Commands (Target Workflow)

```bash
omk kimi sync
omk team run 2:executor "fix all failing tests and produce a proof"
omk hud --once
omk proof show latest
```

The demo is successful when you can see Kimi workers progressing in parallel, watch a stuck worker recover or fail cleanly, and inspect a final proof or failure artifact with changed files, gates run, failures, retries, known gaps, and final readiness.

---

## Prerequisites

- **Rust toolchain** (1.78+) for building `omk`
- **Kimi CLI** installed and authenticated (or use `MOCK_KIMI=1` for a fully offline demo)
- **Python 3** (only needed for the wire-compatible mock when running with `MOCK_KIMI=1`)

---

## Quick Start

### 1. Build and install `omk`

```bash
git clone https://github.com/ekhodzitsky/oh-my-kimi
cd oh-my-kimi
cargo build --release
# Optional: copy to a location on your PATH
cp target/release/omk ~/.local/bin/
```

### 2. Run the demo script

```bash
./scripts/north_star_demo.sh
```

For a fully mocked run (no real Kimi API calls):

```bash
MOCK_KIMI=1 ./scripts/north_star_demo.sh
```

Other environment variables you can set:

| Variable | Effect |
|----------|--------|
| `MOCK_KIMI=1` | Use the built-in wire-compatible mock instead of real Kimi |
| `NORTH_STAR_DRY_RUN=1` | Run `omk kimi sync --dry-run` instead of a real sync |
| `NORTH_STAR_NO_CLEANUP=1` | Keep the temp project and team state after the demo |
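
The three switches are independent, so any combination is valid. The Rust sketch below shows one way to read them. It assumes that only the exact value `1` turns a switch on (the table documents nothing else), and `DemoOptions`, `from_lookup`, and `from_env` are hypothetical names, not part of `omk` or the demo script.

```rust
/// Hypothetical mirror of the demo script's environment switches.
#[derive(Debug, Default, PartialEq)]
pub struct DemoOptions {
    pub mock_kimi: bool,
    pub dry_run_sync: bool,
    pub keep_state: bool,
}

impl DemoOptions {
    /// Builds options from any key lookup.
    /// Assumption: only the exact value "1" enables a switch.
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let on = |key: &str| lookup(key).as_deref() == Some("1");
        DemoOptions {
            mock_kimi: on("MOCK_KIMI"),
            dry_run_sync: on("NORTH_STAR_DRY_RUN"),
            keep_state: on("NORTH_STAR_NO_CLEANUP"),
        }
    }

    /// Reads the real process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Taking a lookup closure instead of calling `std::env::var` directly keeps the parsing testable without mutating the process environment.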

---

## What the Script Does

### Step 1 — Setup

- Detects the `omk` binary (installed → `target/release/omk` → `cargo run --`)
- Detects whether to use real Kimi or a mock
- Creates a **temporary Rust project** with an intentional failing test:
  ```rust
  pub fn add(a: i32, b: i32) -> i32 {
      a + b // assumed: the fixture's add is a plain sum
  }
  #[test]
  fn test_add() {
      assert_eq!(add(2, 2), 5); // wrong on purpose
  }
  ```
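
The binary detection is a first-match fallback chain. Here is a hedged Rust sketch of that order. It assumes "installed" means a file named `omk` in one of the `PATH` directories. `resolve_omk` and `OmkCommand` are illustrative names, and the real script is shell, so its exact checks may differ.

```rust
use std::path::{Path, PathBuf};

/// How the demo would invoke omk (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum OmkCommand {
    Binary(PathBuf),
    CargoRun,
}

/// First match wins: `omk` on PATH, then the repo's release build, then
/// `cargo run --`. Assumption: "installed" means a file named `omk` in a
/// PATH directory; the executable bit is not checked.
pub fn resolve_omk(path_var: Option<&str>, repo_root: &Path) -> OmkCommand {
    if let Some(paths) = path_var {
        // split_paths uses the platform's PATH separator.
        for dir in std::env::split_paths(paths) {
            let candidate = dir.join("omk");
            if candidate.is_file() {
                return OmkCommand::Binary(candidate);
            }
        }
    }
    let release = repo_root.join("target/release/omk");
    if release.is_file() {
        return OmkCommand::Binary(release);
    }
    OmkCommand::CargoRun
}
```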

### Step 2 — `omk kimi sync`

Synchronizes OMK Kimi-native assets (agents, hooks, skills) into the current project. With `NORTH_STAR_DRY_RUN=1` this shows what would change without touching files.

### Step 3 — `omk team run`

Launches a team of 2 executor workers with the task *"fix the failing test and make cargo test pass"*.

> **Status:** `omk team run` is the current team runtime. It uses scheduler state and Kimi Wire workers.

What happens under the hood in the target design:

1. **Lead decomposition** — a lead agent breaks the task into parallel subtasks (e.g. "fix the add function", "run cargo test").
2. **Worker dispatch** — each subtask is claimed by a wire worker and written to the worker's `inbox.jsonl`.
3. **Execution** — each worker spawns a `kimi --wire` process, sends the task, and collects results.
4. **Polling & synthesis** — the scheduler polls worker `outbox.jsonl` files, marks tasks complete, and runs a synthesis agent to produce a final summary.
5. **Verification gates** — `cargo fmt`, `cargo check`, `cargo clippy`, and `cargo test` are run automatically. With `MOCK_KIMI=1`, the script first proves the fixture fails, then applies a deterministic fixture repair so the offline proof path can finish green.
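
Steps 2 to 4 form a file-backed dispatch-and-poll loop. The sketch below models it in memory under a few assumptions. Each worker holds one task at a time. Vectors of strings stand in for `inbox.jsonl` and `outbox.jsonl`. An outbox record simply names the finished task. `Scheduler`, `dispatch`, `poll`, and `ready_for_synthesis` are hypothetical. The real records are JSON whose schema this page does not describe, and the real scheduler also handles leases and retries.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, PartialEq)]
pub enum TaskState {
    Pending,
    Running { worker: usize },
    Completed,
}

/// Hypothetical in-memory model of the scheduler's dispatch/poll cycle.
pub struct Scheduler {
    pub tasks: HashMap<String, TaskState>,
    queue: VecDeque<String>,
    /// One inbox and one outbox per worker, standing in for the JSONL files.
    pub inboxes: Vec<Vec<String>>,
    pub outboxes: Vec<Vec<String>>,
}

impl Scheduler {
    pub fn new(subtasks: &[&str], workers: usize) -> Self {
        Scheduler {
            tasks: subtasks.iter().map(|t| (t.to_string(), TaskState::Pending)).collect(),
            queue: subtasks.iter().map(|t| t.to_string()).collect(),
            inboxes: vec![Vec::new(); workers],
            outboxes: vec![Vec::new(); workers],
        }
    }

    /// Gives each idle worker the next pending task by appending it to its inbox.
    pub fn dispatch(&mut self) {
        for worker in 0..self.inboxes.len() {
            let busy = self.tasks.values().any(|s| *s == TaskState::Running { worker });
            if busy {
                continue;
            }
            if let Some(task) = self.queue.pop_front() {
                self.inboxes[worker].push(task.clone());
                self.tasks.insert(task, TaskState::Running { worker });
            }
        }
    }

    /// Drains every outbox and marks the named tasks complete.
    pub fn poll(&mut self) {
        for outbox in self.outboxes.iter_mut() {
            for task in outbox.drain(..) {
                self.tasks.insert(task, TaskState::Completed);
            }
        }
    }

    /// True once every task is complete: the point where synthesis would run.
    pub fn ready_for_synthesis(&self) -> bool {
        self.tasks.values().all(|s| *s == TaskState::Completed)
    }
}
```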

### Step 4 — `omk hud`

Prints a one-shot JSON snapshot of the team:

```json
{
  "run_id": "north-star-demo",
  "team_name": "north-star-demo",
  "task_summary": {
    "total": 3,
    "completed": 3,
    "running": 0,
    "pending": 0,
    "failed": 0
  },
  "workers": [...]
}
```
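
The `task_summary` object is a tally of task states. The following minimal sketch derives it and assumes each task is in exactly one of the four states, so `total` equals their sum. `Status`, `TaskSummary`, and `summarize` are illustrative names, not `omk` types.

```rust
/// Hypothetical per-task status as the HUD would see it.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Pending,
    Running,
    Completed,
    Failed,
}

/// Mirrors the fields of the HUD's `task_summary` object.
#[derive(Debug, Default, PartialEq)]
pub struct TaskSummary {
    pub total: usize,
    pub completed: usize,
    pub running: usize,
    pub pending: usize,
    pub failed: usize,
}

/// Counts tasks per state; assumes every task has exactly one state.
pub fn summarize(tasks: &[Status]) -> TaskSummary {
    let mut s = TaskSummary { total: tasks.len(), ..Default::default() };
    for t in tasks {
        match t {
            Status::Pending => s.pending += 1,
            Status::Running => s.running += 1,
            Status::Completed => s.completed += 1,
            Status::Failed => s.failed += 1,
        }
    }
    s
}
```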

### Step 5 — `omk proof show latest`

> **Status:** `omk proof show` exists in the CLI today. The hardening work is in richer gate reporting, replay, and demo polish.

Reads the run's cached `proof.json` when present, or regenerates a proof report from `events.jsonl`. The report covers:

- **Status** — `Ready`, `NotReady`, or `Failed`
- **Changed files** — list of files modified during the run
- **Gates** — verification results (fmt, check, clippy, test)
- **Failures** — any worker or gate failures
- **Retries** — tasks that were retried after stale-lease recovery
- **Known gaps** — explicitly acknowledged incomplete work
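
The status could follow from the evidence as sketched below. The troubleshooting section states that failed gates produce `Failed`. The remaining rules are assumptions: a worker failure is also treated as `Failed`, and a missing gate result produces `NotReady`. `GateResult` and `proof_status` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum ProofStatus {
    Ready,
    NotReady,
    Failed,
}

/// Outcome of one verification gate; `None` means the gate never reported.
pub struct GateResult<'a> {
    pub name: &'a str,
    pub passed: Option<bool>,
}

pub fn proof_status(gates: &[GateResult], worker_failures: usize) -> ProofStatus {
    // A failed gate yields Failed (per this tutorial); treating worker
    // failures the same way is an assumption.
    if worker_failures > 0 || gates.iter().any(|g| g.passed == Some(false)) {
        return ProofStatus::Failed;
    }
    // Assumption: no gates, or a gate without a result, means missing evidence.
    if gates.is_empty() || gates.iter().any(|g| g.passed.is_none()) {
        return ProofStatus::NotReady;
    }
    ProofStatus::Ready
}
```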

`omk proof show` supports all three demo formats:

```bash
omk proof show latest --format text
omk proof show latest --format md
omk proof show latest --format json
```

The demo script validates the JSON verdict and exits non-zero when the final proof `status` is `failed`.

With real Kimi, the proof should show `Ready` plus the files Kimi actually changed. With `MOCK_KIMI=1`, the script keeps OMK state isolated, repairs the tiny fixture deterministically, and expects `Ready` with passing gates.

### Step 6 — Cleanup

Removes the temporary project and all team state (unless `NORTH_STAR_NO_CLEANUP=1` is set).

---

## Using with Real Kimi

Unset `MOCK_KIMI` and ensure the `kimi` CLI is authenticated:

```bash
kimi --version        # should print version
kimi info             # should show wire protocol 1.9 or the currently supported protocol
kimi auth status      # should show you are logged in
```

Then run the demo without the mock:

```bash
./scripts/north_star_demo.sh
```

> ⚠️ **Cost warning**: running with real Kimi will consume API tokens. The demo creates 2 workers + 1 lead + 1 synthesis agent, each making at least one LLM call.

If you are validating a new Kimi CLI release, first run `kimi info` and compare it with [KIMI_UPSTREAM.md](KIMI_UPSTREAM.md). Extension fields in `initialize.result`, such as `hooks`, can evolve while the protocol remains compatible, so OMK should parse them as structured JSON evidence rather than a closed schema.
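
The "structured evidence, not a closed schema" rule can be sketched with the standard library alone. Pull out the fields you understand, and keep every other key verbatim instead of rejecting the message. The `Json` enum is a minimal stand-in for a real JSON value type. The `protocol_version` key is an assumed example, not a documented field of `initialize.result`.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a parsed JSON value.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Hypothetical view of `initialize.result`: one known field plus everything else.
#[derive(Debug)]
pub struct InitializeResult {
    /// Assumed key name, for illustration only.
    pub protocol_version: Option<String>,
    /// Unknown or evolving fields (for example `hooks`) kept as raw evidence.
    pub extensions: BTreeMap<String, Json>,
}

pub fn read_initialize_result(mut fields: BTreeMap<String, Json>) -> InitializeResult {
    let protocol_version = match fields.remove("protocol_version") {
        Some(Json::Str(v)) => Some(v),
        Some(other) => {
            // Unexpected type: keep it as evidence rather than failing.
            fields.insert("protocol_version".to_string(), other);
            None
        }
        None => None,
    };
    InitializeResult { protocol_version, extensions: fields }
}
```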

---

## Greenfield Goal Demo (MVP Acceptance Artifact)

This is the first narrow greenfield acceptance path for `omk goal`. It starts
from a fresh Rust CLI fixture so OMK has real gates to run. It proves that a
small product-shaped request becomes durable engineering artifacts and proof
evidence; it does **not** claim the result is product-ready without review,
commit, PR, and release acceptance.

### 1. Create a disposable greenfield project

```bash
tmpdir="$(mktemp -d)"
cd "$tmpdir"
cargo new omk-goal-greenfield-demo
cd omk-goal-greenfield-demo
omk setup
```

If you want git evidence in the final proof, keep the fixture in the Git repo
created by `cargo new`. A commit is optional; uncommitted agent changes are
still captured through `git status --porcelain`.
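
`git status --porcelain` (format v1) prints one entry per line: two status characters, a space, then the path. Renames and copies are written as `old -> new`. The minimal sketch below turns that output into a changed-file list. `changed_files` is a hypothetical helper, not necessarily how OMK does it. It also does not unquote paths that Git wraps in double quotes.

```rust
/// Extracts paths from `git status --porcelain` (v1) output.
/// For renames and copies, the new path (after " -> ") is returned.
/// Quoted paths (unusual characters) are returned as-is, quotes included.
pub fn changed_files(porcelain: &str) -> Vec<String> {
    porcelain
        .lines()
        .filter(|line| line.len() > 3)
        .map(|line| {
            let path = &line[3..]; // skip the two status chars and the space
            match path.split_once(" -> ") {
                Some((_, new_path)) => new_path.to_string(),
                None => path.to_string(),
            }
        })
        .collect()
}
```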

### 2. Create the goal scaffold

```bash
omk goal run \
  "Build a tiny local-only Rust CLI named taskline. It should support add <text> and list commands, store tasks in tasks.txt, include tests for both commands, avoid network access, and add no new dependencies." \
  --budget-time 30m \
  --budget-tokens 200000 \
  --max-agents 1
```

Expected state after this command:

- `omk goal show latest` prints the goal id, status, phase, and state path.
- The state path contains `goal.json`, `prd.md`, `technical-plan.md`,
  `test-spec.md`, `task-graph.json`, `decisions.jsonl`, and `proof.json`.
- `omk goal proof latest --format md` is honest: before execution and review,
  it should remain `not_ready` with known gaps for missing gate or agent
  evidence.
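
You can check the scaffold on disk with a short function. This sketch assumes you pass the state path printed by `omk goal show latest`. The file list is copied from the bullet above. `SCAFFOLD_FILES` and `missing_artifacts` are hypothetical names, not `omk` functionality.

```rust
use std::path::Path;

/// Scaffold files this tutorial expects right after `omk goal run`.
pub const SCAFFOLD_FILES: [&str; 7] = [
    "goal.json",
    "prd.md",
    "technical-plan.md",
    "test-spec.md",
    "task-graph.json",
    "decisions.jsonl",
    "proof.json",
];

/// Returns the expected scaffold files that are not present under `state_dir`.
pub fn missing_artifacts(state_dir: &Path) -> Vec<&'static str> {
    SCAFFOLD_FILES
        .iter()
        .copied()
        .filter(|name| !state_dir.join(*name).is_file())
        .collect()
}
```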

### 3. Attach gate and execution evidence

```bash
omk goal verify latest
omk goal execute latest
omk goal review latest
omk goal proof latest --format md
omk goal replay latest --format text
```

`verify` auto-detects the Rust fixture and records required gate results for
format, check, lint, and tests. `execute` runs the bounded goal-agent wave;
that step requires an authenticated `kimi` CLI, or `MOCK_KIMI` pointing at an
executable wire-compatible mock. `review` adds controller review and security
review artifacts after execution evidence exists.

### Expected Goal Artifacts

The exact state root is printed by `omk goal show latest`. By default it is
under `~/.local/state/omk/goals/<goal-id>/`, or under
`~/.omk/state/goals/<goal-id>/` when the legacy OMK state root exists.
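
The state-root rule is a two-way choice, sketched below. The sketch assumes "the legacy OMK state root" means the directory `~/.omk/state` and that `home` is the user's home directory. `goal_state_dir` is illustrative only; the path printed by `omk goal show latest` remains authoritative.

```rust
use std::path::{Path, PathBuf};

/// Picks the goal state directory as this tutorial describes it.
/// Assumption: the legacy root is `<home>/.omk/state`.
pub fn goal_state_dir(home: &Path, goal_id: &str) -> PathBuf {
    let legacy_root = home.join(".omk").join("state");
    if legacy_root.is_dir() {
        legacy_root.join("goals").join(goal_id)
    } else {
        home.join(".local").join("state").join("omk").join("goals").join(goal_id)
    }
}
```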

| Artifact | Why it matters |
|----------|----------------|
| `goal.json` | Durable original goal, normalized goal, budgets, terminal criteria, and current status |
| `prd.md` | Human-readable goal brief for the requested product behavior |
| `technical-plan.md` | Controller phases and execution boundary |
| `test-spec.md` | Acceptance gates and scaffold task expectations |
| `task-graph.json` | Controller-owned task graph with dependencies, owners, risk, read/write sets, and task evidence |
| `decisions.jsonl` | Append-only controller decisions for planning, decomposition, and execution boundaries |
| `events.jsonl` | Goal lifecycle, task, gate, proof, and interruption events |
| `artifacts/gates/` | Command output evidence for each verification gate |
| `artifacts/agent-runs/` | Bounded Wire worker inbox/outbox, task policy, and mutation evidence |
| `artifacts/reviews/` | Controller review and security review summaries |
| `proof.json` | Machine-readable readiness proof with gates, changed files, known gaps, and git evidence when available |
| `failure.json` | Written when the goal is cancelled, blocked, or fails |

### Gates and Readiness

For this Rust fixture, the default required gates are:

- `format`: `cargo fmt --check`
- `check`: `cargo check --all-targets`
- `lint`: `cargo clippy -- -D warnings`
- `tests`: `cargo test`
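
Each gate is a plain `cargo` invocation. The sketch below builds the four commands as `std::process::Command` values without running them. This page says `verify` auto-detects the Rust fixture but not how, so the `Cargo.toml` check is an assumption. The names `rust_gates` and `cargo` are also assumptions.

```rust
use std::path::Path;
use std::process::Command;

/// Builds one `cargo` command rooted at the project directory (hypothetical helper).
fn cargo(project: &Path, args: &[&str]) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(args).current_dir(project);
    cmd
}

/// Returns (gate name, command) pairs for a Rust project, or None when the
/// directory has no Cargo.toml (assumed detection rule). Nothing is executed.
pub fn rust_gates(project: &Path) -> Option<Vec<(&'static str, Command)>> {
    if !project.join("Cargo.toml").is_file() {
        return None;
    }
    Some(vec![
        ("format", cargo(project, &["fmt", "--check"])),
        ("check", cargo(project, &["check", "--all-targets"])),
        ("lint", cargo(project, &["clippy", "--", "-D", "warnings"])),
        ("tests", cargo(project, &["test"])),
    ])
}
```

Calling `.status()` on each command would run the gate. Keeping construction separate from execution makes the exact argument lists easy to inspect.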

The demo reaches an **engineering-ready handoff candidate** when the proof shows
passing required gates, changed-file evidence, bounded agent execution evidence,
controller review evidence, and no blocking security findings.

The demo is **not product-ready** until a human or integrator accepts the diff,
commits it, opens or merges a PR, updates release-level docs when appropriate,
and decides that the local CLI behavior is useful enough for users. Current
`omk goal` proof output intentionally remains `not_ready` when that integration
acceptance is missing.
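
The two levels can be read as a checklist. The sketch below treats each evidence item as a yes/no flag, and its type and field names are hypothetical. It gives the handoff candidate its own label. As described above, the current `omk goal` proof still reports `not_ready` for that state.

```rust
/// Hypothetical yes/no view of the evidence named in this section.
#[derive(Default)]
pub struct GoalEvidence {
    pub required_gates_pass: bool,
    pub changed_files_recorded: bool,
    pub agent_execution_recorded: bool,
    pub controller_review_recorded: bool,
    pub blocking_security_findings: bool,
    /// Human or integrator acceptance: diff accepted, committed, PR merged,
    /// and the behavior judged useful enough for users.
    pub integration_accepted: bool,
}

#[derive(Debug, PartialEq)]
pub enum Readiness {
    NotReady,
    HandoffCandidate,
    ProductReady,
}

pub fn readiness(e: &GoalEvidence) -> Readiness {
    let engineering_ok = e.required_gates_pass
        && e.changed_files_recorded
        && e.agent_execution_recorded
        && e.controller_review_recorded
        && !e.blocking_security_findings;
    match (engineering_ok, e.integration_accepted) {
        (false, _) => Readiness::NotReady,
        (true, false) => Readiness::HandoffCandidate,
        (true, true) => Readiness::ProductReady,
    }
}
```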

---

## Troubleshooting

### "No teams found" when running `omk hud` or `omk proof`

You need to run `omk team run` first so that team state exists on disk. The demo script does this automatically.

### "Dead workers" in HUD output

- Check that `kimi --version` works.
- If using `MOCK_KIMI=1`, verify Python 3 is available (`python3 --version`).
- Check `~/.local/state/omk/team/<name>/workers/*/heartbeat.json` for worker status.
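
A worker is usually considered dead when its heartbeat stops advancing. The sketch below checks for that. It assumes you have already read the last heartbeat time out of `heartbeat.json`, whose fields are not documented here. `is_stale` is a hypothetical helper, and the timeout is a parameter because this page does not state one.

```rust
use std::time::{Duration, SystemTime};

/// True when the last heartbeat is older than `timeout`.
/// A heartbeat stamped in the future (clock skew) is treated as fresh.
pub fn is_stale(last_beat: SystemTime, now: SystemTime, timeout: Duration) -> bool {
    match now.duration_since(last_beat) {
        Ok(age) => age > timeout,
        Err(_) => false,
    }
}
```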

### "Empty proof" or "No events found for run"

- The run may have failed before writing events. Check the run output for errors.
- With `MOCK_KIMI=1`, the wire mock may have crashed. Look at `events.jsonl` in the team state directory for malformed lines.
- If gates failed, the proof status will be `Failed` rather than `Ready` — this is still a valid proof, it just means the work did not pass verification.

### `omk team run` hangs

The scheduler waits for workers to complete. With real Kimi, workers can take minutes. With the mock, it should finish in under 10 seconds. If it hangs:

- Check `~/.local/state/omk/team/<name>/workers/*/inbox.jsonl` — tasks should be written there.
- Check `~/.local/state/omk/team/<name>/workers/*/outbox.jsonl` — results should appear there.
- Check `~/.local/state/omk/team/<name>/events.jsonl` — events track the run lifecycle.
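
To see which worker is stuck, compare how many records went into each inbox with how many came back out. This sketch assumes one outbox record per inbox task. It only counts non-empty lines and does not parse the JSON. `unanswered` and `record_count` are hypothetical helpers, and you pass in the `workers` directory for your team.

```rust
use std::fs;
use std::path::Path;

/// Number of non-empty lines in a JSONL file; a missing file counts as zero.
fn record_count(path: &Path) -> usize {
    fs::read_to_string(path)
        .map(|text| text.lines().filter(|l| !l.trim().is_empty()).count())
        .unwrap_or(0)
}

/// Lists (worker name, inbox records, outbox records) for every worker whose
/// inbox is ahead of its outbox. Assumes one outbox record per inbox task.
pub fn unanswered(workers_dir: &Path) -> std::io::Result<Vec<(String, usize, usize)>> {
    let mut stuck = Vec::new();
    for entry in fs::read_dir(workers_dir)? {
        let dir = entry?.path();
        if !dir.is_dir() {
            continue;
        }
        let inbox = record_count(&dir.join("inbox.jsonl"));
        let outbox = record_count(&dir.join("outbox.jsonl"));
        if inbox > outbox {
            let name = dir
                .file_name()
                .map(|n| n.to_string_lossy().into_owned())
                .unwrap_or_default();
            stuck.push((name, inbox, outbox));
        }
    }
    stuck.sort();
    Ok(stuck)
}
```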

### Real Kimi fails during `initialize`

- Rebuild OMK after Wire protocol changes: `cargo build --bin omk`.
- Check the local protocol report: `kimi info`.
- Run a minimal handshake outside the demo and inspect whether `initialize.result` has new extension fields.
- Record upstream drift in [KIMI_UPSTREAM.md](KIMI_UPSTREAM.md) before changing runtime parsing.

### `cargo test` in the temp project does not fail

The script creates a fixture with `assert_eq!(add(2, 2), 5)`. If your Rust version or test runner formats output differently, the grep check may miss it. The script warns and continues — the fixture itself is still correct.

---

## File Reference

| File | Purpose |
|------|---------|
| `scripts/north_star_demo.sh` | The demo script (this tutorial's companion) |
| `~/.local/state/omk/team/<name>/events.jsonl` | Event log driving HUD and proof |
| `~/.local/state/omk/team/<name>/event-log.jsonl` | Compatibility read alias when the canonical event log is absent |
| `~/.local/state/omk/team/<name>/workers/*/inbox.jsonl` | Tasks dispatched to each worker |
| `~/.local/state/omk/team/<name>/workers/*/outbox.jsonl` | Results returned by each worker |
| `~/.local/state/omk/team/<name>/proof.json` | Cached proof report |
| `~/.local/state/omk/team/<name>/failure.json` | Failure summary emitted for failed, not-ready, or interrupted runs |
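
The compatibility alias implies a lookup order. Read the canonical `events.jsonl`, and fall back to `event-log.jsonl` only when the canonical file is absent. A sketch of that order follows; `resolve_event_log` is a hypothetical name.

```rust
use std::path::{Path, PathBuf};

/// Returns the event log to read for a team state directory, if any:
/// the canonical file first, then the compatibility alias.
pub fn resolve_event_log(team_dir: &Path) -> Option<PathBuf> {
    ["events.jsonl", "event-log.jsonl"]
        .iter()
        .map(|name| team_dir.join(name))
        .find(|path| path.is_file())
}
```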

---

## Next Steps

- Read the full [TUTORIAL.md](TUTORIAL.md) for the current CLI surface: `team run`, autopilot, skill management, and `omk kimi sync`.
- Read [ARCHITECTURE.md](ARCHITECTURE.md) to understand how the scheduler, wire protocol, and proof system fit together.
- Read [SPEC.md](../SPEC.md) for the product roadmap and design decisions.