beck 0.2.0

Local skills router CLI for AI agents. Your agent's skills, at its beck and call.
Documentation
# beck - session status (2026-04-08, end of day)

This file is the handoff. Read this first in the next session and you
have every decision, every number, every open item.

---

## TL;DR for your next self

Phases 0, 1, and 4 are shipped, pushed, and verified against real
agents. Phase 5a (production README) is committed locally on
`feat/beck-mvp` but not yet pushed pending your review of a minor
comparison-table wording nit. Phase 5b (install.sh, CI, release.yml,
Homebrew tap) is not started. The MCP server is live in Codex and
Claude Code with screenshot proof. Crates.io has `beck 0.0.1`
published under your `notabotchef` GitHub identity. GitHub primary is
`notabotchef/beck`, GitLab backup is `knifecode/beck`, and
`Nunezchef` has been fully discarded.

Your approximate cost for this session: **~$417 across 618 assistant
turns** (166M cache-read tokens dominating). A fresh session with
`/compact` discipline is much cheaper than continuing this one.

---

## Canonical URLs

| | |
|---|---|
| Public repo | **https://github.com/notabotchef/beck** |
| Backup mirror | `git@gitlab.com:knifecode/beck.git` (private) |
| Published crate | https://crates.io/crates/beck (0.0.1 placeholder) |
| Profile | https://github.com/notabotchef |

---

## Phase state

| phase | scope | state |
|---|---|---|
| **0** | eval gate | **SHIPPED.** Top-3 98.0%, top-1 92.0%, 118 fixtures + 50 queries. |
| **1** | 7 clap commands + on-disk DB | **SHIPPED.** 2.0 MB binary. `beck sync` 805 ms cold, 4 ms hot query. |
| **4** | MCP server (tools only) | **SHIPPED.** `skills_query` + `skills_load`. Fixed capabilities advertisement bug on 2026-04-08. Integration test 7/7. |
| **5a** | production README | **COMMITTED LOCAL, NOT PUSHED.** See `git log c96460a` on `feat/beck-mvp`. Review notes below. |
| **5b** | install.sh, ci.yml, release.yml, Homebrew tap | **NOT STARTED.** Planned as a second `rune:cook` subagent brief. |
| **6** | tag v0.1.0, publish, Show HN | BLOCKED on 5a review + 5b. |

## Git state

```
main           94223ff  chore(main): flip canonical URL to github.com/notabotchef/beck
feat/beck-mvp  2eebd2d  docs(handoff): end-of-day STATUS + claude-tokens script
               c96460a  docs: production README for v0.1.0 launch     ← local only
               0fbb103  test(mcp): regression check for capabilities.tools
               102fb72  fix(mcp): advertise tools capability in initialize response
               3f7930d  feat(phase-4): real MCP server, tools-only
               fd94a99  chore: flip canonical URL (feat branch)
               bab1e81  chore: ignore .rune/ session metrics
               5f727f6  feat(phase-1): single-crate beck binary, 6/7 commands live
               e8c6b84  feat(phase-0): eval gate passes at 98%
```

`main` has the 0.0.1 placeholder. `feat/beck-mvp` has the real work.
Both remotes (`github`, `gitlab`) are byte-identical through `2eebd2d`.
The README commit `c96460a` is only on your local `feat/beck-mvp` and
is NOT pushed yet. Push it after you decide on the review notes.

## Open Phase 5a review items

The rune:cook subagent wrote a 263-line README at the repo root. I
read it end to end. It is solid. Three items I flagged for you
before pushing:

1. **The "Why you want this" comparison table is slightly off.** The
   label "beck CLI path (shell out per query)" is attached to math
   (`~50 tokens × 300`) that actually describes the v0.2 `beck stub`
   scenario, not the v0.1 CLI scenario. Suggested fix: relabel the
   middle row to "beck stub (v0.2, opt-in)" and keep the 15k number,
   or delete the row and show only the two-tier comparison (status
   quo vs beck MCP). Either way this is a 60-second Edit, not a
   re-cook.

2. **The CI badge on line 7** points at `.github/workflows/ci.yml`
   which doesn't exist yet (Phase 5b creates it). Expect the badge
   to render as an error placeholder until Phase 5b lands. Harmless.

3. **The chef grace note at the bottom** ("half-plated") is one line.
   You liked this kind of voice earlier. If you decide you want more
   or less of it, it's the only reference in the whole file and easy
   to strip or expand.

Push decision options for the next session:
- Fix the table inline, push.
- Push as-is, fix in a follow-up.
- Read the whole thing first, decide after.

---

## Real token cost analysis from THIS session

You asked "show me real tokens beck would cut from this session." I
ran the numbers directly against the session jsonl. Here they are,
honest.

### Raw totals (via the `claude-tokens` script, see below)

```
session turns                   618
input tokens                  4,020
output tokens               259,042
cache creation            7,386,345
cache read              172,702,071
TOTAL                   180,351,478

first-turn static            35,347  (static system prompt at session start)
latest-turn static          540,074  (current per-turn cached context)

approx Opus standard cost:  ~$417
```

### How much beck would have cut, honestly

Beck replaces the **skill catalog portion** of the static system
prompt. In Claude Code that portion is roughly 5,000-10,000 tokens
out of the 35,347 initial cache creation. The rest of the 35k is
Claude Code's meta prompt, tool definitions, CLAUDE.md, memory, and
MCP instructions, none of which beck addresses.

- Conservative (5k/turn × 618 turns): ~3.1M cache_read saved ≈ **$4.60**
- Aggressive (10k/turn × 618 turns): ~6.2M cache_read saved ≈ **$9.25**

**As a fraction of the $417 total: 1.1% to 2.2%.**

This is NOT the dramatic number the launch pitch promises, and the
reason matters:

1. **Claude Code already does lazy loading.** It does not eagerly
   inject every skill body into the system prompt, the way older
   Hermes-style agents do. Claude Code's skill catalog block is
   ~5-10k, not 150k. Beck's marketing number (150k → 200 tokens) is
   against agents that do NOT lazy-load, which is where the 99%
   saving comes from. Against Claude Code, beck's improvement is
   modest (5-10k → 200 per turn).

2. **Conversation history dwarfs the catalog in long sessions.** Your
   current per-turn cached context is 540k tokens, of which ~10k is
   skill catalog and ~530k is file reads, tool results, diffs, the
   whole conversation. Beck does not touch the 530k. It only cuts
   the fixed ~10k.

3. **618 turns is 10x a normal session.** In a fresh 60-turn session
   with ~100k per-turn context, the same ~10k catalog savings is
   ~10% of the total, not 1-2%. beck's percentage impact is bigger
   on shorter sessions.

### What actually cost money in this session

```
cache re-reads (conversation history growing)   ~$259   62%
cache creation (new file reads + subagents)     ~$139   33%
output tokens (my writing)                       ~$19    5%
```

**The biggest cost lever in a session like this one is NOT shipping
beck. It is:**

- Ending the session sooner and starting fresh (resets the cache).
- Using `/compact` midway to collapse old context.
- Spawning fewer subagents (each one creates its own ~15-35k cache).
- Using Grep instead of full Read when you only need one function.

### Where beck IS dramatic

- **Hermes-class agents** that don't lazy-load at all: 150k → 200,
  99.9% off.
- **Fresh 60-turn sessions**: catalog is a larger fraction of total,
  beck saves ~10%.
- **MCP-aware agents with many skills**: beck flat 200 tokens vs
  client-side eager tool lists that grow linearly with skill count.
- **Power users running many parallel sessions**: savings compound
  per session.

---

## The `claude-tokens` script (new, committed)

I wrote a tool to answer the question "how many tokens get injected
right after the first message" for any session. It reads Claude
Code's local jsonl transcripts directly, no API call.

```
# installed at ~/.local/bin/claude-tokens (symlink to scripts/)

claude-tokens              # most recent session, summary
claude-tokens first        # first assistant turn of the most recent session
                           # (this is the "injected right after my first message" answer)
claude-tokens <session-id> # specific session by uuid
claude-tokens summary      # totals for the most recent session
```

### To answer your original question

In a **new** Claude Code session, send your first message, wait for
one reply, then run in a terminal:

```
claude-tokens first
```

Output looks like:

```
=== first assistant turn ===
this is what got 'injected' before your first reply:

  input tokens                     3   (your own message)
  cache creation               8,579   (initial system prompt, written to cache)
  cache read                  12,611   (re-read from previous cache, if any)
  output tokens                    5   (the assistant reply)

  STATIC SYSTEM PROMPT        21,190 tokens
  ^ this is the 'tax' you pay before saying anything. tools + skills + CLAUDE.md + memory.
```

The `STATIC SYSTEM PROMPT` number is exactly what you want: what
Claude Code loaded into context before the model even read your
message. For a paperclip-server project I opened while we were
talking, it was **21,190 tokens**. For this beck session it was
**35,347 tokens**. The delta comes from rune skills cached state,
the larger CLAUDE.md in beck, and subagent context.

---

## Reminders

1. **Crates.io token rotation.** Still not done. The token you pasted
   earlier today (`cio2Zht...`) is readable in this conversation's
   history. Revoke at https://crates.io/me → API Tokens, issue a new
   one for any future publish. You confirmed you did not want to
   rotate during the session; handle it whenever you do the 0.1.0
   publish.

2. **Claude Desktop end-to-end verification.** Only verified in
   Codex and Claude Code with working screenshots. Claude Desktop
   config is written (`~/Library/Application Support/Claude/claude_desktop_config.json`)
   and expected to work since the capabilities-advertisement fix
   addressed the same bug both strict clients had. Worth a five
   minute test.

3. **Session restart for Claude Code plugin cache.** When this
   session started, the rune-kit plugin cache was at v2.6.0. We
   pulled rune to v2.9.0 during the session. The CURRENT session is
   still running on the cached v2.6.0. Your next Claude Code session
   will hot-cache v2.9.0 automatically. Nothing to do here, just
   know why rune:cook briefs from the next session may feel slightly
   different.

---

## How to pick up in a new session

Paste this exact prompt at the start of your next Claude Code session
in `~/Projects/beck`:

```
We are resuming work on beck. Read STATUS.md and the last 10 lines of
git log on feat/beck-mvp. Confirm: phases 0, 1, 4 shipped; 5a (README)
committed locally at c96460a not yet pushed; 5b not started. My
immediate priorities are:
  1. Review the Phase 5a README and either push it as-is or fix the
     "Why you want this" comparison table first.
  2. Dispatch Phase 5b as a rune:cook subagent (install.sh, ci.yml,
     release.yml, Homebrew formula).
  3. After 5b returns, tag v0.1.0 and publish.
Do not write any code inline. If a task is more than ~30 lines of
code, spawn a rune:cook subagent per the feedback memory at
~/.claude/projects/-Users-estebannunez-Projects-beck/memory/feedback_rune_cook_threshold.md.
```

That resumes with full context in under 200 tokens of input.

---

## Phase 5b brief outline (for the next session)

Dispatch as a single `rune:cook` subagent. Bundle because the files
interact (formula depends on release tarball names, install.sh
depends on those same names). One agent sees the whole surface.

Deliverables (all on `feat/beck-mvp`, committed not pushed):

1. `scripts/install.sh` - curl|sh installer. Detect OS+arch (macOS
   Intel, macOS aarch64, Linux x86_64 gnu, Linux x86_64 musl, Linux
   aarch64 gnu). Download the right tarball from GitHub Releases.
   Verify sha256 against embedded SHA256SUMS. Install to
   `/usr/local/bin/beck` or `~/.local/bin/beck`. set -eu, mktemp,
   trap cleanup, reject on any error. ~100 lines of bash.

2. `.github/workflows/ci.yml` - 5-cell matrix build: ubuntu-22.04
   gnu, ubuntu-latest musl, ubuntu-latest aarch64-gnu, macos-13
   x86_64, macos-latest aarch64. Each cell: cargo build --release,
   cargo test, strip + size gate (<6 MB). Separate jobs for cargo
   fmt --check, clippy -D warnings, cargo deny. Uses
   dtolnay/rust-toolchain@stable and actions/cache@v4. Model on
   mateonunez/nucleo's flow but with the 5th musl cell added.

3. `.github/workflows/release.yml` - two-phase workflow_dispatch
   trigger (bump: patch|minor|major). Build all five targets
   fail-fast. Then bump Cargo.toml version, commit, tag, create GH
   release with all artifacts, upload SHA256SUMS, cargo publish. No
   force pushes, no destructive ops. Copy the nucleo skeleton and
   adapt.

4. `Formula/beck.rb` - Homebrew formula for the `notabotchef/beck`
   tap. Standard Rust CLI template with on_macos/on_linux and
   on_arm/on_intel blocks pointing at GitHub release tarballs,
   sha256 pinned, single `bin.install "beck"`, test asserts
   `beck --version` contains "beck".

5. Update README.md badge URLs to point at the real workflow files
   after they exist.

### Brief-writing notes for rune:cook

- Absolute path to beck binary inside install.sh and formula is NOT
  needed; they install TO that path, they don't call it.
- SHA256SUMS must be generated during release.yml, signed if you can
  figure out a key, embedded in install.sh via version bump script.
- The `notabotchef/homebrew-beck` repo does NOT exist yet. Phase 5b
  should NOT try to create it; document the manual gh repo create
  step in a NOTES block at the bottom of the brief.
- Brief should include the exact nucleo release.yml as reference
  (the agent can read `/tmp/nucleo-recon/.github/workflows/release.yml`
  which is still on disk from when we cloned it earlier today, or
  re-clone).
- Verify: `bash -n scripts/install.sh` passes syntax check, YAMLs
  parse, Formula.rb parses with `brew style` if available.

---

## Files on disk referenced by this document

- `~/Projects/beck/` - the repo
- `~/.local/bin/beck` → symlink to `~/Projects/beck/target/release/beck`
- `~/.local/bin/claude-tokens` → symlink to `scripts/claude-tokens.sh`
- `~/Library/Application Support/beck/skills.db` - SQLite index, 153
  unique skills currently
- `~/.claude/projects/-Users-estebannunez-Projects-beck/` - session
  transcripts (claude-tokens reads these)
- `/tmp/nucleo-recon/` - your brother's nucleo checkout (kept around
  for Phase 5b reference, safe to delete after)
- `~/.claude/projects/-Users-estebannunez-Projects-beck/memory/` -
  persistent memory including the rune:cook threshold feedback

End of status.