# skill-inject — Plan
> Working name: **skill-inject** (binary `ski`). Name is provisional — see Open Questions.
Local-first, model-agnostic **automatic skill injection** for Claude Code and
opencode. A hook embeds the user prompt locally, ranks it against skill
descriptions, and — when a skill is relevant — either **tells the model to load it**
(`directive` mode, v1 default) or **injects the `SKILL.md` body directly**
(`body` mode), with **no cloud call for the decision**. We also track which skills
the **model loaded on its own** so we never re-inject a choice it already made. The
model still chooses which *files* a skill points to; we only guarantee the skill
itself is considered when relevant.
---
## 1. Goal / Non-goals
### Goal
- Make relevant skills appear in context **automatically**, driven by a **local**
semantic match (no API call to Claude/Haiku for the decision).
- **Respect the model's own choices:** track skills the model already loaded
(native chooser / `use_skill`) and never re-inject them (§4.8).
- Work for both Claude Code (strong native chooser, used as fallback/refinement)
and opencode with **local models** (weak chooser — primary motivation).
- One ranking engine, two thin adapters. Single offline binary.
- rtk-style setup: `ski init -g` / `ski init -g opencode`.
### Non-goals
- Not a vector DB / RAG-over-codebase tool. N(skills) is tiny (10s–100s); cosine
over an in-memory array beats running qdrant **or chroma** — both add a
Python/server dependency that breaks the single-binary, offline design. If we ever
outgrow in-memory (1000s+ skills), step to an **embedded Rust ANN**
(`usearch` / `hnsw_rs`): in-process, no server — still not a DB. (See [[2-background]].)
- Not replacing progressive disclosure — augmenting it for weak choosers.
- Not authoring skills, not editing skill files.
---
## 2. Background — what exists, and the gap {#2-background}
| Claude Code skills (native) | model reads always-present `name`+`description`, decides | model self-loads `SKILL.md` | reliable ~20% w/o nudge; needs the model to decide |
| `joshuadavidthomas/opencode-agent-skills` | local semantic *similarity* | injects `<available-skills>` + **prompt encouraging** agent to call `use_skill` | still routes through a **tool call + model agency** — not automatic content injection |
| `jefflester/claude-skills-supercharged` | **Haiku** intent analysis + keyword fallback | `UserPromptSubmit` injects skill, once/conversation, 1h intent cache | decision uses a **cloud model**; we want it fully local |
**Our differentiator:** local embedding decision **+ direct content injection** (no
`use_skill`, no model agency in the loop). Borrow the good parts: supercharged's
inject-once tracking and intent cache; opencode-agent-skills' SDK injection
plumbing.
---
## 3. Architecture
```
┌─────────────────────────────┐
user prompt ───────▶│ ADAPTER (per host) │
│ • Claude: UserPromptSubmit │
│ hook → binary (stdin) │
│ • opencode: TS plugin → │
│ binary (spawn) │
└──────────────┬──────────────┘
│ JSON {prompt, session_id, cwd, host}
▼
┌─────────────────────────────┐
│ ski core (Rust, 1 binary) │
│ │
│ 1. load INDEX (skill vecs) │◀── INDEX cache (persistent,
│ 2. embed(prompt) fastembed │ invalidated by mtime/hash)
│ 3. hybrid score: │
│ cosine + keyword boost │
│ 4. threshold + top-K │
│ 5. session dedup │◀── SESSION state (per session_id)
│ 6. read SKILL.md bodies │
│ 7. emit injection JSON │──▶ DECISION cache (TTL, optional)
└──────────────┬──────────────┘
│ JSON {inject: "<text>", skills:[...]}
▼
ADAPTER injects as context (additionalContext / synthetic msg)
```
**Why Rust single binary:** matches rtk; `fastembed-rs` (ONNX) gives all-MiniLM
embeddings with no Python/venv and no running service. The same binary is the hook,
the indexer, and the installer. opencode's TS plugin just spawns it.
---
## 4. Components
### 4.1 Core binary `ski`
Subcommands:
- `ski init [-g] [opencode]` — install/configure host(s), download model, build index. (§9)
- `ski index [--rebuild]` — discover skills, embed descriptions, write index. Incremental by file hash. **Auto-runs on every SessionStart** (incremental, hash-gated → cheap) since skills drift over time.
- `ski hook --host <claude|opencode>` — read hook event JSON on **stdin**, write injection JSON on **stdout**. The hot path.
- `ski observe --host <claude|opencode>` — record skills the **model** loaded itself (§4.8) into session state, so we don't re-inject them.
- `ski session-start --host <claude|opencode>` — incremental reindex + re-arm session state (clear `loaded` on compaction).
- `ski why <prompt>` — debug: print ranked skills + scores (no injection). Tuning aid.
- `ski stats` — optional later: injection counts / hit rate (rtk-`gain` vibe).
### 4.2 Skill discovery
Scan roots, parse `SKILL.md` YAML frontmatter (`name`, `description`, optional
`keywords`/`aliases`). Roots:
- Claude: `~/.claude/skills/`, plugin skills `~/.claude/plugins/**/skills/`, project `.claude/skills/`.
- opencode: equivalent skill dirs + project.
- Honor a config `extra_roots`.
### 4.3 Embedding
`fastembed-rs` (ONNX via `ort`; optional CUDA exec-provider on GPU boxes).
**Default: `bge-small-en-v1.5`** (384-dim, stronger recall — sensible default given
GPU / ample RAM, e.g. the Strix Halo box). **Lite alt: `all-MiniLM-L6-v2-q`**
(`model = "..."` or `ski init --lite`) for low-RAM / CPU-only machines. Both 384-dim;
the index is **model-tagged**, so switching model forces a full reindex.
bge is **asymmetric**: prefix the *query* (prompt) with bge's retrieval instruction
(`"Represent this sentence for searching relevant passages: "`) but embed skill
descriptions **without** prefix. (MiniLM is symmetric — no prefix either side.)
Model downloaded on `init` → `~/.local/share/ski/models/`, then fully offline.
Optional `--features embed-model` `include_bytes!` build for a zero-download binary.
### 4.4 Index store
`~/.local/share/ski/index.json` (+ optional `vectors.bin` of packed f32):
```jsonc
{
"model": "all-MiniLM-L6-v2-q",
"dim": 384,
"skills": [
{
"id": "git-tools/git-attribution",
"name": "git-attribution",
"description": "Kernel-style AI commit attribution ...",
"path": "/home/b/.claude/plugins/.../SKILL.md",
"host": "claude",
"keywords": ["commit", "attribution", "assisted-by"],
"hash": "sha256:...", // SKILL.md content hash → cache invalidation
"embedding": [0.013, -0.21, ...] // 384 f32 of the *description*
}
]
}
```
Reindex only entries whose `hash` changed. (Index = the **big cache**; computed once.)
### 4.5 Decision pipeline (the hot path)
1. Parse stdin event → `{prompt, session_id, cwd, host}`.
2. Load index; if any skill-file mtime newer than index, incremental reindex.
3. `q = embed(prompt)`.
4. **Hybrid score** per skill:
`score = cosine(q, skill.embedding) + Σ keyword_boost(prompt, skill.keywords)`
Exact keyword/alias substring → fixed boost (or `force` flag → always inject).
5. Keep `score ≥ min_similarity`, sort desc, cap `max_skills`, respect `char_budget`.
6. **Session dedup:** drop skills already in this session's state — whether **we**
injected them or the **model loaded them itself** (§4.8). Cleared on compaction.
7. Read surviving `SKILL.md` bodies → build injection text per `inject_mode` & `directive_strength`.
8. Append injected ids to session state; write decision cache.
9. Emit injection JSON.
### 4.6 Session state
`~/.local/state/ski/sessions/<session_id>.json`:
```json
{ "loaded": { "git-tools/git-attribution": "ski", "python-dev/uv-setup": "model" },
"updated": "2026-06-14T12:00:00Z" }
```
Dedup skips **any** id in `loaded`, regardless of source. `ski` = we injected it;
`model` = the model loaded it on its own (§4.8) — we must not re-inject a decision
the model already made. **Re-arm on compaction** (Claude `SessionStart` matcher
`compact`; opencode `session.compacted`): clear `loaded` so skills can be
re-injected into the fresh summary.
### 4.8 Detect model-loaded skills
The model may pull a skill itself (Claude's native chooser; opencode `use_skill`).
Record those so we don't double-inject:
- **Claude:** `PostToolUse` on `Read`/`Skill` → if the path matches `**/SKILL.md`,
derive the skill id and mark `loaded[id] = "model"`.
- **opencode:** observe the skill-load / tool-exec event → same.
`ski observe` does this. Especially important in `directive` mode: once the model
acts on our directive and reads the skill, we stop re-prompting it.
### 4.7 Caches (two layers — keep them distinct)
- **Index cache** (skill-description embeddings): persistent, invalidated by file
hash. *Mandatory, big win.*
- **Decision cache** (`hash(prompt + index_version)` → skill ids, TTL ~1h):
*optional.* Locally the embed is already ~ms, so this mainly skips re-IO on
identical re-submits. Mirrors supercharged's intent cache but cheaper to justify.
- **Prompt cache (Anthropic):** we do **not** set `cache-control`. `additionalContext`
appends *after* the cached system/skills prefix, so injection is cache-safe by
construction. Re-injecting the same body every turn would bloat context → that is
exactly what §4.6 session dedup prevents. (opencode local models: no prompt cache;
same anti-bloat reasoning.)
---
## 5. Config
`~/.config/ski/config.toml` (project `.ski.toml` overrides):
```toml
model = "bge-small-en-v1.5" # default; "all-MiniLM-L6-v2-q" = lite (low-RAM/CPU). custom ONNX = later
min_similarity = 0.35 # cosine threshold — TUNE with `ski why`
max_skills = 2
char_budget = 6000 # cap total injected chars
inject_mode = "directive" # v1 default: tell model to load (keep agency). "body" = inject SKILL.md
directive_strength = "auto" # auto | soft | hard (see §6)
decision_cache_ttl = "1h"
extra_roots = []
deny = [] # skill ids never auto-injected
force = [] # skill ids always injected when keyword hit
```
---
## 6. Per-host / per-model intensity
| Claude Code | `directive` | `soft` ("A relevant skill exists below — load it if applicable.") | strong native chooser; a nudge is enough |
| opencode + local model | `directive` | `hard` ("You MUST load and follow the skill below for this task.") | weak chooser → explicit |
`directive_strength = "auto"` resolves from `--host` + a `local_model` config flag.
**v1 ships `directive` everywhere** (keep model agency, observe follow-through);
flip to `body` — globally or per-skill — once eval shows local models ignore the
directive. `body` is the eventual auto-inject target, not the v1 default.
---
## 7. Code sketches
### 7.1 Core: rank (Rust)
```rust
use fastembed::{TextEmbedding, InitOptions, EmbeddingModel};
fn cosine(a: &[f32], b: &[f32]) -> f32 {
let (mut dot, mut na, mut nb) = (0.0, 0.0, 0.0);
for i in 0..a.len() { dot += a[i]*b[i]; na += a[i]*a[i]; nb += b[i]*b[i]; }
dot / (na.sqrt() * nb.sqrt() + 1e-8)
}
fn rank(prompt: &str, index: &Index, cfg: &Config) -> Vec<Hit> {
let model = TextEmbedding::try_new(
InitOptions::new(cfg.embedding_model()) // bge-small-en-v1.5 (default) | MiniLM-q (lite)
.with_cache_dir(cfg.model_dir())
).unwrap();
// bge is asymmetric: prefix the query, NOT the skill descriptions (§4.3)
let q = &model.embed(vec![format!("{}{}", cfg.query_prefix(), prompt)], None).unwrap()[0];
let mut hits: Vec<Hit> = index.skills.iter().map(|s| {
let kw = s.keywords.iter()
.filter(|k| prompt.to_lowercase().contains(&k.to_lowercase()))
.count() as f32 * cfg.keyword_boost;
Hit { id: s.id.clone(), score: cosine(q, &s.embedding) + kw }
}).collect();
hits.retain(|h| h.score >= cfg.min_similarity || cfg.force.contains(&h.id));
hits.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
hits.truncate(cfg.max_skills);
hits
}
```
### 7.2 Core: hook IO (`ski hook`)
```jsonc
// stdin (normalized by adapter)
{ "host": "claude", "session_id": "abc", "cwd": "/repo", "prompt": "fix the commit attribution" }
// stdout
{ "skills": ["git-tools/git-attribution"],
"inject": "<skill name=\"git-attribution\">\n...SKILL.md body...\n</skill>" }
```
### 7.3 Claude adapter — `plugins/skill-inject/hooks/hooks.json`
```json
{
"hooks": {
"UserPromptSubmit": [
{ "hooks": [ { "type": "command",
"command": "ski hook --host claude" } ] }
],
"PostToolUse": [
{ "matcher": "Read|Skill",
"hooks": [ { "type": "command",
"command": "ski observe --host claude" } ] }
],
"SessionStart": [
{ "matcher": "startup|resume|compact",
"hooks": [ { "type": "command",
"command": "ski session-start --host claude" } ] }
]
}
}
```
`ski session-start` = incremental reindex + re-arm; `ski observe` records skills the
model loaded itself (§4.8). The `UserPromptSubmit` wrapper emits Claude's contract:
```json
{ "hookSpecificOutput": {
"hookEventName": "UserPromptSubmit",
"additionalContext": "<skill ...>...</skill>" } }
```
> Packaging: ship a tiny `scripts/ski-bootstrap.sh` (rtk-style) that resolves the
> `ski` binary (PATH → `~/.local/bin` → `cargo install` hint) so the plugin works
> even before a global `init`.
### 7.4 opencode adapter — TS plugin
```ts
import type { Plugin } from "@opencode-ai/plugin"
export const SkillInject: Plugin = async ({ $, client, directory }) => {
return {
// confirm exact event name against opencode 1.17.x (see Open Questions)
"chat.message": async ({ message }, _out) => {
if (message.role !== "user") return
const payload = JSON.stringify({
host: "opencode", session_id: message.sessionID,
cwd: directory, prompt: message.text,
})
const res = await $`ski hook --host opencode`.stdin(payload).text()
const { inject } = JSON.parse(res)
if (inject) {
// inject as synthetic context message (same plumbing as opencode-agent-skills)
await client.session.message.create({
sessionID: message.sessionID,
parts: [{ type: "text", text: inject, synthetic: true }],
})
}
},
}
}
```
> Reference `joshuadavidthomas/opencode-agent-skills` for the exact injection call;
> opencode's "inject AI-visible message" support is maturing (issue #17412).
> Also wire `session.start → ski session-start` (reindex / re-arm) and the
> skill-load event `→ ski observe` (§4.8) to mirror the Claude adapter.
---
## 8. Guardrails
- `min_similarity` threshold + `max_skills` cap + `char_budget` → no context flooding.
- **Relative margin gate** (`score_margin`): drop any skill more than `score_margin`
below the single best-scoring skill, measured against the global top **before**
session dedup. Suppresses the weak tail (noisy embedders inject only near-peers)
and makes re-submitting a handled prompt fall silent instead of scraping lower
matches once the strong ones are deduped.
- `deny` list; `force` list for must-haves on keyword hit.
- Hybrid (embedding **+** keyword) so exact tool/command names aren't missed by
embeddings alone.
- Session dedup → never re-inject a skill already loaded (by us **or the model**, §4.8)
within a session; the only re-arm is compaction.
- Fail-open: any error in `ski hook` → emit empty injection, never block the prompt.
---
## 9. Init (rtk-style)
`ski init -g`:
1. Ensure binary on PATH (or copy to `~/.local/bin`).
2. Download embedding model → `~/.local/share/ski/models/`.
3. Run `ski index` (and re-run on every SessionStart thereafter — §4.1).
4. **Claude:** merge `UserPromptSubmit` + `PostToolUse` (observe model loads) +
`SessionStart` (reindex & re-arm) hooks into `~/.claude/settings.json`
(idempotent, like `install-plugins.sh`).
`ski init -g opencode`:
1–3 same.
4. Add plugin entry to `~/.config/opencode/opencode.json` `plugin[]` + the
`@opencode-ai/plugin` dep; drop the TS adapter file.
Idempotent; `--auto-patch` for non-interactive. In **this repo**, also shipped as a
marketplace plugin (`plugins/skill-inject/`) whose `hooks.json` points at `ski`.
---
## 10. Milestones
1. ✅ **DONE — Core rank + golden tests.** Rust `ski` crate: skill discovery +
`SKILL.md` frontmatter parse + embedding index + hybrid rank (cosine + keyword),
exposed as `ski index` / `ski why`. `Embedder` trait with an offline
bag-of-words backend (default; zero network/model — so build/test/CI run
anywhere) and a `fastembed` backend (bge-small / MiniLM) behind a cargo feature.
Golden tests assert `prompt → skill` on the real repo skills (`"…credit Claude in
this commit"` → `git-attribution`, `"bootstrap a new python project with uv"` →
`uv-setup`, etc.); verified top-1 on all 56 installed skills. fmt + clippy
(`-D warnings`) clean; pre-commit hooks wired at repo root. `ski why` is the
tuning harness. (`fastembed` feature compiles but isn't built in the offline lane
— verify + tune `min_similarity` against bge before milestone 2 ships injection.)
2. ✅ **DONE — Hook path.** `ski hook --host <claude|opencode>`: stdin event →
load-or-build index → embed → hybrid rank → `select` (deny + threshold +
relative margin gate + force + session dedup + `max_skills` cap) →
`inject::build` (directive/body, host-aware
strength, `char_budget`) → host contract on stdout (Claude `additionalContext` /
opencode `{skills,inject}`). Per-session dedup ledger at
`$XDG_STATE_HOME/ski/sessions/<id>.json` (`loaded: {id: "ski"|"model"}`); a skill
is never re-injected in a session. Fails open everywhere (empty injection, exit 0).
Decision cache deferred to milestone 6 (low local payoff). fmt + clippy + tests
green. *(min_similarity + score_margin still tuned for bow — re-tune for bge.)*
3. **Claude adapter** — hooks.json + bootstrap + `additionalContext`. End-to-end on Claude Code.
4. **opencode adapter** — TS plugin spawning binary; confirm injection event/API.
5. **`ski init`** — global install for both hosts; package as marketplace plugin here.
6. **Polish** — `ski stats`, per-model intensity, compaction re-arm.
---
## 11. Decisions & open questions
**Resolved**
- **Name = `ski`** (product `skill-inject`).
- **Embedding model = `bge-small-en-v1.5`** default — stronger recall for GPU/RAM-rich boxes; `all-MiniLM-L6-v2-q` lite fallback for constrained machines (§4.3). Other local models / custom ONNX = later.
- **`inject_mode` default = `directive`** — keep model agency for v1; revisit `body` after measuring local-model follow-through (§6).
- **Model bundling = download-on-init** — cache to `~/.local/share/ski/models/`; offline `include_bytes!` build stays a feature flag.
- **Reindex on every SessionStart** — skills drift; incremental + hash-gated so it's cheap (§4.1).
- **Track model-loaded skills** — yes; never re-inject the model's own choices (§4.8).
**Open**
1. **Prompt context for embedding** — prompt-only (lean v1) vs prompt + last assistant turn (better recall, noisier).
2. **opencode hook event** — exact event name + injection API in opencode 1.17.x (verify vs `opencode-agent-skills`).
3. **Decision cache** — defer to milestone 6 (low local payoff).