skill-inject 0.9.0

skill-inject: local semantic auto-injection of agent skills
Documentation

skill-inject (ski)

CI License: AGPL v3 Claude Code plugin opencode plugin Local · offline · CPU

Local, model-agnostic automatic skill injection for Claude Code and opencode.

A strong model often won't use a skill it should — on indirect prompts it hand-rolls the task instead of invoking the skill built for it. ski is a local, deterministic nudge that surfaces the right skill so the model actually reaches for it.

Why this exists

Skill systems dump every skill's description into the model's context and hope it picks the right one. That works until it doesn't:

  • The model skips skills it should use. On indirect prompts — "clean up this messy CSV", "match our brand" — a capable host often just does the task by hand instead of reaching for the skill built for it.
  • It gets worse with more skills, and worse with weaker models. Picking one skill out of a wall of descriptions is hard, and which one fires drifts with whatever model is driving the session.
  • Every description costs context, every turn — relevant or not.

ski does the picking for you. It embeds your prompt on CPU, ranks it against your skill descriptions, and injects the matching skill only when one actually fits — same result on any model, no API call, nothing leaving your machine. The model still decides what to do with the skill; ski just makes sure the right one is in the room. Skills the model finds on its own are tracked and never injected twice.

See it decide

ski scores every installed skill against your prompt and injects only the ones above a fixed cutoff (-2.50 below); a higher score is a stronger match. Real ski why output against a live library of 57 skills:

$ ski why "clean up this messy CSV"
  xlsx              -0.59   <- injected (clear winner)
  pre-commit-setup  -3.72   <- skipped

clean up this messy CSV never says spreadsheet or xlsx — the match is on meaning, not vocabulary, and it lands far ahead of every other skill. Keyword or description matching can't bridge that gap, and a model scanning 57 descriptions can easily miss it.

ski deliberately errs toward over-sending — a borderline skill is injected rather than withheld, since a strong host simply ignores a skill it doesn't need but can't use one it never saw, and when ski does inject it asks firmly ("invoke it now") rather than hedging.

$ ski why "what time is the meeting tomorrow"
  handoff          -3.08   <- best skill, still below the cutoff

An off-topic prompt leaves every skill under the cutoff, so ski injects nothing — no false positives, no context pollution.

This repo is a single Rust binary (ski) plus the thin host adapters that drive it, packaged as a one-plugin Claude Code marketplace. See DEVELOPING.md for the dev workflow.

Speed and examples

Fast and entirely local — no API call, no token cost, nothing leaves your machine. The whole pipeline (embed → retrieve → rerank) runs on CPU in about half a second per prompt:

operation (cold — every hook is a fresh process) time
rank + inject one prompt (ski hook) ~0.61 s median
full index rebuild (57 skills) ~0.73 s
incremental reindex, no change ~0.19 s

bge-small-en-v1.5 (384-dim) retrieval + jina-reranker-v1-turbo-en rerank, ~270 MB RAM. Measured CPU-only on an AMD Ryzen AI MAX+ 395 — cold runs with model load included, not warm microbenchmarks.

It matches on meaning, not keywords. Every row is a real ski why result against a live library of 57 skills; a higher match score is a stronger match, and anything below -2.50 is left out entirely (as in the off-topic example above):

your prompt skill ski injects match score
set up a python project with uv uv-setup 2.76
scaffold a new react typescript frontend react-ts-setup 3.38
how do I credit Claude in this git commit git-attribution 1.21
make an animated gif for slack slack-gif-creator 1.63
write a Word doc with a table of contents docx 0.12
extract tables from a pdf pdf 0.67

Reproduce any of it with ski index then ski why "<your prompt>".

Install

One command installs the prebuilt binary into ~/.local/bin (Linux x86_64) and wires every host it finds on disk — Claude Code and opencode:

curl -fsSL https://raw.githubusercontent.com/bcmyguest/skill-injector/main/scripts/install.sh | sh

It auto-detects hosts (~/.claude → Claude hooks in settings.json; ~/.config/opencode → the opencode plugin). Pin one with SKI_HOST=claude|opencode|both|none. The host wiring is additive and idempotent — re-running is safe, and any existing Claude settings.json is backed up to settings.json.bak first.

After wiring, the installer pre-downloads the embedder + reranker weights (~275 MB, once, to ~/.config/ski/models) and builds the skill index, so your first prompt doesn't block on the download (SKI_PREWARM=0 skips this). Every run after is fully offline. Installs that bypass the installer (marketplace plugin, cargo install) download the weights on first use instead — run ski index once after installing to front-load it. (The --no-default-features build skips all of this — it uses the bundled bag-of-words embedder and never touches the network.)

However you install, verify it with one command: ski doctor checks the whole chain — hook wiring, config file, skill discovery, index freshness, model cache, state dirs, then a live embed+rank smoke test — and prints a concrete fix for anything broken (exit 1 if a blocking problem is found). ski's hot path deliberately fails silent, never blocking a prompt, so ski doctor is the loud answer to "is it actually working?". Once it's been running a while, ski status answers the next question — is it helping? — from the per-session ledgers (no telemetry needed): the skills ski surfaced that the model then invoked, the ones it surfaced that went unused, and the ones the model found on its own while ski stayed silent.

.deb / .rpm packages are on the Releases page. To build from source instead (default build = real embedder + reranker, downloads the model once then runs offline), then wire the host yourself:

cargo install --path .            # -> ~/.cargo/bin/ski (real embedder + reranker)
# ...or the offline bag-of-words build, no deps or model download:
# cargo install --path . --no-default-features
ski init -g claude               # then wire the host (or: opencode)

Prefer the Claude marketplace? Skip the host-wiring step and enable the plugin instead:

/plugin marketplace add bcmyguest/skill-injector   # or a local path to this repo
/plugin install skill-inject@skill-inject

However you install on Claude, three hooks get wired (hooks/hooks.json runs them through scripts/ski-bootstrap.sh, which resolves ski from PATH, then ~/.local/bin, then ~/.cargo/bin):

hook matcher command
UserPromptSubmit ski hook --host claude (rank + inject)
PostToolUse Read|Skill ski observe --host claude (record model-loaded skills)
SessionStart startup|resume|compact ski session-start --host claude (reindex; re-arm on compact)

If ski isn't found, the bootstrap exits 0 with no output — a missing build never blocks a prompt. Set SKI_DEBUG=1 for an install hint on stderr.

For opencode specifics (skill roots, the plugin event map), see opencode/README.md.

Usage

ski init -g claude            # wire ski's hooks into ~/.claude/settings.json (or opencode)
ski index                     # build the index at $XDG_DATA_HOME/ski/index.json
ski doctor                    # health-check the install end to end: wiring, config,
                              #   skills, index, models, state, live embed+rank smoke
                              #   test — one line per check, a fix for anything broken
ski why "credit Claude in this commit" --top 5   # ranked skills + scores (tuning aid)

ski status                    # what ski actually did in your recent conversations:
                              #   skills it surfaced that the model then invoked (assists),
                              #   ones it surfaced the model ignored, and ones the model
                              #   loaded itself while ski stayed silent (recall misses).
                              #   Reads the per-session ledgers — no telemetry needed.
ski status --limit 20         # list more conversations (--all for every one on record)

# Telemetry readout (needs telemetry = true, or SKI_TELEMETRY=1, while hooks ran):
ski history                   # aggregate: recommended vs. actually-used, top false positives/misses
ski history --tail 20         # last 20 events interleaved: recommendations (prompt + per-candidate
                              #   confidence + used?) and self-loads (acted-on-rec vs. RECALL MISS + prompt)
ski history --tail 20 --session conv-abc   # ...filtered to one conversation
ski suggest                   # turn the log into actions: force/keyword suggestions for skills
                              #   the model keeps self-loading while ski stays silent, deny
                              #   suggestions for repeat never-used injections (read-only)
ski clear                     # re-arm injection (wipe per-session dedup); --telemetry also wipes the log

# Hook hot-path (stdin event -> injection JSON on stdout):
echo '{"session_id":"s1","cwd":".","prompt":"credit Claude in this commit"}' \
  | ski hook --host claude     # -> {"hookSpecificOutput":{...,"additionalContext":...}}
echo '{"session_id":"s1","cwd":".","prompt":"set up a python project"}' \
  | ski hook --host opencode   # -> {"skills":[...],"inject":"..."}

The dedup ledger lives at $XDG_STATE_HOME/ski/sessions/<session_id>.json. The index is per-host (Claude index.json, opencode index-opencode.json) so the two never clobber. Downloaded embedder/reranker models cache once at $XDG_CONFIG_HOME/ski/models (default ~/.config/ski/models) — never in the working directory. SKI_ROOTS (colon-separated) overrides the skill-discovery roots for both hosts.

Configuration

Everything works with no config. An optional ~/.config/ski/config.toml ($XDG_CONFIG_HOME/ski/config.toml) overrides the compiled defaults — every key is optional, and a missing or malformed file is ignored (fail open, never blocks a prompt). The most-reached-for key is deny, to silence a skill that keeps surfacing.

# Silence / force specific skills (by their `name`):
deny  = ["example-skill"]   # never auto-injected
force = []                  # injected whenever a keyword hits, even below threshold

max_skills  = 2             # max skills injected per prompt
char_budget = 6000          # max total injected characters

# Reranker gate — JINA cross-encoder logits, where ~0 is the relevant/irrelevant
# boundary. Raise toward 0 to inject less; lower to inject more.
rerank_min = -2.5

# Opt-in JSONL telemetry (recommend/use events) for `ski history`; off by default.
# Equivalent to setting the SKI_TELEMETRY env var.
telemetry = false

# Ambient workspace-ecosystem boost: manifests in/above cwd (uv.lock, Cargo.toml,
# package.json, ...) and code files named in the conversation (etl.py, main.rs)
# surface the matching skill from *your* library, whatever it is named. On by
# default; 0 disables.
# project_boost = 0.15

# Stage-1 cosine thresholds. Normally left to per-embedder calibration; pin to override.
# min_similarity = 0.30
# score_margin   = 0.15

# model              = "bge-small-en-v1.5"   # the default; alts: "all-MiniLM-L6-v2-q", "bge-base-en-v1.5"
# inject_mode        = "directive"           # or "body"
# directive_strength = "auto"                # auto | soft | hard
# roots              = ["/abs/path/to/skills"]  # discovery roots; not tilde-expanded,
#                                               # and the SKI_ROOTS env var still wins

Advanced ranking knobs are also accepted: keyword_boost, recall_floor, high_conf, clear_gap, rerank_top_k, rerank_margin (see src/config.rs for what each gates).

How it works

prompt ─▶ adapter (Claude hook / opencode plugin) ─▶ ski (Rust, one binary)
                                                        1. prefilter (skip control payloads)
                                                        2. load index (skill vectors)
                                                        3. embed(prompt) locally
                                                        4. retrieve: cosine top-K (bge)
                                                        5. rerank: cross-encoder (JINA turbo)
                                                        6. gate: threshold + margin + deny/force + slash self-rec
                                                        7. dedup vs per-session ledger
                                                        8. emit injection ─▶ adapter injects as context
  • Two-stage ranking. A bge-small bi-encoder retrieves a candidate set; a JINA-turbo cross-encoder reranks it. Cheap O(1) query + cached vectors first, expensive pairwise scoring only on the short list. (Why not reranker-only: it's O(N) per prompt and loses the cosine early-out.)
  • Prompt prefilter. Host-generated control payloads (<task-notification>, <system-reminder> blocks) aren't user requests, so they skip injection outright rather than embedding into noise matches. A /<name> slash invocation is an explicit skill choice, so the skill it names is never recommended back — that self-recommendation was the single largest false positive in ski history.
  • Workspace awareness. A document file named in the conversation (sales.xlsx) boosts its skill directly, and the project channel reads the workspace: manifests in or above the cwd (uv.lock, Cargo.toml, package.json, ...) and code files named in the prompt (etl.py) yield ecosystem terms that are matched dynamically against whatever skills you actually have installed — a uv repo surfaces your uv skill by any name. When one of these channels fires, the injected line says why ("matched because you are working in a uv project"), giving the model checkable grounds instead of a bare relevance claim.
  • Per-session dedup. A skill injected by ski or loaded by the model itself is recorded in a session ledger and never re-injected — until compaction re-arms it.
  • Fail-open everywhere. Bad stdin, a missing index, any IO error → no output, exit 0. A ranking problem never blocks your prompt.

Embedding backends

  • Default (fastembed): real embeddings via fastembed (ONNX). Retrieval with bge-small-en-v1.5 (the query gets bge's retrieval-instruction prefix; descriptions don't), reranking with JINA turbo. all-MiniLM-L6-v2-q is the low-RAM alternative. Models download once and cache at $XDG_CONFIG_HOME/ski/models (default ~/.config/ski/models).
  • --no-default-features (offline): deterministic hashed bag-of-words. No deps, no network, no model — surface-token matching plus the keyword boost. Used for tests and as the fallback when no recognized model is configured.

The index is tagged with the embedder id, so switching backends/models triggers a full reindex automatically.

Build, test, lint

cargo build --release                # default: real embedder + reranker
cargo build --no-default-features    # offline: bag-of-words, no model download
cargo test --no-default-features     # unit + golden tests (offline, network-free)

cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings

Golden tests run against the self-contained fixtures in tests/fixtures/skills/ — they depend on nothing outside this repo. fmt / clippy / test are also wired as pre-commit hooks (.pre-commit-config.yaml).

License

GNU AGPL-3.0-or-later. Copyright (c) 2026 ski contributors. If you run a modified version — including over a network — you must release your source under the same terms.

No-AI-training request (non-binding): the AGPL governs your legal rights, but the authors additionally ask that this project, in whole or in part, not be used as training, fine-tuning, or evaluation data for machine-learning or AI systems.