
Struct Config 

pub struct Config {
    pub model: String,
    pub min_similarity: f32,
    pub score_margin: f32,
    pub max_skills: usize,
    pub char_budget: usize,
    pub keyword_boost: f32,
    pub phrase_boost: f32,
    pub roots: Vec<PathBuf>,
    pub inject_mode: InjectMode,
    pub directive_strength: Strength,
    pub deny: Vec<String>,
    pub force: Vec<String>,
    pub context_depth: usize,
    pub context_weight: f32,
    pub vague_lo: f32,
    pub vague_hi: f32,
    pub file_boost: f32,
    pub project_boost: f32,
    pub recall_floor: f32,
    pub high_conf: f32,
    pub clear_gap: f32,
    pub rerank_top_k: usize,
    pub rerank_min: f32,
    pub rerank_margin: f32,
    pub body_inject_min: f32,
    pub lexical_min: f32,
    pub lexical_margin: f32,
    pub telemetry: bool,
}

Fields

model: String

Embedding model id. If the fastembed backend recognizes the id, that model is used; otherwise the offline bag-of-words backend is used regardless of this value.

min_similarity: f32

Minimum hybrid score for a skill to be eligible for injection.

score_margin: f32

Maximum amount a skill’s score may fall below the single best-scoring skill and still be injected. This suppresses the weak tail: when the top match is strong, only near-peers ride along; when only weak matches exist (or the leader was already injected this session), nothing clears the gate. Tuned alongside min_similarity per embedder.

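A minimal sketch of the min_similarity / score_margin gate described above, assuming each candidate already carries its final hybrid score. Scored and admit are hypothetical names, not this crate’s API. The sketch also assumes the margin is measured from the best score over all candidates, including a leader already injected this session, which is how the description reads.

```rust
/// Hypothetical candidate: a skill id and its hybrid score.
#[derive(Debug, Clone)]
pub struct Scored {
    pub id: String,
    pub score: f32,
}

/// Returns the ids that clear both gates: an absolute floor (`min_similarity`)
/// and a relative one (within `score_margin` of the best score). `seen` holds
/// ids already injected this session; they still define the best score but
/// are never returned again.
pub fn admit(cands: &[Scored], seen: &[&str], min_similarity: f32, score_margin: f32) -> Vec<String> {
    // Best score across every candidate, seen or not.
    let best = cands
        .iter()
        .map(|c| c.score)
        .fold(f32::NEG_INFINITY, f32::max);
    cands
        .iter()
        .filter(|c| !seen.contains(&c.id.as_str()))
        .filter(|c| c.score >= min_similarity && best - c.score <= score_margin)
        .map(|c| c.id.clone())
        .collect()
}
```

With a strong leader at 0.80, a 0.76 peer rides along under a 0.05 margin while a 0.60 tail is dropped. If the leader was already injected and the margin is tighter, nothing is returned.
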
max_skills: usize

Max skills injected per prompt.

char_budget: usize

Max total injected characters (a budget enforced in the hook path).

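One way the two caps could be applied to an already-ranked list, as a minimal sketch. Rendered and cap are hypothetical, and this version stops at the first skill that would overflow char_budget; the hook path’s actual policy (stop versus skip) is not stated here and may differ.

```rust
/// Hypothetical rendered injection: a skill id and the text it would add.
pub struct Rendered {
    pub id: String,
    pub text: String,
}

/// Keeps ranked injections in order until either cap is hit: at most
/// `max_skills` entries and at most `char_budget` characters in total.
/// This sketch stops at the first entry that would overflow the budget.
pub fn cap(ranked: Vec<Rendered>, max_skills: usize, char_budget: usize) -> Vec<Rendered> {
    let mut used = 0;
    let mut out = Vec::new();
    for r in ranked.into_iter().take(max_skills) {
        // Count characters, not bytes, so multi-byte text is measured fairly.
        let len = r.text.chars().count();
        if used + len > char_budget {
            break;
        }
        used += len;
        out.push(r);
    }
    out
}
```
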
keyword_boost: f32

Added to a skill’s score per matching keyword.

phrase_boost: f32

Added to a skill’s score per matched trigger phrase (see crate::rank::phrase_score). Higher than keyword_boost: a full multi-token phrase match is stronger, higher-precision evidence than a single keyword token.

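A minimal sketch of the additive boosts, assuming the hybrid score is the cosine plus keyword_boost per matched keyword plus phrase_boost per matched trigger phrase. The tokenizer and the contiguous-token phrase match are assumptions for illustration; crate::rank::phrase_score may match differently.

```rust
/// Lowercased alphanumeric tokens (an assumed tokenizer, for illustration).
fn tokenize(s: &str) -> Vec<String> {
    s.to_lowercase()
        .split(|ch: char| !ch.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Hypothetical hybrid score: cosine plus per-keyword and per-phrase boosts.
pub fn hybrid_score(
    cosine: f32,
    prompt: &str,
    keywords: &[&str],
    phrases: &[&str],
    keyword_boost: f32,
    phrase_boost: f32,
) -> f32 {
    let toks = tokenize(prompt);
    // One boost per keyword that appears as a whole token.
    let kw_hits = keywords.iter().filter(|k| toks.contains(&k.to_lowercase())).count();
    // One boost per phrase whose tokens appear contiguously and in order.
    let phrase_hits = phrases
        .iter()
        .filter(|p| {
            let pt = tokenize(p);
            !pt.is_empty() && toks.windows(pt.len()).any(|w| w == pt.as_slice())
        })
        .count();
    cosine + keyword_boost * kw_hits as f32 + phrase_boost * phrase_hits as f32
}
```

Requiring the phrase tokens to be contiguous is what makes a phrase hit higher-precision than two independent keyword hits.
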
roots: Vec<PathBuf>

Filesystem roots scanned for SKILL.md files.

inject_mode: InjectMode

How matched skills are injected.

directive_strength: Strength

Forcefulness of directive-mode injections.

deny: Vec<String>

Skill ids never auto-injected.

force: Vec<String>

Skill ids injected whenever a keyword hits, even below min_similarity.

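A sketch of how deny and force might post-process the gated list, assuming keyword hits are computed separately. apply_lists is hypothetical, and letting deny win over force is an assumption the field docs do not state.

```rust
/// Hypothetical post-processing of the gated ids.
/// `passed`: ids that cleared the score gates, in rank order.
/// `keyword_hits`: ids whose keywords matched the prompt.
pub fn apply_lists(
    passed: &[String],
    keyword_hits: &[String],
    deny: &[String],
    force: &[String],
) -> Vec<String> {
    let mut out = passed.to_vec();
    // Forced skills ride along on any keyword hit, regardless of min_similarity.
    for id in force {
        if keyword_hits.contains(id) && !out.contains(id) {
            out.push(id.clone());
        }
    }
    // Denied skills are never auto-injected (assumed here to win over `force`).
    out.retain(|id| !deny.contains(id));
    out
}
```
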
context_depth: usize

How many recent prompts to retain as conversational context (0 = disabled).

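A bounded window like context_depth maps naturally onto a ring buffer. This is a minimal sketch using std::collections::VecDeque; ContextWindow is a hypothetical type, not the crate’s own storage.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling window of the most recent prompts.
pub struct ContextWindow {
    depth: usize,
    recent: VecDeque<String>,
}

impl ContextWindow {
    pub fn new(depth: usize) -> Self {
        Self { depth, recent: VecDeque::with_capacity(depth) }
    }

    /// Records a prompt, evicting the oldest once `depth` is reached.
    /// With depth 0 nothing is retained (context disabled).
    pub fn push(&mut self, prompt: &str) {
        if self.depth == 0 {
            return;
        }
        if self.recent.len() == self.depth {
            self.recent.pop_front();
        }
        self.recent.push_back(prompt.to_string());
    }

    /// Retained prompts, oldest first.
    pub fn prompts(&self) -> Vec<&str> {
        self.recent.iter().map(String::as_str).collect()
    }
}
```
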
context_weight: f32

Max weight the context channel can add to a skill’s score. The effective weight scales from this value (for a fully vague prompt) down to 0 (for a confident, specific prompt); see crate::rank::context_weight. Cosine-space and tuned per embedder; 0.0 disables the blend.

vague_lo: f32

The prompt’s best self-cosine (its highest cosine against any skill, from the prompt alone) at or below which the prompt counts as fully vague, so context is applied at the full context_weight.

vague_hi: f32

The prompt’s best self-cosine at or above which the prompt counts as confident, so context is suppressed entirely. Between vague_lo and this value, the context weight scales linearly.

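The vagueness ramp described for context_weight, vague_lo, and vague_hi can be written as a piecewise-linear function of the prompt’s best self-cosine. This is a minimal sketch; crate::rank::context_weight may differ in detail, and the threshold values in the usage are illustrative, not the crate’s defaults.

```rust
/// Effective context weight for a prompt whose own best cosine is `prompt_top`.
/// Full `context_weight` at or below `vague_lo`, zero at or above `vague_hi`,
/// and a linear ramp in between.
pub fn effective_context_weight(
    prompt_top: f32,
    context_weight: f32,
    vague_lo: f32,
    vague_hi: f32,
) -> f32 {
    if context_weight <= 0.0 || prompt_top >= vague_hi {
        return 0.0;
    }
    if prompt_top <= vague_lo {
        return context_weight;
    }
    // Here vague_lo < prompt_top < vague_hi, so the divisor is positive.
    let t = (vague_hi - prompt_top) / (vague_hi - vague_lo);
    context_weight * t
}
```

Checking vague_hi first also guards the degenerate case vague_hi <= vague_lo, where no division is reached.
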
file_boost: f32

Score added to a skill when a file of its type is referenced in the prompt or recent context (e.g. a .xlsx boosts xlsx; see crate::context::file_ids). High-precision and not vagueness-gated, since a named file is unambiguous. 0.0 disables the channel.

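A minimal sketch of the file-type channel: scan the text for file names, map each extension to a skill id, and add a flat, ungated boost. The extension table and both function names are hypothetical; the real mapping lives in crate::context::file_ids and may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical extension-to-skill table, for illustration only.
const EXT_TO_SKILL: &[(&str, &str)] = &[("xlsx", "xlsx"), ("docx", "docx"), ("pdf", "pdf")];

/// Skill ids implied by file names mentioned in `text`.
pub fn referenced_file_skills(text: &str) -> BTreeSet<&'static str> {
    let mut out = BTreeSet::new();
    for word in text.split_whitespace() {
        // Drop trailing punctuation such as the comma in "report.xlsx,".
        let word = word.trim_end_matches(|c: char| !c.is_alphanumeric());
        if let Some((stem, ext)) = word.rsplit_once('.') {
            if stem.is_empty() {
                continue;
            }
            let ext = ext.to_ascii_lowercase();
            for (e, id) in EXT_TO_SKILL {
                if *e == ext {
                    out.insert(*id);
                }
            }
        }
    }
    out
}

/// Flat boost for a skill named by a referenced file; no vagueness gate.
pub fn file_bonus(id: &str, ids: &BTreeSet<&'static str>, file_boost: f32) -> f32 {
    if ids.contains(id) { file_boost } else { 0.0 }
}
```
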
project_boost: f32

Score added to a skill whose ecosystem matches the working directory’s project manifests or a code file referenced in the conversation (a uv.lock implies the uv/python terms, a named etl.py implies python, …; see crate::context::project_terms / crate::context::code_terms). Terms resolve dynamically against the installed library (crate::context::skills_for_terms), so the channel surfaces whatever uv/rust/go skill the user actually has, under any name. Unlike a named file, this is an ambient signal present on every turn, so it is the weakest channel. In crate::rank::rank_all_ctx it is gated on the skill’s own cosine sitting within crate::rank::PROJECT_GATE_SLACK of min_similarity: it lifts a near-plausible ecosystem skill over the floor but never rescues a clearly irrelevant one. The gate is deliberately recall-leaning, because the model ignores a surfaced skill it doesn’t need and per-session dedup caps the cost at one showing. 0.0 disables the channel.

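A minimal sketch of the project channel’s two steps: derive ecosystem terms from a directory listing, then apply the boost only when the skill’s own cosine is within a slack of the floor. Only the uv.lock row reflects the field docs; the other table rows, the function names, and the exact gate comparison (cosine >= min_similarity - slack) are assumptions, and PROJECT_GATE_SLACK is passed in as a parameter because its value is not given here.

```rust
/// Hypothetical manifest-to-term table. Only the uv.lock row is described in
/// the field docs; the Cargo.toml and go.mod rows are illustrative guesses.
fn manifest_terms(file_name: &str) -> &'static [&'static str] {
    match file_name {
        "uv.lock" => &["uv", "python"],
        "Cargo.toml" => &["rust"],
        "go.mod" => &["go"],
        _ => &[],
    }
}

/// Ecosystem terms implied by a directory listing (sorted, deduplicated).
pub fn project_terms_for(entries: &[&str]) -> Vec<&'static str> {
    let mut terms: Vec<&'static str> = entries
        .iter()
        .flat_map(|e| manifest_terms(e).iter().copied())
        .collect();
    terms.sort_unstable();
    terms.dedup();
    terms
}

/// Ambient, gated boost: a skill matched by those terms gets `project_boost`
/// only if its own cosine sits within `slack` of the floor.
pub fn project_bonus(cosine: f32, matched: bool, min_similarity: f32, slack: f32, project_boost: f32) -> f32 {
    if project_boost > 0.0 && matched && cosine >= min_similarity - slack {
        project_boost
    } else {
        0.0
    }
}
```
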
recall_floor: f32

Stage-1 score below which a prompt is treated as having no relevant skill, so the (costly) reranker is skipped entirely.

high_conf: f32

Stage-1 score above which the top match may be a confident lone winner.

clear_gap: f32

Minimum stage-1 gap from the top match to the runner-up for the top to count as a lone winner (and thus skip reranking).

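The three stage-1 thresholds together decide whether the reranker runs at all. A minimal sketch of that routing, assuming scores arrive sorted in descending order; Route and route are hypothetical (distinct from the crate’s Stage enum), and treating a missing runner-up as an infinite gap is an assumption.

```rust
/// Hypothetical stage-1 routing outcome.
#[derive(Debug, PartialEq)]
pub enum Route {
    /// Top score below recall_floor: no skill, reranker skipped.
    NoSkill,
    /// Confident, clearly separated top match: reranker skipped.
    LoneWinner,
    /// Everything else is handed to the reranker.
    Rerank,
}

/// `scores_desc` holds stage-1 scores sorted high to low.
pub fn route(scores_desc: &[f32], recall_floor: f32, high_conf: f32, clear_gap: f32) -> Route {
    let top = match scores_desc.first() {
        Some(&s) => s,
        None => return Route::NoSkill,
    };
    if top < recall_floor {
        return Route::NoSkill;
    }
    let runner_up = scores_desc.get(1).copied().unwrap_or(f32::NEG_INFINITY);
    if top > high_conf && top - runner_up >= clear_gap {
        Route::LoneWinner
    } else {
        Route::Rerank
    }
}
```
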
rerank_top_k: usize

How many stage-1 candidates are handed to the reranker.

rerank_min: f32

Minimum reranker logit for a skill to be injected.

rerank_margin: f32

Max reranker-logit gap below the best reranked skill for a peer to ride along.

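The stage-2 gate mirrors the stage-1 floor-plus-margin rule, but on reranker logits. A minimal sketch, assuming the reranker is exposed as a scoring closure; rerank_gate is hypothetical, and the real crate::rerank interface is not shown here.

```rust
/// Scores the first `top_k` stage-1 ids with `logit`, then keeps those at or
/// above `rerank_min` and within `rerank_margin` of the best logit.
pub fn rerank_gate<F>(
    stage1: &[&str],
    top_k: usize,
    rerank_min: f32,
    rerank_margin: f32,
    mut logit: F,
) -> Vec<String>
where
    F: FnMut(&str) -> f32,
{
    let mut scored: Vec<(String, f32)> = stage1
        .iter()
        .take(top_k)
        .map(|id| (id.to_string(), logit(*id)))
        .collect();
    // Highest logit first; total_cmp gives a total order even with NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    let best = match scored.first() {
        Some(s) => s.1,
        None => return Vec::new(),
    };
    scored
        .into_iter()
        .filter(|(_, l)| *l >= rerank_min && best - *l <= rerank_margin)
        .map(|(id, _)| id)
        .collect()
}
```
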
body_inject_min: f32

Confidence (in [0, 1]) at or above which a lone near-certain match is escalated from a directive pointer to a full body inject: the SKILL.md is inlined directly so the model can’t skip the Skill-tool round-trip. Only fires in inject_mode = directive and only when exactly one skill is selected (two co-relevant peers mean we are less certain, so they stay directives). Set deliberately high: in practice it is reached only by a cross-encoder-confirmed match (the cosine→confidence map caps below it for bge), so a fluky stage-1 hit never triggers a body dump. Raise above 1.0 to disable.

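The escalation rule above has three conditions that must all hold. A minimal sketch with a hypothetical inline_body predicate, where a plain bool stands in for "inject_mode is directive" because the InjectMode variants are not listed on this page:

```rust
/// True when a selection should be inlined as a full SKILL.md body rather
/// than sent as a directive pointer.
pub fn inline_body(directive_mode: bool, selected: usize, confidence: f32, body_inject_min: f32) -> bool {
    // Directive mode only, exactly one selected skill, and confidence at or
    // above the threshold. Because confidence lies in [0, 1], any threshold
    // above 1.0 can never be reached, which disables the escalation.
    directive_mode && selected == 1 && confidence >= body_inject_min
}
```
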
lexical_min: f32

Minimum absolute BM25 score for the top description match to be a lexical winner. A value <= 0 disables the channel entirely.

lexical_margin: f32

Minimum BM25 gap from the top description match to the runner-up for the top to count as dominant (and thus inject directly, skipping the reranker). The margin is what keeps the fast-path high-precision: a cluster of near-equal descriptions abstains and defers to the reranker.

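A minimal sketch of the lexical fast-path decision, assuming BM25 scores over descriptions are already computed and sorted high to low. lexical_winner is hypothetical, and treating a missing runner-up as a score of 0.0 is an assumption that relies on BM25 scores being non-negative.

```rust
/// Returns the dominant description match, or None to defer to the reranker.
pub fn lexical_winner<'a>(ranked: &[(&'a str, f32)], lexical_min: f32, lexical_margin: f32) -> Option<&'a str> {
    // A non-positive floor disables the channel entirely.
    if lexical_min <= 0.0 {
        return None;
    }
    let (id, top) = *ranked.first()?;
    let runner_up = ranked.get(1).map_or(0.0, |r| r.1);
    // Both an absolute floor and a clear gap are required; a cluster of
    // near-equal scores fails the gap and abstains.
    (top >= lexical_min && top - runner_up >= lexical_margin).then_some(id)
}
```
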
telemetry: bool

Append opt-in JSONL telemetry events (see crate::telemetry). Off by default. Enabled by this field or a truthy SKI_TELEMETRY env var; either one turns it on, so the env var still works without a config file.

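The "either one turns it on" rule is a simple OR. A minimal sketch; which strings count as truthy is an assumption here (1, true, yes, on, case-insensitive), since the docs do not define it, and both function names are hypothetical.

```rust
/// Assumed definition of a truthy env value.
pub fn is_truthy(v: &str) -> bool {
    matches!(v.trim().to_ascii_lowercase().as_str(), "1" | "true" | "yes" | "on")
}

/// Telemetry is on if the config field is set OR the env value is truthy.
pub fn telemetry_enabled(field: bool, env_value: Option<&str>) -> bool {
    field || env_value.is_some_and(is_truthy)
}

/// Reads SKI_TELEMETRY from the process environment.
pub fn telemetry_from_env(field: bool) -> bool {
    let v = std::env::var("SKI_TELEMETRY").ok();
    telemetry_enabled(field, v.as_deref())
}
```

Taking the env value as a parameter in telemetry_enabled keeps the rule testable without mutating the process environment.
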
Implementations

impl Config

pub fn calibrate_to(&mut self, embedder: &dyn Embedder)

Adopt the active embedder’s score thresholds. Cosine distributions are a property of the embedding space, not user preference, so min_similarity and score_margin follow the embedder that actually ran (bge vs the offline bag-of-words fallback). Other fields are left untouched.

Example found in repository:
examples/eval.rs (line 148)
90fn main() -> anyhow::Result<()> {
91    let args: Vec<String> = std::env::args().skip(1).collect();
92    let verbose = args.iter().any(|a| a == "-v" || a == "--verbose");
93    let path = args
94        .iter()
95        .find(|a| !a.starts_with('-'))
96        .cloned()
97        .unwrap_or_else(|| "tests/data/popular_skills_prompts.tsv".to_string());
98
99    let raw = std::fs::read_to_string(&path)?;
100    let cases = parse_cases(&raw);
101
102    let (mut cfg, file) = Config::load(Host::Claude);
103    // A/B affordance: override the phrase-channel boost (0.0 disables it) so the
104    // same corpus can be scored with and without the channel in one rebuild.
105    if let Ok(v) = std::env::var("SKI_PHRASE_BOOST") {
106        cfg.phrase_boost = v.parse().expect("SKI_PHRASE_BOOST must be a float");
107    }
108    // Context enrichment (Goal 3) is off by default; these env knobs activate and
109    // tune it for one run, mirroring SKI_PHRASE_BOOST, so the same corpus can be
110    // scored with and without conversational context.
111    if let Ok(v) = std::env::var("SKI_CONTEXT_DEPTH") {
112        cfg.context_depth = v.parse().expect("SKI_CONTEXT_DEPTH must be a usize");
113    }
114    if let Ok(v) = std::env::var("SKI_CONTEXT_WEIGHT") {
115        cfg.context_weight = v.parse().expect("SKI_CONTEXT_WEIGHT must be a float");
116    }
117    if let Ok(v) = std::env::var("SKI_VAGUE_LO") {
118        cfg.vague_lo = v.parse().expect("SKI_VAGUE_LO must be a float");
119    }
120    if let Ok(v) = std::env::var("SKI_VAGUE_HI") {
121        cfg.vague_hi = v.parse().expect("SKI_VAGUE_HI must be a float");
122    }
123    if let Ok(v) = std::env::var("SKI_FILE_BOOST") {
124        cfg.file_boost = v.parse().expect("SKI_FILE_BOOST must be a float");
125    }
126    if let Ok(v) = std::env::var("SKI_PROJECT_BOOST") {
127        cfg.project_boost = v.parse().expect("SKI_PROJECT_BOOST must be a float");
128    }
129    // Reranker-gate sweep knobs: tune the stage-2 abstention floor/margin for one
130    // run without editing config.toml (these are on the logit scale, untouched by
131    // `calibrate_to`).
132    if let Ok(v) = std::env::var("SKI_RERANK_MIN") {
133        cfg.rerank_min = v.parse().expect("SKI_RERANK_MIN must be a float");
134    }
135    if let Ok(v) = std::env::var("SKI_RERANK_MARGIN") {
136        cfg.rerank_margin = v.parse().expect("SKI_RERANK_MARGIN must be a float");
137    }
138    // Lexical fast-path (BM25 over description) sweep knobs: `lexical_min <= 0`
139    // disables it, so the same corpus can be scored with and without the channel.
140    if let Ok(v) = std::env::var("SKI_LEXICAL_MIN") {
141        cfg.lexical_min = v.parse().expect("SKI_LEXICAL_MIN must be a float");
142    }
143    if let Ok(v) = std::env::var("SKI_LEXICAL_MARGIN") {
144        cfg.lexical_margin = v.parse().expect("SKI_LEXICAL_MARGIN must be a float");
145    }
146    let skills = skill::discover(&cfg.roots)?;
147    let embedder = embed::build(&cfg.model)?;
148    cfg.calibrate_to(embedder.as_ref());
149    file.apply_cosine(&mut cfg);
150    let idx = index::build(&skills, embedder.as_ref(), None)?;
151    eprintln!(
152        "index: {} skills via {} | rerank_min {:.2} margin {:.2} | min_sim {:.2} | lexical_min {:.2} margin {:.2}",
153        idx.skills.len(),
154        idx.model,
155        cfg.rerank_min,
156        cfg.rerank_margin,
157        cfg.min_similarity,
158        cfg.lexical_min,
159        cfg.lexical_margin,
160    );
161
162    // Confusion counters. `borderline` rows are tallied separately (observe-only).
163    let (mut tp, mut fn_, mut fp, mut tn) = (0u32, 0u32, 0u32, 0u32);
164    let (mut n_pos, mut n_neg) = (0u32, 0u32);
165    let mut fp_rows: Vec<String> = Vec::new();
166    let mut fn_rows: Vec<String> = Vec::new();
167    // Stage-1 retrieval ceiling (pre-rerank), over positives only: recall@k is the
168    // fraction whose gold skill survives into the top-`rerank_top_k` candidates the
169    // reranker is fed (`rerank::rerank` takes exactly that many); top-1 is the
170    // fraction already ranked first by hybrid score. recall@k ~100% means retrieval
171    // is not the bottleneck and the problem is ranking within the retrieved set.
172    let (mut recall_at_k, mut stage1_top1) = (0u32, 0u32);
173    let mut recall_miss_rows: Vec<String> = Vec::new();
174
175    for c in &cases {
176        let query = embedder
177            .embed(std::slice::from_ref(&c.prompt), EmbedKind::Query)?
178            .remove(0);
179        let cvec = context::vector(embedder.as_ref(), &c.context, &cfg)?;
180        // File-type channel: scan this turn's prompt AND its prior context for named
181        // files (a `.xlsx` etc.), mapping each to its skill.
182        let file_text = format!("{} {}", c.context.join(" "), c.prompt);
183        let file_ids = context::file_ids(&file_text);
184        // Ambient project-type channel: the case's cwd (5th column) yields
185        // ecosystem terms (plus any code file named in the conversation), resolved
186        // against the installed index. Empty when the channel is off.
187        let project_ids = if cfg.project_boost > 0.0 {
188            let mut terms = context::project_terms(&c.cwd);
189            terms.extend(context::code_terms(&file_text));
190            context::skills_for_terms(&terms, &idx)
191                .into_keys()
192                .collect()
193        } else {
194            std::collections::BTreeSet::new()
195        };
196        let hits = rank::rank_all_ctx(
197            &query,
198            cvec.as_deref(),
199            &file_ids,
200            &project_ids,
201            &c.prompt,
202            &idx,
203            &cfg,
204        );
205        // The reranker reads text: enrich its query with the recent window when the
206        // prompt is vague (same gate that lets the context vector contribute).
207        let prompt_top = hits.iter().map(|h| h.cosine).fold(0.0_f32, f32::max);
208        let rerank_query = context::rerank_query(
209            &c.prompt,
210            prompt_top,
211            &c.context,
212            !file_ids.is_empty(),
213            &cfg,
214        );
215        let plan = pipeline::decide(&hits, &idx, &c.prompt, &rerank_query, &cfg);
216        let stage = match plan.stage {
217            Stage::Lexical => "lexical",
218            Stage::Rerank => "rerank",
219            Stage::Cosine => "stage1",
220        };
221        // Caller-side guardrails: the hook's `finalize` minus session dedup (the eval
222        // has no session) — drop denied skills, cap at `max_skills`.
223        let injected: Vec<Hit> = plan
224            .passed
225            .into_iter()
226            .filter(|h| !cfg.deny.contains(&h.id))
227            .take(cfg.max_skills)
228            .collect();
229        let ids: Vec<String> = injected.iter().map(|h| h.id.clone()).collect();
230        let is_neg = c.want == "(none)";
231        let observe_only = c.kind == "borderline";
232
233        if verbose {
234            let top: Vec<String> = hits
235                .iter()
236                .take(4)
237                .map(|h| format!("{}={:.3}", h.id, h.score))
238                .collect();
239            let inj: Vec<String> = injected
240                .iter()
241                .map(|h| {
242                    format!(
243                        "{}=L{:.2}/cos{:.3}+ctx{:.2}+file{:.2}+proj{:.2}+kw{:.2}+ph{:.2}",
244                        h.id, h.score, h.cosine, h.context, h.file, h.project, h.keyword, h.phrase
245                    )
246                })
247                .collect();
248            eprintln!(
249                "[{:<10}] {:<7} inject=[{}]  top: {}  :: {}",
250                c.kind,
251                stage,
252                inj.join(", "),
253                top.join(", "),
254                c.prompt,
255            );
256        }
257
258        if observe_only {
259            continue;
260        }
261        if is_neg {
262            n_neg += 1;
263            if injected.is_empty() {
264                tn += 1;
265            } else {
266                fp += 1;
267                fp_rows.push(format!(
268                    "  FP [{:<10}] inject=[{}] :: {}",
269                    c.kind,
270                    ids.join(", "),
271                    c.prompt
272                ));
273            }
274        } else {
275            n_pos += 1;
276            // Stage-1 ceiling: where does the gold skill land in the full hybrid
277            // ranking, before any rerank/threshold gating?
278            let rank = hits.iter().position(|h| h.id == c.want);
279            if rank == Some(0) {
280                stage1_top1 += 1;
281            }
282            if rank.is_some_and(|r| r < cfg.rerank_top_k) {
283                recall_at_k += 1;
284            } else {
285                recall_miss_rows.push(format!(
286                    "  R@k MISS [{:<10}] want={} stage-1 rank={} :: {}",
287                    c.kind,
288                    c.want,
289                    rank.map_or_else(|| "absent".to_string(), |r| r.to_string()),
290                    c.prompt
291                ));
292            }
293            if ids.iter().any(|id| id == &c.want) {
294                tp += 1;
295            } else {
296                fn_ += 1;
297                fn_rows.push(format!(
298                    "  FN [{:<10}] want={} got=[{}] :: {}",
299                    c.kind,
300                    c.want,
301                    ids.join(", "),
302                    c.prompt
303                ));
304            }
305        }
306    }
307
308    println!("\n=== eval: {} ===", path);
309    println!(
310        "positives {n_pos}: recall {tp}/{n_pos} ({:.0}%)   misses {fn_}",
311        pct(tp, n_pos)
312    );
313    println!(
314        "negatives {n_neg}: false-inject {fp}/{n_neg} ({:.0}%)   clean {tn}",
315        pct(fp, n_neg)
316    );
317    // Headline: recall recovered, net of discounted FP harm. Optimise this — not
318    // FP count — because a strong host filters false injects (see module docs).
319    let recall_rate = if n_pos == 0 {
320        0.0
321    } else {
322        tp as f32 / n_pos as f32
323    };
324    let fp_rate = if n_neg == 0 {
325        0.0
326    } else {
327        fp as f32 / n_neg as f32
328    };
329    println!(
330        "host-value {:.0}%  (= recall {:.0}% - {FP_HARM} * fp {:.0}%; FP discounted: a strong host ignores false injects)",
331        100.0 * (recall_rate - FP_HARM * fp_rate),
332        100.0 * recall_rate,
333        100.0 * fp_rate,
334    );
335    println!(
336        "stage-1 (pre-rerank, k={}): recall@k {recall_at_k}/{n_pos} ({:.0}%)   top-1 {stage1_top1}/{n_pos} ({:.0}%)",
337        cfg.rerank_top_k,
338        pct(recall_at_k, n_pos),
339        pct(stage1_top1, n_pos),
340    );
341    if !recall_miss_rows.is_empty() {
342        println!(
343            "--- stage-1 recall@k misses (gold below top-{}) ---",
344            cfg.rerank_top_k
345        );
346        recall_miss_rows.iter().for_each(|r| println!("{r}"));
347    }
348    if !fn_rows.is_empty() {
349        println!("--- recall misses ---");
350        fn_rows.iter().for_each(|r| println!("{r}"));
351    }
352    if !fp_rows.is_empty() {
353        println!("--- false injections ---");
354        fp_rows.iter().for_each(|r| println!("{r}"));
355    }
356    Ok(())
357}

pub fn for_host(host: Host) -> Self

Config scoped to host: discovery roots (and, via crate::paths::index_path, the on-disk index) cover only that host’s skill library. This keeps an injected skill name resolvable in the host that receives it: a Claude-only id never injects into opencode, and vice versa.

pub fn load(host: Host) -> (Self, FileConfig)

Host-scoped config with the user file (FileConfig) overlaid, returned alongside the parsed file. The file is returned so that a caller that calibrates can re-assert the user’s cosine pins afterward (examples/eval.rs does this with file.apply_cosine): Config::calibrate_to overwrites min_similarity/score_margin from the embedder and would otherwise clobber a user-set value. Callers that never calibrate can ignore the FileConfig.

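Why the order load, then calibrate_to, then re-apply the file pins matters can be shown with a minimal sketch. Cosine, FilePins, calibrate, and apply are hypothetical stand-ins for Config’s two cosine thresholds and FileConfig’s optional overrides, and the numbers are illustrative.

```rust
/// Hypothetical stand-in for Config's two embedder-dependent thresholds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Cosine {
    pub min_similarity: f32,
    pub score_margin: f32,
}

/// Hypothetical stand-in for the user's optional pins from the config file.
#[derive(Debug, Default)]
pub struct FilePins {
    pub min_similarity: Option<f32>,
    pub score_margin: Option<f32>,
}

impl FilePins {
    /// Re-asserts only the values the user actually set.
    pub fn apply(&self, c: &mut Cosine) {
        if let Some(v) = self.min_similarity {
            c.min_similarity = v;
        }
        if let Some(v) = self.score_margin {
            c.score_margin = v;
        }
    }
}

/// Calibration overwrites both thresholds with the embedder's values.
pub fn calibrate(c: &mut Cosine, embedder_values: Cosine) {
    *c = embedder_values;
}
```

In use, calibrating after load clobbers a user-set min_similarity; re-applying the pins restores it while keeping the embedder’s score_margin, which the user never set.
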
Example found in repository: examples/eval.rs (line 102), in the same main listed under calibrate_to.

Trait Implementations

impl Clone for Config

fn clone(&self) -> Config

Returns a duplicate of the value.

1.0.0 (const: unstable)

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Config

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for Config

fn default() -> Self

The Claude-scoped config. The ski index and ski why commands (and the eval harness) use this default; the hot paths build Config::for_host from their --host flag.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

impl<R, P> ReadPrimitive<R> for P
where R: Read + ReadEndian<P>, P: Default,

fn read_from_little_endian(read: &mut R) -> Result<Self, Error>

Read this value from the supplied reader. Same as ReadEndian::read_from_little_endian().

fn read_from_big_endian(read: &mut R) -> Result<Self, Error>

Read this value from the supplied reader. Same as ReadEndian::read_from_big_endian().

fn read_from_native_endian(read: &mut R) -> Result<Self, Error>

Read this value from the supplied reader. Same as ReadEndian::read_from_native_endian().

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.