
Module lexical

Stage-1.5 lexical channel: BM25 over full skill descriptions.

The dense bi-encoder, which embeds the (short, curated) description, and the tiny cross-encoder reranker both miss lexically obvious indirect matches. Take a prompt like “turn this sales spreadsheet into a chart”: the gold skill’s description literally contains “spreadsheet”, “chart”, and “formulas”, yet its cosine sits in the muddy ~0.6 band and its reranker logit falls below the abstention floor, so ski says nothing and the host’s own chooser misses it too. BM25 over the description prose reliably ranks such skills #1 where the embeddings do not, because the discriminating vocabulary is literally in the prose.

This is a high-precision fast-path, not another additive score term. A dominant BM25 winner (one that clears Config::lexical_min in absolute score AND beats the runner-up by Config::lexical_margin) is injected directly, skipping the reranker. This mirrors the existing “confident lone winner skips rerank” gate (crate::rerank::confident_winner) but is keyed on lexical certainty, which is reliable exactly where the bi-encoder cosine is not. A plain stage-1 score boost would not work: the reranker overwrites the stage-1 score with its logit, and the agreement gate in crate::rerank::passes then rejects the still-sub-floor cosine.

The fast-path only fires when stage-1 has no confident lone dense winner (the caller gates on crate::rerank::confident_winner); a strong dense match is never overridden. Off by default (Config::lexical_min <= 0).
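
The gate ordering described above can be sketched as a small routing function. This is an illustrative sketch only: `Route` and `route` are hypothetical names, and the two `Option` arguments stand in for the results of crate::rerank::confident_winner and dominant, whose real signatures are not shown on this page.

```rust
/// Hypothetical outcome of stage-1.5 routing; not a type from this crate.
#[derive(Debug, PartialEq)]
pub enum Route {
    /// A confident lone dense winner exists, so the lexical channel stays out of the way.
    DenseWinner(usize),
    /// A dominant BM25 winner is injected directly, skipping the reranker.
    LexicalFastPath(usize),
    /// Neither gate fired; the candidates go to the reranker as usual.
    Rerank,
}

/// `dense_winner` stands in for the result of the dense confidence gate,
/// `lexical_winner` for the result of the lexical dominance check.
/// Both carry an assumed skill index.
pub fn route(dense_winner: Option<usize>, lexical_winner: Option<usize>) -> Route {
    match (dense_winner, lexical_winner) {
        // A strong dense match is never overridden, even if BM25 disagrees.
        (Some(skill), _) => Route::DenseWinner(skill),
        // No confident dense winner: lexical certainty may take the fast-path.
        (None, Some(skill)) => Route::LexicalFastPath(skill),
        // No certainty on either channel: defer to the reranker.
        (None, None) => Route::Rerank,
    }
}
```

Matching on the dense result first is what keeps the lexical channel from ever displacing a confident dense winner.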

Structs§

Lex
A skill’s BM25 score against the prompt.

Functions§

dominant
The dominant lexical winner, if one exists: the top-scoring skill, provided it clears cfg.lexical_min in absolute BM25 and beats the runner-up by at least cfg.lexical_margin. None when the channel is off (lexical_min <= 0), the prompt has no content tokens, or no skill stands clearly apart. The margin is what makes this high-precision: a cluster of near-equal descriptions abstains and defers to the reranker.
scores
BM25(prompt, description) for every skill, sorted by descending score.
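
As a rough illustration of how these two functions fit together, here is a minimal, self-contained sketch. Everything in it is an assumption rather than this module's actual code: `LexScore` is a hypothetical stand-in for Lex, the tokenizer is a plain lowercase split with no stop-word handling, `K1` and `B` are the commonly used BM25 defaults, the thresholds are passed as plain floats instead of a Config, and both comparisons are treated as inclusive.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for `Lex`: a skill index plus its BM25 score.
#[derive(Debug, Clone, PartialEq)]
pub struct LexScore {
    pub skill: usize,
    pub score: f32,
}

// Assumed BM25 constants (common defaults, not taken from this crate).
const K1: f32 = 1.2;
const B: f32 = 0.75;

/// Assumed tokenizer: lowercase, split on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// BM25(prompt, description) for every description, sorted by descending score.
pub fn bm25_scores(prompt: &str, descriptions: &[&str]) -> Vec<LexScore> {
    let docs: Vec<Vec<String>> = descriptions.iter().map(|d| tokenize(d)).collect();
    let n = docs.len() as f32;
    let total_len = docs.iter().map(Vec::len).sum::<usize>() as f32;
    // Clamp to 1.0 to guard against empty input.
    let avg_len = (total_len / n.max(1.0)).max(1.0);
    let query: HashSet<String> = tokenize(prompt).into_iter().collect();

    // Document frequency of each query term.
    let mut df: HashMap<&str, f32> = HashMap::new();
    for doc in &docs {
        let seen: HashSet<&str> = doc.iter().map(String::as_str).collect();
        for term in &query {
            if seen.contains(term.as_str()) {
                *df.entry(term.as_str()).or_insert(0.0) += 1.0;
            }
        }
    }

    let mut out: Vec<LexScore> = docs
        .iter()
        .enumerate()
        .map(|(skill, doc)| {
            let len_norm = 1.0 - B + B * doc.len() as f32 / avg_len;
            let score: f32 = query
                .iter()
                .map(|term| {
                    let tf = doc.iter().filter(|t| *t == term).count() as f32;
                    let d = df.get(term.as_str()).copied().unwrap_or(0.0);
                    // Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
                    let idf = ((n - d + 0.5) / (d + 0.5) + 1.0).ln();
                    idf * tf * (K1 + 1.0) / (tf + K1 * len_norm)
                })
                .sum();
            LexScore { skill, score }
        })
        .collect();
    out.sort_by(|a, b| b.score.total_cmp(&a.score));
    out
}

/// The dominant winner, if any. `scored` must already be sorted descending.
pub fn dominant(scored: &[LexScore], lexical_min: f32, lexical_margin: f32) -> Option<&LexScore> {
    if lexical_min <= 0.0 {
        return None; // channel is off
    }
    let top = scored.first()?;
    // Assumption: with a single candidate, the runner-up score counts as 0.
    let runner_up = scored.get(1).map_or(0.0, |s| s.score);
    (top.score >= lexical_min && top.score - runner_up >= lexical_margin).then_some(top)
}
```

In this sketch, a prompt with no tokens scores every description at zero, so any positive `lexical_min` makes `dominant` return None. Two identical descriptions tie, so any positive margin makes it abstain.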