pub struct Candidate<'a> {
    pub word: &'a str,
    pub log_prior: i32,
    pub log_likelihood: i32,
    pub match_type: MatchType,
    pub source: Source,
}
One scored candidate. Word lifetime is 'a so consumers can pass
borrowed references through the merge pipeline; the merger clones
only the survivors.
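A minimal sketch of the borrow-then-clone pattern described above. `Scored` is a trimmed stand-in for `Candidate`, `merge_top` is a hypothetical helper, and ranking by the sum of the two log terms is an assumption; this page does not show how the real merger ranks or truncates.

```rust
/// Trimmed stand-in for `Candidate`: only the fields this sketch needs.
#[derive(Debug, Clone, Copy)]
pub struct Scored<'a> {
    pub word: &'a str,
    pub log_prior: i32,
    pub log_likelihood: i32,
}

/// Hypothetical merge step: rank borrowed candidates and allocate owned
/// strings only for the `keep` best ones.
pub fn merge_top<'a>(mut candidates: Vec<Scored<'a>>, keep: usize) -> Vec<String> {
    // Assumed ranking: sum of the two log-space terms, highest first.
    candidates.sort_by_key(|c| {
        std::cmp::Reverse(c.log_prior.saturating_add(c.log_likelihood))
    });
    candidates
        .into_iter()
        .take(keep)
        // The only allocation per candidate happens here, for survivors.
        .map(|c| c.word.to_owned())
        .collect()
}
```

Candidates that fall outside the top `keep` are dropped without ever copying their `word`.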
Equality intentionally compares word + source + match_type, not
the score — score is a sort key, not part of identity. (Two
producers that emit the same word with the same match_type and
source are duplicates regardless of how they scored it.)
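The equality rule above could be written roughly as follows. This is a sketch, not the crate's actual impl: the `MatchType` and `Source` variants are hypothetical placeholders, since their definitions are not shown on this page.

```rust
// Placeholder enums: the variant names are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchType {
    Exact,
    Prefix,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Dictionary,
    User,
}

#[derive(Debug, Clone, Copy)]
pub struct Candidate<'a> {
    pub word: &'a str,
    pub log_prior: i32,
    pub log_likelihood: i32,
    pub match_type: MatchType,
    pub source: Source,
}

// Identity is (word, source, match_type). The two score fields are
// deliberately left out of the comparison.
impl PartialEq for Candidate<'_> {
    fn eq(&self, other: &Self) -> bool {
        self.word == other.word
            && self.source == other.source
            && self.match_type == other.match_type
    }
}

impl Eq for Candidate<'_> {}
```

With this impl, two candidates that differ only in `log_prior` or `log_likelihood` compare equal, so a merger can detect duplicates with `==` before deciding which score to keep.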
Fields
word: &'a str
log_prior: i32
Q4 · log(prior probability). Producers derive from corpus
frequency (Q4 · log(1 + freq) is the canonical baseline) plus
any user-level boost (L0 pin, recency, etc.). Non-negative by
convention but the schema accepts negative for explicit
down-weights.
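A sketch of the baseline prior, under two assumptions that this page does not confirm: `Q4` is read as a fixed-point scale with 4 fractional bits (1.0 == 16), and `log` is the natural logarithm. The function names are hypothetical.

```rust
/// Assumed scale factor: "Q4" read as 4 fractional bits, so 1.0 == 16.
/// The crate's real constant may differ.
pub const Q4: f64 = 16.0;

/// Baseline prior from a raw corpus frequency: round(Q4 * ln(1 + freq)).
pub fn baseline_log_prior(freq: u32) -> i32 {
    // ln_1p(x) computes ln(1 + x) accurately for small x.
    (Q4 * (freq as f64).ln_1p()).round() as i32
}

/// Adds a user-level boost in log space. A negative `boost` acts as an
/// explicit down-weight, which is why the field is signed.
pub fn log_prior_with_boost(freq: u32, boost: i32) -> i32 {
    baseline_log_prior(freq).saturating_add(boost)
}
```

A word with zero corpus frequency gets a baseline of 0, so any non-zero value for such a word comes entirely from the boost term.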
log_likelihood: i32
Q4 · log(match likelihood). Producers derive from the match
shape: per-type base (e.g. LIKELIHOOD_JP_JUKUGO_BASE),
proximity decay for prefix matches (Q4 · K · log(proximity)),
edit-cost penalty for fuzzy matches, demote/promote factors
(TC demote, full-match promote, …) folded in additively in log
space.
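The additive combination can be sketched like this. The values of `Q4` and `K`, the `MatchShape` struct, and the reading of proximity as a ratio in (0, 1] are all assumptions made for illustration; only the formula shape (Q4 · K · log(proximity), summed with the other terms) comes from the description above.

```rust
/// Assumed fixed-point scale (4 fractional bits); the real value may differ.
pub const Q4: f64 = 16.0;
/// Hypothetical decay weight for prefix matches.
pub const K: f64 = 1.0;

/// Hypothetical description of one match, with all integer terms
/// already expressed in Q4 log units.
pub struct MatchShape {
    /// Per-type base term.
    pub base: i32,
    /// Assumed to lie in (0, 1], with 1.0 meaning no decay.
    pub proximity: f64,
    /// Non-negative edit cost for fuzzy matches; subtracted.
    pub edit_penalty: i32,
    /// Sum of demote (negative) and promote (positive) terms.
    pub adjustments: i32,
}

/// Folds every factor in additively, since all terms live in log space.
pub fn log_likelihood(shape: &MatchShape) -> i32 {
    // ln(1.0) == 0, so a full-proximity match contributes no decay.
    let decay = (Q4 * K * shape.proximity.ln()).round() as i32;
    shape
        .base
        .saturating_add(decay)
        .saturating_sub(shape.edit_penalty)
        .saturating_add(shape.adjustments)
}
```

Because multiplication of probabilities becomes addition of logs, each demote or promote factor is a plain signed offset and the order in which producers apply them does not matter.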
match_type: MatchType
source: Source