Crate inputx_scoring

Expand description

inputx-scoring — probability-native candidate scoring primitive.

§The schema

Every candidate carries two log-space scalars and a typed match classification:

    score(W | i) = log_prior(W) + log_likelihood(i | W)

This is the Bayesian decomposition P(W|i) ∝ P(i|W) · P(W) rendered in log space — addition replaces multiplication, comparison stays monotone, and the two factors can be assigned independently by producers (dictionaries, n-grams, fuzzy matchers, …) without coordinating a single global formula.

Both terms are i32 Q4 fixed-point. One log unit = Q4 (= 16) integer steps. The Q4 choice trades resolution for headroom: scores fit comfortably in i32 even for very rare or very common words while staying precise enough that ranking-relevant gaps (~0.0625 in log space) survive quantization.

§Public surface

Source — which engine produced the candidate
MatchType — how the typed input maps to the candidate
Candidate — a (word, log_prior, log_likelihood, match_type, source) tuple
score — log_prior + log_likelihood, the sort key
Q4 — log-to-integer scale (= 16)

Scoring policy lives in the consumer (IME engine cement); this crate provides only the schema and the additive sort key.

Structs§

Candidate: One scored candidate. Word lifetime is 'a so consumers can pass borrowed references through the merge pipeline; the merger clones only the survivors.

Enums§

MatchType: Classification of how the typed input i maps to a candidate W. Carried alongside the (log_prior, log_likelihood) pair so the downstream merger / probe / UI can render context without re-deriving the match shape.
Source: Engine that produced a candidate. Numeric representation is stable across versions so it can cross the FFI boundary as u8 without translation. Mirrors inputx_core::composite::merge::Source.

Constants§

Q4: Fixed-point scale for log-space scalars. Q4 = 16 means every integer step is 1/16 of a log unit (≈ 0.0625). At this resolution, i32 covers a dynamic range of ~ ±67 million log units — far more than any realistic candidate score needs.

Functions§

derive_log_likelihood: Q4 log-likelihood derived from a match-type classification.
log_prior_from_freq: Q4 log of natural priors derived from a corpus frequency. Returns Q4 · ln(1 + freq) rounded to the nearest integer. Zero-freq candidates map to 0; freq-1 to ~11; freq-1000 to ~110.
score: Bayesian sort key: log_prior + log_likelihood. The single source of truth for ranking. Higher = better.