Expand description
Phase-D probabilistic Recognizer — the “decoder”.
This module implements the deep-scan half of the strict/deep-scan
recognizer split introduced in Phase 4 PR-2. When the engine is
configured for deep-scan (batch reconciliation mode,
rule-escalated region, --deep-scan CLI flag), and the strict
recognizer returns zero candidates for a marking region, the
engine falls back to the decoder to recover mangled markings that
are one of a small set of canonical-shape deviations away from a
real CAPCO-2016 marking:
- Edit-distance-1/2 token typos (
SERCET→SECRET). - Token reordering within categories (
NOFORN//SECRET→SECRET//NOFORN). - CAPCO-2016-superseded tokens (
COMINT→SI). - Case mistakes (
secret//noforn→SECRET//NOFORN). - Garbled delimiters (
S ∕∕ NOFORN→S//NOFORN).
The decoder never fabricates a marking where none exists. When the
observed tokens fit no CAPCO grammar template, it returns
Parsed::Ambiguous { candidates: vec![] } — the zero-candidate
signal per foundational-plan line 609-612.
§Why this lives in marque-engine, not marque-capco
Same Constitution VII rationale as StrictRecognizer (PR-2):
marque-capco may not depend on marque-core, but the decoder
needs core’s fuzzy-vocab matcher and strict parser to materialize
candidates. marque-engine is the sole crate where both chains
converge. The original tasks.md T059/T061 placement is amended in
tasks.md itself.
§Scoring approach (foundational-plan §5.2)
For each candidate the decoder computes:
log_posterior(candidate | observed)
= log_prior(candidate) // baked corpus priors (PR-1)
+ Σ log_likelihood(feature | candidate) // enumerated scored featuresThe decoder currently scores the candidate-shape features it
records from the closed FeatureId enum:
EditDistance1, EditDistance2, TokenReorder,
SupersededToken, and BaseRateCommonMarking. Each contributes
a fixed log-odds delta documented at the feature’s call site.
FeatureId::StrictContextClassification is part of the audit-
schema enum but is not currently a scored-feature term:
classification-level context is enforced through the separate
ParseContext::classification_floor hard filter (FR-011),
which rejects below-floor candidates before scoring rather than
adding a likelihood term to the posterior. FeatureId::CorpusOverrideInEffect
is reserved for PR-5 when corpus-override is wired; the decoder
does not emit it today. Turning either into an actual scored
contributor requires a coordinated audit-schema bump
(MARQUE_AUDIT_SCHEMA) per marque-rules/src/confidence.rs doc.
The top candidate wins when its posterior exceeds the runner-up by
a configured ratio; below that threshold the decoder returns
Parsed::Ambiguous { candidates } so the engine can surface a
diagnostic rather than auto-apply. Candidate::prior_log_odds
carries the prior alone (sum of token log-priors); the
per-feature log-odds deltas live only in
Candidate::evidence[i].log_odds, so a resolver that reconstructs
prior_log_odds + Σ evidence.log_odds recovers the decoder’s
internal posterior exactly, without double-counting.
§What this module is NOT
- Not a full template-matching grammar engine. The MVP materializes candidates by canonicalizing observed tokens and round-tripping through the strict parser — the strict parser is the arbiter of “is this a CAPCO-shape marking.” If the canonicalized bytes strict-parse, we have a candidate; if not, we discard.
- Not a learning system. All priors are compile-time-baked
&'statictables frommarque_capco::priors(Constitution III: no runtime corpus override on WASM). - Not a fix applier. The decoder proposes
CapcoMarkingcandidates; the engine applies them through the normalDiagnostic/FixProposalpath withFixSource::DecoderPosterior.
Structs§
- Decoder
Recognizer - Phase-D probabilistic marking recognizer.
- Strict
OrDecoder Recognizer - Recognizer that runs the strict path first and falls back to the decoder when the strict parse yields no meaningful attributes.