pub trait TokenSet: Send + Sync {
    // Required methods
    fn canonicalize(&self, token: &str) -> Option<&'static str>;
    fn is_trigraph(&self, token: &str) -> bool;

    // Provided method
    fn correction_vocab(&self) -> &[&'static str] { ... }
}
Minimal interface the parser needs from the token set.
Implemented by CapcoTokenSet; injected at engine init.
Required Methods
fn canonicalize(&self, token: &str) -> Option<&'static str>
Returns the canonical token string if token is a known CVE value, or None otherwise.
fn is_trigraph(&self, token: &str) -> bool
Returns true if token is a known country trigraph.
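As a rough illustration of the two required methods, the sketch below implements them for a toy token set. The trait is redeclared locally so the example compiles on its own. ToyTokenSet, the TOKENS and TRIGRAPHS tables, and the case-insensitive matching in canonicalize are all assumptions made for illustration; they are not taken from CapcoTokenSet.

```rust
// Local copy of the trait as declared on this page, so the sketch is self-contained.
pub trait TokenSet: Send + Sync {
    fn canonicalize(&self, token: &str) -> Option<&'static str>;
    fn is_trigraph(&self, token: &str) -> bool;
    fn correction_vocab(&self) -> &[&'static str] {
        &[]
    }
}

// Hypothetical vocabulary tables; the entries are illustrative only.
const TOKENS: &[&str] = &["NOFORN", "ORCON"];
const TRIGRAPHS: &[&str] = &["CAN", "GBR", "USA"];

/// Hypothetical implementor that only provides the required methods.
pub struct ToyTokenSet;

impl TokenSet for ToyTokenSet {
    fn canonicalize(&self, token: &str) -> Option<&'static str> {
        // Assumption: matching ignores ASCII case and returns the stored spelling.
        TOKENS
            .iter()
            .copied()
            .find(|known| known.eq_ignore_ascii_case(token))
    }

    fn is_trigraph(&self, token: &str) -> bool {
        // Exact match against the hypothetical trigraph table.
        TRIGRAPHS.iter().any(|known| *known == token)
    }
}
```

Because ToyTokenSet does not override correction_vocab, it inherits the empty default slice.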
Provided Methods
fn correction_vocab(&self) -> &[&'static str]
Returns the vocabulary slice used for fuzzy correction lookups.
This is the token vocabulary against which the marque_core::fuzzy module
compares unknown tokens. The slice must be sorted and deduplicated, because
binary search is used for the “is already valid” check.
The returned slice is borrowed from the implementor, which allows
implementations to hold the vocabulary on self (e.g., in a Vec
built at construction time) rather than in a global static. Each
entry is &'static str because the fuzzy matcher returns canonical
tokens with 'static lifetime in FuzzyCorrection::token.
The default implementation returns an empty slice, disabling fuzzy
correction for external TokenSet implementors that do not override it.
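The following sketch shows one way an implementor might hold the vocabulary on self and keep it sorted and deduplicated. The trait is redeclared locally so the example compiles on its own. VecTokenSet and is_already_valid are hypothetical names. In particular, is_already_valid only stands in for the kind of binary-search check described above and is not the actual marque_core::fuzzy API.

```rust
// Local copy of the trait as declared on this page, so the sketch is self-contained.
pub trait TokenSet: Send + Sync {
    fn canonicalize(&self, token: &str) -> Option<&'static str>;
    fn is_trigraph(&self, token: &str) -> bool;
    fn correction_vocab(&self) -> &[&'static str] {
        &[]
    }
}

/// Hypothetical implementor that owns its vocabulary in a Vec built at construction.
pub struct VecTokenSet {
    vocab: Vec<&'static str>,
}

impl VecTokenSet {
    pub fn new(tokens: &[&'static str]) -> Self {
        let mut vocab = tokens.to_vec();
        // Establish the invariant once: sorted (byte-wise) and free of duplicates.
        vocab.sort_unstable();
        vocab.dedup();
        Self { vocab }
    }
}

impl TokenSet for VecTokenSet {
    fn canonicalize(&self, token: &str) -> Option<&'static str> {
        // Exact lookup by binary search; valid because vocab is sorted.
        self.vocab
            .binary_search_by(|probe| (*probe).cmp(token))
            .ok()
            .map(|index| self.vocab[index])
    }

    fn is_trigraph(&self, _token: &str) -> bool {
        // This toy set carries no trigraphs.
        false
    }

    fn correction_vocab(&self) -> &[&'static str] {
        // Borrowed from self rather than from a global static.
        &self.vocab
    }
}

/// Hypothetical stand-in for an "is already valid" check over the vocabulary.
pub fn is_already_valid(set: &dyn TokenSet, token: &str) -> bool {
    set.correction_vocab()
        .binary_search_by(|probe| (*probe).cmp(token))
        .is_ok()
}
```

Sorting and deduplicating in the constructor means every later call to correction_vocab returns a slice that already satisfies the binary-search precondition, so callers do not need to re-check it.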