Trait TokenSet 

pub trait TokenSet: Send + Sync {
    // Required methods
    fn canonicalize(&self, token: &str) -> Option<&'static str>;
    fn is_trigraph(&self, token: &str) -> bool;

    // Provided method
    fn correction_vocab(&self) -> &[&'static str] { ... }
}
Minimal interface the parser needs from the token set. Implemented by CapcoTokenSet; injected at engine init.

Required Methods

fn canonicalize(&self, token: &str) -> Option<&'static str>

Returns the canonical token string if token is a known CVE value, or None otherwise.

fn is_trigraph(&self, token: &str) -> bool

Returns true if token is a known country trigraph.
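
As a rough illustration of the two required methods, the sketch below implements them for a hypothetical `DemoTokenSet`. The trait is re-declared locally (copied from the declaration on this page) so the example compiles on its own. The token tables, the type name, and the ASCII case-insensitive matching in `canonicalize` are all assumptions made for the example and do not describe how CapcoTokenSet behaves.

```rust
// Local copy of the trait as declared on this page, so the sketch is self-contained.
pub trait TokenSet: Send + Sync {
    fn canonicalize(&self, token: &str) -> Option<&'static str>;
    fn is_trigraph(&self, token: &str) -> bool;
    fn correction_vocab(&self) -> &[&'static str] {
        &[] // the page states that the default is an empty slice
    }
}

// Hypothetical tables; the real token data is not shown on this page.
const KNOWN_TOKENS: &[&str] = &["NOFORN", "SECRET"];
const TRIGRAPHS: &[&str] = &["GBR", "USA"];

/// Hypothetical implementor used only for illustration.
pub struct DemoTokenSet;

impl TokenSet for DemoTokenSet {
    fn canonicalize(&self, token: &str) -> Option<&'static str> {
        // Assumption: lookups ignore ASCII case and return the stored spelling.
        KNOWN_TOKENS
            .iter()
            .copied()
            .find(|known| known.eq_ignore_ascii_case(token))
    }

    fn is_trigraph(&self, token: &str) -> bool {
        // Assumption: trigraphs are matched exactly (upper case only).
        TRIGRAPHS.iter().any(|t| *t == token)
    }
}
```

With these assumed tables, `DemoTokenSet.canonicalize("secret")` yields `Some("SECRET")`, while an unknown token yields `None`.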

Provided Methods

fn correction_vocab(&self) -> &[&'static str]

Returns the vocabulary slice used for fuzzy correction lookups.

The marque_core::fuzzy module compares unknown tokens against this vocabulary. The slice must be sorted and deduplicated, because a binary search is used for the “is already valid” check.
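
The sorted-and-deduplicated requirement can be sketched with plain standard-library calls. The three helper functions below are hypothetical names invented for this example and are not part of marque_core; they only show why a binary-search membership check depends on the ordering.

```rust
/// Hypothetical helper: bring a vocabulary into the required form.
pub fn normalize_vocab(mut vocab: Vec<&'static str>) -> Vec<&'static str> {
    vocab.sort_unstable(); // byte-wise ordering, the same ordering binary_search uses
    vocab.dedup();         // removes adjacent duplicates, so it must run after sorting
    vocab
}

/// Hypothetical check that a slice is strictly ascending (sorted, no duplicates).
pub fn is_sorted_and_deduped(vocab: &[&'static str]) -> bool {
    vocab.windows(2).all(|pair| pair[0] < pair[1])
}

/// Sketch of an "is already valid" test. The result is only meaningful
/// when `vocab` is sorted; on unsorted input binary_search may miss entries.
pub fn is_already_valid(vocab: &[&'static str], token: &str) -> bool {
    vocab.binary_search(&token).is_ok()
}
```

An implementor might run something like `normalize_vocab` once at construction time and keep `is_sorted_and_deduped` as a debug assertion.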

The returned slice is borrowed from the implementor, which allows implementations to hold the vocabulary on self (e.g., in a Vec built at construction time) rather than in a global static. Each entry is &'static str because the fuzzy matcher returns canonical tokens with 'static lifetime in FuzzyCorrection::token.
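
A minimal sketch of an implementor that holds its vocabulary on self follows. `OwnedVocabTokenSet` and its constructor are hypothetical, the trait is re-declared locally so the block compiles standalone, and the exact-match `canonicalize` is an assumption chosen to keep the example short.

```rust
// Local copy of the trait as declared on this page.
pub trait TokenSet: Send + Sync {
    fn canonicalize(&self, token: &str) -> Option<&'static str>;
    fn is_trigraph(&self, token: &str) -> bool;
    fn correction_vocab(&self) -> &[&'static str] {
        &[]
    }
}

/// Hypothetical implementor: the Vec is owned by the value,
/// while each entry is still a &'static str.
pub struct OwnedVocabTokenSet {
    vocab: Vec<&'static str>,
}

impl OwnedVocabTokenSet {
    /// Builds the vocabulary once, sorted and deduplicated.
    pub fn new(tokens: &[&'static str]) -> Self {
        let mut vocab = tokens.to_vec();
        vocab.sort_unstable();
        vocab.dedup();
        Self { vocab }
    }
}

impl TokenSet for OwnedVocabTokenSet {
    fn canonicalize(&self, token: &str) -> Option<&'static str> {
        // Assumption: exact match only; returns the stored 'static entry.
        self.vocab.binary_search(&token).ok().map(|i| self.vocab[i])
    }

    fn is_trigraph(&self, _token: &str) -> bool {
        false // this sketch carries no trigraph table
    }

    fn correction_vocab(&self) -> &[&'static str] {
        &self.vocab // borrowed from self, no global static needed
    }
}
```

Because the slice borrows from self rather than being `'static` itself, the vocabulary can be assembled at runtime, for example from several static tables merged at construction.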

The default implementation returns an empty slice, disabling fuzzy correction for external TokenSet implementors that do not override it.
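
To make the default concrete, the sketch below implements only the required methods and observes the empty vocabulary. The trait is re-declared locally with `&[]` as the default body, which matches the description above but is not copied from the crate source. `RequiredOnly` and `fuzzy_enabled` are hypothetical names, and the emptiness check is only an assumption about how a caller might detect that fuzzy correction is off.

```rust
// Local copy of the trait as declared on this page.
pub trait TokenSet: Send + Sync {
    fn canonicalize(&self, token: &str) -> Option<&'static str>;
    fn is_trigraph(&self, token: &str) -> bool;
    fn correction_vocab(&self) -> &[&'static str] {
        &[] // assumed body: an empty slice, as the description states
    }
}

/// Hypothetical external implementor that does not override correction_vocab.
pub struct RequiredOnly;

impl TokenSet for RequiredOnly {
    fn canonicalize(&self, _token: &str) -> Option<&'static str> {
        None
    }

    fn is_trigraph(&self, _token: &str) -> bool {
        false
    }
}

/// Hypothetical caller-side check: an empty vocabulary leaves nothing to match against.
pub fn fuzzy_enabled(set: &dyn TokenSet) -> bool {
    !set.correction_vocab().is_empty()
}
```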

Implementors