Function tokenize_as_ids 

pub fn tokenize_as_ids(
    text: &str,
    dictionary: &TokenDictionary,
) -> Vec<QueryToken>

Tokenizes text and returns QueryTokens directly, avoiding string allocation.

This is the primary tokenization function for query processing. Tokens are classified against the dictionary immediately:

  • Known tokens → QueryToken::Known(KnownToken)
  • Unknown tokens → QueryToken::Unknown
  • Stopwords → QueryToken::Stopword

Returns

A vector of QueryTokens (no string allocation).
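
The three-way classification described above can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the real `TokenDictionary`, `KnownToken`, and `QueryToken` definitions are not shown on this page, so the types below are hypothetical stand-ins, the function is named `tokenize_as_ids_sketch` to keep it distinct from the crate's own, and splitting on whitespace with a stopword check before the dictionary lookup is a guess at the behavior, not the documented algorithm.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the crate's `KnownToken`: here just a numeric id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KnownToken {
    pub id: u32,
}

/// Hypothetical stand-in mirroring the three variants listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryToken {
    Known(KnownToken),
    Unknown,
    Stopword,
}

/// Hypothetical dictionary: a word-to-id map plus a stopword set.
pub struct TokenDictionary {
    ids: HashMap<String, u32>,
    stopwords: HashSet<String>,
}

impl TokenDictionary {
    /// Assigns ids to `words` in order, starting at 0.
    pub fn new(words: &[&str], stopwords: &[&str]) -> Self {
        let ids = words
            .iter()
            .enumerate()
            .map(|(i, w)| (w.to_string(), i as u32))
            .collect();
        let stopwords = stopwords.iter().map(|w| w.to_string()).collect();
        Self { ids, stopwords }
    }
}

/// Sketch of the classification step. Each token is a borrowed `&str` slice
/// of `text` that is looked up and discarded, so no per-token `String` is
/// created; only the output vector is allocated.
pub fn tokenize_as_ids_sketch(text: &str, dictionary: &TokenDictionary) -> Vec<QueryToken> {
    text.split_whitespace() // assumption: tokens are whitespace-separated
        .map(|word| {
            if dictionary.stopwords.contains(word) {
                QueryToken::Stopword
            } else if let Some(&id) = dictionary.ids.get(word) {
                QueryToken::Known(KnownToken { id })
            } else {
                QueryToken::Unknown
            }
        })
        .collect()
}
```

With a dictionary built as `TokenDictionary::new(&["rust", "tokenizer"], &["the"])`, calling the sketch on `"the rust parser"` classifies the three words as a stopword, a known token with id 0, and an unknown token, in that order. The sketch does no case folding or punctuation handling, which a real tokenizer would likely need.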