Function tokenize 

pub fn tokenize(text: &str) -> Vec<String>

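Splits `text` on runs of bytes that are not ASCII alphanumeric and lowercases each resulting term. Every byte of a multi-byte UTF-8 character lies outside the ASCII range, so non-ASCII characters such as `é` also act as separators. An empty input, or one made up entirely of separators, returns an empty `Vec`.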
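The behavior described above can be sketched with the standard library alone. This is a minimal illustration, assuming `str::split` with a `char` predicate. It is not necessarily how this crate implements `tokenize`. Splitting on `char`s gives the same result as splitting on bytes here, because a non-ASCII character is never ASCII alphanumeric.

```rust
/// Sketch only: split on runs of non-ASCII-alphanumeric characters,
/// drop the empty pieces that consecutive separators produce,
/// and ASCII-lowercase each remaining term.
pub fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_ascii_alphanumeric())
        // `split` yields "" between adjacent separators and at the ends.
        .filter(|term| !term.is_empty())
        // Terms contain only ASCII alphanumerics, so ASCII lowercasing suffices.
        .map(|term| term.to_ascii_lowercase())
        .collect()
}
```

For example, `tokenize("Hello, World! 42")` yields `["hello", "world", "42"]`, and `tokenize("café")` yields `["caf"]` because `é` is treated as a separator.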

Tokens are returned as `String` rather than `&str` because the posting list owns its term strings (see `super::posting_list::PostingList`). Returning owned strings matches how the index stores terms, so the caller can move each token into the index directly and avoid a second allocation downstream.
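The ownership argument can be illustrated with a small sketch. `PostingList` here is a hypothetical stand-in with an assumed `term -> document ids` layout, and `add_document` is a hypothetical method. The real `super::posting_list::PostingList` API is not shown on this page. `tokenize` is re-sketched so the block compiles on its own.

```rust
use std::collections::HashMap;

/// Assumed behavior per the description: split on non-ASCII-alphanumerics, lowercase.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_ascii_lowercase())
        .collect()
}

/// Hypothetical stand-in for the crate's posting list; it owns its term keys.
#[derive(Default)]
struct PostingList {
    postings: HashMap<String, Vec<u32>>,
}

impl PostingList {
    /// Hypothetical method: record that `doc_id` contains each token.
    fn add_document(&mut self, doc_id: u32, tokens: Vec<String>) {
        for term in tokens {
            // The owned `String` is moved in as the key: no clone, no new allocation.
            let ids = self.postings.entry(term).or_default();
            // Skip repeats of the same term within this document.
            if ids.last() != Some(&doc_id) {
                ids.push(doc_id);
            }
        }
    }
}
```

Had `tokenize` returned `&str` slices, each term would need a `to_owned()` at this call site before it could become a key. Lowercasing also produces new text whenever the input contains uppercase letters, so borrowed slices of the input could not represent those terms.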