Skip to main content

tokenize

Function tokenize 

Source
pub fn tokenize(text: &str) -> Vec<String>
Expand description

Split text into lowercase identifier-like tokens for BM25 indexing.

Compound identifiers (camelCase, PascalCase, snake_case) are expanded into sub-tokens so that partial matches work. The original compound token is preserved for exact-match boosting.

§Examples

"getHTTPResponse from MyClass" → tokens include gethttpresponse, get, http, response, from, myclass, my, class.