pub fn tokenize(text: &str) -> Vec<String>
Tokenise on Unicode word boundaries — anything that is not an alphanumeric scalar value (or _) splits the token. Lowercases each emitted token.
_