pub fn default_tokenize(text: &str) -> Vec<String>
Tokenize text into lowercase words, stripping punctuation.
Splits on whitespace, removes non-alphanumeric characters from token boundaries, and filters empty tokens.
§Examples
use scry_learn::text::tokenizer::default_tokenize;
let tokens = default_tokenize("Hello, World! It's a test.");
assert_eq!(tokens, vec!["hello", "world", "it's", "a", "test"]);
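The documented behavior (whitespace split, trimming non-alphanumeric characters from token boundaries, lowercasing, dropping empty tokens) could be sketched as follows. This is a hypothetical re-implementation for illustration, not the crate's actual source:

```rust
/// Hypothetical sketch of the documented behavior, not scry_learn's real code:
/// split on whitespace, trim non-alphanumeric chars from both ends of each
/// token, lowercase it, and discard tokens that end up empty.
fn default_tokenize(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|word| {
            word.trim_matches(|c: char| !c.is_alphanumeric())
                .to_lowercase()
        })
        .filter(|token| !token.is_empty())
        .collect()
}

fn main() {
    // Interior punctuation like the apostrophe in "It's" survives because
    // trim_matches only strips from the ends of each token.
    let tokens = default_tokenize("Hello, World! It's a test.");
    println!("{:?}", tokens);
}
```

Note that `trim_matches` only removes characters from the ends of a token, which is why interior punctuation such as the apostrophe in `it's` is preserved, matching the example above.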