
Function default_tokenize 

pub fn default_tokenize(text: &str) -> Vec<String>

Tokenizes text into lowercase words, stripping punctuation.

Splits on whitespace, trims non-alphanumeric characters from the start and end of each token (interior punctuation such as apostrophes is preserved), and filters out tokens that become empty.
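The described behavior can be sketched as follows. This is a minimal re-implementation for illustration only: `default_tokenize_sketch` is a hypothetical name, and the crate's actual implementation may differ in details.

```rust
// Illustrative sketch of the documented behavior; not the crate's source.
fn default_tokenize_sketch(text: &str) -> Vec<String> {
    text.split_whitespace()
        // Trim non-alphanumeric characters from both token boundaries,
        // leaving interior punctuation (e.g. the apostrophe in "it's") intact.
        .map(|tok| {
            tok.trim_matches(|c: char| !c.is_alphanumeric())
                .to_lowercase()
        })
        // Drop tokens that were pure punctuation and trimmed to nothing.
        .filter(|tok| !tok.is_empty())
        .collect()
}
```

Note that boundary trimming (rather than removing all punctuation) is what keeps contractions like "it's" as a single token, as shown in the example below.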

§Examples

use scry_learn::text::tokenizer::default_tokenize;

let tokens = default_tokenize("Hello, World! It's a test.");
assert_eq!(tokens, vec!["hello", "world", "it's", "a", "test"]);