Text tokenization utilities.
Zero-dependency tokenizer that splits text at whitespace and punctuation boundaries, normalizes to lowercase, and supports n-gram generation.
Functions
- default_tokenize - Tokenize text into lowercase words, stripping punctuation.
- ngrams - Generate n-grams from a list of tokens.
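The crate's source is not shown here, so the following is a minimal sketch of what the two documented functions might look like, assuming the signatures `default_tokenize(&str) -> Vec<String>` and `ngrams(&[String], usize) -> Vec<Vec<String>>` (these signatures are guesses based on the descriptions above, not the crate's actual API):

```rust
/// Tokenize text into lowercase words, stripping punctuation.
/// Splits at whitespace and ASCII punctuation boundaries; empty
/// fragments between consecutive separators are discarded.
fn default_tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| c.is_whitespace() || c.is_ascii_punctuation())
        .filter(|s| !s.is_empty())
        .map(|s| s.to_lowercase())
        .collect()
}

/// Generate n-grams from a list of tokens using a sliding window.
/// Returns an empty vector when n is 0 or exceeds the token count.
fn ngrams(tokens: &[String], n: usize) -> Vec<Vec<String>> {
    if n == 0 || tokens.len() < n {
        return Vec::new();
    }
    tokens.windows(n).map(|w| w.to_vec()).collect()
}

fn main() {
    let tokens = default_tokenize("Hello, world! Hello again.");
    assert_eq!(tokens, ["hello", "world", "hello", "again"]);

    // Four tokens yield three overlapping bigrams.
    let bigrams = ngrams(&tokens, 2);
    assert_eq!(bigrams.len(), 3);
    println!("{:?}", bigrams);
}
```

Note that `is_ascii_punctuation` only covers ASCII; a real zero-dependency implementation wanting full Unicode punctuation handling would need its own character tables, since Unicode category checks usually come from a crate.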