pub fn tokenize(
text: &str,
algorithm: Algorithm,
case_sensitive: bool,
) -> Result<Vec<Token>, Error>
Tokenizes text into a Vec of Tokens.
§Parameters
text - the text to tokenize.
algorithm - the algorithm to use.
case_sensitive - whether to preserve token case; if false, all tokens are lowercased. Only applies to non-CJK and non-Southeast-Asian algorithms.
§Returns
Vec<Token> if a tokenizer for algorithm was found.
Error otherwise.
§Errors
Error::NoTokenizer - no tokenizer was found. No tokenizers are enabled by default; you need to explicitly enable the desired ones with cargo features.
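Because every tokenizer sits behind a cargo feature, a caller may want to handle the missing-tokenizer case explicitly rather than unwrap. A minimal sketch using only the items documented above (the helper name tokens_or_empty is illustrative, and the snippet assumes the crate is a dependency with a tokenizer feature enabled):

```rust
use language_tokenizer::{tokenize, Algorithm, Token};

// Falls back to an empty token list if no tokenizer was compiled in
// (i.e. tokenize returned Err(Error::NoTokenizer)).
fn tokens_or_empty(text: &str) -> Vec<Token> {
    tokenize(text, Algorithm::English, false).unwrap_or_else(|_| Vec::new())
}
```

In a library context, propagating the Result with the ? operator is usually preferable to swallowing the error, so the downstream caller can report the missing feature.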
§Example
use language_tokenizer::{tokenize, Algorithm};
let text = "that's someone who can rizz just like a skibidi! zoomer slang rocks, 67";
let tokens = tokenize(text, Algorithm::English, false).unwrap();
assert_eq!(tokens, vec!["that", "someon", "who", "can", "rizz", "just", "like", "a", "skibidi", "zoomer", "slang", "rock", "67"]);