tokenize

Function tokenize 

pub fn tokenize(
    text: &str,
    algorithm: Algorithm,
    case_sensitive: bool,
) -> Result<Vec<Token>, Error>

Tokenizes text into a Vec of Tokens.

§Parameters

  • text - the text to tokenize.
  • algorithm - the tokenization algorithm to use.
  • case_sensitive - whether to preserve the original case; when false, all tokens are lowercased. Applies only to non-CJK, non-Southeast-Asian algorithms.
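What case_sensitive = false implies can be illustrated with a minimal, stdlib-only sketch; plain whitespace splitting here is a stand-in for the crate's actual algorithms, not its implementation:

```rust
// Sketch of the lowercasing step implied by `case_sensitive = false`:
// every produced token is lowercased. Whitespace splitting is only a
// stand-in for the real tokenization algorithms.
fn main() {
    let text = "Zoomer Slang ROCKS";
    let tokens: Vec<String> = text
        .split_whitespace()
        .map(str::to_lowercase)
        .collect();
    assert_eq!(tokens, vec!["zoomer", "slang", "rocks"]);
}
```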

§Returns

  • Vec<Token> if a tokenizer for algorithm was found.
  • Error otherwise.

§Errors

  • Error::NoTokenizer - no tokenizer was found for algorithm. No tokenizers are enabled by default; you need to explicitly enable the ones you want via cargo features.
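One way to handle this error is sketched below. Note that the Error enum and tokenize_stub function here are self-contained stand-ins written for illustration, not the crate's actual items:

```rust
// Stand-in error type mirroring the crate's `Error::NoTokenizer` variant.
#[derive(Debug, PartialEq)]
enum Error {
    NoTokenizer,
}

// Simulates calling `tokenize` in a build with no tokenizer features enabled.
fn tokenize_stub(_text: &str) -> Result<Vec<String>, Error> {
    Err(Error::NoTokenizer)
}

fn main() {
    match tokenize_stub("some text") {
        Ok(tokens) => println!("{tokens:?}"),
        Err(Error::NoTokenizer) => {
            // Recoverable at the call site: report which cargo feature to enable.
            eprintln!("no tokenizer enabled; turn one on via cargo features");
        }
    }
}
```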

§Example

use language_tokenizer::{tokenize, Algorithm};

let text = "that's someone who can rizz just like a skibidi! zoomer slang rocks, 67";
let tokens = tokenize(text, Algorithm::English, false).unwrap();

assert_eq!(tokens, vec!["that", "someon", "who", "can", "rizz", "just", "like", "a", "skibidi", "zoomer", "slang", "rock", "67"]);