tokenize

Function tokenize 

pub fn tokenize(
    text: &str,
    algorithm: Algorithm,
    case_sensitive: bool,
) -> Result<Vec<Token>, Error>

Tokenizes text into a Vec of Tokens.

§Parameters

  • text - the text to tokenize.
  • algorithm - the tokenization algorithm to use.
  • case_sensitive - whether to preserve the original case; when false, all tokens are lowercased. Applies only to non-CJK, non-Southeast-Asian algorithms.
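What case_sensitive = false implies can be illustrated with a minimal, stdlib-only sketch; plain whitespace splitting here is a stand-in for the crate's actual algorithms, not its implementation:

```rust
// Sketch of the lowercasing step implied by `case_sensitive = false`:
// every produced token is lowercased. Whitespace splitting is only a
// stand-in for the real tokenization algorithms.
fn main() {
    let text = "Zoomer Slang ROCKS";
    let tokens: Vec<String> = text
        .split_whitespace()
        .map(str::to_lowercase)
        .collect();
    assert_eq!(tokens, vec!["zoomer", "slang", "rocks"]);
}
```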

§Returns

  • Vec<Token> if a tokenizer for algorithm was found.
  • Error otherwise.

§Errors

  • Error::NoTokenizer - no tokenizer was found for algorithm. No tokenizers are enabled by default; you need to explicitly enable the ones you want via cargo features.
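One way to handle this error is sketched below. Note that the Error enum and tokenize_stub function here are self-contained stand-ins written for illustration, not the crate's actual items:

```rust
// Stand-in error type mirroring the crate's `Error::NoTokenizer` variant.
#[derive(Debug, PartialEq)]
enum Error {
    NoTokenizer,
}

// Simulates calling `tokenize` in a build with no tokenizer features enabled.
fn tokenize_stub(_text: &str) -> Result<Vec<String>, Error> {
    Err(Error::NoTokenizer)
}

fn main() {
    match tokenize_stub("some text") {
        Ok(tokens) => println!("{tokens:?}"),
        Err(Error::NoTokenizer) => {
            // Recoverable at the call site: report which cargo feature to enable.
            eprintln!("no tokenizer enabled; turn one on via cargo features");
        }
    }
}
```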

§Example

use language_tokenizer::{tokenize, Algorithm};

let text = "that's someone who can rizz just like a skibidi! zoomer slang rocks, 67";
let tokens = tokenize(text, Algorithm::English, false).unwrap();

assert_eq!(tokens, vec!["that", "someon", "who", "can", "rizz", "just", "like", "a", "skibidi", "zoomer", "slang", "rock", "67"]);