Module wana_kana::tokenize

Available on crate feature tokenize only.

Splits the input into an array of strings, starting a new token wherever the (opinionated) TokenType of the text changes.

tokenize_detailed returns an array of { TokenType, String } entries instead of plain Strings.

Example

use wana_kana::tokenize::*;
let empty: Vec<String> = vec![];
assert_eq!(tokenize(""), empty);
assert_eq!(tokenize("ふふフフ"), vec!["ふふ", "フフ"]);
assert_eq!(tokenize("感じ"), vec!["感", "じ"]);
assert_eq!(tokenize("私は悲しい"), vec!["私", "は", "悲", "しい"] );
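
The splitting behaviour in the example above can be approximated with the standard library alone. The following is a minimal sketch, not the crate's implementation: `Script`, `classify`, `sketch_tokenize_detailed` and `sketch_tokenize` are hypothetical names introduced here, and the classification uses only three Unicode block ranges (Hiragana, Katakana, CJK Unified Ideographs), which is an assumption and much coarser than the crate's TokenType.

```rust
// Hypothetical script classes for this sketch; NOT the crate's TokenType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Hiragana,
    Katakana,
    Kanji,
    Other,
}

// Classify a character by Unicode block (assumed ranges, deliberately coarse).
fn classify(c: char) -> Script {
    match c {
        '\u{3040}'..='\u{309F}' => Script::Hiragana,
        '\u{30A0}'..='\u{30FF}' => Script::Katakana,
        '\u{4E00}'..='\u{9FFF}' => Script::Kanji,
        _ => Script::Other,
    }
}

// Group consecutive characters of the same class into one token,
// keeping the class next to the text.
fn sketch_tokenize_detailed(input: &str) -> Vec<(Script, String)> {
    let mut tokens: Vec<(Script, String)> = Vec::new();
    for c in input.chars() {
        let script = classify(c);
        if let Some((last, text)) = tokens.last_mut() {
            if *last == script {
                // Same class as the previous character: extend the current token.
                text.push(c);
                continue;
            }
        }
        // First character, or the class changed: start a new token.
        tokens.push((script, c.to_string()));
    }
    tokens
}

// Same split, but with the class information dropped.
fn sketch_tokenize(input: &str) -> Vec<String> {
    sketch_tokenize_detailed(input)
        .into_iter()
        .map(|(_, text)| text)
        .collect()
}
```

For the inputs used in the example, this sketch produces the same splits, e.g. `sketch_tokenize("私は悲しい")` yields `["私", "は", "悲", "しい"]`. It makes no attempt to reproduce the crate's handling of romaji, numerals, punctuation or spaces, all of which fall into `Script::Other` here.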

Enums

The tokenizer assigns each token a TokenType.

Functions

Tokenizes the text. Splits the input into an array of strings separated by opinionated TokenType.
Tokenizes the text and returns each token together with its TokenType.
Tokenizes the text. Splits the input into an array of strings separated by opinionated TokenType.