Crate kham_core

§kham-core

A pure-Rust Thai word segmentation engine. Compatible with no_std (requires alloc).

§Quick start

use kham_core::Tokenizer;

let tokenizer = Tokenizer::new();
// Segment a Thai sentence into tokens.
let tokens = tokenizer.segment("กินข้าวกับปลา");
// Print each token's text together with its kind.
for token in &tokens {
    println!("{} ({:?})", token.text, token.kind);
}

§Re-exports

pub use error::KhamError;
pub use segmenter::Tokenizer;
pub use segmenter::TokenizerBuilder;
pub use token::Token;
pub use token::TokenKind;

§Modules

dict
Dictionary backed by a Double-Array Trie (DARTS).
error
Error types for kham-core.
freq
Word frequency table built from the Thai National Corpus (TNC).
normalizer
Thai text normalizer.
pre_tokenizer
Unicode script classifier and pre-tokenizer.
segmenter
DAG-based maximal matching segmenter (newmm algorithm).
tcc
Thai Character Cluster (TCC) boundary detection.
token
Token types returned by the segmenter.
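
§Sketch: segmentation over a DAG of dictionary matches

The segmenter module is described as DAG-based maximal matching. The sketch below illustrates that general idea only and is not kham-core's implementation: every dictionary word found at a character position becomes an edge in a DAG, and the path with the fewest tokens is chosen. The function name segment_fewest_tokens, the HashSet standing in for the Double-Array Trie, and the single-character fallback for unknown text are all assumptions made for illustration. TCC boundaries and word frequencies, which the crate's other modules suggest may also play a role, are left out.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of kham_core): split `text` into the fewest
/// tokens, using words from `dict` and single chars for unknown text.
fn segment_fewest_tokens<'a>(text: &'a str, dict: &HashSet<&str>) -> Vec<&'a str> {
    // Byte offsets of every char boundary, including the end of the string.
    let bounds: Vec<usize> = text
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(text.len()))
        .collect();
    let n = bounds.len() - 1; // number of chars in `text`

    // best[i] = (token count, next boundary index) for the suffix at char i.
    let mut best: Vec<(usize, usize)> = vec![(usize::MAX, 0); n + 1];
    best[n] = (0, n);

    for i in (0..n).rev() {
        // Fallback edge: one unknown char, so a path always exists.
        let mut choice = (best[i + 1].0 + 1, i + 1);
        for j in (i + 1)..=n {
            let candidate = &text[bounds[i]..bounds[j]];
            if dict.contains(candidate) {
                let cost = best[j].0 + 1;
                // Prefer fewer tokens; on ties prefer the longer word.
                if cost < choice.0 || (cost == choice.0 && j > choice.1) {
                    choice = (cost, j);
                }
            }
        }
        best[i] = choice;
    }

    // Follow the chosen edges from the start to recover the tokens.
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < n {
        let j = best[i].1;
        tokens.push(&text[bounds[i]..bounds[j]]);
        i = j;
    }
    tokens
}
```

With a dictionary containing กิน, ข้าว, กับ and ปลา, calling segment_fewest_tokens on the Quick start sentence yields those four words in order. The scan over all end positions is quadratic in the number of characters; a trie lookup that enumerates matching prefixes would avoid testing every substring.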