Re-exports
- pub use error::KhamError;
- pub use segmenter::Tokenizer;
- pub use segmenter::TokenizerBuilder;
- pub use token::Token;
- pub use token::TokenKind;
Modules
- dict
- Dictionary backed by a Double-Array Trie (DARTS).
- error
- Error types for kham-core.
- freq
- Word frequency table built from the Thai National Corpus (TNC).
- fts
- Full-text search pipeline for Thai text.
- ngram
- Character-level and token-level n-gram generation for Thai FTS.
- normalizer
- Thai text normalizer.
- pre_tokenizer
- Unicode script classifier and pre-tokenizer.
- segmenter
- DAG-based maximal matching segmenter (newmm algorithm).
- stopwords
- Thai stopword filter.
- synonym
- Synonym expansion for Thai full-text search.
- tcc
- Thai Character Cluster (TCC) boundary detection.
- token
- Token types returned by the segmenter.
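
The segmenter module above is described as DAG-based maximal matching. As a rough illustration of that idea only, the sketch below treats each dictionary match as an edge between character positions and picks the path with the fewest tokens, preferring longer words on ties. It does not use the kham-core API: the function name `segment` and the `HashSet` dictionary are hypothetical stand-ins, and an actual newmm-style segmenter would typically look words up in a trie and respect TCC boundaries rather than scanning every substring.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of kham-core): split `text` into the
/// fewest tokens, where a token is either a dictionary word or a single
/// unknown character.
fn segment(text: &str, dict: &HashSet<&str>) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let n = chars.len();

    // best[i] = (number of tokens needed for chars[i..], end of first token)
    let mut best: Vec<(usize, usize)> = vec![(0, n); n + 1];

    // Fill the table from the end of the text back to the start.
    for i in (0..n).rev() {
        // Fallback edge: emit one unknown character.
        let mut choice = (best[i + 1].0 + 1, i + 1);
        for j in (i + 1)..=n {
            let cand: String = chars[i..j].iter().collect();
            if dict.contains(cand.as_str()) {
                let cost = best[j].0 + 1;
                // `<=` lets a longer word win when token counts tie.
                if cost <= choice.0 {
                    choice = (cost, j);
                }
            }
        }
        best[i] = choice;
    }

    // Walk the chosen edges to recover the tokens.
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < n {
        let j = best[i].1;
        out.push(chars[i..j].iter().collect::<String>());
        i = j;
    }
    out
}
```

With a toy dictionary containing ไป, โรง, เรียน, and โรงเรียน, calling `segment("ไปโรงเรียน", &dict)` yields the two tokens ไป and โรงเรียน, because the two-token path beats the three-token path through โรง and เรียน.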