Re-exports
- pub use error::KhamError;
- pub use segmenter::Tokenizer;
- pub use segmenter::TokenizerBuilder;
- pub use token::NamedEntityKind;
- pub use token::Token;
- pub use token::TokenKind;
Modules
- abbrev
- Thai abbreviation expansion.
- date
- Thai date normalization.
- dict
- Dictionary backed by a Double-Array Trie (DARTS).
- error
- Error types for kham-core.
- freq
- Word frequency table built from the Thai National Corpus (TNC).
- fts
- Full-text search pipeline for Thai text.
- ne
- Named entity tagging via a gazetteer (word-list approach).
- ngram
- Character-level and token-level n-gram generation for Thai FTS.
- normalizer
- Thai text normalizer.
- number
- Thai number normalization.
- pos
- Part-of-speech tagging for Thai words.
- pre_tokenizer
- Unicode script classifier and pre-tokenizer.
- romanizer
- RTGS romanization of segmented Thai words.
- segmenter
- DAG-based maximal matching segmenter (newmm algorithm).
- sentence
- Thai sentence segmentation.
- stopwords
- Thai stopword filter.
- synonym
- Synonym expansion for Thai full-text search.
- tcc
- Thai Character Cluster (TCC) boundary detection.
- token
- Token types returned by the segmenter.
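The `ngram` module entry above mentions character-level and token-level n-grams for Thai FTS. The following is a minimal, self-contained sketch of what those two kinds of n-grams look like, assuming a "character" is a Rust `char` (Unicode scalar value) and that token n-grams are formed by joining adjacent tokens with no separator. The helpers `char_ngrams` and `token_ngrams` are hypothetical and are not kham-core's API. The real module may work differently, for example by respecting TCC boundaries so that Thai combining marks are never split from their base consonant.

```rust
/// Hypothetical helper (not kham-core's API): overlapping character n-grams,
/// where a "character" is a Rust `char` (Unicode scalar value).
fn char_ngrams(text: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    // No n-grams for n == 0 (windows(0) would panic) or text shorter than n.
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    // Each window of n consecutive chars becomes one n-gram string.
    chars.windows(n).map(|w| w.iter().collect::<String>()).collect()
}

/// Hypothetical helper: token n-grams, joining adjacent tokens with no
/// separator (Thai is written without spaces between words).
fn token_ngrams(tokens: &[&str], n: usize) -> Vec<String> {
    if n == 0 || tokens.len() < n {
        return Vec::new();
    }
    tokens.windows(n).map(|w| w.concat()).collect()
}

fn main() {
    // "แมว" is three chars: แ, ม, ว.
    println!("{:?}", char_ngrams("แมว", 2));
    println!("{:?}", token_ngrams(&["กิน", "ข้าว", "แล้ว"], 2));
}
```

Character n-grams give recall when segmentation is wrong; token n-grams give phrase-level precision. An FTS index would typically store both.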
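The `segmenter` entry describes DAG-based maximal matching (the newmm algorithm). The idea can be illustrated with a simplified sketch. Every dictionary word that starts at a position is an edge in a DAG over character positions, and dynamic programming picks the path to the end of the text with the fewest tokens. This is an assumption-laden illustration, not kham-core's implementation. It uses a plain `HashSet` instead of the crate's Double-Array Trie, works on `char` positions rather than TCC boundaries, and treats any character at which no dictionary word starts as a single-char unknown token. The function `segment` is hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical sketch of maximal matching: fewest-token path through the
/// word DAG. Not kham-core's API.
fn segment(text: &str, dict: &HashSet<&str>) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let n = chars.len();
    // best[i] = Some((fewest tokens covering chars[..i], start of last token)).
    let mut best: Vec<Option<(usize, usize)>> = vec![None; n + 1];
    best[0] = Some((0, 0));
    for start in 0..n {
        // Skip positions that no path reaches.
        let cost = match best[start] {
            Some((c, _)) => c,
            None => continue,
        };
        // DAG edges: every dictionary word beginning at `start`.
        let mut ends: Vec<usize> = (start + 1..=n)
            .filter(|&end| {
                let word: String = chars[start..end].iter().collect();
                dict.contains(word.as_str())
            })
            .collect();
        // Fallback edge so an unknown character never blocks the path.
        if ends.is_empty() {
            ends.push(start + 1);
        }
        for end in ends {
            // Relax: keep whichever path reaches `end` with fewer tokens.
            if best[end].map_or(true, |(c, _)| cost + 1 < c) {
                best[end] = Some((cost + 1, start));
            }
        }
    }
    // Walk back from the end of the text to recover the tokens.
    let mut tokens: Vec<String> = Vec::new();
    let mut end = n;
    while end > 0 {
        let (_, start) = best[end].expect("backtracked position was reached");
        tokens.push(chars[start..end].iter().collect());
        end = start;
    }
    tokens.reverse();
    tokens
}

fn main() {
    let dict: HashSet<&str> = ["ไป", "ทำ", "งาน", "ทำงาน"].into_iter().collect();
    // Prefers ไป|ทำงาน (2 tokens) over ไป|ทำ|งาน (3 tokens).
    println!("{:?}", segment("ไปทำงาน", &dict));
}
```

Building a `String` for every candidate span is quadratic. A trie such as the one in the `dict` module can instead enumerate all dictionary prefixes starting at a position in a single walk.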
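The `number` entry only says "Thai number normalization". One plausible piece of that job, stated here as an assumption about its scope, is mapping Thai digits ๐ to ๙ (U+0E50 to U+0E59) onto ASCII `0` to `9` so that numbers index and match consistently. The sketch below shows only that mapping. The function `thai_digits_to_ascii` is hypothetical, and the real module may also handle spelled-out Thai number words.

```rust
/// Hypothetical helper (not kham-core's API): replace Thai digits
/// U+0E50..=U+0E59 with ASCII digits, leaving every other char unchanged.
fn thai_digits_to_ascii(s: &str) -> String {
    s.chars()
        .map(|c| match c {
            // Thai digits are contiguous, so the offset from ๐ is the value.
            '\u{0E50}'..='\u{0E59}' => char::from(b'0' + (c as u32 - 0x0E50) as u8),
            _ => c,
        })
        .collect()
}

fn main() {
    println!("{}", thai_digits_to_ascii("พ.ศ. ๒๕๖๗")); // prints "พ.ศ. 2567"
}
```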