pub struct FtsTokenizer { /* private fields */ }
Full-text search tokenizer for Thai text.
Wraps Tokenizer with stopword filtering, synonym expansion, and n-gram
generation for out-of-vocabulary tokens.
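The n-gram fallback mentioned above can be illustrated with a minimal sketch. The helper name `char_ngrams` is hypothetical, and the sketch assumes that n-grams are taken over Unicode scalar values; the crate's actual n-gram size and unit are not documented on this page. A plain sliding window can separate a Thai combining mark from its base consonant, so a production implementation may well window over larger units.

```rust
/// Hypothetical helper for illustration only; not part of kham_core.
/// Returns all overlapping n-grams of `token`, counted in Unicode
/// scalar values (chars) rather than bytes, so multi-byte Thai text
/// is never split in the middle of a code point.
fn char_ngrams(token: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = token.chars().collect();
    // No n-grams exist for n == 0 or for tokens shorter than n.
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    // Slide a window of n chars across the token and rebuild each
    // window into an owned String.
    chars
        .windows(n)
        .map(|window| window.iter().collect::<String>())
        .collect()
}
```

Indexing the n-grams of an unknown token lets a substring of that token still match at query time, at the cost of a larger index.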
Construct once and reuse:
use kham_core::fts::FtsTokenizer;
let fts = FtsTokenizer::new();
let tokens = fts.segment_for_fts("กินข้าวกับปลา");
assert!(!tokens.is_empty());
Implementations
impl FtsTokenizer
pub fn new() -> Self
Create an FtsTokenizer with built-in stopwords and no synonyms.
pub fn builder() -> FtsTokenizerBuilder
Return an FtsTokenizerBuilder for custom configuration.
pub fn segment_for_fts(&self, text: &str) -> Vec<FtsToken>
Segment text and annotate each token for FTS indexing.
Normalises the input text before segmentation so that สระลอย (floating vowels) and stacked tone marks are handled correctly. Whitespace tokens are excluded.
The returned Vec<FtsToken> contains every non-whitespace token, stopwords included. Call
index_tokens instead when you only need the tokens to be indexed
(stopwords excluded).
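The difference between the full token list and the index-only list can be sketched as follows. `SketchToken` and its `is_stop` flag are stand-ins invented for this illustration; the real fields of FtsToken are not shown on this page, and index_tokens may do more than filter stopwords.

```rust
/// Stand-in for an annotated token. The field names are assumptions
/// made for this sketch, not the actual layout of FtsToken.
#[derive(Debug, Clone, PartialEq)]
struct SketchToken {
    text: String,
    is_stop: bool,
}

/// Keep only the tokens that should reach the index, i.e. drop every
/// token flagged as a stopword. The input order is preserved.
fn index_only(tokens: &[SketchToken]) -> Vec<&SketchToken> {
    tokens.iter().filter(|token| !token.is_stop).collect()
}

/// Build a small annotated token list for "กินข้าวกับปลา", marking
/// "กับ" ("with") as a stopword. The segmentation is hard-coded here;
/// in practice it would come from the tokenizer.
fn sample_tokens() -> Vec<SketchToken> {
    [("กิน", false), ("ข้าว", false), ("กับ", true), ("ปลา", false)]
        .iter()
        .map(|(text, is_stop)| SketchToken {
            text: text.to_string(),
            is_stop: *is_stop,
        })
        .collect()
}
```

Keeping stopwords in the annotated list is useful when token positions or highlighting need the complete sequence, while the filtered view is what gets written to the index.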
Trait Implementations
Auto Trait Implementations
impl Freeze for FtsTokenizer
impl RefUnwindSafe for FtsTokenizer
impl Send for FtsTokenizer
impl Sync for FtsTokenizer
impl Unpin for FtsTokenizer
impl UnsafeUnpin for FtsTokenizer
impl UnwindSafe for FtsTokenizer
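Because the type is both Send and Sync, a single instance can be shared by reference across threads, which fits the "construct once and reuse" advice. The sketch below shows one way to do that with std::sync::OnceLock. `StandInTokenizer` and its `is_stopword` method are hypothetical placeholders so that the example compiles without the crate; with the real type, the same pattern would presumably wrap FtsTokenizer::new().

```rust
use std::sync::OnceLock;
use std::thread;

/// Placeholder type for this sketch; it only models "expensive to
/// build, cheap to query" and is not the kham_core tokenizer.
struct StandInTokenizer {
    stopwords: Vec<&'static str>,
}

impl StandInTokenizer {
    fn new() -> Self {
        StandInTokenizer { stopwords: vec!["กับ", "และ"] }
    }

    fn is_stopword(&self, word: &str) -> bool {
        self.stopwords.iter().any(|s| *s == word)
    }
}

/// Lazily build one process-wide instance and hand out shared
/// references to it. OnceLock guarantees the initializer runs once.
fn shared() -> &'static StandInTokenizer {
    static INSTANCE: OnceLock<StandInTokenizer> = OnceLock::new();
    INSTANCE.get_or_init(StandInTokenizer::new)
}

/// Count stopwords across several batches, one thread per batch.
/// Every thread reads the same shared instance; this requires the
/// instance type to be Sync.
fn count_stopwords_in_parallel(batches: Vec<Vec<&'static str>>) -> usize {
    let handles: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            thread::spawn(move || {
                batch
                    .into_iter()
                    .filter(|word| shared().is_stopword(word))
                    .count()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

An Arc passed to each thread would work equally well; the static is simply the shortest way to show a single shared instance.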
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.