Struct FtsTokenizer 

pub struct FtsTokenizer { /* private fields */ }

Full-text search tokenizer for Thai text.

Wraps Tokenizer with stopword filtering, synonym expansion, and n-gram generation for out-of-vocabulary tokens.

Construct once and reuse:

use kham_core::fts::FtsTokenizer;

let fts = FtsTokenizer::new();
let tokens = fts.segment_for_fts("กินข้าวกับปลา");
assert!(!tokens.is_empty());
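
One way to follow the "construct once" advice is a lazily initialised shared instance. The sketch below uses `std::sync::OnceLock` with a hypothetical `StandInTokenizer` in place of `FtsTokenizer`, so it compiles without the crate. It assumes the real type is `Sync`, which this page does not state. `BUILDS` and `build_count` are illustrative helpers that only make the single construction observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts constructions so that reuse is observable (illustration only).
static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `FtsTokenizer`; not the real type.
pub struct StandInTokenizer {
    stopwords: Vec<&'static str>,
}

impl StandInTokenizer {
    /// Pretend-expensive constructor (the real one loads built-in stopwords).
    pub fn new() -> Self {
        BUILDS.fetch_add(1, Ordering::SeqCst);
        Self { stopwords: vec!["กับ"] }
    }

    pub fn is_stopword(&self, word: &str) -> bool {
        self.stopwords.iter().any(|s| *s == word)
    }
}

/// Returns the shared instance, constructing it on the first call only.
pub fn shared() -> &'static StandInTokenizer {
    static INSTANCE: OnceLock<StandInTokenizer> = OnceLock::new();
    INSTANCE.get_or_init(StandInTokenizer::new)
}

/// Number of times the constructor has run.
pub fn build_count() -> usize {
    BUILDS.load(Ordering::SeqCst)
}
```

With the real type, the body of `shared` would call `FtsTokenizer::new()` instead, provided the `Sync` assumption holds.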

Implementations

impl FtsTokenizer

pub fn new() -> Self

Create an FtsTokenizer with built-in stopwords and no synonyms.

pub fn builder() -> FtsTokenizerBuilder

Return an FtsTokenizerBuilder for custom configuration.

pub fn segment_for_fts(&self, text: &str) -> Vec<FtsToken>

Segment text and annotate each token for FTS indexing.

Normalises the input text before segmentation so that สระลอย (floating vowels) and stacked tone marks are handled correctly. Whitespace tokens are excluded.

The returned Vec<FtsToken> covers all non-whitespace tokens. Call index_tokens instead when you only need the tokens to be indexed (stopwords excluded).

pub fn index_tokens(&self, text: &str) -> Vec<FtsToken>

Return only the tokens to be written into a search index.

Filters out stopwords and whitespace. Each FtsToken still carries its original position so phrase-distance scoring remains correct.
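
Why the original position matters can be seen in a small self-contained sketch. `Tok` is a hypothetical stand-in for `FtsToken` (its real fields are not shown on this page), and the input is assumed to be pre-segmented. The sketch also assumes positions are counted over non-whitespace tokens; the crate may number them differently.

```rust
/// Hypothetical stand-in for `FtsToken`; the real fields are not documented here.
#[derive(Debug, Clone, PartialEq)]
pub struct Tok {
    pub text: String,
    pub position: usize,
}

/// Number the non-whitespace tokens first, then drop stopwords.
/// Because numbering happens before filtering, surviving tokens keep
/// the gaps left by removed stopwords.
pub fn index_tokens_sketch(words: &[&str], stopwords: &[&str]) -> Vec<Tok> {
    words
        .iter()
        // Whitespace tokens never receive a position (assumption).
        .filter(|w| !w.trim().is_empty())
        .enumerate()
        .map(|(position, w)| Tok {
            text: (*w).to_owned(),
            position,
        })
        // Stopwords are removed only after positions are assigned.
        .filter(|t| !stopwords.iter().any(|s| *s == t.text.as_str()))
        .collect()
}
```

For the pre-segmented input `["กิน", "ข้าว", " ", "กับ", "ปลา"]` with `กับ` as the only stopword, the surviving tokens carry positions 0, 1, and 3. The gap at 2 lets a phrase query see that `ข้าว` and `ปลา` were two positions apart in the source text.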

pub fn lexemes(&self, text: &str) -> Vec<String>

Collect all lexeme strings to be stored in a tsvector.

Returns one string per non-stop token, plus synonym expansions and trigrams for unknown tokens. Duplicates are not removed (the caller or PostgreSQL handles deduplication).
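
The trigram fallback and caller-side deduplication can be sketched without the crate. Both functions below are hypothetical helpers, not part of `kham_core`. `char_trigrams` windows over Unicode scalar values, which is an assumption; the crate may window over grapheme clusters or bytes and may treat short tokens differently.

```rust
/// Sliding three-character windows over a token's Unicode scalar values.
/// Tokens shorter than three scalars yield no trigrams in this sketch.
pub fn char_trigrams(token: &str) -> Vec<String> {
    let chars: Vec<char> = token.chars().collect();
    chars
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Sort and deduplicate lexemes before storing them, for callers that
/// prefer not to rely on the database to do it.
pub fn dedup_lexemes(mut lexemes: Vec<String>) -> Vec<String> {
    lexemes.sort();
    lexemes.dedup(); // removes adjacent duplicates, hence the sort first
    lexemes
}
```

Sorting before `dedup` changes the order of the lexemes. That is harmless when the target is a `tsvector`, which stores distinct lexemes in sorted order, but callers that need the original order should deduplicate with a set instead.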

Trait Implementations

impl Default for FtsTokenizer

fn default() -> Self

Returns the “default value” for a type.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.