
Trait Tokenizer 

pub trait Tokenizer:
    Sized
    + Send
    + 'static {
    type Global: Send + 'static;

    // Required methods
    fn name() -> &'static CStr;
    fn new(global: &Self::Global, args: Vec<String>) -> Result<Self, Error>;
    fn tokenize<TKF>(
        &mut self,
        reason: TokenizeReason,
        text: &[u8],
        push_token: TKF,
    ) -> Result<(), Error>
       where TKF: FnMut(&[u8], Range<usize>, bool) -> Result<(), Error>;
}

Tokenizer

Required Associated Types§


type Global: Send + 'static

The type of the shared global data.

Required Methods§


fn name() -> &'static CStr

Provides the tokenizer's name.


fn new(global: &Self::Global, args: Vec<String>) -> Result<Self, Error>

Creates the Tokenizer.

After the Tokenizer instance is created, it can be accessed through the specified global data.

Called from xCreate; xCreate's azArg parameter is converted into a Vec, which is then passed to new.
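FTS5 forwards the whitespace-separated arguments from the `tokenize = '...'` option as the `args` Vec, and built-in tokenizers conventionally interpret them as key/value pairs (e.g. `remove_diacritics 1`). A minimal sketch of the argument parsing a `new` implementation might do; the `Config` struct and the `casefold` option are hypothetical, not part of this crate:

```rust
// Hypothetical configuration a `new` implementation could build from the
// `args` Vec that wraps xCreate's azArg. Option names are illustrative only.
#[derive(Debug, Default, PartialEq)]
struct Config {
    casefold: bool,
}

// Interpret `args` as alternating key/value pairs, the convention used by
// FTS5's built-in tokenizers such as unicode61.
fn parse_args(args: &[String]) -> Result<Config, String> {
    let mut cfg = Config::default();
    let mut it = args.iter();
    while let Some(key) = it.next() {
        let val = it
            .next()
            .ok_or_else(|| format!("missing value for `{key}`"))?;
        match key.as_str() {
            "casefold" => cfg.casefold = val == "1",
            other => return Err(format!("unknown option `{other}`")),
        }
    }
    Ok(cfg)
}

fn main() {
    let args = vec!["casefold".to_string(), "1".to_string()];
    assert_eq!(parse_args(&args), Ok(Config { casefold: true }));
}
```

A real `new` would store the parsed configuration on `Self` and return `Err` via the crate's `Error` type instead of `String`.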


fn tokenize<TKF>(
    &mut self,
    reason: TokenizeReason,
    text: &[u8],
    push_token: TKF,
) -> Result<(), Error>
where
    TKF: FnMut(&[u8], Range<usize>, bool) -> Result<(), Error>,

The actual tokenization implementation.

It should scan the text and invoke the push_token callback once for each token.

push_token's parameters are:

  • &[u8] - the token
  • Range<usize> - the token's byte range within the text
  • bool - corresponds to FTS5_TOKEN_COLOCATED
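The tokenize contract can be sketched with a standalone whitespace splitter that drives a callback of the same shape. This is a simplified sketch, not this crate's API: `Error` is stubbed as `()`, and `reason`/`&mut self` are omitted:

```rust
use std::ops::Range;

// Stand-in for the crate's Error type (assumption for this sketch).
type Error = ();

// Core of a whitespace tokenizer's `tokenize` body: for each run of
// non-whitespace bytes in `text`, call `push_token` with the token bytes,
// its byte range within `text`, and `false` (not a colocated token).
fn tokenize_whitespace<TKF>(text: &[u8], mut push_token: TKF) -> Result<(), Error>
where
    TKF: FnMut(&[u8], Range<usize>, bool) -> Result<(), Error>,
{
    let mut start = None;
    for (i, &b) in text.iter().enumerate() {
        if b.is_ascii_whitespace() {
            if let Some(s) = start.take() {
                push_token(&text[s..i], s..i, false)?;
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    // Flush a trailing token that runs to the end of the input.
    if let Some(s) = start {
        push_token(&text[s..], s..text.len(), false)?;
    }
    Ok(())
}

fn main() {
    let mut tokens = Vec::new();
    tokenize_whitespace(b"hello  fts5 world", |tok, range, _colocated| {
        tokens.push((String::from_utf8_lossy(tok).into_owned(), range));
        Ok(())
    })
    .unwrap();
    assert_eq!(tokens[0], ("hello".to_string(), 0..5));
    assert_eq!(tokens[1], ("fts5".to_string(), 7..11));
    assert_eq!(tokens[2], ("world".to_string(), 12..17));
}
```

Passing `true` as the third callback argument would mark the token as colocated with the previous one (FTS5_TOKEN_COLOCATED), which is how synonym expansion is implemented in FTS5.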

Dyn Compatibility§

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.

Implementors§