Trait Tokenizer 

pub trait Tokenizer:
    Send
    + Sync
    + Debug {
    // Required method
    fn tokenize(&self, text: &str) -> Vec<String>;
}

Decompose a string into BM25 search terms.

Implementors choose the tokenization rule (whitespace + lowercase, BPE, sentencepiece, etc.); BM25Index indexes and searches using whatever tokens this method returns.
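As an illustration of the "whitespace + lowercase" rule mentioned above, here is a minimal sketch of an implementor. The trait is redeclared locally so the snippet compiles on its own; in real code it would be imported from the crate that defines it. The type name WhitespaceLowercase is hypothetical and not part of this crate's API.

```rust
use std::fmt::Debug;

// Local copy of the trait declaration shown above, so the sketch is
// self-contained. In real code, import the trait from its crate instead.
pub trait Tokenizer: Send + Sync + Debug {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

// Hypothetical implementor: split on Unicode whitespace, then lowercase
// each term. Debug is derived because the trait requires it.
#[derive(Debug, Default, Clone, Copy)]
pub struct WhitespaceLowercase;

impl Tokenizer for WhitespaceLowercase {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace()
            .map(|term| term.to_lowercase())
            .collect()
    }
}
```

With this sketch, WhitespaceLowercase.tokenize("Hello  BM25 World") yields the terms "hello", "bm25", "world"; runs of whitespace collapse, and an all-whitespace input yields an empty Vec. Punctuation stays attached to its term, which a production tokenizer would probably want to handle.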

The Send + Sync bounds let a BM25Index that carries a dyn Tokenizer be shared across threads. The Debug bound lets the index derive Debug without a manual impl.
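The effect of these supertrait bounds can be sketched with a stand-in type. ToyIndex below is a hypothetical placeholder, not the real BM25Index (whose fields are not shown on this page); it only assumes that the index stores its tokenizer as a boxed trait object. The trait is again redeclared locally so the snippet compiles on its own.

```rust
use std::fmt::Debug;
use std::sync::Arc;
use std::thread;

// Local copy of the trait declaration, for a self-contained sketch.
pub trait Tokenizer: Send + Sync + Debug {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

// Hypothetical implementor used only for this example.
#[derive(Debug)]
pub struct WhitespaceLowercase;

impl Tokenizer for WhitespaceLowercase {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(|t| t.to_lowercase()).collect()
    }
}

// Hypothetical stand-in for an index that owns a trait object.
// The derive compiles because `dyn Tokenizer` is Debug via the supertrait.
#[derive(Debug)]
pub struct ToyIndex {
    pub tokenizer: Box<dyn Tokenizer>,
}

// Tokenize each document on its own thread and return the term counts
// in input order. Moving `Arc<ToyIndex>` into the spawned closures
// compiles because `dyn Tokenizer` is Send + Sync via the supertraits.
pub fn count_terms_in_parallel(index: Arc<ToyIndex>, docs: Vec<String>) -> Vec<usize> {
    let handles: Vec<_> = docs
        .into_iter()
        .map(|doc| {
            let index = Arc::clone(&index);
            thread::spawn(move || index.tokenizer.tokenize(&doc).len())
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

If the trait lacked the Send + Sync bounds, the thread::spawn call would be rejected at compile time; without Debug, the derive on ToyIndex would fail. Declaring the bounds on the trait saves callers from writing Box<dyn Tokenizer + Send + Sync> everywhere.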

Required Methods

fn tokenize(&self, text: &str) -> Vec<String>

Decompose text into a sequence of indexable terms.

Dyn Compatibility

This trait is dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety".

Implementors