Struct charabia::TokenizerBuilder
pub struct TokenizerBuilder<'al, 'sw, A> { /* private fields */ }
Structure to build a tokenizer with custom settings.
To use the default settings, call the Tokenize implementation on &str directly.
Example
use fst::Set;
use charabia::TokenizerBuilder;
// text to tokenize.
let orig = "The quick (\"brown\") fox can't jump 32.3 feet, right? Brr, it's 29.3°F!";
// create the builder.
let mut builder = TokenizerBuilder::new();
// create a set of stop words.
let stop_words = Set::from_iter(["the"].iter()).unwrap();
// configure the stop words.
builder.stop_words(&stop_words);
// build the tokenizer; the text itself is not passed at this step.
let tokenizer = builder.build();
Implementations§
impl<'al, 'sw, A> TokenizerBuilder<'al, 'sw, A>
pub fn new() -> TokenizerBuilder<'al, 'sw, A>
Create a TokenizerBuilder with default settings.
If you don't plan to set stop_words, prefer TokenizerBuilder::default.
impl<'al, 'sw, A> TokenizerBuilder<'al, 'sw, A>
pub fn stop_words(&mut self, stop_words: &'sw Set<A>) -> &mut Self
Configure the words that will be classified as TokenKind::StopWord.
Arguments
stop_words - a Set of the words to classify as stop words.
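The signature above borrows the set for the lifetime 'sw and returns &mut Self, so the set must outlive the builder and calls can be chained. The following is a minimal, std-only sketch of that pattern, not charabia's implementation: MiniBuilder, Kind, and classify_all are hypothetical names, a HashSet stands in for fst::Set, and lowercasing before the lookup is an assumption about normalization.

```rust
use std::collections::HashSet;

/// Hypothetical token classification, loosely modelled on `TokenKind`.
#[derive(Debug, PartialEq)]
enum Kind {
    Word,
    StopWord,
}

/// Hypothetical stand-in for `TokenizerBuilder`: it only borrows the set.
#[derive(Default)]
struct MiniBuilder<'sw> {
    stop_words: Option<&'sw HashSet<&'static str>>,
}

impl<'sw> MiniBuilder<'sw> {
    /// Store a borrow of the set and return `&mut Self` so calls can be chained.
    fn stop_words(&mut self, stop_words: &'sw HashSet<&'static str>) -> &mut Self {
        self.stop_words = Some(stop_words);
        self
    }

    /// Lowercase one word (an assumed normalization) and look it up in the set.
    fn classify(&self, word: &str) -> Kind {
        let lowered = word.to_lowercase();
        match self.stop_words {
            Some(set) if set.contains(lowered.as_str()) => Kind::StopWord,
            _ => Kind::Word,
        }
    }
}

/// Classify every whitespace-separated word of `text`.
fn classify_all(text: &str, stop_words: &HashSet<&'static str>) -> Vec<Kind> {
    let mut builder = MiniBuilder::default();
    builder.stop_words(stop_words);
    text.split_whitespace().map(|w| builder.classify(w)).collect()
}
```

Because the builder holds only a reference, dropping the set while the builder is still in use is rejected at compile time; that is the role the 'sw lifetime plays in the real signature.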
pub fn create_char_map(&mut self, create_char_map: bool) -> &mut Self
Enable or disable the creation of char_map.
Arguments
create_char_map - a bool that indicates whether a char_map should be created.
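The page does not describe what a char_map contains. As an illustration only, the sketch below assumes it records, for each original character, the byte length before and after normalization, which lets a position in the normalized text be mapped back to the original. The functions normalize and original_len are hypothetical, and lowercasing stands in for whatever normalization the tokenizer really applies.

```rust
/// Hypothetical normalizer: lowercases `text` and, when `create_char_map` is
/// true, records (original byte length, normalized byte length) per character.
fn normalize(text: &str, create_char_map: bool) -> (String, Option<Vec<(u8, u8)>>) {
    let mut normalized = String::new();
    let mut char_map = if create_char_map { Some(Vec::new()) } else { None };
    for c in text.chars() {
        let before = normalized.len();
        // A single char may lowercase to several chars.
        normalized.extend(c.to_lowercase());
        if let Some(map) = char_map.as_mut() {
            map.push((c.len_utf8() as u8, (normalized.len() - before) as u8));
        }
    }
    (normalized, char_map)
}

/// Map a byte length in the normalized text back to a byte length in the
/// original text by walking the (assumed) char map.
fn original_len(char_map: &[(u8, u8)], normalized_bytes: usize) -> usize {
    let mut seen = 0;
    let mut original = 0;
    for &(orig, norm) in char_map {
        if seen >= normalized_bytes {
            break;
        }
        seen += norm as usize;
        original += orig as usize;
    }
    original
}
```

Skipping the map when the flag is false avoids one allocation and one push per character, which is a plausible reason for making it optional when original positions are not needed.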