Struct tantivy::tokenizer::RegexTokenizer

pub struct RegexTokenizer { /* private fields */ }
Tokenize the text by matching a regex pattern: each match of the pattern emits a distinct token. Empty tokens are not emitted. Anchors such as \A match from the position where the last token was emitted, or from the beginning of the complete text if no token has been emitted yet.
Example: 'aaa' bbb 'ccc' 'ddd' with the pattern '(?:\w*)' will be tokenized as follows:

Term     | 'aaa' | 'ccc' | 'ddd'
---------|-------|-------|------
Position | 1     | 2     | 3
Offsets  | 0,5   | 10,15 | 16,21
Example

    use tantivy::tokenizer::*;

    let mut tokenizer = RegexTokenizer::new(r"'(?:\w*)'").unwrap();
    let mut stream = tokenizer.token_stream("'aaa' bbb 'ccc' 'ddd'");
    {
        let token = stream.next().unwrap();
        assert_eq!(token.text, "'aaa'");
        assert_eq!(token.offset_from, 0);
        assert_eq!(token.offset_to, 5);
    }
    {
        let token = stream.next().unwrap();
        assert_eq!(token.text, "'ccc'");
        assert_eq!(token.offset_from, 10);
        assert_eq!(token.offset_to, 15);
    }
    {
        let token = stream.next().unwrap();
        assert_eq!(token.text, "'ddd'");
        assert_eq!(token.offset_from, 16);
        assert_eq!(token.offset_to, 21);
    }
    assert!(stream.next().is_none());
Implementations

impl RegexTokenizer

pub fn new(regex_pattern: &str) -> Result<RegexTokenizer>

Creates a new RegexTokenizer.
Trait Implementations

impl Clone for RegexTokenizer

fn clone(&self) -> RegexTokenizer

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
impl Tokenizer for RegexTokenizer

type TokenStream<'a> = RegexTokenStream<'a>

The token stream returned by this Tokenizer.

fn token_stream<'a>(&'a mut self, text: &'a str) -> RegexTokenStream<'a>

Creates a token stream for a given str.

Auto Trait Implementations
impl Freeze for RegexTokenizer
impl RefUnwindSafe for RegexTokenizer
impl Send for RegexTokenizer
impl Sync for RegexTokenizer
impl Unpin for RegexTokenizer
impl UnwindSafe for RegexTokenizer
Blanket Implementations

impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
impl<T> Downcast for T
where
    T: Any,

fn into_any(self: Box<T>) -> Box<dyn Any>

Converts Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>, which can then be further
downcast into Box<ConcreteType> where ConcreteType implements Trait.

fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>

Converts Rc<Trait> (where Trait: Downcast) to Rc<Any>, which can then be further downcast
into Rc<ConcreteType> where ConcreteType implements Trait.

fn as_any(&self) -> &(dyn Any + 'static)

Converts &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate
&Any's vtable from &Trait's.

fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)

Converts &mut Trait (where Trait: Downcast) to &mut Any. This is needed since Rust cannot
generate &mut Any's vtable from &mut Trait's.