pub struct RegexTokenizer { /* private fields */ }
Tokenizes the text using a regex pattern.
Each match of the regex emits a distinct token; empty tokens are not emitted. Anchors such as \A match the text from the point where the last token was emitted, or from the beginning of the complete text if no token has been emitted yet.
Example: 'aaa' bbb 'ccc' 'ddd' with the pattern '(?:\w*)' will be tokenized as follows:
| Term | 'aaa' | 'ccc' | 'ddd' |
|---|---|---|---|
| Position | 1 | 2 | 3 |
| Offsets | 0,5 | 10,15 | 16,21 |
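To make the offsets in the table concrete, here is a minimal std-only sketch that models what matching the pattern '(?:\w*)' amounts to for this input. It is not tantivy's implementation: `SketchToken` and `quoted_word_tokens` are hypothetical names introduced for illustration, the matcher is hand-rolled for this one pattern, and the position numbering simply mirrors the table above. Offsets are byte offsets and include the surrounding quotes.

```rust
/// One emitted token: its text, its position, and its byte offsets
/// into the input. (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct SketchToken {
    text: String,
    position: usize,
    offset_from: usize,
    offset_to: usize,
}

/// Hand-rolled stand-in for the pattern `'(?:\w*)'`: a quote, zero or
/// more word characters, then a closing quote. Every match becomes one
/// token; text between matches (such as `bbb`) is skipped.
fn quoted_word_tokens(text: &str) -> Vec<SketchToken> {
    let mut tokens = Vec::new();
    let mut cursor = 0; // byte offset where the next search starts

    while let Some(rel) = text[cursor..].find('\'') {
        let start = cursor + rel;
        let body_start = start + 1;

        // Consume `\w*`: alphanumeric characters and underscores.
        let body_len: usize = text[body_start..]
            .chars()
            .take_while(|c| c.is_alphanumeric() || *c == '_')
            .map(|c| c.len_utf8())
            .sum();
        let close = body_start + body_len;

        if text[close..].starts_with('\'') {
            // Full match: the token spans both quotes.
            let end = close + 1;
            tokens.push(SketchToken {
                text: text[start..end].to_string(),
                position: tokens.len() + 1, // numbered as in the table
                offset_from: start,
                offset_to: end,
            });
            cursor = end; // the next search resumes after this token
        } else {
            // No closing quote here: retry just past the opening quote.
            cursor = body_start;
        }
    }
    tokens
}
```

Calling `quoted_word_tokens("'aaa' bbb 'ccc' 'ddd'")` yields three tokens with texts 'aaa', 'ccc' and 'ddd' and offsets (0, 5), (10, 15) and (16, 21); the unquoted bbb never matches, so it produces no token.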
§Example
use tantivy::tokenizer::*;

// Each match of the pattern (quotes included) becomes one token.
let mut tokenizer = RegexTokenizer::new(r"'(?:\w*)'").unwrap();
let mut stream = tokenizer.token_stream("'aaa' bbb 'ccc' 'ddd'");
{
    let token = stream.next().unwrap();
    assert_eq!(token.text, "'aaa'");
    assert_eq!(token.offset_from, 0);
    assert_eq!(token.offset_to, 5);
}
{
    // `bbb` is not quoted, so it does not match and is skipped.
    let token = stream.next().unwrap();
    assert_eq!(token.text, "'ccc'");
    assert_eq!(token.offset_from, 10);
    assert_eq!(token.offset_to, 15);
}
{
    let token = stream.next().unwrap();
    assert_eq!(token.text, "'ddd'");
    assert_eq!(token.offset_from, 16);
    assert_eq!(token.offset_to, 21);
}
// No further matches remain.
assert!(stream.next().is_none());
Implementations§
impl RegexTokenizer
pub fn new(regex_pattern: &str) -> Result<RegexTokenizer>
Creates a new RegexTokenizer from regex_pattern. Returns an error if the pattern is not a valid regex.
Trait Implementations§
impl Clone for RegexTokenizer
fn clone(&self) -> RegexTokenizer
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Tokenizer for RegexTokenizer
type TokenStream<'a> = RegexTokenStream<'a>
The token stream returned by this Tokenizer.
fn token_stream<'a>(&'a mut self, text: &'a str) -> RegexTokenStream<'a>
Creates a token stream for a given str.
Auto Trait Implementations§
impl Freeze for RegexTokenizer
impl RefUnwindSafe for RegexTokenizer
impl Send for RegexTokenizer
impl Sync for RegexTokenizer
impl Unpin for RegexTokenizer
impl UnwindSafe for RegexTokenizer
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Downcast for T
where
    T: Any,
fn into_any(self: Box<T>) -> Box<dyn Any>
Converts Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>, which can then be downcast into Box<dyn ConcreteType> where ConcreteType implements Trait.
fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>
Converts Rc<Trait> (where Trait: Downcast) to Rc<Any>, which can then be further downcast into Rc<ConcreteType> where ConcreteType implements Trait.
fn as_any(&self) -> &(dyn Any + 'static)
Converts &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &Any’s vtable from &Trait’s.
fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)
Converts &mut Trait (where Trait: Downcast) to &mut Any. This is needed since Rust cannot generate &mut Any’s vtable from &mut Trait’s.
impl<T> DowncastSend for T
impl<T> DowncastSync for T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.