Struct tantivy::tokenizer::RegexTokenizer

pub struct RegexTokenizer { /* private fields */ }

Tokenizes text using a regex pattern. Each match of the regex is emitted as a distinct token; empty tokens are not emitted. Anchors such as \A match at the position where the last token was emitted, or at the beginning of the complete text if no token has been emitted yet.

Example: 'aaa' bbb 'ccc' 'ddd' with the pattern '(?:\w*)' is tokenized as follows:

Term      'aaa'   'ccc'   'ddd'
Position  1       2       3
Offsets   0,5     10,15   16,21
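
The offsets in the table and the anchor behavior described above can be modeled with a small standard-library sketch. This is not tantivy's implementation: `find_quoted`, `find_quoted_anchored`, and `tokenize` are hypothetical helpers, and `find_quoted` is a hand-written, ASCII-only stand-in for a regex search with the pattern '(?:\w*)'. The sketch assumes that each search runs on the text remaining after the previous token, which is why an \A-style anchor would match where the last token ended.

```rust
/// Hypothetical stand-in for a regex search with the pattern `'(?:\w*)'`.
/// Returns the byte range of the first quoted word in `text`, if any.
/// Assumption: only ASCII word characters are handled here.
fn find_quoted(text: &str) -> Option<(usize, usize)> {
    let bytes = text.as_bytes();
    for start in 0..bytes.len() {
        if bytes[start] != b'\'' {
            continue;
        }
        // Consume word characters after the opening quote.
        let mut end = start + 1;
        while end < bytes.len() && (bytes[end].is_ascii_alphanumeric() || bytes[end] == b'_') {
            end += 1;
        }
        // A match needs a closing quote right after the word characters.
        if end < bytes.len() && bytes[end] == b'\'' {
            return Some((start, end + 1));
        }
    }
    None
}

/// Same search, but modeling a leading `\A`: the match must begin at the
/// very start of the slice it is given.
fn find_quoted_anchored(text: &str) -> Option<(usize, usize)> {
    find_quoted(text).filter(|&(start, _)| start == 0)
}

/// Emits `(term, offset_from, offset_to)` for each match. Every search runs
/// on the text remaining after the previous token, and `cursor` converts
/// slice-relative positions back into offsets in the complete text.
fn tokenize(
    text: &str,
    find: fn(&str) -> Option<(usize, usize)>,
) -> Vec<(String, usize, usize)> {
    let mut tokens = Vec::new();
    let mut rest = text;
    let mut cursor = 0; // absolute byte offset of `rest` within `text`
    while let Some((start, end)) = find(rest) {
        if start == end {
            break; // empty tokens are not emitted; this sketch simply stops
        }
        tokens.push((rest[start..end].to_string(), cursor + start, cursor + end));
        cursor += end;
        rest = &rest[end..];
    }
    tokens
}

fn main() {
    let text = "'aaa' bbb 'ccc' 'ddd'";
    for (term, from, to) in tokenize(text, find_quoted) {
        println!("{term} {from},{to}");
    }
    // With the anchored matcher only the first token is produced, because
    // the text remaining after 'aaa' starts with a space, not a quote.
    println!("{}", tokenize(text, find_quoted_anchored).len());
}
```

Under these assumptions the unanchored matcher reproduces the three offset pairs from the table, while the anchored variant stops after the first token.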

§Example

use tantivy::tokenizer::*;

let mut tokenizer = RegexTokenizer::new(r"'(?:\w*)'").unwrap();
let mut stream = tokenizer.token_stream("'aaa' bbb 'ccc' 'ddd'");
{
    let token = stream.next().unwrap();
    assert_eq!(token.text, "'aaa'");
    assert_eq!(token.offset_from, 0);
    assert_eq!(token.offset_to, 5);
}
{
    let token = stream.next().unwrap();
    assert_eq!(token.text, "'ccc'");
    assert_eq!(token.offset_from, 10);
    assert_eq!(token.offset_to, 15);
}
{
    let token = stream.next().unwrap();
    assert_eq!(token.text, "'ddd'");
    assert_eq!(token.offset_from, 16);
    assert_eq!(token.offset_to, 21);
}
assert!(stream.next().is_none());

Implementations§

impl RegexTokenizer

pub fn new(regex_pattern: &str) -> Result<RegexTokenizer>

Creates a new RegexTokenizer.

Trait Implementations§

impl Clone for RegexTokenizer

fn clone(&self) -> RegexTokenizer

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Tokenizer for RegexTokenizer

type TokenStream<'a> = RegexTokenStream<'a>

The token stream returned by this Tokenizer.

fn token_stream<'a>(&'a mut self, text: &'a str) -> RegexTokenStream<'a>

Creates a token stream for a given str.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> Downcast for T
where T: Any,

fn into_any(self: Box<T>) -> Box<dyn Any>

Convert Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>. Box<dyn Any> can then be further downcast into Box<ConcreteType> where ConcreteType implements Trait.

fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>

Convert Rc<Trait> (where Trait: Downcast) to Rc<Any>. Rc<Any> can then be further downcast into Rc<ConcreteType> where ConcreteType implements Trait.

fn as_any(&self) -> &(dyn Any + 'static)

Convert &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &Any’s vtable from &Trait’s.

fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)

Convert &mut Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &mut Any’s vtable from &mut Trait’s.

impl<T> DowncastSync for T
where T: Any + Send + Sync,

fn into_any_arc(self: Arc<T>) -> Arc<dyn Any + Sync + Send>

Convert Arc<Trait> (where Trait: Downcast) to Arc<Any>. Arc<Any> can then be further downcast into Arc<ConcreteType> where ConcreteType implements Trait.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T
where T: Clone,


type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,


type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,


type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Fruit for T
where T: Send + Downcast,