Struct tantivy_analysis_contrib::commons::PathTokenizer

pub struct PathTokenizer {
    pub reverse: bool,
    pub skip: usize,
    pub delimiter: char,
    pub replacement: Option<char>,
}
Available on crate feature commons only.

Path tokenizer. It splits a path into its hierarchy of prefixes. For example, it tokenizes:

/part1/part2/part3

into

/part1
/part1/part2
/part1/part2/part3
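
The prefix expansion above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `path_prefixes` is a hypothetical helper that only mirrors the documented output, and it assumes no skip and no replacement.

```rust
/// Hypothetical helper (not part of tantivy_analysis_contrib):
/// returns every prefix of `path` that ends just before a delimiter,
/// followed by the full path.
fn path_prefixes(path: &str, delimiter: char) -> Vec<String> {
    let mut tokens = Vec::new();
    for (idx, ch) in path.char_indices() {
        // A delimiter at position 0 does not close any part.
        if ch == delimiter && idx > 0 {
            tokens.push(path[..idx].to_string());
        }
    }
    // The whole path is the last token, unless it is empty or
    // ends with a dangling delimiter.
    if !path.is_empty() && !path.ends_with(delimiter) {
        tokens.push(path.to_string());
    }
    tokens
}

fn main() {
    let tokens = path_prefixes("/part1/part2/part3", '/');
    assert_eq!(tokens, vec!["/part1", "/part1/part2", "/part1/part2/part3"]);
}
```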

Setting reverse to true makes this tokenizer behave like Lucene’s ReversePathHierarchyTokenizer, except that the tokens are not emitted in the same order.

§Warning

To construct a new PathTokenizer, use the PathTokenizerBuilder or the Default implementation, as the From trait implementation will probably be removed.

§Examples

Here is an example with reverse left at false and \ as the delimiter. It also skips the first part of the path (c:).

use tantivy::tokenizer::{TextAnalyzer, Token};
use tantivy_analysis_contrib::commons::{PathTokenizer, PathTokenizerBuilder};

let path_tokenizer = PathTokenizerBuilder::default()
   .skip(1_usize)
   .delimiter('\\')
   .build()?;

let mut tmp = TextAnalyzer::builder(path_tokenizer).build();
let mut token_stream = tmp.token_stream("c:\\a\\b\\c");

let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "\\a".to_string());

let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "\\a\\b".to_string());

let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "\\a\\b\\c".to_string());

assert_eq!(None, token_stream.next());

This second example shows which tokens are produced when reverse is set to true, and what the replacement parameter does.

use tantivy::tokenizer::{TextAnalyzer, Token};
use tantivy_analysis_contrib::commons::{PathTokenizer, PathTokenizerBuilder};

let path_tokenizer = PathTokenizerBuilder::default()
   .delimiter('\\')
   .replacement('/')
   .reverse(true)
   .build()?;

let mut tmp = TextAnalyzer::builder(path_tokenizer).build();
let mut token_stream = tmp.token_stream("c:\\a\\b\\c");

let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "c".to_string());

let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "b/c".to_string());

let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "a/b/c".to_string());

let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "c:/a/b/c".to_string());

assert_eq!(None, token_stream.next());

Fields§

§reverse: bool

Tokenize backward, starting from the end of the path. For example, with . as the delimiter, this tokenizes:

mail.google.com

into

com
google.com
mail.google.com
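
The backward expansion can be sketched the same way. This is a minimal illustration under stated assumptions, not the crate's implementation: `path_suffixes` is a hypothetical helper, and the emission order (shortest suffix first) is taken from the example above.

```rust
/// Hypothetical helper (not part of tantivy_analysis_contrib):
/// returns the text after each delimiter, scanning from the right,
/// followed by the full path.
fn path_suffixes(path: &str, delimiter: char) -> Vec<String> {
    let mut tokens = Vec::new();
    for (idx, ch) in path.char_indices().rev() {
        if ch == delimiter {
            // Skip over the delimiter itself so the suffix starts at a part.
            let suffix = &path[idx + ch.len_utf8()..];
            if !suffix.is_empty() {
                tokens.push(suffix.to_string());
            }
        }
    }
    if !path.is_empty() {
        tokens.push(path.to_string());
    }
    tokens
}

fn main() {
    let tokens = path_suffixes("mail.google.com", '.');
    assert_eq!(tokens, vec!["com", "google.com", "mail.google.com"]);
}
```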
§skip: usize

Number of parts to skip.
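
As a rough sketch of what skipping means, the helper below reproduces the first example from the Examples section (skip set to 1 on c:\a\b\c). `path_prefixes_skipping` is a hypothetical name, and the sketch only models paths that do not start with the delimiter; how the crate treats a leading delimiter is not covered here.

```rust
/// Hypothetical helper (not part of tantivy_analysis_contrib):
/// drops the first `skip` parts, then emits the prefixes of what remains.
/// Assumes `path` does not start with `delimiter`.
fn path_prefixes_skipping(path: &str, delimiter: char, skip: usize) -> Vec<String> {
    // Byte offset of the `skip`-th delimiter; tokens start from there.
    let start = if skip == 0 {
        0
    } else {
        match path.match_indices(delimiter).nth(skip - 1) {
            Some((idx, _)) => idx,
            None => return Vec::new(), // fewer parts than `skip`
        }
    };
    let rest = &path[start..];

    let mut tokens = Vec::new();
    for (idx, ch) in rest.char_indices() {
        if ch == delimiter && idx > 0 {
            tokens.push(rest[..idx].to_string());
        }
    }
    if !rest.is_empty() && !rest.ends_with(delimiter) {
        tokens.push(rest.to_string());
    }
    tokens
}

fn main() {
    let tokens = path_prefixes_skipping("c:\\a\\b\\c", '\\', 1);
    assert_eq!(tokens, vec!["\\a", "\\a\\b", "\\a\\b\\c"]);
}
```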

§delimiter: char

Delimiter of path parts. In the following example, the delimiter is the / character:

/part1/part2/part3
§replacement: Option<char>

Character that replaces the delimiter in generated tokens. If None, the delimiter itself is used. For example, if delimiter is / and replacement is |, then

/part1/part2/part3

will generate

|part1
|part1|part2
|part1|part2|part3
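
The substitution can be sketched as follows. This is a minimal illustration, not the crate's implementation: `path_prefixes_replaced` is a hypothetical helper that builds each token character by character, so the delimiter and its replacement may have different UTF-8 lengths.

```rust
/// Hypothetical helper (not part of tantivy_analysis_contrib):
/// emits path prefixes, writing `replacement` (or the delimiter itself
/// when `replacement` is None) wherever the delimiter appears.
fn path_prefixes_replaced(
    path: &str,
    delimiter: char,
    replacement: Option<char>,
) -> Vec<String> {
    let out = replacement.unwrap_or(delimiter);
    let mut tokens = Vec::new();
    let mut current = String::new();
    for ch in path.chars() {
        if ch == delimiter {
            // A delimiter closes the prefix accumulated so far.
            if !current.is_empty() {
                tokens.push(current.clone());
            }
            current.push(out);
        } else {
            current.push(ch);
        }
    }
    // Emit the full path unless it is empty or ends with a dangling delimiter.
    if !current.is_empty() && !path.ends_with(delimiter) {
        tokens.push(current);
    }
    tokens
}

fn main() {
    let tokens = path_prefixes_replaced("/part1/part2/part3", '/', Some('|'));
    assert_eq!(tokens, vec!["|part1", "|part1|part2", "|part1|part2|part3"]);
}
```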

Trait Implementations§

source§

impl Clone for PathTokenizer

source§

fn clone(&self) -> PathTokenizer

Returns a copy of the value.
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
source§

impl Debug for PathTokenizer

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
source§

impl Default for PathTokenizer

source§

fn default() -> Self

Construct a PathTokenizer with no skip and / as delimiter and replacement.

source§

impl Tokenizer for PathTokenizer

§

type TokenStream<'a> = PathTokenStream<'a>

The token stream returned by this Tokenizer.
source§

fn token_stream<'a>(&'a mut self, text: &'a str) -> Self::TokenStream<'a>

Creates a token stream for a given str.
source§

impl Copy for PathTokenizer

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.