Struct tantivy_analysis_contrib::commons::PathTokenizer
pub struct PathTokenizer {
    pub reverse: bool,
    pub skip: usize,
    pub delimiter: char,
    pub replacement: Option<char>,
}
Available on crate feature commons only.
Path tokenizer. It tokenizes this:
/part1/part2/part3
into
/part1
/part1/part2
/part1/part2/part3
Enabling reverse makes this tokenizer behave like Lucene's ReversePathHierarchyTokenizer, except that tokens are not emitted in the same order.
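The forward behaviour described above can be sketched with the standard library alone. The helper name path_prefixes is hypothetical, and this is not the crate's implementation: it ignores skip and replacement and does not try to model edge cases such as trailing or repeated delimiters.

```rust
/// Hypothetical helper (not part of tantivy_analysis_contrib):
/// returns every prefix of `path` that ends at a part boundary.
fn path_prefixes(path: &str, delimiter: char) -> Vec<String> {
    let mut tokens = Vec::new();
    for (idx, ch) in path.char_indices() {
        // A delimiter closes the part before it; a leading delimiter
        // has nothing before it, so it is skipped.
        if ch == delimiter && idx > 0 {
            tokens.push(path[..idx].to_string());
        }
    }
    // The whole input is always the last token.
    if !path.is_empty() {
        tokens.push(path.to_string());
    }
    tokens
}
```

Calling path_prefixes("/part1/part2/part3", '/') yields the three tokens listed above, shortest first.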
§Warning
To construct a new PathTokenizer, use the PathTokenizerBuilder or the Default implementation, as the From trait implementation will probably be removed.
§Examples
Here is an example with reverse set to false and \ as the delimiter. It also skips the first part (c:).
use tantivy::tokenizer::{TextAnalyzer, Token};
use tantivy_analysis_contrib::commons::{PathTokenizer, PathTokenizerBuilder};
let path_tokenizer = PathTokenizerBuilder::default()
.skip(1_usize)
.delimiter('\\')
.build()?;
let mut tmp = TextAnalyzer::builder(path_tokenizer).build();
let mut token_stream = tmp.token_stream("c:\\a\\b\\c");
let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "\\a".to_string());
let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "\\a\\b".to_string());
let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "\\a\\b\\c".to_string());
assert_eq!(None, token_stream.next());
This second example shows which tokens are produced when reverse is set to true, and what the replacement parameter does.
use tantivy::tokenizer::{TextAnalyzer, Token};
use tantivy_analysis_contrib::commons::{PathTokenizer, PathTokenizerBuilder};
let path_tokenizer = PathTokenizerBuilder::default()
.delimiter('\\')
.replacement('/')
.reverse(true)
.build()?;
let mut tmp = TextAnalyzer::builder(path_tokenizer).build();
let mut token_stream = tmp.token_stream("c:\\a\\b\\c");
let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "c".to_string());
let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "b/c".to_string());
let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "a/b/c".to_string());
let token = token_stream.next().expect("A token should be present.");
assert_eq!(token.text, "c:/a/b/c".to_string());
assert_eq!(None, token_stream.next());
§Fields
reverse: bool
Tokenize backward, starting from the end of the input. For example, with . as the delimiter, this turns
mail.google.com
into
com
google.com
mail.google.com
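A minimal standard-library sketch of the reverse ordering shown above follows. The helper name path_suffixes is hypothetical; it is not the crate's code, it ignores skip and replacement, and it makes no attempt to handle trailing delimiters the way the crate might.

```rust
/// Hypothetical helper (not part of tantivy_analysis_contrib):
/// returns every suffix of `path` that starts at a part boundary,
/// shortest first.
fn path_suffixes(path: &str, delimiter: char) -> Vec<String> {
    let mut tokens = Vec::new();
    // Walk the delimiters from right to left; each one marks the start
    // of a progressively longer suffix.
    for (idx, ch) in path.char_indices().rev() {
        if ch == delimiter {
            tokens.push(path[idx + ch.len_utf8()..].to_string());
        }
    }
    // The whole input is always the last token.
    if !path.is_empty() {
        tokens.push(path.to_string());
    }
    tokens
}
```

With '.' as the delimiter, path_suffixes("mail.google.com", '.') produces com, google.com and mail.google.com, in that order.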
skip: usize
Number of parts to skip.
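To illustrate what skipping parts means, here is a sketch that reproduces the output of the first example (c:\a\b\c with skip set to 1). It assumes that every delimiter starts a new part, which is inferred from that example rather than taken from the crate's source; the helper name prefixes_with_skip is hypothetical.

```rust
/// Hypothetical helper (not part of tantivy_analysis_contrib):
/// forward tokenization that drops the first `skip` parts.
fn prefixes_with_skip(path: &str, delimiter: char, skip: usize) -> Vec<String> {
    // Assumption: each delimiter opens a new part, so "c:\a\b"
    // is split into ["c:", "\a", "\b"].
    let mut parts: Vec<String> = Vec::new();
    for ch in path.chars() {
        if ch == delimiter || parts.is_empty() {
            parts.push(String::new());
        }
        if let Some(last) = parts.last_mut() {
            last.push(ch);
        }
    }

    // Drop the skipped parts, then emit the running concatenation.
    let mut tokens = Vec::new();
    let mut current = String::new();
    for part in parts.iter().skip(skip) {
        current.push_str(part);
        tokens.push(current.clone());
    }
    tokens
}
```

With skip set to 0 this gives the same tokens as plain forward tokenization; a skip larger than the number of parts gives no tokens in this sketch.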
delimiter: char
Delimiter of path parts.
In the following example, the delimiter is the / character:
/part1/part2/part3
replacement: Option<char>
Character that replaces delimiter for generated parts.
If None, the same character as delimiter is used.
For example, if delimiter is / and replacement is |, then
/part1/part2/part3
will generate
|part1
|part1|part2
|part1|part2|part3
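The substitution above can be sketched as follows, using only the standard library. The helper name replaced_prefixes is hypothetical and the code is an illustration of the documented output, not the crate's implementation; skip, reverse and trailing delimiters are not modeled.

```rust
/// Hypothetical helper (not part of tantivy_analysis_contrib):
/// forward tokenization where each delimiter is written out as `replacement`.
fn replaced_prefixes(path: &str, delimiter: char, replacement: Option<char>) -> Vec<String> {
    // None falls back to the delimiter itself, as described above.
    let replacement = replacement.unwrap_or(delimiter);
    let mut tokens = Vec::new();
    let mut current = String::new();
    for ch in path.chars() {
        if ch == delimiter {
            // A delimiter closes the current token, if there is one.
            if !current.is_empty() {
                tokens.push(current.clone());
            }
            current.push(replacement);
        } else {
            current.push(ch);
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}
```

Passing Some('|') reproduces the three tokens listed above, while passing None leaves the / delimiter in place.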
§Trait Implementations
impl Clone for PathTokenizer
fn clone(&self) -> PathTokenizer
fn clone_from(&mut self, source: &Self)
impl Debug for PathTokenizer
impl Default for PathTokenizer
fn default() -> Self
Construct a PathTokenizer with no skip and / as delimiter and replacement.
impl Tokenizer for PathTokenizer
type TokenStream<'a> = PathTokenStream<'a>
fn token_stream<'a>(&'a mut self, text: &'a str) -> Self::TokenStream<'a>
Creates a token stream for a given str.