Struct tantivy::tokenizer::NgramTokenizer
pub struct NgramTokenizer { /* private fields */ }
Tokenizes the text by splitting words into n-grams of the given size(s).
With this tokenizer, the position is always 0.
Beware, however: when the same field has multiple values, the position will be POSITION_GAP * (index of the value).
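As a rough illustration, the position rule above can be restated as a tiny function. This is a sketch of the rule as described here, not tantivy's indexing code: `ngram_position` is a hypothetical name, and the gap value in the example is arbitrary rather than tantivy's actual POSITION_GAP.

```rust
/// Hypothetical helper restating the rule above: every token of the value at
/// `value_index` ends up at `position_gap * value_index`, because the
/// n-gram tokenizer itself always reports position 0.
fn ngram_position(value_index: u32, position_gap: u32) -> u32 {
    // Within a single value the tokenizer contributes nothing.
    let position_within_value = 0;
    position_gap * value_index + position_within_value
}

fn main() {
    // Arbitrary illustrative gap; tantivy's real POSITION_GAP is not assumed.
    let position_gap = 100;
    let values = ["hello", "world", "again"];
    let positions: Vec<u32> = (0..values.len() as u32)
        .map(|i| ngram_position(i, position_gap))
        .collect();
    assert_eq!(positions, vec![0, 100, 200]);
}
```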
Example 1: hello would be tokenized as follows (min_gram: 2, max_gram: 3, prefix_only: false):
Term | he | hel | el | ell | ll | llo | lo |
---|---|---|---|---|---|---|---|
Position | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Offsets | 0,2 | 0,3 | 1,3 | 1,4 | 2,4 | 2,5 | 3,5 |
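The table above can be reproduced with a short standard-library sketch. It assumes, as the table suggests, that grams are emitted start position by start position, shortest first, with lengths counted in characters and offsets reported in bytes. `all_ngrams` is a hypothetical helper, not tantivy's implementation, and it assumes min_gram >= 1.

```rust
/// Hypothetical model of all-n-gram tokenization (prefix_only: false).
/// Returns (gram, offset_from, offset_to), with offsets in bytes.
/// Assumes min_gram >= 1.
fn all_ngrams(text: &str, min_gram: usize, max_gram: usize) -> Vec<(String, usize, usize)> {
    // Byte offset of every char boundary, plus the end of the string.
    let mut bounds: Vec<usize> = text.char_indices().map(|(i, _)| i).collect();
    bounds.push(text.len());
    let n_chars = bounds.len() - 1;

    let mut grams = Vec::new();
    for start in 0..n_chars {
        // Shortest gram first; stop once a gram would run past the end.
        for len in min_gram..=max_gram {
            let end = start + len;
            if end > n_chars {
                break;
            }
            let (from, to) = (bounds[start], bounds[end]);
            grams.push((text[from..to].to_string(), from, to));
        }
    }
    grams
}

fn main() {
    let grams = all_ngrams("hello", 2, 3);
    let as_refs: Vec<(&str, usize, usize)> =
        grams.iter().map(|(g, from, to)| (g.as_str(), *from, *to)).collect();
    assert_eq!(
        as_refs,
        vec![("he", 0, 2), ("hel", 0, 3), ("el", 1, 3), ("ell", 1, 4),
             ("ll", 2, 4), ("llo", 2, 5), ("lo", 3, 5)]
    );
}
```

Counting lengths in characters but slicing by byte boundaries keeps the sketch safe for non-ASCII input, where slicing at an arbitrary byte index would panic.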
Example 2: hello would be tokenized as follows (min_gram: 2, max_gram: 5, prefix_only: true):
Term | he | hel | hell | hello |
---|---|---|---|---|
Position | 0 | 0 | 0 | 0 |
Offsets | 0,2 | 0,3 | 0,4 | 0,5 |
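Prefix-only mode keeps just the grams that start at offset 0, which the following standard-library sketch models. `prefix_ngrams` is a hypothetical helper, not tantivy's code; the convention of counting lengths in characters and reporting byte offsets is an assumption read off the tables here.

```rust
/// Hypothetical model of prefix-only tokenization (prefix_only: true).
/// Returns (gram, offset_from, offset_to); offset_from is always 0.
fn prefix_ngrams(text: &str, min_gram: usize, max_gram: usize) -> Vec<(String, usize, usize)> {
    text.char_indices()
        // Byte offset just past each character = end of the prefix ending there.
        .map(|(i, c)| i + c.len_utf8())
        .enumerate()
        // Prefix number k (0-based) is k + 1 characters long; keep lengths in range.
        .filter(|&(k, _)| k + 1 >= min_gram && k + 1 <= max_gram)
        .map(|(_, end)| (text[..end].to_string(), 0, end))
        .collect()
}

fn main() {
    let grams = prefix_ngrams("hello", 2, 5);
    let texts: Vec<&str> = grams.iter().map(|(g, _, _)| g.as_str()).collect();
    assert_eq!(texts, vec!["he", "hel", "hell", "hello"]);
    let ends: Vec<usize> = grams.iter().map(|&(_, _, to)| to).collect();
    assert_eq!(ends, vec![2, 3, 4, 5]);
}
```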
Example 3: hεllo (non-ASCII) would be tokenized as follows (min_gram: 2, max_gram: 5, prefix_only: true); offsets are byte offsets into the UTF-8 text:
Term | hε | hεl | hεll | hεllo |
---|---|---|---|---|
Position | 0 | 0 | 0 | 0 |
Offsets | 0,3 | 0,4 | 0,5 | 0,6 |
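The end offsets in Example 3 can be checked with standard-library calls alone (no tantivy code involved): ε takes two bytes in UTF-8, so every prefix that contains it ends one byte past its character count.

```rust
fn main() {
    let text = "hεllo";
    // 5 characters, but 6 bytes: 'ε' is encoded as two bytes in UTF-8.
    assert_eq!(text.chars().count(), 5);
    assert_eq!(text.len(), 6);
    assert_eq!('ε'.len_utf8(), 2);

    // Byte offset 2 falls inside 'ε', so no token can end there;
    // the first two-character prefix ends at byte 3.
    assert!(!text.is_char_boundary(2));
    assert_eq!(&text[0..3], "hε");

    // End offsets of the 2- to 5-character prefixes, as in the table.
    let ends: Vec<usize> = text
        .char_indices()
        .map(|(i, c)| i + c.len_utf8())
        .skip(1)
        .collect();
    assert_eq!(ends, vec![3, 4, 5, 6]);
}
```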
Example
```rust
use tantivy::tokenizer::*;

// Emit every inner n-gram of 2 to 3 characters (prefix_only: false).
let mut tokenizer = NgramTokenizer::new(2, 3, false).unwrap();
let mut stream = tokenizer.token_stream("hello");
{
    // Grams are produced start position by start position, shortest first.
    let token = stream.next().unwrap();
    assert_eq!(token.text, "he");
    assert_eq!(token.offset_from, 0);
    assert_eq!(token.offset_to, 2);
}
{
    let token = stream.next().unwrap();
    assert_eq!(token.text, "hel");
    assert_eq!(token.offset_from, 0);
    assert_eq!(token.offset_to, 3);
}
{
    // Next start position: offset 1.
    let token = stream.next().unwrap();
    assert_eq!(token.text, "el");
    assert_eq!(token.offset_from, 1);
    assert_eq!(token.offset_to, 3);
}
{
    let token = stream.next().unwrap();
    assert_eq!(token.text, "ell");
    assert_eq!(token.offset_from, 1);
    assert_eq!(token.offset_to, 4);
}
{
    let token = stream.next().unwrap();
    assert_eq!(token.text, "ll");
    assert_eq!(token.offset_from, 2);
    assert_eq!(token.offset_to, 4);
}
{
    let token = stream.next().unwrap();
    assert_eq!(token.text, "llo");
    assert_eq!(token.offset_from, 2);
    assert_eq!(token.offset_to, 5);
}
{
    // Only a 2-character gram still fits from offset 3.
    let token = stream.next().unwrap();
    assert_eq!(token.text, "lo");
    assert_eq!(token.offset_from, 3);
    assert_eq!(token.offset_to, 5);
}
// "o" alone is shorter than min_gram, so the stream is exhausted.
assert!(stream.next().is_none());
```
Implementations
impl NgramTokenizer
pub fn new(min_gram: usize, max_gram: usize, prefix_only: bool) -> Result<NgramTokenizer>
Configures a new n-gram tokenizer.
pub fn all_ngrams(min_gram: usize, max_gram: usize) -> Result<NgramTokenizer>
Creates an NgramTokenizer that generates tokens for all inner n-grams, as opposed to only prefix n-grams.
pub fn prefix_only(min_gram: usize, max_gram: usize) -> Result<NgramTokenizer>
Creates an NgramTokenizer that only generates tokens for the prefix n-grams.
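The three constructors differ only in the prefix_only flag. The sketch below mirrors that relationship with a hypothetical `NgramConfig` type (not a tantivy type). Rejecting `min_gram == 0` and `min_gram > max_gram` is an assumption about why the constructors return a `Result`, not a quote of tantivy's checks.

```rust
/// Hypothetical stand-in for the tokenizer's configuration; not tantivy's type.
#[derive(Debug, Clone, PartialEq)]
struct NgramConfig {
    min_gram: usize,
    max_gram: usize,
    prefix_only: bool,
}

impl NgramConfig {
    /// Assumed validation: grams must be non-empty and the range not inverted.
    fn new(min_gram: usize, max_gram: usize, prefix_only: bool) -> Result<Self, String> {
        if min_gram == 0 {
            return Err("min_gram must be greater than 0".to_string());
        }
        if min_gram > max_gram {
            return Err("min_gram must not be greater than max_gram".to_string());
        }
        Ok(NgramConfig { min_gram, max_gram, prefix_only })
    }

    /// Equivalent to `new(min_gram, max_gram, false)`.
    fn all_ngrams(min_gram: usize, max_gram: usize) -> Result<Self, String> {
        Self::new(min_gram, max_gram, false)
    }

    /// Equivalent to `new(min_gram, max_gram, true)`.
    fn prefix_only(min_gram: usize, max_gram: usize) -> Result<Self, String> {
        Self::new(min_gram, max_gram, true)
    }
}

fn main() {
    assert_eq!(NgramConfig::all_ngrams(2, 3), NgramConfig::new(2, 3, false));
    assert!(NgramConfig::prefix_only(2, 5).unwrap().prefix_only);
    assert!(NgramConfig::new(0, 3, false).is_err());
    assert!(NgramConfig::new(4, 3, false).is_err());
}
```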
Trait Implementations
impl Clone for NgramTokenizer
fn clone(&self) -> NgramTokenizer
Returns a copy of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source. (Stable since Rust 1.0.0.)
impl Debug for NgramTokenizer
impl Tokenizer for NgramTokenizer
type TokenStream<'a> = NgramTokenStream<'a>
The token stream returned by this Tokenizer.
fn token_stream<'a>(&'a mut self, text: &'a str) -> NgramTokenStream<'a>
Creates a token stream for a given str.
Auto Trait Implementations
impl Freeze for NgramTokenizer
impl RefUnwindSafe for NgramTokenizer
impl Send for NgramTokenizer
impl Sync for NgramTokenizer
impl Unpin for NgramTokenizer
impl UnwindSafe for NgramTokenizer
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Downcast for T
where
    T: Any,
fn into_any(self: Box<T>) -> Box<dyn Any>
Convert Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>. Box<dyn Any> can then be further downcast into Box<ConcreteType> where ConcreteType implements Trait.
fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>
Convert Rc<Trait> (where Trait: Downcast) to Rc<Any>. Rc<Any> can then be further downcast into Rc<ConcreteType> where ConcreteType implements Trait.
fn as_any(&self) -> &(dyn Any + 'static)
Convert &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &Any's vtable from &Trait's.
fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)
Convert &mut Trait (where Trait: Downcast) to &mut Any. This is needed since Rust cannot generate &mut Any's vtable from &mut Trait's.
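The downcasting these blanket methods enable can be illustrated with the standard library alone. The sketch below uses a hypothetical trait `Named` and struct `Gram` (part of neither tantivy nor the crate providing Downcast) and hand-writes an `as_any` method in the same spirit. It is an illustration of the pattern, not that crate's implementation.

```rust
use std::any::Any;

// Hypothetical trait: each implementor can expose itself as `&dyn Any`,
// mirroring what the blanket `as_any` provides for `Downcast` types.
trait Named {
    fn name(&self) -> String;
    fn as_any(&self) -> &(dyn Any + 'static);
}

// Hypothetical concrete type standing in for `ConcreteType`.
struct Gram {
    text: String,
}

impl Named for Gram {
    fn name(&self) -> String {
        format!("gram:{}", self.text)
    }
    fn as_any(&self) -> &(dyn Any + 'static) {
        self
    }
}

/// Recovers the concrete `Gram` behind a trait object, if that is what it holds.
fn gram_text(item: &dyn Named) -> Option<&str> {
    item.as_any().downcast_ref::<Gram>().map(|g| g.text.as_str())
}

fn main() {
    let item: Box<dyn Named> = Box::new(Gram { text: "he".to_string() });
    println!("{}", item.name());
    assert_eq!(gram_text(&*item), Some("he"));

    // Owned counterpart of `into_any`: a `Box<dyn Any>` downcasts to `Box<Gram>`.
    let owned: Box<dyn Any> = Box::new(Gram { text: "hel".to_string() });
    let gram: Box<Gram> = owned.downcast::<Gram>().expect("holds a Gram");
    assert_eq!(gram.text, "hel");
}
```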