Module tantivy::termdict
The term dictionary is one of the key data structures of tantivy. It associates sorted terms with their respective posting lists.
The term dictionary makes it possible to iterate through the keys in a sorted manner.
Example
extern crate tantivy;
use tantivy::termdict::*;
use tantivy::directory::ReadOnlySource;

let mut term_dictionary_builder = TermDictionaryBuilderImpl::new(vec!())?;

// keys have to be inserted in order.
term_dictionary_builder.insert("apple", &1u32)?;
term_dictionary_builder.insert("grape", &2u32)?;
term_dictionary_builder.insert("pear", &3u32)?;

let buffer: Vec<u8> = term_dictionary_builder.finish()?;
let source = ReadOnlySource::from(buffer);
let term_dictionary = TermDictionaryImpl::from_source(source)?;

assert_eq!(term_dictionary.get("grape"), Some(2u32));
Implementations
There are currently two implementations of the term dictionary.
Default implementation : fstdict
The default implementation relies heavily on the fst crate.
It associates each term's &[u8] representation with a u64 that is in fact an address in a buffer. The value is then obtained by deserializing the data stored at this address.
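The address indirection described above can be illustrated with a small sketch. It is not tantivy's implementation: the name ToyFstDict is hypothetical, a standard-library BTreeMap stands in for the fst map, and values are assumed to be u32 stored as 4 little-endian bytes.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the fst-backed dictionary (hypothetical type).
/// `offsets` plays the role of the fst: term bytes -> u64 address.
/// `buffer` holds the serialized values those addresses point into.
pub struct ToyFstDict {
    offsets: BTreeMap<Vec<u8>, u64>,
    buffer: Vec<u8>,
}

impl ToyFstDict {
    pub fn new() -> Self {
        ToyFstDict {
            offsets: BTreeMap::new(),
            buffer: Vec::new(),
        }
    }

    /// Serializes `value` at the end of the buffer and records its address.
    /// Assumption: values are u32, written as 4 little-endian bytes.
    pub fn insert(&mut self, term: &[u8], value: u32) {
        let addr = self.buffer.len() as u64;
        self.buffer.extend_from_slice(&value.to_le_bytes());
        self.offsets.insert(term.to_vec(), addr);
    }

    /// Looks up the address of `term`, then deserializes the value found there.
    pub fn get(&self, term: &[u8]) -> Option<u32> {
        let addr = *self.offsets.get(term)? as usize;
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(self.buffer.get(addr..addr + 4)?);
        Some(u32::from_le_bytes(bytes))
    }
}
```

The map itself only stores fixed-size addresses, so the values can use any serialization format without affecting the map.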
Stream implementation : streamdict
The fstdict is a tiny bit slow when streaming all of the terms.
For some use cases (e.g. an analytics engine), it is preferable to use the streamdict, which offers better streaming performance at the expense of lookup performance.
streamdict can be enabled by adding the streamdict feature when compiling tantivy.
streamdict encodes each term relative to the previous one, as follows:
- the number of bytes that need to be popped
- the number of bytes that need to be added
- the sequence of bytes to be added
- the value
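The four fields listed above can be sketched as a pair of encode/decode functions. The byte layout here is an assumption made for illustration (one byte for each count, a little-endian u32 value) and is not tantivy's actual on-disk format. The function names are hypothetical, and terms are passed as &str only for brevity.

```rust
/// Encodes sorted (term, value) pairs, each term relative to the previous one.
/// Assumed layout per entry: 1 byte "pop" count, 1 byte "add" count,
/// the added bytes, then the value as a little-endian u32.
pub fn encode(entries: &[(&str, u32)]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev: &[u8] = &[];
    for &(term, value) in entries {
        let term = term.as_bytes();
        // Length of the prefix shared with the previous term.
        let common = prev.iter().zip(term).take_while(|(a, b)| a == b).count();
        let pop = prev.len() - common;
        let add = &term[common..];
        assert!(pop <= 255 && add.len() <= 255, "sketch handles short terms only");
        out.push(pop as u8);
        out.push(add.len() as u8);
        out.extend_from_slice(add);
        out.extend_from_slice(&value.to_le_bytes());
        prev = term;
    }
    out
}

/// Replays the stream: pop bytes off the previous term, append the new ones.
/// Panics on truncated input; a real decoder would return an error instead.
pub fn decode(mut data: &[u8]) -> Vec<(Vec<u8>, u32)> {
    let mut entries = Vec::new();
    let mut term: Vec<u8> = Vec::new();
    while !data.is_empty() {
        let pop = data[0] as usize;
        let add = data[1] as usize;
        term.truncate(term.len() - pop);
        term.extend_from_slice(&data[2..2 + add]);
        let mut value = [0u8; 4];
        value.copy_from_slice(&data[2 + add..6 + add]);
        entries.push((term.clone(), u32::from_le_bytes(value)));
        data = &data[6 + add..];
    }
    entries
}
```

With this layout, encoding "apply" right after "apple" stores a pop count of 1, an add count of 1, and the single byte "y", followed by the value. Each entry can only be reconstructed from the one before it, which is why the stream has to be read sequentially.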
Because such a structure does not allow for lookups, it comes with an fst that indexes 1 out of every 1024 terms in this structure.
A lookup therefore consists of a lookup in the fst followed by streaming through at most 1024 elements in the term stream.
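The two-step lookup can be sketched as follows. This is a simplified model rather than tantivy's code: ToyStreamDict is a hypothetical type, a sorted Vec stands in for both the fst and the encoded term stream, the index stores positions instead of byte offsets, and the block size is a parameter (1024 in the text) so that the sketch can be exercised with small inputs.

```rust
/// Toy model of a stream dictionary with a sparse index (hypothetical type).
pub struct ToyStreamDict {
    /// Stand-in for the term stream: (term, value) pairs sorted by term.
    stream: Vec<(Vec<u8>, u32)>,
    /// Stand-in for the fst: every `block`-th term and its stream position.
    index: Vec<(Vec<u8>, usize)>,
    /// Number of stream entries covered by one index entry.
    block: usize,
}

impl ToyStreamDict {
    /// Builds the dictionary from entries that are already sorted by term.
    pub fn from_sorted(stream: Vec<(Vec<u8>, u32)>, block: usize) -> Self {
        assert!(block > 0, "block size must be positive");
        let index: Vec<(Vec<u8>, usize)> = stream
            .iter()
            .enumerate()
            .step_by(block)
            .map(|(pos, (term, _))| (term.clone(), pos))
            .collect();
        ToyStreamDict { stream, index, block }
    }

    pub fn get(&self, term: &[u8]) -> Option<u32> {
        // Step 1: find the last indexed term that is <= the target.
        let n = self.index.partition_point(|(t, _)| t.as_slice() <= term);
        let start = self.index.get(n.checked_sub(1)?)?.1;
        // Step 2: stream through at most `block` entries from that position,
        // stopping early once the stream has moved past the target.
        self.stream[start..]
            .iter()
            .take(self.block)
            .take_while(|(t, _)| t.as_slice() <= term)
            .find(|(t, _)| t.as_slice() == term)
            .map(|entry| entry.1)
    }
}
```

A larger block size shrinks the index but lengthens the scan in step 2, which is the trade-off between streaming and lookup performance mentioned earlier.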
Structs
TermDictionaryBuilderImpl |
TermDictionaryImpl | See
TermMerger | Given a list of sorted term streams, returns an iterator over sorted unique terms.
TermStreamerBuilderImpl |
TermStreamerImpl | See
Traits
TermDictionary | Dictionary associating sorted
TermDictionaryBuilder | Builder for the new term dictionary.
TermStreamer |
TermStreamerBuilder |