Module tantivy::termdict

The term dictionary is one of the key data structures of tantivy. It associates sorted terms with their respective posting lists.

The term dictionary makes it possible to iterate through the keys in a sorted manner.

Example

extern crate tantivy;
use tantivy::termdict::*;
use tantivy::directory::ReadOnlySource;

let mut term_dictionary_builder = TermDictionaryBuilderImpl::new(vec![])?;

// Keys have to be inserted in sorted order.
term_dictionary_builder.insert("apple", &1u32)?;
term_dictionary_builder.insert("grape", &2u32)?;
term_dictionary_builder.insert("pear", &3u32)?;
let buffer: Vec<u8> = term_dictionary_builder.finish()?;

let source = ReadOnlySource::from(buffer);
let term_dictionary = TermDictionaryImpl::from_source(source)?;

assert_eq!(term_dictionary.get("grape"), Some(2u32));

Implementations

There are currently two implementations of the term dictionary.

Default implementation : fstdict

The default one relies heavily on the fst crate. It associates each term's &[u8] representation with a u64 that is in fact an address in a buffer. The value is then obtained by deserializing the bytes found at this address.
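
The address indirection described above can be illustrated with a minimal sketch that uses only the standard library. It is not tantivy's code: the BTreeMap merely stands in for the fst, and ToyFstDict, its methods, and the fixed 4-byte little-endian value encoding are assumptions made for the illustration.

```rust
use std::collections::BTreeMap;

/// Toy dictionary: a sorted map from term bytes to an address in `values`.
/// The BTreeMap is only a stand-in for the fst (an assumption of this sketch).
struct ToyFstDict {
    offsets: BTreeMap<Vec<u8>, u64>,
    values: Vec<u8>,
}

impl ToyFstDict {
    /// Builds the dictionary from (term, value) pairs.
    fn build(entries: &[(&str, u32)]) -> ToyFstDict {
        let mut offsets = BTreeMap::new();
        let mut values = Vec::new();
        for (term, value) in entries {
            // The address of a value is the buffer length before writing it.
            offsets.insert(term.as_bytes().to_vec(), values.len() as u64);
            // Hypothetical encoding: 4 bytes, little-endian.
            values.extend_from_slice(&value.to_le_bytes());
        }
        ToyFstDict { offsets, values }
    }

    /// Resolves the term to an address, then deserializes the value stored there.
    fn get(&self, term: &str) -> Option<u32> {
        let addr = *self.offsets.get(term.as_bytes())? as usize;
        let b = self.values.get(addr..addr + 4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}
```

With entries ("apple", 1), ("grape", 2) and ("pear", 3), calling get("grape") resolves to address 4 and reads back the value 2 from the buffer.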

Stream implementation : streamdict

The fstdict is a tiny bit slow when streaming all of the terms. For some use cases (e.g. an analytics engine), it is preferable to use the streamdict, which offers better streaming performance at the expense of lookup performance.

streamdict can be enabled by adding the streamdict feature when compiling tantivy.

streamdict encodes each term relative to the preceding one, as follows:

  • number of bytes that need to be popped,
  • number of bytes that need to be added,
  • sequence of bytes to be added,
  • value.
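
The encoding listed above can be sketched as follows. This is a hedged illustration rather than the actual streamdict format: the function names are hypothetical, and the byte layout is an assumption (one-byte counts, which limit terms to 255 bytes, and a 4-byte little-endian value).

```rust
/// Encodes sorted (term, value) pairs relative to the previous term.
/// Assumed layout per entry: [popped: u8][added: u8][added bytes][value: u32 LE].
fn encode(entries: &[(&[u8], u32)]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev: Vec<u8> = Vec::new();
    for &(term, value) in entries {
        // Length of the prefix shared with the previous term.
        let common = prev.iter().zip(term.iter()).take_while(|(a, b)| a == b).count();
        let added = &term[common..];
        out.push((prev.len() - common) as u8); // bytes to pop from the previous term
        out.push(added.len() as u8); // bytes to add
        out.extend_from_slice(added); // the added bytes themselves
        out.extend_from_slice(&value.to_le_bytes()); // the value
        prev.truncate(common);
        prev.extend_from_slice(added);
    }
    out
}

/// Streams the encoded entries back, rebuilding each term from the previous one.
fn decode(data: &[u8]) -> Vec<(Vec<u8>, u32)> {
    let mut result = Vec::new();
    let mut term: Vec<u8> = Vec::new();
    let mut pos = 0;
    while pos < data.len() {
        let popped = data[pos] as usize;
        let added = data[pos + 1] as usize;
        pos += 2;
        term.truncate(term.len() - popped);
        term.extend_from_slice(&data[pos..pos + added]);
        pos += added;
        let value = u32::from_le_bytes([data[pos], data[pos + 1], data[pos + 2], data[pos + 3]]);
        pos += 4;
        result.push((term.clone(), value));
    }
    result
}
```

For example, encoding "apply" right after "apple" pops one byte ("e") and adds one byte ("y"), so the shared prefix "appl" is never stored twice.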

Because such a structure does not allow for lookups on its own, it comes with an fst that indexes 1 out of every 1024 terms in this structure.

A lookup therefore consists of a lookup in the fst followed by streaming through at most 1024 elements of the term stream.
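
The two-step lookup can be sketched with standard-library types only. Everything here is an assumption made for illustration: SparseDict is a hypothetical name, a sorted Vec stands in for the encoded term stream, another sorted Vec stands in for the fst, and the block length is a parameter so that small values can be tried instead of 1024.

```rust
/// Toy sparse-indexed dictionary (hypothetical; not tantivy's types).
struct SparseDict {
    /// Sorted (term, value) pairs; stand-in for the encoded term stream.
    stream: Vec<(String, u32)>,
    /// Every `block_len`-th term with its position; stand-in for the fst.
    index: Vec<(String, usize)>,
    block_len: usize,
}

impl SparseDict {
    /// Indexes one term out of every `block_len` (must be non-zero).
    fn new(stream: Vec<(String, u32)>, block_len: usize) -> SparseDict {
        let index = stream
            .iter()
            .enumerate()
            .step_by(block_len)
            .map(|(pos, (term, _))| (term.clone(), pos))
            .collect();
        SparseDict { stream, index, block_len }
    }

    fn get(&self, target: &str) -> Option<u32> {
        // Step 1: find the last indexed term that is <= target.
        let i = self.index.partition_point(|(term, _)| term.as_str() <= target);
        if i == 0 {
            return None; // target sorts before every term
        }
        let start = self.index[i - 1].1;
        // Step 2: stream through at most `block_len` entries from there.
        for (term, value) in self.stream[start..].iter().take(self.block_len) {
            if term.as_str() == target {
                return Some(*value);
            }
            if term.as_str() > target {
                break; // terms are sorted, so the target is absent
            }
        }
        None
    }
}
```

The block length trades index size against scan length: a larger block keeps the index smaller but makes each lookup stream through more entries.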

Structs

TermDictionaryBuilderImpl

See TermDictionaryBuilder

TermDictionaryImpl

See TermDictionary

TermMerger

Given a list of sorted term streams, returns an iterator over sorted unique terms.

TermStreamerBuilderImpl

See TermStreamerBuilder

TermStreamerImpl

See TermStreamer
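
The kind of merge TermMerger is described as performing (several sorted term streams in, sorted unique terms out) can be sketched as a heap-based k-way merge. This is only an illustration under stated assumptions: merge_sorted_unique is a hypothetical function, it works on in-memory slices, and it returns a Vec rather than an iterator.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merges already-sorted term lists into one sorted list of unique terms.
fn merge_sorted_unique(streams: &[Vec<&str>]) -> Vec<String> {
    // Min-heap of (current term, stream index, position in that stream).
    let mut heap = BinaryHeap::new();
    for (stream_id, stream) in streams.iter().enumerate() {
        if let Some(&term) = stream.first() {
            heap.push(Reverse((term, stream_id, 0usize)));
        }
    }
    let mut merged: Vec<String> = Vec::new();
    while let Some(Reverse((term, stream_id, pos))) = heap.pop() {
        // Terms come out in sorted order, so duplicates are always adjacent.
        if merged.last().map(|last| last.as_str()) != Some(term) {
            merged.push(term.to_string());
        }
        // Advance the stream the popped term came from.
        if let Some(&next) = streams[stream_id].get(pos + 1) {
            heap.push(Reverse((next, stream_id, pos + 1)));
        }
    }
    merged
}
```

Because the heap holds at most one entry per stream, each step costs a logarithm of the number of streams rather than of the number of terms.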

Traits

TermDictionary

Dictionary associating sorted &[u8] to values

TermDictionaryBuilder

Builder for the new term dictionary.

TermStreamer

TermStreamer acts as a cursor over a range of terms of a segment. Terms are guaranteed to be sorted.

TermStreamerBuilder

TermStreamerBuilder is a helper object used to define a range of terms that should be streamed.