Crate tantivy


`tantivy`

Tantivy is a search engine library. Think Lucene, but in Rust.

```rust
use std::path::Path;

use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::*;
use tantivy::{doc, DocAddress, Index, Score};

// `index_path` must be an existing directory that does not already hold an index.
fn run_example(index_path: &Path) -> tantivy::Result<()> {
    // First we need to define a schema ...

    // `TEXT` means the field should be tokenized and indexed,
    // along with its term frequency and term positions.
    //
    // `STORED` means that the field will also be saved
    // in a compressed, row-oriented key-value store.
    // This store is useful to reconstruct the
    // documents that were selected during the search phase.
    let mut schema_builder = Schema::builder();
    let title = schema_builder.add_text_field("title", TEXT | STORED);
    let body = schema_builder.add_text_field("body", TEXT);
    let schema = schema_builder.build();

    // # Indexing documents

    let index = Index::create_in_dir(index_path, schema.clone())?;

    // Here we use a buffer of 100MB that will be split
    // between indexing threads.
    let mut index_writer = index.writer(100_000_000)?;

    // Let's index one document!
    index_writer.add_document(doc!(
        title => "The Old Man and the Sea",
        body => "He was an old man who fished alone in a skiff in \
                the Gulf Stream and he had gone eighty-four days \
                now without taking a fish."
    ));

    // We need to call .commit() explicitly to force the
    // index_writer to finish processing the documents in the queue,
    // flush the current index to the disk, and advertise
    // the existence of new documents.
    index_writer.commit()?;

    // # Searching

    let reader = index.reader()?;

    let searcher = reader.searcher();

    let query_parser = QueryParser::for_index(&index, vec![title, body]);

    // `parse_query` returns an error if the query string is not valid
    // query syntax. For user-facing applications, this can be a problem.
    // A ticket has been opened regarding this problem.
    let query = query_parser.parse_query("sea whale")?;

    // Perform the search.
    // `top_docs` holds the addresses of the 10 most relevant documents,
    // sorted by decreasing score.
    let top_docs: Vec<(Score, DocAddress)> =
        searcher.search(&query, &TopDocs::with_limit(10))?;

    for (_score, doc_address) in top_docs {
        // Retrieve the stored content of the document at `doc_address`.
        // Only `STORED` fields (here, `title`) are returned.
        let retrieved_doc = searcher.doc(doc_address)?;
        println!("{}", schema.to_json(&retrieved_doc));
    }

    Ok(())
}
```
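`TopDocs::with_limit(10)` keeps only the best-scoring hits, sorted by decreasing score. The std-only sketch below shows one common way to do that, a bounded min-heap; it is not tantivy's collector implementation, and the `Hit` type and `top_k` function are hypothetical.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Hypothetical hit: a relevance score and a doc id.
#[derive(Debug, Clone, Copy)]
struct Hit {
    score: f32,
    doc: u32,
}

// f32 is not `Ord`, so order hits with `total_cmp`; on equal scores,
// the smaller doc id ranks higher.
impl Ord for Hit {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score
            .total_cmp(&other.score)
            .then_with(|| other.doc.cmp(&self.doc))
    }
}

impl PartialOrd for Hit {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Hit {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Hit {}

/// Returns the `k` best hits, sorted by decreasing score.
fn top_k(hits: impl IntoIterator<Item = Hit>, k: usize) -> Vec<Hit> {
    // `Reverse` turns BinaryHeap (a max-heap) into a min-heap, so the
    // root is always the weakest of the (at most k) hits kept so far.
    let mut heap: BinaryHeap<Reverse<Hit>> = BinaryHeap::with_capacity(k);
    for hit in hits {
        if heap.len() < k {
            heap.push(Reverse(hit));
        } else if heap.peek().map_or(false, |weakest| hit > weakest.0) {
            // Evict the weakest hit to make room for a better one.
            heap.pop();
            heap.push(Reverse(hit));
        }
    }
    // Ascending order of Reverse<Hit> is descending order of Hit.
    heap.into_sorted_vec().into_iter().map(|Reverse(hit)| hit).collect()
}

fn main() {
    let hits = vec![
        Hit { score: 0.5, doc: 0 },
        Hit { score: 2.0, doc: 1 },
        Hit { score: 1.0, doc: 2 },
        Hit { score: 3.0, doc: 3 },
    ];
    for hit in top_k(hits, 2) {
        // Prints "3.0 doc=3", then "2.0 doc=1".
        println!("{:.1} doc={}", hit.score, hit.doc);
    }
}
```

The heap never holds more than `k` hits, so memory stays bounded no matter how many documents match.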

A good place to get started is the example code (literate programming / source code).

Re-exports

pub use crate::error::TantivyError;

pub use chrono;

Macros

doc

doc! is a shortcut for building Document objects.
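To illustrate what the `field => value` syntax expands to, here is a std-only toy macro, `toy_doc!`. It is hypothetical and not tantivy's macro: fields are plain `u32` ids, values are `String`s, and the document is simply an ordered list of field/value pairs in which a field may appear more than once.

```rust
// Hypothetical stand-in for tantivy's `Field` type.
type Field = u32;

#[derive(Debug, Default)]
struct ToyDocument {
    field_values: Vec<(Field, String)>,
}

impl ToyDocument {
    fn add_text(&mut self, field: Field, text: &str) {
        self.field_values.push((field, text.to_string()));
    }
}

// Toy macro: `toy_doc!(f1 => "a", f2 => "b")` builds a ToyDocument,
// pushing one (field, value) pair per `=>` entry, in order.
// A trailing comma is accepted.
macro_rules! toy_doc {
    ($($field:expr => $value:expr),* $(,)?) => {{
        let mut document = ToyDocument::default();
        $( document.add_text($field, $value); )*
        document
    }};
}

fn main() {
    let title: Field = 0;
    let body: Field = 1;
    let doc = toy_doc!(
        title => "The Old Man and the Sea",
        body => "He was an old man",
        title => "A second title", // the same field may repeat
    );
    // Prints 3: one entry per `field => value` pair.
    println!("{}", doc.field_values.len());
}
```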

Structs

DocAddress

DocAddress contains all the necessary information to identify a document given a Searcher object.

Document

Tantivy’s Document is the object that can be indexed and then searched for.

Index

Search Index

IndexBuilder

IndexBuilder can be used to create an index.

IndexMeta

Meta information about the Index.

IndexReader

IndexReader is your entry point to read and search the index.

IndexReaderBuilder

Builder for an IndexReader.

IndexSettings

Search Index Settings.

IndexSortByField

Settings to presort the documents in an index

IndexWriter

IndexWriter is the user entry point to add documents to an index.

InvertedIndexReader

The inverted index reader is in charge of accessing the inverted index associated to a specific field.

LeasedItem

A LeasedItem holds an object borrowed from a Pool.

Searcher

Holds a list of SegmentReaders ready for search.

Segment

A segment is a piece of the index.

SegmentId

Uuid identifying a segment.

SegmentMeta

SegmentMeta contains simple meta information about a segment.

SegmentReader

Entry point to access all of the data structures of the Segment.

Snippet

A Snippet contains a fragment of a document, and some highlighted parts inside it.

SnippetGenerator

SnippetGenerator creates Snippets for documents, given a query.

Term

Term represents the value that the token can take.

Version

Structure version for the index.
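A `Snippet` pairs a text fragment with the parts that should be highlighted. The std-only sketch below renders such a fragment to HTML by wrapping each highlighted byte range in `<b>` tags. The `render_snippet` function, the `<b>` tag choice, and the `Range<usize>` representation are assumptions for illustration, and HTML escaping is omitted for brevity.

```rust
use std::ops::Range;

/// Hypothetical renderer: wraps each highlighted byte range of
/// `fragment` in <b>...</b>. Ranges must be sorted, non-overlapping,
/// and fall on UTF-8 character boundaries.
fn render_snippet(fragment: &str, highlighted: &[Range<usize>]) -> String {
    let mut html = String::with_capacity(fragment.len());
    let mut cursor = 0;
    for range in highlighted {
        // Copy the plain text before this highlight, then the highlight.
        html.push_str(&fragment[cursor..range.start]);
        html.push_str("<b>");
        html.push_str(&fragment[range.clone()]);
        html.push_str("</b>");
        cursor = range.end;
    }
    // Copy whatever follows the last highlight.
    html.push_str(&fragment[cursor..]);
    html
}

fn main() {
    let title = "The Old Man and the Sea";
    // Bytes 20..23 are "Sea".
    // Prints "The Old Man and the <b>Sea</b>".
    println!("{}", render_snippet(title, &[20..23]));
}
```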

Enums

Executor

Search executor, which determines whether search requests run on a single thread or on multiple threads.

Order

The order to sort by

ReloadPolicy

Defines when a new version of the index should be reloaded.

SegmentComponent

Enum describing each component of a tantivy segment. Each component is stored in its own file, using the pattern segment_uuid.component_extension, except the delete component, which uses segment_uuid.delete_opstamp.component_extension.

UserOperation

UserOperation is an enum type that encapsulates other operation types.
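The `SegmentComponent` file-naming rule can be sketched as a small function. The component variants and file extensions below (`idx`, `pos`, `store`, `del`) are illustrative assumptions rather than an authoritative list, and the segment UUID is passed in as an already formatted string.

```rust
/// Hypothetical subset of segment components.
#[derive(Debug, Clone, Copy)]
enum Component {
    Postings,
    Positions,
    Store,
    // The delete file name also includes an opstamp.
    Delete { opstamp: u64 },
}

impl Component {
    // Assumed extensions, for illustration only.
    fn extension(self) -> &'static str {
        match self {
            Component::Postings => "idx",
            Component::Positions => "pos",
            Component::Store => "store",
            Component::Delete { .. } => "del",
        }
    }
}

/// Builds `segment_uuid.component_extension`, or
/// `segment_uuid.delete_opstamp.component_extension` for deletes.
fn component_file_name(segment_uuid: &str, component: Component) -> String {
    match component {
        Component::Delete { opstamp } => {
            format!("{}.{}.{}", segment_uuid, opstamp, component.extension())
        }
        other => format!("{}.{}", segment_uuid, other.extension()),
    }
}

fn main() {
    let uuid = "0a1b2c3d4e5f60718293a4b5c6d7e8f9";
    for component in [
        Component::Postings,
        Component::Positions,
        Component::Delete { opstamp: 42 },
    ] {
        println!("{}", component_file_name(uuid, component));
    }
}
```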

Constants

TERMINATED

Sentinel value returned when a DocSet has been entirely consumed.
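The sentinel pattern behind `TERMINATED` can be sketched with a toy doc set over a sorted list of doc ids: `advance()` returns the next doc id, or the sentinel once the set is exhausted, so callers loop until they see it. `VecDocSet`, its methods, and the sentinel value chosen here are hypothetical and are not tantivy's `DocSet` trait or constant.

```rust
type DocId = u32;

// Illustrative sentinel: a value no real doc id takes in this sketch.
const TERMINATED: DocId = u32::MAX;

/// Hypothetical doc set over sorted doc ids. It starts positioned on
/// its first doc (or on TERMINATED if empty).
struct VecDocSet {
    docs: Vec<DocId>,
    cursor: usize,
}

impl VecDocSet {
    fn new(docs: Vec<DocId>) -> Self {
        VecDocSet { docs, cursor: 0 }
    }

    /// Current doc id, or TERMINATED once every doc was consumed.
    fn doc(&self) -> DocId {
        self.docs.get(self.cursor).copied().unwrap_or(TERMINATED)
    }

    /// Moves to the next doc and returns it (TERMINATED at the end).
    fn advance(&mut self) -> DocId {
        if self.cursor < self.docs.len() {
            self.cursor += 1;
        }
        self.doc()
    }
}

fn main() {
    let mut docset = VecDocSet::new(vec![2, 5, 9]);
    let mut doc = docset.doc();
    // Loop until the sentinel signals that the set is exhausted.
    while doc != TERMINATED {
        println!("{}", doc);
        doc = docset.advance();
    }
}
```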

Traits

Functions

f64_to_u64

Maps an f64 to a u64.

i64_to_u64

Maps an i64 to a u64.

u64_to_f64

Reverses the mapping given by f64_to_u64.

u64_to_i64

Reverses the mapping given by i64_to_u64.

version

Exposes the current version of tantivy as found in Cargo.toml during compilation (e.g. “0.11.0”), as well as the compression scheme used in the docstore.

version_string

Exposes the complete version of tantivy as found in Cargo.toml during compilation, as a string, e.g. “tantivy v0.11.0, index_format v1, store_compression: lz4”.
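The `*_to_u64` mappings let signed integers and floats be stored and compared as `u64` while preserving their order. The sketch below re-implements the idea locally with the common sign-bit trick; treat it as an illustration of the technique under that assumption, not as a copy of tantivy's code.

```rust
const HIGHEST_BIT: u64 = 1 << 63;

/// Order-preserving map from i64 to u64: flipping the sign bit sends
/// i64::MIN to 0 and i64::MAX to u64::MAX.
fn i64_to_u64(val: i64) -> u64 {
    (val as u64) ^ HIGHEST_BIT
}

/// Inverse of i64_to_u64.
fn u64_to_i64(val: u64) -> i64 {
    (val ^ HIGHEST_BIT) as i64
}

/// Order-preserving map from f64 to u64 (for non-NaN values).
/// Positive floats: set the sign bit so they sort above negatives.
/// Negative floats: flip every bit, which also reverses their order.
fn f64_to_u64(val: f64) -> u64 {
    let bits = val.to_bits();
    if val.is_sign_positive() {
        bits ^ HIGHEST_BIT
    } else {
        !bits
    }
}

/// Inverse of f64_to_u64.
fn u64_to_f64(val: u64) -> f64 {
    f64::from_bits(if val & HIGHEST_BIT != 0 {
        val ^ HIGHEST_BIT
    } else {
        !val
    })
}

fn main() {
    let floats = [-2.5_f64, -1.0, 0.0, 1.0, f64::INFINITY];
    let mapped: Vec<u64> = floats.iter().map(|&f| f64_to_u64(f)).collect();
    // The mapped values are in the same (increasing) order as the floats.
    assert!(mapped.windows(2).all(|w| w[0] < w[1]));
    println!("{:?}", u64_to_f64(f64_to_u64(-2.5)));
    println!("{}", u64_to_i64(i64_to_u64(-7)));
}
```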

Type Definitions

DateTime

Tantivy DateTime

DocId

A u32 identifying a document within a segment. Documents have their DocId assigned incrementally, as they are added in the segment.

Opstamp

A u64 assigned to every operation incrementally

Result

Tantivy result.

Score

A Score that represents the relevance of the document to the query

SegmentOrdinal

A SegmentOrdinal identifies a segment, within a Searcher or Merger.
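An `Opstamp` is a `u64` assigned incrementally to every operation. One common way to hand out such stamps across threads is an atomic counter; the `Stamper` type below is a hypothetical sketch of that idea, not tantivy's internal code.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

type Opstamp = u64;

/// Hypothetical stamper: hands out distinct, increasing opstamps,
/// even when called from several threads at once.
#[derive(Default)]
struct Stamper {
    next: AtomicU64,
}

impl Stamper {
    fn stamp(&self) -> Opstamp {
        // fetch_add returns the previous value, so every caller
        // receives a distinct opstamp.
        self.next.fetch_add(1, Ordering::SeqCst)
    }
}

fn main() {
    let stamper = Arc::new(Stamper::default());
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let stamper = Arc::clone(&stamper);
            thread::spawn(move || (0..1000).map(|_| stamper.stamp()).collect::<Vec<_>>())
        })
        .collect();
    let mut all: Vec<Opstamp> = handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect();
    all.sort_unstable();
    all.dedup();
    // 4 threads x 1000 operations: prints 4000 distinct opstamps.
    println!("{}", all.len());
}
```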