Crate tantivy

`tantivy`

Tantivy is a search engine library. Think Lucene, but in Rust.

#[macro_use]
extern crate tantivy;

// Imports needed by the snippet below.
use tantivy::collector::TopCollector;
use tantivy::query::QueryParser;
use tantivy::schema::*;
use tantivy::Index;

// ...

// First we need to define a schema ...

// `TEXT` means the field should be tokenized and indexed,
// along with its term frequency and term positions.
//
// `STORED` means that the field will also be saved
// in a compressed, row-oriented key-value store.
// This store is useful to reconstruct the
// documents that were selected during the search phase.
let mut schema_builder = SchemaBuilder::default();
let title = schema_builder.add_text_field("title", TEXT | STORED);
let body = schema_builder.add_text_field("body", TEXT);
let schema = schema_builder.build();

// Indexing documents

let index = Index::create_in_dir(index_path, schema.clone())?;

// Here we use a buffer of 100MB that will be split
// between indexing threads.
let mut index_writer = index.writer(100_000_000)?;

// Let's index one document!
index_writer.add_document(doc!(
    title => "The Old Man and the Sea",
    body => "He was an old man who fished alone in a skiff in \
            the Gulf Stream and he had gone eighty-four days \
            now without taking a fish."
));

// We need to call .commit() explicitly to force the
// index_writer to finish processing the documents in the queue,
// flush the current index to the disk, and advertise
// the existence of new documents.
index_writer.commit()?;

// # Searching

index.load_searchers()?;

let searcher = index.searcher();

let query_parser = QueryParser::for_index(&index, vec![title, body]);

// QueryParser may fail if the query is not in the right
// format. For user-facing applications, this can be a problem.
// A ticket has been opened regarding this problem.
let query = query_parser.parse_query("sea whale")?;

let mut top_collector = TopCollector::with_limit(10);
searcher.search(&*query, &mut top_collector)?;

// Our top collector now contains the 10
// most relevant doc ids...
let doc_addresses = top_collector.docs();
for doc_address in doc_addresses {
    let retrieved_doc = searcher.doc(&doc_address)?;
    println!("{}", schema.to_json(&retrieved_doc));
}

A good place to get started is the example code (literate programming / source code).

Modules

collector	Defines how the documents matching a search query should be processed.
directory	WORM directory abstraction.
fastfield	Column oriented field storage for tantivy.
fieldnorm	The fieldnorm represents the length associated with a given Field of a given document.
merge_policy	Defines tantivy's merging strategy.
postings	Postings module (also called inverted index)
query	Query
schema	Schema definition for tantivy's indices.
store	Compressed/slow/row-oriented storage for documents.
termdict	The term dictionary's main role is to associate the sorted `Term`s with a `TermInfo` struct that contains some meta-information about the term.
tokenizer	Tokenizers are in charge of chopping text into a stream of tokens ready for indexing.
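
To make the relationship between the `tokenizer` and `postings` modules concrete, here is a minimal, standard-library-only sketch of an inverted index. It is not tantivy's implementation: `tokenize` and `TinyInvertedIndex` are hypothetical names, and the tokenization rule (split on non-alphanumeric characters, then lowercase) is an assumption chosen for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical tokenizer: splits on non-alphanumeric characters and lowercases.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical inverted index: each term maps to a sorted list of doc ids
/// (its posting list).
#[derive(Default)]
struct TinyInvertedIndex {
    postings: BTreeMap<String, Vec<u32>>,
    next_doc_id: u32,
}

impl TinyInvertedIndex {
    /// Adds a document and returns the doc id assigned to it.
    fn add_document(&mut self, text: &str) -> u32 {
        let doc_id = self.next_doc_id;
        self.next_doc_id += 1;
        for token in tokenize(text) {
            let list = self.postings.entry(token).or_default();
            // Doc ids grow incrementally, so pushing keeps the list sorted.
            // The check avoids duplicates when a term repeats in one document.
            if list.last() != Some(&doc_id) {
                list.push(doc_id);
            }
        }
        doc_id
    }

    /// Returns the posting list for a term, or an empty slice if it is unknown.
    fn postings_for(&self, term: &str) -> &[u32] {
        match self.postings.get(term) {
            Some(list) => list.as_slice(),
            None => &[],
        }
    }
}
```

Calling `add_document` twice and then `postings_for("sea")` returns the ids of the documents that contain that token. A real engine additionally stores term frequencies and positions, as the `TEXT` option described above suggests.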

Macros

doc	`doc!` is a shortcut that helps building `Document` objects.

Structs

DocAddress	`DocAddress` contains all the necessary information to identify a document given a `Searcher` object.
Document	Tantivy's Document is the object that can be indexed and then searched for.
Error	The Error type.
Index	Search Index
IndexWriter	`IndexWriter` is the user entry point for adding documents to an index.
InvertedIndexReader	The inverted index reader is in charge of accessing the inverted index associated with a specific field.
Searcher	Holds a list of `SegmentReader`s ready for search.
Segment	A segment is a piece of the index.
SegmentId	Uuid identifying a segment.
SegmentMeta	`SegmentMeta` contains simple meta information about a segment.
SegmentReader	Entry point to access all of the data structures of the `Segment`.
Term	Term represents the value that the token can take.

Enums

ErrorKind	The kind of an error.
SegmentComponent	Enum describing each component of a tantivy segment. Each component is stored in its own file, using the pattern `segment_uuid`.`component_extension`, except the delete component, which uses the pattern `segment_uuid`.`delete_opstamp`.`component_extension`.
SkipResult	Expresses the outcome of a call to `DocSet`'s `.skip_next(...)`.

Traits

Directory	Write-once, read-many (WORM) abstraction for where tantivy's data should be stored.
DocSet	Represents an iterable set of sorted doc ids.
Postings	Postings (also called inverted list)
ResultExt	Additional methods for `Result`, for easy interaction with this crate.
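
As an illustration of what skipping over a sorted set of doc ids can look like, here is a small standard-library-only sketch. It does not reproduce tantivy's `DocSet` trait or its `.skip_next(...)` contract: `SketchDocSet`, `SketchSkipResult`, the method `skip_to`, and the three variant names are all assumptions made for this example.

```rust
/// Hypothetical outcome of skipping to a target doc id.
#[derive(Debug, PartialEq)]
enum SketchSkipResult {
    /// The cursor landed exactly on the target.
    Reached,
    /// The target is absent; the cursor sits on the next larger doc id.
    OverStep,
    /// No doc id greater than or equal to the target remains.
    End,
}

/// Hypothetical cursor over a sorted slice of doc ids.
struct SketchDocSet<'a> {
    docs: &'a [u32],
    pos: usize,
}

impl<'a> SketchDocSet<'a> {
    fn new(docs: &'a [u32]) -> Self {
        SketchDocSet { docs, pos: 0 }
    }

    /// Current doc id, or `None` once the cursor has run off the end.
    fn doc(&self) -> Option<u32> {
        self.docs.get(self.pos).copied()
    }

    /// Moves the cursor forward to the first doc id that is >= `target`.
    fn skip_to(&mut self, target: u32) -> SketchSkipResult {
        // Binary search over the remaining suffix; valid because ids are sorted.
        let offset = self.docs[self.pos..].partition_point(|&d| d < target);
        self.pos += offset;
        match self.doc() {
            Some(d) if d == target => SketchSkipResult::Reached,
            Some(_) => SketchSkipResult::OverStep,
            None => SketchSkipResult::End,
        }
    }
}
```

Keeping the ids sorted is what makes the skip cheap: intersecting two posting lists then amounts to repeatedly skipping each cursor to the other's current doc id.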

Functions

i64_to_u64	Maps an `i64` to a `u64`.
u64_to_i64	Reverse the mapping given by `i64_to_u64`.
version	Exposes the current version of tantivy, as well as whether it was compiled with SIMD compression.
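
The page does not say how `i64_to_u64` and `u64_to_i64` are defined. One common way to build such a reversible mapping, shown here as a sketch under the assumption that the mapping should preserve ordering, is to flip the sign bit. The function names below are hypothetical and are not the crate's own.

```rust
/// Hypothetical order-preserving mapping: flipping the sign bit places
/// negative values below non-negative ones in unsigned order.
fn sketch_i64_to_u64(val: i64) -> u64 {
    (val as u64) ^ (1u64 << 63)
}

/// Reverses `sketch_i64_to_u64` by flipping the same bit back.
fn sketch_u64_to_i64(val: u64) -> i64 {
    (val ^ (1u64 << 63)) as i64
}
```

With this scheme `i64::MIN` maps to `0` and `i64::MAX` maps to `u64::MAX`, so comparing the mapped values gives the same result as comparing the original signed values.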

Type Definitions

DocId	A `u32` identifying a document within a segment. Documents have their `DocId` assigned incrementally, as they are added to the segment.
Result	Tantivy result.
Score	An `f32` that represents the relevance of the document to the query.
SegmentLocalId	A `SegmentLocalId` identifies a segment. It only makes sense for a given searcher.
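
Because a `DocId` is only unique within a segment, locating a document takes both a segment identifier and a doc id. The following standard-library-only sketch illustrates that idea. The types `SketchDocAddress` and `SketchSearcher`, and the choice to model a segment as a plain `Vec<String>`, are assumptions for illustration and not the crate's definitions.

```rust
type SketchDocId = u32;
type SketchSegmentLocalId = u32;

/// Hypothetical address: (segment within this searcher, doc id within that segment).
#[derive(Debug, Clone, Copy, PartialEq)]
struct SketchDocAddress(SketchSegmentLocalId, SketchDocId);

/// Hypothetical searcher holding one list of stored documents per segment.
struct SketchSearcher {
    segments: Vec<Vec<String>>,
}

impl SketchSearcher {
    /// Resolves an address to a stored document, if both parts are in range.
    fn doc(&self, address: SketchDocAddress) -> Option<&str> {
        let SketchDocAddress(segment_ord, doc_id) = address;
        self.segments
            .get(segment_ord as usize)?
            .get(doc_id as usize)
            .map(|s| s.as_str())
    }
}
```

In this sketch the doc id `0` appears in every non-empty segment, so only the pair is meaningful, and only relative to the searcher that produced it.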