Crate tantivy
tantivy
Tantivy is a search engine library.
Think Lucene, but in Rust.
use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::*;
use tantivy::{doc, DocAddress, Index, Score};

fn main() -> tantivy::Result<()> {
    // Directory that will hold the index (placeholder path).
    // `create_in_dir` expects an existing directory that does not already contain an index.
    let index_path = std::path::Path::new("/tmp/tantivy_example");
    std::fs::create_dir_all(index_path)?;

    // First we need to define a schema.
    // `TEXT` means the field should be tokenized and indexed,
    // along with its term frequencies and term positions.
    //
    // `STORED` means that the field will also be saved
    // in a compressed, row-oriented key-value store.
    // This store is useful to reconstruct the
    // documents that were selected during the search phase.
    let mut schema_builder = Schema::builder();
    let title = schema_builder.add_text_field("title", TEXT | STORED);
    let body = schema_builder.add_text_field("body", TEXT);
    let schema = schema_builder.build();

    // # Indexing documents
    let index = Index::create_in_dir(index_path, schema.clone())?;

    // Here we use a buffer of 100MB that will be split
    // between indexing threads.
    let mut index_writer = index.writer(100_000_000)?;

    // Let's index one document!
    index_writer.add_document(doc!(
        title => "The Old Man and the Sea",
        body => "He was an old man who fished alone in a skiff in \
                 the Gulf Stream and he had gone eighty-four days \
                 now without taking a fish."
    ));

    // We need to call .commit() explicitly to force the
    // index_writer to finish processing the documents in the queue,
    // flush the current index to the disk, and advertise
    // the existence of new documents.
    index_writer.commit()?;

    // # Searching
    let reader = index.reader()?;
    let searcher = reader.searcher();
    let query_parser = QueryParser::for_index(&index, vec![title, body]);

    // QueryParser may fail if the query is not in the right
    // format. For user-facing applications, this can be a problem.
    // A ticket has been opened regarding this problem.
    let query = query_parser.parse_query("sea whale")?;

    // Perform the search.
    // `top_docs` contains the 10 most relevant doc addresses, sorted by decreasing score.
    let top_docs: Vec<(Score, DocAddress)> =
        searcher.search(&query, &TopDocs::with_limit(10))?;
    for (_score, doc_address) in top_docs {
        // Retrieve the actual content of the document given its `doc_address`.
        let retrieved_doc = searcher.doc(doc_address)?;
        println!("{}", schema.to_json(&retrieved_doc));
    }
    Ok(())
}
A good place for you to get started is the example code, which is available both in literate programming form and as source code.
Re-exports
pub use crate::error::TantivyError;
pub use chrono;
Modules
Collectors
WORM directory abstraction.
Definition of Tantivy’s error and result.
Column oriented field storage for tantivy.
The fieldnorm represents the length associated with a given Field of a given document.
Defines tantivy’s merging strategy.
Tantivy can (if instructed to do so in the schema) store the term positions in a given field. These positions are expressed as token ordinals. For instance, in “The beauty and the beast”, the term “the” appears at position 0 and position 3. This information is useful to run phrase queries.
Postings module (also called inverted index)
Query Module
Schema definition for tantivy’s indices.
Representations for the space usage of various parts of a Tantivy index.
Compressed/slow/row-oriented storage for documents.
The term dictionary’s main role is to associate the sorted Terms with a TermInfo struct that contains some meta-information about the term.
Tokenizers are in charge of chopping text into a stream of tokens ready for indexing.
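The positions, postings and term dictionary modules can be hard to picture from their one-line descriptions, so here is a minimal std-only sketch of the ideas. It does not use tantivy. `PostingSketch`, `TermInfoSketch`, `build_index` and `phrase_matches` are hypothetical names, and whitespace splitting with lowercasing stands in for a real tokenizer. The sketch keeps terms sorted in a BTreeMap and attaches a small metadata struct to each one. Each posting records the token ordinals at which its term occurs, and those ordinals are what make a phrase check possible.

```rust
use std::collections::BTreeMap;

type DocId = u32;

/// One postings entry: a document and the token ordinals of the term inside it.
#[derive(Debug, Clone, PartialEq)]
struct PostingSketch {
    doc_id: DocId,
    positions: Vec<u32>,
}

/// Hypothetical per-term metadata; tantivy's real `TermInfo` has its own fields.
#[derive(Debug, Clone, PartialEq)]
struct TermInfoSketch {
    doc_freq: u32,
}

/// Builds a sorted term dictionary mapping each term to (metadata, postings list).
fn build_index(docs: &[&str]) -> BTreeMap<String, (TermInfoSketch, Vec<PostingSketch>)> {
    let mut postings: BTreeMap<String, Vec<PostingSketch>> = BTreeMap::new();
    for (doc_id, text) in docs.iter().enumerate() {
        let doc_id = doc_id as DocId;
        // Whitespace splitting + lowercasing stands in for a real tokenizer.
        for (ordinal, token) in text.split_whitespace().enumerate() {
            let ordinal = ordinal as u32;
            let list = postings.entry(token.to_lowercase()).or_default();
            let same_doc = list.last().map_or(false, |p| p.doc_id == doc_id);
            if same_doc {
                // Another occurrence in the same document: record one more position.
                list.last_mut().unwrap().positions.push(ordinal);
            } else {
                // First occurrence in this document: start a new posting.
                list.push(PostingSketch { doc_id, positions: vec![ordinal] });
            }
        }
    }
    postings
        .into_iter()
        .map(|(term, list)| {
            let info = TermInfoSketch { doc_freq: list.len() as u32 };
            (term, (info, list))
        })
        .collect()
}

/// True if some document contains `first` immediately followed by `second`.
fn phrase_matches(
    index: &BTreeMap<String, (TermInfoSketch, Vec<PostingSketch>)>,
    first: &str,
    second: &str,
) -> bool {
    match (index.get(first), index.get(second)) {
        (Some((_, a)), Some((_, b))) => a.iter().any(|pa| {
            b.iter().any(|pb| {
                pb.doc_id == pa.doc_id
                    && pa.positions.iter().any(|p| pb.positions.contains(&(p + 1)))
            })
        }),
        _ => false,
    }
}

fn main() {
    let index = build_index(&["The beauty and the beast", "The old man and the sea"]);
    // Keys come out sorted: and, beast, beauty, man, old, sea, the
    let terms: Vec<&String> = index.keys().collect();
    println!("{:?}", terms);
    let (info, postings) = &index["the"];
    println!("doc_freq={} positions in doc 0: {:?}", info.doc_freq, postings[0].positions);
    println!("{}", phrase_matches(&index, "the", "beast"));
    println!("{}", phrase_matches(&index, "beauty", "beast"));
}
```

Token ordinals start at 0, so “the” in the first document is recorded at positions 0 and 3.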
Macros
doc! is a shortcut that helps build Document objects.
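The `field => value` syntax that doc! accepts is an ordinary macro_rules! pattern. The following std-only sketch mimics that syntax with a hypothetical `doc_sketch!` macro and `DocSketch` type. It is not tantivy's macro: it keys values by field name strings rather than by tantivy's `Field` handles, and it converts every value to a String.

```rust
/// Hypothetical stand-in for a document: an ordered list of (field name, value) pairs.
#[derive(Debug, Default, PartialEq)]
struct DocSketch {
    field_values: Vec<(&'static str, String)>,
}

/// Builds a `DocSketch` from `field => value` pairs, in the style of `doc!`.
macro_rules! doc_sketch {
    () => {
        DocSketch::default()
    };
    ($($field:expr => $value:expr),+ $(,)?) => {{
        let mut document = DocSketch::default();
        $(
            // Each pair is appended in the order it was written.
            document.field_values.push(($field, $value.to_string()));
        )+
        document
    }};
}

fn main() {
    let document = doc_sketch!(
        "title" => "The Old Man and the Sea",
        "body" => "He was an old man who fished alone in a skiff.",
    );
    for (field, value) in &document.field_values {
        println!("{field}: {value}");
    }
}
```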
Structs
DocAddress contains all the necessary information to identify a document given a Searcher object.
Tantivy’s Document is the object that can be indexed and then searched for.
Search Index
IndexBuilder can be used to create an index.
Meta information about the Index.
IndexReader is your entry point to read and search the index.
IndexReader builder.
Search Index Settings.
Settings to presort the documents in an index
IndexWriter is the user entry point to add documents to an index.
The inverted index reader is in charge of accessing the inverted index associated to a specific field.
A LeasedItem holds an object borrowed from a Pool.
Holds a list of SegmentReaders ready for search.
A segment is a piece of the index.
Uuid identifying a segment.
SegmentMeta contains simple meta information about a segment.
Entry point to access all of the data structures of the Segment.
Snippet contains a fragment of a document and some highlighted parts inside it.
SnippetGenerator
Term represents the value that the token can take.
Structure version for the index.
Enums
Search executor that determines whether search requests are executed single-threaded or multithreaded.
The order to sort by
Defines when a new version of the index should be reloaded.
Enum describing each component of a tantivy segment. Each component is stored in its own file, using the pattern segment_uuid.component_extension, except the delete component, which uses segment_uuid.delete_opstamp.component_extension.
UserOperation is an enum type that encapsulates other operation types.
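The file-naming rule for segment components fits in a small formatting function. The uuid string and the extensions used below (`idx`, `del`) are placeholders chosen for illustration. The actual extensions are defined by tantivy's segment component enum and may differ.

```rust
/// Builds a segment component file name following the pattern described above.
/// `delete_opstamp` is `Some` only for the delete component.
fn component_file_name(segment_uuid: &str, extension: &str, delete_opstamp: Option<u64>) -> String {
    match delete_opstamp {
        // Delete component: segment_uuid.delete_opstamp.component_extension
        Some(opstamp) => format!("{segment_uuid}.{opstamp}.{extension}"),
        // Every other component: segment_uuid.component_extension
        None => format!("{segment_uuid}.{extension}"),
    }
}

fn main() {
    // Placeholder uuid and extensions, for illustration only.
    let uuid = "0123456789abcdef0123456789abcdef";
    println!("{}", component_file_name(uuid, "idx", None));
    println!("{}", component_file_name(uuid, "del", Some(42)));
}
```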
Constants
Sentinel value returned when a DocSet has been entirely consumed.
Traits
Write-once read many (WORM) abstraction for where tantivy’s data should be stored.
Represents an iterable set of sorted doc ids.
Has length trait
Postings (also called inverted list)
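The DocSet trait and the sentinel constant work together: a DocSet iterates over sorted doc ids and reports exhaustion by returning the sentinel instead of an Option. The std-only sketch below illustrates that pattern. `DocSetSketch`, `VecDocSet` and `intersect` are hypothetical names. The sentinel value `u32::MAX` is an assumption of this sketch and is not necessarily tantivy's actual constant. The intersection calls `seek` to skip ahead in whichever set is behind, a common way to intersect two sorted postings lists.

```rust
type DocId = u32;

/// Sentinel for "no more documents" (value assumed for this sketch).
const TERMINATED: DocId = u32::MAX;

/// Minimal DocSet-like interface over sorted, unique doc ids.
trait DocSetSketch {
    /// Current doc id, or TERMINATED once exhausted.
    fn doc(&self) -> DocId;
    /// Moves to the next doc id and returns it (or TERMINATED).
    fn advance(&mut self) -> DocId;
    /// Advances until the current doc id is >= target, then returns it.
    fn seek(&mut self, target: DocId) -> DocId {
        while self.doc() < target {
            self.advance();
        }
        self.doc()
    }
}

/// DocSet backed by a sorted Vec; ids must not contain TERMINATED.
struct VecDocSet {
    docs: Vec<DocId>,
    cursor: usize,
}

impl VecDocSet {
    fn new(docs: Vec<DocId>) -> Self {
        VecDocSet { docs, cursor: 0 }
    }
}

impl DocSetSketch for VecDocSet {
    fn doc(&self) -> DocId {
        self.docs.get(self.cursor).copied().unwrap_or(TERMINATED)
    }
    fn advance(&mut self) -> DocId {
        if self.cursor < self.docs.len() {
            self.cursor += 1;
        }
        self.doc()
    }
}

/// Collects the doc ids present in both sets.
fn intersect(a: &mut dyn DocSetSketch, b: &mut dyn DocSetSketch) -> Vec<DocId> {
    let mut out = Vec::new();
    let mut candidate = a.doc();
    while candidate != TERMINATED {
        let other = b.seek(candidate);
        if other == candidate {
            // Both sets contain this doc.
            out.push(candidate);
            candidate = a.advance();
        } else {
            // `b` jumped past the candidate: let `a` catch up.
            candidate = a.seek(other);
        }
    }
    out
}

fn main() {
    let mut a = VecDocSet::new(vec![1, 3, 5, 7]);
    let mut b = VecDocSet::new(vec![3, 4, 7, 9]);
    println!("{:?}", intersect(&mut a, &mut b));
}
```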
Functions
Maps an f64 to a u64.
Maps an i64 to a u64.
Reverses the mapping given by f64_to_u64.
Reverses the mapping given by i64_to_u64.
Exposes the current version of tantivy, as found in Cargo.toml during compilation (e.g. “0.11.0”), as well as the compression scheme used in the docstore.
Exposes the complete version of tantivy, as found in Cargo.toml during compilation, as a string, e.g. “tantivy v0.11.0, index_format v1, store_compression: lz4”.
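Mappings of this kind usually rely on a common order-preserving bit trick. For integers, flip the sign bit. For floats, flip the sign bit of non-negative values and invert all bits of negative ones. Comparing the resulting u64 values then gives the same order as comparing the originals. The std-only sketch below implements that trick. The functions reuse the names from the list above but are local definitions, and it is an assumption that tantivy uses exactly this scheme, so check its source before relying on specific bit patterns.

```rust
const HIGHEST_BIT: u64 = 1 << 63;

/// Order-preserving i64 -> u64: flipping the sign bit sends i64::MIN to 0 and i64::MAX to u64::MAX.
fn i64_to_u64(val: i64) -> u64 {
    (val as u64) ^ HIGHEST_BIT
}

/// Inverse of `i64_to_u64`.
fn u64_to_i64(val: u64) -> i64 {
    (val ^ HIGHEST_BIT) as i64
}

/// Order-preserving f64 -> u64 (NaN ordering aside).
fn f64_to_u64(val: f64) -> u64 {
    let bits = val.to_bits();
    if val.is_sign_positive() {
        // Non-negative: set the sign bit so these sort above all negatives.
        bits ^ HIGHEST_BIT
    } else {
        // Negative: invert every bit so larger magnitudes sort lower.
        !bits
    }
}

/// Inverse of `f64_to_u64`.
fn u64_to_f64(val: u64) -> f64 {
    if val & HIGHEST_BIT != 0 {
        f64::from_bits(val ^ HIGHEST_BIT)
    } else {
        f64::from_bits(!val)
    }
}

fn main() {
    let ints = [i64::MIN, -1, 0, 1, i64::MAX];
    let mapped: Vec<u64> = ints.iter().map(|&v| i64_to_u64(v)).collect();
    println!("{:?}", mapped);
    for &f in &[-1e10, -1.5, 0.0, 2.5, 1e300] {
        println!("{} -> {:#018x} -> {}", f, f64_to_u64(f), u64_to_f64(f64_to_u64(f)));
    }
}
```

Because order is preserved, a range query on the original values can become a range query on the mapped u64 values.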
Type Definitions
Tantivy DateTime
A u32 identifying a document within a segment. Documents have their DocId assigned incrementally, as they are added to the segment.
A u64 assigned incrementally to every operation.
Tantivy result.
A Score that represents the relevance of the document to the query
A SegmentOrdinal identifies a segment within a Searcher or Merger.
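These type definitions fit together with DocAddress. A DocAddress pairs a SegmentOrdinal with a DocId, and within each segment DocIds are handed out as 0, 1, 2, ... in insertion order. The std-only sketch below shows that relationship. `DocAddressSketch`, `SegmentSketch` and `SearcherSketch` are hypothetical stand-ins. Treating SegmentOrdinal as a u32 is an assumption of this sketch, while DocId is a u32 as described above.

```rust
/// DocId as described above: a u32 local to one segment.
type DocId = u32;
/// Assumed to be a u32 for this sketch.
type SegmentOrdinal = u32;

/// Hypothetical address: which segment, then which document inside it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DocAddressSketch {
    segment_ord: SegmentOrdinal,
    doc_id: DocId,
}

/// Hypothetical segment that stores raw documents in insertion order.
#[derive(Default)]
struct SegmentSketch {
    docs: Vec<String>,
}

impl SegmentSketch {
    /// Appends a document; its DocId is simply its insertion index.
    fn add(&mut self, doc: &str) -> DocId {
        self.docs.push(doc.to_string());
        (self.docs.len() - 1) as DocId
    }
}

/// Hypothetical searcher view over an ordered list of segments.
struct SearcherSketch {
    segments: Vec<SegmentSketch>,
}

impl SearcherSketch {
    /// Resolves an address to the stored document, if it exists.
    fn doc(&self, address: DocAddressSketch) -> Option<&str> {
        let segment = self.segments.get(address.segment_ord as usize)?;
        segment.docs.get(address.doc_id as usize).map(String::as_str)
    }
}

fn main() {
    let mut first = SegmentSketch::default();
    let mut second = SegmentSketch::default();
    let a = first.add("The Old Man and the Sea");
    let b = first.add("Moby Dick");
    let c = second.add("The Call of the Wild");
    // DocIds restart at 0 in every segment.
    println!("{a} {b} {c}");
    let searcher = SearcherSketch { segments: vec![first, second] };
    let address = DocAddressSketch { segment_ord: 1, doc_id: c };
    println!("{:?}", searcher.doc(address));
}
```

A DocId alone is ambiguous once an index has several segments. That is why search results carry the segment ordinal alongside it.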