Crate tantivy
tantivy
Tantivy is a search engine library. Think Lucene, but in Rust.
use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::*;
use tantivy::{doc, DocAddress, Index, Score};

// Assumption: this snippet runs inside a function returning `tantivy::Result<()>`,
// and `index_path` is a path to an existing, empty directory.

// First we need to define a schema ...

// `TEXT` means the field should be tokenized and indexed,
// along with its term frequency and term positions.
//
// `STORED` means that the field will also be saved
// in a compressed, row-oriented key-value store.
// This store is useful to reconstruct the
// documents that were selected during the search phase.
let mut schema_builder = Schema::builder();
let title = schema_builder.add_text_field("title", TEXT | STORED);
let body = schema_builder.add_text_field("body", TEXT);
let schema = schema_builder.build();

// Indexing documents

let index = Index::create_in_dir(index_path, schema.clone())?;

// Here we use a buffer of 100MB that will be split
// between indexing threads.
let mut index_writer = index.writer(100_000_000)?;

// Let's index one document!
index_writer.add_document(doc!(
    title => "The Old Man and the Sea",
    body => "He was an old man who fished alone in a skiff in \
             the Gulf Stream and he had gone eighty-four days \
             now without taking a fish."
));

// We need to call .commit() explicitly to force the
// index_writer to finish processing the documents in the queue,
// flush the current index to the disk, and advertise
// the existence of new documents.
index_writer.commit()?;

// # Searching

let reader = index.reader()?;

let searcher = reader.searcher();

let query_parser = QueryParser::for_index(&index, vec![title, body]);

// QueryParser may fail if the query is not in the right
// format. For user-facing applications, this can be a problem.
// A ticket has been opened regarding this problem.
let query = query_parser.parse_query("sea whale")?;

// Perform search.
// `top_docs` contains the 10 most relevant doc ids, sorted by decreasing scores...
let top_docs: Vec<(Score, DocAddress)> =
    searcher.search(&query, &TopDocs::with_limit(10))?;

for (_score, doc_address) in top_docs {
    // Retrieve the actual content of the document given its `doc_address`.
    let retrieved_doc = searcher.doc(doc_address)?;
    println!("{}", schema.to_json(&retrieved_doc));
}
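The example describes the search result as the 10 most relevant doc ids, sorted by decreasing scores. The following stdlib-only sketch illustrates that contract on plain tuples. It is not tantivy's `TopDocs` implementation; `SketchScore`, `SketchDocAddress`, and `top_k` are hypothetical names introduced here, and the `(segment, doc)` tuple layout is an assumption made only for illustration.

```rust
/// Hypothetical stand-ins for tantivy's `Score` and `DocAddress`
/// (assumptions for this sketch, not the library's types).
type SketchScore = f32;
type SketchDocAddress = (u32, u32); // assumed layout: (segment ordinal, doc id)

/// Keeps the `limit` highest-scoring hits, sorted by decreasing score.
fn top_k(
    mut hits: Vec<(SketchScore, SketchDocAddress)>,
    limit: usize,
) -> Vec<(SketchScore, SketchDocAddress)> {
    // `total_cmp` is a total order on f32, so the comparison is defined even for NaN.
    hits.sort_by(|a, b| b.0.total_cmp(&a.0));
    hits.truncate(limit);
    hits
}

fn main() {
    let hits = vec![(0.4, (0, 7)), (2.1, (1, 3)), (1.3, (0, 2))];
    for (score, addr) in top_k(hits, 2) {
        println!("{score:.1} {addr:?}");
    }
}
```

Sorting the full hit list is the simplest way to show the ordering; a real collector would more likely keep a bounded heap so that it never holds more than `limit` hits at once.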
A good place to get started is the example code (literate programming / source code).
Re-exports
pub use crate::error::TantivyError; |
pub use chrono; |
Modules
collector | Collectors |
directory | WORM directory abstraction. |
error | Definition of Tantivy's error and result. |
fastfield | Column oriented field storage for tantivy. |
fieldnorm | The fieldnorm represents the length associated with a given Field of a given document. |
merge_policy | Defines tantivy's merging strategy |
postings | Postings module (also called inverted index) |
query | Query Module |
schema | Schema definition for tantivy's indices. |
space_usage | Representations for the space usage of various parts of a Tantivy index. |
store | Compressed/slow/row-oriented storage for documents. |
termdict | The term dictionary's main role is to associate the sorted |
tokenizer | Tokenizers are in charge of chopping text into a stream of tokens ready for indexing. |
Macros
doc |
Structs
DocAddress |
Document | Tantivy's Document is the object that can be indexed and then searched for. |
Index | Search Index |
IndexMeta | Meta information about the |
IndexReader |
IndexReaderBuilder |
IndexWriter |
InvertedIndexReader | The inverted index reader is in charge of accessing the inverted index associated with a specific field. |
LeasedItem | A LeasedItem holds an object borrowed from a Pool. |
Searcher | Holds a list of |
Segment | A segment is a piece of the index. |
SegmentId | Uuid identifying a segment. |
SegmentMeta |
SegmentReader | Entry point to access all of the datastructures of the |
Snippet |
SnippetGenerator |
Term | Term represents the value that the token can take. |
Version | Structure version for the index. |
Enums
Executor | Search executor that determines whether search requests run on a single thread or on multiple threads. |
ReloadPolicy | Defines when a new version of the index should be reloaded. |
SegmentComponent | Enum describing each component of a tantivy segment. Each component is stored in its own file, using the pattern |
UserOperation | UserOperation is an enum type that encapsulates other operation types. |
Constants
TERMINATED | Sentinel value returned when a DocSet has been entirely consumed. |
Traits
Directory | Write-once, read-many (WORM) abstraction for where tantivy's data should be stored. |
DocSet | Represents an iterable set of sorted doc ids. |
HasLen | Has length trait |
Postings | Postings (also called inverted list) |
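The `TERMINATED` constant and the `DocSet` trait describe a common pattern: a cursor over sorted doc ids that returns a sentinel once it is exhausted. The sketch below shows that pattern with the standard library only. `SketchDocSet`, `VecDocSet`, `collect_all`, and the sentinel value `u32::MAX` are all assumptions made for illustration and should not be read as tantivy's actual trait definition or constant.

```rust
/// Hypothetical sentinel playing the role of `TERMINATED`
/// (the concrete value is an assumption of this sketch).
const SKETCH_TERMINATED: u32 = u32::MAX;

/// Hypothetical, simplified cursor trait; not tantivy's `DocSet`.
trait SketchDocSet {
    /// Returns the current doc id, or the sentinel once exhausted.
    fn doc(&self) -> u32;
    /// Moves to the next doc id and returns it (or the sentinel).
    fn advance(&mut self) -> u32;
}

/// A doc set backed by a sorted vector of doc ids.
struct VecDocSet {
    docs: Vec<u32>,
    cursor: usize,
}

impl SketchDocSet for VecDocSet {
    fn doc(&self) -> u32 {
        self.docs.get(self.cursor).copied().unwrap_or(SKETCH_TERMINATED)
    }

    fn advance(&mut self) -> u32 {
        // Stay on the sentinel once the end has been reached.
        if self.cursor < self.docs.len() {
            self.cursor += 1;
        }
        self.doc()
    }
}

/// Drains a doc set by looping until the sentinel appears.
fn collect_all(set: &mut dyn SketchDocSet) -> Vec<u32> {
    let mut out = Vec::new();
    let mut doc = set.doc();
    while doc != SKETCH_TERMINATED {
        out.push(doc);
        doc = set.advance();
    }
    out
}

fn main() {
    let mut set = VecDocSet { docs: vec![1, 4, 9], cursor: 0 };
    println!("{:?}", collect_all(&mut set));
}
```

Using a sentinel instead of `Option<u32>` keeps the hot loop to a single integer comparison, which is the usual motivation for this design.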
Functions
f64_to_u64 | Maps a |
i64_to_u64 | Maps a |
u64_to_f64 | Reverse the mapping given by |
u64_to_i64 | Reverse the mapping given by |
version | Exposes the current version of tantivy as found in Cargo.toml during compilation, e.g. "0.11.0", as well as the compression scheme used in the docstore. |
version_string | Exposes the complete version of tantivy as found in Cargo.toml during compilation as a string, e.g. "tantivy v0.11.0, index_format v1, store_compression: lz4". |
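The descriptions of the four mapping functions above are cut off in this listing. A common reason to map `i64` and `f64` values to `u64` is to obtain an encoding whose unsigned order matches the original numeric order. The sketch below shows one well-known way to do that, assuming this is the intent. The `sketch_*` functions are hypothetical stand-ins written for this example and are not presented as the library's implementation.

```rust
const HIGHEST_BIT: u64 = 1 << 63;

/// Order-preserving i64 -> u64: flipping the sign bit shifts the
/// range so that i64::MIN maps to 0 and i64::MAX maps to u64::MAX.
fn sketch_i64_to_u64(val: i64) -> u64 {
    (val as u64) ^ HIGHEST_BIT
}

/// Inverse of `sketch_i64_to_u64`.
fn sketch_u64_to_i64(val: u64) -> i64 {
    (val ^ HIGHEST_BIT) as i64
}

/// Order-preserving f64 -> u64 (NaN ordering is not considered here):
/// non-negative floats get their sign bit set, negative floats have
/// all bits inverted so that larger magnitudes sort lower.
fn sketch_f64_to_u64(val: f64) -> u64 {
    let bits = val.to_bits();
    if bits & HIGHEST_BIT == 0 {
        bits ^ HIGHEST_BIT
    } else {
        !bits
    }
}

/// Inverse of `sketch_f64_to_u64`.
fn sketch_u64_to_f64(val: u64) -> f64 {
    f64::from_bits(if val & HIGHEST_BIT != 0 {
        val ^ HIGHEST_BIT
    } else {
        !val
    })
}

fn main() {
    let ints = [-5i64, 0, 3];
    let mapped: Vec<u64> = ints.iter().map(|&v| sketch_i64_to_u64(v)).collect();
    println!("{mapped:?}");
    println!("{}", sketch_u64_to_f64(sketch_f64_to_u64(-1.5)));
}
```

With such an encoding, signed and floating-point values can share the same unsigned comparison and storage path as plain `u64` values.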
Type Definitions
DateTime | Tantivy DateTime |
DocId | A |
Opstamp | A u64 assigned to every operation incrementally |
Result | Tantivy result. |
Score | A Score that represents the relevance of the document to the query |
SegmentLocalId | A |