Crate tantivy
tantivy
Tantivy is a search engine library.
Think Lucene, but in Rust.
    #[macro_use]
    extern crate tantivy;

    // ...

    // First we need to define a schema ...

    // `TEXT` means the field should be tokenized and indexed,
    // along with its term frequency and term positions.
    //
    // `STORED` means that the field will also be saved
    // in a compressed, row-oriented key-value store.
    // This store is useful to reconstruct the
    // documents that were selected during the search phase.
    let mut schema_builder = SchemaBuilder::default();
    let title = schema_builder.add_text_field("title", TEXT | STORED);
    let body = schema_builder.add_text_field("body", TEXT);
    let schema = schema_builder.build();

    // Indexing documents

    let index = Index::create(index_path, schema.clone())?;

    // Here we use a buffer of 100MB that will be split
    // between indexing threads.
    let mut index_writer = index.writer(100_000_000)?;

    // Let's index one document!
    index_writer.add_document(doc!(
        title => "The Old Man and the Sea",
        body => "He was an old man who fished alone in a skiff in \
                 the Gulf Stream and he had gone eighty-four days \
                 now without taking a fish."
    ));

    // We need to call .commit() explicitly to force the
    // index_writer to finish processing the documents in the queue,
    // flush the current index to the disk, and advertise
    // the existence of new documents.
    index_writer.commit()?;

    // # Searching

    index.load_searchers()?;

    let searcher = index.searcher();

    let query_parser = QueryParser::for_index(&index, vec![title, body]);

    // QueryParser may fail if the query is not in the right
    // format. For user facing applications, this can be a problem.
    // A ticket has been opened regarding this problem.
    let query = query_parser.parse_query("sea whale")?;

    let mut top_collector = TopCollector::with_limit(10);
    searcher.search(&*query, &mut top_collector)?;

    // Our top collector now contains the 10
    // most relevant doc ids...
    let doc_addresses = top_collector.docs();
    for doc_address in doc_addresses {
        let retrieved_doc = searcher.doc(&doc_address)?;
        println!("{}", schema.to_json(&retrieved_doc));
    }
A good place for you to get started is to check out the example code (literate programming / source code).
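To make the `TEXT` comment in the example above more concrete, here is a minimal, standard-library-only sketch of what "tokenized and indexed, along with its term frequency and term positions" can mean. It is a toy model, not tantivy's actual data layout: `ToyIndex`, `PostingList`, and the whitespace/punctuation tokenizer are all hypothetical names and choices made for illustration.

```rust
use std::collections::BTreeMap;

/// Postings for one term: for each matching document, the positions
/// (token offsets) at which the term occurs. The term frequency in a
/// document is the number of positions recorded for it.
type PostingList = BTreeMap<u32, Vec<u32>>;

/// A toy in-memory inverted index (hypothetical, for illustration only).
#[derive(Default)]
struct ToyIndex {
    postings: BTreeMap<String, PostingList>,
}

impl ToyIndex {
    /// Splits `text` on non-alphanumeric characters, lowercases each
    /// token, and records its position under the given document id.
    fn add_document(&mut self, doc_id: u32, text: &str) {
        let tokens = text
            .split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty());
        for (position, token) in tokens.enumerate() {
            self.postings
                .entry(token.to_lowercase())
                .or_default()
                .entry(doc_id)
                .or_default()
                .push(position as u32);
        }
    }

    /// Returns how many times `term` occurs in document `doc_id`.
    fn term_freq(&self, term: &str, doc_id: u32) -> usize {
        self.postings
            .get(term)
            .and_then(|list| list.get(&doc_id))
            .map_or(0, |positions| positions.len())
    }

    /// Returns the sorted ids of the documents containing `term`.
    fn docs_containing(&self, term: &str) -> Vec<u32> {
        self.postings
            .get(term)
            .map(|list| list.keys().copied().collect())
            .unwrap_or_default()
    }
}
```

After `add_document(0, "The Old Man and the Sea")`, a lookup such as `docs_containing("sea")` returns the ids of matching documents, and `term_freq("the", 0)` counts both occurrences of "the" in the title.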
Modules
collector | Defines how the documents matching a search query should be processed. |
directory | WORM directory abstraction. |
fastfield | Column-oriented field storage for tantivy. |
merge_policy | Defines tantivy's merging strategy. |
postings | Postings module (also called inverted index). |
query | Query |
schema | Schema definition for tantivy's indices. |
store | Compressed/slow/row-oriented storage for documents. |
termdict | The term dictionary is one of the key data structures of tantivy. It associates sorted |
tokenizer | Tokenizers are in charge of chopping text into a stream of tokens ready for indexing. |
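The `store` and `fastfield` modules are described as row-oriented and column-oriented storage respectively. The sketch below contrasts the two layouts using only the standard library. The `RowStore` and `YearColumn` types, the `build` helper, and the `title`/`year` fields are hypothetical and do not reflect tantivy's on-disk formats.

```rust
/// Row-oriented layout: all fields of one document are kept together,
/// which suits fetching whole documents for display.
struct RowStore {
    rows: Vec<(String, u64)>, // (title, year) for each document
}

/// Column-oriented layout: one dense array for a single field, indexed
/// by doc id, which suits reading that field for many documents.
struct YearColumn {
    values: Vec<u64>,
}

/// Builds both layouts from the same input documents.
fn build(docs: &[(&str, u64)]) -> (RowStore, YearColumn) {
    let rows = docs
        .iter()
        .map(|&(title, year)| (title.to_string(), year))
        .collect();
    let values = docs.iter().map(|&(_, year)| year).collect();
    (RowStore { rows }, YearColumn { values })
}

impl RowStore {
    /// Returns every stored field of one document.
    fn get(&self, doc_id: usize) -> Option<&(String, u64)> {
        self.rows.get(doc_id)
    }
}

impl YearColumn {
    /// Returns the single column value of one document.
    fn get(&self, doc_id: usize) -> Option<u64> {
        self.values.get(doc_id).copied()
    }

    /// Scans the column for the largest value among the given documents,
    /// without touching any other field. Unknown doc ids are ignored.
    fn max_over(&self, doc_ids: &[usize]) -> Option<u64> {
        doc_ids
            .iter()
            .filter_map(|&id| self.values.get(id).copied())
            .max()
    }
}
```

The design trade-off is the usual one: the row layout answers "give me document N" with one lookup, while the column layout answers "give me this field for many documents" with a tight scan over one array.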
Macros
doc | |
Structs
DocAddress | |
Document | Tantivy's Document is the object that can be indexed and then searched for. |
Error | The Error type. |
Index | Search Index |
IndexWriter | |
InvertedIndexReader | The inverted index reader is in charge of accessing the inverted index associated with a specific field. |
Searcher | Holds a list of |
Segment | A segment is a piece of the index. |
SegmentId | UUID identifying a segment. |
SegmentMeta | |
SegmentReader | Entry point to access all of the data structures of the |
Term | Term represents the value that the token can take. |
TimerTree | Timer tree |
Enums
ErrorKind | The kind of an error. |
SegmentComponent | Enum describing each component of a tantivy segment. Each component is stored in its own file, using the pattern |
SkipResult | Expresses the outcome of a call to |
Traits
Directory | Write-once, read-many (WORM) abstraction for where tantivy's data should be stored. |
DocSet | Represents an iterable set of sorted doc ids. |
Postings | Postings (also called inverted list) |
ResultExt | Additional methods for |
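Because a `DocSet` is described as an iterable set of sorted doc ids, two such sets can be intersected by repeatedly skipping each cursor forward to the other's current doc. The sketch below shows that idea with the standard library only. `SortedDocs`, `SkipOutcome`, `skip_to`, and `intersect` are hypothetical names chosen for this illustration; they are not tantivy's trait methods or enum variants.

```rust
use std::cmp::Ordering;

/// Outcome of skipping a cursor forward to a target doc id.
#[derive(Debug, PartialEq)]
enum SkipOutcome {
    /// The cursor now points exactly at the target.
    Reached,
    /// The target is absent; the cursor points at the next larger doc id.
    Overstepped,
    /// The cursor ran past the end of the list.
    End,
}

/// A forward-only cursor over doc ids sorted in increasing order.
struct SortedDocs<'a> {
    ids: &'a [u32],
    pos: usize,
}

impl<'a> SortedDocs<'a> {
    fn new(ids: &'a [u32]) -> Self {
        SortedDocs { ids, pos: 0 }
    }

    /// Returns the doc id under the cursor, if any.
    fn current(&self) -> Option<u32> {
        self.ids.get(self.pos).copied()
    }

    /// Advances to the first doc id that is >= `target`.
    fn skip_to(&mut self, target: u32) -> SkipOutcome {
        while let Some(doc) = self.current() {
            match doc.cmp(&target) {
                Ordering::Less => self.pos += 1,
                Ordering::Equal => return SkipOutcome::Reached,
                Ordering::Greater => return SkipOutcome::Overstepped,
            }
        }
        SkipOutcome::End
    }
}

/// Intersects two sorted doc id lists by leapfrogging the two cursors.
fn intersect(left: &[u32], right: &[u32]) -> Vec<u32> {
    let mut a = SortedDocs::new(left);
    let mut b = SortedDocs::new(right);
    let mut out = Vec::new();
    while let Some(candidate) = a.current() {
        match b.skip_to(candidate) {
            SkipOutcome::Reached => {
                out.push(candidate);
                a.pos += 1;
            }
            SkipOutcome::Overstepped => {
                // `b` is past the candidate, so move `a` up to `b`'s doc.
                let target = b.current().expect("overstepping leaves a current doc");
                if a.skip_to(target) == SkipOutcome::End {
                    break;
                }
            }
            SkipOutcome::End => break,
        }
    }
    out
}
```

This linear skip is the simplest possible version; a real implementation would typically use skip lists or block-level metadata so that `skip_to` does not have to visit every intermediate doc id.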
Functions
i64_to_u64 | Maps a |
u64_to_i64 | Reverse the mapping given by |
version | Exposes the current version of tantivy, as well as whether it was compiled with the SIMD compression. |
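The names `i64_to_u64` and `u64_to_i64` suggest a reversible mapping between signed and unsigned 64-bit integers. One common way to build such a mapping so that numeric order is preserved is to flip the sign bit. The sketch below shows that technique under the assumption that order preservation is the goal; the function names `flip_i64_to_u64` and `flip_u64_to_i64` are hypothetical, and whether tantivy's functions behave identically should be checked against their own documentation.

```rust
/// Mask selecting the most significant (sign) bit of a 64-bit integer.
const SIGN_BIT: u64 = 1 << 63;

/// Maps an `i64` to a `u64` by flipping the sign bit, so that
/// `a < b` as signed integers implies `f(a) < f(b)` as unsigned ones.
fn flip_i64_to_u64(value: i64) -> u64 {
    (value as u64) ^ SIGN_BIT
}

/// Inverse of `flip_i64_to_u64`: flipping the same bit again restores
/// the original two's complement representation.
fn flip_u64_to_i64(value: u64) -> i64 {
    (value ^ SIGN_BIT) as i64
}
```

With this scheme `i64::MIN` maps to `0`, `0` maps to `1 << 63`, and `i64::MAX` maps to `u64::MAX`, so sorted unsigned storage can hold signed values without changing their relative order.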
Type Definitions
DocId | A |
Result | Tantivy result. |
Score | An f32 that represents the relevance of the document to the query. |
SegmentLocalId | A |
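The example at the top of this page fetches documents with `searcher.doc(&doc_address)`, and a segment is described as a piece of the index. One way to picture how these fit together is an address made of a segment ordinal plus a doc id local to that segment. The sketch below models this with the standard library only. It assumes both ids are plain `u32` values and that an address is a simple pair; the names `SegmentOrd`, `LocalDocId`, `ToyDocAddress`, and `ToySearcher` are hypothetical and are not tantivy's definitions.

```rust
/// Hypothetical alias: position of a segment within one searcher.
type SegmentOrd = u32;
/// Hypothetical alias: id of a document within its segment.
type LocalDocId = u32;

/// Assumed address shape: (segment ordinal, doc id within that segment).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ToyDocAddress(SegmentOrd, LocalDocId);

/// A toy searcher holding one list of stored documents per segment.
struct ToySearcher {
    segments: Vec<Vec<String>>,
}

impl ToySearcher {
    /// Resolves an address in two steps: pick the segment, then the
    /// document inside it. Returns `None` if either index is out of range.
    fn doc(&self, address: ToyDocAddress) -> Option<&str> {
        let ToyDocAddress(segment_ord, doc_id) = address;
        self.segments
            .get(segment_ord as usize)?
            .get(doc_id as usize)
            .map(String::as_str)
    }
}
```

Under these assumptions a doc id alone is ambiguous across segments, which is why the lookup takes the pair rather than a single number.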