Crate tantivy
tantivy
Tantivy is a search engine library.
Think Lucene, but in Rust.
    #[macro_use]
    extern crate tantivy;

    // ...

    // First we need to define a schema ...

    // `TEXT` means the field should be tokenized and indexed,
    // along with its term frequency and term positions.
    //
    // `STORED` means that the field will also be saved
    // in a compressed, row-oriented key-value store.
    // This store is useful to reconstruct the
    // documents that were selected during the search phase.
    let mut schema_builder = Schema::builder();
    let title = schema_builder.add_text_field("title", TEXT | STORED);
    let body = schema_builder.add_text_field("body", TEXT);
    let schema = schema_builder.build();

    // Indexing documents

    let index = Index::create_in_dir(index_path, schema.clone())?;

    // Here we use a buffer of 100MB that will be split
    // between indexing threads.
    let mut index_writer = index.writer(100_000_000)?;

    // Let's index one document!
    index_writer.add_document(doc!(
        title => "The Old Man and the Sea",
        body => "He was an old man who fished alone in a skiff in \
                 the Gulf Stream and he had gone eighty-four days \
                 now without taking a fish."
    ));

    // We need to call .commit() explicitly to force the
    // index_writer to finish processing the documents in the queue,
    // flush the current index to the disk, and advertise
    // the existence of new documents.
    index_writer.commit()?;

    // # Searching

    let reader = index.reader()?;

    let searcher = reader.searcher();

    let query_parser = QueryParser::for_index(&index, vec![title, body]);

    // QueryParser may fail if the query is not in the right
    // format. For user-facing applications, this can be a problem.
    // A ticket has been opened regarding this problem.
    let query = query_parser.parse_query("sea whale")?;

    // Perform search.
    // `top_docs` contains the 10 most relevant doc ids, sorted by decreasing scores...
    let top_docs: Vec<(Score, DocAddress)> =
        searcher.search(&query, &TopDocs::with_limit(10))?;

    for (_score, doc_address) in top_docs {
        // Retrieve the actual content of each document given its `doc_address`.
        let retrieved_doc = searcher.doc(doc_address)?;
        println!("{}", schema.to_json(&retrieved_doc));
    }
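To make the `TEXT` comment in the example above more concrete, here is a toy, standard-library-only sketch of what "tokenized and indexed, along with its term frequency and term positions" can look like. The `ToyIndex` type and its methods are hypothetical and are not part of tantivy; a real postings format is far more compact, and the whitespace tokenization is an assumption made for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy index: term -> (doc id -> positions of the term in that doc).
/// The term frequency is simply the number of recorded positions.
#[derive(Default)]
pub struct ToyIndex {
    postings: BTreeMap<String, BTreeMap<u32, Vec<u32>>>,
}

impl ToyIndex {
    /// Split `text` on whitespace, lowercase each token, and record its position.
    pub fn add_document(&mut self, doc_id: u32, text: &str) {
        for (position, token) in text.split_whitespace().enumerate() {
            self.postings
                .entry(token.to_lowercase())
                .or_default()
                .entry(doc_id)
                .or_default()
                .push(position as u32);
        }
    }

    /// Positions of `term` in `doc_id` (empty if the term does not occur).
    pub fn positions(&self, term: &str, doc_id: u32) -> &[u32] {
        self.postings
            .get(term)
            .and_then(|docs| docs.get(&doc_id))
            .map(|positions| positions.as_slice())
            .unwrap_or(&[])
    }

    /// Term frequency of `term` in `doc_id` (0 if the term does not occur).
    pub fn term_freq(&self, term: &str, doc_id: u32) -> usize {
        self.positions(term, doc_id).len()
    }
}
```

Indexing "The Old Man and the Sea" as document 0 with this sketch records the term "the" at positions 0 and 4, so its term frequency in that document is 2.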
A good place to get started is the example code (literate programming / source code).
Re-exports
pub use chrono;
Modules
collector | Collectors |
directory | WORM directory abstraction. |
fastfield | Column oriented field storage for tantivy. |
fieldnorm | The fieldnorm represents the length associated with a given Field of a given document. |
merge_policy | Defines tantivy's merging strategy |
postings | Postings module (also called inverted index) |
query | Query |
schema | Schema definition for tantivy's indices. |
space_usage | Representations for the space usage of various parts of a Tantivy index. |
store | Compressed/slow/row-oriented storage for documents. |
termdict | The term dictionary main role is to associate the sorted |
tokenizer | Tokenizers are in charge of chopping text into a stream of tokens ready for indexing. |
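As an illustration of the `tokenizer` module's description, the sketch below chops text into lowercase tokens and records each token's position and byte offsets. It is a minimal standard-library example; `ToyToken` and `tokenize` are hypothetical names, and the splitting rule (runs of alphanumeric characters) is an assumption rather than a description of tantivy's default tokenizer.

```rust
/// Hypothetical token type for this sketch; not tantivy's own token struct.
#[derive(Debug, PartialEq)]
pub struct ToyToken {
    /// Index of the token in the stream (0, 1, 2, ...).
    pub position: usize,
    /// Byte offset of the first character of the token.
    pub offset_from: usize,
    /// Byte offset just past the last character of the token.
    pub offset_to: usize,
    /// Lowercased token text.
    pub text: String,
}

/// Split `text` into runs of alphanumeric characters and lowercase each run.
pub fn tokenize(text: &str) -> Vec<ToyToken> {
    let mut tokens = Vec::new();
    let mut start: Option<usize> = None;
    // A trailing sentinel space flushes a token that ends at the end of the input.
    let chars = text.char_indices().chain(std::iter::once((text.len(), ' ')));
    for (offset, ch) in chars {
        if ch.is_alphanumeric() {
            if start.is_none() {
                start = Some(offset);
            }
        } else if let Some(from) = start.take() {
            let position = tokens.len();
            tokens.push(ToyToken {
                position,
                offset_from: from,
                offset_to: offset,
                text: text[from..offset].to_lowercase(),
            });
        }
    }
    tokens
}
```

For the input "Hello, World!" this yields two tokens, "hello" (bytes 0 to 5) and "world" (bytes 7 to 12).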
Macros
doc | |
Structs
DocAddress | |
Document | Tantivy's Document is the object that can be indexed and then searched for. |
Index | Search Index |
IndexMeta | Meta information about the |
IndexReader | |
IndexReaderBuilder | |
IndexWriter | |
InvertedIndexReader | The inverted index reader is in charge of accessing the inverted index associated with a specific field. |
Searcher | Holds a list of |
Segment | A segment is a piece of the index. |
SegmentId | Uuid identifying a segment. |
SegmentMeta | |
SegmentReader | Entry point to access all of the datastructures of the |
Snippet | |
SnippetGenerator | |
Term | Term represents the value that the token can take. |
Enums
Error | The library's failure-based error enum |
ReloadPolicy | Defines when a new version of the index should be reloaded. |
SegmentComponent | Enum describing each component of a tantivy segment. Each component is stored in its own file, using the pattern |
SkipResult | Expresses the outcome of a call to |
TantivyError | The library's failure-based error enum |
Traits
Directory | Write-once, read-many (WORM) abstraction for where tantivy's data should be stored. |
DocSet | Represents an iterable set of sorted doc ids. |
Postings | Postings (also called inverted list) |
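The `DocSet` description stresses that doc ids come in sorted order. One reason sorted iteration is useful is that two sets can be intersected in a single linear pass, which is how a conjunctive ("a AND b") query can be evaluated. The sketch below shows that idea on plain slices; `intersect_sorted` is a hypothetical helper and does not use the `DocSet` trait itself.

```rust
use std::cmp::Ordering;

/// Intersect two strictly increasing lists of doc ids in one pass.
/// Both inputs are assumed to be sorted and free of duplicates.
pub fn intersect_sorted(left: &[u32], right: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut common = Vec::new();
    while i < left.len() && j < right.len() {
        match left[i].cmp(&right[j]) {
            // Advance whichever side is behind.
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            // Same doc id on both sides: keep it and advance both.
            Ordering::Equal => {
                common.push(left[i]);
                i += 1;
                j += 1;
            }
        }
    }
    common
}
```

For example, intersecting `[1, 3, 5, 8]` with `[3, 4, 8, 9]` keeps only the doc ids 3 and 8.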
Functions
f64_to_u64 | Maps an `f64` to a `u64`. |
i64_to_u64 | Maps an `i64` to a `u64`. |
u64_to_f64 | Reverses the mapping given by `f64_to_u64`. |
u64_to_i64 | Reverses the mapping given by `i64_to_u64`. |
version | Exposes the current version of tantivy, as well as whether it was compiled with SIMD compression. |
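The four conversion functions listed above come in inverse pairs. A common way to build such mappings so that they also preserve ordering (a smaller input always maps to a smaller `u64`) is to flip the sign bit for `i64`, and to flip either the sign bit or all bits for `f64`. The sketch below illustrates that technique under the assumption that order preservation is the goal; the `sketch_*` functions are hypothetical stand-ins, not copies of the library's functions.

```rust
const SIGN_BIT: u64 = 1 << 63;

/// Flip the sign bit so that i64::MIN maps to 0 and i64::MAX maps to u64::MAX.
pub fn sketch_i64_to_u64(val: i64) -> u64 {
    (val as u64) ^ SIGN_BIT
}

/// Inverse of `sketch_i64_to_u64`.
pub fn sketch_u64_to_i64(val: u64) -> i64 {
    (val ^ SIGN_BIT) as i64
}

/// Non-negative floats get their sign bit set (upper half of the u64 range);
/// negative floats have all bits inverted, which reverses their order so that
/// more negative values map to smaller integers.
pub fn sketch_f64_to_u64(val: f64) -> u64 {
    let bits = val.to_bits();
    if bits & SIGN_BIT == 0 {
        bits ^ SIGN_BIT
    } else {
        !bits
    }
}

/// Inverse of `sketch_f64_to_u64`.
pub fn sketch_u64_to_f64(val: u64) -> f64 {
    f64::from_bits(if val & SIGN_BIT != 0 {
        val ^ SIGN_BIT
    } else {
        !val
    })
}
```

With a mapping like this, comparisons on the resulting `u64` values agree with comparisons on the original numbers, so integer-only storage and range checks can be reused for signed and floating-point fields.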
Type Definitions
DateTime | Tantivy DateTime |
DocId | A |
Opstamp | A u64 assigned to every operation incrementally |
Result | Tantivy result. |
Score | An f32 that represents the relevance of the document to the query |
SegmentLocalId | A |
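To show how a `Score` is typically used, the sketch below keeps the `k` highest-scoring hits, sorted by decreasing score, which is the shape of result the crate-level example asks for with a limit of 10. It is a deliberately simple stand-in that sorts every hit; the `ToyScore` and `ToyDocId` aliases and the `top_k` function are hypothetical, the `u32` doc id is an assumption, and a real collector would more likely keep a bounded heap.

```rust
use std::cmp::Ordering;

// Hypothetical aliases for this sketch.
type ToyScore = f32;
type ToyDocId = u32;

/// Return the `k` best hits, highest score first.
/// Ties are broken by ascending doc id; NaN scores compare as equal.
pub fn top_k(mut hits: Vec<(ToyScore, ToyDocId)>, k: usize) -> Vec<(ToyScore, ToyDocId)> {
    hits.sort_by(|a, b| {
        b.0.partial_cmp(&a.0)
            .unwrap_or(Ordering::Equal)
            .then(a.1.cmp(&b.1))
    });
    hits.truncate(k);
    hits
}
```

Calling `top_k(vec![(0.5, 1), (2.0, 7), (1.25, 3)], 2)` keeps the hits for doc ids 7 and 3, in that order.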