Crate summavy

§`tantivy`

Tantivy is a search engine library. Think Lucene, but in Rust.

// First we need to define a schema ...

// `TEXT` means the field should be tokenized and indexed,
// along with its term frequency and term positions.
//
// `STORED` means that the field will also be saved
// in a compressed, row-oriented key-value store.
// This store is useful to reconstruct the
// documents that were selected during the search phase.
let mut schema_builder = Schema::builder();
let title = schema_builder.add_text_field("title", TEXT | STORED);
let body = schema_builder.add_text_field("body", TEXT);
let schema = schema_builder.build();

// # Indexing documents

let index = Index::create_in_dir(index_path, schema.clone())?;

// Here we use a buffer of 100MB that will be split
// between indexing threads.
let mut index_writer = index.writer(100_000_000)?;

// Let's index one document!
index_writer.add_document(doc!(
    title => "The Old Man and the Sea",
    body => "He was an old man who fished alone in a skiff in \
            the Gulf Stream and he had gone eighty-four days \
            now without taking a fish."
))?;

// We need to call .commit() explicitly to force the
// index_writer to finish processing the documents in the queue,
// flush the current index to the disk, and advertise
// the existence of new documents.
index_writer.commit()?;

// # Searching

let reader = index.reader()?;

let searcher = reader.searcher();

let query_parser = QueryParser::for_index(&index, vec![title, body]);

// QueryParser may fail if the query is not in the right
// format. For user-facing applications, this can be a problem.
// A ticket has been opened regarding this problem.
let query = query_parser.parse_query("sea whale")?;

// Perform the search.
// `top_docs` contains the 10 most relevant doc ids, sorted by decreasing score.
let top_docs: Vec<(Score, DocAddress)> =
    searcher.search(&query, &TopDocs::with_limit(10))?;

for (_score, doc_address) in top_docs {
    // Retrieve the actual content of each document given its `doc_address`.
    let retrieved_doc = searcher.doc(doc_address)?;
    println!("{}", schema.to_json(&retrieved_doc));
}

A good place to get started is the example code (literate programming / source code).

Re-exports§

pub use crate::error::TantivyError;
pub use crate::directory::Directory;
pub use crate::postings::Postings;
pub use crate::schema::DateOptions;
pub use crate::schema::DatePrecision;
pub use crate::schema::Document;
pub use crate::schema::Term;
pub use time;

Modules§

aggregation: Aggregations
collector: Collectors
directory: WORM (Write Once Read Many) directory abstraction.
error: Definition of Tantivy’s errors and results.
fastfield: Column oriented field storage for tantivy.
fieldnorm: The fieldnorm represents the length associated with a given Field of a given document.
merge_policy: Defines tantivy’s merging strategy
positions: Tantivy can (if instructed to do so in the schema) store the term positions in a given field. Each position is expressed as a token ordinal. For instance, in “The beauty and the beast”, the term “the” appears at position 0 and position 3. This information is useful for running phrase queries.
postings: Postings module (also called inverted index)
query: Module containing the different query implementations.
schema: Schema definition for tantivy’s indices.
space_usage: Representations for the space usage of various parts of a Tantivy index.
store: Compressed/slow/row-oriented storage for documents.
termdict: The term dictionary's main role is to associate the sorted Terms with a TermInfo struct that contains some meta-information about the term.
tokenizer: Tokenizers are in charge of chopping text into a stream of tokens ready for indexing.
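
The token-ordinal positions described for the positions module can be illustrated with a small standalone sketch. It does not use this crate's API: `token_positions` is a hypothetical helper, and it assumes a plain whitespace split with lowercasing in place of a real tokenizer.

```rust
use std::collections::BTreeMap;

/// Maps each lowercased token to the ordinals at which it occurs.
/// Tokenization is a plain whitespace split, which is an assumption
/// made for illustration; real tokenizers are configurable.
fn token_positions(text: &str) -> BTreeMap<String, Vec<usize>> {
    let mut positions: BTreeMap<String, Vec<usize>> = BTreeMap::new();
    for (ordinal, token) in text.split_whitespace().enumerate() {
        positions
            .entry(token.to_lowercase())
            .or_default()
            .push(ordinal);
    }
    positions
}
```

Calling `token_positions("The beauty and the beast")` yields positions 0 and 3 for “the”, matching the example in the module description. A phrase query can then check whether two terms occur at consecutive ordinals.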

Macros§

doc: doc! is a shortcut that helps build Document objects.

Structs§

DateTime: A date/time value with microsecond precision.
DemuxMapping: DemuxMapping can be used to reorganize data from multiple segments.
DocAddress: DocAddress contains all the necessary information to identify a document given a Searcher object.
DocIdToSegmentOrdinal: DocIdToSegmentOrdinal maps from doc_id within a segment to the new segment ordinal for demuxing.
FutureResult: FutureResult is a handle that makes it possible to wait for the completion of an ongoing task.
Index: Search Index
IndexBuilder: IndexBuilder can be used to create an index.
IndexMeta: Meta information about the Index.
IndexReader: IndexReader is your entry point to read and search the index.
IndexReaderBuilder: IndexReader builder
IndexSettings: Search Index Settings.
IndexSortByField: Settings to presort the documents in an index
IndexWriter: IndexWriter is the user entry point to add documents to an index.
Inventory: The Inventory registers and keeps track of all of the objects alive.
InvertedIndexReader: The inverted index reader is in charge of accessing the inverted index associated with a specific field.
PreparedCommit: A prepared commit
Searcher: Holds a list of SegmentReaders ready for search.
SearcherGeneration: Identifies the searcher generation accessed by a Searcher.
Segment: A segment is a piece of the index.
SegmentId: Uuid identifying a segment.
SegmentMeta: SegmentMeta contains simple meta information about a segment.
SegmentReader: Entry point to access all of the data structures of the Segment
Snippet: Snippet contains a fragment of a document, and some highlighted parts inside it.
SnippetGenerator: SnippetGenerator
TrackedObject: Your tracked object.
Version: Structure version for the index.

Enums§

Executor: Search executor, which determines whether search requests run single-threaded or multi-threaded.
Order: The order to sort by
ReloadPolicy: Defines when a new version of the index should be reloaded.
SegmentComponent: Enum describing each component of a tantivy segment. Each component is stored in its own file, using the pattern segment_uuid.component_extension, except the delete component, which uses the pattern segment_uuid.delete_opstamp.component_extension
UserOperation: UserOperation is an enum type that encapsulates other operation types.

Constants§

TERMINATED: Sentinel value returned when a DocSet has been entirely consumed.

Traits§

DocSet: Represents an iterable set of sorted doc ids.
HasLen: Has length trait
SegmentAttributesMerger: Allows implementing custom behaviour while merging SegmentAttributes of multiple segments.
Warmer: Warmer can be used to maintain segment-level state e.g. caches.
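
To make the relationship between DocSet and the TERMINATED sentinel more concrete, here is a minimal standalone sketch. It is not the crate's trait: `VecDocSet`, its `doc` and `advance` methods, `collect_all`, and the sentinel value `SKETCH_TERMINATED` are all hypothetical names and values chosen for illustration.

```rust
/// Hypothetical sentinel; the crate's actual TERMINATED value may differ.
const SKETCH_TERMINATED: u32 = u32::MAX;

/// A toy doc set backed by a sorted, deduplicated vector of doc ids.
struct VecDocSet {
    docs: Vec<u32>,
    cursor: usize,
}

impl VecDocSet {
    fn new(mut docs: Vec<u32>) -> Self {
        // Doc ids are iterated in sorted order, so normalize the input.
        docs.sort_unstable();
        docs.dedup();
        VecDocSet { docs, cursor: 0 }
    }

    /// Returns the current doc id, or the sentinel once consumed.
    fn doc(&self) -> u32 {
        self.docs
            .get(self.cursor)
            .copied()
            .unwrap_or(SKETCH_TERMINATED)
    }

    /// Moves to the next doc id and returns it (or the sentinel).
    fn advance(&mut self) -> u32 {
        if self.cursor < self.docs.len() {
            self.cursor += 1;
        }
        self.doc()
    }
}

/// Drains the doc set by looping until the sentinel is reached.
fn collect_all(docset: &mut VecDocSet) -> Vec<u32> {
    let mut out = Vec::new();
    let mut doc = docset.doc();
    while doc != SKETCH_TERMINATED {
        out.push(doc);
        doc = docset.advance();
    }
    out
}
```

Using a sentinel instead of an `Option` keeps the hot loop to a single integer comparison, which is one plausible reason for this design.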

Functions§

demux: Demux the segments according to demux_mapping. See DemuxMapping. The number of output_directories needs to match the maximum new segment ordinal from demux_mapping.
f64_to_u64: Maps a f64 to u64
i64_to_u64: Maps a i64 to u64
u64_to_f64: Reverse the mapping given by f64_to_u64().
u64_to_i64: Reverse the mapping given by i64_to_u64().
version: Exposes the current version of tantivy as found in Cargo.toml during compilation (e.g. “0.11.0”), as well as the compression scheme used in the docstore.
version_string: Exposes the complete version of tantivy as found in Cargo.toml during compilation as a string, e.g. “tantivy v0.11.0, index_format v1, store_compression: lz4”.
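
This page does not say how the numeric mapping functions above work. One common approach, assumed here rather than taken from this page, is an order-preserving bijection, so that comparing the resulting u64 values gives the same ordering as comparing the original numbers. The `_sketch` functions below are hypothetical stand-ins, not the crate's implementations.

```rust
/// Flips the sign bit so that signed order matches unsigned order.
fn i64_to_u64_sketch(val: i64) -> u64 {
    (val as u64) ^ (1u64 << 63)
}

/// Reverses `i64_to_u64_sketch`.
fn u64_to_i64_sketch(val: u64) -> i64 {
    (val ^ (1u64 << 63)) as i64
}

/// Maps an f64 to a u64 whose unsigned order follows the float order
/// (NaN handling is out of scope for this sketch).
fn f64_to_u64_sketch(val: f64) -> u64 {
    let bits = val.to_bits();
    if bits >> 63 == 0 {
        // Non-negative: set the sign bit so it sorts above all negatives.
        bits ^ (1u64 << 63)
    } else {
        // Negative: invert all bits so larger magnitudes sort lower.
        !bits
    }
}

/// Reverses `f64_to_u64_sketch`.
fn u64_to_f64_sketch(val: u64) -> f64 {
    let bits = if val >> 63 == 1 {
        val ^ (1u64 << 63)
    } else {
        !val
    };
    f64::from_bits(bits)
}
```

Under these assumptions, `i64_to_u64_sketch(-1) < i64_to_u64_sketch(0)` holds, and each reverse function returns the original value.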

Type Aliases§

DocId: A u32 identifying a document within a segment. Documents have their DocId assigned incrementally, as they are added in the segment.
Opstamp: A u64 assigned to every operation incrementally
Result: Tantivy result.
Score: A Score that represents the relevance of the document to the query
SegmentOrdinal: A SegmentOrdinal identifies a segment, within a Searcher or Merger.
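
The descriptions of DocId, SegmentOrdinal, and DocAddress suggest that a document is located by first choosing a segment within a searcher and then a doc id within that segment. The sketch below models that idea with hypothetical types (`SketchDocAddress`, `SketchSearcher`); the field names and the in-memory layout are assumptions, not the crate's definitions.

```rust
/// Hypothetical address: a segment ordinal plus a doc id within it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SketchDocAddress {
    segment_ord: u32,
    doc_id: u32,
}

/// Hypothetical stand-in for a searcher: one list of stored
/// documents per segment, indexed by doc id.
struct SketchSearcher {
    segments: Vec<Vec<String>>,
}

impl SketchSearcher {
    /// Resolves an address to a stored document, if both the segment
    /// ordinal and the doc id are in range.
    fn doc(&self, address: SketchDocAddress) -> Option<&str> {
        self.segments
            .get(address.segment_ord as usize)?
            .get(address.doc_id as usize)
            .map(String::as_str)
    }
}
```

Because doc ids are only unique within a segment, the same `doc_id` can refer to different documents in different segments, which is why the address needs both parts.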
