Crate anda_db_tfs


§Anda-DB BM25 Full-Text Search Library

This library implements a full-text search engine based on the BM25 ranking algorithm. BM25 (Best Matching 25) is a ranking function used by search engines to estimate the relevance of documents to a given search query. It extends the TF-IDF model with term-frequency saturation and document-length normalization.
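To make the ranking function concrete, here is a minimal, standalone sketch of the textbook BM25 per-term score. It does not use this crate's API: the names `Params`, `idf`, and `term_score` are hypothetical, the defaults `k1 = 1.2` and `b = 0.75` are common textbook values rather than this crate's confirmed defaults, and the smoothed IDF variant shown is one of several in use.

```rust
/// Hypothetical parameter struct for illustration only; not the crate's `BM25Params`.
/// `k1` controls term-frequency saturation, `b` controls length normalization.
#[derive(Clone, Copy, Debug)]
pub struct Params {
    pub k1: f64,
    pub b: f64,
}

impl Default for Params {
    fn default() -> Self {
        // Common textbook values (an assumption, not taken from this crate).
        Params { k1: 1.2, b: 0.75 }
    }
}

/// Inverse document frequency with +1 smoothing inside the logarithm,
/// which keeps the value positive even for very common terms.
pub fn idf(total_docs: usize, docs_with_term: usize) -> f64 {
    let n = total_docs as f64;
    let df = docs_with_term as f64;
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}

/// BM25 contribution of one query term to one document's score.
/// `tf` is the term's frequency in the document, `doc_len` the document
/// length in tokens, and `avg_doc_len` the average length across the index.
pub fn term_score(tf: usize, doc_len: usize, avg_doc_len: f64, idf: f64, p: Params) -> f64 {
    let tf = tf as f64;
    // Documents longer than average are penalized, shorter ones boosted.
    let norm = 1.0 - p.b + p.b * (doc_len as f64 / avg_doc_len);
    idf * tf * (p.k1 + 1.0) / (tf + p.k1 * norm)
}
```

A document's score for a multi-term query is the sum of `term_score` over the query terms. Rarer terms receive a larger `idf`, repeated occurrences give diminishing returns, and longer documents score lower for the same term frequency.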

§Features

  • Segment indexing with BM25 scoring
  • Segment removal
  • Query search with top-k results
  • Serialization and deserialization of indices in CBOR format
  • Customizable tokenization

Structs§

BM25Config
Configuration parameters for the BM25 index
BM25Index
BM25 search index with customizable tokenization
BM25Metadata
Index metadata.
BM25Params
Parameters for the BM25 ranking algorithm
BM25Stats
Index statistics.
BoxTokenStream
Simple wrapper of Box<dyn TokenStream + 'a>.
Token
Token
TokenizerChain
A chain of tokenizers and filters that implements the Tokenizer trait.
TokenizerChainBuilder
Builder for TokenizerChain

Enums§

BM25Error
Errors that can occur when working with a BM25 index.
QueryType
Represents different types of boolean queries that can be parsed from a query string. Supports Term, Or, And, and Not operations for building complex search expressions. Operator precedence: OR < AND < NOT.
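As a rough illustration of how a boolean query tree with these four operations can be evaluated, the sketch below defines a hypothetical `Query` enum and an `evaluate` function over a document's token set. It is not the crate's `QueryType` definition, whose variants and payloads may differ; it only shows how the stated precedence (OR < AND < NOT) shapes the tree.

```rust
use std::collections::HashSet;

/// Hypothetical query tree for illustration; not the crate's `QueryType`.
#[derive(Debug, Clone, PartialEq)]
pub enum Query {
    Term(String),
    Or(Vec<Query>),
    And(Vec<Query>),
    Not(Box<Query>),
}

/// Returns true if the document, represented by its set of tokens,
/// satisfies the query.
pub fn evaluate(query: &Query, tokens: &HashSet<&str>) -> bool {
    match query {
        Query::Term(t) => tokens.contains(t.as_str()),
        Query::Or(qs) => qs.iter().any(|q| evaluate(q, tokens)),
        Query::And(qs) => qs.iter().all(|q| evaluate(q, tokens)),
        Query::Not(q) => !evaluate(q, tokens),
    }
}
```

Because NOT binds tightest and OR loosest, an expression such as `a OR b AND NOT c` corresponds to the tree `Or[Term(a), And[Term(b), Not(Term(c))]]`: a document matches if it contains `a`, or if it contains `b` and does not contain `c`.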

Traits§

BoxableTokenizer
A boxable Tokenizer, with its TokenStream type erased.
TokenFilter
Trait for the pluggable components of Tokenizers.
TokenStream
TokenStream is the result of the tokenization.
Tokenizer
Tokenizers are in charge of splitting text into a stream of tokens before indexing.

Functions§

collect_tokens
Tokenizes text and optionally filters tokens
flat_full_text_search
Performs a simple full-text search by finding matching tokens in a document

Type Aliases§

BoxError
PostingValue
Type alias for posting values: (bucket id, Vec<(document_id, token_frequency)>)