Crate anda_db_tfs

§Anda-DB BM25 Full-Text Search Library

This library implements a full-text search engine based on the BM25 ranking algorithm. BM25 (Best Matching 25) is a ranking function that search engines use to estimate how relevant each document is to a given query. It extends the TF-IDF model with term-frequency saturation and document-length normalization.
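To make the ranking function concrete, here is a minimal sketch of the per-term BM25 score in plain Rust. It does not use this crate's API: the `Params` struct, the function names, and the smoothed IDF variant are assumptions chosen for illustration, and the crate's own `BM25Params` and internals may differ.

```rust
/// Hypothetical tuning parameters (not the crate's `BM25Params`).
/// `k1` controls term-frequency saturation, `b` controls length normalization.
#[derive(Clone, Copy, Debug)]
pub struct Params {
    pub k1: f32,
    pub b: f32,
}

/// Inverse document frequency with +1 smoothing inside the logarithm,
/// an assumed variant that keeps the value positive for very common terms.
pub fn idf(total_docs: usize, docs_with_term: usize) -> f32 {
    let n = total_docs as f32;
    let df = docs_with_term as f32;
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}

/// Score contribution of one query term for one document.
/// `avg_doc_len` must be greater than zero.
pub fn term_score(p: Params, tf: usize, doc_len: usize, avg_doc_len: f32, idf: f32) -> f32 {
    let tf = tf as f32;
    // Documents longer than average are penalized in proportion to `b`.
    let norm = 1.0 - p.b + p.b * (doc_len as f32 / avg_doc_len);
    idf * tf * (p.k1 + 1.0) / (tf + p.k1 * norm)
}
```

A document's score for a query is the sum of `term_score` over the query terms. Rarer terms get a larger `idf`, repeated occurrences of a term give diminishing returns, and longer documents score lower for the same term frequency.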

§Features

Segment indexing with BM25 scoring
Segment removal
Query search with top-k results
Serialization and deserialization of indices in CBOR format
Customizable tokenization

Structs§

BM25Config: Configuration parameters for the BM25 index
BM25Index: BM25 search index with customizable tokenization
BM25Metadata: Index metadata.
BM25Params: Parameters for the BM25 ranking algorithm
BM25Stats: Index statistics.
BoxTokenStream: Simple wrapper of Box<dyn TokenStream + 'a>.
Token: Token
TokenizerChain: A chain of tokenizers and filters that implements the Tokenizer trait.
TokenizerChainBuilder: Builder for TokenizerChain

Enums§

BM25Error: Errors that can occur when working with BM25 index.
QueryType: Represents different types of boolean queries that can be parsed from a query string. Supports Term, Or, And, and Not operations for building complex search expressions. Operator precedence: OR < AND < NOT.
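The precedence rule OR < AND < NOT can be illustrated with a small recursive-descent parser. This is only a sketch under stated assumptions: the `Query` enum, the `parse` function, and the uppercase `OR` / `AND` / `NOT` keyword syntax are hypothetical and are not taken from the crate, whose `QueryType` and query-string grammar may differ.

```rust
/// Hypothetical query tree (not the crate's `QueryType`).
#[derive(Debug, PartialEq)]
pub enum Query {
    Term(String),
    Not(Box<Query>),
    And(Vec<Query>),
    Or(Vec<Query>),
}

/// Parses whitespace-separated words using the assumed keywords OR, AND, NOT.
/// Returns `None` for empty or malformed input.
pub fn parse(input: &str) -> Option<Query> {
    let words: Vec<&str> = input.split_whitespace().collect();
    let mut pos = 0;
    let query = parse_or(&words, &mut pos)?;
    if pos == words.len() { Some(query) } else { None }
}

// Lowest precedence: OR joins AND-expressions.
fn parse_or(words: &[&str], pos: &mut usize) -> Option<Query> {
    let mut parts = vec![parse_and(words, pos)?];
    while words.get(*pos) == Some(&"OR") {
        *pos += 1;
        parts.push(parse_and(words, pos)?);
    }
    Some(if parts.len() == 1 { parts.remove(0) } else { Query::Or(parts) })
}

// Middle precedence: AND joins NOT-expressions.
fn parse_and(words: &[&str], pos: &mut usize) -> Option<Query> {
    let mut parts = vec![parse_not(words, pos)?];
    while words.get(*pos) == Some(&"AND") {
        *pos += 1;
        parts.push(parse_not(words, pos)?);
    }
    Some(if parts.len() == 1 { parts.remove(0) } else { Query::And(parts) })
}

// Highest precedence: NOT binds to the single term that follows it.
fn parse_not(words: &[&str], pos: &mut usize) -> Option<Query> {
    match words.get(*pos) {
        Some(&"NOT") => {
            *pos += 1;
            Some(Query::Not(Box::new(parse_not(words, pos)?)))
        }
        Some(&"AND") | Some(&"OR") | None => None,
        Some(word) => {
            *pos += 1;
            Some(Query::Term((*word).to_string()))
        }
    }
}
```

Because each level delegates to the next-tighter one, `a OR b AND NOT c` groups as `a OR (b AND (NOT c))`.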

Traits§

BoxableTokenizer: A boxable Tokenizer, with its TokenStream type erased.
TokenFilter: Trait for the pluggable components of Tokenizers.
TokenStream: TokenStream is the result of the tokenization.
Tokenizer: Tokenizers are in charge of splitting text into a stream of tokens before indexing.
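The split between a tokenizer and its pluggable filters can be sketched without the crate's traits. Everything in the block below is hypothetical: the `Filter` trait, the `Lowercase` and `MinLength` filters, and the `tokenize` function are stand-ins for illustration and do not mirror the signatures of `Tokenizer`, `TokenFilter`, or `TokenStream`.

```rust
/// Hypothetical filter stage: transforms a token or drops it by returning `None`.
pub trait Filter {
    fn apply(&self, token: String) -> Option<String>;
}

/// Lowercases every token.
pub struct Lowercase;

impl Filter for Lowercase {
    fn apply(&self, token: String) -> Option<String> {
        Some(token.to_lowercase())
    }
}

/// Drops tokens shorter than the given number of characters.
pub struct MinLength(pub usize);

impl Filter for MinLength {
    fn apply(&self, token: String) -> Option<String> {
        if token.chars().count() >= self.0 { Some(token) } else { None }
    }
}

/// Splits text on non-alphanumeric characters, then runs each token
/// through the filters in order. A token is kept only if every filter keeps it.
pub fn tokenize(text: &str, filters: &[Box<dyn Filter>]) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty())
        .filter_map(|word| {
            filters
                .iter()
                .try_fold(word.to_string(), |token, filter| filter.apply(token))
        })
        .collect()
}
```

Filter order matters in a chain like this: lowercasing before a stop-word or length filter means the later filter only has to handle one casing.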

Functions§

collect_tokens: Tokenizes text and optionally filters tokens
flat_full_text_search: Performs a simple full-text search by finding matching tokens in a document

Type Aliases§

BoxError
PostingValue: Type alias for posting values: (bucket id, Vec<(document_id, token_frequency)>)
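The tuple shape described for `PostingValue` is that of an inverted-index entry: for each token, a list of documents with the token's frequency in each. The sketch below shows how such entries could be built. The `Posting` alias, the integer widths, the `add_document` helper, and the treatment of the bucket id as an opaque label are all assumptions, not the crate's definitions.

```rust
use std::collections::HashMap;

/// Hypothetical posting entry: (bucket id, Vec<(document_id, token_frequency)>).
/// The integer widths are assumed; the crate's `PostingValue` may use others.
pub type Posting = (u32, Vec<(u64, usize)>);

/// Adds one document to a token -> posting map.
/// The bucket id is stored when a token is first seen and otherwise left alone.
pub fn add_document(
    index: &mut HashMap<String, Posting>,
    bucket: u32,
    doc_id: u64,
    text: &str,
) {
    // Count how often each lowercased token occurs in this document.
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    // Append one (document_id, token_frequency) pair per distinct token.
    for (token, tf) in counts {
        let posting = index.entry(token).or_insert_with(|| (bucket, Vec::new()));
        posting.1.push((doc_id, tf));
    }
}
```

At query time, looking up a token yields every document that contains it together with the term frequency that a BM25 scorer needs, so no document text has to be rescanned.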
