§Anda-DB BM25 Full-Text Search Library
This library implements a full-text search engine based on the BM25 ranking algorithm. BM25 (Best Matching 25) is a ranking function used by search engines to estimate the relevance of documents to a given search query. It’s an extension of the TF-IDF model.
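As a rough illustration of the ranking function described above, the following is a minimal sketch of the textbook Okapi BM25 term score. It is not taken from this crate: the `Params` struct, the `idf` and `term_score` functions, and the values `k1 = 1.2` and `b = 0.75` are assumptions chosen for the example (they are common defaults in the literature), and the crate's own formula variant and defaults may differ.

```rust
/// BM25 tuning parameters (hypothetical struct for this sketch, not the crate's type).
#[derive(Clone, Copy)]
struct Params {
    /// Controls how quickly repeated occurrences of a term saturate.
    k1: f64,
    /// Controls how strongly long documents are penalized (0 = not at all).
    b: f64,
}

/// Inverse document frequency with "+1" smoothing, so the value never goes negative.
fn idf(total_docs: usize, docs_with_term: usize) -> f64 {
    let n = total_docs as f64;
    let df = docs_with_term as f64;
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}

/// Contribution of a single query term to a single document's score.
/// A document's full score is the sum of this value over all query terms.
fn term_score(tf: usize, doc_len: usize, avg_doc_len: f64, idf: f64, p: Params) -> f64 {
    let tf = tf as f64;
    // Length normalization: longer-than-average documents get a larger denominator.
    let norm = 1.0 - p.b + p.b * (doc_len as f64 / avg_doc_len);
    idf * tf * (p.k1 + 1.0) / (tf + p.k1 * norm)
}
```

Compared with plain TF-IDF, the two additions are term-frequency saturation (controlled by `k1`) and document-length normalization (controlled by `b`): a rare term scores higher than a common one, and the same term frequency counts for more in a short document than in a long one.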
§Features
- Segment indexing with BM25 scoring
- Segment removal
- Query search with top-k results
- Serialization and deserialization of indices in CBOR format
- Customizable tokenization
Structs§
- BM25Config - Configuration parameters for the BM25 index
- BM25Index - BM25 search index with customizable tokenization
- BM25Metadata - Index metadata.
- BM25Params - Parameters for the BM25 ranking algorithm
- BM25Stats - Index statistics.
- BoxTokenStream - Simple wrapper of Box<dyn TokenStream + 'a>.
- Token - Token
- TokenizerChain - A chain of tokenizers and filters that implements the Tokenizer trait.
- TokenizerChainBuilder - Builder for TokenizerChain
Enums§
- BM25Error - Errors that can occur when working with BM25 index.
- QueryType - Represents different types of boolean queries that can be parsed from a query string. Supports Term, Or, And, and Not operations for building complex search expressions. Operator precedence: OR < AND < NOT.
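To make the stated precedence (OR < AND < NOT) concrete, here is a small recursive-descent parser sketch. It is not the crate's parser: the `Query` enum, the `parse` function, and the whitespace-separated `AND`/`OR`/`NOT` keyword syntax are all assumptions made for this example, and the query syntax the crate actually accepts may differ.

```rust
/// Hypothetical query tree for this sketch (the crate's own enum is `QueryType`).
#[derive(Debug, PartialEq)]
enum Query {
    Term(String),
    Or(Vec<Query>),
    And(Vec<Query>),
    Not(Box<Query>),
}

/// Parses whitespace-separated tokens; returns None on malformed input.
fn parse(input: &str) -> Option<Query> {
    let tokens: Vec<&str> = input.split_whitespace().collect();
    let mut pos = 0;
    let query = parse_or(&tokens, &mut pos)?;
    // Reject trailing tokens that no rule consumed.
    if pos == tokens.len() { Some(query) } else { None }
}

/// Lowest precedence: operands of OR are whole AND-expressions.
fn parse_or(tokens: &[&str], pos: &mut usize) -> Option<Query> {
    let mut parts = vec![parse_and(tokens, pos)?];
    while tokens.get(*pos) == Some(&"OR") {
        *pos += 1;
        parts.push(parse_and(tokens, pos)?);
    }
    Some(if parts.len() == 1 { parts.remove(0) } else { Query::Or(parts) })
}

/// Middle precedence: operands of AND are NOT-expressions or terms.
fn parse_and(tokens: &[&str], pos: &mut usize) -> Option<Query> {
    let mut parts = vec![parse_not(tokens, pos)?];
    while tokens.get(*pos) == Some(&"AND") {
        *pos += 1;
        parts.push(parse_not(tokens, pos)?);
    }
    Some(if parts.len() == 1 { parts.remove(0) } else { Query::And(parts) })
}

/// Highest precedence: NOT binds to the single operand that follows it.
fn parse_not(tokens: &[&str], pos: &mut usize) -> Option<Query> {
    match tokens.get(*pos).copied()? {
        "NOT" => {
            *pos += 1;
            Some(Query::Not(Box::new(parse_not(tokens, pos)?)))
        }
        // An operator where an operand is expected is a syntax error.
        "AND" | "OR" => None,
        term => {
            *pos += 1;
            Some(Query::Term(term.to_string()))
        }
    }
}
```

Under this sketch, `a OR b AND NOT c` groups as `a OR (b AND (NOT c))`: each precedence level is one function, and a lower level only ever calls the next-higher one for its operands.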
Traits§
- BoxableTokenizer - A boxable Tokenizer, with its TokenStream type erased.
- TokenFilter - Trait for the pluggable components of Tokenizers.
- TokenStream - TokenStream is the result of the tokenization.
- Tokenizer - Tokenizers are in charge of splitting text into a stream of tokens before indexing.
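The division of labor between a tokenizer and its filters can be pictured with plain iterator adapters. The sketch below deliberately does not use the crate's `Tokenizer`, `TokenFilter`, or `TokenizerChain` types, whose exact signatures are not shown on this page; the functions `tokenize`, `lowercase`, `remove_stop_words`, and `analyze`, as well as the stop-word list, are hypothetical stand-ins.

```rust
/// Stand-in tokenizer: splits on any non-alphanumeric character.
fn tokenize(text: &str) -> impl Iterator<Item = String> + '_ {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|s| !s.is_empty())
        .map(String::from)
}

/// Stand-in filter: normalizes every token to lowercase.
fn lowercase(tokens: impl Iterator<Item = String>) -> impl Iterator<Item = String> {
    tokens.map(|t| t.to_lowercase())
}

/// Stand-in filter: drops tokens that appear in the stop-word list.
fn remove_stop_words<'a>(
    tokens: impl Iterator<Item = String> + 'a,
    stop_words: &'a [&'a str],
) -> impl Iterator<Item = String> + 'a {
    tokens.filter(move |t| !stop_words.contains(&t.as_str()))
}

/// Chains the tokenizer with both filters, in the order they should apply.
fn analyze(text: &str) -> Vec<String> {
    // Illustrative stop-word list only.
    const STOP_WORDS: &[&str] = &["the", "a", "of"];
    remove_stop_words(lowercase(tokenize(text)), STOP_WORDS).collect()
}
```

Filter order matters in a chain like this: lowercasing runs before stop-word removal so that "The" and "the" are treated alike. The same analysis has to be applied to documents at indexing time and to queries at search time, otherwise query tokens will not match indexed tokens.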
Functions§
- collect_tokens - Tokenizes text and optionally filters tokens
- flat_full_text_search - Performs a simple full-text search by finding matching tokens in a document
Type Aliases§
- BoxError
- PostingValue - Type alias for posting values: (bucket id, Vec<(document_id, token_frequency)>)
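The shape of a posting value can be illustrated with a toy inverted index. This sketch only mirrors the tuple layout given above; the concrete integer types (`u32`, `u64`, `usize`), the `HashMap` container, the helper functions `add_document` and `document_frequency`, and the treatment of the bucket id as an opaque tag fixed when a term is first seen are all assumptions for the example, not the crate's definitions.

```rust
use std::collections::HashMap;

/// Same layout as described above; the integer widths are assumed for this sketch.
type PostingValue = (u32, Vec<(u64, usize)>);

/// Records one document's tokens in the postings map.
/// `bucket` is stored only when a term is seen for the first time.
fn add_document(
    postings: &mut HashMap<String, PostingValue>,
    bucket: u32,
    doc_id: u64,
    tokens: &[&str],
) {
    // Count how often each token occurs in this document.
    let mut freqs: HashMap<&str, usize> = HashMap::new();
    for token in tokens {
        *freqs.entry(*token).or_insert(0) += 1;
    }
    // Append one (document_id, token_frequency) pair per distinct token.
    for (term, tf) in freqs {
        let entry = postings
            .entry(term.to_string())
            .or_insert_with(|| (bucket, Vec::new()));
        entry.1.push((doc_id, tf));
    }
}

/// Number of documents containing `term`, which is what the IDF part of BM25 needs.
fn document_frequency(postings: &HashMap<String, PostingValue>, term: &str) -> usize {
    postings.get(term).map_or(0, |(_, docs)| docs.len())
}
```

Storing per-document token frequencies alongside each term gives a scorer both inputs it needs from the index: the length of the list is the document frequency, and each pair carries the term frequency for one document.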