Crate bm25_vectorizer

Structs

AverageDocumentLength
Represents the average document length in the corpus.
Bm25Vectorizer
The main BM25 vectorizer that converts text into sparse vector representations.
Bm25VectorizerBuilder
Builder for creating and configuring a Bm25Vectorizer.
LengthNormalisation
Controls document length normalisation.
MockCasePreservingTokenizer
Case-preserving tokenizer.
MockDictionaryTokenIndexer
Dictionary-based token indexer with interior mutability.
MockHashTokenIndexer
Hash-based token indexer.
MockPunctuationTokenizer
Punctuation-aware tokenizer.
MockStringTokenIndexer
String-based token indexer.
MockWhitespaceTokenizer
Simple whitespace tokenizer.
SparseRepresentation
A sparse vector representation containing token indices and their BM25 values.
TermFrequencyLowerBound
The additional δ parameter for BM25+, a lower bound on the term frequency contribution.
TermRelevanceSaturation
Controls term frequency saturation.
TokenIndexValue
Represents a token with its index and BM25 value.
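
The parameter structs above correspond to the usual BM25+ quantities: term frequency saturation (commonly written k1), length normalisation (b), average document length (avgdl), and the lower bound δ. The following is a minimal sketch of how those quantities typically combine into a per-term weight. It is not this crate's implementation: `Bm25Params` and `bm25_tf_weight` are hypothetical names introduced for illustration, and the IDF factor is left out so that only the term-frequency component is shown.

```rust
/// Hypothetical parameter bundle; these names are illustrative and are
/// not taken from the bm25_vectorizer API.
#[derive(Debug, Clone, Copy)]
struct Bm25Params {
    k1: f32,    // term frequency saturation
    b: f32,     // length normalisation strength, usually in 0.0..=1.0
    delta: f32, // BM25+ lower bound; 0.0 reduces to plain BM25
    avgdl: f32, // average document length in tokens
}

/// Term-frequency component of BM25+ for one term in one document.
/// `tf` is the raw count of the term, `doc_len` the document length in tokens.
fn bm25_tf_weight(tf: f32, doc_len: f32, p: Bm25Params) -> f32 {
    // A term that does not occur contributes nothing, not even delta.
    if tf <= 0.0 {
        return 0.0;
    }
    // Documents longer than average are penalised, shorter ones boosted.
    let norm = 1.0 - p.b + p.b * (doc_len / p.avgdl);
    // The saturating fraction approaches k1 + 1 as tf grows; delta shifts it up.
    tf * (p.k1 + 1.0) / (tf + p.k1 * norm) + p.delta
}
```

With k1 = 1.2, b = 0.75 and δ = 0, a term that occurs once in a document of exactly average length gets a weight of 1.0, and no count can push the weight above k1 + 1.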

Traits

Bm25TokenIndexer
Trait for mapping tokens to unique indices for efficient BM25 processing.
Bm25Tokenizer
Trait for tokenizing text into individual terms for BM25 processing.
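
To illustrate the division of labour between the two traits, here is a self-contained sketch of a whitespace tokenizer and a dictionary-based indexer that uses interior mutability. The trait names `SketchTokenizer` and `SketchIndexer`, their method signatures, and the helper `term_counts` are assumptions made for this example; the actual `Bm25Tokenizer` and `Bm25TokenIndexer` signatures may differ.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for a tokenizer trait (not the crate's signature).
trait SketchTokenizer {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

/// Hypothetical stand-in for a token indexer trait (not the crate's signature).
trait SketchIndexer {
    fn index(&self, token: &str) -> u32;
}

/// Splits on whitespace and lowercases each token.
struct WhitespaceTokenizer;

impl SketchTokenizer for WhitespaceTokenizer {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(|t| t.to_lowercase()).collect()
    }
}

/// Assigns consecutive ids to tokens in order of first appearance.
/// `RefCell` lets `index` take `&self` while still growing the dictionary.
#[derive(Default)]
struct DictionaryIndexer {
    ids: RefCell<HashMap<String, u32>>,
}

impl SketchIndexer for DictionaryIndexer {
    fn index(&self, token: &str) -> u32 {
        let mut ids = self.ids.borrow_mut();
        let next = ids.len() as u32;
        let id = *ids.entry(token.to_string()).or_insert(next);
        id
    }
}

/// Tokenizes `text`, maps each token to its index, and returns
/// (index, count) pairs sorted by index.
fn term_counts(
    text: &str,
    tok: &impl SketchTokenizer,
    idx: &impl SketchIndexer,
) -> Vec<(u32, u32)> {
    let mut counts: HashMap<u32, u32> = HashMap::new();
    for token in tok.tokenize(text) {
        *counts.entry(idx.index(&token)).or_insert(0) += 1;
    }
    let mut pairs: Vec<(u32, u32)> = counts.into_iter().collect();
    pairs.sort_unstable();
    pairs
}
```

Keeping tokenization and indexing behind separate traits means either side can be swapped independently, for example replacing the growing dictionary with a stateless hash when a fixed vocabulary is not available.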