Structs
- AverageDocumentLength - Represents the average document length in the corpus.
- Bm25Vectorizer - The main BM25 vectorizer that converts text into sparse vector representations.
- Bm25VectorizerBuilder - Builder for creating and configuring a Bm25Vectorizer.
- LengthNormalisation - Controls document length normalisation.
- MockCasePreservingTokenizer - Case-preserving tokenizer
- MockDictionaryTokenIndexer - Dictionary-based token indexer with interior mutability
- MockHashTokenIndexer - Hash-based token indexer
- MockPunctuationTokenizer - Punctuation-aware tokenizer
- MockStringTokenIndexer - String-based token indexer
- MockWhitespaceTokenizer - Simple whitespace tokenizer
- SparseRepresentation - A sparse vector representation containing token indices and their BM25 values.
- TermFrequencyLowerBound - The additional δ parameter for BM25+
- TermRelevanceSaturation - Controls term frequency saturation.
- TokenIndexValue - Represents a token with its index and BM25 value.
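The parameter structs listed above correspond to the usual BM25+ knobs: average document length, length normalisation (commonly written b), term frequency saturation (commonly k1), and the lower bound δ. As a rough illustration of how these quantities typically combine, here is a minimal standalone sketch of the term-frequency component of BM25+ with the IDF factor omitted. The `Params` struct, its field names, and `tf_weight` are hypothetical and are not this crate's API; the crate's own computation may differ.

```rust
/// Hypothetical parameter bundle for illustration; not the crate's API.
#[derive(Debug, Clone, Copy)]
struct Params {
    k1: f64,    // term frequency saturation
    b: f64,     // length normalisation: 0.0 disables it, 1.0 applies it fully
    delta: f64, // BM25+ lower bound; 0.0 reduces to plain BM25
    avgdl: f64, // average document length in the corpus
}

/// Term-frequency component of BM25+ for one term in one document.
/// The IDF factor is deliberately left out of this sketch.
fn tf_weight(tf: f64, doc_len: f64, p: Params) -> f64 {
    if tf <= 0.0 {
        return 0.0; // absent terms contribute nothing, including no delta
    }
    // Documents longer than average are penalised, shorter ones boosted.
    let norm = 1.0 - p.b + p.b * (doc_len / p.avgdl);
    tf * (p.k1 + 1.0) / (tf + p.k1 * norm) + p.delta
}

fn main() {
    let p = Params { k1: 1.2, b: 0.75, delta: 0.0, avgdl: 100.0 };
    let short = tf_weight(3.0, 50.0, p);
    let long = tf_weight(3.0, 200.0, p);
    // The same term count weighs more in the shorter document.
    println!("short = {short:.4}, long = {long:.4}");
}
```

With `delta` at zero, the weight stays below `k1 + 1` however often a term repeats, which is the saturation effect. A positive `delta` guarantees every matching term a minimum contribution even in very long documents.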
Traits
- Bm25TokenIndexer - Trait for mapping tokens to unique indices for efficient BM25 processing.
- Bm25Tokenizer - Trait for tokenizing text into individual terms for BM25 processing.
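To make the division of labour between the two traits concrete, the following sketch shows one way a tokenizer and a token indexer could fit together. The trait names `Tokenize` and `IndexToken`, their method signatures, and the `Whitespace` and `Dictionary` types are all assumptions made for illustration; the real Bm25Tokenizer and Bm25TokenIndexer signatures are not shown on this page and may differ.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical trait shapes for illustration only; not the crate's traits.
trait Tokenize {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

trait IndexToken {
    fn index(&self, token: &str) -> usize;
}

/// Splits on whitespace and lowercases each term.
struct Whitespace;

impl Tokenize for Whitespace {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(|t| t.to_lowercase()).collect()
    }
}

/// Assigns each new token the next free index. Interior mutability lets
/// the dictionary grow behind a shared `&self` reference.
struct Dictionary {
    ids: RefCell<HashMap<String, usize>>,
}

impl Dictionary {
    fn new() -> Self {
        Self { ids: RefCell::new(HashMap::new()) }
    }
}

impl IndexToken for Dictionary {
    fn index(&self, token: &str) -> usize {
        let mut ids = self.ids.borrow_mut();
        let next = ids.len();
        let id = *ids.entry(token.to_string()).or_insert(next);
        id
    }
}

fn main() {
    let tokenizer = Whitespace;
    let indexer = Dictionary::new();
    let indices: Vec<usize> = tokenizer
        .tokenize("the cat saw the dog")
        .iter()
        .map(|t| indexer.index(t))
        .collect();
    println!("{indices:?}"); // prints [0, 1, 2, 0, 3]
}
```

Keeping tokenization and indexing behind separate traits means either side can be swapped independently, for example a hash-based indexer in place of the dictionary when a fixed vocabulary is not available.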