Struct Bm25VectorizerBuilder 

pub struct Bm25VectorizerBuilder<TokenIndexer, Tokenizer> { /* private fields */ }

Builder for creating and configuring a Bm25Vectorizer.

The builder can fit on a corpus to compute the average document length (avgdl) automatically, and it validates all parameters before building.
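
As a rough illustration of what parameter validation before building could look like, here is a minimal sketch. The ranges checked (k1 >= 0, 0 <= b <= 1, avgdl > 0) are the conventional BM25 ranges and are an assumption here, not the crate's documented rules; `ParamError` and `validate` are hypothetical names and are not part of bm25_vectorizer.

```rust
/// Hypothetical error type for this sketch; not the crate's `Bm25VectorizerError`.
#[derive(Debug, PartialEq)]
enum ParamError {
    InvalidK1(f32),
    InvalidB(f32),
    InvalidAvgdl(f32),
}

/// Checks the conventional BM25 ranges (assumed, not taken from the crate):
/// k1 >= 0, 0 <= b <= 1, avgdl > 0. NaN fails every check because
/// comparisons involving NaN evaluate to false.
fn validate(k1: f32, b: f32, avgdl: f32) -> Result<(), ParamError> {
    if !(k1 >= 0.0 && k1.is_finite()) {
        return Err(ParamError::InvalidK1(k1));
    }
    if !(0.0..=1.0).contains(&b) {
        return Err(ParamError::InvalidB(b));
    }
    if !(avgdl > 0.0 && avgdl.is_finite()) {
        return Err(ParamError::InvalidAvgdl(avgdl));
    }
    Ok(())
}
```

Returning a `Result` from the final step mirrors the shape of `build`, which also returns a `Result`, so a bad parameter surfaces as an error value rather than a panic.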

§Type Parameters

  • TokenIndexer: an implementation of the Bm25TokenIndexer trait
  • Tokenizer: an implementation of the Bm25Tokenizer trait

§Examples

Basic usage with manual avgdl:

use bm25_vectorizer::{Bm25VectorizerBuilder, MockWhitespaceTokenizer, MockHashTokenIndexer};

let vectorizer = Bm25VectorizerBuilder::new()
    .tokenizer(MockWhitespaceTokenizer)
    .token_indexer(MockHashTokenIndexer)
    .k1(1.2)
    .b(0.75)
    .avgdl(10.0)
    .build()?; // `?` assumes an enclosing function that returns a Result

Usage with corpus fitting:

use bm25_vectorizer::{Bm25VectorizerBuilder, MockWhitespaceTokenizer, MockHashTokenIndexer};

let corpus = vec!["hello world", "world of rust", "hello rust programming"];
let vectorizer = Bm25VectorizerBuilder::new()
    .tokenizer(MockWhitespaceTokenizer)
    .token_indexer(MockHashTokenIndexer)
    .k1(1.2)
    .b(0.75)
    .fit(&corpus)?  // Automatically computes avgdl
    .build()?;
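
The quantity that fitting derives from a corpus, the average document length, can be sketched independently of the crate. The version below assumes whitespace tokenization and treats an empty corpus as having no defined average; `average_document_length` is a hypothetical helper, and the crate's `fit` may tokenize and handle edge cases differently.

```rust
/// Average number of whitespace-separated tokens per document.
/// Returns `None` for an empty corpus, where the average is undefined.
/// Hypothetical helper for illustration; not part of bm25_vectorizer.
fn average_document_length(corpus: &[&str]) -> Option<f32> {
    if corpus.is_empty() {
        return None;
    }
    // Sum the token counts of all documents.
    let total_tokens: usize = corpus
        .iter()
        .map(|doc| doc.split_whitespace().count())
        .sum();
    Some(total_tokens as f32 / corpus.len() as f32)
}
```

For the three-document corpus in the example above, the token counts are 2, 3, and 3, so this sketch yields 8/3, or roughly 2.67.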

Implementations§

impl<TokenIndexer, Tokenizer> Bm25VectorizerBuilder<TokenIndexer, Tokenizer>

pub fn new() -> Self

pub fn k1(self, k1: f32) -> Self

pub fn b(self, b: f32) -> Self

pub fn delta(self, delta: f32) -> Self

pub fn avgdl(self, avgdl: f32) -> Self

pub fn tokenizer(self, tokenizer: Tokenizer) -> Self

pub fn token_indexer(self, token_indexer: TokenIndexer) -> Self

pub fn fit(self, corpus: &[&str]) -> Result<Self, Bm25VectorizerError>
where Tokenizer: Bm25Tokenizer + Sync,

pub fn fit_iter<I, S>(self, corpus: I) -> Result<Self, Bm25VectorizerError>
where I: IntoIterator<Item = S>, S: AsRef<str>, Tokenizer: Bm25Tokenizer + Sync,
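
The bounds `I: IntoIterator<Item = S>, S: AsRef<str>` accept many kinds of input. The sketch below uses a hypothetical stand-in function, `token_counts`, with the same two bounds to show which argument types satisfy them; it does not call the crate, and whitespace tokenization is an assumption made only for this example.

```rust
/// Hypothetical stand-in with the same input bounds as `fit_iter`.
/// Counts whitespace-separated tokens in each document.
fn token_counts<I, S>(corpus: I) -> Vec<usize>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    corpus
        .into_iter()
        // `as_ref()` gives a `&str` view whether S is `String`, `&str`, etc.
        .map(|doc| doc.as_ref().split_whitespace().count())
        .collect()
}
```

With these bounds, a `Vec<String>`, an array of `&str`, and a lazy iterator such as `text.lines()` are all valid arguments, so a corpus does not have to be collected into a `&[&str]` first.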

pub fn build(self) -> Result<Bm25Vectorizer<TokenIndexer, Tokenizer>, Bm25VectorizerError>
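
The setters `k1`, `b`, `delta`, and `avgdl` share their names with the parameters of the BM25 family of scoring functions. As a rough guide to how those parameters interact, here is a sketch of the term-frequency component in the BM25+ form. That the crate uses exactly this formula is an assumption; `bm25_tf_weight` is a hypothetical function, and the IDF factor is deliberately left out.

```rust
/// Term-frequency component of BM25+ (assumed form, not taken from the crate):
///   tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avgdl)) + delta
/// - k1 controls how quickly repeated terms saturate,
/// - b controls how strongly document length is normalized (0 = not at all),
/// - delta is a constant added to the weight of every matching term,
/// - avgdl is the average document length of the corpus.
fn bm25_tf_weight(tf: f32, doc_len: f32, avgdl: f32, k1: f32, b: f32, delta: f32) -> f32 {
    // Length normalization: equals 1 when the document has average length.
    let length_norm = 1.0 - b + b * (doc_len / avgdl);
    tf * (k1 + 1.0) / (tf + k1 * length_norm) + delta
}
```

Under this form, a term that occurs once in an average-length document gets weight 1 when delta is 0, longer documents get lower weights when b > 0, and setting b to 0 removes the dependence on document length.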

Auto Trait Implementations§

impl<TokenIndexer, Tokenizer> Freeze for Bm25VectorizerBuilder<TokenIndexer, Tokenizer>
where Tokenizer: Freeze, TokenIndexer: Freeze,

impl<TokenIndexer, Tokenizer> RefUnwindSafe for Bm25VectorizerBuilder<TokenIndexer, Tokenizer>
where Tokenizer: RefUnwindSafe, TokenIndexer: RefUnwindSafe,

impl<TokenIndexer, Tokenizer> Send for Bm25VectorizerBuilder<TokenIndexer, Tokenizer>
where Tokenizer: Send, TokenIndexer: Send,

impl<TokenIndexer, Tokenizer> Sync for Bm25VectorizerBuilder<TokenIndexer, Tokenizer>
where Tokenizer: Sync, TokenIndexer: Sync,

impl<TokenIndexer, Tokenizer> Unpin for Bm25VectorizerBuilder<TokenIndexer, Tokenizer>
where Tokenizer: Unpin, TokenIndexer: Unpin,

impl<TokenIndexer, Tokenizer> UnwindSafe for Bm25VectorizerBuilder<TokenIndexer, Tokenizer>
where Tokenizer: UnwindSafe, TokenIndexer: UnwindSafe,

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.