pub struct EmbedderBuilder<D = u32, T = DefaultTokenizer> { /* private fields */ }
A consuming builder for Embedder.
Implementations
impl<D, T> EmbedderBuilder<D, T>
pub fn with_avgdl(avgdl: f32) -> EmbedderBuilder<D, T>
where
    T: Default,
Constructs a new EmbedderBuilder with the given average document length. Use this if you
know the average document length in advance. If you don’t, but you have your full corpus
ahead of time, use with_fit_to_corpus or with_tokenizer_and_fit_to_corpus instead.
If you have neither the full corpus nor a sample of it, you can configure the embedder to
disregard document length by setting b to 0.0. In this case, it doesn’t matter what
value you pass to with_avgdl.
The average document length is the average number of tokens in a document from your corpus;
if you need access to this value, you can construct an Embedder and call avgdl on it.
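To make the definition of average document length concrete, here is a minimal sketch that computes it by hand. It assumes whitespace splitting as a stand-in tokenizer and a hypothetical helper name, average_document_length; the crate's own tokenizer will generally produce different token counts, so treat the result as illustrative only.

```rust
/// Average number of tokens per document.
/// Assumption: tokens are whitespace-separated words. This is a stand-in
/// for a real tokenizer, not the crate's tokenization.
fn average_document_length(corpus: &[&str]) -> f32 {
    // An empty corpus has no meaningful average; this sketch returns 0.0.
    if corpus.is_empty() {
        return 0.0;
    }
    let total_tokens: usize = corpus
        .iter()
        .map(|doc| doc.split_whitespace().count())
        .sum();
    total_tokens as f32 / corpus.len() as f32
}

fn main() {
    let corpus = [
        "the quick brown fox",               // 4 tokens
        "jumps over",                        // 2 tokens
        "the lazy dog sleeps soundly today", // 6 tokens
    ];
    let avgdl = average_document_length(&corpus);
    println!("avgdl = {avgdl}");
}
```

A value obtained this way could be passed to with_avgdl, provided it was computed with the same tokenization the embedder will use.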
pub fn with_tokenizer_and_fit_to_corpus(
    tokenizer: T,
    corpus: &[&str],
) -> EmbedderBuilder<D, T>
Constructs a new EmbedderBuilder with its average document length fit to the given corpus.
Use this if you have the full corpus (or a sample of it) available in advance. The embedder
will use the given tokenizer. Enable the parallelism feature to speed up fitting on large
corpora.
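Fitting to a corpus amounts to tokenizing every document and averaging the token counts, which parallelizes naturally. The sketch below illustrates the idea with std::thread::scope and whitespace splitting. Both are assumptions made for illustration: the function name parallel_average_document_length is hypothetical, and this is not a description of how the crate's parallelism feature is implemented.

```rust
use std::thread;

/// Average token count per document, with counting split across `workers` threads.
/// Assumption: whitespace splitting stands in for a real tokenizer.
fn parallel_average_document_length(corpus: &[&str], workers: usize) -> f32 {
    if corpus.is_empty() {
        return 0.0;
    }
    // Round up so that every document lands in some chunk; treat 0 workers as 1.
    let workers = workers.max(1);
    let chunk_size = (corpus.len() + workers - 1) / workers;

    let total_tokens = thread::scope(|scope| {
        // Spawn one scoped thread per chunk; each returns its token count.
        let handles: Vec<_> = corpus
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|doc| doc.split_whitespace().count())
                        .sum::<usize>()
                })
            })
            .collect();
        // Join all workers and add up the partial counts.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<usize>()
    });

    total_tokens as f32 / corpus.len() as f32
}

fn main() {
    let corpus = [
        "the quick brown fox",
        "jumps over",
        "the lazy dog sleeps soundly today",
    ];
    let avgdl = parallel_average_document_length(&corpus, 2);
    println!("avgdl = {avgdl}");
}
```

The result does not depend on the number of workers, since each chunk contributes an exact integer count that is summed before the division.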
pub fn k1(self, k1: f32) -> EmbedderBuilder<D, T>
Sets the k1 parameter for the embedder. The default value is 1.2.
pub fn b(self, b: f32) -> EmbedderBuilder<D, T>
Sets the b parameter for the embedder. The default value is 0.75.
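For intuition about what k1 and b control, the sketch below evaluates the standard BM25 term-frequency component, tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)). Whether the embedder computes exactly this expression internally is an assumption, and the function name bm25_tf_weight is hypothetical. The example also shows why the value passed to with_avgdl stops mattering once b is 0.0.

```rust
/// Standard BM25 term-frequency component (textbook form; assumed here,
/// not taken from the crate's source).
/// - `tf`: occurrences of the term in the document
/// - `dl`: document length in tokens
/// - `avgdl`: average document length in tokens
fn bm25_tf_weight(tf: f32, dl: f32, avgdl: f32, k1: f32, b: f32) -> f32 {
    // With b = 0.0 this factor is exactly 1.0, so dl and avgdl drop out.
    let length_norm = 1.0 - b + b * (dl / avgdl);
    tf * (k1 + 1.0) / (tf + k1 * length_norm)
}

fn main() {
    let (k1, tf) = (1.2, 3.0);

    // Default b = 0.75: the same term count is worth less in a longer document.
    let in_short_doc = bm25_tf_weight(tf, 5.0, 10.0, k1, 0.75);
    let in_long_doc = bm25_tf_weight(tf, 50.0, 10.0, k1, 0.75);
    println!("b = 0.75: short = {in_short_doc}, long = {in_long_doc}");

    // b = 0.0: document length and avgdl no longer influence the weight.
    let first = bm25_tf_weight(tf, 5.0, 10.0, k1, 0.0);
    let second = bm25_tf_weight(tf, 50.0, 999.0, k1, 0.0);
    println!("b = 0.0: first = {first}, second = {second}");
}
```

In this form, b scales how strongly document length is penalized, while k1 controls how quickly repeated occurrences of a term saturate.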
pub fn avgdl(self, avgdl: f32) -> EmbedderBuilder<D, T>
Overrides the average document length for the embedder.
pub fn tokenizer(self, tokenizer: T) -> EmbedderBuilder<D, T>
Sets the tokenizer for the embedder.
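The setters above take self by value and return the builder, which is what makes this a consuming builder: each call moves the builder into the next one in the chain. The following self-contained sketch mirrors that shape with hypothetical stand-in types (Params, ParamsBuilder) and a hypothetical build method that is not part of the section above. It reuses only the documented defaults, 1.2 for k1 and 0.75 for b, and is not the crate's implementation.

```rust
/// Hypothetical stand-in for the configured product of the builder.
#[derive(Debug, Clone, PartialEq)]
struct Params {
    k1: f32,
    b: f32,
    avgdl: f32,
}

/// Hypothetical consuming builder that mirrors the setter signatures above.
struct ParamsBuilder {
    k1: f32,
    b: f32,
    avgdl: f32,
}

impl ParamsBuilder {
    /// Starts from a known average document length, with the documented defaults.
    fn with_avgdl(avgdl: f32) -> Self {
        Self { k1: 1.2, b: 0.75, avgdl }
    }

    /// Each setter consumes the builder and returns the updated one.
    fn k1(mut self, k1: f32) -> Self {
        self.k1 = k1;
        self
    }

    fn b(mut self, b: f32) -> Self {
        self.b = b;
        self
    }

    fn avgdl(mut self, avgdl: f32) -> Self {
        self.avgdl = avgdl;
        self
    }

    /// Hypothetical finisher: consumes the builder and yields the product.
    fn build(self) -> Params {
        Params { k1: self.k1, b: self.b, avgdl: self.avgdl }
    }
}

fn main() {
    // Chained calls: every step moves the builder, so no `mut` binding is needed.
    let params = ParamsBuilder::with_avgdl(7.0).k1(1.5).b(0.0).build();
    println!("{params:?}");
}
```

Because the setters consume self, a builder bound to a variable cannot be reused after a setter call unless the result is rebound, for example `let builder = builder.b(0.0);`.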
impl<D> EmbedderBuilder<D, DefaultTokenizer>
pub fn with_fit_to_corpus(
    language_mode: impl Into<LanguageMode>,
    corpus: &[&str],
) -> EmbedderBuilder<D, DefaultTokenizer>
Constructs a new EmbedderBuilder with its average document length fit to the given corpus.
Use this if you have the full corpus (or a sample of it) available in advance. This
function uses the default tokenizer, configured with the given language mode, and the
embedder will use that same tokenizer. Enable the parallelism feature to speed up fitting
on large corpora.
pub fn language_mode(
    self,
    language_mode: impl Into<LanguageMode>,
) -> EmbedderBuilder<D, DefaultTokenizer>
Sets the language mode for the embedder tokenizer.
Auto Trait Implementations
impl<D, T> Freeze for EmbedderBuilder<D, T>
where
    T: Freeze,
impl<D, T> RefUnwindSafe for EmbedderBuilder<D, T>
where
    T: RefUnwindSafe,
    D: RefUnwindSafe,
impl<D, T> Send for EmbedderBuilder<D, T>
impl<D, T> Sync for EmbedderBuilder<D, T>
impl<D, T> Unpin for EmbedderBuilder<D, T>
impl<D, T> UnwindSafe for EmbedderBuilder<D, T>
where
    T: UnwindSafe,
    D: UnwindSafe,
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.