pub struct BatchEncoder { /* private fields */ }
Batch text encoder: tokenises, embeds, pools, and normalises text.
Implementations
impl BatchEncoder
pub fn new(config: EncodingConfig) -> Self
Create a new encoder with the given configuration.
pub fn tokenize(&mut self, text: &str) -> TokenizedText
Tokenise text by splitting on whitespace, truncating to max_length,
and assigning sequential IDs from a growing vocabulary.
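The tokenisation rule above can be illustrated with a small standalone sketch. `ToyTokenizer` is a hypothetical type, not part of this crate. It assumes that IDs start at 0, that truncation keeps the first `max_length` whitespace-separated words, and that truncation happens before unseen words are added to the vocabulary. The growing vocabulary is also why `tokenize` takes `&mut self`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the encoder's tokenizer state;
/// the real `BatchEncoder` keeps its fields private.
struct ToyTokenizer {
    max_length: usize,
    vocab: HashMap<String, u32>,
}

impl ToyTokenizer {
    fn new(max_length: usize) -> Self {
        Self { max_length, vocab: HashMap::new() }
    }

    /// Split on whitespace, keep at most `max_length` words, then map each
    /// word to an ID, giving unseen words the next sequential ID (assumed to start at 0).
    fn tokenize(&mut self, text: &str) -> Vec<u32> {
        let mut ids = Vec::new();
        for word in text.split_whitespace().take(self.max_length) {
            let next_id = self.vocab.len() as u32;
            let id = *self.vocab.entry(word.to_string()).or_insert(next_id);
            ids.push(id);
        }
        ids
    }

    /// Number of unique words seen so far.
    fn vocab_size(&self) -> usize {
        self.vocab.len()
    }
}
```

With `max_length = 4`, `tokenize("the cat saw the dog run")` keeps only the first four words and returns `[0, 1, 2, 0]`, leaving a vocabulary of three entries.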
pub fn encode_single(&mut self, text: &str) -> Vec<f32>
Encode a single text string into a 128-dimensional embedding.
Steps: tokenise → produce per-token embeddings → pool → optionally normalise.
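A compact sketch of the four-step pipeline, under stated assumptions. `embed_token` is a hypothetical deterministic embedding, because the encoder's actual embedding method is not documented on this page. Mean pooling stands in for whichever `PoolingStrategy` is configured. `max_length` and `normalize` are hypothetical parameters standing in for `EncodingConfig` fields.

```rust
use std::collections::HashMap;

const DIM: usize = 128; // documented embedding dimension

/// Hypothetical per-token embedding: a deterministic function of the token ID.
fn embed_token(id: u32) -> Vec<f32> {
    (0..DIM)
        .map(|i| ((id as f32 + 1.0) * (i as f32 + 1.0)).sin())
        .collect()
}

/// Hypothetical stand-in for `encode_single`:
/// tokenise -> embed -> mean-pool -> optionally normalise.
fn toy_encode_single(
    vocab: &mut HashMap<String, u32>,
    text: &str,
    max_length: usize,
    normalize: bool,
) -> Vec<f32> {
    // 1. Tokenise: whitespace split, truncate, assign sequential IDs.
    let ids: Vec<u32> = text
        .split_whitespace()
        .take(max_length)
        .map(|w| {
            let next = vocab.len() as u32;
            *vocab.entry(w.to_string()).or_insert(next)
        })
        .collect();

    // 2. Per-token embeddings.
    let token_embeddings: Vec<Vec<f32>> = ids.iter().map(|&id| embed_token(id)).collect();

    // 3. Mean pooling; no tokens yields the zero vector.
    let mut pooled = vec![0.0f32; DIM];
    for e in &token_embeddings {
        for (p, x) in pooled.iter_mut().zip(e) {
            *p += *x;
        }
    }
    if !token_embeddings.is_empty() {
        let n = token_embeddings.len() as f32;
        pooled.iter_mut().for_each(|p| *p /= n);
    }

    // 4. Optional L2 normalisation, skipping zero-norm vectors.
    if normalize {
        let norm = pooled.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            pooled.iter_mut().for_each(|p| *p /= norm);
        }
    }
    pooled
}
```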
pub fn encode_batch(&mut self, texts: &[&str]) -> EncodedBatch
Encode a slice of text strings in chunks of batch_size.
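Chunked batch encoding can be sketched with the standard library's `slice::chunks`. `encode_in_chunks` and its `encode_chunk` callback are hypothetical. The fields of `EncodedBatch` are not shown on this page, so the sketch just returns one vector per input text, in input order.

```rust
/// Encode `texts` in chunks of `batch_size`, calling the hypothetical
/// `encode_chunk` once per chunk and concatenating results in input order.
fn encode_in_chunks<F>(texts: &[&str], batch_size: usize, mut encode_chunk: F) -> Vec<Vec<f32>>
where
    F: FnMut(&[&str]) -> Vec<Vec<f32>>,
{
    // `slice::chunks` panics on a chunk size of zero; fail with a clearer message.
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut out = Vec::with_capacity(texts.len());
    for chunk in texts.chunks(batch_size) {
        out.extend(encode_chunk(chunk));
    }
    out
}
```

With five texts and `batch_size = 2`, the callback sees chunks of 2, 2 and 1 texts; the final chunk is simply shorter.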
pub fn pool(token_embeddings: Vec<Vec<f32>>, strategy: &PoolingStrategy) -> Vec<f32>
Aggregate a list of per-token embedding vectors according to strategy.
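This page does not list the variants of `PoolingStrategy`. The sketch below therefore uses a hypothetical `ToyPooling` enum with three common choices (mean, element-wise max, and first token) to show what aggregating per-token vectors means. Returning an empty vector for empty input is an assumption.

```rust
/// Hypothetical pooling strategies; not the crate's `PoolingStrategy`.
enum ToyPooling {
    Mean,
    Max,
    First, // e.g. a CLS-style "take the first token" strategy
}

fn toy_pool(token_embeddings: Vec<Vec<f32>>, strategy: &ToyPooling) -> Vec<f32> {
    // Assumption: no tokens means an empty output vector.
    let Some(first) = token_embeddings.first() else {
        return Vec::new();
    };
    let dim = first.len();
    match strategy {
        ToyPooling::First => first.clone(),
        ToyPooling::Mean => {
            // Element-wise sum, then divide by the token count.
            let mut acc = vec![0.0f32; dim];
            for e in &token_embeddings {
                for (a, x) in acc.iter_mut().zip(e) {
                    *a += *x;
                }
            }
            let n = token_embeddings.len() as f32;
            acc.iter_mut().for_each(|a| *a /= n);
            acc
        }
        ToyPooling::Max => {
            // Element-wise maximum across tokens.
            let mut acc = vec![f32::NEG_INFINITY; dim];
            for e in &token_embeddings {
                for (a, x) in acc.iter_mut().zip(e) {
                    *a = a.max(*x);
                }
            }
            acc
        }
    }
}
```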
pub fn normalize_l2(v: &mut [f32])
Normalise a vector in-place to unit L2 norm. If the norm is zero, the vector is left unchanged.
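A minimal sketch of in-place L2 normalisation with the documented zero-norm guard. It mirrors the described behaviour and is not the crate's source.

```rust
/// Scale `v` in place so that its L2 norm is 1.
/// A zero-norm vector is left unchanged instead of being divided by zero.
fn normalize_l2(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}
```

For example, `[3.0, 4.0]` has norm 5 and becomes `[0.6, 0.8]`, while `[0.0, 0.0, 0.0]` is returned untouched.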
pub fn similarity(a: &[f32], b: &[f32]) -> f64
Cosine similarity between two embedding vectors. Returns 0.0 if either vector has zero norm.
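A sketch of cosine similarity that follows the documented contract: `f32` inputs, an `f64` result, and `0.0` when either vector has zero norm. Accumulating in `f64` is a design choice that reduces rounding error over 128 components. Panicking on inputs of different lengths is an assumption, since the page does not say how mismatched dimensions are handled.

```rust
/// Cosine similarity accumulated in f64; returns 0.0 if either vector has zero norm.
/// Assumption: both slices must have the same length.
fn similarity(a: &[f32], b: &[f32]) -> f64 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        return 0.0;
    }
    dot / (na.sqrt() * nb.sqrt())
}
```

Identical directions give 1.0, orthogonal vectors give 0.0, and opposite directions give -1.0.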
pub fn vocab_size(&self) -> usize
Return the number of unique tokens in the vocabulary so far.
Auto Trait Implementations
impl Freeze for BatchEncoder
impl RefUnwindSafe for BatchEncoder
impl Send for BatchEncoder
impl Sync for BatchEncoder
impl Unpin for BatchEncoder
impl UnsafeUnpin for BatchEncoder
impl UnwindSafe for BatchEncoder
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<T> PolicyExt for T
where
    T: ?Sized,
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.