pub struct SentenceEncoder { /* private fields */ }
Encodes sentences to fixed-length float vectors via word-embedding lookup and pooling.
Words not found in the vocabulary receive an OOV vector (all zeros by default). Mean pooling excludes OOV words; if every word in the sentence is OOV, a zero vector is returned.
Implementations
impl SentenceEncoder
pub fn new(vocab: &[String], config: SentenceEncoderConfig) -> Self
Create a SentenceEncoder with pseudo-randomly initialised embeddings for
every word in vocab.
Embeddings are generated deterministically from a seeded LCG, so results are reproducible and no RNG crate is needed.
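As a rough illustration of how a seeded LCG can yield reproducible embeddings without an RNG crate, consider the sketch below. The constants (Knuth's MMIX multiplier and increment), the output range [-0.5, 0.5), and the names `Lcg` and `init_embeddings` are assumptions made for this example; the generator and scaling actually used by `new` are not documented here.

```rust
/// Hypothetical linear congruential generator (not the crate's actual one).
struct Lcg {
    state: u64,
}

impl Lcg {
    fn new(seed: u64) -> Self {
        Lcg { state: seed }
    }

    /// Advance the state and return a float in [-0.5, 0.5).
    fn next_f32(&mut self) -> f32 {
        // Assumed constants (Knuth's MMIX); wrapping arithmetic gives mod 2^64.
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Keep the top 24 bits so the value is exactly representable in f32.
        let bits = (self.state >> 40) as f32;
        bits / (1u32 << 24) as f32 - 0.5
    }
}

/// Build one `dim`-length vector per vocabulary word, in vocabulary order.
fn init_embeddings(vocab: &[String], dim: usize, seed: u64) -> Vec<Vec<f32>> {
    let mut rng = Lcg::new(seed);
    vocab
        .iter()
        .map(|_| (0..dim).map(|_| rng.next_f32()).collect())
        .collect()
}
```

Calling `init_embeddings` twice with the same vocabulary, dimension, and seed produces identical vectors, which is the reproducibility property described above.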
pub fn from_vectors(
    vectors: HashMap<String, Vec<f32>>,
    config: SentenceEncoderConfig,
) -> Self
Create a SentenceEncoder from a pre-built token-to-vector map.
All vectors must have the same length, which must equal
config.embedding_dim. If the map is empty the encoder still works
but will return zero vectors for every sentence.
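Because every vector must match config.embedding_dim, it may be worth checking a map before handing it to from_vectors. The helper below is a minimal sketch of such a check; `find_dim_mismatch` is a hypothetical name and not part of this crate, and how from_vectors itself reacts to a mismatch is not stated here.

```rust
use std::collections::HashMap;

/// Hypothetical pre-flight check: returns some token whose vector length
/// differs from `embedding_dim`, or `None` if every vector matches.
/// (HashMap iteration order is arbitrary, so which offender is reported
/// is unspecified when there are several.)
fn find_dim_mismatch(
    vectors: &HashMap<String, Vec<f32>>,
    embedding_dim: usize,
) -> Option<&str> {
    vectors
        .iter()
        .find(|(_, v)| v.len() != embedding_dim)
        .map(|(token, _)| token.as_str())
}
```

An empty map passes the check trivially, which is consistent with the empty-map case being accepted above.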
pub fn encode(&self, sentence: &str) -> Vec<f32>
Encode a single sentence to a fixed-length Vec<f32>.
The sentence is lower-cased and then split on whitespace. Tokens
beyond max_seq_len are dropped. In mean and weighted-mean pooling,
words not found in the vocabulary are ignored, as if absent from the
sentence. In max pooling, missing words contribute a zero vector.
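The tokenisation and pooling rules above can be sketched as follows. This is an illustrative reconstruction from the description, not the crate's code: the function names, the plain `HashMap` lookup table, and the zero-vector fallback for an empty sentence in max pooling are all assumptions. Weighted-mean pooling is omitted because its weights are not described here.

```rust
use std::collections::HashMap;

/// Lower-case, split on whitespace, and keep at most `max_seq_len` tokens.
fn tokenize(sentence: &str, max_seq_len: usize) -> Vec<String> {
    sentence
        .to_lowercase()
        .split_whitespace()
        .take(max_seq_len)
        .map(str::to_string)
        .collect()
}

/// Mean pooling: OOV tokens are skipped; an all-OOV sentence yields zeros.
fn mean_pool(tokens: &[String], table: &HashMap<String, Vec<f32>>, dim: usize) -> Vec<f32> {
    let mut sum = vec![0.0f32; dim];
    let mut known = 0usize;
    for v in tokens.iter().filter_map(|t| table.get(t)) {
        for (s, x) in sum.iter_mut().zip(v) {
            *s += x;
        }
        known += 1;
    }
    if known > 0 {
        for s in sum.iter_mut() {
            *s /= known as f32;
        }
    }
    sum
}

/// Max pooling: an OOV token contributes a zero vector to the element-wise max.
fn max_pool(tokens: &[String], table: &HashMap<String, Vec<f32>>, dim: usize) -> Vec<f32> {
    let zero = vec![0.0f32; dim];
    let mut out = vec![f32::NEG_INFINITY; dim];
    for t in tokens {
        let v = table.get(t).unwrap_or(&zero);
        for (o, x) in out.iter_mut().zip(v) {
            *o = f32::max(*o, *x);
        }
    }
    // Assumption: an empty sentence has nothing to pool, so return zeros.
    if tokens.is_empty() { zero } else { out }
}
```

Note the practical difference between the two modes: an OOV word leaves a mean-pooled result unchanged, whereas in max pooling it clamps every negative component up to zero.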
pub fn similarity(&self, a: &[f32], b: &[f32]) -> f32
Cosine similarity between two embedding vectors.
Returns a value in [-1.0, 1.0], or 0.0 when either vector has zero
norm.
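For reference, cosine similarity with the zero-norm convention described above might be computed along these lines. This is a standalone sketch rather than the method's actual body; the free function `cosine_similarity`, the final clamp, and the assumption that both slices have equal length are choices made for this example.

```rust
/// Cosine similarity; returns 0.0 when either input has zero norm.
/// Assumes `a` and `b` have the same length (extra elements are ignored).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    // Clamp to absorb rounding error that could push the ratio past 1 in magnitude.
    (dot / (norm_a * norm_b)).clamp(-1.0, 1.0)
}
```

The zero-norm guard matters in practice because an all-OOV sentence encodes to a zero vector, and dividing by its norm would otherwise produce NaN.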
pub fn most_similar<'a>(
    &self,
    query: &str,
    sentences: &[&'a str],
    top_k: usize,
) -> Vec<(&'a str, f32)>
Find the top_k sentences most similar to query (by cosine
similarity), returned in descending similarity order.
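The ranking step can be pictured as "score everything, sort descending, truncate". The sketch below assumes that approach; `rank_top_k` is a hypothetical free function that takes the encoder and similarity function as closures so that it stays self-contained, and the crate may well implement the search differently.

```rust
/// Rank `sentences` against `query` using caller-supplied `encode` and `sim`
/// functions, keeping the `top_k` best matches in descending score order.
fn rank_top_k<'a>(
    query: &str,
    sentences: &[&'a str],
    top_k: usize,
    encode: impl Fn(&str) -> Vec<f32>,
    sim: impl Fn(&[f32], &[f32]) -> f32,
) -> Vec<(&'a str, f32)> {
    let q = encode(query);
    let mut scored: Vec<(&'a str, f32)> = sentences
        .iter()
        .map(|&s| {
            let v = encode(s);
            (s, sim(q.as_slice(), v.as_slice()))
        })
        .collect();
    // total_cmp is a total order on f32, so the comparator is well-defined
    // even if a score is NaN. The sort is stable, so ties keep input order.
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored.truncate(top_k);
    scored
}
```

In this sketch, a `top_k` larger than the number of candidates simply returns all of them, since `truncate` is a no-op in that case.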
pub fn embedding_dim(&self) -> usize
Return the embedding dimensionality.
Trait Implementations
Auto Trait Implementations
impl Freeze for SentenceEncoder
impl RefUnwindSafe for SentenceEncoder
impl Send for SentenceEncoder
impl Sync for SentenceEncoder
impl Unpin for SentenceEncoder
impl UnsafeUnpin for SentenceEncoder
impl UnwindSafe for SentenceEncoder
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its
superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.