scirs2_text::embeddings::sentence_encoder

Struct SentenceEncoder

pub struct SentenceEncoder { /* private fields */ }

Projects sequences of token embeddings to a single sentence-level vector.

Internally the encoder applies:

1. Pooling: aggregate token embeddings with the chosen strategy.
2. Projection: a learnable embedding_dim × projection_dim linear layer (bias included) maps the pooled vector to the output space.
3. Normalisation (optional): L2-normalise the projected vector to unit length.

Weights are initialised from a deterministic linear congruential generator (LCG) seeded by seed, so a given seed always produces the same weights.
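The three stages can be sketched as a small standalone program. This is an illustration under stated assumptions, not the crate's implementation: `SketchEncoder`, its fields, and `lcg_next` are hypothetical names, mean pooling stands in for whichever PoolingStrategy is chosen, and the LCG constants and weight range are picked for illustration only. Input validation is omitted here.

```rust
/// Hypothetical stand-in for the encoder; names and layout are assumptions.
struct SketchEncoder {
    weights: Vec<Vec<f64>>, // embedding_dim rows, projection_dim columns
    bias: Vec<f64>,         // projection_dim entries
    normalize: bool,
}

/// One LCG step (Knuth's MMIX constants, an assumption), mapped to [-0.5, 0.5).
fn lcg_next(state: &mut u64) -> f64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    // Use the top 53 bits to build a float in [0, 1), then centre it on zero.
    ((*state >> 11) as f64) / ((1u64 << 53) as f64) - 0.5
}

impl SketchEncoder {
    fn new(embedding_dim: usize, projection_dim: usize, seed: u64) -> Self {
        let mut state = seed;
        let weights: Vec<Vec<f64>> = (0..embedding_dim)
            .map(|_| {
                (0..projection_dim)
                    .map(|_| lcg_next(&mut state))
                    .collect::<Vec<f64>>()
            })
            .collect();
        let bias: Vec<f64> = (0..projection_dim).map(|_| lcg_next(&mut state)).collect();
        Self { weights, bias, normalize: true }
    }

    fn encode(&self, tokens: &[Vec<f64>]) -> Vec<f64> {
        let embedding_dim = self.weights.len();
        // 1. Pooling: average each dimension over all tokens (mean pooling).
        let mut pooled = vec![0.0; embedding_dim];
        for token in tokens {
            for (p, x) in pooled.iter_mut().zip(token) {
                *p += x / tokens.len() as f64;
            }
        }
        // 2. Projection: out = pooled * W + b.
        let mut out = self.bias.clone();
        for (x, row) in pooled.iter().zip(&self.weights) {
            for (o, w) in out.iter_mut().zip(row) {
                *o += x * w;
            }
        }
        // 3. Optional L2 normalisation; a zero vector is left unchanged.
        if self.normalize {
            let norm = out.iter().map(|v| v * v).sum::<f64>().sqrt();
            if norm > 0.0 {
                out.iter_mut().for_each(|v| *v /= norm);
            }
        }
        out
    }
}
```

Because every weight is drawn from the seeded generator, two sketch encoders built with the same dimensions and seed map the same input to the same output.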

Implementations

impl SentenceEncoder

pub fn new( embedding_dim: usize, projection_dim: usize, pooling: PoolingStrategy, seed: u64, ) -> Self

Create a new SentenceEncoder with LCG-initialised weights.

Parameters

embedding_dim — dimensionality of token embeddings fed to encode.
projection_dim — output dimensionality of sentence embeddings.
pooling — pooling strategy.
seed — deterministic PRNG seed.

pub fn with_normalize(self, normalize: bool) -> Self

Enable or disable L2 normalisation of output embeddings.

pub fn encode(&self, token_embeddings: &[Vec<f64>]) -> Result<Vec<f64>>

Encode a sequence of token embeddings into a single sentence vector.

Returns a Vec<f64> of length projection_dim.

Errors

Returns an error when token_embeddings is empty or any token embedding has a dimension other than embedding_dim.
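The two error conditions can be illustrated with a standalone check. This is a sketch under assumptions: `EncodeError` and `validate_tokens` are hypothetical names, and the crate's actual error type is not shown on this page.

```rust
/// Hypothetical error type used only for this illustration.
#[derive(Debug, PartialEq)]
enum EncodeError {
    /// No token embeddings were supplied.
    EmptyInput,
    /// The token at `index` has `found` dimensions instead of `expected`.
    DimensionMismatch { index: usize, expected: usize, found: usize },
}

/// Checks the two documented preconditions before any pooling happens.
fn validate_tokens(tokens: &[Vec<f64>], embedding_dim: usize) -> Result<(), EncodeError> {
    if tokens.is_empty() {
        return Err(EncodeError::EmptyInput);
    }
    for (index, token) in tokens.iter().enumerate() {
        if token.len() != embedding_dim {
            return Err(EncodeError::DimensionMismatch {
                index,
                expected: embedding_dim,
                found: token.len(),
            });
        }
    }
    Ok(())
}
```

Reporting the index of the offending token, as this sketch does, is a design choice for easier debugging; the real error may carry different details.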

pub fn cosine_similarity(a: &[f64], b: &[f64]) -> f64

Cosine similarity between two sentence embeddings.

Returns a value in [-1, 1]. Returns 0.0 when either vector has zero norm.
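The documented behaviour, including the zero-norm case, can be reproduced with a minimal standalone function. This is a sketch, not the crate's source. The clamp guarding against floating-point overshoot and the handling of mismatched lengths (the longer slice is truncated by `zip`) are assumptions.

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|), or 0.0 if either norm is zero.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    // Rounding can push the ratio slightly outside [-1, 1]; clamp it back.
    (dot / (norm_a * norm_b)).clamp(-1.0, 1.0)
}
```

When both inputs are already L2-normalised, the result reduces to the plain dot product.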

pub fn similarity_matrix( &self, sentences: &[Vec<Vec<f64>>], ) -> Result<Vec<Vec<f64>>>

Encode multiple sentences and return the n × n cosine-similarity matrix.

Each element of sentences is a Vec<Vec<f64>> (token embeddings for one sentence).

Errors

Propagates any error from encode.
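The matrix-building step can be sketched on its own, starting from vectors that have already been encoded. The function names are hypothetical, and filling only the upper triangle and mirroring it is an assumption about how one might exploit symmetry, not a description of the crate's code.

```rust
/// Cosine similarity with the zero-norm convention (returns 0.0).
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Builds the n × n matrix where entry (i, j) compares embeddings i and j.
fn similarity_matrix_from_embeddings(embeddings: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = embeddings.len();
    let mut matrix = vec![vec![0.0; n]; n];
    for i in 0..n {
        // Cosine similarity is symmetric, so compute each pair once.
        for j in i..n {
            let s = cosine(&embeddings[i], &embeddings[j]);
            matrix[i][j] = s;
            matrix[j][i] = s;
        }
    }
    matrix
}
```

The result is symmetric, and each diagonal entry is 1 for any non-zero embedding.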

pub fn normalize(v: &mut [f64])

L2-normalise a vector in place. A zero-norm vector is left unchanged.
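A minimal standalone sketch of in-place L2 normalisation with the zero-norm guard follows; `l2_normalize` is a hypothetical name for illustration, not the crate's source.

```rust
/// Divides every component by the Euclidean norm; a zero vector is untouched.
fn l2_normalize(v: &mut [f64]) {
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}
```

Skipping the division for a zero norm avoids producing NaN components.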

pub fn projection_dim(&self) -> usize

The output (projection) dimension.

pub fn embedding_dim(&self) -> usize

The input (token embedding) dimension.

Trait Implementations

impl Debug for SentenceEncoder

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

impl UnwindSafe for SentenceEncoder

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
