pub struct Codebook { /* private fields */ }
A Codebook maps token IDs to dense vectors.
Implementations
impl Codebook
pub fn encode_ids(&self, ids: &[u32]) -> Vec<f32>
Encode a token-id sequence into a single vector using mean pooling.
This is lenient: token IDs not present in the codebook are skipped.
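The lenient behavior described above can be sketched outside the crate. This is a minimal illustration, not the crate's implementation: `Table`, `mean_pool_lenient`, and the explicit `dim` parameter are hypothetical names, and returning a zero vector when no ID is known is an assumption, since the documentation does not say what happens in that case.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the codebook's private storage: token ID -> embedding.
type Table = HashMap<u32, Vec<f32>>;

/// Mean-pool the embeddings of the IDs found in `table`, skipping unknown IDs.
/// Assumption: a zero vector of length `dim` is returned when no ID is known.
fn mean_pool_lenient(table: &Table, dim: usize, ids: &[u32]) -> Vec<f32> {
    let mut sum = vec![0.0f32; dim];
    let mut count = 0usize;
    for id in ids {
        // Unknown IDs are silently skipped and do not count toward the mean.
        if let Some(vec) = table.get(id) {
            for (s, x) in sum.iter_mut().zip(vec) {
                *s += *x;
            }
            count += 1;
        }
    }
    if count > 0 {
        let n = count as f32;
        for s in sum.iter_mut() {
            *s /= n;
        }
    }
    sum
}
```

Note that in this sketch the divisor is the number of IDs actually found, so skipped IDs do not pull the mean toward zero.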
pub fn encode_ids_strict(&self, ids: &[u32]) -> Result<Vec<f32>>
Encode token IDs into a single vector using mean pooling (strict).
Unlike Self::encode_ids, this returns an error if any token ID is not present in the
codebook. This is useful when you need a “closed vocabulary” contract.
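A closed-vocabulary contract of this kind might look like the following sketch. It is not the crate's code: `mean_pool_strict` and the `UnknownToken` error are hypothetical (the crate's actual `Result` alias and error type are not shown on this page), and the zero-vector result for an empty input is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical error type; the crate's real error type is not documented here.
#[derive(Debug, PartialEq)]
struct UnknownToken(u32);

/// Mean-pool the embeddings for `ids`, failing on the first ID missing from `table`.
/// Assumption: an empty `ids` slice yields a zero vector of length `dim`.
fn mean_pool_strict(
    table: &HashMap<u32, Vec<f32>>,
    dim: usize,
    ids: &[u32],
) -> Result<Vec<f32>, UnknownToken> {
    let mut sum = vec![0.0f32; dim];
    for &id in ids {
        // Strict mode: a missing ID aborts the whole encoding with an error.
        let vec = table.get(&id).ok_or(UnknownToken(id))?;
        for (s, x) in sum.iter_mut().zip(vec) {
            *s += *x;
        }
    }
    if !ids.is_empty() {
        let n = ids.len() as f32;
        for s in sum.iter_mut() {
            *s /= n;
        }
    }
    Ok(sum)
}
```

Failing early makes vocabulary mismatches visible at the call site instead of silently changing the pooled vector.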
pub fn encode_ids_weighted_strict(
    &self,
    ids: &[u32],
    weights: &[f32],
) -> Result<Vec<f32>>
Encode token IDs into a single vector using a weighted mean (strict).
We compute
\[ v = \frac{\sum_i w_i \, E[t_i]}{\sum_i w_i} \]
where \(E[t_i]\) is the codebook vector for token ID \(t_i\) and \(w_i\) is its weight, with the convention that if \(\sum_i w_i \le 0\), we return the zero vector.
Weighting is one route toward SIF-style baselines (Arora et al., 2017).
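The weighted mean and its zero-vector convention can be sketched as follows. This is an illustration under stated assumptions rather than the crate's implementation: `weighted_mean_strict` and `PoolError` are hypothetical names, and treating a length mismatch between `ids` and `weights` as an error is an assumption, since this page does not describe that case.

```rust
use std::collections::HashMap;

/// Hypothetical error type; the crate's real error type is not documented here.
#[derive(Debug, PartialEq)]
enum PoolError {
    UnknownToken(u32),
    /// Assumption: mismatched lengths are rejected rather than truncated.
    LengthMismatch { ids: usize, weights: usize },
}

/// Compute v = (sum_i w_i * E[t_i]) / (sum_i w_i), failing on unknown IDs.
/// Returns the zero vector of length `dim` when the weights sum to <= 0.
fn weighted_mean_strict(
    table: &HashMap<u32, Vec<f32>>,
    dim: usize,
    ids: &[u32],
    weights: &[f32],
) -> Result<Vec<f32>, PoolError> {
    if ids.len() != weights.len() {
        return Err(PoolError::LengthMismatch {
            ids: ids.len(),
            weights: weights.len(),
        });
    }
    let mut sum = vec![0.0f32; dim];
    let mut total = 0.0f32;
    for (&id, &w) in ids.iter().zip(weights) {
        let vec = table.get(&id).ok_or(PoolError::UnknownToken(id))?;
        for (s, x) in sum.iter_mut().zip(vec) {
            *s += w * *x;
        }
        total += w;
    }
    if total <= 0.0 {
        // Convention from the formula above: non-positive total weight gives zero.
        return Ok(vec![0.0; dim]);
    }
    for s in sum.iter_mut() {
        *s /= total;
    }
    Ok(sum)
}
```

For a SIF-style baseline, the weights would typically be derived from token frequencies, for example \(w_i = a / (a + p(t_i))\) with a small constant \(a\) and unigram probability \(p(t_i)\); computing those weights is left to the caller in this sketch.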
Trait Implementations
Auto Trait Implementations
impl Freeze for Codebook
impl RefUnwindSafe for Codebook
impl Send for Codebook
impl Sync for Codebook
impl Unpin for Codebook
impl UnsafeUnpin for Codebook
impl UnwindSafe for Codebook
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.