pub struct Embeddings<V, S> { /* private fields */ }

Word embeddings.

This data structure stores word embeddings (also known as word vectors) and provides some useful methods on the embeddings, such as similarity and analogy queries.

Implementations

impl<V, S> Embeddings<V, S> where V: Vocab, S: Storage,

pub fn new(metadata: Option<Metadata>, vocab: V, storage: S, norms: NdNorms) -> Self

Construct embeddings from metadata, a vocabulary, storage, and norms.

The embeddings for known words must be normalized. However, this is not verified due to the high computational cost.
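Since normalization is not verified, callers that assemble storage by hand may want to normalize the rows themselves. The following is a minimal std-only sketch of l2-normalizing rows while recording the original norms. The helper name `l2_normalize_rows` and the `Vec<Vec<f32>>` layout are hypothetical and not part of this crate, which uses its own storage and `NdNorms` types.

```rust
/// L2-normalize each row in place and return the original norms.
/// Rows with zero norm are left unchanged (a choice made for this sketch).
fn l2_normalize_rows(rows: &mut [Vec<f32>]) -> Vec<f32> {
    rows.iter_mut()
        .map(|row| {
            // Euclidean length of the row before normalization.
            let norm = row.iter().map(|x| x * x).sum::<f32>().sqrt();
            if norm > 0.0 {
                for x in row.iter_mut() {
                    *x /= norm;
                }
            }
            norm
        })
        .collect()
}
```

The returned norms play the role of the `norms` argument: they are what allows the unnormalized vectors to be recovered later.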

impl<V, S> Embeddings<V, S>

pub fn into_parts(self) -> (Option<Metadata>, V, S, Option<NdNorms>)

Decompose the embeddings into metadata, vocabulary, storage, and, optionally, norms.

pub fn metadata(&self) -> Option<&Metadata>

Get metadata.

pub fn metadata_mut(&mut self) -> Option<&mut Metadata>

Get metadata mutably.

pub fn norms(&self) -> Option<&NdNorms>

Get embedding norms.

pub fn set_metadata(&mut self, metadata: Option<Metadata>) -> Option<Metadata>

Set metadata.

Returns the previously-stored metadata.

pub fn storage(&self) -> &S

Get the embedding storage.

pub fn vocab(&self) -> &V

Get the vocabulary.

impl<V, S> Embeddings<V, S> where V: Vocab, S: Storage,

pub fn dims(&self) -> usize

Return the length (in vector components) of the word embeddings.

pub fn embedding(&self, word: &str) -> Option<CowArray<'_, f32, Ix1>>

Get the embedding of a word.

pub fn embedding_into(&self, word: &str, target: ArrayViewMut1<'_, f32>) -> bool

Realize the embedding of a word into the given vector.

This variant of embedding realizes the embedding into the given vector. This makes it possible to look up embeddings without any additional allocations. This method returns false and does not modify the vector if no embedding could be found.

Panics when the vector does not have the same dimensionality as the word embeddings.
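The contract described above (write into a caller-provided buffer, return false and leave the buffer untouched on a miss) can be illustrated with a small std-only sketch. The `lookup_into` helper and the `HashMap` table are hypothetical stand-ins for the vocabulary and storage; this is not the crate's implementation.

```rust
use std::collections::HashMap;

/// Copy the vector stored for `word` into `target`.
/// Returns `false` and leaves `target` untouched when the word is unknown.
/// Panics when `target` has a different length than the stored vector.
fn lookup_into(table: &HashMap<String, Vec<f32>>, word: &str, target: &mut [f32]) -> bool {
    match table.get(word) {
        Some(vector) => {
            assert_eq!(target.len(), vector.len(), "dimensionality mismatch");
            target.copy_from_slice(vector);
            true
        }
        None => false,
    }
}
```

Reusing one buffer across many lookups is what avoids the per-call allocation.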

pub fn embedding_batch(&self, words: &[impl AsRef<str>]) -> (Array2<f32>, Vec<bool>)

Get a batch of embeddings.

The embeddings of all words are computed and returned. This method also returns a Vec indicating, for each word, whether an embedding could be found.

pub fn embedding_batch_into(&self, words: &[impl AsRef<str>], output: ArrayViewMut2<'_, f32>) -> Vec<bool>

Get a batch of embeddings.

The embeddings of all words are computed and written to output. A Vec is returned that indicates for each word if an embedding could be found.

This method panics when output does not have the correct shape.
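As a rough illustration of the batch contract, the sketch below writes one row per word into a row-major buffer and returns a found-flag per word. The `lookup_batch_into` helper, the flat `&mut [f32]` buffer, and the decision to leave rows of unknown words untouched are assumptions made for this sketch; the crate itself operates on `ArrayViewMut2`.

```rust
use std::collections::HashMap;

/// Write the vector of each word into one row of a row-major `output`
/// buffer of shape `words.len() x dims`. Rows of unknown words are left
/// as they were. Returns one flag per word.
/// Panics when `dims` is zero or `output` has the wrong length.
fn lookup_batch_into(
    table: &HashMap<String, Vec<f32>>,
    dims: usize,
    words: &[impl AsRef<str>],
    output: &mut [f32],
) -> Vec<bool> {
    assert!(dims > 0, "dims must be non-zero");
    assert_eq!(output.len(), words.len() * dims, "output has the wrong shape");
    words
        .iter()
        .zip(output.chunks_mut(dims))
        .map(|(word, row)| {
            let word: &str = word.as_ref();
            match table.get(word) {
                Some(vector) => {
                    row.copy_from_slice(vector);
                    true
                }
                None => false,
            }
        })
        .collect()
}
```

The flag vector lets callers mask out rows that were never filled.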

pub fn embedding_with_norm(&self, word: &str) -> Option<EmbeddingWithNorm<'_>>

Get the embedding and original norm of a word.

Returns for a word:

  • The word embedding.
  • The norm of the embedding before normalization to a unit vector.

The original embedding can be reconstructed by multiplying all embedding components by the original norm.

If the model does not have associated norms, 1 will be returned as the norm for vocabulary words.
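The reconstruction step is a plain scalar multiplication. A minimal sketch, using a hypothetical `unnormalize` helper over slices rather than the crate's `EmbeddingWithNorm` type:

```rust
/// Scale a unit-length embedding back to its pre-normalization values
/// by multiplying every component by the original norm.
fn unnormalize(embedding: &[f32], norm: f32) -> Vec<f32> {
    embedding.iter().map(|x| x * norm).collect()
}
```

With a norm of 1 (the value reported when no norms are stored), the embedding is returned unchanged.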

pub fn iter(&self) -> Iter<'_>

Get an iterator over pairs of words and the corresponding embeddings.

pub fn iter_with_norms(&self) -> IterWithNorms<'_>

Get an iterator over triples of words, embeddings, and norms.

The iterator yields triples of:

  • A word.
  • Its word embedding.
  • The original norm of the embedding before normalization to a unit vector.

The original embedding can be reconstructed by multiplying all embedding components by the original norm.

If the model does not have associated norms, the norm is always 1.

pub fn len(&self) -> usize

Get the vocabulary size.

The vocabulary size excludes subword units.

impl<I, S> Embeddings<SubwordVocab<I>, S> where I: BucketIndexer, S: Storage + CloneFromMapping,

pub fn to_explicit(&self) -> Result<Embeddings<ExplicitSubwordVocab, S::Result>>

Convert to explicitly indexed subword Embeddings.

impl<S> Embeddings<VocabWrap, S> where S: Storage + CloneFromMapping,

pub fn try_to_explicit(&self) -> Result<Embeddings<ExplicitSubwordVocab, S::Result>>

Try to convert to explicitly indexed subword embeddings.

Conversion fails if the wrapped vocabulary is SimpleVocab, FloretSubwordVocab or already an ExplicitSubwordVocab.

Trait Implementations

impl<V, S> Analogy for Embeddings<V, S> where V: Vocab, S: StorageView,

fn analogy_masked(&self, query: [&str; 3], remove: [bool; 3], limit: usize, batch_size: Option<usize>) -> Result<Vec<WordSimilarityResult<'_>>, [bool; 3]>

Perform an analogy query.

fn analogy(&self, query: [&str; 3], limit: usize, batch_size: Option<usize>) -> Result<Vec<WordSimilarityResult<'_>>, [bool; 3]>

Perform an analogy query.
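This page does not spell out how an analogy query is computed. The sketch below assumes the common vector-offset formulation ("a is to b as c is to ?" is answered by ranking words against `b - a + c`) and mirrors the shape of the signature above: a `remove` mask for the query words and an `Err([bool; 3])` that reports which query words were found. The `HashMap` table and the function body are hypothetical and std-only, not the crate's implementation.

```rust
use std::collections::HashMap;

/// Rank words by dot product with the normalized offset `b - a + c`.
/// Query words whose `remove` flag is set are excluded from the results.
/// When a query word is missing, `Err` reports which words were found.
fn analogy_masked(
    table: &HashMap<String, Vec<f32>>,
    query: [&str; 3],
    remove: [bool; 3],
    limit: usize,
) -> Result<Vec<(String, f32)>, [bool; 3]> {
    let found = [
        table.contains_key(query[0]),
        table.contains_key(query[1]),
        table.contains_key(query[2]),
    ];
    if found.contains(&false) {
        return Err(found);
    }
    let (a, b, c) = (&table[query[0]], &table[query[1]], &table[query[2]]);
    // Offset vector, re-normalized so that scores are cosine similarities.
    let mut target: Vec<f32> = (0..a.len()).map(|i| b[i] - a[i] + c[i]).collect();
    let norm = target.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        target.iter_mut().for_each(|x| *x /= norm);
    }
    let mut results: Vec<(String, f32)> = table
        .iter()
        .filter(|(word, _)| !(0..3).any(|i| remove[i] && query[i] == word.as_str()))
        .map(|(word, v)| {
            let score = v.iter().zip(&target).map(|(x, y)| x * y).sum::<f32>();
            (word.clone(), score)
        })
        .collect();
    results.sort_by(|x, y| y.1.total_cmp(&x.1));
    results.truncate(limit);
    Ok(results)
}
```

Masking the query words matters in practice because they are often among the nearest neighbours of the offset vector.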

impl<V: Clone, S: Clone> Clone for Embeddings<V, S>

fn clone(&self) -> Embeddings<V, S>

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<V: Debug, S: Debug> Debug for Embeddings<V, S>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<V, S> EmbeddingSimilarity for Embeddings<V, S> where V: Vocab, S: StorageView,

fn embedding_similarity_masked(&self, query: ArrayView1<'_, f32>, limit: usize, skip: &HashSet<&str>, batch_size: Option<usize>) -> Option<Vec<WordSimilarityResult<'_>>>

Find words that are similar to the query embedding while skipping certain words.

fn embedding_similarity(&self, query: ArrayView1<'_, f32>, limit: usize, batch_size: Option<usize>) -> Option<Vec<WordSimilarityResult<'_>>>

Find words that are similar to the query embedding.
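Because stored embeddings are unit vectors, a dot product with a unit-length query equals the cosine similarity. The following std-only sketch ranks words that way while honouring a skip set. The `most_similar` helper and the `HashMap` table are hypothetical; the crate's actual search (including its `batch_size` handling) is not shown on this page.

```rust
use std::collections::{HashMap, HashSet};

/// Return up to `limit` words ranked by dot product with `query`,
/// ignoring every word contained in `skip`.
fn most_similar<'a>(
    table: &'a HashMap<String, Vec<f32>>,
    query: &[f32],
    limit: usize,
    skip: &HashSet<&str>,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = table
        .iter()
        .filter(|(word, _)| !skip.contains(word.as_str()))
        .map(|(word, v)| {
            let dot = v.iter().zip(query).map(|(x, y)| x * y).sum::<f32>();
            (word.as_str(), dot)
        })
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```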

impl From<Embeddings<SimpleVocab, MmapArray>> for Embeddings<VocabWrap, StorageViewWrap>

fn from(from: Embeddings<SimpleVocab, MmapArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SimpleVocab, MmapArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<SimpleVocab, MmapArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SimpleVocab, MmapQuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<SimpleVocab, MmapQuantizedArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SimpleVocab, NdArray>> for Embeddings<VocabWrap, StorageViewWrap>

fn from(from: Embeddings<SimpleVocab, NdArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SimpleVocab, NdArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<SimpleVocab, NdArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SimpleVocab, QuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<SimpleVocab, QuantizedArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<ExplicitIndexer>, MmapArray>> for Embeddings<VocabWrap, StorageViewWrap>

fn from(from: Embeddings<ExplicitSubwordVocab, MmapArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<ExplicitIndexer>, MmapArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<ExplicitSubwordVocab, MmapArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<ExplicitIndexer>, MmapQuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<ExplicitSubwordVocab, MmapQuantizedArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<ExplicitIndexer>, NdArray>> for Embeddings<VocabWrap, StorageViewWrap>

fn from(from: Embeddings<ExplicitSubwordVocab, NdArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<ExplicitIndexer>, NdArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<ExplicitSubwordVocab, NdArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<ExplicitIndexer>, QuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<ExplicitSubwordVocab, QuantizedArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FastTextIndexer>, MmapArray>> for Embeddings<VocabWrap, StorageViewWrap>

fn from(from: Embeddings<FastTextSubwordVocab, MmapArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FastTextIndexer>, MmapArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<FastTextSubwordVocab, MmapArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FastTextIndexer>, MmapQuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<FastTextSubwordVocab, MmapQuantizedArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FastTextIndexer>, NdArray>> for Embeddings<VocabWrap, StorageViewWrap>

fn from(from: Embeddings<FastTextSubwordVocab, NdArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FastTextIndexer>, NdArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<FastTextSubwordVocab, NdArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FastTextIndexer>, QuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<FastTextSubwordVocab, QuantizedArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FloretIndexer>, MmapArray>> for Embeddings<VocabWrap, StorageViewWrap>

fn from(from: Embeddings<FloretSubwordVocab, MmapArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FloretIndexer>, MmapArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<FloretSubwordVocab, MmapArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FloretIndexer>, MmapQuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<FloretSubwordVocab, MmapQuantizedArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FloretIndexer>, NdArray>> for Embeddings<VocabWrap, StorageViewWrap>

fn from(from: Embeddings<FloretSubwordVocab, NdArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FloretIndexer>, NdArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<FloretSubwordVocab, NdArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<FloretIndexer>, QuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<FloretSubwordVocab, QuantizedArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<HashIndexer<FnvHasher>>, MmapArray>> for Embeddings<VocabWrap, StorageViewWrap>

fn from(from: Embeddings<BucketSubwordVocab, MmapArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<HashIndexer<FnvHasher>>, MmapArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<BucketSubwordVocab, MmapArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<HashIndexer<FnvHasher>>, MmapQuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<BucketSubwordVocab, MmapQuantizedArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<HashIndexer<FnvHasher>>, NdArray>> for Embeddings<VocabWrap, StorageViewWrap>

fn from(from: Embeddings<BucketSubwordVocab, NdArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<HashIndexer<FnvHasher>>, NdArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<BucketSubwordVocab, NdArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<SubwordVocab<HashIndexer<FnvHasher>>, QuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<BucketSubwordVocab, QuantizedArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<VocabWrap, MmapQuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<VocabWrap, MmapQuantizedArray>) -> Self

Converts to this type from the input type.

impl From<Embeddings<VocabWrap, QuantizedArray>> for Embeddings<VocabWrap, StorageWrap>

fn from(from: Embeddings<VocabWrap, QuantizedArray>) -> Self

Converts to this type from the input type.

impl<'a, V, S> IntoIterator for &'a Embeddings<V, S> where V: Vocab, S: Storage,

type Item = (&'a str, ArrayBase<CowRepr<'a, f32>, Dim<[usize; 1]>>)

The type of the elements being iterated over.

type IntoIter = Iter<'a>

Which kind of iterator are we turning this into?

fn into_iter(self) -> Self::IntoIter

Creates an iterator from a value.

impl<V, S> MmapEmbeddings for Embeddings<V, S> where Self: Sized, V: ReadChunk, S: MmapChunk,

fn mmap_embeddings(read: &mut BufReader<File>) -> Result<Self>

impl<V, S> Quantize<V> for Embeddings<V, S> where V: Vocab + Clone, S: StorageView,

fn quantize_using<T, R>(&self, n_subquantizers: usize, n_subquantizer_bits: u32, n_iterations: usize, n_attempts: usize, normalize: bool, rng: R) -> Result<Embeddings<V, QuantizedArray>> where T: TrainPq<f32>, R: CryptoRng + RngCore + SeedableRng + Send,

Quantize the embedding matrix using the provided RNG.

fn quantize<T>(&self, n_subquantizers: usize, n_subquantizer_bits: u32, n_iterations: usize, n_attempts: usize, normalize: bool) -> Result<Embeddings<V, QuantizedArray>> where T: TrainPq<f32>,

Quantize the embedding matrix.

impl<V, S> ReadEmbeddings for Embeddings<V, S> where V: ReadChunk, S: ReadChunk,

fn read_embeddings<R>(read: &mut R) -> Result<Self> where R: Read + Seek,

Read the embeddings.

impl ReadFastText for Embeddings<FastTextSubwordVocab, NdArray>

fn read_fasttext(reader: &mut impl BufRead) -> Result<Self>

Read embeddings in the fastText format.

fn read_fasttext_lossy(reader: &mut impl BufRead) -> Result<Self>

Read embeddings in the fastText format lossily.

impl ReadFloretText for Embeddings<FloretSubwordVocab, NdArray>

fn read_floret_text(reader: &mut impl BufRead) -> Result<Self>

Read embeddings in the floret format.

impl<R> ReadText<R> for Embeddings<SimpleVocab, NdArray> where R: BufRead,

fn read_text(reader: &mut R) -> Result<Self>

Read the embeddings from the given buffered reader.

fn read_text_lossy(reader: &mut R) -> Result<Self>

Read the embeddings from the given buffered reader lossily.

impl<R> ReadTextDims<R> for Embeddings<SimpleVocab, NdArray> where R: BufRead,

fn read_text_dims(reader: &mut R) -> Result<Self>

Read the embeddings from the given buffered reader.

fn read_text_dims_lossy(reader: &mut R) -> Result<Self>

Read the embeddings from the given buffered reader lossily.

impl<R> ReadWord2Vec<R> for Embeddings<SimpleVocab, NdArray> where R: BufRead,

fn read_word2vec_binary(reader: &mut R) -> Result<Self>

Read the embeddings from the given buffered reader.

fn read_word2vec_binary_lossy(reader: &mut R) -> Result<Self>

Read the embeddings from the given buffered reader lossily.

impl<V, S> WordSimilarity for Embeddings<V, S> where V: Vocab, S: StorageView,

fn word_similarity(&self, word: &str, limit: usize, batch_size: Option<usize>) -> Option<Vec<WordSimilarityResult<'_>>>

Find words that are similar to the query word.

impl<V, S> WriteEmbeddings for Embeddings<V, S> where V: WriteChunk, S: WriteChunk,

fn write_embeddings<W>(&self, write: &mut W) -> Result<()> where W: Write + Seek,

fn write_embeddings_len(&self, offset: u64) -> u64

impl<W, S> WriteFastText<W> for Embeddings<FastTextSubwordVocab, S> where W: Write, S: Storage,

fn write_fasttext(&self, write: &mut W) -> Result<()>

Write the embeddings to the given writer in fastText format.

impl WriteFloretText for Embeddings<FloretSubwordVocab, NdArray>

fn write_floret_text(&self, write: &mut dyn Write) -> Result<()>

Write embeddings in the floret format.

impl<W, V, S> WriteText<W> for Embeddings<V, S> where W: Write, V: Vocab, S: Storage,

fn write_text(&self, write: &mut W, unnormalize: bool) -> Result<()>

Write the embeddings to the given writer.

impl<W, V, S> WriteTextDims<W> for Embeddings<V, S> where W: Write, V: Vocab, S: Storage,

fn write_text_dims(&self, write: &mut W, unnormalize: bool) -> Result<()>

Write the embeddings to the given writer.

impl<W, V, S> WriteWord2Vec<W> for Embeddings<V, S> where W: Write, V: Vocab, S: Storage,

fn write_word2vec_binary(&self, w: &mut W, unnormalize: bool) -> Result<()> where W: Write,

Write the embeddings to the given writer.

Auto Trait Implementations

impl<V, S> RefUnwindSafe for Embeddings<V, S> where S: RefUnwindSafe, V: RefUnwindSafe,

impl<V, S> Send for Embeddings<V, S> where S: Send, V: Send,

impl<V, S> Sync for Embeddings<V, S> where S: Sync, V: Sync,

impl<V, S> Unpin for Embeddings<V, S> where S: Unpin, V: Unpin,

impl<V, S> UnwindSafe for Embeddings<V, S> where S: UnwindSafe, V: UnwindSafe,

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
impl<T> ToOwned for T where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
impl<V, T> VZip<V> for T where V: MultiLane<T>,

fn vzip(self) -> V