Struct rust2vec::Embeddings
pub struct Embeddings { /* private fields */ }
Word embeddings.
This data structure stores word embeddings (also known as word vectors) and provides some useful methods on the embeddings, such as similarity and analogy queries.
Implementations
impl Embeddings
pub fn analogy(
    &self,
    word1: &str,
    word2: &str,
    word3: &str,
    limit: usize
) -> Option<Vec<WordSimilarity<'_>>>
Perform an analogy query.
This method returns words that are close in vector space to the answer of the analogy query: word1 is to word2 as word3 is to ?. More concretely, it searches for embeddings that are similar to:
embedding(word2) - embedding(word1) + embedding(word3)
At most limit results are returned.
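The vector arithmetic above can be sketched with plain slices from the standard library. This is only an illustration, not the crate's implementation: analogy_sketch, dot, and the (word, vector) vocabulary layout are hypothetical. It assumes that the dot product is the similarity measure, that the three query words are excluded from the results, and that None is returned when a query word is missing.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Rank `vocab` against embedding(word2) - embedding(word1) + embedding(word3).
/// Assumption: returns `None` if any query word is not in `vocab`.
fn analogy_sketch<'a>(
    vocab: &'a [(&'a str, Vec<f32>)],
    word1: &str,
    word2: &str,
    word3: &str,
    limit: usize,
) -> Option<Vec<(&'a str, f32)>> {
    let lookup = |w: &str| vocab.iter().find(|(v, _)| *v == w).map(|(_, e)| e);
    let (e1, e2, e3) = (lookup(word1)?, lookup(word2)?, lookup(word3)?);

    // Target vector: embedding(word2) - embedding(word1) + embedding(word3).
    let target: Vec<f32> = e1
        .iter()
        .zip(e2)
        .zip(e3)
        .map(|((a, b), c)| b - a + c)
        .collect();

    // Score every other word against the target (assumption: query words are skipped).
    let mut scored: Vec<(&str, f32)> = vocab
        .iter()
        .filter(|(w, _)| ![word1, word2, word3].contains(w))
        .map(|(w, e)| (*w, dot(e, &target)))
        .collect();

    // Highest similarity first, then keep at most `limit` results.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    Some(scored)
}
```

With the real type, the equivalent call would be embeddings.analogy(word1, word2, word3, limit), which returns an Option that the caller has to unwrap or match on.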
pub fn analogy_by<F>(
    &self,
    word1: &str,
    word2: &str,
    word3: &str,
    limit: usize,
    similarity: F
) -> Option<Vec<WordSimilarity<'_>>>
where
    F: FnMut(ArrayView2<'_, f32>, ArrayView1<'_, f32>) -> Array1<f32>,
Perform an analogy query using the given similarity function.
This method returns words that are close in vector space to the answer of the analogy query: word1 is to word2 as word3 is to ?. More concretely, it searches for embeddings that are similar to:
embedding(word2) - embedding(word1) + embedding(word3)
At most limit results are returned.
pub fn data(&self) -> ArrayView2<'_, f32>
Get (a view of) the raw embedding matrix.
pub fn embed_len(&self) -> usize
Return the length (in vector components) of the word embeddings.
pub fn embedding(&self, word: &str) -> Option<ArrayView1<'_, f32>>
Get the embedding of a word.
pub fn indices(&self) -> &HashMap<String, usize>
Get the mapping from words to row indices of the embedding matrix.
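The relationship between the word-to-index mapping and the rows of the embedding matrix can be sketched as follows. ToyEmbeddings is a hypothetical stand-in that stores rows as Vec<Vec<f32>> instead of an ndarray matrix; it is not how the crate lays out its data.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in: a word-to-row mapping plus the rows themselves.
struct ToyEmbeddings {
    indices: HashMap<String, usize>,
    rows: Vec<Vec<f32>>,
}

impl ToyEmbeddings {
    /// Assign each word the index of the row that holds its vector.
    fn new(pairs: Vec<(&str, Vec<f32>)>) -> Self {
        let mut indices = HashMap::new();
        let mut rows = Vec::new();
        for (word, vector) in pairs {
            indices.insert(word.to_string(), rows.len());
            rows.push(vector);
        }
        ToyEmbeddings { indices, rows }
    }

    /// Look up the row index for `word`, then return a view of that row.
    fn embedding(&self, word: &str) -> Option<&[f32]> {
        self.indices.get(word).map(|&i| self.rows[i].as_slice())
    }
}
```

An unknown word has no entry in the mapping, which is why the lookup yields an Option rather than a bare vector.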
pub fn iter(&self) -> Iter<'_>
Get an iterator over pairs of words and the corresponding embeddings. Each item has type (&'a str, ArrayView1<'a, f32>).
pub fn normalize(&mut self)
Normalize the embeddings using their L2 (Euclidean) norms.
Note: when using the output of, e.g., word2vec, normalize the embeddings before querying to get good results.
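As a rough illustration of what L2 normalization does, the sketch below divides every row by its Euclidean norm. l2_normalize_rows is a hypothetical helper on plain vectors, and leaving all-zero rows unchanged is an assumption made here, not documented behaviour of normalize.

```rust
/// Scale each row to unit L2 norm in place.
/// Assumption: rows whose norm is zero are left unchanged.
fn l2_normalize_rows(rows: &mut [Vec<f32>]) {
    for row in rows.iter_mut() {
        // Euclidean norm: square root of the sum of squared components.
        let norm = row.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in row.iter_mut() {
                *x /= norm;
            }
        }
    }
}
```

For example, the row [3.0, 4.0] has norm 5.0 and becomes [0.6, 0.8].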
pub fn similarity(
    &self,
    word: &str,
    limit: usize
) -> Option<Vec<WordSimilarity<'_>>>
Find words that are similar to the query word.
The similarity between two words is defined as the dot product of their embeddings. If the vectors are unit vectors (e.g. after calling normalize), this is the cosine similarity. At most limit results are returned.
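The claim that the dot product of unit vectors equals the cosine similarity can be checked with a few lines of plain Rust. The helpers dot, cosine, and unit are hypothetical and exist only for this illustration.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity computed directly from unnormalized vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

/// Return a unit-length copy of `v` (assumes `v` is not the zero vector).
fn unit(v: &[f32]) -> Vec<f32> {
    let norm = dot(v, v).sqrt();
    v.iter().map(|x| x / norm).collect()
}
```

For a = [3, 4] and b = [4, 3], cosine(a, b) is 24/25, and dot(unit(a), unit(b)) gives the same value up to floating-point rounding.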
pub fn similarity_by<F>(
    &self,
    word: &str,
    limit: usize,
    similarity: F
) -> Option<Vec<WordSimilarity<'_>>>
where
    F: FnMut(ArrayView2<'_, f32>, ArrayView1<'_, f32>) -> Array1<f32>,
Find words that are similar to the query word using the given similarity function.
Given the embedding matrix and the word vector, the similarity function should return a vector of similarity scores. At most limit results are returned.
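The shape of such a similarity function can be sketched with standard-library types standing in for ArrayView2, ArrayView1, and Array1. Everything here is hypothetical: neg_euclidean is one possible scoring function (negated Euclidean distance, so that larger still means more similar), and top_k only mimics how scores might be ranked and cut off at limit. One score per matrix row is assumed.

```rust
/// Stand-in similarity callback: one score per matrix row.
/// Uses negated Euclidean distance so that larger means more similar.
fn neg_euclidean(matrix: &[Vec<f32>], query: &[f32]) -> Vec<f32> {
    matrix
        .iter()
        .map(|row| {
            let d2: f32 = row.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
            -d2.sqrt()
        })
        .collect()
}

/// Score all rows with `similarity`, then keep the `limit` best words.
fn top_k<F>(
    words: &[&str],
    matrix: &[Vec<f32>],
    query: &[f32],
    limit: usize,
    mut similarity: F,
) -> Vec<(String, f32)>
where
    F: FnMut(&[Vec<f32>], &[f32]) -> Vec<f32>,
{
    let scores = similarity(matrix, query);
    let mut ranked: Vec<(String, f32)> =
        words.iter().map(|w| w.to_string()).zip(scores).collect();
    // Highest score first.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked.truncate(limit);
    ranked
}
```

Passing the scoring function as a parameter keeps the ranking logic independent of the metric, which is the same separation that the similarity argument of similarity_by provides.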