Struct rust2vec::Embeddings
pub struct Embeddings { /* fields omitted */ }
Word embeddings.
This data structure stores word embeddings (also known as word vectors) and provides some useful methods on the embeddings, such as similarity and analogy queries.
Methods
impl Embeddings
fn analogy(
&self,
word1: &str,
word2: &str,
word3: &str,
limit: usize
) -> Option<Vec<WordSimilarity>>
Perform an analogy query.
This method returns words that are close in vector space to the answer of the analogy query: word1 is to word2 as word3 is to ?. More concretely, it searches for embeddings that are similar to:
embedding(word2) - embedding(word1) + embedding(word3)
At most, limit results are returned.
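The query vector above can be illustrated with a minimal, std-only sketch. It does not use the crate's types: plain slices stand in for the embedding views, and the names analogy_query and analogy_demo as well as the toy 2-dimensional vectors are hypothetical.

```rust
/// Build the analogy query vector:
/// embedding(word2) - embedding(word1) + embedding(word3).
/// All three slices are assumed to have the same length.
fn analogy_query(e1: &[f32], e2: &[f32], e3: &[f32]) -> Vec<f32> {
    e1.iter()
        .zip(e2.iter())
        .zip(e3.iter())
        .map(|((a, b), c)| b - a + c)
        .collect()
}

/// Toy demonstration with made-up embeddings (not real model output).
fn analogy_demo() -> Vec<f32> {
    let man = [0.0_f32, 1.0];
    let king = [1.0_f32, 1.0];
    let woman = [0.0_f32, -1.0];
    // "man is to king as woman is to ?"
    analogy_query(&man, &king, &woman)
}
```

In this toy space the query vector is [1.0, -1.0]; the words whose embeddings are most similar to it would then be returned as the analogy results.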
fn analogy_by<F>(
&self,
word1: &str,
word2: &str,
word3: &str,
limit: usize,
similarity: F
) -> Option<Vec<WordSimilarity>> where
F: FnMut(ArrayView2<f32>, ArrayView1<f32>) -> Array1<f32>,
Perform an analogy query using the given similarity function.
This method returns words that are close in vector space to the answer of the analogy query: word1 is to word2 as word3 is to ?. More concretely, it searches for embeddings that are similar to:
embedding(word2) - embedding(word1) + embedding(word3)
At most, limit results are returned.
fn data(&self) -> ArrayView2<f32>
Get (a view of) the raw embedding matrix.
fn embed_len(&self) -> usize
Return the length (in vector components) of the word embeddings.
fn embedding(&self, word: &str) -> Option<ArrayView1<f32>>
Get the embedding of a word.
fn iter(&self) -> Iter
Get an iterator over pairs of words and the corresponding embeddings.
fn normalize(&mut self)
Normalize the embeddings using their L2 (euclidean) norms.
Note: when you are using the output of e.g. word2vec, you should normalize the embeddings to get good query results.
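What L2 normalization does to each embedding can be sketched without the crate, assuming the matrix is stored as one Vec<f32> per word. The names normalize_rows and normalize_demo are hypothetical, and leaving zero vectors untouched is an assumption of this sketch, not documented behavior of normalize.

```rust
/// Scale every row to unit L2 (Euclidean) norm, in place.
/// Rows whose norm is zero are left unchanged (an assumption of this sketch).
fn normalize_rows(rows: &mut [Vec<f32>]) {
    for row in rows.iter_mut() {
        let norm = row.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in row.iter_mut() {
                *x /= norm;
            }
        }
    }
}

/// Toy demonstration: [3, 4] has norm 5, so it becomes roughly [0.6, 0.8].
fn normalize_demo() -> Vec<Vec<f32>> {
    let mut rows = vec![vec![3.0_f32, 4.0], vec![0.0, 0.0]];
    normalize_rows(&mut rows);
    rows
}
```

After this step every non-zero row has length 1, so dot products between rows can be read as cosine similarities.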
fn similarity(&self, word: &str, limit: usize) -> Option<Vec<WordSimilarity>>
Find words that are similar to the query word.
The similarity between two words is defined by the dot product of the embeddings. If the vectors are unit vectors (e.g. by virtue of calling normalize), this is the cosine similarity. At most, limit results are returned.
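The dot-product ranking described above could look roughly like the following std-only sketch. It is not the crate's implementation: most_similar and similarity_demo are hypothetical names, plain vectors replace the ndarray views, and excluding the query word from its own results is an assumption.

```rust
/// Dot product of two equally long vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

/// Return up to `limit` (word, score) pairs, sorted by descending dot
/// product with the query word's embedding. Returns None for unknown words.
fn most_similar(
    words: &[&str],
    rows: &[Vec<f32>],
    query: &str,
    limit: usize,
) -> Option<Vec<(String, f32)>> {
    let idx = words.iter().position(|w| *w == query)?;
    let q = &rows[idx];
    let mut scored: Vec<(String, f32)> = Vec::new();
    for (i, row) in rows.iter().enumerate() {
        // Assumption: the query word itself is not reported as a result.
        if i != idx {
            scored.push((words[i].to_string(), dot(row, q)));
        }
    }
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    Some(scored)
}

/// Toy demonstration with made-up unit vectors.
fn similarity_demo() -> Option<Vec<(String, f32)>> {
    let words = ["cat", "dog", "car"];
    let rows = vec![vec![1.0_f32, 0.0], vec![0.8, 0.6], vec![0.0, 1.0]];
    most_similar(&words, &rows, "cat", 2)
}
```

Because the toy rows are unit vectors, the scores here are cosine similarities, and "dog" ranks above "car" for the query "cat".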
fn similarity_by<F>(
&self,
word: &str,
limit: usize,
similarity: F
) -> Option<Vec<WordSimilarity>> where
F: FnMut(ArrayView2<f32>, ArrayView1<f32>) -> Array1<f32>,
Find words that are similar to the query word using the given similarity function.
The similarity function should return, given the embeddings matrix and the word vector, a vector of similarity scores. At most, limit results are returned.
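The contract of the similarity function (matrix and query vector in, one score per row out) can be sketched with plain standard-library types. Here &[Vec<f32>] and &[f32] stand in for ArrayView2<f32> and ArrayView1<f32>, and rank_by, neg_euclidean and rank_by_demo are hypothetical names; negative Euclidean distance is only one possible choice of score.

```rust
/// Rank words with a caller-supplied similarity function that maps
/// (all rows, query vector) to one score per row. Higher scores rank first.
fn rank_by<F>(
    words: &[&str],
    rows: &[Vec<f32>],
    query: &[f32],
    limit: usize,
    mut similarity: F,
) -> Vec<(String, f32)>
where
    F: FnMut(&[Vec<f32>], &[f32]) -> Vec<f32>,
{
    let scores = similarity(rows, query);
    let mut scored: Vec<(String, f32)> = words
        .iter()
        .map(|w| w.to_string())
        .zip(scores)
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}

/// Example score: negative Euclidean distance, so closer rows score higher.
fn neg_euclidean(rows: &[Vec<f32>], query: &[f32]) -> Vec<f32> {
    rows.iter()
        .map(|row| {
            let sq: f32 = row
                .iter()
                .zip(query.iter())
                .map(|(x, y)| (x - y) * (x - y))
                .sum();
            -sq.sqrt()
        })
        .collect()
}

/// Toy demonstration with made-up vectors.
fn rank_by_demo() -> Vec<(String, f32)> {
    let words = ["a", "b", "c"];
    let rows = vec![vec![0.0_f32, 0.0], vec![3.0, 4.0], vec![1.0, 0.0]];
    rank_by(&words, &rows, &[0.0, 0.0], 2, neg_euclidean)
}
```

Passing the score as a function keeps the ranking logic independent of the metric, which is the design the similarity_by signature suggests.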
fn len(&self) -> usize
Get the number of words for which embeddings are stored.
fn words(&self) -> &[String]
Get the words for which embeddings are stored. The words line up with the rows in the matrix returned by data.
Trait Implementations
impl<R> ReadText<R> for Embeddings where
R: BufRead + Seek,
fn read_text(reader: &mut R) -> Result<Embeddings>
Read the embeddings from the given buffered reader.
impl<W> WriteText<W> for Embeddings where
W: Write,
fn write_text(&self, write: &mut W) -> Result<()>
Write the embeddings to the given writer.
impl<R> ReadWord2Vec<R> for Embeddings where
R: BufRead,
fn read_word2vec_binary(reader: &mut R) -> Result<Embeddings>
Read the embeddings from the given buffered reader.
impl<W> WriteWord2Vec<W> for Embeddings where
W: Write,
fn write_word2vec_binary(&self, w: &mut W) -> Result<()> where
W: Write,
Write the embeddings to the given writer.