Crate finalfusion
source ·Expand description
A library for reading, writing, and using word embeddings.
finalfusion allows you to read, write, and use word2vec/GloVe embeddings and read fastText embeddings. finalfusion uses finalfusion as its native data format, which has several benefits over the word2vec, GloVe, and fastText formats.
Reading finalfusion embeddings
finalfusion embeddings can be read with the read_embeddings
method, which expects a reader that implements the BufRead
trait.
Since finalfusion supports various types of vocabularies and
embedding matrix (storage) formats, these should be specified
as type parameters of the Embeddings
type. However, typically
one would want to read finalfusion embeddings with any type of
vocabulary or embedding matrix. For this purpose, the VocabWrap
and StorageWrap
types are provided, which wrap any type of
vocabulary and embeddung matrix.
We can thus load a finalfusion format and retrieve an embedding as follows:
use std::fs::File;
use std::io::BufReader;
use finalfusion::prelude::*;
let mut reader = BufReader::new(File::open("testdata/similarity.fifu").unwrap());
// Read the embeddings.
let embeddings: Embeddings<VocabWrap, StorageWrap> =
Embeddings::read_embeddings(&mut reader)
.unwrap();
// Look up an embedding.
let embedding = embeddings.embedding("Berlin");
For performing analogy/similarity queries on the embedding
matrix, we need an embedding matrix which can act as a view.
In that case one should use StorageViewWrap
in place of
StorageWrap
. StorageViewWrap
is only supported for a
subset of embedding matrix types – in particular, quantized
matrices cannot be used as a view.
Reading other embedding formats
Consult the documentation of the fasttext
, text
and
word2vec
modules for information on how to read fastText,
GloVe, and word2vec embeddings.
Modules
- Readers/writers for other embedding formats.
- Word embeddings.
- Error/result types
- Traits and error types for I/O.
- Metadata chunks
- Norms chunk
- Prelude exports the most commonly-used types and traits.
- Traits and trait implementations for similarity queries.
- Embedding matrix representations.
- Utilities for subword units.
- Embedding vocabularies