finalfrontier 0.9.4

Train/use word embeddings with subword units
# Quickstart

Train 300-dimensional word embeddings with the structured skip-gram model
for 10 epochs on 16 threads, discarding words that occur fewer than 10 times:


    finalfrontier skipgram --dims 300 --model structgram --epochs 10 --mincount 10 \
      --threads 16 corpus.txt corpus-embeddings.fifu

The format of the input file is simple: tokens are separated by spaces,
sentences by newlines (`\n`).
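
If your text is not yet in this format, a small preprocessing step can produce it. The following is a minimal sketch in Python, not part of finalfrontier: the function name `to_corpus_lines` is hypothetical, and both the sentence splitting and the tokenization are deliberately naive regular expressions. A real corpus would usually go through a proper tokenizer.

```python
import re


def to_corpus_lines(text):
    """Return one string per sentence, with tokens separated by single spaces."""
    # Naive sentence split (assumption): break after '.', '!' or '?'
    # when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lines = []
    for sentence in sentences:
        # Naive tokenization (assumption): runs of word characters,
        # or single punctuation marks.
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        if tokens:
            lines.append(" ".join(tokens))
    return lines


raw = "Word embeddings are useful. Subword units help with rare words!"
# One sentence per line, each line terminated by '\n'.
corpus = "\n".join(to_corpus_lines(raw)) + "\n"
print(corpus, end="")
```

Writing the resulting string to a file such as `corpus.txt` gives an input with one sentence per line and space-separated tokens.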

After training, you can use and query the embeddings with
[finalfusion](https://github.com/finalfusion/finalfusion-rust) and
`finalfusion-utils`:

    finalfusion similar corpus-embeddings.fifu
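
To illustrate what a similarity query computes, here is a minimal Python sketch over toy vectors. It assumes that neighbors are ranked by cosine similarity, the usual choice for word embeddings; it does not read `.fifu` files and is not finalfusion's implementation. The names `cosine`, `most_similar`, and the toy vectors are all made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, vectors, k=2):
    """Return the k words closest to `query`, best match first."""
    scores = [
        (word, cosine(vectors[query], vec))
        for word, vec in vectors.items()
        if word != query  # a word is trivially most similar to itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy 3-dimensional "embeddings" (hypothetical values).
toy = {
    "cat": [1.0, 0.9, 0.0],
    "dog": [0.9, 1.0, 0.1],
    "car": [0.0, 0.1, 1.0],
}

for word, score in most_similar("cat", toy):
    print(f"{word}\t{score:.3f}")
```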