Crate mecrab_word2vec

mecrab-word2vec: Pure Rust Word2Vec implementation

Fast, memory-efficient word2vec training optimized for Japanese morphological analysis.

§Features

  • Skip-gram with negative sampling
  • Multi-threaded training with Rayon
  • Direct MCV1 format output
  • Memory-efficient streaming
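
To make the first feature concrete, here is a minimal sketch of one skip-gram with negative sampling update, using only the standard library. It is not taken from this crate's source: the function names `sigmoid` and `sgns_step`, the flat row-major matrix layout, and the plain SGD update are all assumptions chosen for illustration.

```rust
/// Logistic sigmoid.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// One SGD step of skip-gram with negative sampling for a single
/// (center, context) pair plus a list of negative word indices.
///
/// `input` holds center-word vectors and `output` holds context-word
/// vectors, both row-major with `dim` columns (hypothetical layout).
/// Returns the loss for this pair, measured before the update.
fn sgns_step(
    input: &mut [f32],
    output: &mut [f32],
    dim: usize,
    center: usize,
    context: usize,
    negatives: &[usize],
    lr: f32,
) -> f32 {
    let mut center_update = vec![0.0f32; dim];
    let mut loss = 0.0f32;
    let c = center * dim;

    // The true context word has label 1; every negative sample has label 0.
    let targets = std::iter::once((context, 1.0f32))
        .chain(negatives.iter().map(|&n| (n, 0.0f32)));

    for (target, label) in targets {
        let t = target * dim;
        let dot: f32 = (0..dim).map(|i| input[c + i] * output[t + i]).sum();
        let pred = sigmoid(dot);

        // Binary cross-entropy, clamped to avoid ln(0).
        loss -= if label == 1.0 {
            pred.max(1e-7).ln()
        } else {
            (1.0 - pred).max(1e-7).ln()
        };

        // Scaled error term shared by both vector updates.
        let g = (label - pred) * lr;
        for i in 0..dim {
            // Accumulate the center-word update using the old output vector,
            // then move the output vector toward (or away from) the center.
            center_update[i] += g * output[t + i];
            output[t + i] += g * input[c + i];
        }
    }

    // Apply the accumulated update to the center word once per pair.
    for i in 0..dim {
        input[c + i] += center_update[i];
    }
    loss
}
```

Calling `sgns_step` repeatedly on the same pair should drive the returned loss down, which is a quick sanity check for any implementation of this update. A multi-threaded trainer would typically run many such steps in parallel over different parts of the corpus.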

§Example

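use mecrab_word2vec::Word2VecBuilder;

let model = Word2VecBuilder::new()
    .vector_size(100)     // dimensionality of each word vector
    .window_size(5)       // context words considered around each center word
    .negative_samples(5)  // negative samples drawn per positive pair
    .min_count(10)        // ignore words seen fewer than 10 times
    .epochs(3)            // passes over the corpus
    .threads(8)           // worker threads used for training
    .build()?;

model.train_from_file("corpus.txt")?;  // train on a text corpus file
model.save_text("vectors.txt")?;       // write the learned vectors as text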

Structs§

Vocabulary
Vocabulary with frequency counts and subsampling
Word2Vec
Word2Vec model
Word2VecBuilder
Builder for Word2Vec model
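
As a rough illustration of what "frequency counts and subsampling" usually means in word2vec, the sketch below counts tokens, applies a `min_count` cutoff, and computes a keep probability for frequent words. The helper names `count_words` and `keep_probability` are hypothetical and are not part of this crate's API. The formula is the one from the original word2vec paper by Mikolov et al. (discard with probability 1 - sqrt(t / f)); the crate may use a different variant.

```rust
use std::collections::HashMap;

/// Count whitespace-separated tokens and drop those seen fewer than
/// `min_count` times.
fn count_words(corpus: &str, min_count: u64) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for token in corpus.split_whitespace() {
        *counts.entry(token.to_string()).or_insert(0u64) += 1;
    }
    counts.retain(|_, c| *c >= min_count);
    counts
}

/// Probability of keeping one occurrence of a word during training.
/// `count / total` is the word's relative frequency f, and `threshold`
/// is the subsampling parameter t. Words with f <= t are always kept.
fn keep_probability(count: u64, total: u64, threshold: f64) -> f64 {
    let freq = count as f64 / total as f64;
    (threshold / freq).sqrt().min(1.0)
}
```

During training, each occurrence of a word would be kept when a uniform random draw in [0, 1) falls below `keep_probability`, so very frequent words contribute fewer training pairs while rare words are left untouched.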

Enums§

Word2VecError

Type Aliases§

Result