Crate embedding

Word embedding training library.

This crate provides tools for training word embeddings from scratch using SkipGram, CBOW, and other models. It supports:

Mini-batch training with gradient clipping and L2 regularization
Learning rate scheduling (constant, exponential, step, cosine)
Early stopping and evaluation metrics
Text preprocessing (HTML stripping, URL removal, contraction expansion)
Source code preprocessing (comment stripping, camelCase splitting)
BPE subword tokenization
Export to Word2Vec, NumPy, ONNX, and binary formats
Semantic search, analogy solving, and embedding arithmetic
Incremental vocabulary updates and LSH-based approximate nearest neighbors
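
As an illustration of the four learning rate schedules listed above, here is a minimal standalone sketch. The `Schedule` enum, its fields, and the `learning_rate` function are hypothetical names chosen for this example; they are not taken from this crate's API, and the crate's own formulas may differ.

```rust
use std::f64::consts::PI;

/// Hypothetical schedule type for illustration; not this crate's actual API.
#[derive(Debug, Clone, Copy)]
pub enum Schedule {
    /// Keep the initial rate for every epoch.
    Constant,
    /// Multiply the rate by `gamma` after every epoch.
    Exponential { gamma: f64 },
    /// Multiply the rate by `gamma` once every `step_size` epochs.
    Step { step_size: usize, gamma: f64 },
    /// Anneal from the initial rate down to `min_lr` over `total_epochs`.
    Cosine { total_epochs: usize, min_lr: f64 },
}

/// Learning rate for a zero-based `epoch`, starting from `initial_lr`.
pub fn learning_rate(schedule: Schedule, initial_lr: f64, epoch: usize) -> f64 {
    match schedule {
        Schedule::Constant => initial_lr,
        Schedule::Exponential { gamma } => initial_lr * gamma.powi(epoch as i32),
        Schedule::Step { step_size, gamma } => {
            // Integer division counts how many decay steps have elapsed.
            let steps = epoch / step_size.max(1);
            initial_lr * gamma.powi(steps as i32)
        }
        Schedule::Cosine { total_epochs, min_lr } => {
            // Clamp progress to [0, 1] so the rate stays at `min_lr` past the end.
            let progress = (epoch as f64 / total_epochs.max(1) as f64).min(1.0);
            min_lr + 0.5 * (initial_lr - min_lr) * (1.0 + (PI * progress).cos())
        }
    }
}
```

Under these assumptions, a training loop would call `learning_rate(schedule, initial_lr, epoch)` once at the start of each epoch and pass the result to the optimizer step. Keeping the schedule a pure function of the epoch makes it easy to test and to resume after a checkpoint.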

§Example

use embedding::*;

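// Build training data (including the vocabulary) from a raw text corpus.
let data = TrainingData::from_text("the cat sat on the mat");
// Configure a small SkipGram model with dimension 8 and 2 training epochs.
let config = TrainingConfig::new(ModelType::SkipGram)
    .with_dim(8)
    .with_epochs(2);

// The model is sized by the number of entries in the vocabulary.
let mut model = EmbeddingModel::new(config, data.vocab.len());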
// model.train(&data).unwrap();

Re-exports§

pub use mmap::MmapEmbeddings;
pub use pretrained::PretrainedEmbeddings;
pub use pretrained::PretrainedLoader;
pub use config::*;
pub use evaluation::*;
pub use search::*;
pub use code::*;
pub use text::*;
pub use tokenizer::*;
pub use transfer::*;
pub use model::*;
pub use backend::*;
pub use benchmark::*;
pub use transformer::*;

Modules§

backend
benchmark
cli
code
config
evaluation
mmap
model
onnx: Low-level ONNX protobuf definitions generated by prost.
pretrained
search
text
tokenizer
transfer
transformer

Structs§

IncrementalTrainer: Supports real-time incremental training by updating a model with new sentences as they arrive, without requiring a full retrain.
