Features
- Supports synchronous usage. No dependency on Tokio.
- Uses @pykeio/ort for performant ONNX inference.
- Uses @huggingface/tokenizers for fast encodings.
Not looking for Rust?
- Python: fastembed
- Go: fastembed-go
- JavaScript: fastembed-js
Supported Models
Text Embedding
- BAAI/bge-small-en-v1.5 - Default
- BAAI/bge-base-en-v1.5
- BAAI/bge-large-en-v1.5
- BAAI/bge-small-zh-v1.5
- BAAI/bge-large-zh-v1.5
- BAAI/bge-m3
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/all-MiniLM-L12-v2
- sentence-transformers/all-mpnet-base-v2
- sentence-transformers/paraphrase-MiniLM-L12-v2
- sentence-transformers/paraphrase-multilingual-mpnet-base-v2
- nomic-ai/nomic-embed-text-v1
- nomic-ai/nomic-embed-text-v1.5 - pairs with nomic-embed-vision-v1.5 for image-to-text search
- intfloat/multilingual-e5-small
- intfloat/multilingual-e5-base
- intfloat/multilingual-e5-large
- mixedbread-ai/mxbai-embed-large-v1
- Alibaba-NLP/gte-base-en-v1.5
- Alibaba-NLP/gte-large-en-v1.5
- lightonai/ModernBERT-embed-large
- Qdrant/clip-ViT-B-32-text - pairs with clip-ViT-B-32-vision for image-to-text search
- jinaai/jina-embeddings-v2-base-code
- jinaai/jina-embeddings-v2-base-en
- google/embeddinggemma-300m
- nomic-ai/nomic-embed-text-v2-moe - requires nomic-v2-moe feature (candle backend)
- Qwen/Qwen3-Embedding-0.6B - requires qwen3 feature (candle backend)
- Qwen/Qwen3-Embedding-4B - requires qwen3 feature (candle backend)
- Qwen/Qwen3-Embedding-8B - requires qwen3 feature (candle backend)
- Qwen/Qwen3-VL-Embedding-2B - requires qwen3 feature (candle backend, multimodal via Qwen3VLEmbedding)
- snowflake/snowflake-arctic-embed-xs
- snowflake/snowflake-arctic-embed-s
- snowflake/snowflake-arctic-embed-m
- snowflake/snowflake-arctic-embed-m-long
- snowflake/snowflake-arctic-embed-l
Quantized versions are also available for several models above (append Q to the model enum variant, e.g., EmbeddingModel::BGESmallENV15Q).
Sparse Text Embedding
- prithivida/Splade_PP_en_v1 - Default
- BAAI/bge-m3
Image Embedding
- Qdrant/clip-ViT-B-32-vision - Default
- Qdrant/resnet50-onnx
- Qdrant/Unicom-ViT-B-16
- Qdrant/Unicom-ViT-B-32
- nomic-ai/nomic-embed-vision-v1.5
Reranking
- BAAI/bge-reranker-base - Default
- BAAI/bge-reranker-v2-m3
- jinaai/jina-reranker-v1-turbo-en
- jinaai/jina-reranker-v2-base-multilingual
✊ Support
To support the library, please donate to our primary upstream dependency, ort - The Rust wrapper for the ONNX runtime.
Installation
Run the following in your project directory:
cargo add fastembed
Or add the following lines to your Cargo.toml:
[dependencies]
fastembed = "5"
Usage
Text Embeddings
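use fastembed::{TextEmbedding, InitOptions, EmbeddingModel};
// With default options
let mut model = TextEmbedding::try_new(Default::default())?;
// With custom options
let mut model = TextEmbedding::try_new(
    InitOptions::new(EmbeddingModel::AllMiniLML6V2).with_show_download_progress(true),
)?;
let documents = vec![
    "passage: Hello, World!",
    "query: Hello, World!",
    "passage: This is an example passage.",
    // You can leave out the prefix but it's recommended
    "fastembed-rs is licensed under Apache 2.0",
];
// Generate embeddings with the default batch size, 256
let embeddings = model.embed(documents, None)?;
println!("Embeddings length: {}", embeddings.len()); // -> Embeddings length: 4
println!("Embedding dimension: {}", embeddings[0].len()); // -> Embedding dimension: 384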
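Once embeddings are available as plain float vectors, a common next step is to compare them. The following is a minimal sketch using only the standard library, assuming each embedding is a `Vec<f32>` of equal length; the helper names `cosine_similarity` and `most_similar` are illustrative and not part of fastembed.

```rust
/// Cosine similarity between two equal-length embedding vectors.
/// Returns `None` if the lengths differ or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Index of the passage embedding most similar to the query embedding.
/// Passages that cannot be compared (wrong length, zero norm) are skipped.
fn most_similar(query: &[f32], passages: &[Vec<f32>]) -> Option<usize> {
    passages
        .iter()
        .enumerate()
        .filter_map(|(i, p)| cosine_similarity(query, p).map(|s| (i, s)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

With the output of `model.embed`, one might treat one vector as the query and pass the remaining vectors to `most_similar` to find the closest passage.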
Qwen3 Embeddings
Qwen3 embedding models are available behind the qwen3 feature flag (candle backend).
[dependencies]
fastembed = { version = "5", features = ["qwen3"] }
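use candle_core::{DType, Device};
use fastembed::Qwen3TextEmbedding;
let device = Device::Cpu;
let model = Qwen3TextEmbedding::from_hf("Qwen/Qwen3-Embedding-0.6B", &device, DType::F32, 512)?;
// Text-only usage with the Qwen3-VL embedding checkpoint is also supported:
// let model = Qwen3TextEmbedding::from_hf("Qwen/Qwen3-VL-Embedding-2B", &device, DType::F32, 512)?;
let embeddings = model.embed(&["query: ...", "passage: ..."])?;
println!("Embeddings length: {}", embeddings.len());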
For multimodal text/image usage with Qwen/Qwen3-VL-Embedding-2B:
use ;
use Qwen3VLEmbedding;
let device = Cpu;
let model = from_hf?;
let image_embeddings = model.embed_images?;
let text_embeddings = model.embed_texts?;
println!;
println!;
Nomic Embed Text v2 MoE
The nomic-embed-text-v2-moe model is available behind the nomic-v2-moe feature flag (candle backend). It is billed as the first general-purpose mixture-of-experts (MoE) embedding model and supports 100+ languages.
[dependencies]
fastembed = { version = "5", features = ["nomic-v2-moe"] }
use ;
use NomicV2MoeTextEmbedding;
let device = Cpu;
let model = from_hf?;
let embeddings = model.embed?;
println!;
Sparse Text Embeddings
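use fastembed::{SparseEmbedding, SparseInitOptions, SparseModel, SparseTextEmbedding};
// With default options
let mut model = SparseTextEmbedding::try_new(Default::default())?;
// With custom options
let mut model = SparseTextEmbedding::try_new(
    SparseInitOptions::new(SparseModel::SPLADEPPV1).with_show_download_progress(true),
)?;
let documents = vec![
    "passage: Hello, World!",
    "query: Hello, World!",
    "passage: This is an example passage.",
    "fastembed-rs is licensed under Apache 2.0",
];
// Generate embeddings with the default batch size, 256
let embeddings: Vec<SparseEmbedding> = model.embed(documents, None)?;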
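Sparse embeddings are usually scored by a dot product over the indices that both vectors share. The sketch below assumes a sparse vector is stored as parallel arrays of token indices and weights; `SparseVector` is a hypothetical stand-in defined here for illustration, not the crate's own type.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a sparse embedding: parallel arrays of
/// vocabulary indices and their weights (assumed layout, not the crate's type).
struct SparseVector {
    indices: Vec<usize>,
    values: Vec<f32>,
}

/// Dot product of two sparse vectors: only indices present in both contribute.
fn sparse_dot(a: &SparseVector, b: &SparseVector) -> f32 {
    // Build an index -> weight lookup for the first vector.
    let lookup: HashMap<usize, f32> = a
        .indices
        .iter()
        .copied()
        .zip(a.values.iter().copied())
        .collect();
    // Sum the products for every index of `b` that also appears in `a`.
    b.indices
        .iter()
        .zip(&b.values)
        .filter_map(|(i, v)| lookup.get(i).map(|w| w * v))
        .sum()
}
```

A higher score means the query and the document put weight on more of the same vocabulary entries.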
BGE-M3 Joint Embeddings
The BGE-M3 model produces dense, sparse, and ColBERT embeddings simultaneously in a single forward pass.
[!WARNING] The default quantized model (BGEM3Q) is optimized for CPUs; passing a GPU execution provider (like CUDA) will fail. For GPU inference or custom requirements, you can export your own model (FP32, FP16, or INT8) using the ONNX export script from hfgpahal/bge-m3-onnx-int8 and load it via try_new_from_path.
use ;
// With default options
let mut model = try_new?;
// With custom options (supporting custom max length up to 8192 tokens)
let mut model = try_new?;
let documents = vec!;
// Generate all three representations in a single forward pass
let output = model.embed?;
println!; // -> Dense dimension: 1024
let sparse_emb = &output.sparse;
println!;
println!;
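ColBERT-style output is one vector per token rather than one vector per document, and it is typically scored with late interaction (MaxSim). The sketch below shows that scoring on plain `Vec<Vec<f32>>` token matrices; how the crate exposes its ColBERT vectors is not shown here, and the function names are illustrative assumptions.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// MaxSim late-interaction score: for every query token vector, take the
/// highest dot product against any document token vector, then sum those maxima.
/// Returns 0.0 when the document has no token vectors.
fn max_sim(query_tokens: &[Vec<f32>], doc_tokens: &[Vec<f32>]) -> f32 {
    if doc_tokens.is_empty() {
        return 0.0;
    }
    query_tokens
        .iter()
        .map(|q| {
            doc_tokens
                .iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

Token vectors are assumed to be normalized by the model, so each per-token maximum behaves like a cosine similarity.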
Alternatively, local model files can be loaded via try_new_from_user_defined (for inline buffer ONNX models) or try_new_from_path (supporting split external ONNX data files like model.onnx_data):
use ;
let user_model = new;
let mut model = try_new_from_user_defined?;
Image Embeddings
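use fastembed::{ImageEmbedding, ImageInitOptions, ImageEmbeddingModel};
// With default options
let mut model = ImageEmbedding::try_new(Default::default())?;
// With custom options
let mut model = ImageEmbedding::try_new(
    ImageInitOptions::new(ImageEmbeddingModel::ClipVitB32).with_show_download_progress(true),
)?;
let images = vec!["assets/image_0.png", "assets/image_1.png"];
// Generate embeddings with the default batch size, 256
let embeddings = model.embed(images, None)?;
println!("Embeddings length: {}", embeddings.len()); // -> Embeddings length: 2
println!("Embedding dimension: {}", embeddings[0].len()); // -> Embedding dimension: 512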
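Because some text models pair with a vision model for image-to-text search, a text query embedding can be ranked against image embeddings directly. This is a rough sketch, assuming both embeddings come from a paired model and therefore have the same dimension; `l2_normalize` and `rank_images` are hypothetical helpers, not crate functions.

```rust
/// Scale a vector to unit length in place; zero vectors are left unchanged.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Rank image embeddings against a text query embedding, best match first.
/// Returns (image index, cosine similarity) pairs.
fn rank_images(text_query: &[f32], images: &[Vec<f32>]) -> Vec<(usize, f32)> {
    let mut q = text_query.to_vec();
    l2_normalize(&mut q);
    let mut scored: Vec<(usize, f32)> = images
        .iter()
        .enumerate()
        .map(|(i, img)| {
            let mut v = img.clone();
            l2_normalize(&mut v);
            // After normalization the dot product equals the cosine similarity.
            let sim: f32 = q.iter().zip(&v).map(|(a, b)| a * b).sum();
            (i, sim)
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Normalizing once up front keeps the ranking independent of vector magnitude, which matters if the two models do not emit unit-length vectors.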
Candidates Reranking
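use fastembed::{TextRerank, RerankInitOptions, RerankerModel};
// With default options
let mut model = TextRerank::try_new(Default::default())?;
// With custom options
let mut model = TextRerank::try_new(
    RerankInitOptions::new(RerankerModel::BGERerankerBase).with_show_download_progress(true),
)?;
let documents = vec![
    "hi",
    "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.",
    "panda is animal",
    "i dont know",
    "kind of mammal",
];
// Rerank with the default batch size, 256, and return document contents
let results = model.rerank("what is panda?", documents, true, None)?;
println!("Rerank result: {:?}", results);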
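After reranking, callers often keep only the best few candidates. The sketch below assumes each result carries the original document index and a relevance score; the `Scored` struct is a hypothetical stand-in defined for this example, not the crate's result type.

```rust
/// Hypothetical stand-in for one rerank result (assumed shape).
#[derive(Debug, Clone, PartialEq)]
struct Scored {
    index: usize, // position of the document in the original input
    score: f32,   // relevance score, higher is better
}

/// Sort results by descending score and keep at most `k` of them.
fn top_k(mut results: Vec<Scored>, k: usize) -> Vec<Scored> {
    results.sort_by(|a, b| b.score.total_cmp(&a.score));
    results.truncate(k);
    results
}
```

Sorting explicitly is a defensive choice here: it makes the helper correct regardless of the order in which results arrive.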
Alternatively, local model files can be used for inference via the try_new_from_user_defined(...) methods of the respective structs.
DirectML (Windows)
To run models on a GPU via DirectML on Windows, enable the directml feature:
[dependencies]
fastembed = { version = "5", features = ["directml"] }
Then pass a DirectML execution provider when initializing a model:
use ;
use DirectML;
let model = try_new?;
When DirectML is detected, fastembed automatically disables memory pattern optimization and parallel execution on the ONNX Runtime session, as required by the DirectML execution provider.