Cognee-Embedding
Multi-provider text embedding engine for Cognee-Rust. Supports local ONNX
inference (BGE-Small-v1.5) plus OpenAI-compatible and Ollama HTTP backends,
selected at runtime via EmbeddingConfig.
Providers
Selected via EmbeddingProvider (or the EMBEDDING_PROVIDER env var):
- OnnxEmbeddingEngine (onnx feature): local ONNX Runtime inference via ort, with HuggingFace tokenizers; auto-downloads models from HuggingFace Hub
- OpenAICompatibleEmbeddingEngine: OpenAI/Azure/vLLM/llama.cpp/TEI via HTTP (retry + input sanitization)
- OllamaEmbeddingEngine: Ollama /api/embed endpoint
- MockEmbeddingEngine: zero vectors for testing (MOCK_EMBEDDING=true)
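The OpenAI-compatible engine is listed as retrying failed HTTP calls. The sketch below shows a generic retry loop with exponential backoff using only the standard library. The helper name `retry_with_backoff`, the attempt count, and the doubling delay are assumptions for illustration, not the engine's documented retry policy.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: calls `op` up to `max_attempts` times, doubling the
/// delay after each failure. Returns the first `Ok`, or the last `Err` once
/// all attempts are used. The crate's real policy is not documented here.
fn retry_with_backoff<T, E, F>(max_attempts: u32, base_delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise wait, double the delay, and try again.
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

Because every `Err` is retried here, a real client would also need a way to fail fast on errors that retrying cannot fix, such as an invalid API key.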
The default provider is OpenAI text-embedding-3-small (1536-d) on host
platforms and local ONNX on Android (when the onnx feature is enabled).
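Provider selection can be pictured as mapping raw environment values onto an enum and falling back to the platform default described above. In this sketch the enum, the function names, the accepted spellings (`onnx`, `openai`, `ollama`, `mock`), and the rule that `MOCK_EMBEDDING=true` takes precedence over `EMBEDDING_PROVIDER` are all assumptions; the crate's own `EmbeddingProvider` may differ.

```rust
/// Hypothetical mirror of the providers listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    Onnx,
    OpenAiCompatible,
    Ollama,
    Mock,
}

/// Default when no provider is configured: local ONNX on Android builds with
/// the `onnx` feature, the OpenAI-compatible backend everywhere else.
fn default_provider() -> Provider {
    if cfg!(all(target_os = "android", feature = "onnx")) {
        Provider::Onnx
    } else {
        Provider::OpenAiCompatible
    }
}

/// Resolves a provider from the raw values of MOCK_EMBEDDING and
/// EMBEDDING_PROVIDER (passed in so the logic is testable).
/// The accepted spellings are assumptions, not documented values.
fn resolve_provider(mock_embedding: Option<&str>, provider: Option<&str>) -> Result<Provider, String> {
    // Assumed precedence: the mock switch overrides any explicit provider.
    if mock_embedding.map_or(false, |v| v.eq_ignore_ascii_case("true")) {
        return Ok(Provider::Mock);
    }
    // Normalize case and whitespace; treat an empty value as unset.
    let normalized = provider
        .map(|p| p.trim().to_ascii_lowercase())
        .filter(|p| !p.is_empty());
    match normalized.as_deref() {
        None => Ok(default_provider()),
        Some("onnx") => Ok(Provider::Onnx),
        Some("openai") => Ok(Provider::OpenAiCompatible),
        Some("ollama") => Ok(Provider::Ollama),
        Some("mock") => Ok(Provider::Mock),
        Some(other) => Err(format!("unknown EMBEDDING_PROVIDER: {other}")),
    }
}
```

`cfg!` is evaluated at compile time, so in this sketch a host build always falls back to the OpenAI-compatible provider, while an Android build with `onnx` enabled falls back to local inference.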
Features
- ONNX Runtime: Efficient local inference via the ort crate (behind the onnx feature)
- HuggingFace Tokenizers: BPE/WordPiece tokenization that matches Python fastembed
- Batch Processing: Process multiple texts in single inference call
- L2 Normalization: Unit vectors for cosine similarity
- Async API: Non-blocking via spawn_blocking
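L2 normalization divides each vector by its Euclidean length, which is what lets cosine similarity be computed as a plain dot product. A minimal sketch; the zero-vector guard is an assumption, since the crate's behaviour for an all-zero input is not documented here.

```rust
/// Scales `v` in place to unit L2 norm. All-zero vectors are left unchanged
/// (an assumed guard: dividing by a zero norm would produce NaNs).
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// For unit-length vectors, cosine similarity is just the dot product.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Because every output vector is unit length, a vector store can rank results by dot product without recomputing norms for each comparison.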
Quick Start
From environment (Recommended)
EmbeddingConfig::from_env() reads the same env vars as the Python SDK and
create_engine() returns the appropriate provider as Arc<dyn EmbeddingEngine>:
use EmbeddingConfig;
async
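The pattern behind `create_engine()` is a factory that returns a shared trait object, so callers hold an `Arc<dyn EmbeddingEngine>` without knowing which provider backs it. The sketch below is simplified and synchronous: the trait methods `embed` and `dimensions`, the `Config` struct, and `MockEngine` are hypothetical stand-ins (the real trait is async), and the mock returns zero vectors as `MockEmbeddingEngine` is documented to do.

```rust
use std::sync::Arc;

/// Simplified, synchronous stand-in for the crate's async EmbeddingEngine
/// trait. Method names here are hypothetical.
trait EmbeddingEngine: Send + Sync {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>>;
    fn dimensions(&self) -> usize;
}

/// Mirrors the documented mock behaviour: one zero vector per input text.
struct MockEngine {
    dims: usize,
}

impl EmbeddingEngine for MockEngine {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>> {
        texts.iter().map(|_| vec![0.0; self.dims]).collect()
    }
    fn dimensions(&self) -> usize {
        self.dims
    }
}

/// Hypothetical config holding only the fields this sketch needs.
struct Config {
    mock: bool,
    dimensions: usize,
}

/// Factory returning a trait object, in the spirit of create_engine().
fn create_engine(config: &Config) -> Result<Arc<dyn EmbeddingEngine>, String> {
    if config.mock {
        Ok(Arc::new(MockEngine { dims: config.dimensions }))
    } else {
        Err("only the mock engine is implemented in this sketch".to_string())
    }
}
```

Returning `Arc` rather than `Box` lets the same engine be cloned cheaply into several tasks that share one underlying model or HTTP client.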
Local ONNX with automatic download
With the onnx feature, OnnxEmbeddingEngine auto-downloads the model and
tokenizer from HuggingFace Hub if not found locally. It is configured with an
OnnxEmbeddingConfig:
use ;
async
Manual model placement (Advanced)
If you prefer to download models manually, use the synchronous constructor
OnnxEmbeddingEngine::new(config) instead of with_auto_download. It expects
the files referenced by the config to already exist:
- Model: ./target/models/BGE-Small-v1.5-model_quantized.onnx
- Tokenizer: ./target/models/bge-small-tokenizer.json
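Because `new(config)` expects both files to be present, a pre-flight check can turn a missing model or tokenizer into one clear error that lists every absent file. The helper `missing_files` and the constant `BGE_SMALL_FILES` are hypothetical; only the two file names come from the paths above.

```rust
use std::path::{Path, PathBuf};

/// Documented default file names for BGE-Small-v1.5.
const BGE_SMALL_FILES: [&str; 2] = [
    "BGE-Small-v1.5-model_quantized.onnx",
    "bge-small-tokenizer.json",
];

/// Hypothetical pre-flight check: returns the full paths of every file in
/// `required` that is not a regular file under `model_dir`.
fn missing_files(model_dir: &Path, required: &[&str]) -> Vec<PathBuf> {
    required
        .iter()
        .map(|name| model_dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}
```

With both files placed in `./target/models`, `missing_files(Path::new("./target/models"), &BGE_SMALL_FILES)` returns an empty `Vec`.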
Models Supported
BGE-Small-v1.5 (default)
- Model: BAAI/bge-small-en-v1.5
- Dimensions: 384
- Size: ~90MB (quantized)
- Tokenizer: BAAI/bge-small-en-v1.5
- Max sequence: 512 tokens
let config = bge_small;
all-MiniLM-L6-v2
- Model: sentence-transformers/all-MiniLM-L6-v2
- Dimensions: 384
- Size: ~22MB (quantized)
- Tokenizer: sentence-transformers/all-MiniLM-L6-v2
- Max sequence: 256 tokens
let config = minilm_l6;
Download API
For advanced use cases, you can use the download utilities directly:
use ;
use Path;
async
Supported model names: "bge-small-en-v1.5", "all-MiniLM-L6-v2"
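The per-model properties above can be kept in one lookup keyed by the supported model names, so dimensions and sequence limits are not hard-coded in several places. `ModelSpec` and `model_spec` are illustrative names, not part of the crate's download API; the values are the ones listed under Models Supported.

```rust
/// Illustrative record of the properties listed for each supported model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ModelSpec {
    hf_repo: &'static str,
    dimensions: usize,
    max_tokens: usize,
}

/// Hypothetical lookup by the supported model names (case-sensitive).
fn model_spec(name: &str) -> Option<ModelSpec> {
    match name {
        "bge-small-en-v1.5" => Some(ModelSpec {
            hf_repo: "BAAI/bge-small-en-v1.5",
            dimensions: 384,
            max_tokens: 512,
        }),
        "all-MiniLM-L6-v2" => Some(ModelSpec {
            hf_repo: "sentence-transformers/all-MiniLM-L6-v2",
            dimensions: 384,
            max_tokens: 256,
        }),
        _ => None,
    }
}
```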
Running Examples
# Basic usage example (downloads the BGE-Small model on first run)
Running Tests
# Unit tests (no model required)
# Integration tests (requires model + tokenizer)
API Reference
EmbeddingEngine Trait
Configuration
EmbeddingConfig is the provider-agnostic top-level config (use
EmbeddingConfig::from_env() or EmbeddingConfig::default()):
OnnxEmbeddingConfig (behind the onnx feature) holds the ONNX-only fields:
Environment variables
EmbeddingConfig::from_env() reads (Python-SDK-compatible names):
EMBEDDING_PROVIDER, MOCK_EMBEDDING, EMBEDDING_MODEL, EMBEDDING_DIMENSIONS,
EMBEDDING_ENDPOINT, EMBEDDING_API_KEY (fallback LLM_API_KEY),
EMBEDDING_API_VERSION, EMBEDDING_MAX_COMPLETION_TOKENS, EMBEDDING_BATCH_SIZE,
HUGGINGFACE_TOKENIZER.
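The API-key fallback (`EMBEDDING_API_KEY`, then `LLM_API_KEY`) and the parsing of numeric variables can be sketched with the variable lookup passed in as a closure, which keeps the function testable without mutating the process environment. `EnvConfig` and `from_lookup` are hypothetical and cover only a subset of the variables listed above.

```rust
/// Hypothetical subset of the values EmbeddingConfig::from_env() reads.
#[derive(Debug, PartialEq)]
struct EnvConfig {
    model: Option<String>,
    dimensions: Option<usize>,
    api_key: Option<String>,
    batch_size: Option<usize>,
}

/// Builds the config from any key -> value lookup.
fn from_lookup<F>(get: F) -> Result<EnvConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Parse an optional numeric variable, naming the variable on failure.
    let parse_usize = |key: &str| -> Result<Option<usize>, String> {
        get(key)
            .map(|v| v.trim().parse::<usize>().map_err(|e| format!("{key}: {e}")))
            .transpose()
    };
    Ok(EnvConfig {
        model: get("EMBEDDING_MODEL"),
        dimensions: parse_usize("EMBEDDING_DIMENSIONS")?,
        // Documented fallback order: EMBEDDING_API_KEY, then LLM_API_KEY.
        api_key: get("EMBEDDING_API_KEY").or_else(|| get("LLM_API_KEY")),
        batch_size: parse_usize("EMBEDDING_BATCH_SIZE")?,
    })
}
```

In an application the lookup would read the live environment: `from_lookup(|k| std::env::var(k).ok())`.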
Architecture
The implementation follows these key patterns:
- HuggingFace Tokenization: Uses the tokenizers crate to load tokenizer.json files, so token IDs match Python fastembed exactly
- ONNX Inference: Runs the model via the ort crate with Level3 graph optimization
- Mean Pooling: Averages token embeddings over the sequence dimension, using the attention mask so padding tokens are ignored
- L2 Normalization: All output vectors are normalized to unit length
- Async Wrapper: Synchronous ONNX calls are wrapped in tokio::task::spawn_blocking
- Thread Safety: The session and tokenizer are wrapped in Arc<Mutex<T>>
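Mean pooling with an attention mask can be sketched over a flat, row-major slice of `seq_len * hidden` values for a single input. The function name and the flat layout are assumptions about how the ONNX output might be read; the pooled vector would then go through L2 normalization.

```rust
/// Masked mean pooling for one input.
/// `token_embeddings`: row-major [seq_len][hidden] values.
/// `attention_mask`: one entry per token, 1 for real tokens, 0 for padding.
fn mean_pool(token_embeddings: &[f32], attention_mask: &[i64], hidden: usize) -> Vec<f32> {
    assert_eq!(token_embeddings.len(), attention_mask.len() * hidden, "shape mismatch");
    let mut pooled = vec![0.0f32; hidden];
    let mut count = 0.0f32;
    for (row, &mask) in token_embeddings.chunks_exact(hidden).zip(attention_mask) {
        // Padding tokens contribute to neither the sum nor the count.
        if mask != 0 {
            for (acc, &x) in pooled.iter_mut().zip(row) {
                *acc += x;
            }
            count += 1.0;
        }
    }
    // Clamp the denominator (as sentence-transformers does with 1e-9) so an
    // all-padding input yields zeros instead of NaNs.
    let denom = count.max(1e-9);
    pooled.iter_mut().for_each(|x| *x /= denom);
    pooled
}
```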
Python Parity
This implementation matches Python's FastembedEmbeddingEngine by:
- Using the same HuggingFace tokenizers (exact token IDs)
- Same ONNX models from HuggingFace Hub
- Same pooling and normalization strategies
- Results should agree to within a cosine distance of 0.01 (remaining differences come from floating-point arithmetic)
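The 0.01 threshold can be checked directly by comparing a Rust embedding with one produced by the Python SDK for the same input text. `cosine_distance` and `matches_python` are hypothetical helpers; how the reference vectors are exported from Python is left open.

```rust
/// Cosine distance (1 - cosine similarity) between two embeddings.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

/// Parity check against a reference vector from the Python SDK,
/// using the 0.01 cosine-distance threshold stated above.
fn matches_python(rust: &[f32], python: &[f32]) -> bool {
    cosine_distance(rust, python) < 0.01
}
```

Zero-length vectors make the distance `NaN`, so `matches_python` returns `false` for them instead of passing silently.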
Troubleshooting
"Model file not found"
Download the model first:
"Failed to load tokenizer" / "Tokenizer.json not found"
Use OnnxEmbeddingEngine::with_auto_download(...) (or the example above) to
fetch the model and tokenizer from HuggingFace Hub automatically. If you place
files manually, the tokenizer must be at:
./target/models/bge-small-tokenizer.json
License
Dual-licensed under MIT or Apache-2.0, at your option.