# Dakera Inference Engine
Embedded inference engine for generating vector embeddings locally without external API calls. This crate provides:
- **Local Embedding Generation**: Generate embeddings using state-of-the-art transformer models running locally on CPU or GPU.
- **Multiple Model Support**: Choose from MiniLM (fast), BGE (balanced), or E5 (quality).
- **Batch Processing**: Automatic batching and parallelization for high-throughput workloads.
- **Zero External Dependencies**: No OpenAI, Cohere, or other API keys required.
## Quick Start
```rust
use inference::{EmbeddingEngine, ModelConfig, EmbeddingModel};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an engine with the default model (MiniLM)
    let engine = EmbeddingEngine::new(ModelConfig::default()).await?;

    // Embed a query
    let query_embedding = engine.embed_query("What is machine learning?").await?;
    println!("Query embedding: {} dimensions", query_embedding.len());

    // Embed documents
    let docs = vec![
        "Machine learning is a type of artificial intelligence.".to_string(),
        "Deep learning uses neural networks with many layers.".to_string(),
    ];
    let doc_embeddings = engine.embed_documents(&docs).await?;
    println!("Generated {} document embeddings", doc_embeddings.len());

    Ok(())
}
```
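Since the engine returns L2-normalized vectors (see the Architecture section), cosine similarity between a query and a document reduces to a plain dot product. The sketch below ranks documents against a query using toy 4-dimensional vectors as stand-ins for the engine's 384-dimensional output:

```rust
// Embeddings come back L2-normalized, so cosine similarity reduces to a
// plain dot product. Toy 4-dim unit vectors stand in for real 384-dim
// engine output here.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let query = vec![0.6, 0.8, 0.0, 0.0];
    let docs = vec![
        vec![0.8, 0.6, 0.0, 0.0], // points in a similar direction to the query
        vec![0.0, 0.0, 1.0, 0.0], // orthogonal to the query
    ];

    // Rank documents by similarity to the query, highest first.
    let mut ranked: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, dot(&query, d)))
        .collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    println!("best match: doc {} (score {:.2})", ranked[0].0, ranked[0].1);
}
```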
## Model Selection
Choose the right model for your use case:
| Model | Speed | Quality | Use Case |
|---|---|---|---|
| MiniLM | ⚡⚡⚡ | ⭐⭐ | High-throughput, real-time |
| BGE-small | ⚡⚡ | ⭐⭐⭐ | Balanced performance |
| E5-small | ⚡⚡ | ⭐⭐⭐ | Best quality for retrieval |
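One practical difference between these models: the E5 family expects role prefixes ("query: " / "passage: ") on its input text, which is presumably what the "prefixes" step inside the engine's BatchProcessor handles. The helper below illustrates the convention; it is not the crate's actual API:

```rust
// E5-family models distinguish queries from passages via text prefixes.
// Illustrative helper only -- the engine's BatchProcessor presumably
// applies these internally.
fn e5_prefix(text: &str, is_query: bool) -> String {
    if is_query {
        format!("query: {text}")
    } else {
        format!("passage: {text}")
    }
}

fn main() {
    println!("{}", e5_prefix("What is machine learning?", true));
    println!("{}", e5_prefix("Machine learning is a type of AI.", false));
}
```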
## GPU Acceleration
Enable GPU acceleration by building with the appropriate feature:
```toml
# For NVIDIA GPUs
inference = { path = "crates/inference", features = ["cuda"] }

# For Apple Silicon
inference = { path = "crates/inference", features = ["metal"] }
```
## Architecture
```text
┌─────────────────────────────────────────────────────────────┐
│                       EmbeddingEngine                       │
│  ┌─────────────┐  ┌───────────────┐  ┌──────────────────┐   │
│  │ ModelConfig │  │ BatchProcessor│  │   ort::Session   │   │
│  │ - model     │  │ - tokenizer   │  │ (ONNX Runtime)   │   │
│  │ - threads   │  │ - batching    │  │ - BERT INT8      │   │
│  │ - batch_sz  │  │ - prefixes    │  │ - mean_pool()    │   │
│  └─────────────┘  └───────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
               ┌───────────────────────────────┐
               │      Vec<f32> Embeddings      │
               │    (normalized, 384 dims)     │
               └───────────────────────────────┘
```
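The `mean_pool()` and normalization steps named in the diagram follow the usual sentence-transformers recipe: mask-aware mean pooling over the model's token embeddings, then L2 normalization. A self-contained sketch with toy sizes (3 tokens, 4 dims standing in for seq_len × 384); the crate's internals may differ in detail:

```rust
// Mask-aware mean pooling: average only the token embeddings whose
// attention-mask entry is 1, so padding tokens don't distort the result.
fn mean_pool(token_embeddings: &[Vec<f32>], attention_mask: &[u32]) -> Vec<f32> {
    let dims = token_embeddings[0].len();
    let mut sum = vec![0.0f32; dims];
    let mut count = 0.0f32;
    for (tok, &m) in token_embeddings.iter().zip(attention_mask) {
        if m == 1 {
            for (s, v) in sum.iter_mut().zip(tok) {
                *s += v;
            }
            count += 1.0;
        }
    }
    sum.iter().map(|s| s / count).collect()
}

// L2 normalization: scale the vector to unit length so cosine similarity
// becomes a dot product.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

fn main() {
    let tokens = vec![
        vec![1.0, 0.0, 2.0, 0.0],
        vec![3.0, 0.0, 0.0, 0.0],
        vec![9.0, 9.0, 9.0, 9.0], // padding token, masked out below
    ];
    let mask = [1, 1, 0];
    let pooled = mean_pool(&tokens, &mask); // [2.0, 0.0, 1.0, 0.0]
    let embedding = l2_normalize(&pooled);
    let norm: f32 = embedding.iter().map(|x| x * x).sum::<f32>().sqrt();
    println!("embedding = {embedding:?}, norm = {norm}");
}
```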