§Dakera Inference Engine
Embedded inference engine for generating vector embeddings locally without external API calls. This crate provides:
- Local Embedding Generation: Generate embeddings using state-of-the-art transformer models running locally on CPU or GPU.
- Multiple Model Support: Choose from MiniLM (fast), BGE (balanced), or E5 (quality).
- Batch Processing: Process large document sets efficiently with automatic batching and parallelization.
- Zero External Dependencies: No OpenAI, Cohere, or other API keys required.
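The batching behavior above can be pictured with a small sketch: documents are split into fixed-size batches before being fed to the model. This is illustrative only; the `into_batches` helper and `batch_size` parameter are assumptions for the sketch, not the crate's actual internals:

```rust
// Illustrative sketch of automatic batching: split a document list into
// fixed-size batches, as a batch processor might before running the model.
// The function name and loop shape are assumptions, not the crate's API.
fn into_batches(docs: &[String], batch_size: usize) -> Vec<Vec<String>> {
    docs.chunks(batch_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}

fn main() {
    let docs: Vec<String> = (0..5).map(|i| format!("doc {i}")).collect();
    let batches = into_batches(&docs, 2);
    // 5 documents with batch_size 2 -> batch sizes [2, 2, 1]
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[2], vec!["doc 4".to_string()]);
    println!("{} batches", batches.len());
}
```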
§Quick Start
```rust
use inference::{EmbeddingEngine, ModelConfig, EmbeddingModel};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create engine with the default model (MiniLM)
    let engine = EmbeddingEngine::new(ModelConfig::default()).await?;

    // Embed a query
    let query_embedding = engine.embed_query("What is machine learning?").await?;
    println!("Query embedding: {} dimensions", query_embedding.len());

    // Embed documents
    let docs = vec![
        "Machine learning is a type of artificial intelligence.".to_string(),
        "Deep learning uses neural networks with many layers.".to_string(),
    ];
    let doc_embeddings = engine.embed_documents(&docs).await?;
    println!("Generated {} document embeddings", doc_embeddings.len());

    Ok(())
}
```

§Model Selection
Choose the right model for your use case:
| Model | Speed | Quality | Use Case |
|---|---|---|---|
| MiniLM | ⚡⚡⚡ | ⭐⭐ | High-throughput, real-time |
| BGE-small | ⚡⚡ | ⭐⭐⭐ | Balanced performance |
| E5-small | ⚡⚡ | ⭐⭐⭐ | Best quality for retrieval |
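The table can be read as a rough decision rule: default to MiniLM when latency dominates, E5-small when retrieval quality dominates, BGE-small otherwise. A self-contained sketch of that rule; the enum and helper below are hypothetical and only mirror the table, not the crate's actual `EmbeddingModel` API:

```rust
// Hypothetical sketch of picking a model from the table above.
// Variant names mirror the table rows, not necessarily the crate's enum.
#[derive(Debug, PartialEq)]
enum Model {
    MiniLm,   // fastest, lower quality: high-throughput / real-time
    BgeSmall, // balanced speed and quality
    E5Small,  // best retrieval quality at moderate speed
}

fn pick_model(need_realtime: bool, need_best_retrieval: bool) -> Model {
    if need_realtime {
        Model::MiniLm
    } else if need_best_retrieval {
        Model::E5Small
    } else {
        Model::BgeSmall
    }
}

fn main() {
    assert_eq!(pick_model(true, false), Model::MiniLm);
    assert_eq!(pick_model(false, true), Model::E5Small);
    assert_eq!(pick_model(false, false), Model::BgeSmall);
    println!("ok");
}
```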
§GPU Acceleration
Enable GPU acceleration by building with the appropriate feature:

```toml
# For NVIDIA GPUs
inference = { path = "crates/inference", features = ["cuda"] }

# For Apple Silicon
inference = { path = "crates/inference", features = ["metal"] }
```

§Architecture
```text
┌─────────────────────────────────────────────────────────────┐
│                      EmbeddingEngine                        │
│  ┌─────────────┐  ┌───────────────┐  ┌──────────────────┐   │
│  │ ModelConfig │  │ BatchProcessor│  │    BertModel     │   │
│  │ - model     │  │ - tokenizer   │  │    (Candle)      │   │
│  │ - device    │  │ - batching    │  │  - forward()     │   │
│  │ - batch_sz  │  │ - prefixes    │  │  - mean_pool()   │   │
│  └─────────────┘  └───────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
              ┌───────────────────────────────┐
              │      Vec<f32> Embeddings      │
              │    (normalized, 384 dims)     │
              └───────────────────────────────┘
```

§Re-exports
pub use engine::EmbeddingEngine;
pub use engine::EmbeddingEngineBuilder;
pub use error::InferenceError;
pub use error::Result;
pub use models::EmbeddingModel;
pub use models::ModelConfig;