§Dakera Inference Engine
Embedded inference engine for generating vector embeddings locally without external API calls. This crate provides:
- Local Embedding Generation: Generate embeddings using state-of-the-art transformer models running locally on CPU or GPU.
- Multiple Model Support: Choose from MiniLM (fast), BGE (balanced), or E5 (best retrieval quality).
- Batch Processing: Embed many documents in a single call, with automatic batching and parallelization.
- Zero External Dependencies: No OpenAI, Cohere, or other API keys required.
§Quick Start
```rust
use inference::{EmbeddingEngine, ModelConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create engine with the default model (MiniLM)
    let engine = EmbeddingEngine::new(ModelConfig::default()).await?;

    // Embed a query
    let query_embedding = engine.embed_query("What is machine learning?").await?;
    println!("Query embedding: {} dimensions", query_embedding.len());

    // Embed documents
    let docs = vec![
        "Machine learning is a type of artificial intelligence.".to_string(),
        "Deep learning uses neural networks with many layers.".to_string(),
    ];
    let doc_embeddings = engine.embed_documents(&docs).await?;
    println!("Generated {} document embeddings", doc_embeddings.len());

    Ok(())
}
```

§Model Selection
Choose the right model for your use case:
| Model | Speed | Quality | Use Case |
|---|---|---|---|
| MiniLM | ⚡⚡⚡ | ⭐⭐ | High-throughput, real-time |
| BGE-small | ⚡⚡ | ⭐⭐⭐ | Balanced performance |
| E5-small | ⚡⚡ | ⭐⭐⭐ | Best quality for retrieval |
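For example, to favor retrieval quality over raw speed you might pick BGE instead of the default MiniLM. A minimal sketch; the `ModelConfig::new` constructor and the `EmbeddingModel::BgeSmall` variant name are assumptions for illustration, not confirmed API:

```rust
use inference::{EmbeddingEngine, EmbeddingModel, ModelConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical: the constructor and variant names here are illustrative.
    let config = ModelConfig::new(EmbeddingModel::BgeSmall);
    let engine = EmbeddingEngine::new(config).await?;

    let embedding = engine.embed_query("vector databases").await?;
    println!("BGE embedding: {} dimensions", embedding.len());
    Ok(())
}
```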
§GPU Acceleration
Enable GPU acceleration by building with the appropriate feature:
```toml
# For NVIDIA GPUs
inference = { path = "crates/inference", features = ["cuda"] }

# For Apple Silicon
inference = { path = "crates/inference", features = ["metal"] }
```
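At runtime you can branch on which acceleration feature a build was compiled with. A minimal sketch using Rust's built-in `cfg!` macro against the feature names above; how the crate actually selects an execution provider internally is not shown here:

```rust
/// Report which acceleration backend this build was compiled with.
/// Purely illustrative: it only inspects Cargo features via `cfg!`.
fn active_backend() -> &'static str {
    if cfg!(feature = "cuda") {
        "CUDA (NVIDIA GPU)"
    } else if cfg!(feature = "metal") {
        "Metal (Apple Silicon)"
    } else {
        "CPU"
    }
}

fn main() {
    println!("Inference backend: {}", active_backend());
}
```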
§Architecture

```text
┌─────────────────────────────────────────────────────────────┐
│                       EmbeddingEngine                       │
│  ┌─────────────┐  ┌────────────────┐  ┌──────────────────┐  │
│  │ ModelConfig │  │ BatchProcessor │  │   ort::Session   │  │
│  │ - model     │  │ - tokenizer    │  │  (ONNX Runtime)  │  │
│  │ - threads   │  │ - batching     │  │  - BERT INT8     │  │
│  │ - batch_sz  │  │ - prefixes     │  │  - mean_pool()   │  │
│  └─────────────┘  └────────────────┘  └──────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
               ┌───────────────────────────────┐
               │      Vec<f32> Embeddings      │
               │    (normalized, 384 dims)     │
               └───────────────────────────────┘
```
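The `mean_pool()` step in the diagram is the standard way to collapse per-token transformer outputs into a single sentence vector: average the token embeddings, then L2-normalize so that dot product equals cosine similarity. A self-contained sketch of that math (illustrative only, not the crate's internal code; real implementations typically also weight by the attention mask to ignore padding tokens):

```rust
/// Mean-pool token embeddings into one sentence embedding, then L2-normalize.
/// `token_embeddings` holds one Vec<f32> of length `dim` per input token.
/// Illustrative sketch of the technique, not the crate's actual internals.
fn mean_pool_normalize(token_embeddings: &[Vec<f32>], dim: usize) -> Vec<f32> {
    // Sum each dimension across all tokens.
    let mut pooled = vec![0.0f32; dim];
    for token in token_embeddings {
        for (acc, &v) in pooled.iter_mut().zip(token) {
            *acc += v;
        }
    }
    // Divide by token count to get the mean.
    let n = token_embeddings.len().max(1) as f32;
    for v in &mut pooled {
        *v /= n;
    }
    // L2-normalize so dot product == cosine similarity.
    let norm = pooled.iter().map(|v| v * v).sum::<f32>().sqrt().max(f32::EPSILON);
    for v in &mut pooled {
        *v /= norm;
    }
    pooled
}
```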
Re-exports§

pub use engine::EmbeddingEngine;
pub use engine::EmbeddingEngineBuilder;
pub use error::InferenceError;
pub use error::Result;
pub use extraction::build_provider;
pub use extraction::ExtractionOpts;
pub use extraction::ExtractionProvider;
pub use extraction::ExtractionResult;
pub use extraction::ExtractorConfig;
pub use models::EmbeddingModel;
pub use models::ModelConfig;
pub use ner::rule_based_extract;
pub use ner::ExtractedEntity;
pub use ner::GlinerEngine;
pub use ner::NerEngine;
Modules§
- `batch`: Batch processing utilities for efficient embedding generation.
- `engine`: Core embedding engine for generating vector embeddings from text.
- `error`: Error types for the inference engine.
- `extraction`: EXT-1 — External Extraction Providers.
- `models`: Model configurations for supported embedding models.
- `ner`: Named Entity Recognition (NER) engine — CE-4 GLiNER zero-shot NER.
- `prelude`: Prelude module for convenient imports (see the sketch below).
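Since the crate re-exports its main types at the root, most programs need only one import via the prelude. A minimal sketch; it assumes `prelude` re-exports at least `EmbeddingEngine` and `ModelConfig`, which this page does not spell out:

```rust
// Pull in the commonly used types in one line.
// Assumption: the prelude re-exports at least EmbeddingEngine and ModelConfig.
use inference::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = EmbeddingEngine::new(ModelConfig::default()).await?;
    let embedding = engine.embed_query("hello world").await?;
    println!("{} dims", embedding.len());
    Ok(())
}
```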