
Crate inference


§Dakera Inference Engine

Embedded inference engine for generating vector embeddings locally without external API calls. This crate provides:

  • Local Embedding Generation: Generate embeddings using state-of-the-art transformer models running locally on CPU or GPU.
  • Multiple Model Support: Choose from MiniLM (fast), BGE (balanced), or E5 (quality).
  • Batch Processing: Automatic batching and parallelization for efficient high-throughput embedding generation.
  • No External Services: No OpenAI, Cohere, or other API keys required; everything runs locally.

§Quick Start

use inference::{EmbeddingEngine, ModelConfig, EmbeddingModel};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create engine with default model (MiniLM)
    let engine = EmbeddingEngine::new(ModelConfig::default()).await?;

    // Embed a query
    let query_embedding = engine.embed_query("What is machine learning?").await?;
    println!("Query embedding: {} dimensions", query_embedding.len());

    // Embed documents
    let docs = vec![
        "Machine learning is a type of artificial intelligence.".to_string(),
        "Deep learning uses neural networks with many layers.".to_string(),
    ];
    let doc_embeddings = engine.embed_documents(&docs).await?;
    println!("Generated {} document embeddings", doc_embeddings.len());

    Ok(())
}
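Internally, `embed_documents` presumably splits its input into fixed-size batches before running the model (see the Batch Processing bullet above). A minimal sketch of that splitting step in plain Rust — the `make_batches` helper and the batch size are illustrative stand-ins, not this crate's actual API:

```rust
/// Split documents into fixed-size batches for embedding.
/// The batch size here is illustrative; the real engine would read it
/// from its configuration (e.g. the `batch_sz` field shown in the
/// architecture diagram below).
fn make_batches(docs: &[String], batch_size: usize) -> Vec<Vec<String>> {
    docs.chunks(batch_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}

fn main() {
    let docs: Vec<String> = (0..10).map(|i| format!("doc {i}")).collect();
    let batches = make_batches(&docs, 4);
    // 10 documents at batch size 4 → batches of 4, 4, and 2.
    println!("{} batches", batches.len());
}
```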

§Model Selection

Choose the right model for your use case:

| Model     | Speed | Quality | Use Case                   |
|-----------|-------|---------|----------------------------|
| MiniLM    | ⚡⚡⚡   | ⭐⭐     | High-throughput, real-time |
| BGE-small | ⚡⚡    | ⭐⭐⭐    | Balanced performance       |
| E5-small  | ⚡⚡    | ⭐⭐⭐    | Best quality for retrieval |
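Whichever model you choose, the output vectors are L2-normalized (see the architecture diagram below), so retrieval comparisons reduce to cosine similarity — which for normalized vectors is just a dot product. A self-contained sketch of that comparison (not part of this crate's API):

```rust
/// Cosine similarity between two vectors. For L2-normalized embeddings
/// this is equivalent to a plain dot product.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    let a = [1.0_f32, 0.0, 0.0];
    let b = [0.0_f32, 1.0, 0.0];
    // Identical vectors score 1.0; orthogonal vectors score 0.0.
    println!("a·a = {}", cosine_similarity(&a, &a));
    println!("a·b = {}", cosine_similarity(&a, &b));
}
```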

§GPU Acceleration

Enable GPU acceleration by enabling the matching Cargo feature on the dependency in your Cargo.toml:

# For NVIDIA GPUs
inference = { path = "crates/inference", features = ["cuda"] }

# For Apple Silicon
inference = { path = "crates/inference", features = ["metal"] }
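Feature-gated device selection of this kind typically resolves at build time via `cfg!`. A hedged sketch of the pattern — the `Device` enum and `select_device` function below are illustrative stand-ins, not this crate's actual types (the real engine delegates to Candle's device handling):

```rust
/// Illustrative device enum; the real crate uses Candle's device types.
#[derive(Debug, PartialEq)]
enum Device {
    Cpu,
    Cuda,
    Metal,
}

/// Pick a device based on which Cargo feature was enabled at build time.
/// With neither `cuda` nor `metal` enabled, this falls back to CPU.
fn select_device() -> Device {
    if cfg!(feature = "cuda") {
        Device::Cuda
    } else if cfg!(feature = "metal") {
        Device::Metal
    } else {
        Device::Cpu
    }
}

fn main() {
    println!("selected device: {:?}", select_device());
}
```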

§Architecture

┌─────────────────────────────────────────────────────────────┐
│                    EmbeddingEngine                          │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ ModelConfig │  │BatchProcessor│  │   BertModel      │   │
│  │ - model     │  │ - tokenizer  │  │ (Candle)         │   │
│  │ - device    │  │ - batching   │  │ - forward()      │   │
│  │ - batch_sz  │  │ - prefixes   │  │ - mean_pool()    │   │
│  └─────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                             │
                             ▼
             ┌───────────────────────────────┐
             │      Vec<f32> Embeddings      │
             │   (normalized, 384 dims)      │
             └───────────────────────────────┘
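The `mean_pool()` and normalization steps in the diagram can be sketched in plain Rust. This is a simplified stand-in for what the Candle-backed model actually does (real mean pooling also weights by the attention mask to skip padding tokens, which is omitted here):

```rust
/// Mean-pool per-token embeddings (one Vec<f32> per token) into a single
/// sentence vector. Simplified: no attention-mask weighting.
fn mean_pool(token_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let dims = token_embeddings[0].len();
    let n = token_embeddings.len() as f32;
    let mut pooled = vec![0.0_f32; dims];
    for token in token_embeddings {
        for (p, v) in pooled.iter_mut().zip(token) {
            *p += v / n;
        }
    }
    pooled
}

/// L2-normalize in place, so dot products become cosine similarities.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v {
            *x /= norm;
        }
    }
}

fn main() {
    // Two fake 2-dim token embeddings; mean is [2.0, 2.0].
    let tokens = vec![vec![1.0, 0.0], vec![3.0, 4.0]];
    let mut embedding = mean_pool(&tokens);
    l2_normalize(&mut embedding);
    let norm: f32 = embedding.iter().map(|x| x * x).sum::<f32>().sqrt();
    println!("norm after normalization = {norm}"); // ≈ 1.0
}
```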

Re-exports§

pub use engine::EmbeddingEngine;
pub use engine::EmbeddingEngineBuilder;
pub use error::InferenceError;
pub use error::Result;
pub use models::EmbeddingModel;
pub use models::ModelConfig;

Modules§

batch
Batch processing utilities for efficient embedding generation.
engine
Core embedding engine for generating vector embeddings from text.
error
Error types for the inference engine.
models
Model configurations for supported embedding models.
prelude
Prelude module for convenient imports.