
Crate inference


§Dakera Inference Engine

Embedded inference engine for generating vector embeddings locally without external API calls. This crate provides:

  • Local Embedding Generation: Generate embeddings using state-of-the-art transformer models running locally on CPU or GPU.
  • Multiple Model Support: Choose from MiniLM (fast), BGE (balanced), or E5 (quality).
  • Batch Processing: Efficient batch processing with automatic batching and parallelization.
  • No External Services: No OpenAI, Cohere, or other API keys required; models run entirely on your machine.
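
The batching behavior described above can be sketched with plain slices: `chunks` splits a document list into fixed-size batches the way a batch processor might before running inference. The function name and batch size here are illustrative, not the crate's actual internals:

```rust
/// Illustrative sketch: split documents into fixed-size batches,
/// as a batch processor might before handing each batch to the model.
/// (`into_batches` is a hypothetical helper, not part of this crate's API.)
fn into_batches(docs: &[String], batch_size: usize) -> Vec<&[String]> {
    docs.chunks(batch_size).collect()
}

fn main() {
    let docs: Vec<String> = (0..5).map(|i| format!("doc {i}")).collect();
    let batches = into_batches(&docs, 2);
    // 5 documents with batch size 2 -> batches of 2, 2, 1
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[2].len(), 1);
    println!("{} batches", batches.len());
}
```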

§Quick Start

use inference::{EmbeddingEngine, ModelConfig, EmbeddingModel};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create engine with default model (MiniLM)
    let engine = EmbeddingEngine::new(ModelConfig::default()).await?;

    // Embed a query
    let query_embedding = engine.embed_query("What is machine learning?").await?;
    println!("Query embedding: {} dimensions", query_embedding.len());

    // Embed documents
    let docs = vec![
        "Machine learning is a type of artificial intelligence.".to_string(),
        "Deep learning uses neural networks with many layers.".to_string(),
    ];
    let doc_embeddings = engine.embed_documents(&docs).await?;
    println!("Generated {} document embeddings", doc_embeddings.len());

    Ok(())
}

§Model Selection

Choose the right model for your use case:

| Model     | Speed | Quality | Use Case                   |
|-----------|-------|---------|----------------------------|
| MiniLM    | ⚡⚡⚡   | ⭐⭐      | High-throughput, real-time |
| BGE-small | ⚡⚡    | ⭐⭐⭐     | Balanced performance       |
| E5-small  | ⚡⚡    | ⭐⭐⭐     | Best quality for retrieval |

§GPU Acceleration

Enable GPU acceleration by building with the appropriate Cargo feature in your Cargo.toml:

# For NVIDIA GPUs
inference = { path = "crates/inference", features = ["cuda"] }

# For Apple Silicon
inference = { path = "crates/inference", features = ["metal"] }
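
Code can branch on these Cargo features at compile time. The sketch below shows one way a caller might report which backend is active; the feature names come from the snippet above, but the function itself is illustrative, not the crate's implementation:

```rust
// Sketch: pick a device label based on which Cargo features are enabled.
// The "cuda" and "metal" feature names match the Cargo.toml snippet above;
// the fallback behavior shown here is an assumption, not the crate's code.
fn device_label() -> &'static str {
    if cfg!(feature = "cuda") {
        "cuda"
    } else if cfg!(feature = "metal") {
        "metal"
    } else {
        "cpu"
    }
}

fn main() {
    // With neither GPU feature enabled, inference falls back to CPU.
    println!("inference will run on: {}", device_label());
}
```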

§Architecture

┌─────────────────────────────────────────────────────────────┐
│                    EmbeddingEngine                          │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ ModelConfig │  │ BatchProcessor│  │  ort::Session    │   │
│  │ - model     │  │ - tokenizer  │  │ (ONNX Runtime)   │   │
│  │ - threads   │  │ - batching   │  │ - BERT INT8      │   │
│  │ - batch_sz  │  │ - prefixes   │  │ - mean_pool()    │   │
│  └─────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                             │
                             ▼
             ┌───────────────────────────────┐
             │      Vec<f32> Embeddings      │
             │   (normalized, 384 dims)      │
             └───────────────────────────────┘
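
The `mean_pool()` stage in the diagram can be sketched in a few lines: average the per-token embeddings into one vector, then L2-normalize it so the norm is 1 (which is what lets cosine similarity reduce to a plain dot product). The real engine does this over BERT outputs in 384 dimensions; this sketch uses tiny 3-dimensional vectors for illustration:

```rust
/// Sketch of the pooling stage: mean over token embeddings, then
/// L2 normalization. Function name and shapes are illustrative;
/// the crate's actual implementation operates on ONNX model outputs.
fn mean_pool_normalize(token_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let dims = token_embeddings[0].len();
    let n = token_embeddings.len() as f32;
    // Mean over the token axis.
    let mut pooled = vec![0.0f32; dims];
    for tok in token_embeddings {
        for (p, v) in pooled.iter_mut().zip(tok) {
            *p += v / n;
        }
    }
    // L2-normalize so downstream dot products are cosine similarities.
    let norm = pooled.iter().map(|v| v * v).sum::<f32>().sqrt();
    pooled.iter().map(|v| v / norm).collect()
}

fn main() {
    let tokens = vec![vec![1.0, 0.0, 0.0], vec![0.0, 1.0, 0.0]];
    let emb = mean_pool_normalize(&tokens);
    let norm: f32 = emb.iter().map(|v| v * v).sum::<f32>().sqrt();
    assert!((norm - 1.0).abs() < 1e-6);
    println!("{emb:?}");
}
```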

Re-exports§

pub use engine::EmbeddingEngine;
pub use engine::EmbeddingEngineBuilder;
pub use error::InferenceError;
pub use error::Result;
pub use extraction::build_provider;
pub use extraction::ExtractionOpts;
pub use extraction::ExtractionProvider;
pub use extraction::ExtractionResult;
pub use extraction::ExtractorConfig;
pub use models::EmbeddingModel;
pub use models::ModelConfig;
pub use ner::rule_based_extract;
pub use ner::ExtractedEntity;
pub use ner::GlinerEngine;
pub use ner::NerEngine;

Modules§

batch
Batch processing utilities for efficient embedding generation.
engine
Core embedding engine for generating vector embeddings from text.
error
Error types for the inference engine.
extraction
EXT-1 — External Extraction Providers.
models
Model configurations for supported embedding models.
ner
Named Entity Recognition (NER) engine — CE-4 GLiNER zero-shot NER.
prelude
Prelude module for convenient imports.