dakera-inference 0.6.2

Embedded inference engine for Dakera - generates embeddings locally
//! # Dakera Inference Engine
//!
//! Embedded inference engine for generating vector embeddings locally without
//! external API calls. This crate provides:
//!
//! - **Local Embedding Generation**: Generate embeddings using state-of-the-art
//!   transformer models running locally on CPU or GPU.
//! - **Multiple Model Support**: Choose from MiniLM (fast), BGE (balanced), or E5 (quality).
//! - **Batch Processing**: Efficient throughput via automatic batching and parallelization.
//! - **Zero External Dependencies**: No OpenAI, Cohere, or other API keys required.
//!
//! ## Quick Start
//!
//! ```no_run
//! use inference::{EmbeddingEngine, ModelConfig, EmbeddingModel};
//!
//! #[tokio::main]
//! async fn main() -> Result<(), Box<dyn std::error::Error>> {
//!     // Create engine with default model (MiniLM)
//!     let engine = EmbeddingEngine::new(ModelConfig::default()).await?;
//!
//!     // Embed a query
//!     let query_embedding = engine.embed_query("What is machine learning?").await?;
//!     println!("Query embedding: {} dimensions", query_embedding.len());
//!
//!     // Embed documents
//!     let docs = vec![
//!         "Machine learning is a type of artificial intelligence.".to_string(),
//!         "Deep learning uses neural networks with many layers.".to_string(),
//!     ];
//!     let doc_embeddings = engine.embed_documents(&docs).await?;
//!     println!("Generated {} document embeddings", doc_embeddings.len());
//!
//!     Ok(())
//! }
//! ```
//!
//! ## Model Selection
//!
//! Choose the right model for your use case:
//!
//! | Model | Speed | Quality | Use Case |
//! |-------|-------|---------|----------|
//! | MiniLM | ⚡⚡⚡ | ⭐⭐ | High-throughput, real-time |
//! | BGE-small | ⚡⚡ | ⭐⭐⭐ | Balanced performance |
//! | E5-small | ⚡⚡ | ⭐⭐⭐ | Best quality for retrieval |
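//!
//! To use a non-default model, set it on the config. A minimal sketch, assuming
//! `ModelConfig` is a plain struct whose public `model` field (see the tests in
//! this crate) can be set with struct-update syntax:
//!
//! ```no_run
//! use inference::{EmbeddingEngine, EmbeddingModel, ModelConfig};
//!
//! # async fn select_model() -> Result<(), Box<dyn std::error::Error>> {
//! // Prefer BGE-small for balanced speed and retrieval quality.
//! let config = ModelConfig {
//!     model: EmbeddingModel::BgeSmall,
//!     ..ModelConfig::default()
//! };
//! let engine = EmbeddingEngine::new(config).await?;
//! # Ok(())
//! # }
//! ```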
//!
//! ## GPU Acceleration
//!
//! Enable GPU acceleration by building with the appropriate feature:
//!
//! ```toml
//! # For NVIDIA GPUs
//! inference = { path = "crates/inference", features = ["cuda"] }
//!
//! # For Apple Silicon
//! inference = { path = "crates/inference", features = ["metal"] }
//! ```
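//!
//! At runtime, device selection is driven by the config. A sketch, assuming the
//! `use_gpu` flag shown in this crate's tests (the flag only takes effect when
//! the crate was built with a GPU feature enabled):
//!
//! ```no_run
//! use inference::{EmbeddingEngine, ModelConfig};
//!
//! # async fn gpu() -> Result<(), Box<dyn std::error::Error>> {
//! let config = ModelConfig {
//!     use_gpu: true,
//!     ..ModelConfig::default()
//! };
//! let engine = EmbeddingEngine::new(config).await?;
//! # Ok(())
//! # }
//! ```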
//!
//! ## Architecture
//!
//! ```text
//! ┌─────────────────────────────────────────────────────────────┐
//! │                       EmbeddingEngine                       │
//! │  ┌─────────────┐  ┌────────────────┐  ┌──────────────────┐  │
//! │  │ ModelConfig │  │ BatchProcessor │  │   BertModel      │  │
//! │  │ - model     │  │ - tokenizer    │  │ (Candle)         │  │
//! │  │ - device    │  │ - batching     │  │ - forward()      │  │
//! │  │ - batch_sz  │  │ - prefixes     │  │ - mean_pool()    │  │
//! │  └─────────────┘  └────────────────┘  └──────────────────┘  │
//! └─────────────────────────────────────────────────────────────┘
//!
//!              ┌───────────────────────────────┐
//!              │      Vec<f32> Embeddings      │
//!              │   (normalized, 384 dims)      │
//!              └───────────────────────────────┘
//! ```
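//!
//! ## Comparing Embeddings
//!
//! Because the returned vectors are L2-normalized, cosine similarity reduces to
//! a plain dot product. A self-contained sketch (standard Rust only, no crate
//! API involved):
//!
//! ```
//! /// Cosine similarity of two L2-normalized vectors is just their dot product.
//! fn cosine_normalized(a: &[f32], b: &[f32]) -> f32 {
//!     a.iter().zip(b).map(|(x, y)| x * y).sum()
//! }
//!
//! // Both vectors have unit length (0.6^2 + 0.8^2 = 1.0).
//! let a = [0.6_f32, 0.8];
//! let b = [0.8_f32, 0.6];
//! assert!((cosine_normalized(&a, &b) - 0.96).abs() < 1e-6);
//! ```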

pub mod batch;
pub mod engine;
pub mod error;
pub mod models;

// Re-exports for convenience
pub use engine::{EmbeddingEngine, EmbeddingEngineBuilder};
pub use error::{InferenceError, Result};
pub use models::{EmbeddingModel, ModelConfig};

/// Prelude module for convenient imports.
pub mod prelude {
    pub use crate::engine::{EmbeddingEngine, EmbeddingEngineBuilder};
    pub use crate::error::{InferenceError, Result};
    pub use crate::models::{EmbeddingModel, ModelConfig};
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_model_defaults() {
        let config = ModelConfig::default();
        assert_eq!(config.model, EmbeddingModel::MiniLM);
        assert_eq!(config.max_batch_size, 32);
        assert!(!config.use_gpu);
    }

    #[test]
    fn test_model_dimensions() {
        assert_eq!(EmbeddingModel::MiniLM.dimension(), 384);
        assert_eq!(EmbeddingModel::BgeSmall.dimension(), 384);
        assert_eq!(EmbeddingModel::E5Small.dimension(), 384);
    }
}