//! # Dakera Inference Engine
//!
//! Embedded inference engine for generating vector embeddings locally without
//! external API calls. This crate provides:
//!
//! - **Local Embedding Generation**: Generate embeddings using state-of-the-art
//!   transformer models running locally on CPU or GPU.
//! - **Multiple Model Support**: Choose from MiniLM (fast), BGE (balanced), or E5 (quality).
//! - **Batch Processing**: Documents are embedded efficiently with automatic batching and parallelization.
//! - **Zero External Dependencies**: No OpenAI, Cohere, or other API keys required.
//!
//! ## Quick Start
//!
//! ```no_run
//! use inference::{EmbeddingEngine, ModelConfig, EmbeddingModel};
//!
//! #[tokio::main]
//! async fn main() -> Result<(), Box<dyn std::error::Error>> {
//!     // Create engine with default model (MiniLM)
//!     let engine = EmbeddingEngine::new(ModelConfig::default()).await?;
//!
//!     // Embed a query
//!     let query_embedding = engine.embed_query("What is machine learning?").await?;
//!     println!("Query embedding: {} dimensions", query_embedding.len());
//!
//!     // Embed documents
//!     let docs = vec![
//!         "Machine learning is a type of artificial intelligence.".to_string(),
//!         "Deep learning uses neural networks with many layers.".to_string(),
//!     ];
//!     let doc_embeddings = engine.embed_documents(&docs).await?;
//!     println!("Generated {} document embeddings", doc_embeddings.len());
//!
//!     Ok(())
//! }
//! ```
//!
//! ## Model Selection
//!
//! Choose the right model for your use case:
//!
//! | Model | Speed | Quality | Use Case |
//! |-------|-------|---------|----------|
//! | MiniLM | ⚡⚡⚡ | ⭐⭐ | High-throughput, real-time |
//! | BGE-small | ⚡⚡ | ⭐⭐⭐ | Balanced performance |
//! | E5-small | ⚡⚡ | ⭐⭐⭐ | Best quality for retrieval |
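//!
//! For example, to trade a little speed for retrieval quality, a config
//! selecting BGE-small might look like the sketch below. The variant and
//! field names (`EmbeddingModel::BgeSmall`, `model`) are assumptions made
//! for illustration; check the actual `ModelConfig` and `EmbeddingModel`
//! definitions for the real API.
//!
//! ```no_run
//! use inference::{EmbeddingEngine, EmbeddingModel, ModelConfig};
//!
//! # #[tokio::main]
//! # async fn main() -> Result<(), Box<dyn std::error::Error>> {
//! // Hypothetical: select BGE-small instead of the default MiniLM.
//! let config = ModelConfig {
//!     model: EmbeddingModel::BgeSmall,
//!     ..ModelConfig::default()
//! };
//! let engine = EmbeddingEngine::new(config).await?;
//! # Ok(())
//! # }
//! ```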
//!
//! ## GPU Acceleration
//!
//! Enable GPU acceleration by building with the appropriate Cargo feature (choose one):
//!
//! ```toml
//! # For NVIDIA GPUs
//! inference = { path = "crates/inference", features = ["cuda"] }
//!
//! # For Apple Silicon
//! inference = { path = "crates/inference", features = ["metal"] }
//! ```
//!
//! ## Architecture
//!
//! ```text
//! ┌────────────────────────────────────────────────────────────┐
//! │                      EmbeddingEngine                       │
//! │  ┌─────────────┐  ┌───────────────┐  ┌──────────────────┐  │
//! │  │ ModelConfig │  │ BatchProcessor│  │   ort::Session   │  │
//! │  │ - model     │  │ - tokenizer   │  │  (ONNX Runtime)  │  │
//! │  │ - threads   │  │ - batching    │  │  - BERT INT8     │  │
//! │  │ - batch_sz  │  │ - prefixes    │  │  - mean_pool()   │  │
//! │  └─────────────┘  └───────────────┘  └──────────────────┘  │
//! └────────────────────────────────────────────────────────────┘
//!                               │
//!                               ▼
//!               ┌───────────────────────────────┐
//!               │      Vec<f32> Embeddings      │
//!               │    (normalized, 384 dims)     │
//!               └───────────────────────────────┘
//! ```
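//!
//! ## Comparing Embeddings
//!
//! Because the returned vectors are L2-normalized (see the diagram above),
//! cosine similarity reduces to a plain dot product. A minimal,
//! dependency-free sketch:
//!
//! ```rust
//! /// For unit-length vectors, the dot product equals cosine similarity.
//! fn dot(a: &[f32], b: &[f32]) -> f32 {
//!     a.iter().zip(b).map(|(x, y)| x * y).sum()
//! }
//!
//! let a = [0.6_f32, 0.8, 0.0]; // already unit-length
//! let b = [0.8_f32, 0.6, 0.0];
//! assert!((dot(&a, &b) - 0.96).abs() < 1e-6);
//! ```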
// Re-exports for convenience.
//
// NOTE: the original paths here were lost; the module paths below are
// assumptions based on the names documented above. Adjust them to the
// crate's actual module layout.
pub use config::{EmbeddingModel, ModelConfig};
pub use engine::EmbeddingEngine;
pub use batch::BatchProcessor;
/// Prelude module for convenient imports.