pub enum Embedder {
    Local {
        model: Arc<BertModel>,
        tokenizer: Arc<Tokenizer>,
        device: Device,
    },
    Ollama {
        client: Arc<OllamaClient>,
        model_name: String,
        dim: usize,
        degraded: Arc<AtomicBool>,
    },
}
Semantic embedding engine supporting multiple backends.
- Local (candle): all-MiniLM-L6-v2, 384-dim. Used at the semantic tier.
- Ollama: nomic-embed-text-v1.5, 768-dim. Used at smart/autonomous tiers.
Variants§
Local
Candle-based local embedding (MiniLM-L6-v2, 384-dim).
v0.7.0 #1084 — model is Arc<BertModel> (no mutex). The
pre-#1084 design held an Arc<Mutex<BertModel>> and locked
the model across the full forward pass; on a multi-tenant
HTTP daemon that serialised every embed call on a single
global mutex. Candle’s BertModel::forward(&self, ...) is
inference-only (weights are read-only mmap’d safetensors)
so the mutex was unnecessary; parallel embed calls now run
concurrently against the same weights.
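The effect of dropping the mutex can be illustrated with a minimal, standard-library-only sketch. `ReadOnlyModel` and `parallel_forward` are hypothetical stand-ins (not part of this crate or of Candle); the only property borrowed from the text above is that the forward pass takes `&self`, so an `Arc` alone is enough to share the weights across threads.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a model whose forward pass only needs `&self`.
struct ReadOnlyModel {
    weights: Vec<f32>,
}

impl ReadOnlyModel {
    /// Toy "forward pass": a dot product against the read-only weights.
    fn forward(&self, input: &[f32]) -> f32 {
        self.weights.iter().zip(input).map(|(w, x)| w * x).sum()
    }
}

/// Runs one forward pass per input on its own thread, all sharing one model.
/// No lock is taken, so the calls are free to overlap.
fn parallel_forward(model: Arc<ReadOnlyModel>, inputs: Vec<Vec<f32>>) -> Vec<f32> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            let model = Arc::clone(&model);
            thread::spawn(move || model.forward(&input))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

With an `Arc<Mutex<_>>` each worker would have to hold the lock for the whole `forward` call, which is the serialisation the note above describes.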
Ollama
Remote embed client — Ollama-native OR OpenAI-compatible wire
shape (#1598). The historical variant name is preserved to
avoid call-site churn; the carried crate::llm::OllamaClient
routes /api/embed (Ollama) or /embeddings + Bearer
(OpenAI-compatible) per its provider. dim is the model’s
vector dimensionality (768 for the historical nomic default);
degraded latches the outcome of the most recent embed call so
the capabilities surface can report a dead remote endpoint
truthfully (#1594).
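A minimal sketch of the "latch the most recent outcome" idea, assuming a plain `AtomicBool` store after every call. `RemoteHealth` and `record` are hypothetical names used only for illustration, and the `Relaxed` ordering is an assumption rather than something the text above specifies.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical wrapper showing only the latch; not the crate's type.
struct RemoteHealth {
    degraded: Arc<AtomicBool>,
}

impl RemoteHealth {
    /// The flag starts out `false`, i.e. healthy.
    fn new() -> Self {
        Self { degraded: Arc::new(AtomicBool::new(false)) }
    }

    /// Records the outcome of one call and passes the result through unchanged.
    /// A later success clears the flag again, so it always reflects the latest call.
    fn record<T, E>(&self, outcome: Result<T, E>) -> Result<T, E> {
        self.degraded.store(outcome.is_err(), Ordering::Relaxed);
        outcome
    }

    /// Reads the latched state of the most recent call.
    fn is_degraded(&self) -> bool {
        self.degraded.load(Ordering::Relaxed)
    }
}
```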
Implementations§
impl Embedder
pub fn new() -> Result<Self>
Create a new local (candle) embedder for MiniLM-L6-v2. Downloads the model if it is not already cached.
pub fn new_ollama(client: Arc<OllamaClient>) -> Self
Create an Ollama-based embedder for nomic-embed-text-v1.5 (768-dim).
Requires the Ollama client to already be connected and the model pulled.
pub fn new_remote(
    client: Arc<OllamaClient>,
    model_name: String,
    dim: usize,
) -> Self
#1598 — create a remote embedder for an arbitrary model + dim.
client may speak either wire shape: Ollama-native
(OllamaClient::new_with_url) or OpenAI-compatible
(OllamaClient::new_openai_compatible — OpenRouter, HF TEI,
vLLM, …). The degraded flag starts false and tracks the
most recent embed outcome.
pub fn from_resolved(
    resolved: &ResolvedEmbeddings,
    tier_model: Option<EmbeddingModel>,
) -> Result<Option<Self>>
#1598 — single shared boot entry for both wiring sites (MCP
stdio init + daemon_runtime::build_embedder). Consumes the
canonical crate::config::AppConfig::resolve_embeddings
output and the tier’s embedding-model gate:
- tier_model = None (keyword tier) → Ok(None).
- API backend (crate::config::is_api_embed_backend) → OpenAI-compatible remote client against resolved.url with the resolved Bearer key. Keyless self-hosted endpoints (HF TEI / vLLM) are legitimate: a missing key sends an empty Bearer value, which such servers ignore. Requires a known dim ([embeddings].dim override or the known-dims table) and bails otherwise, so mismatched vectors never land silently.
- Ollama backend → the historical Self::for_model path (MiniLM = local candle regardless; nomic = Ollama client at resolved.url). Client construction failure returns Err; callers fail closed to keyword recall (#1593), NEVER to the chat LLM client.
§Errors
Remote-client construction failure, an unknown vector dim for an API-backend model, or local model-load failure.
pub fn for_model(
    model: EmbeddingModel,
    ollama_client: Option<Arc<OllamaClient>>,
) -> Result<Self>
Create an embedder for the specified model.
- MiniLmL6V2 → local candle embedder
- NomicEmbedV15 → Ollama-based (requires ollama_client)
pub fn model_description(&self) -> String
Human-readable description of the active embedding model.
#1598 — returns String (the remote variant reports its live
model + dim, which may be any operator-picked API model id,
not just the historical nomic default).
pub fn is_degraded(&self) -> bool
#1598 / #1594 — true when the most recent remote embed call
failed (dead endpoint, auth rejection, …). The local candle
embedder never degrades at runtime (weights are mmap’d at
construction). Consumed by the capabilities surface so
features.embedder_loaded / recall_mode_active report the
LIVE posture rather than the boot-time one.
pub fn embed(&self, text: &str) -> Result<Vec<f32>>
Generate an embedding for a single text input indexed as a
corpus document. Thin alias for Embedder::embed_with_role
with EmbedRole::Document — the safe default for every
write/index path and for symmetric comparisons.
pub fn embed_query(&self, text: &str) -> Result<Vec<f32>>
Generate an embedding for a text used as a search query. Thin
alias for Embedder::embed_with_role with EmbedRole::Query.
For the asymmetric Ollama nomic backend this applies the
search_query: task prefix so query↔document cosine is
meaningful (#1520); the symmetric local MiniLM backend ignores
the role.
pub fn embed_with_role(&self, text: &str, role: EmbedRole) -> Result<Vec<f32>>
Generate an embedding for text under an explicit retrieval
EmbedRole. The local candle MiniLM backend is symmetric and
ignores the role; the Ollama nomic backend prepends the
role-specific task-instruction prefix required by
nomic-embed-text-v1.5 (#1520).
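The role handling can be sketched as a pure string transformation. This is an illustration under stated assumptions: `Role` and `apply_role_prefix` are hypothetical names, the `asymmetric` flag stands in for "is this the nomic backend", and the `search_document: ` prefix is taken from the nomic-embed-text-v1.5 model card (the text above only names `search_query:`).

```rust
/// Hypothetical mirror of the role enum; only the two roles named in the text.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    Document,
    Query,
}

/// Prepends the task-instruction prefix when the backend is asymmetric.
/// Symmetric backends (such as the local MiniLM model) get the text unchanged.
fn apply_role_prefix(text: &str, role: Role, asymmetric: bool) -> String {
    if !asymmetric {
        return text.to_owned();
    }
    let prefix = match role {
        Role::Document => "search_document: ",
        Role::Query => "search_query: ",
    };
    format!("{prefix}{text}")
}
```

Indexing with one prefix and querying with the other is what makes the query-to-document cosine meaningful for an asymmetric model; mixing them up would not fail loudly, it would only lower recall quality.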
pub fn embed_with_status(&self, text: &str) -> (Option<Vec<f32>>, EmbedStatus)
v0.7.0 F6 — generate an embedding and report the outcome.
Combines the existing Embedder::embed call with an
EmbedStatus tag so the caller (HTTP store path, MCP store
path, sync ingestion, …) can surface a structured signal on the
response when the embedder skipped or errored. Behaviour:
- Empty input → (None, Skipped("empty content"))
- Input larger than EMBED_MAX_BYTES → (None, Skipped(reason))
- Embedder errors → (None, Failed(reason))
- Otherwise → (Some(vec), Indexed)
Callers that don’t care about the status keep using
Embedder::embed; this is the new opt-in API.
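The four outcomes listed above can be sketched as a small decision function. Everything here is hypothetical scaffolding: `Status` mirrors the described tag, `MAX_BYTES` is an arbitrary illustrative limit (the real `EMBED_MAX_BYTES` value is not given in this page), the size-limit reason string is made up, and the embedder is passed in as a closure so the sketch stays self-contained.

```rust
/// Hypothetical mirror of the status tag described above.
#[derive(Debug, PartialEq)]
enum Status {
    Indexed,
    Skipped(String),
    Failed(String),
}

/// Assumed limit for illustration only; not the crate's actual value.
const MAX_BYTES: usize = 16;

/// Applies the skip checks first, then calls the embedder and tags the outcome.
fn embed_with_status<F>(text: &str, embed: F) -> (Option<Vec<f32>>, Status)
where
    F: Fn(&str) -> Result<Vec<f32>, String>,
{
    if text.is_empty() {
        return (None, Status::Skipped("empty content".to_owned()));
    }
    if text.len() > MAX_BYTES {
        // `str::len` counts bytes, matching a byte-based limit.
        return (None, Status::Skipped(format!("content exceeds {} bytes", MAX_BYTES)));
    }
    match embed(text) {
        Ok(vector) => (Some(vector), Status::Indexed),
        Err(reason) => (None, Status::Failed(reason)),
    }
}
```

Returning a tuple instead of a `Result` lets a store path persist the record even when no vector was produced, while still surfacing why.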
pub fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>
Generate embeddings for multiple texts in one call.
PERF-5 (FX-C4-batch2, 2026-05-26): true batched forward
instead of the prior texts.iter().map(|t| self.embed(t))
fan-out. The Local arm tokenises every input, pads to the
batch’s max sequence length, stacks to a (B, L) tensor, and
runs BertModel::forward ONCE per batch — Candle’s
per-call overhead dominates B=1 calls, so a true batch of 32
inputs is ~10-20× faster than 32 sequential calls. The
Ollama arm continues to dispatch one POST per text (the
vendor wire shape for batched /api/embed differs across
Ollama versions and a wire-version probe would add the same
per-call latency we are saving; keep the per-text loop here
while a LlmClient-side batched-embed API is staged).
Callers: multistep_ingest, atomisation, the periodic
embedding-backfill sweep (AI_MEMORY_EMBED_BACKFILL_BATCH).
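The "pad to the batch's max sequence length, stack to (B, L)" step can be sketched without any tensor library. `pad_batch` is a hypothetical helper operating on plain vectors of token ids; the pad id and the 1/0 attention-mask convention are assumptions that follow common BERT practice, not details confirmed by this page.

```rust
/// Pads token-id sequences to the batch maximum and builds the attention mask.
/// Returns `(ids, mask)`, each with B rows of L columns, where L is the
/// longest sequence in the batch.
fn pad_batch(seqs: &[Vec<u32>], pad_id: u32) -> (Vec<Vec<u32>>, Vec<Vec<u32>>) {
    let max_len = seqs.iter().map(Vec::len).max().unwrap_or(0);
    let mut ids = Vec::with_capacity(seqs.len());
    let mut mask = Vec::with_capacity(seqs.len());
    for seq in seqs {
        // Real tokens get mask 1, padding positions get mask 0.
        let mut row = seq.clone();
        let mut row_mask = vec![1u32; seq.len()];
        row.resize(max_len, pad_id);
        row_mask.resize(max_len, 0);
        ids.push(row);
        mask.push(row_mask);
    }
    (ids, mask)
}
```

Once every row has the same length, the rows can be stacked into one rectangular tensor and pushed through a single forward pass; the mask keeps padded positions from contributing to pooling.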
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32
Compute cosine similarity between two embedding vectors.
pub fn cosine_similarity_checked(
    query: &[f32],
    stored: &[f32],
) -> CosineComparison
v0.7.0 H7 — dimension-aware companion to Embedder::cosine_similarity.
Returns CosineComparison::DimensionMismatch instead of silently
yielding 0.0 when the two vectors have different lengths, so the
recall pipeline can report cross-model (embedder-switch) embeddings
rather than dropping their semantic signal unseen. When the
dimensions agree the result wraps the same value
Embedder::cosine_similarity would return.
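A minimal sketch of the two comparison flavours, assuming the unchecked version returns 0.0 on a length mismatch as described above. `Comparison` is a hypothetical stand-in for `CosineComparison` (its real variant names and fields, apart from `DimensionMismatch`, are not shown on this page), and returning 0.0 for a zero-norm vector is an assumption.

```rust
/// Hypothetical stand-in for the checked result type.
#[derive(Debug, PartialEq)]
enum Comparison {
    Score(f32),
    DimensionMismatch { query: usize, stored: usize },
}

/// Plain cosine similarity; silently yields 0.0 when lengths differ.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Dimension-aware variant: reports the mismatch instead of hiding it.
fn cosine_checked(query: &[f32], stored: &[f32]) -> Comparison {
    if query.len() != stored.len() {
        return Comparison::DimensionMismatch {
            query: query.len(),
            stored: stored.len(),
        };
    }
    Comparison::Score(cosine(query, stored))
}
```

A caller can then count the mismatches (for example 384-dim rows left over after switching to a 768-dim model) rather than ranking them as if they scored zero.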
pub fn fuse(primary: &[f32], secondary: &[f32], primary_weight: f32) -> Vec<f32>
Fuse a primary query embedding with a secondary context embedding via weighted linear combination (v0.6.0.0 contextual recall).
primary_weight clamped to [0.0, 1.0]. The result is returned
un-normalized — cosine_similarity divides out magnitudes, so the
downstream signal is direction-only. Returns primary.to_vec() when
dimensions differ (graceful fallback, same policy as
cosine_similarity).
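The fusion rule can be sketched as follows. The page only says "weighted linear combination"; giving the secondary vector the complementary weight `1 - primary_weight` is an assumption made here for illustration, so treat this as one plausible reading rather than the crate's exact formula.

```rust
/// Weighted blend of two equal-length vectors, returned un-normalized.
/// Assumption: the secondary vector is weighted by `1 - primary_weight`.
fn fuse(primary: &[f32], secondary: &[f32], primary_weight: f32) -> Vec<f32> {
    // Graceful fallback: on a dimension mismatch keep the primary vector as-is.
    if primary.len() != secondary.len() {
        return primary.to_vec();
    }
    let w = primary_weight.clamp(0.0, 1.0);
    primary
        .iter()
        .zip(secondary)
        .map(|(p, s)| w * p + (1.0 - w) * s)
        .collect()
}
```

Skipping normalization is harmless when the result is only ever compared by cosine, since scaling a vector does not change its direction.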
Trait Implementations§
impl Embed for Embedder
v0.7.0 L0.7 — Embed trait impl that delegates to the inherent
Embedder::embed / Embedder::embed_batch methods. The
inherent methods stay on Embedder verbatim so existing callers
that hold a concrete &Embedder keep their fast path; the trait
impl is purely additive and enables dyn Embed substitution for
handler signatures (see Embed docs).
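How a trait with default methods enables `dyn Embed` substitution can be sketched in a few lines. This is a hypothetical re-creation for illustration: the trait shape follows the method signatures listed on this page, but the default bodies, the `EmbedResult` alias, the toy `MockEmbedder` behaviour, and the `index_all` handler are all assumptions.

```rust
/// Local alias so the sketch does not depend on the crate's error type.
type EmbedResult<T> = std::result::Result<T, String>;

trait Embed {
    /// The only required method.
    fn embed(&self, text: &str) -> EmbedResult<Vec<f32>>;

    /// Default suits symmetric embedders: a query is embedded like a document.
    fn embed_query(&self, text: &str) -> EmbedResult<Vec<f32>> {
        self.embed(text)
    }

    /// Default loops over `embed`; implementors may batch natively instead.
    fn embed_batch(&self, texts: &[&str]) -> EmbedResult<Vec<Vec<f32>>> {
        texts.iter().map(|t| self.embed(t)).collect()
    }

    /// Default suits embedders that cannot lose a remote endpoint.
    fn is_degraded(&self) -> bool {
        false
    }
}

/// Toy test double: the "embedding" is just the text length.
struct MockEmbedder;

impl Embed for MockEmbedder {
    fn embed(&self, text: &str) -> EmbedResult<Vec<f32>> {
        Ok(vec![text.len() as f32])
    }
}

/// A handler written against the trait object accepts any implementor.
fn index_all(embedder: &dyn Embed, texts: &[&str]) -> EmbedResult<usize> {
    Ok(embedder.embed_batch(texts)?.len())
}
```

Code that holds a concrete type still calls the inherent methods directly; only handlers that want to swap in a mock pay for dynamic dispatch.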
fn embed(&self, text: &str) -> Result<Vec<f32>>
[…] text.
fn embed_query(&self, text: &str) -> Result<Vec<f32>>
[…] text used as a search query. Default implementation delegates to Embed::embed, which is correct for symmetric embedders (and the test MockEmbedder); the production Embedder overrides it so the asymmetric Ollama nomic backend applies the search_query: task prefix (#1520).
fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>
[…] Embed::embed in a loop; implementors may override to do native batching.
fn is_degraded(&self) -> bool
[…] false (correct for local / mock embedders); the production Embedder overrides it for the remote variant so the capabilities surface reports a dead endpoint truthfully.
Auto Trait Implementations§
impl !RefUnwindSafe for Embedder
impl !UnwindSafe for Embedder
impl Freeze for Embedder
impl Send for Embedder
impl Sync for Embedder
impl Unpin for Embedder
impl UnsafeUnpin for Embedder
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> ErasedDestructor for T
where
    T: 'static,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.