pub struct OllamaClient { /* private fields */ }
Implementations§
impl OllamaClient
pub fn model_name(&self) -> &str
v0.7.0 (issue #1244) — accessor for the resolved model name.
Returns the model identifier the client was constructed with
(e.g. gemma3:4b on Ollama, grok-4.3 on xAI, claude-opus-4.7
on Anthropic). Substrate sites that bind LLM provenance into
signed audit events (e.g. the atomisation_complete
curator_model payload field) read this verbatim — never a
hardcoded string — so the signed event reflects the model that
actually ran on a given deployment, not a v0.6.x-era default.
pub fn new(model: &str) -> Result<Self>
Creates a new OllamaClient with the default Ollama URL (http://localhost:11434).
Checks that Ollama is reachable before returning.
pub fn from_env() -> Result<Option<Self>>
#1066 — Construct from environment variables. Returns Ok(Some(client))
when the env declares an LLM backend; Ok(None) when no backend is
configured (keyword-only deployments); Err on misconfiguration
(e.g. backend declared but required key missing).
Reads:
- AI_MEMORY_LLM_BACKEND: ollama (default) | openai-compatible | one of the per-vendor aliases (xai, openai, anthropic, gemini, deepseek, kimi, qwen, mistral, groq, together, cerebras, openrouter, fireworks, lmstudio).
- AI_MEMORY_LLM_BASE_URL: overrides the default per-alias URL.
- AI_MEMORY_LLM_API_KEY: Bearer auth secret for the OpenAI-compatible path. Per-alias fallback env vars are also consulted (XAI_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, DEEPSEEK_API_KEY, MOONSHOT_API_KEY, DASHSCOPE_API_KEY, etc.).
- AI_MEMORY_LLM_MODEL: model name (grok-4, gpt-5, claude-opus-4.7, gemini-2.0-flash, deepseek-chat, etc.).
- Legacy OLLAMA_BASE_URL is still honored when backend is ollama (or unset).
§Errors
- AI_MEMORY_LLM_BACKEND is set to an unknown alias.
- Backend is OpenAI-compatible (or an alias) but no API key is resolvable from AI_MEMORY_LLM_API_KEY or any per-alias fallback env var.
- Backend is the generic openai-compatible and AI_MEMORY_LLM_BASE_URL is unset.
- The HTTP client itself fails to build.
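The API-key lookup described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `fallback_key_var` and `resolve_api_key` are hypothetical names, the alias-to-variable pairing is inferred from the variable names listed above, and treating an empty value as unset is an assumption.

```rust
/// Hypothetical helper: maps a backend alias to its per-alias fallback
/// variable. The pairing is an assumption inferred from the names listed
/// in the documentation, not taken from the crate source.
fn fallback_key_var(alias: &str) -> Option<&'static str> {
    match alias {
        "xai" => Some("XAI_API_KEY"),
        "openai" => Some("OPENAI_API_KEY"),
        "anthropic" => Some("ANTHROPIC_API_KEY"),
        "gemini" => Some("GEMINI_API_KEY"),
        "deepseek" => Some("DEEPSEEK_API_KEY"),
        "kimi" => Some("MOONSHOT_API_KEY"),
        "qwen" => Some("DASHSCOPE_API_KEY"),
        _ => None,
    }
}

/// Resolves the API key: the generic variable first, then the per-alias
/// fallback. `lookup` stands in for `std::env::var(name).ok()` so the logic
/// can be exercised without touching the process environment.
/// Assumption: an empty value is treated the same as an unset variable.
fn resolve_api_key<F>(alias: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    lookup("AI_MEMORY_LLM_API_KEY")
        .filter(|v| !v.is_empty())
        .or_else(|| {
            fallback_key_var(alias)
                .and_then(|var| lookup(var))
                .filter(|v| !v.is_empty())
        })
}
```

In real use the closure would be `|name| std::env::var(name).ok()`; a `None` result corresponds to the "no API key is resolvable" error case.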
pub fn build_for_init( legacy_url: &str, legacy_model: &str, ) -> Result<Option<Self>>
#1143 — Sync env-aware client construction with a tier-default
legacy fallback. Centralises the pattern that #1142 ported into
src/mcp/mod.rs so every synchronous LLM-init site (CLI
atomise, CLI curator, MCP stdio LLM init, embed-client
fallback selection) routes through one place. The daemon’s
async path (daemon_runtime::build_llm_client) wraps the same
resolution order in tokio::task::spawn_blocking; behavioural
parity with that wrapper is pinned by tests below.
Resolution order:
- AI_MEMORY_LLM_BACKEND set + non-empty → from_env().
- Else → new_with_url(legacy_url, legacy_model) so a v0.6.x operator who never set the env vars keeps the historical tier-default Ollama path.
Returns Ok(None) from the env-aware arm only when the env var
chain resolves to a no-op (currently impossible for any
recognised backend alias; defensively threaded so future “alias
disabled” branches don’t break callers).
§Errors
Mirrors Self::from_env when the env arm is taken, and
Self::new_with_url when the legacy arm is taken.
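The two-arm resolution order above can be modelled as a small pure function. This is a sketch under stated assumptions: `InitArm` and `select_init_arm` are hypothetical names, and `backend_env` stands in for the value of AI_MEMORY_LLM_BACKEND (`None` when unset). Whether a whitespace-only value counts as empty is not stated in the documentation, so the sketch does not trim.

```rust
/// Which constructor arm the resolution order selects (hypothetical type).
#[derive(Debug, PartialEq)]
enum InitArm {
    /// Route through the env-aware constructor (`from_env`).
    Env,
    /// Fall back to `new_with_url(legacy_url, legacy_model)`.
    Legacy { url: String, model: String },
}

/// Picks the arm: env-aware when the backend variable is set and non-empty,
/// otherwise the legacy tier-default path.
fn select_init_arm(
    backend_env: Option<&str>,
    legacy_url: &str,
    legacy_model: &str,
) -> InitArm {
    match backend_env {
        Some(value) if !value.is_empty() => InitArm::Env,
        _ => InitArm::Legacy {
            url: legacy_url.to_string(),
            model: legacy_model.to_string(),
        },
    }
}
```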
pub fn build_from_resolved(resolved: &ResolvedLlm) -> Result<Option<Self>>
v0.7.x (#1146) — Construct an OllamaClient from a fully-resolved
LLM configuration produced by crate::config::AppConfig::resolve_llm.
This is the enterprise-class single-entry-point that replaces
every call to Self::build_for_init /
Self::new_with_url / Self::from_env /
Self::new_openai_compatible in the surface plumbing.
The resolver has already done all precedence + provenance work
(CLI flag > env > [llm] config section > legacy fields >
compiled default) and produced a [ResolvedLlm] carrying the
authoritative (backend, model, base_url, api_key) quad. This
constructor just maps it onto the appropriate wire-shape
client.
Returns Ok(None) when the resolved api_key_source is
KeySource::Error(_) and the backend is non-Ollama (so we
can’t even attempt to construct an OpenAI-compatible client).
The error surfaces through the ai-memory doctor LLM
reachability probe rather than panicking at construct time.
§Errors
Returns an error if the HTTP client itself fails to build, or
if the Ollama-backend reachability check fails the same way
Self::new_with_url already fails.
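The precedence chain the resolver applies (CLI flag > env > [llm] config section > legacy fields > compiled default) can be illustrated for a single setting. This is a sketch only: `Provenance`, `Candidates`, and `resolve` are hypothetical names and do not reflect the actual ResolvedLlm or AppConfig::resolve_llm types.

```rust
/// Where a resolved value came from, in descending precedence.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Provenance {
    CliFlag,
    Env,
    ConfigSection,
    LegacyField,
    CompiledDefault,
}

/// Candidate values for one setting, for example the model name.
/// Field names are hypothetical.
struct Candidates<'a> {
    cli_flag: Option<&'a str>,
    env: Option<&'a str>,
    config_section: Option<&'a str>,
    legacy_field: Option<&'a str>,
    compiled_default: &'a str,
}

/// Returns the first candidate that is present, together with its source.
fn resolve<'a>(c: &Candidates<'a>) -> (&'a str, Provenance) {
    c.cli_flag
        .map(|v| (v, Provenance::CliFlag))
        .or(c.env.map(|v| (v, Provenance::Env)))
        .or(c.config_section.map(|v| (v, Provenance::ConfigSection)))
        .or(c.legacy_field.map(|v| (v, Provenance::LegacyField)))
        .unwrap_or((c.compiled_default, Provenance::CompiledDefault))
}
```

Carrying the provenance alongside the value is what lets a diagnostic report say which layer supplied a setting.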
pub async fn build_from_resolved_async( resolved: &ResolvedLlm, ) -> Result<Option<Self>>
FX-D1 (v0.7.0, 2026-05-27) — async sibling of
Self::build_from_resolved. Surgical fix for the
daemon_runtime::build_llm_client callsite that hit the
FX-C1 block_on_local current-thread panic: the daemon
wrapped this sync constructor in tokio::task::spawn_blocking,
and the blocking pool thread inherited the outer (current-
thread, in #[tokio::test]) runtime handle, which drove
block_on_local into its panic arm.
Callers already on a tokio runtime — the daemon’s
build_llm_client, mcp/mod.rs::run_mcp_server once it
migrates, and CLI atomise/curator builders — should call this
directly to bypass the sync→async bridge entirely. The Ollama
arm now goes through Self::new_with_url_async (no
block_on_local); the non-Ollama arm uses
Self::new_openai_compatible which is already pure-sync
(no I/O — just a reqwest::Client::builder).
§Errors
Same conditions as Self::build_from_resolved: Ollama
reachability failure, missing API key for a non-Ollama
backend, or HTTP client build failure.
pub fn is_ollama_native(&self) -> bool
#1143 — Wire-shape introspection for embed-client fallback.
Embed endpoints differ from chat endpoints across vendors: only
Ollama (and a couple of OpenAI-compatible vendors) expose a
usable embedding wire-shape, and the substrate’s local embedder
integration only speaks the Ollama /api/embed shape. Callers
that consider re-using the LLM client for embeddings use this
to bail out when the client is an OpenAI-compatible vendor.
pub fn new_openai_compatible( base_url: &str, model: &str, api_key: &str, ) -> Result<Self>
#1066 — Construct an OpenAI-compatible client for any vendor whose
/v1/chat/completions endpoint follows the OpenAI spec (xAI Grok,
OpenAI, Anthropic via OpenAI shim, Google Gemini, DeepSeek, Kimi,
Qwen, Mistral, Groq, Together, Cerebras, OpenRouter, Fireworks,
LMStudio, vLLM, llama.cpp server, …).
§Errors
Returns an error if the HTTP client fails to build.
pub fn with_embed_dimensions(self, dims: Option<u32>) -> Self
#1598 (fleet follow-up) — builder for the requested embedding
output dimensionality (see the embed_dimensions field doc).
None clears the request (model-native dim).
pub fn new_with_url(base_url: &str, model: &str) -> Result<Self>
Creates a new OllamaClient with a custom base URL.
Checks that Ollama is reachable before returning.
v0.7.0 F6: the underlying reqwest client now carries an explicit
connect_timeout so a dead endpoint fails in [CONNECT_TIMEOUT]
instead of hanging on the kernel SYN retry budget. The per-request
timeout is preserved at [GENERATE_TIMEOUT].
PERF-9 (v0.7.0 FX-C1, 2026-05-26). Sync wrapper around
Self::new_with_url_async via the block_on_local helper.
Callers already on a tokio runtime should prefer the async
constructor directly.
PERF-12 (FX-C4-batch2, 2026-05-26). This constructor still
performs the /api/tags Ollama health probe at construction
time, preserving the v0.6.x fail-fast posture for callers that
depend on construction-time validation (e.g. CLI commands).
Boot-fast daemon paths that want to defer reachability
verification to first-use should use
Self::new_with_url_no_health_check instead.
pub async fn new_with_url_async(base_url: &str, model: &str) -> Result<Self>
PERF-9 (v0.7.0 FX-C1) — async constructor variant. Builds the
async reqwest::Client and probes /api/tags (Ollama health)
without blocking the calling thread. Callers inside a tokio
runtime (HTTP handler, daemon path, MCP stdio loop once it
adopts a tokio bridge) should call this directly.
pub fn new_with_url_no_health_check(base_url: &str, model: &str) -> Result<Self>
PERF-12 (FX-C4-batch2, 2026-05-26) — construct an
OllamaClient WITHOUT the synchronous /api/tags health
check.
Boot-fast variant for daemon paths that want to defer
reachability verification to first-use (or to the
ai-memory doctor reachability sweep). Saves the 50-200 ms
round-trip to a remote LLM endpoint on every serve boot
and on every ai-memory mcp dispatch. The circuit-breaker
at Self::generate still handles transient failures the
usual way, so a degraded LLM endpoint is contained at first
use rather than at construction.
Use Self::new_with_url when caller-side construction-
time validation is required (e.g. CLI commands that fail
fast on bring-up).
pub fn is_available(&self) -> bool
Quick health check — returns true if the backend responds 2xx.
- Ollama: GET /api/tags (lists pulled models)
- OpenAI-compatible: GET /v1/models with Bearer auth (most vendors support this endpoint)
Strict semantics: 4xx and 5xx return false. A vendor that
returns 401 on bad auth is treated as “not available” because
we cannot use it. The circuit-breaker in Self::generate
handles transient 5xx burst behavior separately. Matches the
pre-#1067 contract pinned by
wiremock_tests::test_is_available_returns_false_on_500_response.
PERF-9 (v0.7.0 FX-C1) — sync wrapper around
Self::is_available_async. The async variant should be
preferred by every callsite already on a tokio runtime.
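The strict availability rule reduces to a check on the HTTP status class. A minimal sketch, where `status_means_available` is a hypothetical helper name rather than part of the client API:

```rust
/// Strict availability rule: only a 2xx status counts as available.
/// 401 (bad auth) and 5xx both map to `false`.
fn status_means_available(status: u16) -> bool {
    (200..300).contains(&status)
}
```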
pub async fn is_available_async(&self) -> bool
PERF-9 (v0.7.0 FX-C1) — async variant of Self::is_available.
Same semantics; no thread blocked.
pub fn ensure_model(&self) -> Result<()>
Ensure the configured model is available.
- Ollama: lists /api/tags, pulls via /api/pull if missing.
- OpenAI-compatible: no-op. Model availability is the vendor’s concern (operator is responsible for confirming the model exists on the chosen vendor’s plan).
PERF-9 (v0.7.0 FX-C1) — sync wrapper around
Self::ensure_model_async.
pub async fn ensure_model_async(&self) -> Result<()>
PERF-9 (v0.7.0 FX-C1) — async variant of Self::ensure_model.
§Errors
Returns an error if the /api/tags listing fails, the response
JSON cannot be parsed, the pull-client cannot be built, or the
pull request fails.
pub fn generate(&self, prompt: &str, system: Option<&str>) -> Result<String>
Generates a completion using the /api/chat endpoint (Ollama chat format). This is compatible with both Ollama and vMLX/OpenAI-compatible servers. Returns the response text.
v0.7.0 F6 — the call is guarded by a circuit breaker. After
[CIRCUIT_BREAKER_THRESHOLD] consecutive failures the call
fast-fails for [CIRCUIT_BREAKER_COOLDOWN] instead of waiting
the full HTTP timeout each time. This is the key defence
against the Round-2 F6 deadlock where a dead ollama caused
every chat-backed MCP tool to hang the daemon for 30s+.
PERF-9 (v0.7.0 FX-C1, 2026-05-26) — sync wrapper around
Self::generate_async. Callers already inside a tokio
runtime (HTTP handlers, the daemon path) should prefer the
async variant directly to skip the bridge overhead.
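The circuit-breaker behaviour described above (fast-fail after a run of consecutive failures, then allow calls again once a cooldown has passed) can be sketched with the standard library alone. The constant values and the `CircuitBreaker` type below are hypothetical; the real [CIRCUIT_BREAKER_THRESHOLD] and [CIRCUIT_BREAKER_COOLDOWN] values are not shown on this page, and resetting the failure count when the cooldown elapses is an assumption.

```rust
use std::time::{Duration, Instant};

// Hypothetical values chosen for illustration only.
const THRESHOLD: u32 = 3;
const COOLDOWN: Duration = Duration::from_secs(30);

#[derive(Default)]
struct CircuitBreaker {
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    /// Returns true when a call may proceed at time `now`.
    fn allow(&mut self, now: Instant) -> bool {
        match self.opened_at {
            // Still inside the cooldown window: fast-fail.
            Some(opened) if now.duration_since(opened) < COOLDOWN => false,
            Some(_) => {
                // Cooldown elapsed: close the breaker and let a call through.
                self.opened_at = None;
                self.consecutive_failures = 0;
                true
            }
            None => true,
        }
    }

    /// A success clears the failure streak.
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    /// A failure extends the streak and opens the breaker at the threshold.
    fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= THRESHOLD {
            self.opened_at = Some(now);
        }
    }
}
```

A caller would check `allow` before each request and report the outcome afterwards, so a dead endpoint costs a few timeouts instead of one full timeout per call.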
pub async fn generate_async( &self, prompt: &str, system: Option<&str>, ) -> Result<String>
PERF-9 (v0.7.0 FX-C1) — async variant of Self::generate.
Same circuit-breaker semantics; same wire shape; same error
branches. Use this from any caller already inside a tokio
runtime to avoid the block_on_local bridge.
§Errors
Returns an error when the circuit breaker is open, the
governance NetworkRequest gate refuses the outbound, the HTTP
send fails, the response is non-2xx, the response body is not
valid JSON, or the JSON is missing the expected
message.content (Ollama) / choices[0].message.content
(OpenAI-compatible) field.
pub fn expand_query(&self, query: &str) -> Result<Vec<String>>
Uses the LLM to expand a search query into additional search terms.
pub async fn expand_query_async(&self, query: &str) -> Result<Vec<String>>
PERF-9 (v0.7.0 FX-C1) — async variant of Self::expand_query.
§Errors
Propagates any error from the underlying Self::generate_async
call (circuit-breaker open, governance refusal, HTTP failure,
malformed response, etc.).
pub fn summarize_memories( &self, memories: &[(String, String)], ) -> Result<String>
Takes (title, content) pairs and returns a consolidated summary.
pub async fn summarize_memories_async( &self, memories: &[(String, String)], ) -> Result<String>
PERF-9 (v0.7.0 FX-C1) — async variant of Self::summarize_memories.
§Errors
Propagates any error from the underlying Self::generate_async
call.
pub fn auto_tag( &self, title: &str, content: &str, model_override: Option<&str>, ) -> Result<Vec<String>>
Generate up to 8 lowercase semantic tags for a memory.
model_override (L15): when Some, uses that model instead of self.model.
auto_tag is a short structured-output task; using gemma3:4b (12 tokens
avg) is dramatically faster than Gemma 4 with its 400+ token thinking
output. See bench data in docs/plan-c-cert.md.
num_predict is hard-capped at 64 tokens regardless of model — defense
in depth against unbounded chain-of-thought emissions on any model.
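The "up to 8 lowercase tags" contract can be illustrated with a small post-processing step. This sketch assumes the model reply is a comma- or newline-separated list and that duplicates are dropped; neither detail is stated above, and `normalise_tags` is a hypothetical helper name.

```rust
/// Upper bound stated in the method description.
const MAX_TAGS: usize = 8;

/// Turns a raw model reply into at most `MAX_TAGS` lowercase tags.
/// Assumption: the reply separates tags with commas or newlines.
fn normalise_tags(raw: &str) -> Vec<String> {
    let mut tags: Vec<String> = Vec::new();
    for piece in raw.split(|c: char| c == ',' || c == '\n') {
        let tag = piece.trim().to_lowercase();
        // Skip blanks and repeats (deduplication is an assumption).
        if tag.is_empty() || tags.contains(&tag) {
            continue;
        }
        tags.push(tag);
        if tags.len() == MAX_TAGS {
            break;
        }
    }
    tags
}
```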
pub async fn auto_tag_async( &self, title: &str, content: &str, model_override: Option<&str>, ) -> Result<Vec<String>>
PERF-9 (v0.7.0 FX-C1) — async variant of Self::auto_tag.
§Errors
Propagates any error from the underlying
Self::generate_with_model_override_async call.
pub async fn generate_with_model_override_async( &self, prompt: &str, system: Option<&str>, model_override: Option<&str>, ) -> Result<String>
PERF-9 (v0.7.0 FX-C1) — async variant of
Self::generate_with_model_override. Same wire shape, same
breaker semantics; no thread blocked.
§Errors
Same as Self::generate_async.
pub fn embed_text(&self, text: &str, embed_model: &str) -> Result<Vec<f32>>
Generate an embedding vector via Ollama’s /api/embed endpoint.
Used for nomic-embed-text-v1.5 on smart/autonomous tiers.
v0.7.0 F6 — like OllamaClient::generate, this call is guarded
by the same circuit breaker so a dead ollama endpoint doesn’t
block every store/recall path on a per-call timeout.
pub async fn embed_text_async( &self, text: &str, embed_model: &str, ) -> Result<Vec<f32>>
PERF-9 (v0.7.0 FX-C1) — async variant of Self::embed_text.
Production callers (HTTP handlers, daemon) should prefer this
over the sync wrapper.
§Errors
Returns an error when the circuit breaker is open, the
governance gate refuses the outbound, the HTTP send fails, the
response is non-2xx, the body is not valid JSON, the
expected embeddings[0] (Ollama) /
data[0].embedding (OpenAI-compatible) field is missing, or
the parsed embedding vector is empty.
pub fn embed_texts( &self, texts: &[&str], embed_model: &str, ) -> Result<Vec<Vec<f32>>>
#1603 — generate embeddings for MANY texts, batching the wire
where the provider supports it. Sync wrapper over
Self::embed_texts_async (same block_on_local discipline as
Self::embed_text).
§Errors
Propagates the first per-request error (see
Self::embed_texts_async).
pub async fn embed_texts_async( &self, texts: &[&str], embed_model: &str, ) -> Result<Vec<Vec<f32>>>
#1603 — async batched embed. Provider behaviour:
- OpenAI-compatible: the /embeddings wire shape natively accepts "input": [array of strings], so the inputs are sent in sub-batches of at most [EMBED_BATCH_MAX_INPUTS] texts / [EMBED_BATCH_MAX_BYTES] total bytes per request (one POST per sub-batch instead of one POST per text; the pre-#1603 per-row loop drained an API-backed backfill at ~20 rows/min). On a batch-level error the sub-batch falls back to per-text requests so one rejected input (e.g. an over-context row the vendor 4xxes) cannot poison its whole sub-batch, the same isolation posture as the #1595 backfill fallback.
- Ollama (native): per-text loop preserved verbatim. The batched /api/embed wire shape differs across the pinned Ollama versions (the PERF-5 deferral), so batching is staged behind the OpenAI-compatible arm only.
Output order matches input order. The OpenAI-compatible parse
honours the response data[*].index field when present
(providers may reorder) and falls back to positional order.
§Errors
Returns an error when the circuit breaker is open, the
governance gate refuses the outbound, a request fails after the
per-text fallback, the response shape is missing
data[*].embedding, or the vector count does not match the
input count.
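The sub-batching and order-restoring steps on the OpenAI-compatible arm can be sketched as two pure functions. The limits below are hypothetical stand-ins for [EMBED_BATCH_MAX_INPUTS] and [EMBED_BATCH_MAX_BYTES], whose real values are not shown on this page; `sub_batches` and `reorder_by_index` are illustrative names, and giving an oversized text a sub-batch of its own is an assumption.

```rust
// Hypothetical limits chosen small enough to make the behaviour visible.
const MAX_INPUTS: usize = 4;
const MAX_BYTES: usize = 32;

/// Splits `texts` into consecutive sub-batches holding at most `MAX_INPUTS`
/// texts and at most `MAX_BYTES` total bytes. A single text larger than
/// `MAX_BYTES` ends up in a sub-batch of its own.
fn sub_batches<'a>(texts: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut batches = Vec::new();
    let mut current: Vec<&'a str> = Vec::new();
    let mut current_bytes = 0usize;
    for &text in texts {
        let would_overflow =
            current.len() == MAX_INPUTS || current_bytes + text.len() > MAX_BYTES;
        if !current.is_empty() && would_overflow {
            // Flush the full sub-batch and start a new one.
            batches.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current_bytes += text.len();
        current.push(text);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}

/// Restores input order from `(index, embedding)` pairs, as when a provider
/// returns `data[*].index` out of order. Returns `None` if an index is out of
/// range or duplicated, or if the vector count does not match `expected`.
fn reorder_by_index(
    pairs: Vec<(usize, Vec<f32>)>,
    expected: usize,
) -> Option<Vec<Vec<f32>>> {
    let mut slots: Vec<Option<Vec<f32>>> = vec![None; expected];
    for (index, embedding) in pairs {
        let slot = slots.get_mut(index)?;
        if slot.is_some() {
            return None;
        }
        *slot = Some(embedding);
    }
    // Any slot left empty makes the whole result `None`.
    slots.into_iter().collect()
}
```

Each sub-batch would map to one POST; on a batch-level error the caller would retry that sub-batch one text at a time, which is not shown here.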
pub fn ensure_embed_model(&self, model: &str) -> Result<()>
Ensure an embedding model is available.
- Ollama: lists /api/tags, pulls via /api/pull if missing.
- OpenAI-compatible: no-op. This is a vendor-side concern (operator confirms model availability on their plan).
pub async fn ensure_embed_model_async(&self, model: &str) -> Result<()>
PERF-9 (v0.7.0 FX-C1) — async variant of Self::ensure_embed_model.
§Errors
Returns an error if the /api/tags listing fails, the JSON
parse fails, the pull client cannot be built, or the
/api/pull request fails (network or non-2xx response).
pub fn detect_contradiction(&self, mem_a: &str, mem_b: &str) -> Result<bool>
Returns true if two memory contents contradict each other.
pub async fn detect_contradiction_async( &self, mem_a: &str, mem_b: &str, ) -> Result<bool>
PERF-9 (v0.7.0 FX-C1) — async variant of
Self::detect_contradiction.
§Errors
Propagates any error from the underlying Self::generate_async
call.
Trait Implementations§
impl AutonomyLlm for OllamaClient
impl LlmGenerate for OllamaClient
Auto Trait Implementations§
impl !Freeze for OllamaClient
impl !RefUnwindSafe for OllamaClient
impl !UnwindSafe for OllamaClient
impl Send for OllamaClient
impl Sync for OllamaClient
impl Unpin for OllamaClient
impl UnsafeUnpin for OllamaClient
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> ErasedDestructor for T where T: 'static
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.