Struct OllamaClient 

Source
pub struct OllamaClient { /* private fields */ }

Implementations§

Source§

impl OllamaClient

Source

pub fn model_name(&self) -> &str

v0.7.0 (issue #1244) — accessor for the resolved model name.

Returns the model identifier the client was constructed with (e.g. gemma3:4b on Ollama, grok-4.3 on xAI, claude-opus-4.7 on Anthropic). Substrate sites that bind LLM provenance into signed audit events (e.g. the atomisation_complete curator_model payload field) read this verbatim — never a hardcoded string — so the signed event reflects the model that actually ran on a given deployment, not a v0.6.x-era default.

Source

pub fn new(model: &str) -> Result<Self>

Creates a new OllamaClient with the default Ollama URL (http://localhost:11434). Checks that Ollama is reachable before returning.

Source

pub fn from_env() -> Result<Option<Self>>

#1066 — Construct from environment variables. Returns Ok(Some(client)) when the env declares an LLM backend; Ok(None) when no backend is configured (keyword-only deployments); Err on misconfiguration (e.g. backend declared but required key missing).

Reads:

  • AI_MEMORY_LLM_BACKEND: ollama (default) | openai-compatible | one of the per-vendor aliases (xai, openai, anthropic, gemini, deepseek, kimi, qwen, mistral, groq, together, cerebras, openrouter, fireworks, lmstudio).
  • AI_MEMORY_LLM_BASE_URL — overrides the default per-alias URL.
  • AI_MEMORY_LLM_API_KEY — Bearer auth secret for the OpenAI-compatible path. Per-alias fallback env vars are also consulted (XAI_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, DEEPSEEK_API_KEY, MOONSHOT_API_KEY, DASHSCOPE_API_KEY, etc.).
  • AI_MEMORY_LLM_MODEL — model name (grok-4, gpt-5, claude-opus-4.7, gemini-2.0-flash, deepseek-chat, etc.).
  • Legacy OLLAMA_BASE_URL is still honored when backend is ollama (or unset).

§Errors

Returns an error when any of the following holds:

  • AI_MEMORY_LLM_BACKEND is set to an unknown alias.
  • Backend is OpenAI-compatible (or an alias) but no API key is resolvable from AI_MEMORY_LLM_API_KEY or any per-alias fallback env var.
  • Backend is the generic openai-compatible and AI_MEMORY_LLM_BASE_URL is unset.
  • The HTTP client itself fails to build.
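
The key lookup described above can be sketched as follows. This is a minimal, standalone illustration and not the crate's implementation: `fallback_key_var` and `resolve_api_key` are hypothetical names, the alias-to-variable pairing (for example kimi to MOONSHOT_API_KEY) is an assumption inferred from the variable names listed above, and treating a blank value as unset is also an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical helper: maps a vendor alias to its fallback key variable.
/// The variable names come from the list above; the pairing is assumed.
fn fallback_key_var(alias: &str) -> Option<&'static str> {
    match alias {
        "xai" => Some("XAI_API_KEY"),
        "openai" => Some("OPENAI_API_KEY"),
        "anthropic" => Some("ANTHROPIC_API_KEY"),
        "gemini" => Some("GEMINI_API_KEY"),
        "deepseek" => Some("DEEPSEEK_API_KEY"),
        "kimi" => Some("MOONSHOT_API_KEY"),
        "qwen" => Some("DASHSCOPE_API_KEY"),
        _ => None,
    }
}

/// Resolves an API key: the generic variable is tried first, then the
/// per-alias fallback. `env` stands in for the process environment.
fn resolve_api_key(alias: &str, env: &HashMap<String, String>) -> Result<String, String> {
    // Assumption: a blank value counts as "not set".
    let non_empty = |name: &str| env.get(name).filter(|v| !v.trim().is_empty()).cloned();
    non_empty("AI_MEMORY_LLM_API_KEY")
        .or_else(|| fallback_key_var(alias).and_then(|name| non_empty(name)))
        .ok_or_else(|| format!("no API key resolvable for backend `{}`", alias))
}
```

Passing the environment in as a map keeps the lookup order testable without mutating process-wide state.
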
Source

pub fn build_for_init( legacy_url: &str, legacy_model: &str, ) -> Result<Option<Self>>

#1143 — Sync env-aware client construction with a tier-default legacy fallback. Centralises the pattern that #1142 ported into src/mcp/mod.rs so every synchronous LLM-init site (CLI atomise, CLI curator, MCP stdio LLM init, embed-client fallback selection) routes through one place. The daemon’s async path (daemon_runtime::build_llm_client) wraps the same resolution order in tokio::task::spawn_blocking; behavioural parity with that wrapper is pinned by tests below.

Resolution order:

  1. AI_MEMORY_LLM_BACKEND set + non-empty → from_env().
  2. Else → new_with_url(legacy_url, legacy_model) so a v0.6.x operator who never set the env vars keeps the historical tier-default Ollama path.

Returns Ok(None) from the env-aware arm only when the env var chain resolves to a no-op (currently impossible for any recognised backend alias; defensively threaded so future “alias disabled” branches don’t break callers).
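
The two-step resolution order can be pictured with a small standalone sketch. `InitArm` and `choose_init_arm` are illustrative names that do not exist in the crate, and whether a whitespace-only value counts as empty is not stated above, so this sketch only treats the empty string as unset.

```rust
/// Which construction arm a sync init site would take (illustrative type).
#[derive(Debug, PartialEq)]
enum InitArm {
    /// AI_MEMORY_LLM_BACKEND is set and non-empty: use the env-aware path.
    FromEnv,
    /// Otherwise: fall back to the caller-supplied legacy URL and model.
    Legacy { url: String, model: String },
}

/// Picks the arm from the (possibly absent) backend variable value.
fn choose_init_arm(backend_env: Option<&str>, legacy_url: &str, legacy_model: &str) -> InitArm {
    match backend_env {
        Some(value) if !value.is_empty() => InitArm::FromEnv,
        _ => InitArm::Legacy {
            url: legacy_url.to_string(),
            model: legacy_model.to_string(),
        },
    }
}
```
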

§Errors

Mirrors Self::from_env when the env arm is taken, and Self::new_with_url when the legacy arm is taken.

Source

pub fn build_from_resolved(resolved: &ResolvedLlm) -> Result<Option<Self>>

v0.7.x (#1146) — Construct an OllamaClient from a fully-resolved LLM configuration produced by crate::config::AppConfig::resolve_llm. This is the single entry point that replaces every call to Self::build_for_init / Self::new_with_url / Self::from_env / Self::new_openai_compatible in the surface plumbing.

The resolver has already done all precedence + provenance work (CLI flag > env > [llm] config section > legacy fields > compiled default) and produced a [ResolvedLlm] carrying the authoritative (backend, model, base_url, api_key) quad. This constructor just maps it onto the appropriate wire-shape client.
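
As a rough illustration of that precedence chain, the sketch below resolves a single setting and records where it came from. It is not the crate's resolver: `Provenance`, `Candidates`, and `resolve` are hypothetical names, and the real resolver produces the full quad rather than one value.

```rust
/// Where a resolved value came from, highest precedence first.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Provenance {
    CliFlag,
    Env,
    ConfigSection,
    LegacyField,
    CompiledDefault,
}

/// Candidate values for one setting; every field name is hypothetical.
#[derive(Default)]
struct Candidates<'a> {
    cli_flag: Option<&'a str>,
    env: Option<&'a str>,
    config_section: Option<&'a str>,
    legacy_field: Option<&'a str>,
}

/// Returns the first present value in precedence order, with its source.
fn resolve<'a>(c: &Candidates<'a>, compiled_default: &'a str) -> (&'a str, Provenance) {
    let layers = [
        (c.cli_flag, Provenance::CliFlag),
        (c.env, Provenance::Env),
        (c.config_section, Provenance::ConfigSection),
        (c.legacy_field, Provenance::LegacyField),
    ];
    layers
        .iter()
        .copied()
        .find_map(|(value, source)| value.map(|v| (v, source)))
        .unwrap_or((compiled_default, Provenance::CompiledDefault))
}
```

Carrying the provenance alongside the value is what lets a diagnostic command report which layer supplied a setting.
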

Returns Ok(None) when the resolved api_key_source is KeySource::Error(_) and the backend is non-Ollama (so we can’t even attempt to construct an OpenAI-compatible client). The error surfaces through the ai-memory doctor LLM reachability probe rather than panicking at construct time.

§Errors

Returns an error if the HTTP client itself fails to build, or if the Ollama-backend reachability check fails the same way Self::new_with_url already fails.

Source

pub async fn build_from_resolved_async( resolved: &ResolvedLlm, ) -> Result<Option<Self>>

FX-D1 (v0.7.0, 2026-05-27) — async sibling of Self::build_from_resolved. Surgical fix for the daemon_runtime::build_llm_client callsite that hit the FX-C1 block_on_local current-thread panic: the daemon wrapped the sync constructor in tokio::task::spawn_blocking, and the blocking pool thread inherited the outer runtime handle (current-thread, under #[tokio::test]), which drove block_on_local into its panic arm.

Callers already on a tokio runtime — the daemon’s build_llm_client, mcp/mod.rs::run_mcp_server once it migrates, and CLI atomise/curator builders — should call this directly to bypass the sync→async bridge entirely. The Ollama arm now goes through Self::new_with_url_async (no block_on_local); the non-Ollama arm uses Self::new_openai_compatible which is already pure-sync (no I/O — just a reqwest::Client::builder).

§Errors

Same conditions as Self::build_from_resolved: Ollama reachability failure, missing API key for a non-Ollama backend, or HTTP client build failure.

Source

pub fn is_ollama_native(&self) -> bool

#1143 — Wire-shape introspection for embed-client fallback. Embed endpoints differ from chat endpoints across vendors: only Ollama (and a couple of OpenAI-compatible vendors) expose a usable embedding wire-shape, and the substrate’s local embedder integration only speaks the Ollama /api/embed shape. Callers that consider re-using the LLM client for embeddings use this to bail out when the client is an OpenAI-compatible vendor.
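
A caller-side guard built on this check might look like the sketch below. It is a toy model under stated assumptions: `WireShape`, `LlmClient`, and `embed_fallback` are stand-ins invented for illustration, and only the is_ollama_native method name mirrors the real API.

```rust
/// Toy stand-in for the two wire shapes a client can speak.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WireShape {
    OllamaNative,
    OpenAiCompatible,
}

/// Toy stand-in for the real client; only the shape matters here.
struct LlmClient {
    shape: WireShape,
}

impl LlmClient {
    fn is_ollama_native(&self) -> bool {
        self.shape == WireShape::OllamaNative
    }
}

/// Offers the LLM client for embedding reuse only when it speaks the
/// Ollama shape; any other client makes the caller bail out with None.
fn embed_fallback(client: &LlmClient) -> Option<&LlmClient> {
    if client.is_ollama_native() {
        Some(client)
    } else {
        None
    }
}
```
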

Source

pub fn new_openai_compatible( base_url: &str, model: &str, api_key: &str, ) -> Result<Self>

#1066 — Construct an OpenAI-compatible client for any vendor whose /v1/chat/completions endpoint follows the OpenAI spec (xAI Grok, OpenAI, Anthropic via OpenAI shim, Google Gemini, DeepSeek, Kimi, Qwen, Mistral, Groq, Together, Cerebras, OpenRouter, Fireworks, LMStudio, vLLM, llama.cpp server, …).

§Errors

Returns an error if the HTTP client fails to build.

Source

pub fn with_embed_dimensions(self, dims: Option<u32>) -> Self

#1598 (fleet follow-up) — builder for the requested embedding output dimensionality (see the embed_dimensions field doc). None clears the request (model-native dim).
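
The signature follows the consuming-builder pattern: the method takes self by value and returns it. A minimal standalone sketch of that pattern, using a hypothetical `EmbedConfig` type rather than the real client:

```rust
/// Hypothetical stand-in carrying only the field the builder touches.
#[derive(Debug, Default, PartialEq)]
struct EmbedConfig {
    embed_dimensions: Option<u32>,
}

impl EmbedConfig {
    /// Consumes the value, sets or clears the requested dimensionality,
    /// and hands the value back so calls can be chained.
    fn with_embed_dimensions(mut self, dims: Option<u32>) -> Self {
        self.embed_dimensions = dims;
        self
    }
}
```

Chaining `.with_embed_dimensions(Some(256))` requests 256-dimensional output, and a later `.with_embed_dimensions(None)` clears the request again.
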

Source

pub fn new_with_url(base_url: &str, model: &str) -> Result<Self>

Creates a new OllamaClient with a custom base URL. Checks that Ollama is reachable before returning.

v0.7.0 F6: the underlying reqwest client now carries an explicit connect_timeout so a dead endpoint fails in [CONNECT_TIMEOUT] instead of hanging on the kernel SYN retry budget. The per-request timeout is preserved at [GENERATE_TIMEOUT].

PERF-9 (v0.7.0 FX-C1, 2026-05-26). Sync wrapper around Self::new_with_url_async via the block_on_local helper. Callers already on a tokio runtime should prefer the async constructor directly.

PERF-12 (FX-C4-batch2, 2026-05-26). This constructor still performs the /api/tags Ollama health probe at construction time, preserving the v0.6.x fail-fast posture for callers that depend on construction-time validation (e.g. CLI commands). Boot-fast daemon paths that want to defer reachability verification to first-use should use Self::new_with_url_no_health_check instead.

Source

pub async fn new_with_url_async(base_url: &str, model: &str) -> Result<Self>

PERF-9 (v0.7.0 FX-C1) — async constructor variant. Builds the async reqwest::Client and probes /api/tags (Ollama health) without blocking the calling thread. Callers inside a tokio runtime (HTTP handler, daemon path, MCP stdio loop once it adopts a tokio bridge) should call this directly.

Source

pub fn new_with_url_no_health_check(base_url: &str, model: &str) -> Result<Self>

PERF-12 (FX-C4-batch2, 2026-05-26) — construct an OllamaClient WITHOUT the synchronous /api/tags health check.

Boot-fast variant for daemon paths that want to defer reachability verification to first-use (or to the ai-memory doctor reachability sweep). Saves the 50-200 ms round-trip to a remote LLM endpoint on every serve boot and on every ai-memory mcp dispatch. The circuit-breaker at Self::generate still handles transient failures the usual way, so a degraded LLM endpoint is contained at first use rather than at construction.

Use Self::new_with_url when caller-side construction-time validation is required (e.g. CLI commands that fail fast on bring-up).

Source

pub fn is_available(&self) -> bool

Quick health check — returns true if the backend responds 2xx.

  • Ollama: GET /api/tags (lists pulled models)
  • OpenAI-compatible: GET /v1/models with Bearer auth (most vendors support this endpoint)

Strict semantics: 4xx and 5xx return false. A vendor that returns 401 on bad auth is treated as “not available” because we cannot use it. The circuit-breaker in Self::generate handles transient 5xx burst behavior separately. Matches the pre-#1067 contract pinned by wiremock_tests::test_is_available_returns_false_on_500_response.
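
The strict rule reduces to a single range check on the status code. The sketch below is illustrative only: `ProbeOutcome` is a hypothetical type, and mapping a transport failure to false is an assumption that follows from "returns true if the backend responds 2xx".

```rust
/// Outcome of a health probe: an HTTP status, or no response at all.
enum ProbeOutcome {
    Status(u16),
    TransportError,
}

/// Only a 2xx status counts as available; 401, 404, 500 and friends do not.
fn is_available(outcome: &ProbeOutcome) -> bool {
    match outcome {
        ProbeOutcome::Status(code) => (200..300).contains(code),
        // Assumption: a probe that never got a response is "not available".
        ProbeOutcome::TransportError => false,
    }
}
```
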

PERF-9 (v0.7.0 FX-C1) — sync wrapper around Self::is_available_async. The async variant should be preferred by every callsite already on a tokio runtime.

Source

pub async fn is_available_async(&self) -> bool

PERF-9 (v0.7.0 FX-C1) — async variant of Self::is_available. Same semantics; no thread blocked.

Source

pub fn ensure_model(&self) -> Result<()>

Ensure the configured model is available.

  • Ollama: lists /api/tags, pulls via /api/pull if missing.
  • OpenAI-compatible: no-op — model availability is the vendor’s concern (operator is responsible for confirming the model exists on the chosen vendor’s plan).
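
The Ollama arm is a list-then-pull sequence. The sketch below shows that control flow with the two HTTP calls injected as closures, so it runs without a server. The function name, the closure parameters, and the boolean return value are all hypothetical, and exact string matching on the model name is an assumption.

```rust
/// Lists the available models and pulls `model` only when it is absent.
/// Returns Ok(true) when a pull was issued and Ok(false) when it was not.
fn ensure_model<L, P>(model: &str, list_models: L, mut pull_model: P) -> Result<bool, String>
where
    L: FnOnce() -> Result<Vec<String>, String>,
    P: FnMut(&str) -> Result<(), String>,
{
    // Stands in for the /api/tags listing; an error here aborts early.
    let present = list_models()?;
    if present.iter().any(|name| name == model) {
        return Ok(false);
    }
    // Stands in for the /api/pull request.
    pull_model(model)?;
    Ok(true)
}
```
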

PERF-9 (v0.7.0 FX-C1) — sync wrapper around Self::ensure_model_async.

Source

pub async fn ensure_model_async(&self) -> Result<()>

PERF-9 (v0.7.0 FX-C1) — async variant of Self::ensure_model.

§Errors

Returns an error if the /api/tags listing fails, the response JSON cannot be parsed, the pull-client cannot be built, or the pull request fails.

Source

pub fn generate(&self, prompt: &str, system: Option<&str>) -> Result<String>

Generates a completion using the /api/chat endpoint (Ollama chat format). This is compatible with both Ollama and vMLX/OpenAI-compatible servers. Returns the response text.

v0.7.0 F6 — the call is guarded by a circuit breaker. After [CIRCUIT_BREAKER_THRESHOLD] consecutive failures the call fast-fails for [CIRCUIT_BREAKER_COOLDOWN] instead of waiting the full HTTP timeout each time. This is the key defence against the Round-2 F6 deadlock where a dead Ollama endpoint caused every chat-backed MCP tool to hang the daemon for 30s+.
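
A consecutive-failure breaker of this kind can be sketched in a few lines. This is a generic illustration, not the crate's breaker: the type and method names are hypothetical, the threshold and cooldown are plain constructor arguments instead of the crate constants, and fully resetting the failure count once the cooldown elapses is a simplifying assumption.

```rust
use std::time::{Duration, Instant};

/// Opens after `threshold` consecutive failures and rejects calls
/// until `cooldown` has elapsed since the breaker opened.
struct CircuitBreaker {
    threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { threshold, cooldown, consecutive_failures: 0, opened_at: None }
    }

    /// Returns true when a call may proceed at time `now`.
    fn allow(&mut self, now: Instant) -> bool {
        match self.opened_at {
            // Still cooling down: fail fast without touching the network.
            Some(opened) if now.duration_since(opened) < self.cooldown => false,
            Some(_) => {
                // Cooldown elapsed: close the breaker and let a call through.
                self.opened_at = None;
                self.consecutive_failures = 0;
                true
            }
            None => true,
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.threshold {
            self.opened_at = Some(now);
        }
    }
}
```

Taking `now` as a parameter instead of calling Instant::now() internally keeps the timing logic deterministic under test.
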

PERF-9 (v0.7.0 FX-C1, 2026-05-26) — sync wrapper around Self::generate_async. Callers already inside a tokio runtime (HTTP handlers, the daemon path) should prefer the async variant directly to skip the bridge overhead.

Source

pub async fn generate_async( &self, prompt: &str, system: Option<&str>, ) -> Result<String>

PERF-9 (v0.7.0 FX-C1) — async variant of Self::generate. Same circuit-breaker semantics; same wire shape; same error branches. Use this from any caller already inside a tokio runtime to avoid the block_on_local bridge.

§Errors

Returns an error when the circuit breaker is open, the governance NetworkRequest gate refuses the outbound, the HTTP send fails, the response is non-2xx, the response body is not valid JSON, or the JSON is missing the expected message.content (Ollama) / choices[0].message.content (OpenAI-compatible) field.

Source

pub fn expand_query(&self, query: &str) -> Result<Vec<String>>

Uses the LLM to expand a search query into additional search terms.

Source

pub async fn expand_query_async(&self, query: &str) -> Result<Vec<String>>

PERF-9 (v0.7.0 FX-C1) — async variant of Self::expand_query.

§Errors

Propagates any error from the underlying Self::generate_async call (circuit-breaker open, governance refusal, HTTP failure, malformed response, etc.).

Source

pub fn summarize_memories( &self, memories: &[(String, String)], ) -> Result<String>

Takes (title, content) pairs and returns a consolidated summary.

Source

pub async fn summarize_memories_async( &self, memories: &[(String, String)], ) -> Result<String>

PERF-9 (v0.7.0 FX-C1) — async variant of Self::summarize_memories.

§Errors

Propagates any error from the underlying Self::generate_async call.

Source

pub fn auto_tag( &self, title: &str, content: &str, model_override: Option<&str>, ) -> Result<Vec<String>>

Generate up to 8 lowercase semantic tags for a memory.

model_override (L15): when Some, uses that model instead of self.model. auto_tag is a short structured-output task; using gemma3:4b (12 tokens avg) is dramatically faster than Gemma 4 with its 400+ token thinking output. See bench data in docs/plan-c-cert.md.

num_predict is hard-capped at 64 tokens regardless of model — defense in depth against unbounded chain-of-thought emissions on any model.
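
The "up to 8 lowercase tags" contract implies some post-processing of the model reply. The sketch below shows one plausible normalisation. It assumes the reply is a comma- or newline-separated list, which the documentation above does not confirm, and `normalize_tags` and `MAX_TAGS` are hypothetical names.

```rust
/// Upper bound on returned tags, matching the "up to 8" contract above.
const MAX_TAGS: usize = 8;

/// Splits a raw reply into trimmed, lowercased, de-duplicated tags and
/// stops once MAX_TAGS tags have been collected.
fn normalize_tags(raw: &str) -> Vec<String> {
    let mut tags: Vec<String> = Vec::new();
    // Assumption: tags arrive separated by commas or newlines.
    for piece in raw.split(|c: char| c == ',' || c == '\n') {
        let tag = piece.trim().to_lowercase();
        if tag.is_empty() || tags.contains(&tag) {
            continue;
        }
        tags.push(tag);
        if tags.len() == MAX_TAGS {
            break;
        }
    }
    tags
}
```
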

Source

pub async fn auto_tag_async( &self, title: &str, content: &str, model_override: Option<&str>, ) -> Result<Vec<String>>

PERF-9 (v0.7.0 FX-C1) — async variant of Self::auto_tag.

§Errors

Propagates any error from the underlying Self::generate_with_model_override_async call.

Source

pub async fn generate_with_model_override_async( &self, prompt: &str, system: Option<&str>, model_override: Option<&str>, ) -> Result<String>

PERF-9 (v0.7.0 FX-C1) — async variant of Self::generate_with_model_override. Same wire shape, same breaker semantics; no thread blocked.

§Errors

Same as Self::generate_async.

Source

pub fn embed_text(&self, text: &str, embed_model: &str) -> Result<Vec<f32>>

Generate an embedding vector via Ollama’s /api/embed endpoint.

Used for nomic-embed-text-v1.5 on smart/autonomous tiers.

v0.7.0 F6 — like OllamaClient::generate, this call is guarded by the same circuit breaker so a dead Ollama endpoint doesn’t block every store/recall path on a per-call timeout.

Source

pub async fn embed_text_async( &self, text: &str, embed_model: &str, ) -> Result<Vec<f32>>

PERF-9 (v0.7.0 FX-C1) — async variant of Self::embed_text. Production callers (HTTP handlers, daemon) should prefer this over the sync wrapper.

§Errors

Returns an error when the circuit breaker is open, the governance gate refuses the outbound, the HTTP send fails, the response is non-2xx, the body is not valid JSON, the expected embeddings[0] (Ollama) / data[0].embedding (OpenAI-compatible) field is missing, or the parsed embedding vector is empty.

Source

pub fn embed_texts( &self, texts: &[&str], embed_model: &str, ) -> Result<Vec<Vec<f32>>>

#1603 — generate embeddings for MANY texts, batching the wire where the provider supports it. Sync wrapper over Self::embed_texts_async (same block_on_local discipline as Self::embed_text).

§Errors

Propagates the first per-request error (see Self::embed_texts_async).

Source

pub async fn embed_texts_async( &self, texts: &[&str], embed_model: &str, ) -> Result<Vec<Vec<f32>>>

#1603 — async batched embed. Provider behaviour:

  • OpenAI-compatible — the /embeddings wire shape natively accepts "input": [array of strings], so the inputs are sent in sub-batches of at most [EMBED_BATCH_MAX_INPUTS] texts / [EMBED_BATCH_MAX_BYTES] total bytes per request (one POST per sub-batch instead of one POST per text — the pre-#1603 per-row loop drained an API-backed backfill at ~20 rows/min). On a batch-level error the sub-batch falls back to per-text requests so one rejected input (e.g. an over-context row the vendor 4xxes) cannot poison its whole sub-batch — the same isolation posture as the #1595 backfill fallback.
  • Ollama (native) — per-text loop preserved verbatim: the batched /api/embed wire shape differs across the pinned Ollama versions (the PERF-5 deferral), so batching is staged behind the OpenAI-compatible arm only.
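
The sub-batching on the OpenAI-compatible arm can be sketched as a greedy planner over the input slice. This is an illustration under assumptions, not the crate's code: `plan_sub_batches` is a hypothetical name, the two limits are plain parameters instead of the crate constants, and giving an oversized single text its own batch is an assumed policy.

```rust
use std::ops::Range;

/// Splits `texts` into index ranges so that each range holds at most
/// `max_inputs` texts and at most `max_bytes` total bytes.
/// A single text larger than `max_bytes` still gets a range of its own.
fn plan_sub_batches(texts: &[&str], max_inputs: usize, max_bytes: usize) -> Vec<Range<usize>> {
    let mut batches = Vec::new();
    let mut start = 0; // first index of the batch being filled
    let mut bytes = 0; // bytes accumulated in the batch being filled
    for (i, text) in texts.iter().enumerate() {
        let count = i - start;
        let would_overflow = count >= max_inputs || bytes + text.len() > max_bytes;
        // Close the current batch before adding a text that would overflow it.
        if count > 0 && would_overflow {
            batches.push(start..i);
            start = i;
            bytes = 0;
        }
        bytes += text.len();
    }
    if start < texts.len() {
        batches.push(start..texts.len());
    }
    batches
}
```

Each returned range maps to one POST, and concatenating the per-range results in order preserves the input order.
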

Output order matches input order. The OpenAI-compatible parse honours the response data[*].index field when present (providers may reorder) and falls back to positional order.
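
The index-aware ordering can be sketched without any JSON machinery by starting from already-parsed rows. `Row` and `order_rows` are hypothetical names, and rejecting duplicate or out-of-range indices is an assumed policy rather than documented behaviour.

```rust
/// One parsed response row: the optional index field plus its vector.
struct Row {
    index: Option<usize>,
    embedding: Vec<f32>,
}

/// Places each vector at its reported index, or at its arrival position
/// when no index is present, and checks the count against the input count.
fn order_rows(rows: Vec<Row>, expected: usize) -> Result<Vec<Vec<f32>>, String> {
    if rows.len() != expected {
        return Err(format!("expected {} vectors, got {}", expected, rows.len()));
    }
    let mut slots: Vec<Option<Vec<f32>>> = vec![None; expected];
    for (position, row) in rows.into_iter().enumerate() {
        let target = row.index.unwrap_or(position);
        match slots.get_mut(target) {
            Some(slot) if slot.is_none() => *slot = Some(row.embedding),
            _ => return Err(format!("index {} is out of range or duplicated", target)),
        }
    }
    // Every slot is filled here: `expected` rows went into distinct slots.
    Ok(slots.into_iter().flatten().collect())
}
```
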

§Errors

Returns an error when the circuit breaker is open, the governance gate refuses the outbound, a request fails after the per-text fallback, the response shape is missing data[*].embedding, or the vector count does not match the input count.

Source

pub fn ensure_embed_model(&self, model: &str) -> Result<()>

Ensure an embedding model is available.

  • Ollama: lists /api/tags, pulls via /api/pull if missing.
  • OpenAI-compatible: no-op — vendor-side concern (operator confirms model availability on their plan).
Source

pub async fn ensure_embed_model_async(&self, model: &str) -> Result<()>

PERF-9 (v0.7.0 FX-C1) — async variant of Self::ensure_embed_model.

§Errors

Returns an error if the /api/tags listing fails, the JSON parse fails, the pull client cannot be built, or the /api/pull request fails (network or non-2xx response).

Source

pub fn detect_contradiction(&self, mem_a: &str, mem_b: &str) -> Result<bool>

Returns true if two memory contents contradict each other.

Source

pub async fn detect_contradiction_async( &self, mem_a: &str, mem_b: &str, ) -> Result<bool>

PERF-9 (v0.7.0 FX-C1) — async variant of Self::detect_contradiction.

§Errors

Propagates any error from the underlying Self::generate_async call.

Trait Implementations§

Source§

impl AutonomyLlm for OllamaClient

Source§

fn auto_tag(&self, title: &str, content: &str) -> Result<Vec<String>>

Generate tags for a memory.
Source§

fn detect_contradiction(&self, mem_a: &str, mem_b: &str) -> Result<bool>

Return true iff the two pieces of content contradict each other.
Source§

fn summarize_memories(&self, memories: &[(String, String)]) -> Result<String>

Produce a consolidated summary of N memories.
Source§

impl LlmGenerate for OllamaClient

Source§

fn generate( &self, prompt: &str, system: Option<&str>, ) -> Result<String, CuratorError>

Run a single generate cycle. Returns the response body verbatim (no trimming, no fence-stripping — parse_response handles that).

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> ErasedDestructor for T
where T: 'static,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T> PolicyExt for T
where T: ?Sized,

Source§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Sized + Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.
Source§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Sized + Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.