Module inference

Pluggable inference backend trait — issue #651 (RFC pulled forward from v0.8 per operator directive 28860423-d12c-4959-bc8b-8fa9a94a33d9, 2026-05-18).

§Goal

Provide a single trait surface that unifies the substrate’s two current inference paths (embeddings::Embedder for vector embedding, llm::OllamaClient for chat / auto-tag / detect-contradiction) and serves as a forward-compatible hook for the v0.8 GPU / MTP distilled hot-path backend (issues #651 / #654 / Gap #10 of #846).

§Surface

pub trait InferenceBackend: Send + Sync {
    fn embed(&self, text: &str) -> anyhow::Result<Vec<f32>>;
    fn chat(&self, prompt: &str) -> anyhow::Result<String>;
    fn attested_weights(&self) -> Option<AttestedWeights>;
}
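
As an illustration of what conforming to this surface involves, here is a minimal sketch of a deterministic test double. It is not code from this crate: `FixedBackend` is hypothetical, the `AttestedWeights` fields are assumed (this page does not show them), and a boxed `std::error::Error` stands in for `anyhow::Result` so the sketch compiles with the standard library alone.

```rust
use std::error::Error;

// Stand-in for anyhow::Result so the sketch needs no external crates.
type Result<T> = std::result::Result<T, Box<dyn Error + Send + Sync>>;

// Assumed shape: the real AttestedWeights fields are not shown on this page.
#[derive(Debug, Clone, PartialEq)]
pub struct AttestedWeights {
    pub sha256_hex: String,
    pub signature: Option<Vec<u8>>,
}

pub trait InferenceBackend: Send + Sync {
    fn embed(&self, text: &str) -> Result<Vec<f32>>;
    fn chat(&self, prompt: &str) -> Result<String>;
    fn attested_weights(&self) -> Option<AttestedWeights>;
}

// Hypothetical test double: deterministic output, no model loaded.
pub struct FixedBackend {
    pub dims: usize,
}

impl InferenceBackend for FixedBackend {
    fn embed(&self, text: &str) -> Result<Vec<f32>> {
        if text.is_empty() {
            return Err("cannot embed empty text".into());
        }
        // Every dimension carries the byte length of the input.
        Ok(vec![text.len() as f32; self.dims])
    }

    fn chat(&self, prompt: &str) -> Result<String> {
        Ok(format!("echo: {prompt}"))
    }

    fn attested_weights(&self) -> Option<AttestedWeights> {
        // No weight file backs this double, so there is nothing to attest.
        None
    }
}
```

Because the trait requires `Send + Sync`, a type like this can be shared across threads behind an `Arc` without extra bounds at the call site.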

§Backends shipped at v0.7.0

  • CpuBackend: wraps the existing CPU pipeline (embeddings::Embedder + llm::OllamaClient). The wrapped pipeline is what v0.7.0 actually uses on the recall hot-path.
  • GpuBackend: a stub that returns a "not implemented" error from every call. It lands as a trait-conformant placeholder so the v0.8 work (issue #651 Phase 1: mistralrs or candle in-process GPU backend) can drop in without any caller-side refactor.
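
The drop-in property of the two backends can be sketched as follows. This is an illustration under assumptions, not the shipped code: `StubCpu`, `StubGpu`, and `embedding_len` are hypothetical names, the `AttestedWeights` field is assumed, and a boxed `std::error::Error` stands in for `anyhow::Result`.

```rust
use std::error::Error;

// Stand-in for anyhow::Result so the sketch needs no external crates.
type Result<T> = std::result::Result<T, Box<dyn Error + Send + Sync>>;

// Assumed shape: the real AttestedWeights fields are not shown on this page.
pub struct AttestedWeights {
    pub sha256_hex: String,
}

pub trait InferenceBackend: Send + Sync {
    fn embed(&self, text: &str) -> Result<Vec<f32>>;
    fn chat(&self, prompt: &str) -> Result<String>;
    fn attested_weights(&self) -> Option<AttestedWeights>;
}

// Hypothetical stand-in for a working CPU-side backend.
pub struct StubCpu;

impl InferenceBackend for StubCpu {
    fn embed(&self, text: &str) -> Result<Vec<f32>> {
        Ok(vec![text.len() as f32])
    }
    fn chat(&self, prompt: &str) -> Result<String> {
        Ok(prompt.to_uppercase())
    }
    fn attested_weights(&self) -> Option<AttestedWeights> {
        // Placeholder digest for illustration only, not a real file hash.
        Some(AttestedWeights { sha256_hex: "00".repeat(32) })
    }
}

// Hypothetical placeholder: conforms to the trait, fails every call.
pub struct StubGpu;

impl InferenceBackend for StubGpu {
    fn embed(&self, _text: &str) -> Result<Vec<f32>> {
        Err("not implemented".into())
    }
    fn chat(&self, _prompt: &str) -> Result<String> {
        Err("not implemented".into())
    }
    fn attested_weights(&self) -> Option<AttestedWeights> {
        None
    }
}

// A caller written once against the trait works with either backend.
pub fn embedding_len(backend: &dyn InferenceBackend, text: &str) -> Result<usize> {
    Ok(backend.embed(text)?.len())
}
```

The caller compiles unchanged against both types; swapping in a real GPU implementation later would only change which value is passed in.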

§Attested weights (issue #654)

attested_weights() returns the SHA-256 digest of the loaded model’s weights plus an optional Ed25519 signature over the weight bytes. The CPU backend implements MVP supply-chain attestation by hashing the on-disk model file at load time; the GPU backend stub returns None. Documentation for the full v0.8 attested weight chain lives at docs/v0.7.0/inference-attestation.md.

§Regression test

cpu_backend_round_trips_embed (in this module) and gpu_backend_returns_not_implemented pin the contract.

Structs§

AttestedWeights
Attested model-weight provenance returned by InferenceBackend::attested_weights. MVP supply-chain attestation per issue #654 — SHA-256 of the on-disk weight file, plus an optional Ed25519 signature attested by the operator key.
CpuBackend
CPU backend — wraps the existing v0.7.0 inference path (embeddings::Embedder + llm::OllamaClient). This is a thin adapter; the underlying types are unchanged.
GpuBackend
GPU backend stub — issue #651 Phase 1 placeholder. Returns not implemented from every call. Lands as a trait-conformant type so the v0.8 GPU/MTP backend (mistralrs or candle in-process) can drop in without a single caller-side refactor.

Traits§

InferenceBackend
The unified inference surface. v0.8 callers will hold an Arc<dyn InferenceBackend> instead of separate embedder + llm handles. At v0.7.0 the recall hot-path still uses the legacy types directly (no callsite churn during the v0.7.0 ship window); the trait is the seam through which the v0.8 GPU/MTP backend will be threaded.
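
A minimal sketch of a caller holding a single `Arc<dyn InferenceBackend>` and sharing it across worker threads. The `Recall` and `LenBackend` types are hypothetical, the trait is trimmed to two methods for brevity, and a boxed `std::error::Error` stands in for `anyhow::Result`.

```rust
use std::error::Error;
use std::sync::Arc;
use std::thread;

// Stand-in for anyhow::Result so the sketch needs no external crates.
type Result<T> = std::result::Result<T, Box<dyn Error + Send + Sync>>;

// Trimmed copy of the trait; attested_weights is omitted here for brevity.
pub trait InferenceBackend: Send + Sync {
    fn embed(&self, text: &str) -> Result<Vec<f32>>;
    fn chat(&self, prompt: &str) -> Result<String>;
}

// Hypothetical backend: a one-dimensional "embedding" of the byte length.
pub struct LenBackend;

impl InferenceBackend for LenBackend {
    fn embed(&self, text: &str) -> Result<Vec<f32>> {
        Ok(vec![text.len() as f32])
    }
    fn chat(&self, prompt: &str) -> Result<String> {
        Ok(prompt.to_string())
    }
}

// Hypothetical caller: one handle instead of separate embedder + llm handles.
pub struct Recall {
    backend: Arc<dyn InferenceBackend>,
}

impl Recall {
    pub fn new(backend: Arc<dyn InferenceBackend>) -> Self {
        Self { backend }
    }

    // Embed each text on its own thread; results keep the input order
    // because the handles are joined in the order they were spawned.
    pub fn embed_all(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        let handles: Vec<_> = texts
            .into_iter()
            .map(|t| {
                let backend = Arc::clone(&self.backend);
                thread::spawn(move || backend.embed(&t))
            })
            .collect();

        let mut out = Vec::new();
        for handle in handles {
            // First `?` handles a panicked worker, second the backend error.
            let embedded = handle.join().map_err(|_| "embedding worker panicked")??;
            out.push(embedded);
        }
        Ok(out)
    }
}
```

The `Send + Sync` supertraits are what allow the `Arc<dyn InferenceBackend>` clone to move into `thread::spawn` without additional bounds.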

Functions§

compute_attested_weights
Compute the SHA-256 of a model-weight file on disk and assemble an AttestedWeights record. Issue #654 MVP supply-chain attestation.
verify_attested_weights
Verify an in-flight AttestedWeights record against the file at path. Issue #654 MVP gate — call before binding the backend if the operator has pinned a known-good hash.
verify_attested_weights_with_key
Key-injecting core of verify_attested_weights. Production callers use the wrapper (which resolves the operator key from disk/env); tests pass an explicit operator_pubkey so the signature gate can be exercised hermetically without touching the operator key directory.
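
The pinned-hash gate that verify_attested_weights is described as providing can be illustrated with a small sketch. Everything here is an assumption made for illustration: `check_pinned_hash` and `GateError` are hypothetical, the `AttestedWeights` fields are guessed, the record's digest is taken as already computed (no file hashing or Ed25519 check is performed), and treating a missing pin as "gate open" is one possible policy, not a documented one.

```rust
// Assumed shape: the real AttestedWeights fields are not shown on this page.
pub struct AttestedWeights {
    pub sha256_hex: String,
    pub signature: Option<Vec<u8>>,
}

#[derive(Debug, PartialEq)]
pub enum GateError {
    HashMismatch { pinned: String, actual: String },
}

// Hypothetical helper: compare a record against an operator-pinned digest.
// Assumption: no pin means the gate is open. Hex comparison ignores case.
pub fn check_pinned_hash(
    record: &AttestedWeights,
    pinned: Option<&str>,
) -> Result<(), GateError> {
    match pinned {
        None => Ok(()),
        Some(p) if p.eq_ignore_ascii_case(&record.sha256_hex) => Ok(()),
        Some(p) => Err(GateError::HashMismatch {
            pinned: p.to_string(),
            actual: record.sha256_hex.clone(),
        }),
    }
}
```

A caller would run a check of this kind before binding the backend and refuse to load the model on `HashMismatch`.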