Pluggable inference backend trait — issue #651 (RFC pulled forward
from v0.8 per operator directive 28860423-d12c-4959-bc8b-8fa9a94a33d9,
2026-05-18).
§Goal
Provide a single trait surface that unifies the substrate's two current
inference paths (embeddings::Embedder for vector embeddings;
llm::OllamaClient for chat, auto-tagging, and contradiction detection)
and provides a forward-compatible hook for the v0.8 GPU / MTP
distilled hot-path backend (issues #651 / #654 / Gap #10 of #846).
§Surface
```rust
pub trait InferenceBackend: Send + Sync {
    fn embed(&self, text: &str) -> anyhow::Result<Vec<f32>>;
    fn chat(&self, prompt: &str) -> anyhow::Result<String>;
    fn attested_weights(&self) -> Option<AttestedWeights>;
}
```
§Backends shipped at v0.7.0
- CpuBackend: wraps the existing CPU pipeline (embeddings::Embedder + llm::OllamaClient). That pipeline is what v0.7.0 actually runs on the recall hot-path, although the hot-path still calls the legacy types directly rather than going through the trait.
- GpuBackend: stub whose embed and chat calls return a "not implemented" error. It lands as a trait-conformant placeholder so the v0.8 work (issue #651 Phase 1, an in-process GPU backend built on mistralrs or candle) can drop in without any caller-side refactor.
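A minimal, dependency-free sketch of what a "trait-conformant placeholder" means in practice. It mirrors the trait surface above but swaps anyhow::Result for a std-only error alias (BackendResult), and the AttestedWeights fields and the NotImplemented error type are hypothetical stand-ins, not this module's definitions. Because the stub satisfies the same trait, a caller holding a Box<dyn InferenceBackend> compiles unchanged whichever backend is plugged in.

```rust
use std::error::Error;
use std::fmt;

// Assumption: a std-only alias standing in for anyhow::Result.
type BackendResult<T> = Result<T, Box<dyn Error + Send + Sync>>;

// Hypothetical stand-in for the module's AttestedWeights record.
#[derive(Debug, Clone, PartialEq)]
pub struct AttestedWeights {
    pub sha256_hex: String,
    pub signature: Option<Vec<u8>>,
}

pub trait InferenceBackend: Send + Sync {
    fn embed(&self, text: &str) -> BackendResult<Vec<f32>>;
    fn chat(&self, prompt: &str) -> BackendResult<String>;
    fn attested_weights(&self) -> Option<AttestedWeights>;
}

// Hypothetical error type carrying the name of the unimplemented call.
#[derive(Debug)]
pub struct NotImplemented(&'static str);

impl fmt::Display for NotImplemented {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "not implemented: {}", self.0)
    }
}

impl Error for NotImplemented {}

// Placeholder backend: conforms to the trait but performs no inference.
pub struct GpuBackend;

impl InferenceBackend for GpuBackend {
    fn embed(&self, _text: &str) -> BackendResult<Vec<f32>> {
        Err(NotImplemented("GpuBackend::embed").into())
    }

    fn chat(&self, _prompt: &str) -> BackendResult<String> {
        Err(NotImplemented("GpuBackend::chat").into())
    }

    // No weights are loaded, so there is nothing to attest.
    fn attested_weights(&self) -> Option<AttestedWeights> {
        None
    }
}

fn main() {
    // Callers depend only on the trait object, not on the concrete backend.
    let backend: Box<dyn InferenceBackend> = Box::new(GpuBackend);
    match backend.embed("hello") {
        Ok(v) => println!("embedding of length {}", v.len()),
        Err(e) => println!("{e}"), // prints "not implemented: GpuBackend::embed"
    }
}
```

Swapping in a real backend later changes only the Box::new(...) line at construction time.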
§Attested weights (issue #654)
attested_weights() returns the loaded model's SHA-256 digest plus an
optional Ed25519 signature over the weight bytes. The CPU backend
implements MVP supply-chain attestation by hashing the on-disk
model file at load time; the GPU backend stub returns None.
Documentation for the full v0.8 attested-weight chain lives at
docs/v0.7.0/inference-attestation.md.
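One way a caller might interpret the Option returned by attested_weights() when the operator has pinned a known-good hash. This is a hypothetical policy sketch: check_pin, BindDecision, the record's field names, and the "abc123" digest are illustrative, not part of this module. With no pin, binding proceeds; with a pin, a backend that reports None (such as the GPU stub) is rejected, as is any hash mismatch. Hex digests are compared case-insensitively.

```rust
// Hypothetical stand-in for the module's record; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct AttestedWeights {
    pub sha256_hex: String,
    pub signature: Option<Vec<u8>>,
}

#[derive(Debug, PartialEq)]
pub enum BindDecision {
    Bind,
    Reject(String),
}

// Decide whether a backend may be bound, given what it attests and an
// optional operator-pinned SHA-256 (hex).
pub fn check_pin(attested: Option<&AttestedWeights>, pinned_sha256_hex: Option<&str>) -> BindDecision {
    match (pinned_sha256_hex, attested) {
        // No pin configured: attestation is informational only.
        (None, _) => BindDecision::Bind,
        // Pin configured but the backend attests nothing (e.g. the GPU stub).
        (Some(_), None) => BindDecision::Reject("backend reports no attested weights".to_string()),
        (Some(pin), Some(a)) if a.sha256_hex.eq_ignore_ascii_case(pin) => BindDecision::Bind,
        (Some(pin), Some(a)) => {
            BindDecision::Reject(format!("hash mismatch: pinned {pin}, attested {}", a.sha256_hex))
        }
    }
}

fn main() {
    // Toy digest value for illustration only.
    let cpu = AttestedWeights { sha256_hex: "abc123".to_string(), signature: None };
    println!("{:?}", check_pin(Some(&cpu), Some("ABC123"))); // prints "Bind"
    println!("{:?}", check_pin(None, Some("abc123")));
}
```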
§Regression test
The regression tests cpu_backend_round_trips_embed (in this module) and
gpu_backend_returns_not_implemented pin this contract.
§Structs
- AttestedWeights - Attested model-weight provenance returned by InferenceBackend::attested_weights. MVP supply-chain attestation per issue #654: the SHA-256 of the on-disk weight file, plus an optional Ed25519 signature made with the operator key.
- CpuBackend - CPU backend that wraps the existing v0.7.0 inference path (embeddings::Embedder + llm::OllamaClient). This is a thin adapter; the underlying types are unchanged.
- GpuBackend - GPU backend stub, the issue #651 Phase 1 placeholder. Its embed and chat calls return "not implemented", and attested_weights returns None. It lands as a trait-conformant type so the v0.8 GPU/MTP backend (mistralrs or candle, in-process) can drop in without a single caller-side refactor.
§Traits
- InferenceBackend - The unified inference surface. v0.8 callers will hold an Arc<dyn InferenceBackend> instead of separate embedder + LLM handles. At v0.7.0 the recall hot-path still uses the legacy types directly (avoiding callsite churn during the v0.7.0 ship window); the trait is the seam through which the v0.8 GPU/MTP backend will be threaded.
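A sketch of the v0.8 calling convention described above, in which one shared Arc<dyn InferenceBackend> replaces the separate embedder and LLM handles. FakeBackend and Recall are hypothetical; the trait is trimmed to embed and chat and uses a std-only error alias instead of anyhow::Result. The Send + Sync supertrait bound is what allows the same handle to be cloned into another thread.

```rust
use std::error::Error;
use std::sync::Arc;
use std::thread;

// Assumption: std-only alias standing in for anyhow::Result.
type BackendResult<T> = Result<T, Box<dyn Error + Send + Sync>>;

// Trimmed mirror of the trait (attested_weights omitted for brevity).
pub trait InferenceBackend: Send + Sync {
    fn embed(&self, text: &str) -> BackendResult<Vec<f32>>;
    fn chat(&self, prompt: &str) -> BackendResult<String>;
}

// Hypothetical toy backend: "embeds" text as [byte length, word count].
struct FakeBackend;

impl InferenceBackend for FakeBackend {
    fn embed(&self, text: &str) -> BackendResult<Vec<f32>> {
        Ok(vec![text.len() as f32, text.split_whitespace().count() as f32])
    }

    fn chat(&self, prompt: &str) -> BackendResult<String> {
        Ok(format!("echo: {prompt}"))
    }
}

// Hypothetical caller: one shared handle covers both embedding and chat.
struct Recall {
    backend: Arc<dyn InferenceBackend>,
}

impl Recall {
    fn new(backend: Arc<dyn InferenceBackend>) -> Self {
        Self { backend }
    }

    fn embed_and_tag(&self, text: &str) -> BackendResult<(Vec<f32>, String)> {
        let vector = self.backend.embed(text)?;
        let tag = self.backend.chat(&format!("tag: {text}"))?;
        Ok((vector, tag))
    }
}

fn main() {
    let backend: Arc<dyn InferenceBackend> = Arc::new(FakeBackend);
    let recall = Recall::new(Arc::clone(&backend));

    // Send + Sync on the trait lets the same handle cross thread boundaries.
    let worker = {
        let b = Arc::clone(&backend);
        thread::spawn(move || b.embed("from another thread").unwrap())
    };

    let (vector, tag) = recall.embed_and_tag("hello world").unwrap();
    println!("{vector:?} {tag}"); // prints "[11.0, 2.0] echo: tag: hello world"
    println!("{:?}", worker.join().unwrap()); // prints "[19.0, 3.0]"
}
```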
§Functions
- compute_attested_weights - Compute the SHA-256 of a model-weight file on disk and assemble an AttestedWeights record. Issue #654 MVP supply-chain attestation.
- verify_attested_weights - Verify an in-flight AttestedWeights record against the file at path. Issue #654 MVP gate: call it before binding the backend if the operator has pinned a known-good hash.
- verify_attested_weights_with_key - Key-injecting core of verify_attested_weights. Production callers use the wrapper, which resolves the operator key from disk/env; tests pass an explicit operator_pubkey so the signature gate can be exercised hermetically, without touching the operator key directory.
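The split between verify_attested_weights and verify_attested_weights_with_key can be sketched as a core function that receives every dependency explicitly, plus a thin wrapper that resolves them from the environment. Everything here is an assumption for illustration: the SignatureVerifier trait stands in for a real Ed25519 library, toy_hash_hex is 64-bit FNV-1a (not SHA-256) so the sketch needs no crates, the OPERATOR_PUBKEY_PATH variable and the record's field names are invented, and treating an unsigned record as passing on the hash gate alone is a guess at the MVP policy.

```rust
use std::fs;
use std::path::Path;

// Hypothetical record shape; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct AttestedWeights {
    pub sha256_hex: String,
    pub signature: Option<Vec<u8>>,
}

// Stand-in for an Ed25519 verifier; a real build would delegate to a crypto crate.
pub trait SignatureVerifier {
    fn verify(&self, pubkey: &[u8], message: &[u8], signature: &[u8]) -> bool;
}

// Toy digest (FNV-1a, 64-bit) so the sketch has no dependencies. NOT SHA-256.
pub fn toy_hash_hex(bytes: &[u8]) -> String {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    format!("{h:016x}")
}

// Key-injecting core: hash function, verifier, and operator key are all passed in,
// so tests never touch the filesystem or the real operator key.
pub fn verify_with_key(
    record: &AttestedWeights,
    weight_bytes: &[u8],
    hash_hex: impl Fn(&[u8]) -> String,
    verifier: &dyn SignatureVerifier,
    operator_pubkey: Option<&[u8]>,
) -> Result<(), String> {
    let actual = hash_hex(weight_bytes);
    if !actual.eq_ignore_ascii_case(&record.sha256_hex) {
        return Err(format!("hash mismatch: record {}, file {actual}", record.sha256_hex));
    }
    match (&record.signature, operator_pubkey) {
        // Unsigned record: only the hash gate applies (assumed MVP policy).
        (None, _) => Ok(()),
        (Some(_), None) => Err("record is signed but no operator key was provided".to_string()),
        // The signature is checked over the weight bytes themselves.
        (Some(sig), Some(pk)) if verifier.verify(pk, weight_bytes, sig) => Ok(()),
        (Some(_), Some(_)) => Err("signature does not verify under the operator key".to_string()),
    }
}

// Production-style wrapper: reads the weight file and resolves the key from an
// environment variable (the variable name is hypothetical).
pub fn verify_file(
    record: &AttestedWeights,
    path: &Path,
    verifier: &dyn SignatureVerifier,
) -> Result<(), String> {
    let bytes = fs::read(path).map_err(|e| format!("reading {}: {e}", path.display()))?;
    let key = std::env::var_os("OPERATOR_PUBKEY_PATH").and_then(|p| fs::read(p).ok());
    verify_with_key(record, &bytes, toy_hash_hex, verifier, key.as_deref())
}

// Test double: "valid" iff the signature bytes equal the key bytes. Not crypto.
struct FakeVerifier;

impl SignatureVerifier for FakeVerifier {
    fn verify(&self, pubkey: &[u8], _message: &[u8], signature: &[u8]) -> bool {
        pubkey == signature
    }
}

fn main() {
    let weights: &[u8] = b"model weights";
    let signed = AttestedWeights {
        sha256_hex: toy_hash_hex(weights),
        signature: Some(b"operator-key".to_vec()),
    };
    let key: &[u8] = b"operator-key";
    // prints "Ok(())"
    println!("{:?}", verify_with_key(&signed, weights, toy_hash_hex, &FakeVerifier, Some(key)));
}
```

Injecting the hash function and verifier, not just the key, keeps the core free of I/O, which is the property that makes the signature gate testable hermetically.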