Backend trait and adapters for inferd.
See ADR 0005 (engine consumed via FFI), ADR 0007 (routing), and
docs/ai.internals.explained.md for the architectural framing.
v0.1 ships:
- mock: deterministic test double, always available.
- llamacpp: FFI to vendored libllama (gated behind the llamacpp cargo feature; lands in M2a).
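As a rough illustration of the feature gating described above, the sketch below reports which adapters a build contains. The helper name `compiled_backends` is hypothetical and not part of this crate; only the `llamacpp` feature name comes from the text above.

```rust
/// Hypothetical helper (not part of inferd): lists the adapters compiled
/// into the current build.
pub fn compiled_backends() -> Vec<&'static str> {
    // The mock backend is always available.
    let mut names = vec!["mock"];
    // `cfg!` evaluates to a compile-time boolean, so the llamacpp entry is
    // only reported when the crate was built with `--features llamacpp`.
    if cfg!(feature = "llamacpp") {
        names.push("llamacpp");
    }
    names
}
```

Using `cfg!` rather than `#[cfg(...)]` keeps both branches type-checked in every build, which suits a small reporting helper like this one; real FFI code would more likely sit behind `#[cfg(feature = "llamacpp")]` so it is not compiled at all without the feature.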
Modules§
- mock
- Deterministic mock backend used by tests and by the daemon’s M1 echo milestone.
Structs§
- AcceleratorInfo
- Snapshot of the active hardware-acceleration configuration.
- BackendCapabilities
- Per-backend capability advertisement. The daemon consults this on boot to decide whether v2 multimodal / tool-use requests can be dispatched, and reports the advertised set on the admin status surface so middleware authors can introspect what the running daemon can do without trial and error.
- EmbedResult
- Result of a successful Backend::embed() call.
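The capability check described for BackendCapabilities could look roughly like the sketch below. Every name here (`SketchCapabilities`, `SketchRequestNeeds`, `can_dispatch`, and the `multimodal` / `tool_use` fields) is an assumption made for illustration; the real struct's fields are not shown on this page.

```rust
/// Hypothetical stand-in for a capability advertisement. The real
/// BackendCapabilities fields are not listed on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SketchCapabilities {
    pub multimodal: bool,
    pub tool_use: bool,
}

/// Hypothetical summary of what a single request requires.
#[derive(Debug, Clone, Copy)]
pub struct SketchRequestNeeds {
    pub multimodal: bool,
    pub tool_use: bool,
}

/// Returns true when every feature the request needs is advertised.
pub fn can_dispatch(caps: SketchCapabilities, needs: SketchRequestNeeds) -> bool {
    // A needed feature must be advertised; an unneeded one is irrelevant.
    (!needs.multimodal || caps.multimodal) && (!needs.tool_use || caps.tool_use)
}
```

Checking the advertisement up front lets a caller reject an unsupported request before any work is dispatched, which is the "without trial and error" property the description mentions.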
Enums§
- AcceleratorKind
- Hardware-acceleration backend the engine adapter is built and running with. Reflects compile-time GGML feature flags. Pure CPU builds (no cuda/metal/vulkan/rocm features) report Cpu. A build with GPU support but where n_gpu_layers == 0 also effectively uses CPU at runtime; see AcceleratorInfo::gpu_layers.
- EmbedError
- Errors returned by Backend::embed().
- GenerateError
- Errors returned by Backend::generate() before any tokens have streamed.
- TokenEvent
- One event in a generation stream.
- TokenEventV2
- One event in a v2 generation stream: the typed-content-block surface per ADR 0015.
Constants§
- DEFAULT_V2_MAX_TOKENS
- Default max_tokens for v2 requests when the consumer didn’t supply one. Lives here (rather than in inferd-proto) because v2 sampling defaults are backend-specific (per ADR 0015): the proto crate doesn’t pick them, the active backend does.
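A minimal sketch of how such a default might be applied, assuming the request carries an optional limit. The value `1024`, the `u32` type, and the helper `effective_max_tokens` are placeholders chosen for illustration; this page does not state the real constant's value or type.

```rust
/// Placeholder value for illustration only. The real constant's value and
/// type are not stated on this page.
pub const DEFAULT_V2_MAX_TOKENS: u32 = 1024;

/// Hypothetical helper: use the consumer's limit when supplied, otherwise
/// fall back to the backend-side default.
pub fn effective_max_tokens(requested: Option<u32>) -> u32 {
    requested.unwrap_or(DEFAULT_V2_MAX_TOKENS)
}
```

Keeping the fallback next to the backend, rather than in the wire-format crate, means a request that omits `max_tokens` stays "unset" on the wire and each backend can choose its own limit.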
Traits§
- Backend
- An inference backend.
Type Aliases§
- TokenStream
- Stream of TokenEvent values produced by a backend during generation.
- TokenStreamV2
- Stream of TokenEventV2 values produced by a backend during a v2 generation. Dropping the stream cancels the in-flight generation.
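The drop-to-cancel behaviour can be illustrated with a std-only sketch. This is not the crate's implementation: `SketchStream` is a hypothetical stand-in that uses a plain iterator and a shared flag in place of an async stream, purely to show how a `Drop` impl can signal cancellation to a producer.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a token stream handle. It yields pre-computed
/// tokens and shares a cancellation flag with the (imagined) producer.
pub struct SketchStream {
    tokens: std::vec::IntoIter<String>,
    cancelled: Arc<AtomicBool>,
}

impl SketchStream {
    /// Returns the stream plus the flag a producer would poll.
    pub fn new(tokens: Vec<String>) -> (Self, Arc<AtomicBool>) {
        let cancelled = Arc::new(AtomicBool::new(false));
        let stream = Self {
            tokens: tokens.into_iter(),
            cancelled: Arc::clone(&cancelled),
        };
        (stream, cancelled)
    }
}

impl Iterator for SketchStream {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        self.tokens.next()
    }
}

impl Drop for SketchStream {
    fn drop(&mut self) {
        // Dropping the consumer side tells the producer to stop generating.
        self.cancelled.store(true, Ordering::SeqCst);
    }
}
```

A producer loop would check the flag between tokens and stop early once it is set, so a consumer that loses interest only has to let the stream go out of scope.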