
Crate inferd_engine


Backend trait and adapters for inferd.

See ADR 0005 (engine consumed via FFI), ADR 0007 (routing), and docs/ai.internals.explained.md for the architectural framing.

v0.1 ships:

  • mock — deterministic test double, always available.
  • llamacpp — FFI to vendored libllama (gated behind the llamacpp cargo feature; lands in M2a).
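As a rough illustration of the feature gating described above, the sketch below lists which backends a given build contains. The helper `compiled_backends` is hypothetical and not part of the documented crate surface; only the `llamacpp` feature name comes from the text above.

```rust
/// Names of the backends compiled into this build.
/// Hypothetical helper for illustration; not a documented inferd_engine item.
pub fn compiled_backends() -> Vec<&'static str> {
    // The mock backend is always available.
    let mut names = vec!["mock"];
    // `cfg!` expands to a compile-time boolean. It is true only when the
    // crate is built with `--features llamacpp`.
    if cfg!(feature = "llamacpp") {
        names.push("llamacpp");
    }
    names
}
```

A default build (without `--features llamacpp`) would report only `mock` under these assumptions.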

Modules

mock
Deterministic mock backend used by tests and by the daemon’s M1 echo milestone.

Structs

AcceleratorInfo
Snapshot of the active hardware-acceleration configuration.
BackendCapabilities
Per-backend capability advertisement. The daemon consults this on boot to decide whether v2 multimodal / tool-use requests can be dispatched, and reports the advertised set on the admin status surface so middleware authors can introspect what the running daemon can do without trial-and-error.
EmbedResult
Result of a successful Backend::embed() call.
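The capability check that BackendCapabilities enables might look roughly like the sketch below. The field names (`multimodal`, `tool_use`), the `RequestNeeds` type, and the `can_dispatch` function are all assumptions made for illustration; the real struct layout is not shown on this page.

```rust
/// Simplified stand-in for the crate's BackendCapabilities.
/// Both fields are assumed; the real struct may differ.
#[derive(Debug, Clone, Copy, Default)]
pub struct BackendCapabilities {
    pub multimodal: bool,
    pub tool_use: bool,
}

/// Hypothetical summary of what a v2 request requires.
#[derive(Debug, Clone, Copy, Default)]
pub struct RequestNeeds {
    pub multimodal: bool,
    pub tool_use: bool,
}

/// Returns true when every feature the request needs is advertised
/// by the backend. A text-only request is always dispatchable.
pub fn can_dispatch(caps: &BackendCapabilities, needs: &RequestNeeds) -> bool {
    (!needs.multimodal || caps.multimodal) && (!needs.tool_use || caps.tool_use)
}
```

Checking against an advertised set up front lets the daemon reject an unsupported request before any work reaches the backend.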

Enums

AcceleratorKind
Hardware-acceleration backend the engine adapter is built and running with. Reflects compile-time GGML feature flags. Pure CPU builds (no cuda / metal / vulkan / rocm features) report Cpu. A build with GPU support compiled in but with n_gpu_layers == 0 still runs on the CPU at runtime; see AcceleratorInfo::gpu_layers.
EmbedError
Errors returned by Backend::embed().
GenerateError
Errors returned by Backend::generate() before any tokens have streamed.
TokenEvent
One event in a generation stream.
TokenEventV2
One event in a v2 generation stream — typed-content-block surface per ADR 0015.
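The distinction drawn under AcceleratorKind, between what a build was compiled with and what actually runs, can be sketched as follows. Only the `Cpu` variant and the `gpu_layers` name appear on this page; the other variants are guessed from the listed feature names, and `effective_kind` is a hypothetical helper.

```rust
/// Simplified stand-in for the crate's AcceleratorKind.
/// Variants other than `Cpu` are assumed from the feature names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AcceleratorKind {
    Cpu,
    Cuda,
    Metal,
    Vulkan,
    Rocm,
}

/// Simplified stand-in for AcceleratorInfo. Field types are assumed.
#[derive(Debug, Clone, Copy)]
pub struct AcceleratorInfo {
    /// What the binary was compiled with.
    pub kind: AcceleratorKind,
    /// How many model layers are offloaded to the GPU.
    pub gpu_layers: u32,
}

impl AcceleratorInfo {
    /// What the engine effectively runs on: with zero offloaded layers,
    /// a GPU-enabled build still does all of its work on the CPU.
    pub fn effective_kind(&self) -> AcceleratorKind {
        if self.gpu_layers == 0 {
            AcceleratorKind::Cpu
        } else {
            self.kind
        }
    }
}
```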

Constants

DEFAULT_V2_MAX_TOKENS
Default max_tokens for v2 requests when the consumer didn’t supply one. Lives here (rather than in inferd-proto) because v2 sampling defaults are backend-specific (per ADR 0015): the proto crate doesn’t pick them, the active backend does.
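A backend could apply this default with a simple fallback, as in the sketch below. The constant's value and integer type here are placeholders (neither is stated on this page), and `resolve_max_tokens` is a hypothetical helper.

```rust
/// Placeholder value and type for illustration only; the real
/// constant's value is not given in this documentation.
pub const DEFAULT_V2_MAX_TOKENS: u32 = 1024;

/// Uses the consumer's max_tokens when supplied, otherwise falls
/// back to the backend's default.
pub fn resolve_max_tokens(requested: Option<u32>) -> u32 {
    requested.unwrap_or(DEFAULT_V2_MAX_TOKENS)
}
```

Keeping the fallback in the engine crate means the wire-format crate can carry an optional field without committing to any particular default.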

Traits

Backend
An inference backend.

Type Aliases

TokenStream
Stream of TokenEvent values produced by a backend during generation.
TokenStreamV2
Stream of TokenEventV2 values produced by a backend during a v2 generation. Dropping the stream cancels the in-flight generation.
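One common way to get cancel-on-drop behaviour is a `Drop` implementation that signals the producer. The sketch below shows that pattern using only the standard library: it substitutes a synchronous `Iterator` for the async stream, and the `CancelOnDrop` type and its shared flag are hypothetical, not the crate's actual mechanism.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a token stream. Yields pre-computed
/// tokens and raises a shared flag when it is dropped.
pub struct CancelOnDrop {
    tokens: std::vec::IntoIter<String>,
    cancelled: Arc<AtomicBool>,
}

impl CancelOnDrop {
    pub fn new(tokens: Vec<String>, cancelled: Arc<AtomicBool>) -> Self {
        Self { tokens: tokens.into_iter(), cancelled }
    }
}

impl Iterator for CancelOnDrop {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        self.tokens.next()
    }
}

impl Drop for CancelOnDrop {
    fn drop(&mut self) {
        // A generation loop holding a clone of this flag can poll it
        // between tokens and stop early once it is set.
        self.cancelled.store(true, Ordering::SeqCst);
    }
}
```

In this sketch the flag is also raised when a fully consumed stream is dropped, which is harmless because no generation remains to cancel at that point.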