Module backend


Backend

A Backend is the final stage of the pipeline. It represents the execution of the LLM on some processing hardware.

At minimum, the Backend is split into two components: the Backend itself and a downstream ExecutionContext.

The ExecutionContext can be thought of as the core driver of the forward pass, whereas the Backend is the manager of all resources and concurrent tasks surrounding the LLM execution context / forward pass.
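This split can be sketched with toy types. Note: `ToyExecutionContext` and `ToyBackend` below are illustrative names, not this crate's actual types; the forward pass is stubbed with arithmetic. The point is the division of labor: the execution context only drives the forward pass, while the backend owns the worker task and the output channel around it.

```rust
use std::sync::mpsc;
use std::thread;

// Core driver of the forward pass: one step consumes the token ids so far
// and produces the next token id (stand-in arithmetic, not a real LLM).
struct ToyExecutionContext;

impl ToyExecutionContext {
    fn forward(&mut self, tokens: &[u32]) -> u32 {
        tokens.iter().sum::<u32>() % 100
    }
}

// Resource manager: owns the worker task and the output stream surrounding
// the execution context.
struct ToyBackend {
    worker: thread::JoinHandle<()>,
    output: mpsc::Receiver<u32>,
}

impl ToyBackend {
    fn spawn(mut ctx: ToyExecutionContext, prompt: Vec<u32>, steps: usize) -> Self {
        let (tx, output) = mpsc::channel();
        let worker = thread::spawn(move || {
            let mut tokens = prompt;
            for _ in 0..steps {
                let next = ctx.forward(&tokens);
                tokens.push(next);
                if tx.send(next).is_err() {
                    break; // consumer dropped the stream; stop generating
                }
            }
        });
        ToyBackend { worker, output }
    }
}

fn main() {
    let backend = ToyBackend::spawn(ToyExecutionContext, vec![1, 2, 3], 3);
    let generated: Vec<u32> = backend.output.iter().collect();
    backend.worker.join().unwrap();
    assert_eq!(generated, vec![6, 12, 24]);
}
```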

For almost every known scenario, detokenization and initial post-processing must happen in the Backend. Further post-processing can happen in the response stream; for example, the jailing mechanism for partial hidden stop condition matches can be handled in the response stream rather than in the Backend.
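The jailing idea can be illustrated with a small helper. Note: `jail_partial_stop` is a hypothetical function, not part of this crate's API. It takes the pending decoded text and a hidden stop string, emits everything that cannot be part of the stop string, and jails (holds back) any suffix that is a prefix of it until a later chunk disambiguates.

```rust
// Hypothetical helper: split `pending` into (text to emit, text to jail).
// A suffix of `pending` is jailed if it might be the start of `stop`.
fn jail_partial_stop(pending: &str, stop: &str) -> (String, String) {
    // Full match: suppress the stop string; the caller ends the stream.
    if let Some(idx) = pending.find(stop) {
        return (pending[..idx].to_string(), String::new());
    }
    // Jail the longest suffix of `pending` that is a prefix of `stop`.
    let max = stop.len().min(pending.len());
    for take in (1..=max).rev() {
        let split = pending.len() - take;
        if pending.is_char_boundary(split) && stop.starts_with(&pending[split..]) {
            return (pending[..split].to_string(), pending[split..].to_string());
        }
    }
    (pending.to_string(), String::new())
}

fn main() {
    // "<|end" could be the start of the hidden stop "<|end|>", so jail it.
    let (emit, jailed) = jail_partial_stop("Hello <|end", "<|end|>");
    assert_eq!(emit, "Hello ");
    assert_eq!(jailed, "<|end");

    // The next chunk disambiguates: the jailed text was not the stop.
    let (emit, jailed) = jail_partial_stop(&(jailed + "ing"), "<|end|>");
    assert_eq!(emit, "<|ending");
    assert_eq!(jailed, "");
}
```

Because the helper is a pure function over the pending text, it can sit in the response stream without touching Backend state.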

Structs§

Backend
Backend handles resource management and orchestrates LLM execution
Decoder
The Decoder object could be a member of either the internal LLM engine or the postprocessor. If part of the postprocessor, it should ideally run in the same process, or at the very least on the same physical machine, connected via IPC.
SeqResult
Result of processing a sequence of tokens
StepResult

Enums§

StopTrigger

Type Aliases§

ExecutionContext
Context for executing LLM inference, engine consumes backend input and produces execution output stream
ExecutionOutputStream
Represents the output stream from the execution engine
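The two aliases above can be approximated as follows. Note: these signatures, and the `BackendInput`/`ExecutionOutput` types, are assumptions mirroring the one-line descriptions, not the crate's real definitions; the output stream is modeled here as a channel receiver.

```rust
use std::sync::mpsc;

// Assumed input/output item types, for illustration only.
struct BackendInput { token_ids: Vec<u32> }
struct ExecutionOutput { token_id: u32 }

// Output stream from the execution engine, modeled as a channel receiver.
type ExecutionOutputStream = mpsc::Receiver<ExecutionOutput>;

// Context for executing inference: consumes backend input and produces an
// execution output stream.
type ExecutionContext = Box<dyn FnMut(BackendInput) -> ExecutionOutputStream + Send>;

fn main() {
    // Stub engine: echoes each input token id plus one.
    let mut ctx: ExecutionContext = Box::new(|input: BackendInput| {
        let (tx, rx) = mpsc::channel();
        for id in input.token_ids {
            tx.send(ExecutionOutput { token_id: id + 1 }).unwrap();
        }
        rx // sender drops here, so the stream terminates after these items
    });
    let stream = ctx(BackendInput { token_ids: vec![10, 20] });
    let ids: Vec<u32> = stream.iter().map(|o| o.token_id).collect();
    assert_eq!(ids, vec![11, 21]);
}
```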