Backend

A Backend is the final stage of the pipeline. It represents the execution of the LLM on some processing hardware.
At minimum, the Backend is split into two components: the Backend itself and a downstream ExecutionContext.

The ExecutionContext can be thought of as the core driver of the forward pass, whereas the Backend manages all resources and concurrent tasks surrounding the LLM execution context / forward pass.
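As a rough illustration of this split, here is a minimal sketch under assumed shapes: ExecutionContextLike, BackendLike, forward, and step are hypothetical names for illustration only, not this crate's API. The context performs the forward pass; the Backend owns it along with the surrounding resources.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the execution context: one forward pass
/// over a batch of token IDs, returning the next token IDs.
trait ExecutionContextLike {
    fn forward(&self, token_ids: &[u32]) -> Vec<u32>;
}

/// Hypothetical stand-in for the Backend: owns the execution context
/// plus everything around it (schedulers, detokenizer, task handles).
struct BackendLike<E: ExecutionContextLike> {
    execution_ctx: Arc<E>,
    // ... resource pools and concurrent task handles would live here
}

impl<E: ExecutionContextLike> BackendLike<E> {
    /// The Backend orchestrates; the context does the math.
    fn step(&self, prompt_ids: &[u32]) -> Vec<u32> {
        self.execution_ctx.forward(prompt_ids)
    }
}
```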
In almost every known scenario, detokenization and initial post-processing must happen in the Backend; further post-processing can happen in the response stream. One example is the jailing mechanism for partial hidden stop condition matches, which can be handled in the response stream rather than in the Backend (see the sketch below).
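To make the jailing idea concrete, here is a hedged sketch; the function name jail_partial_stop and its splitting policy are assumptions for illustration, not this crate's API. If the tail of the text emitted so far could still grow into a hidden stop sequence, that tail is withheld ("jailed") from the stream until the match is confirmed or broken.

```rust
/// Split `text` into (emit now, jail): the jailed part is the longest
/// suffix of `text` that is a proper prefix of the hidden stop sequence.
fn jail_partial_stop<'a>(text: &'a str, stop: &str) -> (&'a str, &'a str) {
    let max = stop.len().saturating_sub(1).min(text.len());
    for take in (1..=max).rev() {
        let split = text.len() - take;
        if text.is_char_boundary(split) && stop.starts_with(&text[split..]) {
            return (&text[..split], &text[split..]);
        }
    }
    // No suffix could extend into the stop sequence; emit everything.
    (text, "")
}

fn main() {
    // "</s" might still become the hidden stop "</s>", so it is jailed.
    let (emit, jailed) = jail_partial_stop("Hello world </s", "</s>");
    assert_eq!(emit, "Hello world ");
    assert_eq!(jailed, "</s");
}
```

A real implementation would also handle full stop matches and stop sequences that span multiple detokenization steps; the sketch only shows the core prefix check.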
Structs§
- Backend: Backend handles resource management and orchestrates LLM execution.
- Decoder: The Decoder object could be either a member of the internal LLM engine or part of the postprocessor. If it lives in the postprocessor, it should at minimum run in the same process, or at the very least on the same physical machine, connected by IPC.
- SeqResult: Result of processing a sequence of tokens.
- StepResult
Enums§
Type Aliases§
- ExecutionContext: Context for executing LLM inference; the engine consumes backend input and produces an execution output stream (see the sketch below).
- ExecutionOutputStream: Represents the output stream from the execution engine.
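For orientation, the two aliases might be shaped roughly like the following. This is a sketch under assumed types: BackendInput and ExecutionOutput are placeholders, the futures crate's BoxStream is used for convenience, and the real aliases in this module will differ.

```rust
use futures::stream::BoxStream;

// Placeholder types standing in for this module's real input/output types.
struct BackendInput;
struct ExecutionOutput;

/// A stream of outputs produced by the execution engine.
type ExecutionOutputStream = BoxStream<'static, ExecutionOutput>;

/// The engine as a callable: consumes backend input, yields the output stream.
type ExecutionContext =
    Box<dyn Fn(BackendInput) -> ExecutionOutputStream + Send + Sync>;
```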