§Engine Protocols
This module contains the public API protocols for the LLM Engine and AsyncEngine facades. The core components are the CompletionRequest and StreamingCompletionResponse objects.
The StreamingCompletionResponse objects are the outputs of the LLM Engine; however, some additional information is needed to propagate intermediate results for improved observability. That metadata is transferred via the other arms of the StreamingResponse enum.
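To make the enum's role concrete, here is a minimal sketch of consuming such responses. The variant names (Completion, Metadata) and the stubbed payload types are assumptions for illustration; consult the actual StreamingResponse definition for the real arms.

```rust
// Sketch only: variant and field names are assumed, not the crate's real API.
#[derive(Debug)]
struct StreamingCompletionResponse {
    text: String, // assumed payload field
}

enum StreamingResponse {
    // Generated output from the engine.
    Completion(StreamingCompletionResponse),
    // Assumed arm: intermediate results surfaced for observability.
    Metadata(String),
}

fn handle(resp: StreamingResponse) {
    match resp {
        StreamingResponse::Completion(chunk) => println!("chunk: {}", chunk.text),
        StreamingResponse::Metadata(info) => eprintln!("observability: {info}"),
    }
}

fn main() {
    handle(StreamingResponse::Completion(StreamingCompletionResponse {
        text: "Hello".into(),
    }));
    handle(StreamingResponse::Metadata("prefill complete".into()));
}
```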
Structs§
- ChatCompletionLogprobs
- ChatCompletionTokenLogprob
- ChatContext - ChatContext is a struct that contains the role and context of a chat message along with a flattened CompletionContext.
- ChatTurn - ChatTurn is a struct that contains the user and assistant messages in a chat.
- CompletionContext - Defines the prompt template and system prompt for a completion request. If the model does not support prompt templates, the system_prompt will be ignored.
- CompletionRequest - TensorRT LLM does not perform preprocessing or postprocessing. The input_ids / token_ids are expected to be preprocessed by the client. The client is responsible for constructing the model-specific prompt template and applying the tokenizer.
- CompletionRequestBuilder - Builder for CompletionRequest (see the builder sketch after this list).
- Delta
- GuidedDecodingOptions - Guided Decoding Options.
- OutputOptions - Collection of options that control what information the inference engine returns in the response.
- SamplingOptions - Collection of options that control the sampling behavior of the inference engine.
- SequencePositionData - At each SequencePosition we hold position-specific data.
- StopConditions - TensorRT LLM server-side stop conditions. These options allow the server to evaluate the generated sequence and stop generation if the sequence meets a stop condition.
- StreamingCompletionResponse
- TopLogprob
- Usage
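The presence of CompletionRequestBuilderError (listed under Enums below) indicates a conventional Rust builder. The following self-contained sketch shows the usual shape of such a builder; the fields (token_ids, max_tokens) and defaults are assumptions for illustration, not the crate's actual layout.

```rust
// Sketch only: field names and defaults are assumed, not the real
// CompletionRequest layout. Per the docs, token_ids arrive pre-tokenized.
#[derive(Debug)]
struct CompletionRequest {
    token_ids: Vec<u32>,
    max_tokens: u32,
}

#[derive(Debug)]
enum CompletionRequestBuilderError {
    UninitializedField(&'static str),
}

#[derive(Default)]
struct CompletionRequestBuilder {
    token_ids: Option<Vec<u32>>,
    max_tokens: Option<u32>,
}

impl CompletionRequestBuilder {
    fn token_ids(mut self, ids: Vec<u32>) -> Self {
        self.token_ids = Some(ids);
        self
    }
    fn max_tokens(mut self, n: u32) -> Self {
        self.max_tokens = Some(n);
        self
    }
    fn build(self) -> Result<CompletionRequest, CompletionRequestBuilderError> {
        Ok(CompletionRequest {
            // Required field: a missing token_ids yields the builder error.
            token_ids: self
                .token_ids
                .ok_or(CompletionRequestBuilderError::UninitializedField("token_ids"))?,
            max_tokens: self.max_tokens.unwrap_or(16),
        })
    }
}

fn main() -> Result<(), CompletionRequestBuilderError> {
    let req = CompletionRequestBuilder::default()
        .token_ids(vec![101, 2023, 102]) // client-side tokenization, per the docs
        .max_tokens(32)
        .build()?;
    println!("{req:?}");
    Ok(())
}
```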
Enums§
- CompletionRequestBuilderError - Error type for CompletionRequestBuilder.
- FinishReason
- LogProbs
- Logits
- PromptType - LLM Inference Engines can accept a variety of input types. Not all Engines will support all input types. For example, the trtllm::AsyncEngine only supports PromptType::Tokens as an input type. The higher-level Backend class is a general wrapper around Engines that will enable many of the input options that require pre/postprocessing (see the dispatch sketch after this list).
- StreamState
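A sketch of the dispatch a higher-level Backend must perform: PromptType::Tokens is named in the docs, while the Text variant and the toy tokenizer are assumptions used to illustrate why tokens-only engines such as trtllm::AsyncEngine need client-side preprocessing.

```rust
// Sketch: `PromptType::Tokens` appears in the docs; the `Text` variant and
// the tokenizer are assumptions for illustration.
enum PromptType {
    Text(String),     // assumed variant
    Tokens(Vec<u32>), // named in the docs
}

fn to_engine_input(prompt: PromptType, tokenize: impl Fn(&str) -> Vec<u32>) -> Vec<u32> {
    match prompt {
        // A tokens-only engine accepts this directly.
        PromptType::Tokens(ids) => ids,
        // Anything else must be preprocessed by the Backend first.
        PromptType::Text(text) => tokenize(&text),
    }
}

fn main() {
    // Toy tokenizer: maps each whitespace-separated word to its length.
    let tokenize = |s: &str| -> Vec<u32> {
        s.split_whitespace().map(|w| w.len() as u32).collect()
    };
    let ids = to_engine_input(PromptType::Text("hello world".into()), tokenize);
    println!("{ids:?}");
}
```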
Constants§
- FREQUENCY_PENALTY_RANGE - Frequency Penalty range for sampling.
- TEMPERATURE_RANGE - Temperature range for sampling.
- TOP_P_RANGE - Top P range for sampling (see the validation sketch after this list).
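These constants presumably bound the corresponding SamplingOptions fields. Below is a sketch of range-based validation, assuming the constants are RangeInclusive values; the bounds shown are illustrative guesses, not the crate's actual limits.

```rust
use std::ops::RangeInclusive;

// Assumed bounds, purely for illustration; the crate defines the real values.
const TEMPERATURE_RANGE: RangeInclusive<f32> = 0.0..=2.0;
const TOP_P_RANGE: RangeInclusive<f32> = 0.0..=1.0;

fn validate_temperature(t: f32) -> Result<f32, String> {
    if TEMPERATURE_RANGE.contains(&t) {
        Ok(t)
    } else {
        Err(format!(
            "temperature {t} outside {:?}..={:?}",
            TEMPERATURE_RANGE.start(),
            TEMPERATURE_RANGE.end()
        ))
    }
}

fn main() {
    assert!(validate_temperature(0.7).is_ok());
    assert!(validate_temperature(9.0).is_err());
    assert!(TOP_P_RANGE.contains(&0.95));
}
```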
Traits§
- OutputOptionsProvider
- SamplingOptionsProvider - SamplingOptionsProvider is a trait that allows the caller to extract the sampling options from the object that implements it. This will mutate the object (see the trait sketch after this list).
- StopConditionsProvider
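A sketch of the documented contract for SamplingOptionsProvider: the method name extract_sampling_options and the Option::take-based implementation are assumptions used to illustrate why extraction mutates the implementor.

```rust
// Sketch only: the method name and implementation strategy are assumed.
#[derive(Debug, Default)]
struct SamplingOptions {
    temperature: Option<f32>,
}

trait SamplingOptionsProvider {
    // Takes the options out of the implementor, mutating it.
    fn extract_sampling_options(&mut self) -> SamplingOptions;
}

struct MyRequest {
    sampling: Option<SamplingOptions>,
}

impl SamplingOptionsProvider for MyRequest {
    fn extract_sampling_options(&mut self) -> SamplingOptions {
        // `take` leaves None behind: this is the mutation the docs mention.
        self.sampling.take().unwrap_or_default()
    }
}

fn main() {
    let mut req = MyRequest {
        sampling: Some(SamplingOptions { temperature: Some(0.7) }),
    };
    let opts = req.extract_sampling_options();
    println!("{opts:?}");
    assert!(req.sampling.is_none()); // the object was mutated
}
```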