Module common


§Engine Protocols

This module contains the protocols in the public API for the LLM Engine and AsyncEngine facades.

The core components are the CompletionRequest and StreamingCompletionResponse objects.

The StreamingCompletionResponse objects are the outputs of the LLM Engine; however, additional information is needed to propagate intermediate results for improved observability. That metadata is transferred via the other arms of the StreamingResponse enum.
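To make the two-arm flow concrete, here is a minimal, self-contained sketch of consuming such a stream. The variant names (Completion, Metadata) and their payload types are assumptions for illustration; this page only names the StreamingResponse enum itself.

```rust
// Hypothetical sketch: the real arm names of StreamingResponse are not shown
// on this page, so stand-in variants are modeled here for illustration only.
enum StreamingResponse {
    // Carries a StreamingCompletionResponse chunk (the engine's actual output).
    Completion(String),
    // Carries intermediate metadata surfaced for observability.
    Metadata(String),
}

// Consume a stream, printing completions and logging metadata separately.
fn handle(stream: impl Iterator<Item = StreamingResponse>) {
    for item in stream {
        match item {
            StreamingResponse::Completion(chunk) => print!("{chunk}"),
            StreamingResponse::Metadata(info) => eprintln!("[meta] {info}"),
        }
    }
}

fn main() {
    let fake_stream = vec![
        StreamingResponse::Metadata("request scheduled".into()),
        StreamingResponse::Completion("Hello".into()),
        StreamingResponse::Completion(", world".into()),
    ];
    handle(fake_stream.into_iter());
    println!();
}
```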

Modules§

llm_backend
postprocessor
preprocessor

Structs§

ChatCompletionLogprobs
ChatCompletionTokenLogprob
ChatContext
ChatContext is a struct that contains the role and content of a chat message along with a flattened CompletionContext.
ChatTurn
ChatTurn is a struct that contains the user and assistant messages in a chat.
CompletionContext
Defines the prompt template and system prompt for a completion request. If the model does not support prompt templates, the system_prompt will be ignored.
CompletionRequest
TensorRT LLM does not perform preprocessing or postprocessing. The input_ids / token_ids are expected to be preprocessed by the client, which is responsible for constructing the model-specific prompt template and applying the tokenizer.
CompletionRequestBuilder
Builder for CompletionRequest; a sketch of the builder pattern, including sampling and stop options, appears after this list.
Delta
GuidedDecodingOptions
Guided Decoding Options
OutputOptions
Collection of options that control what information the inference engine returns in the response.
SamplingOptions
Collection of options that control the sampling behavior of the inference engine.
SequencePositionData
At each SequencePosition we hold position-specific data.
StopConditions
TensorRT LLM server-side stop conditions. These options allow the server to evaluate the generated sequence and stop generation when it meets a stop condition.
StreamingCompletionResponse
TopLogprob
Usage
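As referenced under CompletionRequestBuilder above, this is a self-contained sketch of the derive_builder-style pattern the builder follows. All field and method names here (token_ids, max_tokens, stop, temperature) are assumptions; in the real module the request is built from the SamplingOptions, StopConditions, and OutputOptions structs listed above.

```rust
// Illustrative stand-ins; the field names are assumptions, not the crate's
// confirmed API.
#[derive(Debug, Default, Clone)]
struct CompletionRequest {
    token_ids: Vec<u32>,       // client-side preprocessed tokens (see above)
    max_tokens: Option<u32>,   // server-side stop condition
    stop: Option<Vec<String>>, // stop sequences, also evaluated server-side
    temperature: Option<f32>,  // sampling option
}

#[derive(Default)]
struct CompletionRequestBuilder {
    token_ids: Option<Vec<u32>>,
    max_tokens: Option<u32>,
    stop: Option<Vec<String>>,
    temperature: Option<f32>,
}

impl CompletionRequestBuilder {
    fn token_ids(mut self, ids: Vec<u32>) -> Self { self.token_ids = Some(ids); self }
    fn max_tokens(mut self, n: u32) -> Self { self.max_tokens = Some(n); self }
    fn stop(mut self, s: Vec<String>) -> Self { self.stop = Some(s); self }
    fn temperature(mut self, t: f32) -> Self { self.temperature = Some(t); self }

    // build() fails when a required field is missing, mirroring the role of
    // CompletionRequestBuilderError in the real module.
    fn build(self) -> Result<CompletionRequest, String> {
        Ok(CompletionRequest {
            token_ids: self.token_ids.ok_or("token_ids is required")?,
            max_tokens: self.max_tokens,
            stop: self.stop,
            temperature: self.temperature,
        })
    }
}

fn main() {
    let req = CompletionRequestBuilder::default()
        .token_ids(vec![1, 15043, 3186]) // already tokenized by the client
        .max_tokens(64)
        .stop(vec!["</s>".to_string()])
        .temperature(0.7)
        .build()
        .expect("valid request");
    println!("{req:?}");
}
```

Deferring required-field checks to build(), rather than failing in each setter, is what makes a dedicated error type like CompletionRequestBuilderError useful: validation happens once, when the request is finalized.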

Enums§

CompletionRequestBuilderError
Error type for CompletionRequestBuilder
FinishReason
LogProbs
Logits
PromptType
LLM inference engines can accept a variety of input types, but not all engines support all of them. For example, the trtllm::AsyncEngine only supports PromptType::Tokens as an input type. The higher-level Backend class is a general wrapper around engines that enables many of the input options that require pre/postprocessing. A dispatch sketch appears after this list.
StreamState
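As noted under PromptType, engines differ in which input types they accept. A hedged sketch of the resulting dispatch, with an assumed Text variant alongside the documented Tokens variant:

```rust
// Illustrative stand-in for PromptType: Tokens is documented above; the Text
// variant is an assumption added to show the dispatch.
enum PromptType {
    Text(String),     // raw text that still needs preprocessing (assumed)
    Tokens(Vec<u32>), // pre-tokenized input, the form trtllm::AsyncEngine accepts
}

// An engine that only supports PromptType::Tokens must reject other variants;
// the higher-level Backend wrapper would tokenize Text before it reaches the engine.
fn to_engine_input(prompt: PromptType) -> Result<Vec<u32>, String> {
    match prompt {
        PromptType::Tokens(ids) => Ok(ids),
        PromptType::Text(_) => {
            Err("engine only supports PromptType::Tokens; route through Backend".into())
        }
    }
}

fn main() {
    assert!(to_engine_input(PromptType::Tokens(vec![1, 2, 3])).is_ok());
    assert!(to_engine_input(PromptType::Text("hi".into())).is_err());
}
```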

Constants§

FREQUENCY_PENALTY_RANGE
Frequency Penalty range for sampling; a validation sketch covering all three ranges follows this list.
TEMPERATURE_RANGE
Temperature range for sampling.
TOP_P_RANGE
Top P range for sampling.
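A sketch of validating sampling options against ranges like the three constants above. The concrete bounds and the RangeInclusive<f32> representation are assumptions chosen to match common sampling limits:

```rust
use std::ops::RangeInclusive;

// Stand-ins for the module's constants: the exact bounds and types here are
// assumptions for this sketch.
const TEMPERATURE_RANGE: RangeInclusive<f32> = 0.0..=2.0;
const TOP_P_RANGE: RangeInclusive<f32> = 0.0..=1.0;
const FREQUENCY_PENALTY_RANGE: RangeInclusive<f32> = -2.0..=2.0;

// Reject an out-of-range option before the request reaches the engine.
fn validate(name: &str, value: f32, range: &RangeInclusive<f32>) -> Result<(), String> {
    if range.contains(&value) {
        Ok(())
    } else {
        Err(format!("{name}={value} is outside {}..={}", range.start(), range.end()))
    }
}

fn main() {
    assert!(validate("temperature", 0.7, &TEMPERATURE_RANGE).is_ok());
    assert!(validate("top_p", 1.5, &TOP_P_RANGE).is_err());
    assert!(validate("frequency_penalty", -1.0, &FREQUENCY_PENALTY_RANGE).is_ok());
}
```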

Traits§

OutputOptionsProvider
SamplingOptionsProvider
SamplingOptionsProvider is a trait that lets the caller extract the sampling options from the object implementing it; extraction mutates the object. An implementation sketch appears after this list.
StopConditionsProvider
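As referenced under SamplingOptionsProvider, here is a sketch of what implementing the trait might look like. The method name extract_sampling_options and the &mut receiver are assumptions, inferred only from the note that extraction mutates the object:

```rust
// Stand-in types: the trait's real method name and signature are not shown on
// this page; a &mut receiver is assumed because extraction mutates the object.
#[derive(Debug, Default)]
struct SamplingOptions {
    temperature: Option<f32>,
    top_p: Option<f32>,
}

trait SamplingOptionsProvider {
    // Extract the sampling options, mutating self in the process.
    fn extract_sampling_options(&mut self) -> SamplingOptions;
}

struct MyRequest {
    temperature: Option<f32>,
    top_p: Option<f32>,
}

impl SamplingOptionsProvider for MyRequest {
    fn extract_sampling_options(&mut self) -> SamplingOptions {
        // Option::take moves each value out and leaves None behind: one
        // plausible way the documented mutation could occur.
        SamplingOptions {
            temperature: self.temperature.take(),
            top_p: self.top_p.take(),
        }
    }
}

fn main() {
    let mut req = MyRequest { temperature: Some(0.8), top_p: Some(0.95) };
    let opts = req.extract_sampling_options();
    println!("{opts:?}");
    assert!(req.temperature.is_none()); // the field was moved out of the request
}
```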