§Engine Protocols
This module contains the public API protocols for the LLM Engine and AsyncEngine facades. The core components are the CompletionRequest and StreamingCompletionResponse objects.
The StreamingCompletionResponse objects are the outputs of the LLM Engine; however, some additional information is needed to propagate intermediate results for improved observability. That metadata is transferred via the other arms of the StreamingResponse enum.
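To make the enum's role concrete, here is a minimal sketch of consuming such responses. The variant names (Completion, Metadata) and the stubbed payload types are assumptions for illustration; consult the actual StreamingResponse definition for the real arms.

```rust
// Sketch only: variant and field names are assumed, not the crate's real API.
#[derive(Debug)]
struct StreamingCompletionResponse {
    text: String, // assumed payload field
}

enum StreamingResponse {
    // Generated output from the engine.
    Completion(StreamingCompletionResponse),
    // Assumed arm: intermediate results surfaced for observability.
    Metadata(String),
}

fn handle(resp: StreamingResponse) {
    match resp {
        StreamingResponse::Completion(chunk) => println!("chunk: {}", chunk.text),
        StreamingResponse::Metadata(info) => eprintln!("observability: {info}"),
    }
}

fn main() {
    handle(StreamingResponse::Completion(StreamingCompletionResponse {
        text: "Hello".into(),
    }));
    handle(StreamingResponse::Metadata("prefill complete".into()));
}
```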
Structs§
- ChatCompletionLogprobs
- ChatCompletionTokenLogprob
- ChatContext - ChatContext is a struct that contains the role and context of a chat message along with a flattened CompletionContext.
- ChatTurn - ChatTurn is a struct that contains the user and assistant messages in a chat.
- CompletionContext - Defines the prompt template and system prompt for a completion request. If the model does not support prompt templates, the system_prompt will be ignored.
- CompletionRequest - TensorRT LLM does not perform preprocessing or postprocessing. The input_ids / token_ids are expected to be preprocessed by the client. The client is responsible for constructing the model-specific prompt template and applying the tokenizer.
- CompletionRequestBuilder - Builder for CompletionRequest (see the builder sketch after this list).
- Delta
- GuidedDecodingOptions - Guided Decoding Options.
- OutputOptions - Collection of options that control what information the inference engine returns in the response.
- SamplingOptions - Collection of options that control the sampling behavior of the inference engine.
- SequencePositionData - At each SequencePosition we hold position-specific data.
- StopConditions - TensorRT LLM server-side stop conditions. These options allow the server to evaluate the generated sequence and stop generation if the sequence meets a stop condition.
- StreamingCompletionResponse
- TopLogprob
- Usage
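The presence of CompletionRequestBuilderError (listed under Enums below) indicates a conventional Rust builder. The following self-contained sketch shows the usual shape of such a builder; the fields (token_ids, max_tokens) and defaults are assumptions for illustration, not the crate's actual layout.

```rust
// Sketch only: field names and defaults are assumed, not the real
// CompletionRequest layout. Per the docs, token_ids arrive pre-tokenized.
#[derive(Debug)]
struct CompletionRequest {
    token_ids: Vec<u32>,
    max_tokens: u32,
}

#[derive(Debug)]
enum CompletionRequestBuilderError {
    UninitializedField(&'static str),
}

#[derive(Default)]
struct CompletionRequestBuilder {
    token_ids: Option<Vec<u32>>,
    max_tokens: Option<u32>,
}

impl CompletionRequestBuilder {
    fn token_ids(mut self, ids: Vec<u32>) -> Self {
        self.token_ids = Some(ids);
        self
    }
    fn max_tokens(mut self, n: u32) -> Self {
        self.max_tokens = Some(n);
        self
    }
    fn build(self) -> Result<CompletionRequest, CompletionRequestBuilderError> {
        Ok(CompletionRequest {
            // Required field: a missing token_ids yields the builder error.
            token_ids: self
                .token_ids
                .ok_or(CompletionRequestBuilderError::UninitializedField("token_ids"))?,
            max_tokens: self.max_tokens.unwrap_or(16),
        })
    }
}

fn main() -> Result<(), CompletionRequestBuilderError> {
    let req = CompletionRequestBuilder::default()
        .token_ids(vec![101, 2023, 102]) // client-side tokenization, per the docs
        .max_tokens(32)
        .build()?;
    println!("{req:?}");
    Ok(())
}
```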
Enums§
- CompletionRequestBuilderError - Error type for CompletionRequestBuilder.
- FinishReason
- LogProbs
- Logits
- PromptType - LLM Inference Engines can accept a variety of input types. Not all Engines will support all input types. For example, the trtllm::AsyncEngine only supports PromptType::Tokens as an input type. The higher-level Backend class is a general wrapper around Engines that will enable many of the input options that require pre/postprocessing (see the dispatch sketch after this list).
- StreamState
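A sketch of the dispatch a higher-level Backend must perform: PromptType::Tokens is named in the docs, while the Text variant and the toy tokenizer are assumptions used to illustrate why tokens-only engines such as trtllm::AsyncEngine need client-side preprocessing.

```rust
// Sketch: `PromptType::Tokens` appears in the docs; the `Text` variant and
// the tokenizer are assumptions for illustration.
enum PromptType {
    Text(String),     // assumed variant
    Tokens(Vec<u32>), // named in the docs
}

fn to_engine_input(prompt: PromptType, tokenize: impl Fn(&str) -> Vec<u32>) -> Vec<u32> {
    match prompt {
        // A tokens-only engine accepts this directly.
        PromptType::Tokens(ids) => ids,
        // Anything else must be preprocessed by the Backend first.
        PromptType::Text(text) => tokenize(&text),
    }
}

fn main() {
    // Toy tokenizer: maps each whitespace-separated word to its length.
    let tokenize = |s: &str| -> Vec<u32> {
        s.split_whitespace().map(|w| w.len() as u32).collect()
    };
    let ids = to_engine_input(PromptType::Text("hello world".into()), tokenize);
    println!("{ids:?}");
}
```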
Constants§
- FREQUENCY_PENALTY_RANGE - Frequency Penalty range for sampling.
- TEMPERATURE_RANGE - Temperature range for sampling.
- TOP_P_RANGE - Top P range for sampling (see the validation sketch after this list).
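These constants presumably bound the corresponding SamplingOptions fields. Below is a sketch of range-based validation, assuming the constants are RangeInclusive values; the bounds shown are illustrative guesses, not the crate's actual limits.

```rust
use std::ops::RangeInclusive;

// Assumed bounds, purely for illustration; the crate defines the real values.
const TEMPERATURE_RANGE: RangeInclusive<f32> = 0.0..=2.0;
const TOP_P_RANGE: RangeInclusive<f32> = 0.0..=1.0;

fn validate_temperature(t: f32) -> Result<f32, String> {
    if TEMPERATURE_RANGE.contains(&t) {
        Ok(t)
    } else {
        Err(format!(
            "temperature {t} outside {:?}..={:?}",
            TEMPERATURE_RANGE.start(),
            TEMPERATURE_RANGE.end()
        ))
    }
}

fn main() {
    assert!(validate_temperature(0.7).is_ok());
    assert!(validate_temperature(9.0).is_err());
    assert!(TOP_P_RANGE.contains(&0.95));
}
```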
Traits§
- OutputOptionsProvider
- SamplingOptionsProvider - SamplingOptionsProvider is a trait that allows the caller to extract the sampling options from the object that implements it. This will mutate the object (see the trait sketch after this list).
- StopConditionsProvider
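A sketch of the documented contract for SamplingOptionsProvider: the method name extract_sampling_options and the Option::take-based implementation are assumptions used to illustrate why extraction mutates the implementor.

```rust
// Sketch only: the method name and implementation strategy are assumed.
#[derive(Debug, Default)]
struct SamplingOptions {
    temperature: Option<f32>,
}

trait SamplingOptionsProvider {
    // Takes the options out of the implementor, mutating it.
    fn extract_sampling_options(&mut self) -> SamplingOptions;
}

struct MyRequest {
    sampling: Option<SamplingOptions>,
}

impl SamplingOptionsProvider for MyRequest {
    fn extract_sampling_options(&mut self) -> SamplingOptions {
        // `take` leaves None behind: this is the mutation the docs mention.
        self.sampling.take().unwrap_or_default()
    }
}

fn main() {
    let mut req = MyRequest {
        sampling: Some(SamplingOptions { temperature: Some(0.7) }),
    };
    let opts = req.extract_sampling_options();
    println!("{opts:?}");
    assert!(req.sampling.is_none()); // the object was mutated
}
```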