pub struct IKVCacheUpdateLayer { /* private fields */ }
IKVCacheUpdateLayer
Layer that represents a KVCacheUpdate operation.
The KVCacheUpdate layer is used to cache the key or value tensors for the attention mechanism. K and V use separate KVCacheUpdate layers.
An IKVCacheUpdateLayer has three inputs (cache, update, writeIndices) and one output.
In kLINEAR mode, for each batch element i, the layer copies the update tensor into the cache starting at
position writeIndices[i]. Assuming no out-of-bounds writes occur, the operation for each sequence position
s in [0, sequenceLength) is:
output[i, :, writeIndices[i] + s, :] = update[i, :, s, :]
The output is the cache tensor updated in place, so the output and the cache input must share the same device memory address.
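The kLINEAR copy rule above can be modelled on the host as a plain reference function. This is a minimal sketch, not the layer's implementation: the function name `kv_cache_update_linear`, the flat row-major layout [batch, heads, capacity, dim], and the `capacity` parameter (the cache's sequence dimension) are assumptions made for illustration. The cache is mutated in place, mirroring the shared-memory requirement.

```rust
/// Host-side reference model of the kLINEAR update (hypothetical helper,
/// not part of the bindings). Tensors are flat, row-major slices:
///   cache:  [batch, heads, capacity, dim]
///   update: [batch, heads, seq_len,  dim]
fn kv_cache_update_linear(
    cache: &mut [f32],
    update: &[f32],
    write_indices: &[usize],
    batch: usize,
    heads: usize,
    capacity: usize,
    seq_len: usize,
    dim: usize,
) {
    assert_eq!(cache.len(), batch * heads * capacity * dim);
    assert_eq!(update.len(), batch * heads * seq_len * dim);
    assert_eq!(write_indices.len(), batch);

    for i in 0..batch {
        let start = write_indices[i];
        // The documented formula assumes no out-of-bounds writes;
        // this sketch enforces that assumption explicitly.
        assert!(start + seq_len <= capacity, "out-of-bounds cache write");
        for h in 0..heads {
            for s in 0..seq_len {
                // output[i, h, start + s, :] = update[i, h, s, :]
                let dst = ((i * heads + h) * capacity + start + s) * dim;
                let src = ((i * heads + h) * seq_len + s) * dim;
                cache[dst..dst + dim].copy_from_slice(&update[src..src + dim]);
            }
        }
    }
}
```

For example, with one batch element, one head, a capacity of 4, and dim 2, writing a two-token update at write index 1 fills cache positions 1 and 2 and leaves positions 0 and 3 untouched.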
Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.
Implementations§
impl IKVCacheUpdateLayer
pub fn setCacheMode(
    self: Pin<&mut IKVCacheUpdateLayer>,
    cacheMode: KVCacheMode,
) -> bool
Set the mode of the KVCacheUpdate layer.
cacheMode: The mode of the KVCacheUpdate layer. For TensorRT 10.15, only kLINEAR mode is supported.
Returns true if the cache mode is set successfully, false otherwise.
pub fn getCacheMode(self: &IKVCacheUpdateLayer) -> KVCacheMode
Get the mode of the KVCacheUpdate layer.
Returns the mode of the KVCacheUpdate layer.
pub fn setUpdateForm(
    self: Pin<&mut IKVCacheUpdateLayer>,
    form: AttentionIOForm,
) -> bool
Set the update form.
Default is kPADDED_BHND. When set to kPACKED_NHD, the update tensor shape is [totalTokens, numHeads, dimHead] instead of [batchSize, numHeads, numTokens, dimHead], and the updateLengths tensor must be provided.
form: The update form.
Returns true if the update form is set successfully, false otherwise.
See getUpdateForm()
See AttentionIOForm
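The shape difference between the two update forms can be sketched as a small helper. The `UpdateForm` enum below is a local stand-in for the two values named above, not the binding's `AttentionIOForm` type, and `update_shape` is a hypothetical function. The sketch also assumes that, in the padded form, numTokens is the longest sequence in the batch; the text above does not state this.

```rust
/// Local stand-in for the two update forms described above
/// (hypothetical; not the binding's `AttentionIOForm`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum UpdateForm {
    PaddedBhnd,
    PackedNhd,
}

/// Expected shape of the update tensor for a given form.
fn update_shape(
    form: UpdateForm,
    tokens_per_sequence: &[usize],
    num_heads: usize,
    dim_head: usize,
) -> Vec<usize> {
    match form {
        // [batchSize, numHeads, numTokens, dimHead]
        // Assumption: numTokens is the longest sequence; shorter ones are padded.
        UpdateForm::PaddedBhnd => {
            let num_tokens = tokens_per_sequence.iter().copied().max().unwrap_or(0);
            vec![tokens_per_sequence.len(), num_heads, num_tokens, dim_head]
        }
        // [totalTokens, numHeads, dimHead]
        UpdateForm::PackedNhd => {
            let total_tokens: usize = tokens_per_sequence.iter().sum();
            vec![total_tokens, num_heads, dim_head]
        }
    }
}
```

The packed form drops the batch dimension, which is why a separate tensor of per-sequence boundaries is required to recover where each sequence starts and ends.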
pub fn getUpdateForm(self: &IKVCacheUpdateLayer) -> AttentionIOForm
pub unsafe fn setUpdateLengths(
    self: Pin<&mut IKVCacheUpdateLayer>,
    lengths: *mut ITensor,
) -> bool
Set the update lengths tensor.
Valid only when the update form is kPACKED_NHD, in which case it must be set. The tensor holds cumulative token counts and has shape [batchSize + 1]. The first element must be 0 and the last element must equal totalTokens. The number of tokens for batch element i is lengths[i + 1] - lengths[i].
Providing a first element that is not 0 results in undefined behavior.
lengths: A 1D tensor of type kINT32 with shape [batchSize + 1], or nullptr to clear a previously set update lengths tensor.
Returns true if the update lengths tensor is set successfully, false otherwise.
See getUpdateLengths()
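The contents expected in the update lengths tensor can be illustrated with a host-side helper that turns per-sequence token counts into cumulative offsets. This is a sketch only: `cumulative_update_lengths` and `tokens_in_sequence` are hypothetical names, and how the resulting values are fed into an ITensor is outside its scope. The i32 element type mirrors kINT32.

```rust
/// Build the cumulative token counts [0, n0, n0 + n1, ...] from
/// per-sequence token counts (hypothetical helper).
fn cumulative_update_lengths(tokens_per_sequence: &[i32]) -> Vec<i32> {
    let mut lengths = Vec::with_capacity(tokens_per_sequence.len() + 1);
    let mut total = 0i32;
    // The first element must be 0.
    lengths.push(total);
    for &n in tokens_per_sequence {
        // Assumption of this sketch: counts are non-negative.
        assert!(n >= 0, "token counts must be non-negative");
        total = total
            .checked_add(n)
            .expect("total token count overflows i32");
        lengths.push(total);
    }
    // The last element now equals totalTokens.
    lengths
}

/// Number of tokens for batch element `i`: lengths[i + 1] - lengths[i].
fn tokens_in_sequence(lengths: &[i32], i: usize) -> i32 {
    lengths[i + 1] - lengths[i]
}
```

With token counts 3, 1, and 4, the helper yields the four-element sequence 0, 3, 4, 8, which has length batchSize + 1 and ends in totalTokens.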
pub fn getUpdateLengths(self: &IKVCacheUpdateLayer) -> *mut ITensor
Get the update lengths tensor.
Returns the update lengths tensor, or nullptr if not set.
See setUpdateLengths()