tensor_response_stream

Function tensor_response_stream 

Source
pub async fn tensor_response_stream(
    state: Arc<State>,
    request: NvCreateTensorRequest,
    streaming: bool,
) -> Result<impl Stream<Item = Annotated<NvCreateTensorResponse>>, Status>

Tensor Request Handler

This method handles incoming requests for the model type tensor. The endpoint is a “source” for a [super::OpenAICompletionsStreamingEngine] and returns a stream of responses which will be forwarded to the client.

Note: For all requests, streaming or non-streaming, we always call the engine with streaming enabled. For non-streaming requests, we will fold the stream into a single response as part of this handler.