pub async fn tensor_response_stream(
state: Arc<State>,
request: NvCreateTensorRequest,
streaming: bool,
) -> Result<impl Stream<Item = Annotated<NvCreateTensorResponse>>, Status>
Tensor Request Handler
This method handles incoming requests for the tensor model type. The endpoint is a "source"
for an [super::OpenAICompletionsStreamingEngine] and returns a stream of
responses which will be forwarded to the client.
Note: For all requests, streaming or non-streaming, the engine is always invoked with streaming enabled. For non-streaming requests, this handler folds the resulting stream into a single response before replying.
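The fold-for-non-streaming pattern described in the note can be sketched as below. All types and helpers here (`TensorChunk`, `engine_stream`, `fold_into_single`) are simplified stand-ins, not the real `NvCreateTensorResponse` or engine API, and a synchronous iterator stands in for the async stream:

```rust
// Hypothetical stand-in for one partial response from the engine;
// the real handler works with Annotated<NvCreateTensorResponse>.
#[derive(Debug, Default, Clone, PartialEq)]
struct TensorChunk {
    values: Vec<f32>,
}

// The engine is always driven in streaming mode: it yields chunks.
// (Assumed shape for illustration only.)
fn engine_stream() -> impl Iterator<Item = TensorChunk> {
    vec![
        TensorChunk { values: vec![1.0, 2.0] },
        TensorChunk { values: vec![3.0] },
    ]
    .into_iter()
}

// For a non-streaming request, the handler folds the chunk stream
// into one aggregate response before replying to the client.
fn fold_into_single(stream: impl Iterator<Item = TensorChunk>) -> TensorChunk {
    stream.fold(TensorChunk::default(), |mut acc, chunk| {
        acc.values.extend(chunk.values);
        acc
    })
}

fn main() {
    let folded = fold_into_single(engine_stream());
    // The folded response carries all values from every chunk.
    assert_eq!(folded.values, vec![1.0, 2.0, 3.0]);
}
```

In the real handler the same idea applies to an async `Stream` (e.g. via `futures::StreamExt::fold`), so a single code path drives the engine regardless of the `streaming` flag.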