Module service

HTTP Service for Nova LLM

The primary purpose of this crate is to serve the nova-llm-protocols via OpenAI-compatible HTTP endpoints. This component is meant to be a gateway/ingress into the Nova LLM Distributed Runtime.

To keep a common pattern, the HttpService forwards each incoming OAI Chat Request or OAI Completion Request to a model-specific engine. Engines can be attached and detached dynamically using the ModelManager, as sketched below.
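
A minimal sketch of that attach/detach pattern (the `Engine` trait and the `Manager` type below are illustrative stand-ins for this crate's ModelManager, not its confirmed API):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Stand-in for a model-specific engine that handles forwarded requests.
trait Engine: Send + Sync {
    fn generate(&self, prompt: &str) -> String;
}

// Engines are keyed by model name and can be added or removed at runtime,
// mirroring how the ModelManager lets engines attach and detach.
#[derive(Default)]
struct Manager {
    engines: RwLock<HashMap<String, Arc<dyn Engine>>>,
}

impl Manager {
    fn add_model(&self, name: &str, engine: Arc<dyn Engine>) {
        self.engines.write().unwrap().insert(name.to_string(), engine);
    }

    fn remove_model(&self, name: &str) {
        self.engines.write().unwrap().remove(name);
    }

    // An HTTP handler would look up the engine for the requested model;
    // a miss maps naturally to an error for an unknown model.
    fn get(&self, name: &str) -> Option<Arc<dyn Engine>> {
        self.engines.read().unwrap().get(name).cloned()
    }
}
```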

Note: All requests, whether the client sets stream=true or stream=false, are propagated downstream as stream=true. This lets the downstream services handle a single request-response pattern. Non-streaming client requests are aggregated by the HttpService and returned as a single response.
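
The fold from a chunk stream into one response body looks roughly like this (a generic sketch using the futures crate, not the service's actual aggregation code):

```rust
use futures::stream::{self, Stream, StreamExt};

// Downstream engines always produce a stream of chunks. When the client
// sent stream=false, the service concatenates the chunks and replies once.
async fn aggregate(chunks: impl Stream<Item = String>) -> String {
    chunks
        .fold(String::new(), |mut acc, chunk| async move {
            acc.push_str(&chunk);
            acc
        })
        .await
}

#[tokio::main]
async fn main() {
    let chunks = stream::iter(vec!["Hel".to_string(), "lo".to_string()]);
    assert_eq!(aggregate(chunks).await, "Hello");
}
```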

TODO(): Add support for model-specific metadata and status. Status will allow us to return a 503 when a model should be ready but has a problem.

The service_v2::HttpService can be further extended to host any axum::Router using the service_v2::HttpServiceConfigBuilder.
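
A sketch of that extension point (the Router below uses the re-exported axum API; how it is handed to the service is left to the service_v2::HttpServiceConfigBuilder documentation):

```rust
use axum::{routing::get, Router};

// A custom route that could be hosted alongside the OpenAI-compatible
// endpoints. Attaching it to the service goes through
// service_v2::HttpServiceConfigBuilder; see that builder for the hook.
fn custom_routes() -> Router {
    Router::new().route("/custom/health", get(|| async { "ok" }))
}
```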

Re-exports§

pub use error::ServiceHttpError;
pub use metrics::Metrics;
pub use axum;

Modules§

discovery
error
metrics
service_v2

Structs§

DeploymentState
The DeploymentState is global state shared across all workers; it provides the set of known clients to Engines.
ModelManager
Manages the model-specific engines that can be attached to and detached from the service.
RouteDoc
Documentation for a route

Attribute Macros§

async_trait