HTTP Service for Nova LLM
The primary purpose of this crate is to serve the nova-llm-protocols via OpenAI-compatible HTTP endpoints. This component is meant to be a gateway/ingress into the Nova LLM Distributed Runtime.
In order to create a common pattern, the HttpService forwards the incoming OAI Chat Request or OAI Completion Request to a model-specific engine. Engines can be attached and detached dynamically using the ModelManager.
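As an illustration of that attach/detach pattern, here is a minimal, generic sketch; the type and method names below are assumptions for illustration only, not the crate's actual ModelManager API:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Stand-in for a model-specific engine client (hypothetical).
trait Engine: Send + Sync {}

/// Generic sketch of dynamic engine registration keyed by model name;
/// the real ModelManager's API may differ.
struct ModelManagerSketch {
    engines: RwLock<HashMap<String, Box<dyn Engine>>>,
}

impl ModelManagerSketch {
    fn new() -> Self {
        Self { engines: RwLock::new(HashMap::new()) }
    }

    /// Attach an engine for a model, making it routable.
    fn attach(&self, model: &str, engine: Box<dyn Engine>) {
        self.engines.write().unwrap().insert(model.to_string(), engine);
    }

    /// Detach an engine; subsequent requests for this model would fail.
    fn detach(&self, model: &str) {
        self.engines.write().unwrap().remove(model);
    }
}
```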
Note: All requests, whether the client sets stream=true or stream=false, are propagated downstream as stream=true. This lets the downstream services handle a single request-response pattern. Non-streaming user requests are aggregated by the HttpService and returned as a single response.
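A minimal sketch of that aggregation step, assuming a hypothetical `ChatChunk` delta type standing in for an OAI streaming chunk (not the crate's actual types):

```rust
use futures::{Stream, StreamExt};

/// Hypothetical chunk type standing in for an OAI streaming delta.
struct ChatChunk {
    delta: String,
}

/// Collect a downstream stream of chunks into one aggregated body,
/// mirroring how a stream=false client response could be assembled
/// from an always-streaming downstream.
async fn aggregate(mut chunks: impl Stream<Item = ChatChunk> + Unpin) -> String {
    let mut full = String::new();
    while let Some(chunk) = chunks.next().await {
        full.push_str(&chunk.delta);
    }
    full
}
```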
TODO(): Add support for model-specific metadata and status. Status will allow us to return a 503 when the model should be ready but has a problem.
The service_v2::HttpService can be further extended to host any axum::Router using the service_v2::HttpServiceConfigBuilder.
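A sketch of what such an extension might look like. Only the re-exported axum API is taken as given here; the `with_router` builder method is a hypothetical name, as the actual HttpServiceConfigBuilder surface is not shown in this documentation:

```rust
use axum::{routing::get, Router};

/// Build a custom router with the re-exported axum.
fn custom_routes() -> Router {
    Router::new().route("/custom/health", get(|| async { "ok" }))
}

// Attaching it to the service is sketched with a hypothetical builder method:
//
// let service = service_v2::HttpServiceConfigBuilder::default()
//     .with_router(custom_routes()) // hypothetical method name
//     .build()?;
```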
Re-exports§
pub use error::ServiceHttpError;
pub use metrics::Metrics;
pub use axum;
Modules§
- error
- metrics
- service_v2
Structs§
- DeploymentState - The DeploymentState is global state shared across all workers; it provides the set of known clients to Engines.
- ModelManager
- RouteDoc - Documentation for a route