inference-runtime-litellm
Thin LiteLLM-proxy adapter on top of inference-runtime-openai.
LiteLLM exposes an OpenAI-compatible HTTP surface fronting any
backend (OpenAI, Anthropic, Bedrock, Azure, …) and applies its own
caching / fallback / retry policies. Doc §10.3.
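Because the surface is wire-compatible, any OpenAI-style HTTP client can talk to the proxy directly. A minimal sketch follows; the proxy address, model alias, and `LITELLM_API_KEY` variable are assumptions, not part of this crate, and it uses the `reqwest` and `serde_json` crates.

```rust
// Sketch only: assumes a LiteLLM proxy listening on localhost:4000.
// Cargo deps: reqwest = { version = "0.12", features = ["blocking", "json"] },
//             serde_json = "1".
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Standard OpenAI chat-completions payload; the proxy routes "model"
    // to whichever backend its own config maps that alias to.
    let body = serde_json::json!({
        "model": "gpt-4o",
        "messages": [{ "role": "user", "content": "ping" }],
    });
    let resp = reqwest::blocking::Client::new()
        // OpenAI-shaped route, served by LiteLLM instead of api.openai.com.
        .post("http://localhost:4000/v1/chat/completions")
        .bearer_auth(std::env::var("LITELLM_API_KEY")?)
        .json(&body)
        .send()?;
    println!("{}", resp.text()?);
    Ok(())
}
```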
The `LiteLlmRunner` is a newtype around `OpenAiRunner` (sketched below) that:

- points at the LiteLLM proxy URL instead of `api.openai.com`,
- lowers `max_retries` (LiteLLM does its own retries; we want fast fail-through),
- preserves `runtime_kind() == LiteLlm` so dashboards and routing can distinguish “via LiteLLM” from “direct to OpenAI”.
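A minimal sketch of that shape, using hypothetical stand-ins for the `inference-runtime-openai` types; the real crate's field names, constructor signatures, and retry value may differ.

```rust
// Illustrative stand-ins for the inner crate's config and runner.
pub struct OpenAiConfig {
    pub base_url: String,
    pub api_key: String,
    pub max_retries: u32,
}

pub struct OpenAiRunner {
    config: OpenAiConfig,
}

impl OpenAiRunner {
    pub fn new(config: OpenAiConfig) -> Self {
        Self { config }
    }

    pub fn base_url(&self) -> &str {
        &self.config.base_url
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuntimeKind {
    OpenAi,
    LiteLlm,
}

/// Newtype wrapper: every `ModelRunner` op would delegate to the inner
/// runner; only the reported identity differs.
pub struct LiteLlmRunner(OpenAiRunner);

impl LiteLlmRunner {
    pub fn new(proxy_url: impl Into<String>, api_key: impl Into<String>) -> Self {
        Self(OpenAiRunner::new(OpenAiConfig {
            base_url: proxy_url.into(), // LiteLLM proxy, not api.openai.com
            api_key: api_key.into(),
            max_retries: 0, // LiteLLM retries upstream; exact value illustrative
        }))
    }

    /// Overridden so dashboards and routing see "via LiteLLM".
    pub fn runtime_kind(&self) -> RuntimeKind {
        RuntimeKind::LiteLlm
    }
}
```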
Structs

- `LiteLlmConfig`
- `LiteLlmRunner`: Newtype wrapper. Delegates to the inner `OpenAiRunner` for all `ModelRunner` ops; only `runtime_kind` and `transport_kind` differ, so observability can distinguish LiteLLM from direct OpenAI.
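Continuing the hypothetical sketch above, construction and the observability hook might look like this; the real crate presumably routes construction through `LiteLlmConfig`, whose shape is not shown here.

```rust
fn main() {
    // Proxy URL and key handling are illustrative.
    let runner = LiteLlmRunner::new(
        "http://localhost:4000/v1",
        std::env::var("LITELLM_API_KEY").unwrap_or_default(),
    );

    // The only observable difference from a direct OpenAiRunner is the
    // identity reported to dashboards and routing.
    assert_eq!(runner.runtime_kind(), RuntimeKind::LiteLlm);
}
```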