Crate inference_runtime_litellm

§inference-runtime-litellm

A thin adapter for the LiteLLM proxy, layered on top of inference-runtime-openai. LiteLLM exposes an OpenAI-compatible HTTP surface in front of any backend (OpenAI, Anthropic, Bedrock, Azure, …) and applies its own caching, fallback, and retry policies. See Doc §10.3.

The LiteLlmRunner is a newtype around OpenAiRunner (see the sketch after this list) that:

  • points at the LiteLLM proxy URL instead of api.openai.com,
  • lowers max_retries (LiteLLM does its own retries; we want fast fail-through),
  • preserves runtime_kind() == LiteLlm so dashboards and routing can distinguish “via LiteLLM” from “direct to OpenAI”.
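A minimal sketch of that wrapping pattern, using stand-in types; the real OpenAiRunner and LiteLlmConfig constructors, field names, and the exact retry count in this crate may differ, and the proxy URL here is purely illustrative.

```rust
/// Stand-in for the inner runner from `inference-runtime-openai`.
struct OpenAiRunner {
    base_url: String,
    max_retries: u32,
}

impl OpenAiRunner {
    fn new(base_url: impl Into<String>, max_retries: u32) -> Self {
        Self { base_url: base_url.into(), max_retries }
    }
}

/// Which runtime produced a response, for dashboards and routing.
#[derive(Debug, PartialEq)]
enum RuntimeKind {
    OpenAi,
    LiteLlm,
}

/// Newtype wrapper: same behaviour as the inner runner, different identity.
struct LiteLlmRunner(OpenAiRunner);

impl LiteLlmRunner {
    /// Point at the LiteLLM proxy instead of api.openai.com and keep
    /// retries low, since LiteLLM applies its own retry/fallback policy.
    fn new(proxy_url: impl Into<String>) -> Self {
        Self(OpenAiRunner::new(proxy_url, 1))
    }

    fn runtime_kind(&self) -> RuntimeKind {
        RuntimeKind::LiteLlm
    }
}

fn main() {
    let runner = LiteLlmRunner::new("http://litellm.internal:4000/v1");
    assert_eq!(runner.runtime_kind(), RuntimeKind::LiteLlm);
    println!(
        "proxy base_url = {}, max_retries = {}",
        runner.0.base_url, runner.0.max_retries
    );
}
```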

Structs§

LiteLlmConfig
LiteLlmRunner
Newtype wrapper that delegates all ModelRunner operations to the inner OpenAiRunner; only runtime_kind and transport_kind differ, so observability can distinguish LiteLLM traffic from direct OpenAI traffic (see the sketch below).
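A sketch of the delegation pattern described above. The ModelRunner trait shape, the `complete` method, and the string-valued runtime kinds are illustrative stand-ins, not the crate's real API; the point is that the wrapper forwards the work and changes only its reported identity.

```rust
trait ModelRunner {
    fn complete(&self, prompt: &str) -> String;
    fn runtime_kind(&self) -> &'static str;
}

struct OpenAiRunner;

impl ModelRunner for OpenAiRunner {
    fn complete(&self, prompt: &str) -> String {
        format!("completion for: {prompt}")
    }
    fn runtime_kind(&self) -> &'static str {
        "OpenAi"
    }
}

struct LiteLlmRunner(OpenAiRunner);

impl ModelRunner for LiteLlmRunner {
    // Forward the real work to the wrapped runner...
    fn complete(&self, prompt: &str) -> String {
        self.0.complete(prompt)
    }
    // ...but report a distinct identity so dashboards and routing
    // can tell "via LiteLLM" apart from "direct to OpenAI".
    fn runtime_kind(&self) -> &'static str {
        "LiteLlm"
    }
}

fn main() {
    let runner = LiteLlmRunner(OpenAiRunner);
    println!("{}: {}", runner.runtime_kind(), runner.complete("hello"));
}
```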

Enums§

SecretRef