Crate inference_runtime_litellm

§inference-runtime-litellm

A thin adapter for the LiteLLM proxy, layered on top of inference-runtime-openai. LiteLLM exposes an OpenAI-compatible HTTP surface in front of any backend (OpenAI, Anthropic, Bedrock, Azure, …) and applies its own caching, fallback, and retry policies. See Doc §10.3.

The LiteLlmRunner is a newtype around OpenAiRunner (see the sketch after this list) that:

  • points at the LiteLLM proxy URL instead of api.openai.com,
  • lowers max_retries (LiteLLM does its own retries; we want fast fail-through),
  • preserves runtime_kind() == LiteLlm so dashboards and routing can distinguish “via LiteLLM” from “direct to OpenAI”.
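A minimal sketch of that wrapping pattern, using stand-in types; the real OpenAiRunner and LiteLlmConfig constructors, field names, and the exact retry count in this crate may differ, and the proxy URL here is purely illustrative.

```rust
/// Stand-in for the inner runner from `inference-runtime-openai`.
struct OpenAiRunner {
    base_url: String,
    max_retries: u32,
}

impl OpenAiRunner {
    fn new(base_url: impl Into<String>, max_retries: u32) -> Self {
        Self { base_url: base_url.into(), max_retries }
    }
}

/// Which runtime produced a response, for dashboards and routing.
#[derive(Debug, PartialEq)]
enum RuntimeKind {
    OpenAi,
    LiteLlm,
}

/// Newtype wrapper: same behaviour as the inner runner, different identity.
struct LiteLlmRunner(OpenAiRunner);

impl LiteLlmRunner {
    /// Point at the LiteLLM proxy instead of api.openai.com and keep
    /// retries low, since LiteLLM applies its own retry/fallback policy.
    fn new(proxy_url: impl Into<String>) -> Self {
        Self(OpenAiRunner::new(proxy_url, 1))
    }

    fn runtime_kind(&self) -> RuntimeKind {
        RuntimeKind::LiteLlm
    }
}

fn main() {
    let runner = LiteLlmRunner::new("http://litellm.internal:4000/v1");
    assert_eq!(runner.runtime_kind(), RuntimeKind::LiteLlm);
    println!(
        "proxy base_url = {}, max_retries = {}",
        runner.0.base_url, runner.0.max_retries
    );
}
```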

Structs§

LiteLlmConfig
LiteLlmRunner
Newtype wrapper that delegates all ModelRunner operations to the inner OpenAiRunner; only runtime_kind and transport_kind differ, so observability can distinguish LiteLLM traffic from direct OpenAI traffic (see the sketch below).
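A sketch of the delegation pattern described above. The ModelRunner trait shape, the `complete` method, and the string-valued runtime kinds are illustrative stand-ins, not the crate's real API; the point is that the wrapper forwards the work and changes only its reported identity.

```rust
trait ModelRunner {
    fn complete(&self, prompt: &str) -> String;
    fn runtime_kind(&self) -> &'static str;
}

struct OpenAiRunner;

impl ModelRunner for OpenAiRunner {
    fn complete(&self, prompt: &str) -> String {
        format!("completion for: {prompt}")
    }
    fn runtime_kind(&self) -> &'static str {
        "OpenAi"
    }
}

struct LiteLlmRunner(OpenAiRunner);

impl ModelRunner for LiteLlmRunner {
    // Forward the real work to the wrapped runner...
    fn complete(&self, prompt: &str) -> String {
        self.0.complete(prompt)
    }
    // ...but report a distinct identity so dashboards and routing
    // can tell "via LiteLLM" apart from "direct to OpenAI".
    fn runtime_kind(&self) -> &'static str {
        "LiteLlm"
    }
}

fn main() {
    let runner = LiteLlmRunner(OpenAiRunner);
    println!("{}: {}", runner.runtime_kind(), runner.complete("hello"));
}
```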

Enums§

SecretRef