atomr-infer-runtime-litellm
Thin LiteLLM-proxy adapter on top of
atomr-infer-runtime-openai. ~110 LOC.
LiteLLM exposes an OpenAI-compatible HTTP surface fronting any backend
(OpenAI, Anthropic, Bedrock, Azure, Cohere, …) and applies its own
caching / fallback / retry policies. The LiteLlmRunner is a newtype
around OpenAiRunner that:
- Points at the LiteLLM proxy URL instead of api.openai.com.
- Lowers the default max_retries to 1: LiteLLM does its own retries, so client-side retries would compound.
- Preserves runtime_kind() == LiteLlm and transport_kind().provider == LiteLlm, so dashboards and routing can distinguish "via LiteLLM" from "direct to OpenAI" even when the wire format is identical (see the sketch after this list).
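Roughly, the newtype has the shape sketched below. The imports and the OpenAiConfig, OpenAiRunner, RunnerError, Provider, RuntimeKind, and TransportKind names are assumptions about what atomr-infer-runtime-openai and a shared core crate export, not the shipped identifiers.

    // Sketch only: item names and paths are stand-ins for the real exports.
    use atomr_infer_runtime_openai::{OpenAiConfig, OpenAiRunner, RunnerError};
    use atomr_infer_core::{Provider, RuntimeKind, TransportKind};

    pub struct LiteLlmRunner {
        inner: OpenAiRunner, // every HTTP call is delegated to the OpenAI runner
    }

    impl LiteLlmRunner {
        pub fn new(cfg: OpenAiConfig) -> Result<Self, RunnerError> {
            // By this point cfg.base_url targets the LiteLLM proxy and
            // cfg.max_retries has already been lowered to 1, so only the
            // proxy's own retry policy is in play.
            Ok(Self { inner: OpenAiRunner::new(cfg)? })
        }

        pub fn runtime_kind(&self) -> RuntimeKind {
            RuntimeKind::LiteLlm // "via LiteLLM", even though the wire format is OpenAI
        }

        pub fn transport_kind(&self) -> TransportKind {
            let mut kind = self.inner.transport_kind();
            kind.provider = Provider::LiteLlm; // lets dashboards and routing tell the hop apart
            kind
        }
    }
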
Quick start

    // Config field names and the SecretRef constructor are illustrative;
    // check the crate docs for the exact API.
    use atomr_infer_runtime_litellm::{LiteLlmConfig, LiteLlmRunner};
    use atomr_infer_core::SecretRef;

    let cfg = LiteLlmConfig {
        base_url: "http://localhost:4000".into(),   // the LiteLLM proxy, not api.openai.com
        api_key: SecretRef::env("LITELLM_API_KEY"),
        ..Default::default()
    };
    let openai_cfg = cfg.into_openai();             // map onto the OpenAI runner's config
    let runner = LiteLlmRunner::new(openai_cfg)?;
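The into_openai() step is what keeps the adapter thin: after the config conversion, requests flow through the same OpenAiRunner code path as a direct deployment, with only the proxy URL, the retry default, and the reported kinds swapped.
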
When to choose this over atomr-infer-runtime-openai
- Your team already runs LiteLLM as the central provider gateway and wants observability tagged with the proxy hop.
- You want LiteLLM's fallback chains (Anthropic → Bedrock → OpenAI on failure), and this adapter should stay out of their way.
- You're consolidating spend tracking through the proxy, so per-deployment cost in the inference-side MetricsActor is intentionally a downstream concern.