Crate inference_runtime_openai

§inference-runtime-openai

OpenAI Chat Completions runtime + Azure OpenAI variant. Doc §10.3.

Implements the inference_core::ModelRunner contract over HTTP/2 (via reqwest). SSE chunks are parsed by inference_remote_core::sse and lifted into provider-typed deltas by the wire module.
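
The transport path is standard enough to sketch in isolation. The snippet below is not the crate's code, only a minimal illustration of the HTTP + SSE flow it wraps: it assumes reqwest with the stream feature, plus serde_json and futures-util, and hand-rolls the event splitting that inference_remote_core::sse handles for real.

```rust
use futures_util::StreamExt;

// Minimal sketch of a streaming Chat Completions call. The real runner
// routes the response through inference_remote_core::sse and the wire
// types instead of this hand-rolled loop.
async fn stream_chat(api_key: &str) -> Result<(), reqwest::Error> {
    let body = serde_json::json!({
        "model": "gpt-4o-mini",
        "stream": true,
        "messages": [{ "role": "user", "content": "hello" }],
    });
    let resp = reqwest::Client::new()
        .post("https://api.openai.com/v1/chat/completions")
        .bearer_auth(api_key)
        .json(&body)
        .send()
        .await?;

    let mut buf: Vec<u8> = Vec::new();
    let mut stream = resp.bytes_stream(); // requires reqwest's `stream` feature
    while let Some(chunk) = stream.next().await {
        buf.extend_from_slice(&chunk?);
        // SSE events are separated by a blank line; each event carries one
        // `data: <json>` payload, and the stream ends with `data: [DONE]`.
        while let Some(pos) = buf.windows(2).position(|w| w == b"\n\n") {
            let event: Vec<u8> = buf.drain(..pos + 2).collect();
            for line in String::from_utf8_lossy(&event).lines() {
                match line.strip_prefix("data: ") {
                    Some("[DONE]") => return Ok(()),
                    Some(payload) => println!("chunk: {payload}"),
                    None => {}
                }
            }
        }
    }
    Ok(())
}
```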

Re-exports§

pub use config::OpenAiConfig;
pub use config::OpenAiVariant;
pub use cost::OpenAiPricing;
pub use error::classify_openai_error;
pub use runner::OpenAiRunner;
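
The re-exports cover typical call-site needs. The construction below is a hypothetical wiring: the field and constructor names are illustrative assumptions, not the crate's documented API.

```rust
use inference_runtime_openai::{OpenAiConfig, OpenAiRunner, OpenAiVariant};

// Hypothetical wiring of the re-exported types; field and constructor
// names are assumptions for illustration only.
fn build_runner(api_key: String) -> OpenAiRunner {
    let config = OpenAiConfig {
        variant: OpenAiVariant::OpenAi, // or an Azure variant
        api_key,
        model: "gpt-4o-mini".to_owned(),
        ..Default::default() // assumes OpenAiConfig: Default
    };
    OpenAiRunner::new(config) // assumed constructor
}
```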

Modules§

config
OpenAiConfig — parameters for the OpenAI / Azure OpenAI runtime.
cost
Per-million-token pricing for the OpenAI catalog. Values are public list pricing as of the doc snapshot; operators override via inference-cli flags. A worked cost calculation is sketched after this list.
error
OpenAI-specific error refinement. Layers on top of the generic classify_http_status from inference-remote-core to recognise provider-specific shapes that should not be retried (content filter refusals, context-length-exceeded). The layering is sketched after this list.
runner
OpenAiRunner — ModelRunner impl for OpenAI Chat Completions / Azure OpenAI.
wire
OpenAI Chat Completions wire types — request envelope + SSE chunk shape. Kept minimal: only what’s needed to round-trip the prompts / deltas / usage info that the actor system needs. A minimal serde sketch of these shapes follows the list.
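
To make the cost module's pricing model concrete, here is the per-million-token arithmetic; the struct is a stand-in, since OpenAiPricing's actual field names are not shown on this page.

```rust
// Stand-in for OpenAiPricing: USD list prices per one million tokens.
struct Pricing {
    input_per_mtok: f64,
    output_per_mtok: f64,
}

// cost = tokens / 1_000_000 * price, summed over input and output.
fn request_cost(p: &Pricing, input_tokens: u64, output_tokens: u64) -> f64 {
    input_tokens as f64 / 1_000_000.0 * p.input_per_mtok
        + output_tokens as f64 / 1_000_000.0 * p.output_per_mtok
}

// e.g. 12_000 input + 800 output tokens at $2.50 / $10.00 per MTok:
// 0.012 * 2.50 + 0.0008 * 10.00 = $0.038
```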
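
The error module's layering — refining a generic HTTP classification with provider shapes that must never be retried — can be sketched as follows. The enum and the generic classifier below stand in for inference-remote-core's real types, and refine_openai is only a hypothetical analogue of classify_openai_error.

```rust
#[derive(Debug, PartialEq)]
enum Classification {
    Retryable,
    Fatal,
}

// Stand-in for the generic classifier: retry on rate limits and server
// errors, fail fast otherwise.
fn classify_http_status(status: u16) -> Classification {
    match status {
        429 | 500..=599 => Classification::Retryable,
        _ => Classification::Fatal,
    }
}

// OpenAI-specific refinement: some response bodies mark permanent
// failures even when the bare status code would look retryable.
fn refine_openai(status: u16, body: &str) -> Classification {
    if body.contains("content_filter") || body.contains("context_length_exceeded") {
        return Classification::Fatal;
    }
    classify_http_status(status)
}
```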
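
Finally, the wire module's request envelope and SSE chunk shape can be pictured with minimal serde types; the fields below mirror the public Chat Completions schema rather than the crate's actual definitions.

```rust
use serde::{Deserialize, Serialize};

// Request envelope: just enough to ask for a streamed completion.
#[derive(Serialize)]
struct ChatRequest {
    model: String,
    stream: bool,
    messages: Vec<Message>,
}

#[derive(Serialize)]
struct Message {
    role: String, // "system" | "user" | "assistant"
    content: String,
}

// SSE chunk shape: incremental deltas plus optional usage totals.
#[derive(Deserialize)]
struct ChatChunk {
    choices: Vec<ChunkChoice>,
    usage: Option<Usage>,
}

#[derive(Deserialize)]
struct ChunkChoice {
    delta: Delta,
    finish_reason: Option<String>,
}

#[derive(Deserialize)]
struct Delta {
    content: Option<String>,
}

#[derive(Deserialize)]
struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
}
```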