openai-compat

Async Rust client for the OpenAI API and any OpenAI-compatible LLM provider,
modeled on the official openai-python SDK.
Full API documentation: docs.rs/openai-compat
Features
- Chat completions: full request surface (tools, response_format / JSON schema,
  penalties, logprobs, seed, stop, ...) with typed responses
- Responses API: create/retrieve/delete/cancel, background mode with resumable
  streaming, stateful chaining via previous_response_id, and input_items
  pagination
- Streaming: server-sent events exposed as a futures::Stream of typed chunks,
  terminating on [DONE] and surfacing mid-stream errors
- Embeddings, models, moderations, legacy completions, images, files
  (multipart upload/download), audio (TTS + transcription)
- Batches, resumable uploads, fine-tuning jobs, vector stores, assistants
  (beta v2: threads, messages, runs, run steps)
- Multimodal messages: text, image, and audio content parts
- Webhooks: HMAC-SHA256 signature verification (constant-time, timestamp
  tolerance) matching the Python SDK
- Azure OpenAI: api-version query, api-key/Entra ID auth, and deployment-based
  paths via the same client builder
- Realtime API: WebSocket sessions (tokio-tungstenite/rustls) with JSON events
  and typed event constructors
- Automatic retries: mirrors the Python SDK, with 408/409/429/5xx and
  connection errors, exponential backoff with jitter (0.5s → 8s),
  Retry-After/retry-after-ms/x-should-retry support, 2 retries by default
- Typed errors: status-specific error kinds with parsed
  {message, type, param, code} detail and x-request-id
- Any provider: set base_url to use any OpenAI-compatible endpoint
- Escape hatch: generic get/post/delete for endpoints not yet typed
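The retry policy named in the feature list can be sketched roughly as follows. This is a std-only illustration, not the crate's implementation: `is_retryable_status` and `backoff_delay` are hypothetical names, and the "shrink the delay by up to 25%" jitter is an assumption. Only the status codes and the 0.5s to 8s range come from the list above.

```rust
use std::time::Duration;

// Bounds taken from the "0.5s → 8s" range in the feature list.
const INITIAL_DELAY_SECS: f64 = 0.5;
const MAX_DELAY_SECS: f64 = 8.0;

/// Hypothetical helper: true for the statuses the feature list names as retryable.
fn is_retryable_status(status: u16) -> bool {
    matches!(status, 408 | 409 | 429) || (500..=599).contains(&status)
}

/// Hypothetical helper: delay before retry number `attempt` (0-based).
/// `jitter` is a caller-supplied random value in [0, 1]; the capped
/// exponential delay is scaled down by up to 25% (an assumed jitter shape).
fn backoff_delay(attempt: u32, jitter: f64) -> Duration {
    // Clamp the exponent so very large attempt counts cannot overflow.
    let exponential = INITIAL_DELAY_SECS * 2f64.powi(attempt.min(16) as i32);
    let capped = exponential.min(MAX_DELAY_SECS);
    let factor = 1.0 - 0.25 * jitter.clamp(0.0, 1.0);
    Duration::from_secs_f64(capped * factor)
}
```

Passing the random value in as an argument keeps the function deterministic and easy to test; a real client would draw it from a random number generator and would let a Retry-After header override the computed delay.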
Installation
[dependencies]
openai-compat = "0.2"
tokio = { version = "1", features = ["full"] }
futures-util = "0.3"
Quick start
use openai_compat::{ChatCompletionRequest, Client, Message};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new()?;
    let request = ChatCompletionRequest::new(
        "gpt-4o-mini",
        vec![
            Message::system("You are a helpful assistant."),
            Message::user("Hello!"),
        ],
    )
    .temperature(0.7);
    let completion = client.chat().completions().create(request).await?;
    println!("{}", completion.content().unwrap_or_default());
    Ok(())
}
Explicit configuration / other providers
use openai_compat::Client;
use std::time::Duration;
# fn main() -> Result<(), openai_compat::OpenAIError> {
let client = Client::builder()
    .api_key("sk-...")
    .base_url("https://openrouter.ai/api/v1")
    .timeout(Duration::from_secs(120))
    .max_retries(3)
    .header("X-Custom", "value")
    .build()?;
# Ok(())
# }
Streaming
use futures_util::StreamExt;
use openai_compat::{ChatCompletionRequest, Client, Message};
# async fn run() -> Result<(), Box<dyn std::error::Error>> {
# let client = Client::new()?;
let request = ChatCompletionRequest::new("gpt-4o-mini", vec![Message::user("Hi")]);
let mut stream = client.chat().completions().create_stream(request).await?;
while let Some(chunk) = stream.next().await {
    if let Some(content) = chunk?.content() {
        print!("{content}");
    }
}
# Ok(())
# }
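To illustrate what "terminating on [DONE]" means at the wire level, here is a minimal std-only sketch of reading `data:` lines from a server-sent event body. `SseLine`, `classify_line`, and `collect_payloads` are hypothetical names, not part of the crate, and the sketch makes simplifying assumptions: it ignores `event:`/`id:` fields and does not join multi-line `data:` payloads.

```rust
/// Outcome of reading one line of a server-sent event stream (illustrative type).
#[derive(Debug, PartialEq)]
enum SseLine<'a> {
    /// A `data:` payload to hand to the JSON decoder.
    Data(&'a str),
    /// The `[DONE]` sentinel: the stream is finished.
    Done,
    /// Blank lines, comments, and fields this sketch ignores.
    Skip,
}

fn classify_line(line: &str) -> SseLine<'_> {
    // Tolerate both "\n" and "\r\n" line endings.
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    match line.strip_prefix("data:") {
        Some(rest) => {
            // A single space after the colon is optional and not part of the payload.
            let payload = rest.strip_prefix(' ').unwrap_or(rest);
            if payload == "[DONE]" {
                SseLine::Done
            } else {
                SseLine::Data(payload)
            }
        }
        None => SseLine::Skip,
    }
}

/// Collects payloads in order and stops at the first `[DONE]` sentinel.
fn collect_payloads(body: &str) -> Vec<&str> {
    let mut payloads = Vec::new();
    for line in body.lines() {
        match classify_line(line) {
            SseLine::Data(payload) => payloads.push(payload),
            SseLine::Done => break,
            SseLine::Skip => {}
        }
    }
    payloads
}
```

Each collected payload would then be deserialized into a typed chunk; anything after the sentinel is never read.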
Tool calling
use openai_compat::{ChatCompletionRequest, Client, Message, Tool, ToolChoice};
use serde_json::json;
# async fn run() -> Result<(), Box<dyn std::error::Error>> {
# let client = Client::new()?;
let request = ChatCompletionRequest::new("gpt-4o-mini", vec![Message::user("Weather in Hanoi?")])
    .tools(vec![Tool::function(
        "get_weather",
        "Get current weather for a city",
        json!({"type": "object", "properties": {"city": {"type": "string"}}}),
    )])
    .tool_choice(ToolChoice::Auto);
let completion = client.chat().completions().create(request).await?;
if let Some(calls) = &completion.choices[0].message.tool_calls {
    for call in calls {
        println!("{} -> {}", call.function.name, call.function.arguments);
    }
}
# Ok(())
# }
Responses API
use openai_compat::{Client, CreateResponseRequest};
# async fn run() -> Result<(), Box<dyn std::error::Error>> {
# let client = Client::new()?;
let request = CreateResponseRequest::new("gpt-4o-mini", "Hello!");
let response = client.responses().create(request).await?;
println!("{}", response.output_text());
let follow_up = CreateResponseRequest::new("gpt-4o-mini", "And in French?")
    .previous_response_id(response.id);
client.responses().create(follow_up).await?;
# Ok(())
# }
Streaming uses the same EventStream as chat, but yields a tagged union of
~12 typed events (Created, OutputTextDelta, Completed, Failed, ...)
instead of homogeneous delta chunks; unrecognized event types deserialize to
ResponseStreamEvent::Unknown rather than erroring. Failed/Incomplete are
typed Ok variants, not stream errors — inspect the variant to detect them.
See examples/responses.rs for a full streaming example, and
client.responses().input_items(id).list_all(None) for paginating the items
that produced a response.
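The practical consequence of Failed/Incomplete being Ok variants is that a consumer has to match on the event, not only on the Result. The sketch below shows that shape with a local stand-in enum and a plain iterator in place of the async stream. `StreamEvent`, `Outcome`, and `drain` are hypothetical, and the variant payloads are assumptions; the crate's real ResponseStreamEvent carries richer data.

```rust
/// Local stand-in for the crate's event type; variants and payloads are illustrative.
#[derive(Debug, PartialEq)]
enum StreamEvent {
    OutputTextDelta(String),
    Completed,
    Failed(String),
    Incomplete(String),
    /// Any event type this client does not recognize.
    Unknown(String),
}

/// How the stream ended, from the consumer's point of view.
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed(String),
    Failed(String),
    Incomplete(String),
    /// The stream ended without a terminal event.
    Truncated,
}

/// Drains events, separating transport errors (Err) from
/// model-level failures (Ok events that must be inspected).
fn drain<I, E>(events: I) -> Result<Outcome, E>
where
    I: IntoIterator<Item = Result<StreamEvent, E>>,
{
    let mut text = String::new();
    for event in events {
        match event? {
            StreamEvent::OutputTextDelta(delta) => text.push_str(&delta),
            StreamEvent::Completed => return Ok(Outcome::Completed(text)),
            StreamEvent::Failed(reason) => return Ok(Outcome::Failed(reason)),
            StreamEvent::Incomplete(reason) => return Ok(Outcome::Incomplete(reason)),
            // Forward compatibility: skip event types added after this client was built.
            StreamEvent::Unknown(_) => {}
        }
    }
    Ok(Outcome::Truncated)
}
```

With the real async stream, the same match would sit inside a `while let Some(event) = stream.next().await` loop.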
Other resources
use openai_compat::types::embeddings::EmbeddingRequest;
use openai_compat::types::files::FileUpload;
use openai_compat::types::audio::SpeechRequest;
# async fn run() -> Result<(), Box<dyn std::error::Error>> {
# let client = openai_compat::Client::new()?;
let embeddings = client
    .embeddings()
    .create(EmbeddingRequest::new("text-embedding-3-small", "hello world"))
    .await?;
let models = client.models().list().await?;
let file = client
    .files()
    .create(FileUpload::from_path("data.jsonl").await?, "fine-tune")
    .await?;
let audio = client
    .audio()
    .speech(SpeechRequest::new("tts-1", "Hello!", "alloy"))
    .await?;
# Ok(())
# }
Error handling
use openai_compat::{ApiErrorKind, OpenAIError};
# async fn run(client: openai_compat::Client, req: openai_compat::ChatCompletionRequest) {
match client.chat().completions().create(req).await {
    Ok(completion) => println!("{:?}", completion.content()),
    Err(OpenAIError::Api(err)) => {
        if err.kind == ApiErrorKind::RateLimit {
            eprintln!("rate limited: {err}");
        }
    }
    Err(OpenAIError::Timeout) => eprintln!("request timed out"),
    Err(other) => eprintln!("{other}"),
}
# }
Multimodal messages
use openai_compat::{ChatCompletionRequest, ContentPart, Message};
# fn build() -> ChatCompletionRequest {
ChatCompletionRequest::new(
    "gpt-4o",
    vec![Message::user(vec![
        ContentPart::text("What is in this image?"),
        ContentPart::image_url("https://example.com/photo.png"),
    ])],
)
# }
Batches, fine-tuning, vector stores, assistants
use openai_compat::types::batches::BatchCreateParams;
use openai_compat::types::fine_tuning::FineTuningJobRequest;
# async fn run(client: openai_compat::Client) -> Result<(), openai_compat::OpenAIError> {
let batch = client
    .batches()
    .create(BatchCreateParams::new("file-abc", "/v1/chat/completions", "24h"))
    .await?;
let job = client
    .fine_tuning()
    .jobs()
    .create(FineTuningJobRequest::new("gpt-4o-mini-2024-07-18", "file-train"))
    .await?;
let stores = client.vector_stores().list(None).await?;
let assistants = client.assistants().list(None).await?;
# Ok(())
# }
Webhooks
use openai_compat::webhooks::{Webhooks, WebhookHeaders};
# fn verify(payload: &[u8], headers: &WebhookHeaders) -> bool {
let webhooks = Webhooks::new(&std::env::var("OPENAI_WEBHOOK_SECRET").unwrap()).unwrap();
webhooks.unwrap(payload, headers).is_ok()
# }
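The two safeguards named in the feature list, constant-time comparison and timestamp tolerance, can be sketched without any dependencies. `constant_time_eq` and `within_tolerance` are hypothetical helpers written for illustration; they are not the crate's API, the HMAC-SHA256 computation itself is omitted, and the tolerance value is left to the caller because this README does not state the default.

```rust
/// Compares two byte strings without returning early on the first mismatch,
/// so timing does not reveal how many leading bytes matched.
/// Illustrative only: production code should prefer a vetted crate such as `subtle`.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        // Accumulate differences instead of branching on each byte.
        diff |= x ^ y;
    }
    diff == 0
}

/// Accepts a webhook timestamp only if it lies within `tolerance_secs`
/// of the current time, in either direction, to limit replay of old payloads.
fn within_tolerance(timestamp_secs: u64, now_secs: u64, tolerance_secs: u64) -> bool {
    timestamp_secs.abs_diff(now_secs) <= tolerance_secs
}
```

A verifier would check the timestamp first, then compute the expected signature over the payload and compare it to the header value with the constant-time function.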
Azure OpenAI
# fn main() -> Result<(), openai_compat::OpenAIError> {
let client = openai_compat::Client::builder()
    .azure("https://my-resource.openai.azure.com", "2024-06-01")
    .azure_deployment("my-gpt4o")
    .build()?;
# Ok(())
# }
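For orientation, a deployment-based Azure OpenAI request URL has the shape built below. This is a sketch of the public Azure REST layout, not of how this crate assembles URLs internally; `azure_url` is a hypothetical helper.

```rust
/// Hypothetical helper: builds an Azure OpenAI deployment URL of the form
/// {endpoint}/openai/deployments/{deployment}/{path}?api-version={version}.
fn azure_url(endpoint: &str, deployment: &str, path: &str, api_version: &str) -> String {
    format!(
        "{}/openai/deployments/{}/{}?api-version={}",
        // Normalize slashes so callers may pass either form.
        endpoint.trim_end_matches('/'),
        deployment,
        path.trim_start_matches('/'),
        api_version
    )
}
```

With the values from the builder example, the chat completions path resolves under the `my-gpt4o` deployment with `api-version=2024-06-01` appended as a query parameter.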
Realtime
use openai_compat::realtime::events;
# async fn run(client: openai_compat::Client) -> Result<(), Box<dyn std::error::Error>> {
let mut session = client.connect_realtime("gpt-4o-realtime-preview").await?;
session.send(events::response_create()).await?;
while let Some(event) = session.recv().await? {
    println!("{}", event["type"]);
}
# Ok(())
# }
Examples
OPENAI_API_KEY=sk-... cargo run --example chat
OPENAI_API_KEY=sk-... cargo run --example chat-streaming
OPENAI_API_KEY=sk-... cargo run --example tool-calling
OPENAI_API_KEY=sk-... cargo run --example responses
Scope
v0.2 ports the full core client surface of openai-python: chat (incl.
multimodal content parts), the Responses API (incl. streaming, background
mode, input_items pagination), embeddings, models, moderations, legacy
completions, images (generate), files, audio (speech/transcriptions), batches,
resumable uploads, fine-tuning jobs, vector stores, assistants (beta v2),
webhook signature verification, Azure OpenAI mode, realtime WebSockets,
retries, streaming, and cursor pagination.
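Cursor pagination, as used by the OpenAI list endpoints, passes the last item's id as the `after` cursor until the server reports `has_more: false`. The sketch below shows that loop with a synchronous fetch closure standing in for the HTTP call. `Page` and `list_all` here are local, hypothetical definitions and do not reflect the crate's actual signatures.

```rust
/// Illustrative page shape: item ids plus the server's "more pages" flag.
struct Page {
    ids: Vec<String>,
    has_more: bool,
}

/// Fetches pages until `has_more` is false, passing the last id seen as the cursor.
/// `fetch` stands in for one list request; it receives the `after` cursor, if any.
fn list_all<F, E>(mut fetch: F) -> Result<Vec<String>, E>
where
    F: FnMut(Option<&str>) -> Result<Page, E>,
{
    let mut all = Vec::new();
    let mut after: Option<String> = None;
    loop {
        let page = fetch(after.as_deref())?;
        let next_cursor = page.ids.last().cloned();
        all.extend(page.ids);
        match next_cursor {
            // Continue only if the server says there is more and we have a cursor.
            Some(id) if page.has_more => after = Some(id),
            // An empty page or has_more == false ends the loop.
            _ => return Ok(all),
        }
    }
}
```

Stopping on an empty page even when `has_more` is true guards against looping forever on the same cursor.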
Deliberately simplified: assistants streaming runs and the fully-typed
realtime event surface are not modeled (events are serde_json::Value with
typed constructors); deep polymorphic fields (graders, chunking filters,
step details) are serde_json::Value. Responses API v1 covers
create/retrieve/delete/cancel/streaming/input_items. Left as future work:
built-in tools beyond web_search/file_search/code_interpreter, the compact()
and input_tokens.count() endpoints, the parse() structured-output wrapper,
and the Responses-over-WebSocket connection. For untyped endpoints, use the
client.get::<serde_json::Value>(...) / client.post(...) escape hatch.
License
Apache-2.0