openai-oxide implements the full Responses API, Chat Completions, and 20+ other endpoints. It introduces performance primitives like persistent WebSockets, hedged requests, early-parsing for function calls, and type-safe Structured Outputs — features previously unavailable in the Rust ecosystem.
Why openai-oxide?
We built openai-oxide to squeeze every millisecond out of the OpenAI API.
- Structured Outputs — `parse::<T>()`: Auto-generates a JSON schema from Rust types via `schemars` and deserializes the response in one call — `parse::<MyStruct>()`. Works with both the Chat and Responses APIs. Node (Zod) and Python (Pydantic v2) bindings included.
- Stream Helpers: High-level `ChatStreamEvent` with automatic text/tool-call accumulation, typed `ContentDelta`/`ToolCallDone` events, `get_final_completion()`, and `current_content()` snapshots. No manual chunk stitching.
- Zero-Overhead Streaming: Custom zero-copy SSE parser with strict `Accept: text/event-stream` and `Cache-Control: no-cache` headers to prevent reverse-proxy buffering — TTFT in ~580ms.
- WebSocket Mode: Persistent `wss://` connection for the Responses API. Bypasses per-request TLS handshakes, reducing multi-turn agent-loop latency by up to 37%.
- Stream FC Early Parse: Yields function calls the exact moment `arguments.done` is emitted, letting you execute local tools ~400ms before the overall response finishes.
- Hardware-Accelerated JSON (`simd`): Opt-in AVX2/NEON vector instructions for parsing massive agent histories and complex tool calls in microseconds.
- Hedged Requests: Send redundant requests and cancel the slower ones. Costs 2-7% extra tokens but reliably reduces P99 tail latency by 50-96% (inspired by Google's "The Tail at Scale").
- Webhook Verification: HMAC-SHA256 signature verification with timestamp replay protection — production-ready webhook handling out of the box.
- HTTP Tuning: gzip, TCP_NODELAY, HTTP/2 keep-alive with adaptive window, connection pooling — enabled by default. Neither async-openai nor genai set these.
- WASM First-Class: Compiles to `wasm32-unknown-unknown` without dropping features. Streaming, retries, and early parsing work flawlessly in Cloudflare Workers and browsers. Live demo.
The Agentic Multiplier Effect
In complex agent loops (e.g. coding agents, researchers) where a model calls dozens of tools sequentially, standard SDKs introduce compounding delays. openai-oxide collapses this latency through architectural pipelining:
- Persistent Connections: Standard SDKs perform a full HTTP round-trip (TCP/TLS handshake + headers) for every step. With `openai-oxide`'s WebSocket mode, the connection stays hot. You save ~300ms per tool call; over 50 tool calls, that's 15 seconds of pure network overhead eliminated.
- Asynchronous Execution: Standard SDKs wait for the `[DONE]` signal from OpenAI before parsing the response and yielding the tool call to your code. `openai-oxide` parses the SSE stream on the fly: the moment `{"type": "response.function_call.arguments.done"}` arrives, your local function (e.g. `ls` or `cat`) starts executing while OpenAI is still generating the final metadata.
- Strict Typings: Unlike wrappers that treat tool arguments as raw dynamic `Value`s, `openai-oxide` enforces strict typings. If OpenAI hallucinates an invalid JSON structure, it is caught at the SDK boundary, allowing the agent to immediately self-correct without crashing the application.
Standard Client (HTTP/REST)

```
Request 1 (ls)  : [TLS Handshake] -> [Req] -> [Wait TTFT] -> [Wait Done] -> [Parse JSON] -> [Exec Tool]
Request 2 (cat) : [TLS Handshake] -> [Req] -> [Wait TTFT] -> [Wait Done] -> [Parse JSON] -> [Exec Tool]
```

openai-oxide (WebSockets + Early Parse)

```
Connection      : [TLS Handshake] (Done once)
Request 1 (ls)  : [Req] -> [Wait TTFT] -> [Exec Tool Early!]
Request 2 (cat) : [Req] -> [Wait TTFT] -> [Exec Tool Early!]
```
Result: An agent performing 10 tool calls completes its task up to 50% faster.
Installation
Rust

```sh
cargo add openai-oxide
```

Node.js / TypeScript

```sh
npm install openai-oxide
# or
yarn add openai-oxide
# or
pnpm add openai-oxide
```

Supported platforms: macOS (x64, arm64), Linux (x64, arm64, glibc & musl), Windows (x64).

Python

```sh
pip install openai-oxide
# or
uv add openai-oxide
```
| Package | Registry | Link |
|---|---|---|
| `openai-oxide` | crates.io | crates.io/crates/openai-oxide |
| `openai-oxide` | npm | npmjs.com/package/openai-oxide |
| `openai-oxide` | PyPI | pypi.org/project/openai-oxide |
| `openai-oxide-macros` | crates.io | crates.io/crates/openai-oxide-macros |
Quick Start
Rust

```rust
use openai_oxide::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::from_env()?; // Uses OPENAI_API_KEY
    let response = client
        .responses()
        .create()
        .model("gpt-5.4")
        .input("Say hello!")
        .send()
        .await?;
    println!("{}", response.output_text());
    Ok(())
}
```
Node.js

```js
const { Client } = require("openai-oxide");

const client = new Client(); // Uses OPENAI_API_KEY
const text = await client.createText("gpt-5.4", "Say hello!");
console.log(text);
```
Python

```python
from openai_oxide import Client

client = Client()  # Uses OPENAI_API_KEY
text = await client.create("gpt-5.4", "Say hello!")
```
Benchmarks
All benchmarks were run to ensure a fair, real-world comparison of the clients:
- Environment: macOS (M-series), native compilation.
- Model: `gpt-5.4` via the official OpenAI API.
- Protocol: TLS + HTTP/2 multiplexing with connection pooling (warm connections).
- Execution: 5 iterations per test; the reported value is the median time.
- Rust APIs: `openai-oxide` provides first-class support for both the traditional Chat Completions API (`/v1/chat/completions`) and the newer Responses API (`/v1/responses`). The Responses API has slightly higher backend orchestration latency on OpenAI's side for non-streamed requests, so we separate them for fairness.
Rust Ecosystem (openai-oxide vs async-openai vs genai)
| Test | openai-oxide (WebSockets) | openai-oxide (Responses API) | async-openai (Responses API) | genai (Responses API) | openai-oxide (Chat API) | genai (Chat API) |
|---|---|---|---|---|---|---|
| Plain text | 710ms ( -29% ) | 1000ms | 960ms | 835ms | 753ms | 722ms |
| Structured output | ~1000ms | 1352ms | N/A | 1197ms | 1304ms | N/A |
| Function calling | ~850ms | 1164ms | 1748ms | 1030ms | 1252ms | N/A |
| Streaming TTFT | ~400ms | 670ms | 685ms | 670ms | 695ms | N/A |
| Multi-turn (2 reqs) | 1425ms ( -35% ) | 2219ms | 3275ms | 1641ms | 2011ms | 1560ms |
| Rapid-fire (5 calls) | 3227ms ( -37% ) | 5147ms | 5166ms | 3807ms | 4671ms | 3540ms |
| Parallel 3x (fan-out) | N/A ( Sync ) | 1081ms | 1053ms | 866ms | 978ms | 801ms |
Reproduce: `cargo run --example benchmark --features responses --release`
Understanding the Results
1. Why is genai sometimes slightly faster in HTTP Plain Text?
genai is designed as a universal, loosely-typed adapter. When OpenAI sends a 3KB JSON response, genai only extracts the raw text (`value["output"][0]["content"][0]["text"]`) and drops the rest.
openai-oxide is a full SDK. We rigorously deserialize and validate the entire response tree into strict Rust structs—including token usage, logprobs, finish reasons, and tool metadata. This guarantees type safety and gives you full access to the API, at the cost of ~100-150ms of CPU deserialization time.
2. Where openai-oxide destroys the competition:
- Streaming (TTFT): Our custom zero-copy SSE parser bypasses `serde_json` overhead, matching the theoretical network limit (~670ms).
- Function Calling: Because `async-openai` isn't hyper-optimized for the complex nested schemas of OpenAI's tool calls, our strict deserialization engine overtakes it by a massive margin (1164ms vs 1748ms).
- WebSockets: By holding the TCP/TLS connection open, our WebSocket mode bypasses HTTP overhead entirely, making `openai-oxide` significantly faster than any HTTP-only client (710ms).
Python Ecosystem (openai-oxide-python vs openai)
openai-oxide wins 10/12 tests. Native PyO3 bindings vs the official `openai` Python SDK (v2.29.0).
| Test | openai-oxide | openai | Winner |
|---|---|---|---|
| Plain text | 845ms | 997ms | OXIDE (+15%) |
| Structured output | 1367ms | 1379ms | OXIDE (+1%) |
| Function calling | 1195ms | 1230ms | OXIDE (+3%) |
| Multi-turn (2 reqs) | 2260ms | 3089ms | OXIDE (+27%) |
| Web search | 3157ms | 3499ms | OXIDE (+10%) |
| Nested structured | 5377ms | 5339ms | python (+1%) |
| Agent loop (2-step) | 4570ms | 5144ms | OXIDE (+11%) |
| Rapid-fire (5 calls) | 5667ms | 6136ms | OXIDE (+8%) |
| Prompt-cached | 4425ms | 5564ms | OXIDE (+20%) |
| Streaming TTFT | 626ms | 638ms | OXIDE (+2%) |
| Parallel 3x | 1184ms | 1090ms | python (+9%) |
| Hedged (2x race) | 893ms | 995ms | OXIDE (+10%) |
median of medians, 3×5 iterations. Model: gpt-5.4.
Reproduce: `cd openai-oxide-python && uv run python ../examples/bench_python.py`
Node.js Ecosystem (openai-oxide vs openai)
openai-oxide wins 8/8 tests. Native napi-rs bindings vs the official `openai` npm package.
| Test | openai-oxide | openai | Winner |
|---|---|---|---|
| Plain text | 1075ms | 1311ms | OXIDE (+18%) |
| Structured output | 1370ms | 1765ms | OXIDE (+22%) |
| Function calling | 1725ms | 1832ms | OXIDE (+6%) |
| Multi-turn (2 reqs) | 2283ms | 2859ms | OXIDE (+20%) |
| Rapid-fire (5 calls) | 6246ms | 6936ms | OXIDE (+10%) |
| Streaming TTFT | 534ms | 580ms | OXIDE (+8%) |
| Parallel 3x | 1937ms | 1991ms | OXIDE (+3%) |
| WebSocket hot pair | 2181ms | N/A | OXIDE |
median of medians, 3×5 iterations. Model: gpt-5.4.
Reproduce: `cd openai-oxide-node && BENCH_ITERATIONS=5 node examples/bench_node.js`
Python Usage
```python
from openai_oxide import Client

client = Client()

# 1. Standard request
text = await client.create("gpt-5.4", "Explain Rust lifetimes briefly")

# 2. Streaming (Async Iterator)
stream = await client.create_stream("gpt-5.4", "Explain Rust lifetimes briefly")
async for chunk in stream:
    print(chunk, end="")
```
Advanced Features Guide
WebSocket Mode
Persistent connections bypass the TLS handshake penalty for every request. Ideal for high-speed agent loops.
```rust
let client = Client::from_env()?;
let mut session = client.ws_session().await?;

// All calls route through the same wss:// connection
let r1 = session.send(first_request).await?;
let r2 = session.send(follow_up_request).await?;

session.close().await?;
```
Streaming FC Early Parse
Start executing your local functions instantly when the model finishes generating the arguments, rather than waiting for the entire stream to close.
```rust
let mut handle = client.responses().create_stream_fc(request).await?;

while let Some(call) = handle.recv().await {
    // Each `call` is yielded as soon as its `arguments.done` event is parsed
    run_local_tool(call).await?; // `run_local_tool` is your own dispatcher
}
```
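The idea is SDK-agnostic: scan decoded SSE events and yield each tool call the moment its `arguments.done` event appears, instead of buffering until the stream closes. A minimal Python sketch — the event `type` string comes from the Responses API, while the exact `name`/`arguments` field layout here is illustrative:

```python
import json

def early_function_calls(sse_events):
    """Yield a parsed tool call as soon as its `arguments.done` event
    arrives, instead of waiting for the whole stream to finish."""
    for raw in sse_events:
        event = json.loads(raw)
        if event.get("type") == "response.function_call.arguments.done":
            # Arguments are complete at this point — safe to parse and execute
            yield event["name"], json.loads(event["arguments"])
        # Other event types (text deltas, metadata) flow past untouched

events = [
    '{"type": "response.output_text.delta", "delta": "Working..."}',
    '{"type": "response.function_call.arguments.done", "name": "ls", "arguments": "{\\"path\\": \\".\\"}"}',
    '{"type": "response.completed"}',
]
calls = list(early_function_calls(events))
# calls == [("ls", {"path": "."})]
```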
Hedged Requests
Protect your application against random network latency spikes.
```rust
use openai_oxide::hedged_request;
use std::time::Duration;

// Sends 2 identical requests with a 1.5s delay. Returns whichever finishes first.
let response = hedged_request(&client, request, Duration::from_millis(1500)).await?;
```
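The hedging pattern itself is SDK-agnostic: fire the request, wait out the hedge delay, then race a backup copy and take whichever result lands first. A minimal sketch with the Python standard library (`hedged` and `slow_call` are hypothetical names, not part of the bindings):

```python
import concurrent.futures
import time

def hedged(call, delay_s):
    """Fire `call`; if it hasn't finished within `delay_s`, fire a backup
    copy and return whichever result completes first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(call)
        done, _ = concurrent.futures.wait([first], timeout=delay_s)
        if done:  # primary beat the hedge delay — no backup needed
            return first.result()
        backup = pool.submit(call)
        done, _ = concurrent.futures.wait(
            [first, backup], return_when=concurrent.futures.FIRST_COMPLETED
        )
        return done.pop().result()

def slow_call():
    time.sleep(0.05)  # stand-in for a network request
    return "ok"

print(hedged(slow_call, delay_s=0.01))  # "ok" — from whichever copy wins
```

The real helper cancels the loser; this sketch simply lets it finish, which is the main simplification.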
Parallel Fan-Out
Leverage HTTP/2 multiplexing natively. Send 3 concurrent requests over a single connection; the total wall time is equal to the slowest single request.
```rust
// Three requests share one HTTP/2 connection; total wall time ≈ the slowest one.
let (r1, r2, r3) = tokio::join!(
    client.responses().create(req_a),
    client.responses().create(req_b),
    client.responses().create(req_c),
);
```
#[openai_tool] Macro
Auto-generate JSON schemas for your functions.
```rust
use openai_oxide_macros::openai_tool;

/// Get the current weather for a city.
#[openai_tool]
fn get_weather(city: String) -> String {
    format!("Sunny in {city}")
}

// The macro generates `get_weather_tool()`, which returns the `serde_json::Value` schema
let tool = get_weather_tool();
```
Node.js / TypeScript Native Bindings
Thanks to NAPI-RS, we provide lightning-fast Node.js bindings that execute requests and stream events directly from Rust into the V8 event loop, avoiding the blocking overhead of a pure-JS implementation.
```js
const { Client } = require("openai-oxide");

const client = new Client(); // Uses OPENAI_API_KEY
const text = await client.createText("gpt-5.4", "Hello from native bindings!");
```
At the moment, the Node bindings expose Chat Completions, Responses, streaming helpers, and WebSocket sessions. The full API matrix below refers to the Rust core crate.
Implemented APIs
| API | Method |
|---|---|
| Chat Completions | client.chat().completions().create() / create_stream() |
| Responses | client.responses().create() / create_stream() / create_stream_fc() |
| Responses Tools | Function, WebSearch, FileSearch, CodeInterpreter, ComputerUse, Mcp, ImageGeneration |
| WebSocket | client.ws_session() — send / send_stream / warmup / close |
| Hedged | hedged_request() / hedged_request_n() / speculative() |
| Embeddings | client.embeddings().create() |
| Models | client.models().list() / retrieve() / delete() |
| Images | client.images().generate() / edit() / create_variation() |
| Audio | client.audio().transcriptions() / translations() / speech() |
| Files | client.files().create() / list() / retrieve() / delete() / content() |
| Fine-tuning | client.fine_tuning().jobs().create() / list() / cancel() / list_events() |
| Moderations | client.moderations().create() |
| Batches | client.batches().create() / list() / retrieve() / cancel() |
| Uploads | client.uploads().create() / cancel() / complete() |
| Pagination | list_page() / list_auto() — cursor-based, async stream |
| Assistants (beta) | Full CRUD + threads + runs + vector stores |
| Realtime (beta) | client.beta().realtime().sessions().create() |
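For intuition, the cursor-based loop behind helpers like `list_page()` / `list_auto()` can be sketched in a few lines. The `data` / `has_more` fields and the `after` cursor follow OpenAI's list-response shape; `fake_fetch` is a stand-in for a real list endpoint:

```python
def list_auto(fetch_page):
    """Cursor-based auto-pagination: keep requesting pages with the
    last item's id as `after` until `has_more` is false."""
    after = None
    while True:
        page = fetch_page(after=after)
        yield from page["data"]
        if not page["has_more"]:
            return
        after = page["data"][-1]["id"]  # cursor for the next page

# Fake two-page endpoint standing in for a real /v1/models-style list call
def fake_fetch(after=None):
    if after is None:
        return {"data": [{"id": "a"}, {"id": "b"}], "has_more": True}
    return {"data": [{"id": "c"}], "has_more": False}

ids = [item["id"] for item in list_auto(fake_fetch)]
# ids == ["a", "b", "c"]
```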
Cargo Features & WASM Optimization
Every endpoint is gated behind a Cargo feature. If you are building for WebAssembly (e.g. Cloudflare Workers, Dioxus, Leptos), you can significantly reduce your `.wasm` binary size and compilation time by disabling default features and compiling only what you need.
```toml
[dependencies]
# Example: Compile ONLY the Responses API (removes Audio, Images, Assistants, etc.)
openai-oxide = { version = "0.9", default-features = false, features = ["responses"] }
```
Available API Features:
- `chat` — Chat Completions
- `responses` — Responses API (supports WebSocket)
- `embeddings` — Text Embeddings
- `images` — Image Generation (DALL-E)
- `audio` — TTS and Transcription
- `files` — File management
- `fine-tuning` — Model Fine-tuning
- `models` — Model listing
- `moderations` — Moderation API
- `batches` — Batch API
- `uploads` — Upload API
- `beta` — Assistants, Threads, Vector Stores, Realtime API
Ecosystem Features:
- `websocket` — Enables Realtime API over WebSockets (Native: `tokio-tungstenite`)
- `websocket-wasm` — Enables Realtime API over WebSockets (WASM: `gloo-net`/`web-sys`)
- `simd` — Enables `simd-json` for ultra-fast JSON deserialization (requires nightly Rust)
Check out our Cloudflare Worker Examples showcasing a Full-Stack Rust app with a Dioxus frontend and a Cloudflare Worker Durable Object backend holding a WebSocket connection to OpenAI.
OpenAI Docs → openai-oxide
Use OpenAI's official guides — the same concepts apply directly. Here's how each maps to openai-oxide:
| OpenAI Guide | Rust | Node.js | Python |
|---|---|---|---|
| Chat Completions | `client.chat().completions().create()` | `client.createChatCompletion({...})` | `await client.create(model, input)` |
| Responses API | `client.responses().create()` | `client.createText(model, input)` | `await client.create(model, input)` |
| Streaming | `client.responses().create_stream()` | `client.createStream({...}, cb)` | `await client.create_stream(model, input)` |
| Function Calling | `client.responses().create_stream_fc()` | `client.createResponse({model, input, tools})` | `await client.create_with_tools(model, input, tools)` |
| Structured Output | `client.chat().completions().parse::<T>()` | `client.createChatParsed(req, name, schema)` | `await client.create_parsed(model, input, PydanticModel)` |
| Embeddings | `client.embeddings().create()` | via `createResponse()` raw | via `create_raw()` |
| Image Generation | `client.images().generate()` | via `createResponse()` raw | via `create_raw()` |
| Text-to-Speech | `client.audio().speech().create()` | via `createResponse()` raw | via `create_raw()` |
| Speech-to-Text | `client.audio().transcriptions().create()` | via `createResponse()` raw | via `create_raw()` |
| Fine-tuning | `client.fine_tuning().jobs().create()` | via `createResponse()` raw | via `create_raw()` |
| Conversations | `client.conversations()` CRUD + items | via raw | via raw |
| Video Generation (Sora) | `client.videos()` create/edit/extend/remix | via raw | via raw |
| Webhooks | `Webhooks::new(secret).verify()` | — | — |
| Realtime API | `client.ws_session()` | `client.wsSession()` | — |
| Assistants | `client.beta().assistants()` | via raw | via raw |
Tip: Parameter names match the official Python SDK exactly. If OpenAI docs show `model="gpt-5.4"`, use `.model("gpt-5.4")` in Rust or `{model: "gpt-5.4"}` in Node.js.

Note: Node.js and Python bindings have typed helpers for Responses, Chat, Streaming, Function Calling, and Structured Output. All other endpoints are available via the raw JSON methods (`createResponse()` / `create_raw()`), which accept any OpenAI API request body.
Configuration
```rust
use openai_oxide::{Client, Config};
use openai_oxide::AzureConfig;

let client = Client::new("sk-...");       // Explicit key
let client = Client::with_config(config); // Custom Config (base URL, timeouts, ...)
let client = Client::azure(azure_config)?;
```
Structured Outputs
Get typed, validated responses directly from the model — no manual JSON parsing.
Rust (feature: structured)
```rust
use openai_oxide::ParsedChatCompletion;
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(Debug, Deserialize, JsonSchema)]
struct Answer {
    value: String,
    confidence: f64,
}

// Chat API
let result: ParsedChatCompletion<Answer> = client
    .chat()
    .completions()
    .parse::<Answer>(request)
    .await?;
println!("{:?}", result.parsed);

// Responses API
let result = client.responses().parse::<Answer>(request).await?;
```
The SDK auto-generates a strict JSON schema from your Rust types, sends it as response_format (Chat) or text.format (Responses), and deserializes the response. The API guarantees the output matches your schema.
Node.js
// With raw JSON schema
```js
// With a raw JSON schema
const { parsed } = await client.createChatParsed(req, "answer", schema);

// With Zod (optional: npm install zod zod-to-json-schema)
const { z } = require("zod");
const { zodToJsonSchema } = require("zod-to-json-schema");

const Answer = z.object({ value: z.string(), confidence: z.number() });
const { parsed: answer } = await client.createChatParsed(req, "answer", zodToJsonSchema(Answer));
```
Python (Pydantic v2)
```python
from pydantic import BaseModel

class Answer(BaseModel):
    value: str
    confidence: float

answer = await client.create_parsed("gpt-5.4", "What is 2+2?", Answer)
# Typed Pydantic instance, not a dict
print(answer.value)
```
Stream Helpers
High-level streaming with typed events and automatic delta accumulation.
```rust
use openai_oxide::ChatStreamEvent;

// Option 1: Just get the final result
let stream = client.chat().completions().create_stream_helper(request).await?;
let completion = stream.get_final_completion().await?;

// Option 2: React to typed events
let mut stream = client.chat().completions().create_stream_helper(request).await?;
while let Some(event) = stream.next().await {
    match event? {
        ChatStreamEvent::ContentDelta { delta, .. } => print!("{delta}"),
        ChatStreamEvent::ToolCallDone { tool_call, .. } => println!("tool: {tool_call:?}"),
        _ => {}
    }
}
```
No manual chunk stitching. Tool call arguments are automatically assembled from index-based deltas.
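For intuition, index-based delta assembly boils down to the following sketch; the exact chunk shape here is illustrative, not the crate's wire format:

```python
def accumulate_tool_calls(chunks):
    """Assemble streamed tool-call argument fragments by index, the way
    a stream helper stitches deltas into complete calls."""
    calls = {}
    for chunk in chunks:
        for delta in chunk.get("tool_calls", []):
            slot = calls.setdefault(delta["index"], {"name": "", "arguments": ""})
            if "name" in delta:
                slot["name"] = delta["name"]
            slot["arguments"] += delta.get("arguments", "")  # concat fragments
    return [calls[i] for i in sorted(calls)]

chunks = [
    {"tool_calls": [{"index": 0, "name": "ls", "arguments": '{"pa'}]},
    {"tool_calls": [{"index": 0, "arguments": 'th": "."}'}]},
]
result = accumulate_tool_calls(chunks)
# result == [{"name": "ls", "arguments": '{"path": "."}'}]
```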
Webhook Verification
Verify OpenAI webhook signatures (feature: webhooks).
```rust
use openai_oxide::Webhooks;

let wh = Webhooks::new(webhook_secret)?;
// Verifies the HMAC-SHA256 signature and timestamp, then deserializes the payload
let event: MyEvent = wh.verify(&body, &headers)?;
```
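For intuition, a generic timestamp-plus-HMAC check looks like this in Python. The signed string format (`timestamp.payload`) and the 5-minute tolerance are common webhook conventions, not necessarily OpenAI's exact scheme:

```python
import hashlib
import hmac
import time

TOLERANCE_S = 300  # reject timestamps older than 5 minutes (replay protection)

def verify_signature(secret, payload, timestamp, signature, now=None):
    """Constant-time HMAC-SHA256 check over `timestamp.payload`."""
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > TOLERANCE_S:
        return False  # stale delivery: likely a replay
    signed = f"{timestamp}.{payload}".encode()
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking a timing side channel
    return hmac.compare_digest(expected, signature)

ts = "1700000000"
sig = hmac.new(b"whsec_test", f"{ts}.{{}}".encode(), hashlib.sha256).hexdigest()
print(verify_signature("whsec_test", "{}", ts, sig, now=1700000100))  # True
```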
Built With AI
This crate was built in days, not months — using Claude Code with a harness engineering approach: pre-commit quality gates, OpenAPI spec as ground truth, official Python SDK as reference. Planning and code intelligence via solo-factory skills and solograph MCP server.
Roadmap
Our goal is to make openai-oxide the universal engine for all LLM integrations across the entire software stack.
- Rust Core: Fully typed, high-performance client (Chat, Responses, Realtime, Assistants).
- WASM Support: First-class Cloudflare Workers & browser execution.
- Python Bindings: Native PyO3 integration published on PyPI.
- Tauri Integrations: Dedicated examples/guides for building AI desktop apps with Tauri + WebSockets.
- HTMX + Axum Examples: Showcasing how to stream LLM responses directly to HTML with zero-JS frontends.
- Swift Bindings (UniFFI): Native iOS/macOS integration for Apple ecosystem developers.
- Kotlin Bindings (UniFFI): Native Android integration via JNI.
- Node.js/TypeScript Bindings (NAPI-RS): Native Node.js bindings for the TS ecosystem.
Want to help us get there? PRs and discussions are highly welcome!
Keeping up with OpenAI
OpenAI moves fast. To ensure openai-oxide never falls behind, we built an automated architecture synchronization pipeline.
Types are strictly validated against the official OpenAPI spec and cross-checked directly with the official Python SDK's AST.
`make sync` automatically:

- Downloads the latest OpenAPI schema from OpenAI.
- Displays a precise `git diff` of newly added endpoints, struct fields, and enums.
- Runs the `openapi_coverage` test suite to statically verify our Rust types against the spec.
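Conceptually, the coverage check reduces to a set difference between the spec's declared properties and the fields a typed struct models. A minimal sketch (`missing_fields` is a hypothetical helper, not part of the repo):

```python
def missing_fields(spec_schema, struct_fields):
    """Return spec properties that a typed schema does not cover yet."""
    spec_fields = set(spec_schema.get("properties", {}))
    return sorted(spec_fields - set(struct_fields))

# A pared-down OpenAPI object schema vs the fields our struct models
spec = {"properties": {"id": {}, "model": {}, "usage": {}}}
gaps = missing_fields(spec, ["id", "model"])
# gaps == ["usage"] — the struct is missing the `usage` field
```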
Coverage is enforced on every commit via pre-commit hooks. Current field coverage for all implemented typed schemas is 100%. This guarantees 1:1 feature parity with the Python SDK, ensuring you can adopt new OpenAI models and features on day one.
Used In
- sgr-agent — LLM agent framework with structured output, function calling, and agent loops. `openai-oxide` is the default backend.
- rust-code — AI-powered TUI coding agent.
AI Agent Skills
This repo includes an Agent Skill — a portable knowledge pack that teaches AI coding assistants how to use openai-oxide correctly (gotchas, patterns, API reference).
Works with Claude Code, Cursor, GitHub Copilot, Gemini CLI, VS Code, and 30+ other agents.
```sh
# Context7
# skills.sh
```
See Also
- openai-python — Official Python SDK (our benchmark baseline)
- async-openai — Alternative Rust client (mature, 1800+ stars)
- genai — Multi-provider Rust client (Gemini, Anthropic, OpenAI)