openai-oxide 0.12.0

Idiomatic Rust client for the OpenAI API — 1:1 parity with the official Python SDK
<p align="center">
  <img src="docs/logo.png" alt="openai-oxide" width="480">
  <br>
  <p align="center">
    Feature-complete OpenAI client for <strong>Rust</strong>, <strong>Node.js</strong>, and <strong>Python</strong>.<br>Streaming, WebSockets, structured outputs, WASM. Built for agentic workflows.
  </p>
  <p align="center">
    <a href="https://crates.io/crates/openai-oxide"><img src="https://img.shields.io/crates/v/openai-oxide.svg" alt="crates.io"></a>
    <a href="https://www.npmjs.com/package/openai-oxide"><img src="https://img.shields.io/npm/v/openai-oxide.svg" alt="npm"></a>
    <a href="https://pypi.org/project/openai-oxide/"><img src="https://img.shields.io/pypi/v/openai-oxide.svg" alt="PyPI"></a>
    <a href="https://docs.rs/openai-oxide"><img src="https://docs.rs/openai-oxide/badge.svg" alt="docs.rs"></a>
    <a href="https://fortunto2.github.io/openai-oxide/"><img src="https://img.shields.io/badge/docs-mdbook-blue.svg" alt="Guide"></a>
    <a href="https://socket.dev/npm/package/openai-oxide"><img src="https://badge.socket.dev/npm/package/openai-oxide" alt="Socket"></a>
    <a href="https://github.com/fortunto2/openai-oxide/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="MIT"></a>
    <a href="https://github.com/fortunto2/openai-oxide"><img src="https://img.shields.io/github/stars/fortunto2/openai-oxide?style=social" alt="GitHub stars"></a>
  </p>
</p>

`openai-oxide` implements the full [Responses API](https://platform.openai.com/docs/api-reference/responses), [Chat Completions](https://platform.openai.com/docs/api-reference/chat), and 20+ other endpoints with **persistent WebSockets**, **hedged requests**, **early-parsing for function calls**, and **type-safe Structured Outputs**. Types are provided by the standalone [`openai-types`](https://crates.io/crates/openai-types) crate (1100+ types, auto-synced from the Python SDK).

## Why openai-oxide?

Included:

- **Structured Outputs (`parse::<T>()`):** Call `parse::<MyStruct>()` to auto-generate a JSON schema from your Rust type via `schemars` and deserialize the response in one step. Works with both the Chat and Responses APIs.
- **Stream Helpers:** High-level `ChatStreamEvent` with automatic text/tool-call accumulation, typed `ContentDelta`/`ToolCallDone` events, `get_final_completion()`, and `current_content()` snapshots. No manual chunk stitching.
- **Streaming:** Incremental SSE parser with buffered line extraction and standard anti-buffering headers (`Accept: text/event-stream`, `Cache-Control: no-cache`).
- **WebSocket Mode + Connection Pool:** Persistent `wss://` connection for the [Responses API](https://platform.openai.com/docs/guides/websocket-mode) with built-in connection pooling (`WsPool`). OpenAI reports [up to ~40% faster](https://platform.openai.com/docs/guides/websocket-mode) end-to-end execution for chains of 20+ tool calls. Our preliminary measurements (29-44% faster, n=5) align with this. The only Rust client that implements this endpoint.
- **Stream FC Early Parse:** Yields function calls the exact moment `arguments.done` is emitted, letting you start executing local tools before the overall response finishes.
- **Hardware-Accelerated JSON (`simd`):** Opt-in AVX2/NEON vector instructions for faster JSON parsing of large payloads (agent histories, complex tool calls).
- **Hedged Requests:** Send redundant requests and cancel the slower ones. Trades extra tokens for lower tail latency (technique from Google's "The Tail at Scale").
- **Webhook Verification:** HMAC-SHA256 signature verification with timestamp tolerance check (rejects stale requests).
- **HTTP Tuning:** gzip, TCP_NODELAY, HTTP/2 keep-alive with adaptive window, connection pooling, all enabled by default.
- **WASM Support:** Compiles to `wasm32-unknown-unknown`. Streaming, JSON request retries, and early-parsing work in Cloudflare Workers and browsers. Limitations: no multipart uploads, no gzip/HTTP/2 (the browser handles these), and streaming retries are not yet implemented. [Live demo](https://cloudflare-worker-dioxus.nameless-sunset-8f24.workers.dev).
- **Node.js & Python bindings:** Native napi-rs (Node) and PyO3 (Python) bindings as separate packages. Structured outputs via Zod (Node) and Pydantic v2 (Python). On mock benchmarks, the Node bindings show 2-3x lower SDK overhead than the official `openai` npm package on small payloads (p<0.001).
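
The **Streaming** bullet mentions an incremental SSE parser with buffered line extraction. The sketch below shows the general idea in plain `std` Rust; it is not the crate's actual parser, and `SseBuffer` / `push` are hypothetical names. Chunks arrive at arbitrary boundaries, so only complete lines are consumed, `data:` lines are collected, and a blank line closes an event. For simplicity it takes `&str` chunks; a real parser works on bytes and must also handle UTF-8 sequences split across chunks.

```rust
/// Hypothetical incremental SSE parser: buffers partial lines across chunks
/// and emits the joined `data:` payload of each complete event.
#[derive(Default)]
pub struct SseBuffer {
    pending: String,         // received text not yet terminated by '\n'
    data_lines: Vec<String>, // `data:` lines of the event being assembled
}

impl SseBuffer {
    /// Feed one network chunk; returns every event completed by this chunk.
    pub fn push(&mut self, chunk: &str) -> Vec<String> {
        self.pending.push_str(chunk);
        let mut events = Vec::new();
        // Extract only complete lines; a trailing partial line stays buffered.
        while let Some(pos) = self.pending.find('\n') {
            let raw: String = self.pending.drain(..=pos).collect();
            let line = raw.trim_end_matches(|c: char| c == '\n' || c == '\r');
            if line.is_empty() {
                // Blank line = end of event.
                if !self.data_lines.is_empty() {
                    events.push(self.data_lines.join("\n"));
                    self.data_lines.clear();
                }
            } else if let Some(rest) = line.strip_prefix("data:") {
                // One optional space after the colon is stripped.
                self.data_lines.push(rest.strip_prefix(' ').unwrap_or(rest).to_string());
            }
            // Other fields (event:, id:, retry:) and ":" comments are ignored here.
        }
        events
    }
}
```

Because the partial line stays in `pending`, a JSON payload split across two TCP reads is only emitted once its terminating blank line has arrived.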

### One Rust core, every platform

The Rust crate is the single source of truth. Bindings for other platforms are thin wrappers:

| Platform | Binding | Status |
|----------|---------|--------|
| **Rust** | native | stable |
| **Node.js / TypeScript** | napi-rs | stable |
| **Python** | PyO3 + maturin | stable |
| **Browser / Edge / Dioxus / Leptos** | WASM (`wasm32-unknown-unknown`) | stable |
| **iOS / macOS** | UniFFI (Swift) | planned |
| **Android** | UniFFI (Kotlin) | planned |

This means the same HTTP tuning, WebSocket pool, streaming parser, and retry logic run everywhere. No reimplementation per language, no behavior drift. When we add a feature to the Rust core, all platforms get it.

The practical consequence: you can embed `openai-oxide` as the AI layer in a cross-platform app. For example, [rust-code](https://github.com/fortunto2/rust-code) uses [sgr-agent](https://github.com/fortunto2/rust-code/tree/master/crates/sgr-agent) (built on openai-oxide) as a TUI coding agent today, and the same crate can be compiled to WASM and run in a browser.

### When SDK speed starts to matter

On today's OpenAI API (200ms-2s per call), SDK overhead is <1% of wall time. But that's changing:

- **Fast inference providers** (Cerebras, Groq, local models) return responses in 10-50ms. At those speeds, SDK overhead (0.1-5ms) becomes 5-30% of wall time.
- **Agent farms** running hundreds of parallel agents create thousands of requests per second. Per-request overhead compounds fast.
- **Structured outputs + function calling** add serialization and schema generation on every call. In Rust, this runs without GC pauses.
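
The percentages above come from a simple ratio: SDK overhead divided by total wall time (SDK overhead plus API latency). A small worked sketch; `overhead_share` is our own illustrative helper, not part of openai-oxide.

```rust
/// Fraction of wall time spent in the SDK, given per-call SDK overhead
/// and server + network latency, both in milliseconds.
fn overhead_share(sdk_ms: f64, api_ms: f64) -> f64 {
    sdk_ms / (sdk_ms + api_ms)
}
```

For example, 1ms of SDK work on a 200ms call is about 0.5% of wall time, while 5ms of SDK work on a 20ms call from a fast backend is 20%.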

The mock benchmarks show the trajectory: oxide's SDK overhead is 2-3x lower than the official JS SDK on small payloads (p<0.001). As APIs get faster, that gap becomes the bottleneck.

### WebSocket Mode for Agent Loops

OpenAI offers a [WebSocket mode](https://platform.openai.com/docs/guides/websocket-mode) for the Responses API at `wss://api.openai.com/v1/responses`. The connection stays open across multiple turns, and the server uses connection-local caching to speed up continuations. Requests are sequential (one in-flight at a time per connection), but each turn benefits from the server keeping the previous response state in memory.

```text
HTTP (warm connection — TLS reused via pool)
Request 1 (ls)   : [HTTP/2 req] -> [Server loads ctx] -> [Generate] -> [Parse] -> [Exec Tool]
Request 2 (cat)  : [HTTP/2 req] -> [Server loads ctx] -> [Generate] -> [Parse] -> [Exec Tool]

WebSocket (persistent connection — server caches context)
Connection       : [WS Upgrade] (once)
Request 1 (ls)   : [Send JSON] -> [Generate] -> [Parse] -> [Exec Tool]
Request 2 (cat)  : [Send JSON] -> [Ctx cached] -> [Generate] -> [Parse] -> [Exec Tool]
```

The speed improvement comes primarily from the **server side** (connection-local caching, reduced continuation overhead), not from saving a few bytes of HTTP/2 framing on the client. OpenAI reports [up to ~40% faster](https://platform.openai.com/docs/guides/websocket-mode) for chains with 20+ tool calls.

Our preliminary measurements (gpt-5.4, warm connections, n=5):
- **Plain text:** 710ms WS vs 1011ms HTTP (29% faster)
- **Multi-turn (2 reqs):** 1425ms vs 2362ms (40% faster)
- **Rapid-fire (5 calls):** 3227ms vs 5807ms (44% faster)

*Preliminary at n=5 — direction matches OpenAI's published numbers.*

WebSocket mode is compatible with Zero Data Retention (ZDR) and `store: false`. Context is cached in-memory only for the lifetime of the connection, with no disk persistence.

Separately, **Stream FC Early Parse** (works on both HTTP and WebSocket) lets you start executing tool calls the moment arguments are complete, before the stream closes, saving additional time in function-calling loops.

---

## Installation

### Rust
```bash
cargo add openai-oxide tokio --features tokio/full
```

### Node.js / TypeScript
```bash
npm install openai-oxide
# or
pnpm add openai-oxide
# or
yarn add openai-oxide
```
Supported platforms: macOS (x64, arm64), Linux (x64, arm64, glibc & musl), Windows (x64).

### Python
```bash
pip install openai-oxide
# or
uv pip install openai-oxide
```

| Package | Registry | Link |
|---------|----------|------|
| `openai-oxide` | crates.io | [crates.io/crates/openai-oxide](https://crates.io/crates/openai-oxide) |
| `openai-types` | crates.io | [crates.io/crates/openai-types](https://crates.io/crates/openai-types) |
| `openai-oxide` | npm | [npmjs.com/package/openai-oxide](https://www.npmjs.com/package/openai-oxide) |
| `openai-oxide` | PyPI | [pypi.org/project/openai-oxide](https://pypi.org/project/openai-oxide/) |
| `openai-oxide-macros` | crates.io | [crates.io/crates/openai-oxide-macros](https://crates.io/crates/openai-oxide-macros) |

---

## Quick Start

### Rust

```rust
use openai_oxide::{OpenAI, types::responses::*};

#[tokio::main]
async fn main() -> Result<(), openai_oxide::OpenAIError> {
    let client = OpenAI::from_env()?; // Uses OPENAI_API_KEY

    let response = client.responses().create(
        ResponseCreateRequest::new("gpt-5.4")
            .input("Explain quantum computing in one sentence.")
            .max_output_tokens(100)
    ).await?;

    println!("{}", response.output_text());
    Ok(())
}
```

### Node.js

```javascript
const { Client } = require("openai-oxide");

// CommonJS has no top-level await, so wrap the call in an async IIFE.
(async () => {
  const client = new Client(); // Uses OPENAI_API_KEY
  const text = await client.createText("gpt-5.4-mini", "Hello from Node!");
  console.log(text);
})();
```

### Python

```python
import asyncio, json
from openai_oxide import Client

async def main():
    client = Client()  # Uses OPENAI_API_KEY
    res = json.loads(await client.create("gpt-5.4-mini", "Hello from Python!"))
    print(res["text"])

asyncio.run(main())
```

---

## Benchmarks

- **Environment:** macOS (M-series), release mode.
- **Model:** `gpt-5.4` via the official OpenAI API.
- **Protocol:** TLS + HTTP/2 with connection pooling (warm connections).
- **Methodology:** n=5 per run, 3 runs, median of medians. At n=5, differences <15% are within API jitter. Date: 2026-03-24.
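
The "median of medians" aggregation can be sketched as below, assuming each run is a list of per-iteration latencies in milliseconds. This is our illustration, not the benchmark harness itself. Taking medians twice keeps a single slow outlier (common with live API calls) from moving the reported number.

```rust
/// Median of a slice of latencies (ms). Sorts a copy; averages the two
/// middle values for even-length input.
fn median(samples: &[f64]) -> f64 {
    assert!(!samples.is_empty(), "median of empty sample");
    let mut v = samples.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let mid = v.len() / 2;
    if v.len() % 2 == 0 {
        (v[mid - 1] + v[mid]) / 2.0
    } else {
        v[mid]
    }
}

/// Median of each run, then the median of those per-run medians.
fn median_of_medians(runs: &[Vec<f64>]) -> f64 {
    let per_run: Vec<f64> = runs.iter().map(|r| median(r)).collect();
    median(&per_run)
}
```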

### Rust Ecosystem

`openai-oxide` vs [`async-openai`](https://crates.io/crates/async-openai) 0.34 vs [`genai`](https://crates.io/crates/genai) 0.6-beta. openai-oxide and async-openai use the Responses API; genai uses the Chat API because it is a multi-provider adapter.

| Test | `openai-oxide` | `async-openai` | `genai` | Notes |
| :--- | :--- | :--- | :--- | :--- |
| **Plain text** | 1011ms | **960ms** | **835ms** | oxide slower |
| **Structured output** | 1331ms | N/A | **1197ms** | within noise |
| **Function calling** | **1192ms** | 1748ms | **1030ms** | genai fastest |
| **Multi-turn (2 reqs)** | 2362ms | 3275ms | **1641ms** | genai fastest |
| **Streaming TTFT** | **645ms** | 685ms | 670ms | within noise |
| **Parallel 3x** | 1165ms | **1053ms** | **866ms** | oxide slower |

At n=5 with live API calls, no single SDK consistently wins, and differences <15% are API jitter. genai is fastest on most tests, including plain text (it skips full response deserialization) and function calling. oxide beats async-openai on function calling and multi-turn, and has the lowest streaming TTFT, though that lead is within noise. On today's API latencies (800-1100ms per call), SDK overhead is negligible for all three. The difference grows with faster backends and higher concurrency (see "When SDK speed starts to matter" above).

**Feature comparison:**

| Feature | `openai-oxide` | `async-openai` 0.34 | `genai` 0.6 |
| :--- | :---: | :---: | :---: |
| SSE streaming | yes | yes | yes |
| Stream helpers (typed events) | **yes** | no | no |
| [WebSocket mode](https://platform.openai.com/docs/guides/websocket-mode) for Responses API | **yes** | no | no |
| Structured `parse::<T>()` with schema gen | **yes** | no | no |
| WASM (streaming, no multipart) | **yes** | partial (no streaming) | no |
| Node.js / Python bindings | **yes** | no | no |
| Hedged requests | **yes** | no | no |
| Stream FC early parse | **yes** | no | no |
| Webhook verification | yes | yes | no |

*Reproduce: `cd benchmarks/rust-compare && cargo run --release`*

<br>

<!-- BENCH:python:START -->
### Python Ecosystem (`openai-oxide-python` vs `openai`)

Native PyO3 bindings vs `openai` (openai 2.29.0).

| Test | `openai-oxide` | `openai` | oxide faster by | Notes |
| :--- | :--- | :--- | :--- | :--- |
| **Plain text** | **845ms** | 997ms | +15% | |
| **Structured output** | **1367ms** | 1379ms | +1% | within API noise |
| **Function calling** | **1195ms** | 1230ms | +3% | within API noise |
| **Multi-turn (2 reqs)** | **2260ms** | 3089ms | +27% | |
| **Web search** | **3157ms** | 3499ms | +10% | within API noise |
| **Nested structured** | 5377ms | **5339ms** | -1% | within API noise |
| **Agent loop (2-step)** | **4570ms** | 5144ms | +11% | within API noise |
| **Rapid-fire (5 calls)** | **5667ms** | 6136ms | +8% | within API noise |
| **Prompt-cached** | **4425ms** | 5564ms | +20% | |
| **Streaming TTFT** | **626ms** | 638ms | +2% | within API noise |
| **Parallel 3x** | 1184ms | **1090ms** | -9% | within API noise |
| **Hedged (2x race)** | **893ms** | 995ms | +10% | within API noise |

*median of medians, 3x5 iterations (n=5 per measurement). Model: gpt-5.4. Date: 2026-03-24. Not re-measured since — results may have shifted. Differences <15% are within API jitter at this sample size and should not be treated as statistically significant.*

Reproduce: `cd openai-oxide-python && uv run python ../examples/bench_python.py`
<!-- BENCH:python:END -->

---

<!-- BENCH:node:START -->
### Node.js Ecosystem (`openai-oxide` vs `openai`)

Native napi-rs bindings vs official `openai` npm. n=5 per run, 3 runs — differences <15% are within API noise.

| Test | `openai-oxide` | `openai` | Time diff (oxide vs openai) | Notes |
| :--- | :--- | :--- | :--- | :--- |
| **Plain text** | 1075ms | 1311ms | -18% | |
| **Structured output** | 1370ms | 1765ms | -22% | |
| **Function calling** | 1725ms | 1832ms | -6% | within API noise |
| **Multi-turn (2 reqs)** | 2283ms | 2859ms | -20% | |
| **Rapid-fire (5 calls)** | 6246ms | 6936ms | -10% | within API noise |
| **Streaming TTFT** | 534ms | 580ms | -8% | within API noise |
| **Parallel 3x** | 1937ms | 1991ms | -3% | within API noise |
| **WebSocket hot pair** | 2181ms | N/A || preliminary, needs reproducible script |

*median of medians, 3×5 iterations. Model: gpt-5.4. Date: 2026-03-24. At n=5 with ~200ms API jitter, only >15% differences are meaningful.*

Reproduce: `cd openai-oxide-node && BENCH_ITERATIONS=5 node examples/bench_node.js`
<!-- BENCH:node:END -->

### SDK Overhead (synthetic, Node.js)

The live benchmarks above include network latency and model inference, which adds noise.
To isolate **pure SDK overhead**, we also run a synthetic benchmark with a localhost mock
server (zero network, zero inference). Fixtures are captured from a real coding agent session
(320 messages, 42 tools, 718KB request body).

| Test | `openai-oxide` | `openai` npm | oxide faster | sig |
| :--- | :--- | :--- | :--- | :--- |
| Tiny req → Tiny resp | 172µs | 443µs | **+61%** | *** |
| Tiny req → Structured 5KB | 161µs | 499µs | **+68%** | *** |
| Medium 150KB → Tool call | 1.1ms | 1.7ms | **+37%** | *** |
| Heavy 657KB → Real agent resp | 4.9ms | 6.2ms | **+21%** | *** |
| SSE stream (114 real chunks) | 283µs | 742µs | **+62%** | *** |
| Agent 20x sequential (tiny) | 2.1ms | 5.4ms | **+61%** | *** |
| Agent 10x sequential (heavy) | 51.7ms | 62.2ms | **+17%** | *** |

*50 iterations, 20 warmup, `--expose-gc`, Welch's t-test — all p<0.001.*
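
For reference, Welch's t-test compares two means without assuming equal variances. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; this is a textbook illustration rather than the benchmark script's code, and turning `t` into a p-value additionally needs the Student-t CDF (for example from a statistics crate), which is omitted here.

```rust
/// Sample mean and unbiased (n - 1) variance.
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic and Welch-Satterthwaite degrees of freedom.
fn welch_t(a: &[f64], b: &[f64]) -> (f64, f64) {
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    let (na, nb) = (a.len() as f64, b.len() as f64);
    let (sa, sb) = (va / na, vb / nb); // squared standard errors
    let t = (ma - mb) / (sa + sb).sqrt();
    let df = (sa + sb).powi(2) / (sa.powi(2) / (na - 1.0) + sb.powi(2) / (nb - 1.0));
    (t, df)
}
```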

Note: the mock server uses HTTP/1.1, so these results measure SDK serialization/parsing overhead, not HTTP/2 multiplexing benefits.

**Where oxide is faster:** everything on mock, 17-68% depending on payload size. SSE streaming 62% faster. Agent loops compound: 20 tiny calls save 3.3ms, 10 heavy calls save 10.5ms.

**When it matters:** today, with 200ms-2s API latency, SDK overhead is <1% of wall time. But with fast inference (Cerebras, Groq, local models at 10-50ms) or agent farms running hundreds of concurrent sessions, these savings add up. The mock benchmarks show the floor: oxide's overhead is 2-3x lower on small payloads and 17-37% lower on large ones.

Reproduce: `node --expose-gc benchmarks/bench_science.js`

---

## Python Usage

```python
import asyncio
from openai_oxide import Client

async def main():
    client = Client()
    
    # 1. Standard request
    res = await client.create("gpt-5.4", "Hello!")
    print(res["text"])
    
    # 2. Streaming (Async Iterator)
    stream = await client.create_stream("gpt-5.4", "Explain quantum computing...", max_output_tokens=200)
    async for event in stream:
        print(event)

asyncio.run(main())
```

---

## Advanced Features Guide

### WebSocket Mode
The server caches context locally for WebSocket connections, speeding up continuations. Both HTTP and WebSocket reuse TCP+TLS via connection pooling. The speed gain is server-side, not from saving HTTP/2 framing bytes.

```rust
let client = OpenAI::from_env()?;
let mut session = client.ws_session().await?;

// All calls route through the same wss:// connection
let r1 = session.send(
    ResponseCreateRequest::new("gpt-5.4").input("My name is Rustam.").store(true)
).await?;

let r2 = session.send(
    ResponseCreateRequest::new("gpt-5.4").input("What's my name?").previous_response_id(&r1.id)
).await?;

session.close().await?;
```

### Streaming FC Early Parse
Start executing your local functions instantly when the model finishes generating the arguments, rather than waiting for the entire stream to close.

```rust
let mut handle = client.responses().create_stream_fc(request).await?;

while let Some(fc) = handle.recv().await {
    // Fires immediately on `arguments.done`
    let result = execute_tool(&fc.name, &fc.arguments).await;
}
```

### Hedged Requests
Protect your application against random network latency spikes.

```rust
use openai_oxide::hedged_request;
use std::time::Duration;

// Sends 2 identical requests, the second delayed by 2s. Returns whichever finishes first.
let response = hedged_request(&client, request, Some(Duration::from_secs(2))).await?;
```
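
The hedging technique from "The Tail at Scale" sends a backup copy only if the first request has not answered within a delay. The thread-based sketch below shows that idea with `std` alone; `hedged` is a hypothetical helper, not the crate's `hedged_request`, whose internals and cancellation behavior may differ. With threads the slower attempt runs to completion and its result is simply discarded; in async Rust, dropping the losing future cancels it instead.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Start `call`; if it has not answered within `hedge_after`, start an
/// identical second attempt. The first answer wins; the other is discarded.
fn hedged<T, F>(call: F, hedge_after: Duration) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + Clone + 'static,
{
    let (tx, rx) = mpsc::channel();
    let spawn = |tx: mpsc::Sender<T>, f: F| {
        thread::spawn(move || {
            // Ignore send errors: the receiver may already have a winner.
            let _ = tx.send(f());
        });
    };
    spawn(tx.clone(), call.clone());
    match rx.recv_timeout(hedge_after) {
        Ok(v) => v, // primary answered before the hedge delay
        Err(_) => {
            spawn(tx, call); // primary is slow: fire the hedge
            rx.recv().expect("both attempts failed without answering")
        }
    }
}
```

The delay is the main tuning knob: setting it near the typical p95 latency means only the slowest few percent of calls pay for a second request.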

### Parallel Fan-Out
Send 3 concurrent requests; the total wall time is roughly that of the slowest single request rather than the sum of all three. Against the real OpenAI API the requests are multiplexed over HTTP/2; note that local mock servers typically use HTTP/1.1.

```rust
let (c1, c2, c3) = (client.clone(), client.clone(), client.clone());
let (r1, r2, r3) = tokio::join!(
    async { c1.responses().create(req1).await },
    async { c2.responses().create(req2).await },
    async { c3.responses().create(req3).await },
);
```
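
The "slowest request, not the sum" property holds for any concurrent fan-out. A thread-based sketch of the same pattern; `fan_out` is a hypothetical helper, not part of openai-oxide. All jobs are spawned before any is joined, so they overlap in time, and joining in input order keeps results aligned with their requests.

```rust
use std::thread;

/// Run every job concurrently and return the results in input order.
fn fan_out<T, F>(jobs: Vec<F>) -> Vec<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    // Spawn everything first so the jobs run at the same time...
    let handles: Vec<_> = jobs.into_iter().map(thread::spawn).collect();
    // ...then join in order.
    handles.into_iter().map(|h| h.join().expect("job panicked")).collect()
}
```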

---


### `#[openai_tool]` Macro
Auto-generate JSON schemas for your functions.

```rust
use openai_oxide_macros::openai_tool;

#[openai_tool(description = "Get the current weather")]
fn get_weather(location: String, unit: Option<String>) -> String {
    format!("Weather in {location}")
}

// The macro generates `get_weather_tool()` which returns the `serde_json::Value` schema
let tool = get_weather_tool();
```

### Node.js / TypeScript Native Bindings
Native NAPI-RS bindings. Requests and stream events execute in Rust and cross into the V8 event loop with minimal overhead.

```javascript
const { Client } = require("openai-oxide");

(async () => {
  const client = new Client();
  const session = await client.wsSession();
  const res = await session.send("gpt-5.4-mini", "Say hello to Rust from Node!");
  console.log(res);
  await session.close();
})();
```

At the moment, the Node bindings expose Chat Completions, Responses, streaming helpers, and WebSocket sessions. The full API matrix below refers to the Rust core crate.


## Implemented APIs

| API | Method |
|-----|--------|
| **Chat Completions** | `client.chat().completions().create()` / `create_stream()` |
| **Responses** | `client.responses().create()` / `create_stream()` / `create_stream_fc()` |
| **Responses Tools** | Function, WebSearch, FileSearch, CodeInterpreter, ComputerUse, Mcp, ImageGeneration |
| **WebSocket** | `client.ws_session()` — send / send_stream / warmup / close |
| **Hedged** | `hedged_request()` / `hedged_request_n()` / `speculative()` |
| **Embeddings** | `client.embeddings().create()` |
| **Models** | `client.models().list()` / `retrieve()` / `delete()` |
| **Images** | `client.images().generate()` / `edit()` / `create_variation()` |
| **Audio** | `client.audio().transcriptions()` / `translations()` / `speech()` |
| **Files** | `client.files().create()` / `list()` / `retrieve()` / `delete()` / `content()` |
| **Fine-tuning** | `client.fine_tuning().jobs().create()` / `list()` / `cancel()` / `list_events()` |
| **Moderations** | `client.moderations().create()` |
| **Batches** | `client.batches().create()` / `list()` / `retrieve()` / `cancel()` |
| **Uploads** | `client.uploads().create()` / `add_part()` / `cancel()` / `complete()` |
| **Conversations** | `client.conversations()` CRUD + items CRUD |
| **Videos (Sora)** | `client.videos()` create / list / retrieve / delete / content / edit / extend / remix |
| **Pagination** | `list_page()` / `list_auto()` — cursor-based, async stream |
| **Webhooks** | `Webhooks::new(secret).verify()` / `.unwrap()` (feature: `webhooks`) |
| **Assistants** (beta) | Full CRUD + `list_page()` / `list_auto()` |
| **Threads** (beta) | `client.beta().threads()` CRUD + messages `list_page()` / `list_auto()` |
| **Runs** (beta) | `client.beta().runs(thread_id)` CRUD + steps + `submit_tool_outputs` |
| **Vector Stores** (beta) | `client.beta().vector_stores()` CRUD + `search()` + `list_page()` / `list_auto()` |
| **Realtime** (beta) | `client.beta().realtime().sessions().create()` |
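
The `list_auto()` helpers follow the cursor scheme OpenAI list endpoints use: request a page, then pass the last item's ID as the `after` cursor until `has_more` is false. Below is a std-only sketch of that loop. `Page` and `list_all` are hypothetical; the field names mirror common OpenAI list responses, and the real helpers yield an async stream instead of collecting into a `Vec`.

```rust
/// One page of a cursor-paginated list (hypothetical struct).
struct Page {
    data: Vec<String>,
    has_more: bool,
    last_id: Option<String>,
}

/// Drain every page, passing the previous page's `last_id` as the `after` cursor.
fn list_all(mut fetch: impl FnMut(Option<&str>) -> Page) -> Vec<String> {
    let mut items = Vec::new();
    let mut cursor: Option<String> = None;
    loop {
        let page = fetch(cursor.as_deref());
        items.extend(page.data);
        match (page.has_more, page.last_id) {
            (true, Some(id)) => cursor = Some(id),
            _ => break, // last page (or no cursor to continue from)
        }
    }
    items
}
```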

---

## Cargo Features & WASM Optimization

Every endpoint is gated behind a Cargo feature. If you are building for **WebAssembly (WASM)** (e.g., Cloudflare Workers, Dioxus, Leptos), you can significantly **reduce your `.wasm` binary size and compilation time** by disabling default features and only compiling what you need.

```toml
[dependencies]
# Example: Compile ONLY the Responses API (removes Audio, Images, Assistants, etc.)
openai-oxide = { version = "0.12", default-features = false, features = ["responses"] }
```

### Available API Features:
- `chat` — Chat Completions
- `responses` — Responses API (Supports WebSocket)
- `embeddings` — Text Embeddings
- `images` — Image Generation (DALL-E)
- `audio` — TTS and Transcription
- `files` — File management
- `fine-tuning` — Model Fine-tuning
- `models` — Model listing
- `moderations` — Moderation API
- `batches` — Batch API
- `uploads` — Upload API
- `beta` — Assistants, Threads, Vector Stores, Realtime API

### Ecosystem Features:
- `structured` — Enables `parse::<T>()` with auto-generated JSON schema via `schemars`
- `webhooks` — Enables webhook signature verification (HMAC-SHA256)
- `macros` — Enables `#[openai_tool]` proc macro for tool schema generation
- `websocket` — Enables Realtime API over WebSockets (Native: `tokio-tungstenite`)
- `websocket-wasm` — Enables Realtime API over WebSockets (WASM: `gloo-net` / `web-sys`)
- `simd` — Enables `simd-json` for hardware-accelerated JSON deserialization (AVX2/NEON, stable Rust)

Check out our **[Cloudflare Worker Examples](https://github.com/fortunto2/openai-oxide/tree/main/examples/cloudflare-worker-dioxus)** showcasing a Full-Stack Rust app with a Dioxus frontend and a Cloudflare Worker Durable Object backend holding a WebSocket connection to OpenAI.

---

## OpenAI Docs → openai-oxide

OpenAI's official guides apply directly. Here's how each maps to `openai-oxide`:

| OpenAI Guide | Rust | Node.js | Python |
|---|---|---|---|
| [Chat Completions](https://platform.openai.com/docs/guides/chat-completions) | `client.chat().completions().create()` | `client.createChatCompletion({...})` | `await client.create(model, input)` |
| [Responses API](https://platform.openai.com/docs/api-reference/responses) | `client.responses().create()` | `client.createText(model, input)` | `await client.create(model, input)` |
| [Streaming](https://platform.openai.com/docs/api-reference/streaming) | `client.responses().create_stream()` | `client.createStream({...}, cb)` | `await client.create_stream(model, input)` |
| [Function Calling](https://platform.openai.com/docs/guides/function-calling) | `client.responses().create_stream_fc()` | `client.createResponse({model, input, tools})` | `await client.create_with_tools(model, input, tools)` |
| [Structured Output](https://platform.openai.com/docs/guides/structured-outputs) | `client.chat().completions().parse::<T>()` | `client.createChatParsed(req, name, schema)` | `await client.create_parsed(model, input, PydanticModel)` |
| [Embeddings](https://platform.openai.com/docs/guides/embeddings) | `client.embeddings().create()` | via `createResponse()` raw | via `create_raw()` |
| [Image Generation](https://platform.openai.com/docs/guides/images) | `client.images().generate()` | via `createResponse()` raw | via `create_raw()` |
| [Text-to-Speech](https://platform.openai.com/docs/guides/text-to-speech) | `client.audio().speech().create()` | via `createResponse()` raw | via `create_raw()` |
| [Speech-to-Text](https://platform.openai.com/docs/guides/speech-to-text) | `client.audio().transcriptions().create()` | via `createResponse()` raw | via `create_raw()` |
| [Fine-tuning](https://platform.openai.com/docs/guides/fine-tuning) | `client.fine_tuning().jobs().create()` | via `createResponse()` raw | via `create_raw()` |
| [Conversations](https://platform.openai.com/docs/guides/conversational-agents) | `client.conversations()` CRUD + items | via raw | via raw |
| [Video Generation (Sora)](https://developers.openai.com/api/docs/guides/video-generation) | `client.videos()` create/edit/extend/remix | via raw | via raw |
| [Webhooks](https://platform.openai.com/docs/guides/webhooks) | `Webhooks::new(secret).verify()` | | |
| [Realtime API](https://platform.openai.com/docs/guides/realtime) | `client.ws_session()` | `client.wsSession()` | |
| [Assistants](https://platform.openai.com/docs/assistants) | `client.beta().assistants()` | via raw | via raw |

> **Tip:** Parameter names match the official Python SDK exactly. If OpenAI docs show `model="gpt-5.4"`, use `.model("gpt-5.4")` in Rust or `{model: "gpt-5.4"}` in Node.js.
>
> **Note:** Node.js and Python bindings have typed helpers for Responses, Chat, Streaming, Function Calling, and Structured Output. All other endpoints are available via the raw JSON methods (`createResponse()` / `create_raw()`) which accept any OpenAI API request body.

---

## Configuration

```rust
use openai_oxide::{OpenAI, config::ClientConfig};
use openai_oxide::azure::AzureConfig;

let client = OpenAI::new("sk-...");                             // Explicit key
let client = OpenAI::with_config(                               // Custom config
    ClientConfig::new("sk-...").base_url("https://...").timeout_secs(30).max_retries(3)
);
let client = OpenAI::azure(AzureConfig::new()                   // Azure OpenAI
    .azure_endpoint("https://my.openai.azure.com").azure_deployment("gpt-4").api_key("...")
)?;
```
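
`max_retries(3)` caps the number of retry attempts. The backoff policy the crate applies between attempts is not documented here, so the sketch below only shows a common choice, capped exponential backoff, as an illustration; `backoff_delay` and the constants in the usage note are hypothetical.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
/// Production clients usually add random jitter; omitted to keep this deterministic.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_pow / checked_mul saturate to `max` instead of overflowing.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}
```

With a 500ms base and an 8s cap, retries wait 500ms, 1s, 2s, 4s, then 8s from the fifth attempt on.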


## Structured Outputs

Get typed, validated responses directly from the model. No manual JSON parsing.

### Rust (feature: `structured`)

```rust
use openai_oxide::parsing::ParsedChatCompletion;
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(Deserialize, JsonSchema)]
struct MathAnswer {
    steps: Vec<String>,
    final_answer: String,
}

// Chat API
let result: ParsedChatCompletion<MathAnswer> = client.chat().completions()
    .parse::<MathAnswer>(request).await?;
println!("{}", result.parsed.unwrap().final_answer);

// Responses API
let result = client.responses().parse::<MathAnswer>(request).await?;
```

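The SDK auto-generates a strict JSON schema from your Rust types, sends it as `response_format` (Chat) or `text.format` (Responses), and deserializes the response. In strict mode the API constrains the output to your schema, but the model can still refuse or hit the token limit, so check the parsed result before using it.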

### Node.js

```javascript
// With raw JSON schema
const { parsed: rawParsed } = await client.createChatParsed(request, "MathAnswer", jsonSchema);

// With Zod (optional: npm install zod zod-to-json-schema)
const { z } = require("zod");
const { zodParse } = require("openai-oxide/zod");
const Answer = z.object({ steps: z.array(z.string()), final_answer: z.string() });
const { parsed: answer } = await zodParse(client, request, Answer);
```

### Python (Pydantic v2)

```python
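import asyncio
from pydantic import BaseModel
from openai_oxide import Client

class MathAnswer(BaseModel):
    steps: list[str]
    final_answer: str

async def main():
    client = Client()  # Uses OPENAI_API_KEY
    result = await client.create_parsed("gpt-5.4-mini", "What is 2+2?", MathAnswer)
    print(result.final_answer)  # Typed Pydantic instance, not dict

asyncio.run(main())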
```

---

## Stream Helpers

High-level streaming with typed events and automatic delta accumulation.

```rust
use openai_oxide::stream_helpers::ChatStreamEvent;

// Option 1: Just get the final result
let stream = client.chat().completions().create_stream_helper(request).await?;
let completion = stream.get_final_completion().await?;

// Option 2: React to typed events
let mut stream = client.chat().completions().create_stream_helper(request).await?;
while let Some(event) = stream.next().await {
    match event? {
        ChatStreamEvent::ContentDelta { delta, snapshot } => {
            print!("{delta}");  // Print as it arrives
            // snapshot = full text accumulated so far
        }
        ChatStreamEvent::ToolCallDone { name, arguments, .. } => {
            // Arguments are complete; run the tool (`execute_tool` is your own function)
            execute_tool(&name, &arguments).await;
        }
        ChatStreamEvent::ContentDone { content } => {
            // Final text, fully assembled
        }
        _ => {}
    }
}
```

No manual chunk stitching. Tool call arguments are automatically assembled from index-based deltas.
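The reassembly step can be sketched with the standard library alone. `ToolCallDelta`, `ToolCall` and `Accumulator` are hypothetical, simplified types, not openai-oxide's internals. They assume the Chat Completions stream shape, where each `delta.tool_calls` entry carries an `index`, the call `id` and function name arrive only on the first fragment for that index, and every fragment may carry another slice of the `arguments` string.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of one `delta.tool_calls[i]` entry.
struct ToolCallDelta {
    index: u32,
    id: Option<String>,
    name: Option<String>,
    arguments: Option<String>,
}

/// A fully assembled tool call.
#[derive(Debug, Default, PartialEq)]
struct ToolCall {
    id: String,
    name: String,
    arguments: String,
}

/// Accumulates content deltas into a snapshot and tool-call fragments by index.
#[derive(Default)]
struct Accumulator {
    snapshot: String,
    tool_calls: BTreeMap<u32, ToolCall>,
}

impl Accumulator {
    /// Appends a content delta and returns the text accumulated so far.
    fn push_content(&mut self, delta: &str) -> &str {
        self.snapshot.push_str(delta);
        &self.snapshot
    }

    /// Merges one fragment into the call with the same index; fragments for
    /// different calls may arrive interleaved.
    fn push_tool_call(&mut self, d: ToolCallDelta) {
        let call = self.tool_calls.entry(d.index).or_default();
        if let Some(id) = d.id {
            call.id = id;
        }
        if let Some(name) = d.name {
            call.name = name;
        }
        if let Some(args) = d.arguments {
            call.arguments.push_str(&args);
        }
    }

    /// Returns the completed calls ordered by index once the stream ends.
    fn finish(self) -> Vec<ToolCall> {
        self.tool_calls.into_values().collect()
    }
}

fn main() {
    let mut acc = Accumulator::default();
    for delta in ["The weather ", "is sunny."] {
        print!("{delta}");
        acc.push_content(delta);
    }
    println!();

    // Two calls whose argument fragments arrive interleaved.
    acc.push_tool_call(ToolCallDelta {
        index: 0,
        id: Some("call_0".into()),
        name: Some("get_weather".into()),
        arguments: Some("{\"ci".into()),
    });
    acc.push_tool_call(ToolCallDelta {
        index: 1,
        id: Some("call_1".into()),
        name: Some("get_time".into()),
        arguments: Some("{}".into()),
    });
    acc.push_tool_call(ToolCallDelta {
        index: 0,
        id: None,
        name: None,
        arguments: Some("ty\":\"Paris\"}".into()),
    });

    for call in acc.finish() {
        println!("{call:?}");
    }
}
```

Keying by `index` rather than by arrival order is what lets interleaved fragments from parallel tool calls land in the right buffer.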

---

## Webhook Verification

Verify OpenAI webhook signatures (feature: `webhooks`).

```rust
use openai_oxide::resources::webhooks::Webhooks;

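// Load the secret from config or the environment in practice
let wh = Webhooks::new("whsec_your_secret")?;
// `payload` must be the raw request body, verified before any parsing;
// `MyEvent` stands for your own `Deserialize` type
let event: MyEvent = wh.unwrap(payload, signature_header, timestamp_header)?;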
```
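The checks behind such a call can be sketched with the standard library. This assumes the Standard Webhooks scheme that OpenAI documents for its webhooks (`webhook-id`, `webhook-timestamp` and `webhook-signature` headers, signed content `id.timestamp.body`, `v1,<base64>` signature entries) and an assumed 300-second replay window; it is not openai-oxide's implementation. The HMAC-SHA256 step itself, keyed with the base64-decoded secret after the `whsec_` prefix, needs a crypto crate and is left out. All function names here are hypothetical.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Assumed replay window: reject timestamps more than 5 minutes off.
const TOLERANCE_SECS: u64 = 300;

/// Builds the string that gets signed: "{webhook-id}.{webhook-timestamp}.{raw body}".
fn signed_content(webhook_id: &str, timestamp: &str, payload: &str) -> String {
    format!("{webhook_id}.{timestamp}.{payload}")
}

/// Returns true if the `webhook-timestamp` header (Unix seconds) is within the window.
fn timestamp_ok(timestamp: &str, now_secs: u64) -> bool {
    match timestamp.parse::<u64>() {
        Ok(ts) => ts.abs_diff(now_secs) <= TOLERANCE_SECS,
        Err(_) => false,
    }
}

/// Extracts the base64 signatures from a `webhook-signature` header, which may
/// list several space-separated `v1,<sig>` entries (e.g. during secret rotation).
fn v1_signatures(header: &str) -> Vec<&str> {
    header
        .split(' ')
        .filter_map(|entry| entry.strip_prefix("v1,"))
        .collect()
}

/// Compares without an early exit so timing does not reveal how many bytes matched.
/// Production code should prefer a vetted helper such as the `subtle` crate.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn main() {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_secs();
    let ts = now.to_string();

    let content = signed_content("msg_123", &ts, r#"{"id":"evt_123"}"#);
    println!("sign this: {content}");
    println!("timestamp ok: {}", timestamp_ok(&ts, now));
    println!("candidates: {:?}", v1_signatures("v1,abc= v1,def="));
    println!("match: {}", constant_time_eq(b"abc=", b"abc="));
    // Remaining step (not shown): compute HMAC-SHA256 over `content`, base64-encode
    // it, and compare it with each candidate using `constant_time_eq`.
}
```

Checking the timestamp before the signature keeps replayed requests cheap to reject, and accepting any of several `v1` entries lets a sender rotate secrets without downtime.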

---

## Built With AI

Built in days, not months, using [Claude Code](https://claude.ai/claude-code): pre-commit quality gates, OpenAPI spec as ground truth, official Python SDK as reference. Planning and code intelligence via [solo-factory](https://github.com/fortunto2/solo-factory) skills and [solograph](https://github.com/fortunto2/solograph) MCP server.

---

## Roadmap

Full OpenAI API coverage from a single codebase — Rust core with native bindings for every major platform.

- [x] **Rust Core**: Typed client covering Chat, Responses, Realtime, Assistants, and 20+ endpoints.
- [x] **WASM Support**: Cloudflare Workers and browser execution (with limitations noted above).
- [x] **Python Bindings**: Native PyO3 integration published on PyPI.
- [x] **Node.js/TypeScript Bindings (NAPI-RS)**: Native Node.js bindings for the TS ecosystem.
- [ ] **Tauri Integrations**: Dedicated examples/guides for building AI desktop apps with Tauri + WebSockets.
- [ ] **HTMX + Axum Examples**: Showcasing how to stream LLM responses directly to HTML with zero-JS frontends.
- [ ] **Swift Bindings (UniFFI)**: Native iOS/macOS integration for Apple ecosystem developers.
- [ ] **Kotlin Bindings (UniFFI)**: Native Android integration via JNA.

PRs welcome.

## Keeping up with OpenAI

OpenAI changes their API often. We built an automated sync pipeline to keep up.

Types are strictly validated against the [official OpenAPI spec](https://github.com/openai/openai-openapi) and cross-checked directly with the [official Python SDK](https://github.com/openai/openai-python)'s AST.

```bash
make sync       # downloads latest spec, diffs against local schema, runs coverage
```

`make sync` automatically:
1. Downloads the latest OpenAPI schema from OpenAI.
2. Displays a precise `git diff` of newly added endpoints, struct fields, and enums.
3. Runs the `openapi_coverage` test suite to statically verify our Rust types against the spec.

Coverage is enforced on every commit via pre-commit hooks (currently **96%** — missing `stream` and `partial_images` on Images). Types are auto-synced from the Python SDK via `py2rust.py`, so new OpenAI fields land with a single `make sync-types` run.
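Conceptually, a field-level coverage check reduces to a set difference between the properties the spec declares and the fields a Rust type has. The sketch below shows only that arithmetic; `field_coverage` is a hypothetical helper, not the crate's `openapi_coverage` suite, and the field lists in `main` are an illustrative subset rather than the real Images schema.

```rust
use std::collections::BTreeSet;

/// Returns the coverage percentage and the spec fields with no local counterpart.
fn field_coverage<'a>(spec_fields: &[&'a str], local_fields: &[&str]) -> (f64, Vec<&'a str>) {
    let local: BTreeSet<&str> = local_fields.iter().copied().collect();
    let missing: Vec<&'a str> = spec_fields
        .iter()
        .copied()
        .filter(|field| !local.contains(*field))
        .collect();
    let covered = spec_fields.len() - missing.len();
    let pct = if spec_fields.is_empty() {
        100.0
    } else {
        covered as f64 * 100.0 / spec_fields.len() as f64
    };
    (pct, missing)
}

fn main() {
    // Illustrative subset: spec properties vs. fields present on a local struct.
    let spec = ["model", "prompt", "n", "size", "stream", "partial_images"];
    let local = ["model", "prompt", "n", "size"];
    let (pct, missing) = field_coverage(&spec, &local);
    println!("{pct:.1}% covered, missing: {missing:?}");
}
```

Fields that exist locally but not in the spec are ignored here; a stricter check would also report those as drift.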


## Used In

`openai-oxide` is designed to be an **agent infrastructure crate** — the OpenAI layer that any Rust agent framework can build on. Not tied to a specific agent architecture.

- **[sgr-agent](https://github.com/fortunto2/rust-code/tree/master/crates/sgr-agent)** — LLM agent framework with structured output, function calling, and agent loops. Uses `openai-oxide` as the OpenAI backend. The same crate compiles to WASM for browser-based agents.
- **[rust-code](https://github.com/fortunto2/rust-code)** — AI-powered TUI coding agent built on sgr-agent.



## AI Agent Skills

This repo includes an [Agent Skill](https://agentskills.io/) — a portable knowledge pack that teaches AI coding assistants how to use `openai-oxide` correctly (gotchas, patterns, API reference).

Works with Claude Code, Cursor, GitHub Copilot, Gemini CLI, VS Code, and [30+ other agents](https://agentskills.io/).

```bash
# Context7
npx ctx7 skills search openai-oxide
npx ctx7 skills install /fortunto2/openai-oxide

# skills.sh
npx skills add fortunto2/openai-oxide
```

---

## See Also

- [openai-python](https://github.com/openai/openai-python) — Official Python SDK (our benchmark baseline)
- [async-openai](https://github.com/64bit/async-openai) — Alternative Rust client (mature, 1800+ stars)
- [genai](https://github.com/jeremychone/rust-genai) — Multi-provider Rust client (Gemini, Anthropic, OpenAI)

## License

[MIT](LICENSE)