codex-asr 0.1.2

Unofficial Codex Desktop ASR client that reuses local ChatGPT auth for one-shot transcription
<p align="center">
  <img src="./logo.png" width="220" alt="codex-asr logo">
</p>

<h1 align="center">codex-asr</h1>

<p align="center">
  Reuse your local Codex Desktop ChatGPT login for one-shot ASR.
</p>

<p align="center">
  <a href="https://github.com/Wangnov/codex-asr/releases/latest"><img src="https://img.shields.io/github/v/release/Wangnov/codex-asr?logo=github" alt="Latest release"></a>
  <a href="https://crates.io/crates/codex-asr"><img src="https://img.shields.io/crates/v/codex-asr?logo=rust" alt="Crates.io"></a>
  <a href="https://docs.rs/codex-asr"><img src="https://img.shields.io/docsrs/codex-asr?logo=docs.rs" alt="docs.rs"></a>
  <a href="https://github.com/Wangnov/codex-asr/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/Wangnov/codex-asr/ci.yml?branch=main&label=ci" alt="CI"></a>
  <a href="https://github.com/Wangnov/codex-asr/actions/workflows/docker.yml"><img src="https://img.shields.io/github/actions/workflow/status/Wangnov/codex-asr/docker.yml?branch=main&label=docker" alt="Docker"></a>
  <a href="https://github.com/Wangnov/codex-asr/actions/workflows/release.yml"><img src="https://img.shields.io/badge/release-cargo--dist-2ea44f" alt="cargo-dist release automation"></a>
  <a href="https://github.com/Wangnov/codex-asr/blob/main/Cargo.toml"><img src="https://img.shields.io/badge/license-MIT-2ea44f" alt="MIT license"></a>
</p>

<p align="center">
  <a href="#readme-cn">中文</a> · <a href="#readme-en">English</a>
</p>

<p align="center">
  crates.io · cargo-binstall · Homebrew · Docker / GHCR · Rust library · local REST
</p>

---

<a id="readme-cn"></a>

# 中文

Codex.app 里有一个一次性语音输入接口:它把录音上传到
`https://chatgpt.com/backend-api/transcribe`,再拿回文本。这个接口不是
OpenAI Whisper API,也不是公开稳定 API,但它可以复用你已经登录在本机
Codex Desktop / ChatGPT 里的账号。

`codex-asr` 把这件事拆成一个 Rust CLI、一个可发布的 crate,以及一个本地
OpenAI Whisper 风格 REST 包装层。你可以直接转写音频文件,也可以让现有
OpenAI SDK 指向本机 `codex-asr serve`。

## 适合谁用

- 你已经在 Codex Desktop 里登录了 ChatGPT 账号
- 你想在脚本、CLI、Agent 或本地服务里复用 Codex App 的 ASR 能力
- 你只想提供最小输入面,例如一个本地音频文件,或显式传入一个 bearer token
- 你想把 `.silk` / 微信语音先经本机 `rust-silk` 解码,再上传转写
- 你接受这是逆向自 Codex Desktop 行为的本地自动化工具,而不是官方 API

## 安装

### Homebrew

```bash
brew tap wangnov/tap
brew install codex-asr
```

### cargo-binstall

```bash
cargo binstall codex-asr
```

### cargo install

```bash
cargo install codex-asr
```

### GitHub Release 安装脚本

```bash
curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.sh \
  | sh
```

PowerShell:

```powershell
powershell -ExecutionPolicy ByPass -c "irm https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.ps1 | iex"
```

### 直接下载二进制

从 [最新 GitHub Release](https://github.com/Wangnov/codex-asr/releases/latest)
下载对应平台的压缩包或安装脚本。

### Docker / GHCR

```bash
docker pull ghcr.io/wangnov/codex-asr:latest
docker run --rm ghcr.io/wangnov/codex-asr:latest --version
```

`main` 分支会通过 GitHub Actions 发布 `latest`、`main` 和 `sha-*`
标签;正式 release tag 会额外发布 `vX.Y.Z`、`X.Y.Z` 和 `X.Y` 标签。
Docker 镜像默认内置 `rust-silk`,路径是 `/usr/local/bin/rust-silk`。

### 从当前源码安装

```bash
cargo install --path .
```

## 快速上手

```bash
# 直接转写,认证默认读取 $CODEX_HOME/auth.json 或 ~/.codex/auth.json
codex-asr audio.wav

# 给上游 ASR 一个语言提示
codex-asr audio.wav --language zh

# JSON 输出,方便脚本消费
codex-asr audio.wav --json

# 扩展名缺失时显式告诉 multipart content type
codex-asr raw-audio --content-type audio/wav

# 微信 / SILK 语音先通过外部 rust-silk CLI 解码成临时 WAV
codex-asr voice.silk --silk-decoder /path/to/rust-silk
```

## 本地 REST / OpenAI SDK

启动一个本地 Whisper 风格 REST 端点:

```bash
codex-asr serve --api-key local_dev_key --host 127.0.0.1 --port 8788 --concurrency 16
```

然后用 OpenAI 风格 multipart 请求调用:

```bash
curl http://127.0.0.1:8788/v1/audio/transcriptions \
  -H 'Authorization: Bearer local_dev_key' \
  -F model=whisper-1 \
  -F file=@audio.wav
```

OpenAI SDK 示例在 `examples/`:

```bash
python3 -m pip install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  python3 examples/python_openai_sdk.py audio.wav

npm install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  node examples/node_openai_sdk.mjs audio.wav
```

REST 路由:

| Route | 说明 |
| --- | --- |
| `GET /healthz` | 健康检查,不需要认证 |
| `POST /v1/audio/transcriptions` | OpenAI Whisper 风格路由 |
| `POST /audio/transcriptions` | 短别名 |

REST 字段兼容范围:

| Field | 处理方式 |
| --- | --- |
| `file` | 必填 |
| `model` | 为 SDK 兼容而接受,忽略 |
| `language` | 转发给 Codex `/transcribe` |
| `response_format` | 支持 `json`、`text`、`verbose_json` |
| `prompt`、`temperature`、`timestamp_granularities` | 为 SDK 兼容而接受,忽略 |

`srt` 和 `vtt` 会返回 HTTP 400,因为 Codex 这个端点不返回时间戳。

## Docker 公网部署

这个部署形态只承诺三件事:公网端口、HTTPS、api-key。`deploy/compose.public.yml`
用 Caddy 自动申请证书,把 80/443 暴露到公网,再反代到容器里的
`codex-asr serve`。应用层鉴权使用 `Authorization: Bearer <CODEX_ASR_SERVER_KEY>`。

前置条件:

- 域名 A/AAAA 记录已经指向服务器
- 服务器公网防火墙放行 80 和 443
- 服务器上已有可用的 Codex / ChatGPT auth 文件
- 已准备一个容器用户可读的 auth 副本

```bash
sudo install -d -m 750 /opt/codex-asr
sudo install -o 10001 -g 65534 -m 0400 ~/.codex/auth.json /opt/codex-asr/auth.json
cp deploy/env.public.example deploy/.env
$EDITOR deploy/.env
docker compose --env-file deploy/.env -f deploy/compose.public.yml up -d
```

`deploy/.env` 里至少要设置:

```bash
CODEX_ASR_DOMAIN=asr.example.com
CODEX_ASR_AUTH_FILE=/opt/codex-asr/auth.json
CODEX_ASR_SERVER_KEY=replace-with-a-long-random-api-key
CODEX_ASR_MAX_UPLOAD_MB=50
CODEX_ASR_REQUEST_TIMEOUT_SECONDS=300
```

调用示例:

```bash
curl https://asr.example.com/healthz

curl https://asr.example.com/v1/audio/transcriptions \
  -H "Authorization: Bearer replace-with-a-long-random-api-key" \
  -F model=whisper-1 \
  -F file=@audio.wav
```

Docker 镜像默认内置 `rust-silk`,所以 `.silk` / 微信语音会在容器内先解码成
临时 WAV 再上传。只有要替换成自定义解码器时,才需要覆盖
`CODEX_ASR_SILK_DECODER`。

## 命令说明

- `codex-asr <audio>`:默认等同于 `codex-asr transcribe <audio>`
- `transcribe`:上传一个音频文件并返回文本
- `serve`:启动本地 OpenAI Whisper 风格 REST 包装层

常用选项:

| Option | 说明 |
| --- | --- |
| `--bearer` | 显式传入 ChatGPT bearer token;默认读取本机 Codex auth |
| `--account-id` | 覆盖 `ChatGPT-Account-Id`;通常会从 bearer token 自动解析 |
| `--auth-file` | 指定 Codex auth 文件 |
| `--endpoint` | 覆盖上游 `/backend-api/transcribe` URL |
| `--proxy` | 指定 HTTPS 代理 |
| `--request-timeout-seconds` | 上游请求总超时,默认 `300` |
| `--connect-timeout-seconds` | 上游连接超时,默认 `15` |
| `--language` | 语言提示,例如 `zh`、`en` |
| `--content-type` | 覆盖音频 content type |
| `--filename` | 覆盖 multipart 文件名 |
| `--json` | 输出 `{"text":"..."}` |
| `--silk-decoder` | 指定外部 `rust-silk` CLI |
| `--silk-sample-rate` | SILK 解码成 WAV 时的采样率,默认 `24000` |
| `--no-silk-decode` | 直接上传 `.silk`,不做本地解码 |

`serve` 额外支持:

| Option | 说明 |
| --- | --- |
| `--host` | 绑定地址,默认 `127.0.0.1` |
| `--port` | 绑定端口,默认 `8788` |
| `--api-key` | 本地 REST API key |
| `--no-api-key` | 关闭本地 REST 认证,仅限可信 loopback |
| `--concurrency` | 上游 ASR 并发上限,默认 `16` |
| `--max-upload-mb` | REST 上传体积上限,默认 `50` MiB |

## 认证模型

默认情况下,`codex-asr` 会读取:

1. `$CODEX_HOME/auth.json`
2. `~/.codex/auth.json`

如果你想把输入面压到最小,可以只传 bearer:

```bash
codex-asr audio.wav --bearer "$TOKEN"
CODEX_ASR_BEARER="$TOKEN" codex-asr audio.wav
```

`ChatGPT-Account-Id` 通常会从 bearer token 的 JWT payload 中自动解析。只有当
token 里没有这个 claim 时,才需要手动传:

```bash
codex-asr audio.wav --bearer "$TOKEN" --account-id acct_...
```

可用环境变量:

| Env | 用途 |
| --- | --- |
| `CODEX_ASR_BEARER` | 默认 bearer token |
| `CODEX_ASR_SERVER_KEY` | 本地 REST API key |
| `CODEX_ASR_PROXY` | HTTPS 代理 |
| `CODEX_ASR_REQUEST_TIMEOUT_SECONDS` | 上游请求总超时,默认 `300` |
| `CODEX_ASR_CONNECT_TIMEOUT_SECONDS` | 上游连接超时,默认 `15` |
| `CODEX_ASR_MAX_UPLOAD_MB` | REST 上传体积上限,默认 `50` MiB |
| `CODEX_ASR_SILK_DECODER` | 外部 `rust-silk` CLI 路径;Docker 镜像默认 `/usr/local/bin/rust-silk` |

## 它会改什么

- 读取本机 Codex / ChatGPT auth 文件,但不会写入或刷新它
- `serve` 会在每次转写前重新读取 auth 文件,以便容器中的 auth 副本更新后无需重启服务
- 上传你指定的音频文件到 `https://chatgpt.com/backend-api/transcribe`
- `.silk` / `.slk` 输入默认先在本地解码成临时 WAV,再上传 WAV
- REST server 只做本地包装、字段兼容、并发限制和响应格式转换
- 不会保存转写结果,除非你的调用方自己保存

## 安全边界

`codex-asr` 会复用你的个人 Codex / ChatGPT 登录 token。不要裸露 HTTP 或无
api-key 的 REST server;公网部署至少使用 HTTPS 和应用层 api-key。

- REST 默认绑定 `127.0.0.1`
- 本地 REST key 只是本地访问控制,不是 OpenAI 或 ChatGPT token
- 只有在可信本机回环环境里才使用 `--no-api-key`
- `deploy/compose.public.yml` 的公网形态默认启用 Caddy HTTPS 和 REST api-key
- 这个端点是从 Codex Desktop 行为逆向出来的,可能随 Codex App 更新而变化

## 音频格式

Codex Desktop 的上游端点会检查真实音频容器,不只看 multipart content type。

本地测试中可直接上传的格式:

| Container / codec | 建议 content type |
| --- | --- |
| WAV PCM | `audio/wav` |
| MP3 | `audio/mpeg` |
| M4A 或 MP4 AAC | `audio/mp4` |
| FLAC | `audio/flac` |
| Ogg Opus | `audio/ogg` |
| WebM Opus | `audio/webm` |

通过本地预处理支持的格式:

| Input | 处理方式 |
| --- | --- |
| SILK v3 (`#!SILK_V3`) | 用外部 `rust-silk` 解码成临时 WAV |
| WeChat/Tencent SILK (`0x02 + #!SILK_V3`) | 用外部 `rust-silk` 解码成临时 WAV |

本地测试中直接上传会被上游拒绝的格式:

| Format | 结果 |
| --- | --- |
| ADTS AAC (`.aac`) | HTTP 500, ASR API error |
| AIFF | HTTP 500, ASR API error |
| CAF AAC | HTTP 500, ASR API error |
| Raw PCM stream | HTTP 500, ASR API error |
| SILK v3 (`#!SILK_V3`) | HTTP 500, ASR API error |
| WeChat/Tencent SILK (`0x02 + #!SILK_V3`) | HTTP 500, ASR API error |

如果文件名没有可识别音频扩展名,请传 `--content-type`;CLI 会补一个合适的
multipart 文件名。也可以显式传:

```bash
codex-asr raw-audio --content-type audio/wav --filename voice.wav
```

## 已知边界

- 空文件会返回 HTTP 500 `Error in ASR API`
- 一秒静音 WAV 会返回 HTTP 200 和空文本
- 非常短的非静音片段可能返回不稳定文本
- `prompt`、`temperature`、`timestamp_granularities` 只为 SDK 兼容而接受,不会影响上游
- 本地短 WAV 并发测试到 96 个请求没有出现 429/5xx,但这不是公开 API 承诺
- 建议 REST wrapper 对长音频保持有界并发,默认 `--concurrency 16`
- REST server 默认限制上传体积为 `50` MiB,上游请求总超时为 `300` 秒

## Rust library

不需要 REST server 的库用户可以关闭默认特性,避免引入 Axum/Tokio:

```toml
[dependencies]
codex-asr = { version = "0.1", default-features = false }
```

```rust
use codex_asr::{CodexAsrClient, TranscribeOptions};

let client = CodexAsrClient::from_codex_home()?;
let result = client.transcribe_file("audio.wav", TranscribeOptions::default())?;
println!("{}", result.text);
# Ok::<(), Box<dyn std::error::Error>>(())
```

## 平台支持

- CLI 和 library 支持 macOS、Linux、Windows
- Release 二进制覆盖 Apple Silicon macOS、Intel macOS、Linux x64、Linux ARM64、Windows x64
- `serve` 功能默认开启;作为库使用时可通过 `default-features = false` 关闭
- SILK 支持依赖外部 `rust-silk` CLI;普通二进制不内置解码器,Docker 镜像默认内置

## 从源码运行

```bash
cargo run -- --help
cargo run -- audio.wav --json
cargo run -- serve --api-key local_dev_key
```

## 视觉资产

- 顶部 `logo.png` 由 `gpt-image-2-skill` 生成
- `assets/codex.svg` 和 `assets/codex-color.svg` 来自
  [LobeHub Codex (OpenAI) icon](https://lobehub.com/icons/codex)
- `assets/codex-color-reference.png` 是从 `assets/codex-color.svg` 渲染出的参考图;
  生成 `logo.png` 时作为 `--ref-image` 传入。直接上传 SVG 会被上游图片编辑
  接口拒绝,因为它只接受 JPEG、PNG、GIF、WebP 参考图。

---

<a id="readme-en"></a>

# English

Codex.app has a one-shot dictation endpoint: it uploads recorded audio to
`https://chatgpt.com/backend-api/transcribe` and receives text back. This is not
the OpenAI Whisper API and it is not a public stable API, but it can reuse the
ChatGPT account that is already signed in on your local Codex Desktop app.

`codex-asr` packages that behavior as a Rust CLI, a publishable crate, and a
local OpenAI Whisper-style REST shim. You can transcribe files directly, or point
existing OpenAI SDK clients at `codex-asr serve` on localhost.

## Who this is for

- You already use Codex Desktop with a signed-in ChatGPT account
- You want to reuse Codex App ASR from scripts, CLIs, agents, or local services
- You want a tiny input surface: a local audio file, or an explicit bearer token
- You want `.silk` / WeChat voice files decoded through your local `rust-silk` CLI
- You accept that this is a reverse-engineered local automation surface, not an official API

## Install

### Homebrew

```bash
brew tap wangnov/tap
brew install codex-asr
```

### cargo-binstall

```bash
cargo binstall codex-asr
```

### cargo install

```bash
cargo install codex-asr
```

### GitHub Release installer

```bash
curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.sh \
  | sh
```

PowerShell:

```powershell
powershell -ExecutionPolicy ByPass -c "irm https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.ps1 | iex"
```

### Direct binary download

Download the matching archive or installer from the
[latest GitHub Release](https://github.com/Wangnov/codex-asr/releases/latest).

### Docker / GHCR

```bash
docker pull ghcr.io/wangnov/codex-asr:latest
docker run --rm ghcr.io/wangnov/codex-asr:latest --version
```

The `main` branch publishes `latest`, `main`, and `sha-*` tags through GitHub
Actions. Release tags additionally publish `vX.Y.Z`, `X.Y.Z`, and `X.Y` tags.
Docker images include `rust-silk` at `/usr/local/bin/rust-silk` by default.

### From this checkout

```bash
cargo install --path .
```

## Quick start

```bash
# Transcribe directly, reading $CODEX_HOME/auth.json or ~/.codex/auth.json
codex-asr audio.wav

# Send a language hint upstream
codex-asr audio.wav --language zh

# JSON output for scripts
codex-asr audio.wav --json

# Extensionless input with explicit multipart content type
codex-asr raw-audio --content-type audio/wav

# Decode WeChat / SILK audio through an external rust-silk CLI first
codex-asr voice.silk --silk-decoder /path/to/rust-silk
```

## Local REST / OpenAI SDK

Start a local Whisper-style REST endpoint:

```bash
codex-asr serve --api-key local_dev_key --host 127.0.0.1 --port 8788 --concurrency 16
```

Call it with an OpenAI-style multipart request:

```bash
curl http://127.0.0.1:8788/v1/audio/transcriptions \
  -H 'Authorization: Bearer local_dev_key' \
  -F model=whisper-1 \
  -F file=@audio.wav
```

OpenAI SDK examples live in `examples/`:

```bash
python3 -m pip install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  python3 examples/python_openai_sdk.py audio.wav

npm install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  node examples/node_openai_sdk.mjs audio.wav
```

Routes:

| Route | Notes |
| --- | --- |
| `GET /healthz` | health check, no auth required |
| `POST /v1/audio/transcriptions` | OpenAI Whisper-style route |
| `POST /audio/transcriptions` | short alias |

Multipart fields:

| Field | Handling |
| --- | --- |
| `file` | required |
| `model` | accepted for SDK compatibility, ignored |
| `language` | forwarded to Codex `/transcribe` |
| `response_format` | supports `json`, `text`, `verbose_json` |
| `prompt`, `temperature`, `timestamp_granularities` | accepted for SDK compatibility, ignored |

`srt` and `vtt` return HTTP 400 because the Codex endpoint does not return
timestamps.
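
The `response_format` handling above can be sketched as a small validation step. This is a minimal illustration, not the crate's code: the `ResponseFormat` enum and `parse_response_format` function are hypothetical names, and both the fallback to `json` for a missing field and the 400 for unknown values are assumptions.

```rust
/// Hypothetical response formats; names are illustrative, not the crate's API.
#[derive(Debug, PartialEq)]
enum ResponseFormat {
    Json,
    Text,
    VerboseJson,
}

/// Maps a multipart `response_format` value to a format, or to an HTTP
/// status code plus message when the value cannot be served.
fn parse_response_format(value: Option<&str>) -> Result<ResponseFormat, (u16, String)> {
    match value {
        // Assumption: a missing field falls back to `json`.
        None | Some("json") => Ok(ResponseFormat::Json),
        Some("text") => Ok(ResponseFormat::Text),
        Some("verbose_json") => Ok(ResponseFormat::VerboseJson),
        // The upstream endpoint returns no timestamps, so subtitle formats fail.
        Some(other @ ("srt" | "vtt")) => Err((400, format!("{other} requires timestamps"))),
        // Assumption: unknown values are rejected the same way.
        Some(other) => Err((400, format!("unsupported response_format: {other}"))),
    }
}
```

Rejecting `srt` and `vtt` up front keeps the wrapper from spending an upstream request on output it cannot produce.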

## Public Docker deployment

This deployment shape is intentionally small: a public port, HTTPS, and an
application API key. `deploy/compose.public.yml` uses Caddy to obtain TLS
certificates on ports 80/443 and reverse proxy to `codex-asr serve` inside the
private Docker network. App auth is `Authorization: Bearer <CODEX_ASR_SERVER_KEY>`.
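
For illustration, an `Authorization: Bearer <key>` check like the one described above could look roughly like this. The function names are hypothetical and the comparison that avoids an early exit is an assumption about what a careful server might do, not a description of the crate's implementation.

```rust
/// Returns true when `header` is exactly `Bearer <expected_key>`.
/// Hypothetical helper; not part of the crate's public API.
fn is_authorized(header: Option<&str>, expected_key: &str) -> bool {
    let Some(token) = header.and_then(|h| h.strip_prefix("Bearer ")) else {
        return false;
    };
    same_bytes(token.as_bytes(), expected_key.as_bytes())
}

/// Compares two byte strings without returning on the first mismatch,
/// so the time taken does not depend on where the keys differ.
fn same_bytes(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

A request with no header, a different scheme, or a wrong key is rejected before any upstream call is made.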

Prerequisites:

- The domain A/AAAA record points at the server
- The server firewall allows ports 80 and 443
- A usable Codex / ChatGPT auth file exists on the server
- A container-readable auth copy has been prepared

```bash
sudo install -d -m 750 /opt/codex-asr
sudo install -o 10001 -g 65534 -m 0400 ~/.codex/auth.json /opt/codex-asr/auth.json
cp deploy/env.public.example deploy/.env
$EDITOR deploy/.env
docker compose --env-file deploy/.env -f deploy/compose.public.yml up -d
```

Set at least these values in `deploy/.env`:

```bash
CODEX_ASR_DOMAIN=asr.example.com
CODEX_ASR_AUTH_FILE=/opt/codex-asr/auth.json
CODEX_ASR_SERVER_KEY=replace-with-a-long-random-api-key
CODEX_ASR_MAX_UPLOAD_MB=50
CODEX_ASR_REQUEST_TIMEOUT_SECONDS=300
```

Example requests:

```bash
curl https://asr.example.com/healthz

curl https://asr.example.com/v1/audio/transcriptions \
  -H "Authorization: Bearer replace-with-a-long-random-api-key" \
  -F model=whisper-1 \
  -F file=@audio.wav
```

Docker images include `rust-silk` by default, so `.silk` / WeChat voice files
are decoded to temporary WAV inside the container before upload. Override
`CODEX_ASR_SILK_DECODER` only when you want to use a custom decoder.

## Commands

- `codex-asr <audio>`: same as `codex-asr transcribe <audio>`
- `transcribe`: upload one audio file and return text
- `serve`: run the local OpenAI Whisper-style REST shim

Common options:

| Option | Notes |
| --- | --- |
| `--bearer` | explicit ChatGPT bearer token; defaults to local Codex auth |
| `--account-id` | override `ChatGPT-Account-Id`; usually decoded from the bearer token |
| `--auth-file` | custom Codex auth file |
| `--endpoint` | override the upstream `/backend-api/transcribe` URL |
| `--proxy` | HTTPS proxy |
| `--request-timeout-seconds` | total upstream request timeout, default `300` |
| `--connect-timeout-seconds` | upstream connect timeout, default `15` |
| `--language` | language hint, for example `zh` or `en` |
| `--content-type` | audio content type override |
| `--filename` | multipart filename override |
| `--json` | print `{"text":"..."}` |
| `--silk-decoder` | external `rust-silk` CLI path |
| `--silk-sample-rate` | WAV sample rate for SILK decode, default `24000` |
| `--no-silk-decode` | upload `.silk` directly without local decoding |

`serve` also supports:

| Option | Notes |
| --- | --- |
| `--host` | bind host, default `127.0.0.1` |
| `--port` | bind port, default `8788` |
| `--api-key` | local REST API key |
| `--no-api-key` | disable local REST auth, only for trusted loopback |
| `--concurrency` | maximum concurrent upstream ASR requests, default `16` |
| `--max-upload-mb` | REST upload size limit, default `50` MiB |

## Auth model

By default, `codex-asr` reads:

1. `$CODEX_HOME/auth.json`
2. `~/.codex/auth.json`
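
The lookup order above can be sketched as a pure function over the two environment values. `resolve_auth_path` is a hypothetical name, and the rule that a set, non-empty `CODEX_HOME` always wins (rather than falling back when the file is missing) is an assumption; the README only states the order.

```rust
use std::path::PathBuf;

/// Returns the auth file path to try, given the values of `CODEX_HOME`
/// and the user's home directory.
/// Assumption: a set, non-empty `CODEX_HOME` takes precedence outright.
fn resolve_auth_path(codex_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    if let Some(dir) = codex_home.filter(|d| !d.is_empty()) {
        return Some(PathBuf::from(dir).join("auth.json"));
    }
    home.filter(|h| !h.is_empty())
        .map(|h| PathBuf::from(h).join(".codex").join("auth.json"))
}
```

On Unix-like systems the inputs would typically come from `std::env::var("CODEX_HOME").ok()` and `std::env::var("HOME").ok()`, passed with `.as_deref()`.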

For the smallest explicit input surface, pass only a bearer token:

```bash
codex-asr audio.wav --bearer "$TOKEN"
CODEX_ASR_BEARER="$TOKEN" codex-asr audio.wav
```

`ChatGPT-Account-Id` is usually decoded from the bearer token's JWT payload.
Pass `--account-id` only when the token does not contain that claim:

```bash
codex-asr audio.wav --bearer "$TOKEN" --account-id acct_...
```
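
To show what "decoded from the JWT payload" involves, here is a std-only sketch that extracts the payload JSON from a token. The function names are hypothetical, the signature is not verified, and the name of the claim that carries the account ID is not stated in this README, so the sketch stops at the raw JSON text.

```rust
/// Decodes base64url text (RFC 4648 section 5), with or without padding.
fn base64url_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buffer: u32 = 0;
    let mut bits = 0;
    for byte in input.bytes().filter(|b| *b != b'=') {
        let value = match byte {
            b'A'..=b'Z' => byte - b'A',
            b'a'..=b'z' => byte - b'a' + 26,
            b'0'..=b'9' => byte - b'0' + 52,
            b'-' => 62,
            b'_' => 63,
            _ => return None, // not a base64url character
        };
        // Accumulate 6 bits at a time and emit a byte whenever 8 are ready.
        buffer = (buffer << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
        }
    }
    Some(out)
}

/// Returns the JSON text of a JWT payload (the second dot-separated segment).
/// The signature is NOT checked; this only reads the claims.
fn jwt_payload_json(token: &str) -> Option<String> {
    let payload = token.split('.').nth(1)?;
    String::from_utf8(base64url_decode(payload)?).ok()
}
```

A real client would hand the returned text to a JSON parser and look up the relevant claim there.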

Environment variables:

| Env | Purpose |
| --- | --- |
| `CODEX_ASR_BEARER` | default bearer token |
| `CODEX_ASR_SERVER_KEY` | local REST API key |
| `CODEX_ASR_PROXY` | HTTPS proxy |
| `CODEX_ASR_REQUEST_TIMEOUT_SECONDS` | total upstream request timeout, default `300` |
| `CODEX_ASR_CONNECT_TIMEOUT_SECONDS` | upstream connect timeout, default `15` |
| `CODEX_ASR_MAX_UPLOAD_MB` | REST upload size limit, default `50` MiB |
| `CODEX_ASR_SILK_DECODER` | external `rust-silk` CLI path; Docker defaults to `/usr/local/bin/rust-silk` |

## What it changes

- Reads your local Codex / ChatGPT auth file, but does not write or refresh it
- `serve` rereads the auth file before each transcription so an updated container auth copy takes effect without restarting
- Uploads the audio file you provide to `https://chatgpt.com/backend-api/transcribe`
- Decodes `.silk` / `.slk` inputs to temporary WAV files before upload by default
- The REST server only wraps requests, normalizes SDK fields, limits concurrency, and converts responses
- Does not persist transcripts; storing them is left to the caller

## Safety

`codex-asr` reuses a personal Codex / ChatGPT login token. Do not expose plain
HTTP or an unauthenticated REST server; public deployment should use at least
HTTPS and an application API key.

- REST binds to `127.0.0.1` by default
- The REST API key is local access control only; it is not an OpenAI or ChatGPT token
- Use `--no-api-key` only on trusted loopback
- `deploy/compose.public.yml` enables Caddy HTTPS and REST api-key auth by default
- This endpoint is reverse-engineered from Codex Desktop behavior and may change without notice

## Audio formats

The Codex Desktop endpoint appears to inspect the actual audio container, not
only the multipart content type.

Formats tested successfully when uploaded directly:

| Container / codec | Suggested content type |
| --- | --- |
| WAV PCM | `audio/wav` |
| MP3 | `audio/mpeg` |
| M4A or MP4 AAC | `audio/mp4` |
| FLAC | `audio/flac` |
| Ogg Opus | `audio/ogg` |
| WebM Opus | `audio/webm` |

Formats supported through local preprocessing:

| Input | Handling |
| --- | --- |
| SILK v3 (`#!SILK_V3`) | decoded to temporary WAV with `rust-silk` |
| WeChat/Tencent SILK (`0x02 + #!SILK_V3`) | decoded to temporary WAV with `rust-silk` |
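
The two headers in this table can be told apart from the first bytes of the file. The sketch below only illustrates that check; `SilkKind` and `detect_silk` are hypothetical names, not the crate's API.

```rust
/// Kinds of SILK input distinguished by their leading bytes.
#[derive(Debug, PartialEq)]
enum SilkKind {
    /// Starts with `#!SILK_V3`.
    Standard,
    /// Starts with a 0x02 byte followed by `#!SILK_V3`.
    Tencent,
}

const SILK_MAGIC: &[u8] = b"#!SILK_V3";

/// Returns the SILK variant if `data` begins with a known SILK header.
fn detect_silk(data: &[u8]) -> Option<SilkKind> {
    if data.starts_with(SILK_MAGIC) {
        Some(SilkKind::Standard)
    } else if data.first() == Some(&0x02) && data[1..].starts_with(SILK_MAGIC) {
        Some(SilkKind::Tencent)
    } else {
        None
    }
}
```

Checking the header rather than the file extension also catches SILK files that were saved under a different or missing extension.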

Formats rejected by the upstream endpoint during local direct-upload tests:

| Format | Result |
| --- | --- |
| ADTS AAC (`.aac`) | HTTP 500, ASR API error |
| AIFF | HTTP 500, ASR API error |
| CAF AAC | HTTP 500, ASR API error |
| Raw PCM stream | HTTP 500, ASR API error |
| SILK v3 (`#!SILK_V3`) | HTTP 500, ASR API error |
| WeChat/Tencent SILK (`0x02 + #!SILK_V3`) | HTTP 500, ASR API error |

Files without a recognizable audio extension should be uploaded with an explicit
`--content-type`; the CLI then adds a matching extension to the multipart
filename. To set the filename yourself, pass `--filename`:

```bash
codex-asr raw-audio --content-type audio/wav --filename voice.wav
```
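
As a rough sketch of extension-based detection, the function below maps a filename to the content types suggested in the table above. `content_type_for` is a hypothetical name and the extension list is an assumption; the crate may recognize a different set.

```rust
use std::path::Path;

/// Suggests a content type from the file extension, or `None` when the
/// extension is missing or unknown (the case where `--content-type` helps).
fn content_type_for(path: &str) -> Option<&'static str> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "wav" => Some("audio/wav"),
        "mp3" => Some("audio/mpeg"),
        "m4a" | "mp4" => Some("audio/mp4"),
        "flac" => Some("audio/flac"),
        "ogg" => Some("audio/ogg"),
        "webm" => Some("audio/webm"),
        _ => None,
    }
}
```

Because the upstream endpoint appears to inspect the real container, a correct content type alone does not make an unsupported format acceptable.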

## Known limits

- Empty files return HTTP 500 with `Error in ASR API`
- One second of silence returns HTTP 200 with an empty transcript
- Very short non-silent clips can return unstable text
- `prompt`, `temperature`, and `timestamp_granularities` are accepted only for SDK compatibility
- Local short-WAV probes reached 96 concurrent requests without 429/5xx, but that is not a public API contract
- Keep long-audio REST usage bounded; the default is `--concurrency 16`
- The REST server defaults to a `50` MiB upload limit and a `300` second upstream request timeout
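
Bounded concurrency, as recommended above, amounts to holding a permit for the duration of each upstream request. The sketch below uses only the standard library to show the idea; the `Semaphore` type is hypothetical, and since the server is built on Axum/Tokio the real implementation most likely relies on an async primitive instead.

```rust
use std::sync::{Condvar, Mutex};

/// A small counting semaphore built on std primitives (illustrative only).
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Runs `job` while holding one permit, blocking until one is free.
    /// Note: a panic inside `job` would leak the permit in this sketch.
    fn run<T>(&self, job: impl FnOnce() -> T) -> T {
        let mut free = self.permits.lock().unwrap();
        while *free == 0 {
            free = self.available.wait(free).unwrap();
        }
        *free -= 1;
        drop(free); // release the lock while the job runs

        let result = job();

        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
        result
    }
}
```

With `Semaphore::new(16)`, at most 16 jobs run at once and later callers wait, which mirrors the default `--concurrency 16`.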

## Rust library

Library consumers that do not need the local REST server can disable default
features to avoid pulling in Axum/Tokio:

```toml
[dependencies]
codex-asr = { version = "0.1", default-features = false }
```

```rust
use codex_asr::{CodexAsrClient, TranscribeOptions};

let client = CodexAsrClient::from_codex_home()?;
let result = client.transcribe_file("audio.wav", TranscribeOptions::default())?;
println!("{}", result.text);
# Ok::<(), Box<dyn std::error::Error>>(())
```

## Platforms

- CLI and library support macOS, Linux, and Windows
- Prebuilt binaries are published for Apple Silicon macOS, Intel macOS, Linux x64, Linux ARM64, and Windows x64
- `serve` is enabled by default; library users can disable it with `default-features = false`
- SILK support depends on an external `rust-silk` CLI; regular binaries do not bundle a decoder, while Docker images include one by default

## Run from source

```bash
cargo run -- --help
cargo run -- audio.wav --json
cargo run -- serve --api-key local_dev_key
```

## Visual assets

- The top `logo.png` was generated with `gpt-image-2-skill`
- `assets/codex.svg` and `assets/codex-color.svg` are from the
  [LobeHub Codex (OpenAI) icon](https://lobehub.com/icons/codex)
- `assets/codex-color-reference.png` is rendered from `assets/codex-color.svg`
  and was passed as the `--ref-image` when generating `logo.png`. Passing the
  SVG directly was rejected by the upstream image-editing API because it only
  accepts JPEG, PNG, GIF, and WebP reference images.