中文
Codex.app 里有一个一次性语音输入接口:它把录音上传到
https://chatgpt.com/backend-api/transcribe,再拿回文本。这个接口不是
OpenAI Whisper API,也不是公开稳定 API,但它可以复用你已经登录在本机
Codex Desktop / ChatGPT 里的账号。
codex-asr 把这件事拆成一个 Rust CLI、一个可发布的 crate,以及一个本地
OpenAI Whisper 风格 REST 包装层。你可以直接转写音频文件,也可以让现有
OpenAI SDK 指向本机 codex-asr serve。
适合谁用
- 你已经在 Codex Desktop 里登录了 ChatGPT 账号
- 你想在脚本、CLI、Agent 或本地服务里复用 Codex App 的 ASR 能力
- 你只想提供最小输入面,例如一个本地音频文件,或显式传入一个 bearer token
- 你想把 `.silk` / 微信语音先经本机 `rust-silk` 解码,再上传转写
- 你接受这是逆向自 Codex Desktop 行为的本地自动化工具,而不是官方 API
安装
Homebrew
cargo-binstall
cargo install
GitHub Release 安装脚本
|
PowerShell:
powershell -ExecutionPolicy ByPass -c "irm https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.ps1 | iex"
直接下载二进制
从 最新 GitHub Release 下载对应平台的压缩包或安装脚本。
Docker / GHCR
main 分支会通过 GitHub Actions 发布 latest、main 和 sha-*
标签;正式 release tag 会额外发布 vX.Y.Z、X.Y.Z 和 X.Y 标签。
Docker 镜像默认内置 rust-silk,路径是 /usr/local/bin/rust-silk。
从当前源码安装
快速上手
# 直接转写,认证默认读取 $CODEX_HOME/auth.json 或 ~/.codex/auth.json
# 给上游 ASR 一个语言提示
# JSON 输出,方便脚本消费
# 扩展名缺失时显式告诉 multipart content type
# 微信 / SILK 语音先通过外部 rust-silk CLI 解码成临时 WAV
本地 REST / OpenAI SDK
启动一个本地 Whisper 风格 REST 端点:
然后用 OpenAI 风格 multipart 请求调用:
OpenAI SDK 示例在 examples/:
CODEX_ASR_SERVER_KEY=local_dev_key \
CODEX_ASR_SERVER_KEY=local_dev_key \
REST 路由:
| Route | 说明 |
|---|---|
| `GET /healthz` | 健康检查,不需要认证 |
| `POST /v1/audio/transcriptions` | OpenAI Whisper 风格路由 |
| `POST /audio/transcriptions` | 短别名 |
REST 字段兼容范围:
| Field | 处理方式 |
|---|---|
| `file` | 必填 |
| `model` | 为 SDK 兼容而接受,忽略 |
| `language` | 转发给 Codex `/transcribe` |
| `response_format` | 支持 `json`、`text`、`verbose_json` |
| `prompt`、`temperature`、`timestamp_granularities` | 为 SDK 兼容而接受,忽略 |
srt 和 vtt 会返回 HTTP 400,因为 Codex 这个端点不返回时间戳。
Docker 公网部署
这个部署形态只承诺三件事:公网端口、HTTPS、api-key。deploy/compose.public.yml
用 Caddy 自动申请证书,把 80/443 暴露到公网,再反代到容器里的
codex-asr serve。应用层鉴权使用 Authorization: Bearer <CODEX_ASR_SERVER_KEY>。
前置条件:
- 域名 A/AAAA 记录已经指向服务器
- 服务器公网防火墙放行 80 和 443
- 服务器上已有可用的 Codex / ChatGPT auth 文件
- 已准备一个容器用户可读的 auth 副本
deploy/.env 里至少要设置:
CODEX_ASR_DOMAIN=asr.example.com
CODEX_ASR_AUTH_FILE=/opt/codex-asr/auth.json
CODEX_ASR_SERVER_KEY=replace-with-a-long-random-api-key
CODEX_ASR_MAX_UPLOAD_MB=50
CODEX_ASR_REQUEST_TIMEOUT_SECONDS=300
调用示例:
Docker 镜像默认内置 rust-silk,所以 .silk / 微信语音会在容器内先解码成
临时 WAV 再上传。只有要替换成自定义解码器时,才需要覆盖
CODEX_ASR_SILK_DECODER。
命令说明
- `codex-asr <audio>`:默认等同于 `codex-asr transcribe <audio>`
- `transcribe`:上传一个音频文件并返回文本
- `serve`:启动本地 OpenAI Whisper 风格 REST 包装层
常用选项:
| Option | 说明 |
|---|---|
| `--bearer` | 显式传入 ChatGPT bearer token;默认读取本机 Codex auth |
| `--account-id` | 覆盖 `ChatGPT-Account-Id`;通常会从 bearer token 自动解析 |
| `--auth-file` | 指定 Codex auth 文件 |
| `--endpoint` | 覆盖上游 `/backend-api/transcribe` URL |
| `--proxy` | 指定 HTTPS 代理 |
| `--request-timeout-seconds` | 上游请求总超时,默认 `300` |
| `--connect-timeout-seconds` | 上游连接超时,默认 `15` |
| `--language` | 语言提示,例如 `zh` 或 `en` |
| `--content-type` | 覆盖音频 content type |
| `--filename` | 覆盖 multipart 文件名 |
| `--json` | 输出 `{"text":"..."}` |
| `--silk-decoder` | 指定外部 `rust-silk` CLI |
| `--silk-sample-rate` | SILK 解码成 WAV 时的采样率,默认 `24000` |
| `--no-silk-decode` | 直接上传 `.silk`,不做本地解码 |
serve 额外支持:
| Option | 说明 |
|---|---|
| `--host` | 绑定地址,默认 `127.0.0.1` |
| `--port` | 绑定端口,默认 `8788` |
| `--api-key` | 本地 REST API key |
| `--no-api-key` | 关闭本地 REST 认证,仅限可信 loopback |
| `--concurrency` | 上游 ASR 并发上限,默认 `16` |
| `--max-upload-mb` | REST 上传体积上限,默认 `50` MiB |
认证模型
默认情况下,codex-asr 会读取:
- `$CODEX_HOME/auth.json`
- `~/.codex/auth.json`
如果你想把输入面压到最小,可以只传 bearer:
CODEX_ASR_BEARER=""
ChatGPT-Account-Id 通常会从 bearer token 的 JWT payload 中自动解析。只有当
token 里没有这个 claim 时,才需要手动传:
可用环境变量:
| Env | 用途 |
|---|---|
| `CODEX_ASR_BEARER` | 默认 bearer token |
| `CODEX_ASR_SERVER_KEY` | 本地 REST API key |
| `CODEX_ASR_PROXY` | HTTPS 代理 |
| `CODEX_ASR_REQUEST_TIMEOUT_SECONDS` | 上游请求总超时,默认 `300` |
| `CODEX_ASR_CONNECT_TIMEOUT_SECONDS` | 上游连接超时,默认 `15` |
| `CODEX_ASR_MAX_UPLOAD_MB` | REST 上传体积上限,默认 `50` MiB |
| `CODEX_ASR_SILK_DECODER` | 外部 `rust-silk` CLI 路径;Docker 镜像默认 `/usr/local/bin/rust-silk` |
它会改什么
- 读取本机 Codex / ChatGPT auth 文件,但不会写入或刷新它
- `serve` 会在每次转写前重新读取 auth 文件,以便容器中的 auth 副本更新后无需重启服务
- 上传你指定的音频文件到 `https://chatgpt.com/backend-api/transcribe`
- `.silk` / `.slk` 输入默认先在本地解码成临时 WAV,再上传 WAV
- REST server 只做本地包装、字段兼容、并发限制和响应格式转换
- 不会保存转写结果,除非你的调用方自己保存
安全边界
codex-asr 会复用你的个人 Codex / ChatGPT 登录 token。不要裸露 HTTP 或无
api-key 的 REST server;公网部署至少使用 HTTPS 和应用层 api-key。
- REST 默认绑定 `127.0.0.1`
- 本地 REST key 只是本地访问控制,不是 OpenAI 或 ChatGPT token
- 只有在可信本机回环环境里才使用 `--no-api-key`
- `deploy/compose.public.yml` 的公网形态默认启用 Caddy HTTPS 和 REST api-key
- 这个端点是从 Codex Desktop 行为逆向出来的,可能随 Codex App 更新而变化
音频格式
Codex Desktop 的上游端点会检查真实音频容器,不只看 multipart content type。
本地测试中可直接上传的格式:
| Container / codec | 建议 content type |
|---|---|
| WAV PCM | audio/wav |
| MP3 | audio/mpeg |
| M4A 或 MP4 AAC | audio/mp4 |
| FLAC | audio/flac |
| Ogg Opus | audio/ogg |
| WebM Opus | audio/webm |
通过本地预处理支持的格式:
| Input | 处理方式 |
|---|---|
| SILK v3 (`#!SILK_V3`) | 用外部 `rust-silk` 解码成临时 WAV |
| WeChat/Tencent SILK (`0x02` + `#!SILK_V3`) | 用外部 `rust-silk` 解码成临时 WAV |
本地测试中直接上传会被上游拒绝的格式:
| Format | 结果 |
|---|---|
| ADTS AAC (`.aac`) | HTTP 500, ASR API error |
| AIFF | HTTP 500, ASR API error |
| CAF AAC | HTTP 500, ASR API error |
| Raw PCM stream | HTTP 500, ASR API error |
| SILK v3 (`#!SILK_V3`) | HTTP 500, ASR API error |
| WeChat/Tencent SILK (`0x02` + `#!SILK_V3`) | HTTP 500, ASR API error |
如果文件名没有可识别音频扩展名,请传 --content-type;CLI 会补一个合适的
multipart 文件名。也可以显式传:
已知边界
- 空文件会返回 HTTP 500 `Error in ASR API`
- 一秒静音 WAV 会返回 HTTP 200 和空文本
- 非常短的非静音片段可能返回不稳定文本
- `prompt`、`temperature`、`timestamp_granularities` 只为 SDK 兼容而接受,不会影响上游
- 本地短 WAV 并发测试到 96 个请求没有出现 429/5xx,但这不是公开 API 承诺
- 建议 REST wrapper 对长音频保持有界并发,默认 `--concurrency 16`
- REST server 默认限制上传体积为 `50` MiB,上游请求总超时为 `300` 秒
Rust library
不需要 REST server 的库用户可以关闭默认特性,避免引入 Axum/Tokio:
codex-asr = { version = "0.1", default-features = false }
use ;
let client = from_codex_home?;
let result = client.transcribe_file?;
println!;
# Ok::
平台支持
- CLI 和 library 支持 macOS、Linux、Windows
- Release 二进制覆盖 Apple Silicon macOS、Intel macOS、Linux x64、Linux ARM64、Windows x64
- `serve` 功能默认开启;作为库使用时可通过 `default-features = false` 关闭
- SILK 支持依赖外部 `rust-silk` CLI;普通二进制不内置解码器,Docker 镜像默认内置
从源码运行
视觉资产
- 顶部 `logo.png` 由 `gpt-image-2-skill` 生成
- `assets/codex.svg` 与 `assets/codex-color.svg` 来自 LobeHub Codex (OpenAI) icon
- `assets/codex-color-reference.png` 是从 `assets/codex-color.svg` 渲染出的参考图;生成 `logo.png` 时作为 `--ref-image` 传入。直接上传 SVG 会被上游图片编辑接口拒绝,因为它只接受 JPEG、PNG、GIF、WebP 参考图。
English
Codex.app has a one-shot dictation endpoint: it uploads recorded audio to
https://chatgpt.com/backend-api/transcribe and receives text back. This is not
the OpenAI Whisper API and it is not a public stable API, but it can reuse the
ChatGPT account that is already signed in on your local Codex Desktop app.
codex-asr packages that behavior as a Rust CLI, a publishable crate, and a
local OpenAI Whisper-style REST shim. You can transcribe files directly, or point
existing OpenAI SDK clients at codex-asr serve on localhost.
Who this is for
- You already use Codex Desktop with a signed-in ChatGPT account
- You want to reuse Codex App ASR from scripts, CLIs, agents, or local services
- You want a tiny input surface: a local audio file, or an explicit bearer token
- You want `.silk` / WeChat voice files decoded through your local `rust-silk` CLI
- You accept that this is a reverse-engineered local automation surface, not an official API
Install
Homebrew
cargo-binstall
cargo install
GitHub Release installer
|
PowerShell:
powershell -ExecutionPolicy ByPass -c "irm https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.ps1 | iex"
Direct binary download
Download the matching archive or installer from the latest GitHub Release.
Docker / GHCR
The main branch publishes latest, main, and sha-* tags through GitHub
Actions. Release tags additionally publish vX.Y.Z, X.Y.Z, and X.Y tags.
Docker images include rust-silk at /usr/local/bin/rust-silk by default.
From this checkout
Quick start
# Transcribe directly, reading $CODEX_HOME/auth.json or ~/.codex/auth.json
# Send a language hint upstream
# JSON output for scripts
# Extensionless input with explicit multipart content type
# Decode WeChat / SILK audio through an external rust-silk CLI first
Local REST / OpenAI SDK
Start a local Whisper-style REST endpoint:
Call it with an OpenAI-style multipart request:
OpenAI SDK examples live in examples/:
CODEX_ASR_SERVER_KEY=local_dev_key \
CODEX_ASR_SERVER_KEY=local_dev_key \
Routes:
| Route | Notes |
|---|---|
| `GET /healthz` | health check, no auth required |
| `POST /v1/audio/transcriptions` | OpenAI Whisper-style route |
| `POST /audio/transcriptions` | short alias |
Multipart fields:
| Field | Handling |
|---|---|
| `file` | required |
| `model` | accepted for SDK compatibility, ignored |
| `language` | forwarded to Codex `/transcribe` |
| `response_format` | supports `json`, `text`, `verbose_json` |
| `prompt`, `temperature`, `timestamp_granularities` | accepted for SDK compatibility, ignored |
srt and vtt return HTTP 400 because the Codex endpoint does not return
timestamps.
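The `response_format` rule above can be sketched as a small parser. This is a minimal illustration, not the crate's actual code: the `ResponseFormat` enum and `parse_response_format` function are hypothetical names, and two behaviours are assumptions rather than documented facts: a missing field falls back to `json`, and any unrecognised value is rejected with 400 in the same way as `srt` and `vtt`.

```rust
/// Response formats listed in the multipart field table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResponseFormat {
    Json,
    Text,
    VerboseJson,
}

/// Parse the `response_format` multipart field.
/// On failure, returns the HTTP status a wrapper could answer with.
pub fn parse_response_format(raw: Option<&str>) -> Result<ResponseFormat, u16> {
    match raw.map(str::trim) {
        // Assumption: an absent or empty field defaults to `json`.
        None | Some("") | Some("json") => Ok(ResponseFormat::Json),
        Some("text") => Ok(ResponseFormat::Text),
        Some("verbose_json") => Ok(ResponseFormat::VerboseJson),
        // `srt` and `vtt` need timestamps the upstream does not return.
        // Assumption: other unknown values are rejected the same way.
        Some(_) => Err(400),
    }
}
```

Rejecting the format before contacting the upstream means a request that cannot be satisfied never consumes a concurrency slot.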
Public Docker Deployment
This deployment shape is intentionally small: a public port, HTTPS, and an
application API key. deploy/compose.public.yml uses Caddy to obtain TLS
certificates on ports 80/443 and reverse proxy to codex-asr serve inside the
private Docker network. App auth is Authorization: Bearer <CODEX_ASR_SERVER_KEY>.
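For readers who want to see what an `Authorization: Bearer <CODEX_ASR_SERVER_KEY>` check involves, here is a minimal std-only sketch. The function names are hypothetical and this is not taken from the codex-asr source; the constant-time comparison is a general hardening choice, not something the project documents.

```rust
/// Compare two byte strings without stopping at the first mismatch,
/// so timing does not reveal how many leading bytes matched.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Check an `Authorization` header value against the configured server key.
/// Returns false for a missing header, a non-Bearer scheme, or an empty key.
pub fn is_authorized(header: Option<&str>, server_key: &str) -> bool {
    let Some(value) = header else { return false };
    let Some(token) = value.strip_prefix("Bearer ") else { return false };
    !server_key.is_empty()
        && constant_time_eq(token.trim().as_bytes(), server_key.as_bytes())
}
```

The server key checked here is separate from the ChatGPT bearer token that the service uses upstream; clients only ever present the former.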
Prerequisites:
- The domain A/AAAA record points at the server
- The server firewall allows ports 80 and 443
- A usable Codex / ChatGPT auth file exists on the server
- A container-readable auth copy has been prepared
Set at least these values in deploy/.env:
CODEX_ASR_DOMAIN=asr.example.com
CODEX_ASR_AUTH_FILE=/opt/codex-asr/auth.json
CODEX_ASR_SERVER_KEY=replace-with-a-long-random-api-key
CODEX_ASR_MAX_UPLOAD_MB=50
CODEX_ASR_REQUEST_TIMEOUT_SECONDS=300
Example requests:
Docker images include rust-silk by default, so .silk / WeChat voice files
are decoded to temporary WAV inside the container before upload. Override
CODEX_ASR_SILK_DECODER only when you want to use a custom decoder.
Commands
- `codex-asr <audio>`: same as `codex-asr transcribe <audio>`
- `transcribe`: upload one audio file and return text
- `serve`: run the local OpenAI Whisper-style REST shim
Common options:
| Option | Notes |
|---|---|
| `--bearer` | explicit ChatGPT bearer token; defaults to local Codex auth |
| `--account-id` | override `ChatGPT-Account-Id`; usually decoded from the bearer token |
| `--auth-file` | custom Codex auth file |
| `--endpoint` | override the upstream `/backend-api/transcribe` URL |
| `--proxy` | HTTPS proxy |
| `--request-timeout-seconds` | total upstream request timeout, default `300` |
| `--connect-timeout-seconds` | upstream connect timeout, default `15` |
| `--language` | language hint, for example `zh` or `en` |
| `--content-type` | audio content type override |
| `--filename` | multipart filename override |
| `--json` | print `{"text":"..."}` |
| `--silk-decoder` | external `rust-silk` CLI path |
| `--silk-sample-rate` | WAV sample rate for SILK decode, default `24000` |
| `--no-silk-decode` | upload `.silk` directly without local decoding |
serve also supports:
| Option | Notes |
|---|---|
| `--host` | bind host, default `127.0.0.1` |
| `--port` | bind port, default `8788` |
| `--api-key` | local REST API key |
| `--no-api-key` | disable local REST auth, only for trusted loopback |
| `--concurrency` | maximum concurrent upstream ASR requests, default `16` |
| `--max-upload-mb` | REST upload size limit, default `50` MiB |
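As a rough illustration of how a `--max-upload-mb` value translates into a byte limit, the sketch below converts MiB to bytes and rejects oversized bodies. The function names are hypothetical, and the 413 status is an assumption; the README does not state which status the server returns for an oversized upload.

```rust
/// Convert a `--max-upload-mb` style value into bytes (1 MiB = 1024 * 1024 bytes).
pub fn upload_limit_bytes(max_upload_mb: u64) -> u64 {
    max_upload_mb.saturating_mul(1024 * 1024)
}

/// Reject bodies above the limit before any upstream call is made.
/// Assumption: 413 Payload Too Large is used; the real status may differ.
pub fn check_upload_size(body_len: u64, max_upload_mb: u64) -> Result<(), u16> {
    if body_len > upload_limit_bytes(max_upload_mb) {
        Err(413)
    } else {
        Ok(())
    }
}
```

With the default of 50, the limit works out to 52,428,800 bytes.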
Auth model
By default, codex-asr reads:
- `$CODEX_HOME/auth.json`
- `~/.codex/auth.json`
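The lookup above can be sketched as a pure function. This is an illustration under assumptions, not the crate's implementation: `resolve_auth_file` is a hypothetical name, the sketch assumes `CODEX_HOME` takes precedence whenever it is set and non-empty, and it does not check whether the file actually exists.

```rust
use std::path::PathBuf;

/// Pick the auth file location from the two candidate directories.
/// `codex_home` mirrors the `CODEX_HOME` variable; `home_dir` is the
/// user's home directory.
pub fn resolve_auth_file(codex_home: Option<&str>, home_dir: Option<&str>) -> Option<PathBuf> {
    // Assumption: a non-empty CODEX_HOME wins over the home directory.
    if let Some(dir) = codex_home.filter(|d| !d.is_empty()) {
        return Some(PathBuf::from(dir).join("auth.json"));
    }
    home_dir
        .filter(|d| !d.is_empty())
        .map(|home| PathBuf::from(home).join(".codex").join("auth.json"))
}
```

A caller could feed it `std::env::var("CODEX_HOME").ok().as_deref()` and the platform's home directory (`HOME` on Unix, `USERPROFILE` on Windows). Taking the values as parameters keeps the function testable without touching the process environment.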
For the smallest explicit input surface, pass only a bearer token:
CODEX_ASR_BEARER=""
ChatGPT-Account-Id is usually decoded from the JWT payload. Override it only
when the token does not contain that claim:
Environment variables:
| Env | Purpose |
|---|---|
| `CODEX_ASR_BEARER` | default bearer token |
| `CODEX_ASR_SERVER_KEY` | local REST API key |
| `CODEX_ASR_PROXY` | HTTPS proxy |
| `CODEX_ASR_REQUEST_TIMEOUT_SECONDS` | total upstream request timeout, default `300` |
| `CODEX_ASR_CONNECT_TIMEOUT_SECONDS` | upstream connect timeout, default `15` |
| `CODEX_ASR_MAX_UPLOAD_MB` | REST upload size limit, default `50` MiB |
| `CODEX_ASR_SILK_DECODER` | external `rust-silk` CLI path; Docker defaults to `/usr/local/bin/rust-silk` |
What it changes
- Reads your local Codex / ChatGPT auth file, but does not write or refresh it
- `serve` rereads the auth file before each transcription, so an updated container auth copy takes effect without a restart
- Uploads the audio file you provide to `https://chatgpt.com/backend-api/transcribe`
- Decodes `.silk` / `.slk` inputs to temporary WAV files before upload by default
- The REST server only wraps requests, normalizes SDK fields, limits concurrency, and converts responses
- It does not persist transcripts unless your caller does so
Safety
codex-asr reuses a personal Codex / ChatGPT login token. Do not expose plain
HTTP or an unauthenticated REST server; public deployment should use at least
HTTPS and an application API key.
- REST binds to `127.0.0.1` by default
- The REST API key is local access control only; it is not an OpenAI or ChatGPT token
- Use `--no-api-key` only on trusted loopback
- `deploy/compose.public.yml` enables Caddy HTTPS and REST api-key auth by default
- This endpoint is reverse-engineered from Codex Desktop behavior and may change without notice
Audio formats
The Codex Desktop endpoint appears to inspect the actual audio container, not only the multipart content type.
Formats tested successfully when uploaded directly:
| Container / codec | Suggested content type |
|---|---|
| WAV PCM | audio/wav |
| MP3 | audio/mpeg |
| M4A or MP4 AAC | audio/mp4 |
| FLAC | audio/flac |
| Ogg Opus | audio/ogg |
| WebM Opus | audio/webm |
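The table above can be turned into a simple extension lookup. This is a sketch only: `guess_content_type` is a hypothetical helper, and the mapping from file extension to container is an assumption (an `.ogg` file, for example, is not guaranteed to hold Opus). Since the upstream appears to inspect the real container, a correct content type does not rescue an unsupported file.

```rust
use std::path::Path;

/// Suggest a multipart content type from the file extension,
/// following the directly uploadable formats in the table.
pub fn guess_content_type(path: &Path) -> Option<&'static str> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "wav" => Some("audio/wav"),
        "mp3" => Some("audio/mpeg"),
        "m4a" | "mp4" => Some("audio/mp4"),
        "flac" => Some("audio/flac"),
        "ogg" => Some("audio/ogg"),
        "webm" => Some("audio/webm"),
        // Unknown or missing extension: the caller should pass a
        // content type explicitly.
        _ => None,
    }
}
```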
Formats supported through local preprocessing:
| Input | Handling |
|---|---|
| SILK v3 (`#!SILK_V3`) | decoded to temporary WAV with `rust-silk` |
| WeChat/Tencent SILK (`0x02` + `#!SILK_V3`) | decoded to temporary WAV with `rust-silk` |
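The two headers in this table are easy to tell apart by their first bytes. The sketch below shows one way to sniff them; `SilkKind` and `detect_silk` are hypothetical names, and whether codex-asr detects SILK by header, by extension, or both is not stated here.

```rust
/// Magic string at the start of a SILK v3 stream.
const SILK_MAGIC: &[u8] = b"#!SILK_V3";

#[derive(Debug, PartialEq, Eq)]
pub enum SilkKind {
    /// Starts directly with `#!SILK_V3`.
    Standard,
    /// WeChat/Tencent variant: a 0x02 byte, then `#!SILK_V3`.
    Tencent,
}

/// Inspect the first bytes of a file and report which SILK header, if any, it has.
pub fn detect_silk(head: &[u8]) -> Option<SilkKind> {
    if head.starts_with(SILK_MAGIC) {
        Some(SilkKind::Standard)
    } else if head.first() == Some(&0x02) && head[1..].starts_with(SILK_MAGIC) {
        Some(SilkKind::Tencent)
    } else {
        None
    }
}
```

Reading the first ten bytes of the file is enough to cover both variants.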
Formats rejected by the upstream endpoint during local direct-upload tests:
| Format | Result |
|---|---|
| ADTS AAC (`.aac`) | HTTP 500, ASR API error |
| AIFF | HTTP 500, ASR API error |
| CAF AAC | HTTP 500, ASR API error |
| Raw PCM stream | HTTP 500, ASR API error |
| SILK v3 (`#!SILK_V3`) | HTTP 500, ASR API error |
| WeChat/Tencent SILK (`0x02` + `#!SILK_V3`) | HTTP 500, ASR API error |
Files with no recognizable audio extension should be uploaded with a known
--content-type; the CLI will add a matching multipart filename extension. You
can override it explicitly:
Known limits
- Empty files return HTTP 500 with `Error in ASR API`
- One second of silence returns HTTP 200 with an empty transcript
- Very short non-silent clips can return unstable text
- `prompt`, `temperature`, and `timestamp_granularities` are accepted only for SDK compatibility
- Local short-WAV probes reached 96 concurrent requests without 429/5xx, but that is not a public API contract
- Keep long-audio REST usage bounded; the default is `--concurrency 16`
- The REST server defaults to a `50` MiB upload limit and a `300` second upstream request timeout
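To make "bounded concurrency" concrete, here is a std-only sketch of a counting semaphore and a small harness that measures peak parallelism. It is an illustration of the general technique, not the server's implementation (which, per the feature notes, is built on Axum/Tokio); `Limiter`, `Permit`, and `run_bounded` are hypothetical names, and the sleep stands in for an upstream upload.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// A small counting semaphore built on `Mutex` + `Condvar`.
pub struct Limiter {
    free: Mutex<usize>,
    ready: Condvar,
}

/// Holds one slot; the slot is returned when the permit is dropped.
pub struct Permit<'a>(&'a Limiter);

impl Limiter {
    pub fn new(permits: usize) -> Self {
        Self { free: Mutex::new(permits), ready: Condvar::new() }
    }

    /// Block until a slot is free, then take it.
    pub fn acquire(&self) -> Permit<'_> {
        let mut free = self.free.lock().unwrap();
        while *free == 0 {
            free = self.ready.wait(free).unwrap();
        }
        *free -= 1;
        Permit(self)
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.0.free.lock().unwrap() += 1;
        self.0.ready.notify_one();
    }
}

/// Run `jobs` simulated uploads with at most `limit` in flight,
/// and return the highest number observed running at once.
pub fn run_bounded(limit: usize, jobs: usize) -> usize {
    let limiter = Limiter::new(limit);
    let active = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..jobs {
            s.spawn(|| {
                let _permit = limiter.acquire();
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(5)); // stand-in for an upload
                active.fetch_sub(1, Ordering::SeqCst);
            });
        }
    });
    peak.load(Ordering::SeqCst)
}
```

Calling `run_bounded(16, 96)` mirrors the numbers above: 96 queued requests, never more than 16 reaching the upstream at the same time.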
Rust library
Library consumers that do not need the local REST server can avoid Axum/Tokio:
codex-asr = { version = "0.1", default-features = false }
use ;
let client = from_codex_home?;
let result = client.transcribe_file?;
println!;
# Ok::
Platforms
- CLI and library support macOS, Linux, and Windows
- Prebuilt binaries are published for Apple Silicon macOS, Intel macOS, Linux x64, Linux ARM64, and Windows x64
- `serve` is enabled by default; library users can disable it with `default-features = false`
- SILK support depends on an external `rust-silk` CLI; regular binaries do not bundle a decoder, while Docker images include one by default
Run from source
Visual assets
- The top `logo.png` was generated with `gpt-image-2-skill`
- `assets/codex.svg` and `assets/codex-color.svg` are from the LobeHub Codex (OpenAI) icon
- `assets/codex-color-reference.png` is rendered from `assets/codex-color.svg` and was passed as the `--ref-image` when generating `logo.png`. Passing the SVG directly was rejected by the upstream image-editing API because it only accepts JPEG, PNG, GIF, and WebP reference images.