codex-asr 0.1.2

Unofficial Codex Desktop ASR client that reuses local ChatGPT auth for one-shot transcription
Documentation

中文

Codex.app 里有一个一次性语音输入接口:它把录音上传到 https://chatgpt.com/backend-api/transcribe,再拿回文本。这个接口不是 OpenAI Whisper API,也不是公开稳定 API,但它可以复用你已经登录在本机 Codex Desktop / ChatGPT 里的账号。

codex-asr 把这件事拆成一个 Rust CLI、一个可发布的 crate,以及一个本地 OpenAI Whisper 风格 REST 包装层。你可以直接转写音频文件,也可以让现有 OpenAI SDK 指向本机 codex-asr serve

适合谁用

  • 你已经在 Codex Desktop 里登录了 ChatGPT 账号
  • 你想在脚本、CLI、Agent 或本地服务里复用 Codex App 的 ASR 能力
  • 你只想提供最小输入面,例如一个本地音频文件,或显式传入一个 bearer token
  • 你想把 .silk / 微信语音先经本机 rust-silk 解码,再上传转写
  • 你接受这是逆向自 Codex Desktop 行为的本地自动化工具,而不是官方 API

安装

Homebrew

brew tap wangnov/tap
brew install codex-asr

cargo-binstall

cargo binstall codex-asr

cargo install

cargo install codex-asr

GitHub Release 安装脚本

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.sh \
  | sh

PowerShell:

powershell -ExecutionPolicy ByPass -c "irm https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.ps1 | iex"

直接下载二进制

最新 GitHub Release 下载对应平台的压缩包或安装脚本。

Docker / GHCR

docker pull ghcr.io/wangnov/codex-asr:latest
docker run --rm ghcr.io/wangnov/codex-asr:latest --version

main 分支会通过 GitHub Actions 发布 latestmainsha-* 标签;正式 release tag 会额外发布 vX.Y.ZX.Y.ZX.Y 标签。 Docker 镜像默认内置 rust-silk,路径是 /usr/local/bin/rust-silk

从当前源码安装

cargo install --path .

快速上手

# 直接转写,认证默认读取 $CODEX_HOME/auth.json 或 ~/.codex/auth.json
codex-asr audio.wav

# 给上游 ASR 一个语言提示
codex-asr audio.wav --language zh

# JSON 输出,方便脚本消费
codex-asr audio.wav --json

# 扩展名缺失时显式告诉 multipart content type
codex-asr raw-audio --content-type audio/wav

# 微信 / SILK 语音先通过外部 rust-silk CLI 解码成临时 WAV
codex-asr voice.silk --silk-decoder /path/to/rust-silk

本地 REST / OpenAI SDK

启动一个本地 Whisper 风格 REST 端点:

codex-asr serve --api-key local_dev_key --host 127.0.0.1 --port 8788 --concurrency 16

然后用 OpenAI 风格 multipart 请求调用:

curl http://127.0.0.1:8788/v1/audio/transcriptions \
  -H 'Authorization: Bearer local_dev_key' \
  -F model=whisper-1 \
  -F file=@audio.wav

OpenAI SDK 示例在 examples/

python3 -m pip install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  python3 examples/python_openai_sdk.py audio.wav

npm install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  node examples/node_openai_sdk.mjs audio.wav

REST 路由:

Route 说明
GET /healthz 健康检查,不需要认证
POST /v1/audio/transcriptions OpenAI Whisper 风格路由
POST /audio/transcriptions 短别名

REST 字段兼容范围:

Field 处理方式
file 必填
model 为 SDK 兼容而接受,忽略
language 转发给 Codex /transcribe
response_format 支持 jsontextverbose_json
prompttemperaturetimestamp_granularities 为 SDK 兼容而接受,忽略

srtvtt 会返回 HTTP 400,因为 Codex 这个端点不返回时间戳。

Docker 公网部署

这个部署形态只承诺三件事:公网端口、HTTPS、api-key。deploy/compose.public.yml 用 Caddy 自动申请证书,把 80/443 暴露到公网,再反代到容器里的 codex-asr serve。应用层鉴权使用 Authorization: Bearer <CODEX_ASR_SERVER_KEY>

前置条件:

  • 域名 A/AAAA 记录已经指向服务器
  • 服务器公网防火墙放行 80 和 443
  • 服务器上已有可用的 Codex / ChatGPT auth 文件
  • 已准备一个容器用户可读的 auth 副本
sudo install -d -m 750 /opt/codex-asr
sudo install -o 10001 -g 65534 -m 0400 ~/.codex/auth.json /opt/codex-asr/auth.json
cp deploy/env.public.example deploy/.env
$EDITOR deploy/.env
docker compose --env-file deploy/.env -f deploy/compose.public.yml up -d

deploy/.env 里至少要设置:

CODEX_ASR_DOMAIN=asr.example.com
CODEX_ASR_AUTH_FILE=/opt/codex-asr/auth.json
CODEX_ASR_SERVER_KEY=replace-with-a-long-random-api-key
CODEX_ASR_MAX_UPLOAD_MB=50
CODEX_ASR_REQUEST_TIMEOUT_SECONDS=300

调用示例:

curl https://asr.example.com/healthz

curl https://asr.example.com/v1/audio/transcriptions \
  -H "Authorization: Bearer replace-with-a-long-random-api-key" \
  -F model=whisper-1 \
  -F file=@audio.wav

Docker 镜像默认内置 rust-silk,所以 .silk / 微信语音会在容器内先解码成 临时 WAV 再上传。只有要替换成自定义解码器时,才需要覆盖 CODEX_ASR_SILK_DECODER

命令说明

  • codex-asr <audio>:默认等同于 codex-asr transcribe <audio>
  • transcribe:上传一个音频文件并返回文本
  • serve:启动本地 OpenAI Whisper 风格 REST 包装层

常用选项:

Option 说明
--bearer 显式传入 ChatGPT bearer token;默认读取本机 Codex auth
--account-id 覆盖 ChatGPT-Account-Id;通常会从 bearer token 自动解析
--auth-file 指定 Codex auth 文件
--endpoint 覆盖上游 /backend-api/transcribe URL
--proxy 指定 HTTPS 代理
--request-timeout-seconds 上游请求总超时,默认 300
--connect-timeout-seconds 上游连接超时,默认 15
--language 语言提示,例如 zhen
--content-type 覆盖音频 content type
--filename 覆盖 multipart 文件名
--json 输出 {"text":"..."}
--silk-decoder 指定外部 rust-silk CLI
--silk-sample-rate SILK 解码成 WAV 时的采样率,默认 24000
--no-silk-decode 直接上传 .silk,不做本地解码

serve 额外支持:

Option 说明
--host 绑定地址,默认 127.0.0.1
--port 绑定端口,默认 8788
--api-key 本地 REST API key
--no-api-key 关闭本地 REST 认证,仅限可信 loopback
--concurrency 上游 ASR 并发上限,默认 16
--max-upload-mb REST 上传体积上限,默认 50 MiB

认证模型

默认情况下,codex-asr 会读取:

  1. $CODEX_HOME/auth.json
  2. ~/.codex/auth.json

如果你想把输入面压到最小,可以只传 bearer:

codex-asr audio.wav --bearer "$TOKEN"
CODEX_ASR_BEARER="$TOKEN" codex-asr audio.wav

ChatGPT-Account-Id 通常会从 bearer token 的 JWT payload 中自动解析。只有当 token 里没有这个 claim 时,才需要手动传:

codex-asr audio.wav --bearer "$TOKEN" --account-id acct_...

可用环境变量:

Env 用途
CODEX_ASR_BEARER 默认 bearer token
CODEX_ASR_SERVER_KEY 本地 REST API key
CODEX_ASR_PROXY HTTPS 代理
CODEX_ASR_REQUEST_TIMEOUT_SECONDS 上游请求总超时,默认 300
CODEX_ASR_CONNECT_TIMEOUT_SECONDS 上游连接超时,默认 15
CODEX_ASR_MAX_UPLOAD_MB REST 上传体积上限,默认 50 MiB
CODEX_ASR_SILK_DECODER 外部 rust-silk CLI 路径;Docker 镜像默认 /usr/local/bin/rust-silk

它会改什么

  • 读取本机 Codex / ChatGPT auth 文件,但不会写入或刷新它
  • serve 会在每次转写前重新读取 auth 文件,以便容器中的 auth 副本更新后无需重启服务
  • 上传你指定的音频文件到 https://chatgpt.com/backend-api/transcribe
  • .silk / .slk 输入默认先在本地解码成临时 WAV,再上传 WAV
  • REST server 只做本地包装、字段兼容、并发限制和响应格式转换
  • 不会保存转写结果,除非你的调用方自己保存

安全边界

codex-asr 会复用你的个人 Codex / ChatGPT 登录 token。不要裸露 HTTP 或无 api-key 的 REST server;公网部署至少使用 HTTPS 和应用层 api-key。

  • REST 默认绑定 127.0.0.1
  • 本地 REST key 只是本地访问控制,不是 OpenAI 或 ChatGPT token
  • 只有在可信本机回环环境里才使用 --no-api-key
  • deploy/compose.public.yml 的公网形态默认启用 Caddy HTTPS 和 REST api-key
  • 这个端点是从 Codex Desktop 行为逆向出来的,可能随 Codex App 更新而变化

音频格式

Codex Desktop 的上游端点会检查真实音频容器,不只看 multipart content type。

本地测试中可直接上传的格式:

Container / codec 建议 content type
WAV PCM audio/wav
MP3 audio/mpeg
M4A 或 MP4 AAC audio/mp4
FLAC audio/flac
Ogg Opus audio/ogg
WebM Opus audio/webm

通过本地预处理支持的格式:

Input 处理方式
SILK v3 (#!SILK_V3) 用外部 rust-silk 解码成临时 WAV
WeChat/Tencent SILK (0x02 + #!SILK_V3) 用外部 rust-silk 解码成临时 WAV

本地测试中直接上传会被上游拒绝的格式:

Format 结果
ADTS AAC (.aac) HTTP 500, ASR API error
AIFF HTTP 500, ASR API error
CAF AAC HTTP 500, ASR API error
Raw PCM stream HTTP 500, ASR API error
SILK v3 (#!SILK_V3) HTTP 500, ASR API error
WeChat/Tencent SILK (0x02 + #!SILK_V3) HTTP 500, ASR API error

如果文件名没有可识别音频扩展名,请传 --content-type;CLI 会补一个合适的 multipart 文件名。也可以显式传:

codex-asr raw-audio --content-type audio/wav --filename voice.wav

已知边界

  • 空文件会返回 HTTP 500 Error in ASR API
  • 一秒静音 WAV 会返回 HTTP 200 和空文本
  • 非常短的非静音片段可能返回不稳定文本
  • prompttemperaturetimestamp_granularities 只为 SDK 兼容而接受,不会影响上游
  • 本地短 WAV 并发测试到 96 个请求没有出现 429/5xx,但这不是公开 API 承诺
  • 建议 REST wrapper 对长音频保持有界并发,默认 --concurrency 16
  • REST server 默认限制上传体积为 50 MiB,上游请求总超时为 300

Rust library

不需要 REST server 的库用户可以关闭默认特性,避免引入 Axum/Tokio:

codex-asr = { version = "0.1", default-features = false }
use codex_asr::{CodexAsrClient, TranscribeOptions};

let client = CodexAsrClient::from_codex_home()?;
let result = client.transcribe_file("audio.wav", TranscribeOptions::default())?;
println!("{}", result.text);
# Ok::<(), Box<dyn std::error::Error>>(())

平台支持

  • CLI 和 library 支持 macOS、Linux、Windows
  • Release 二进制覆盖 Apple Silicon macOS、Intel macOS、Linux x64、Linux ARM64、Windows x64
  • serve 功能默认开启;作为库使用时可通过 default-features = false 关闭
  • SILK 支持依赖外部 rust-silk CLI;普通二进制不内置解码器,Docker 镜像默认内置

从源码运行

cargo run -- --help
cargo run -- audio.wav --json
cargo run -- serve --api-key local_dev_key

视觉资产

  • 顶部 logo.pnggpt-image-2-skill 生成
  • assets/codex.svgassets/codex-color.svg 来自 LobeHub Codex (OpenAI) icon
  • assets/codex-color-reference.png 是从 assets/codex-color.svg 渲染出的参考图; 生成 logo.png 时作为 --ref-image 传入。直接上传 SVG 会被上游图片编辑 接口拒绝,因为它只接受 JPEG、PNG、GIF、WebP 参考图。

English

Codex.app has a one-shot dictation endpoint: it uploads recorded audio to https://chatgpt.com/backend-api/transcribe and receives text back. This is not the OpenAI Whisper API and it is not a public stable API, but it can reuse the ChatGPT account that is already signed in on your local Codex Desktop app.

codex-asr packages that behavior as a Rust CLI, a publishable crate, and a local OpenAI Whisper-style REST shim. You can transcribe files directly, or point existing OpenAI SDK clients at codex-asr serve on localhost.

Who this is for

  • You already use Codex Desktop with a signed-in ChatGPT account
  • You want to reuse Codex App ASR from scripts, CLIs, agents, or local services
  • You want a tiny input surface: a local audio file, or an explicit bearer token
  • You want .silk / WeChat voice files decoded through your local rust-silk CLI
  • You accept that this is a reverse-engineered local automation surface, not an official API

Install

Homebrew

brew tap wangnov/tap
brew install codex-asr

cargo-binstall

cargo binstall codex-asr

cargo install

cargo install codex-asr

GitHub Release installer

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.sh \
  | sh

PowerShell:

powershell -ExecutionPolicy ByPass -c "irm https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.ps1 | iex"

Direct binary download

Download the matching archive or installer from the latest GitHub Release.

Docker / GHCR

docker pull ghcr.io/wangnov/codex-asr:latest
docker run --rm ghcr.io/wangnov/codex-asr:latest --version

The main branch publishes latest, main, and sha-* tags through GitHub Actions. Release tags additionally publish vX.Y.Z, X.Y.Z, and X.Y tags. Docker images include rust-silk at /usr/local/bin/rust-silk by default.

From this checkout

cargo install --path .

Quick start

# Transcribe directly, reading $CODEX_HOME/auth.json or ~/.codex/auth.json
codex-asr audio.wav

# Send a language hint upstream
codex-asr audio.wav --language zh

# JSON output for scripts
codex-asr audio.wav --json

# Extensionless input with explicit multipart content type
codex-asr raw-audio --content-type audio/wav

# Decode WeChat / SILK audio through an external rust-silk CLI first
codex-asr voice.silk --silk-decoder /path/to/rust-silk

Local REST / OpenAI SDK

Start a local Whisper-style REST endpoint:

codex-asr serve --api-key local_dev_key --host 127.0.0.1 --port 8788 --concurrency 16

Call it with an OpenAI-style multipart request:

curl http://127.0.0.1:8788/v1/audio/transcriptions \
  -H 'Authorization: Bearer local_dev_key' \
  -F model=whisper-1 \
  -F file=@audio.wav

OpenAI SDK examples live in examples/:

python3 -m pip install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  python3 examples/python_openai_sdk.py audio.wav

npm install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  node examples/node_openai_sdk.mjs audio.wav

Routes:

Route Notes
GET /healthz health check, no auth required
POST /v1/audio/transcriptions OpenAI Whisper-style route
POST /audio/transcriptions short alias

Multipart fields:

Field Handling
file required
model accepted for SDK compatibility, ignored
language forwarded to Codex /transcribe
response_format supports json, text, verbose_json
prompt, temperature, timestamp_granularities accepted for SDK compatibility, ignored

srt and vtt return HTTP 400 because the Codex endpoint does not return timestamps.

Public Docker Deployment

This deployment shape is intentionally small: a public port, HTTPS, and an application API key. deploy/compose.public.yml uses Caddy to obtain TLS certificates on ports 80/443 and reverse proxy to codex-asr serve inside the private Docker network. App auth is Authorization: Bearer <CODEX_ASR_SERVER_KEY>.

Prerequisites:

  • The domain A/AAAA record points at the server
  • The server firewall allows ports 80 and 443
  • A usable Codex / ChatGPT auth file exists on the server
  • A container-readable auth copy has been prepared
sudo install -d -m 750 /opt/codex-asr
sudo install -o 10001 -g 65534 -m 0400 ~/.codex/auth.json /opt/codex-asr/auth.json
cp deploy/env.public.example deploy/.env
$EDITOR deploy/.env
docker compose --env-file deploy/.env -f deploy/compose.public.yml up -d

Set at least these values in deploy/.env:

CODEX_ASR_DOMAIN=asr.example.com
CODEX_ASR_AUTH_FILE=/opt/codex-asr/auth.json
CODEX_ASR_SERVER_KEY=replace-with-a-long-random-api-key
CODEX_ASR_MAX_UPLOAD_MB=50
CODEX_ASR_REQUEST_TIMEOUT_SECONDS=300

Example requests:

curl https://asr.example.com/healthz

curl https://asr.example.com/v1/audio/transcriptions \
  -H "Authorization: Bearer replace-with-a-long-random-api-key" \
  -F model=whisper-1 \
  -F file=@audio.wav

Docker images include rust-silk by default, so .silk / WeChat voice files are decoded to temporary WAV inside the container before upload. Override CODEX_ASR_SILK_DECODER only when you want to use a custom decoder.

Commands

  • codex-asr <audio>: same as codex-asr transcribe <audio>
  • transcribe: upload one audio file and return text
  • serve: run the local OpenAI Whisper-style REST shim

Common options:

Option Notes
--bearer explicit ChatGPT bearer token; defaults to local Codex auth
--account-id override ChatGPT-Account-Id; usually decoded from the bearer token
--auth-file custom Codex auth file
--endpoint override the upstream /backend-api/transcribe URL
--proxy HTTPS proxy
--request-timeout-seconds total upstream request timeout, default 300
--connect-timeout-seconds upstream connect timeout, default 15
--language language hint, for example zh or en
--content-type audio content type override
--filename multipart filename override
--json print {"text":"..."}
--silk-decoder external rust-silk CLI path
--silk-sample-rate WAV sample rate for SILK decode, default 24000
--no-silk-decode upload .silk directly without local decoding

serve also supports:

Option Notes
--host bind host, default 127.0.0.1
--port bind port, default 8788
--api-key local REST API key
--no-api-key disable local REST auth, only for trusted loopback
--concurrency maximum concurrent upstream ASR requests, default 16
--max-upload-mb REST upload size limit, default 50 MiB

Auth model

By default, codex-asr reads:

  1. $CODEX_HOME/auth.json
  2. ~/.codex/auth.json

For the smallest explicit input surface, pass only a bearer token:

codex-asr audio.wav --bearer "$TOKEN"
CODEX_ASR_BEARER="$TOKEN" codex-asr audio.wav

ChatGPT-Account-Id is usually decoded from the JWT payload. Override it only when the token does not contain that claim:

codex-asr audio.wav --bearer "$TOKEN" --account-id acct_...

Environment variables:

Env Purpose
CODEX_ASR_BEARER default bearer token
CODEX_ASR_SERVER_KEY local REST API key
CODEX_ASR_PROXY HTTPS proxy
CODEX_ASR_REQUEST_TIMEOUT_SECONDS total upstream request timeout, default 300
CODEX_ASR_CONNECT_TIMEOUT_SECONDS upstream connect timeout, default 15
CODEX_ASR_MAX_UPLOAD_MB REST upload size limit, default 50 MiB
CODEX_ASR_SILK_DECODER external rust-silk CLI path; Docker defaults to /usr/local/bin/rust-silk

What it changes

  • Reads your local Codex / ChatGPT auth file, but does not write or refresh it
  • serve rereads the auth file before each transcription so an updated container auth copy takes effect without restarting
  • Uploads the audio file you provide to https://chatgpt.com/backend-api/transcribe
  • Decodes .silk / .slk inputs to temporary WAV files before upload by default
  • The REST server only wraps requests, normalizes SDK fields, limits concurrency, and converts responses
  • It does not persist transcripts unless your caller does so

Safety

codex-asr reuses a personal Codex / ChatGPT login token. Do not expose plain HTTP or an unauthenticated REST server; public deployment should use at least HTTPS and an application API key.

  • REST binds to 127.0.0.1 by default
  • The REST API key is local access control only; it is not an OpenAI or ChatGPT token
  • Use --no-api-key only on trusted loopback
  • deploy/compose.public.yml enables Caddy HTTPS and REST api-key auth by default
  • This endpoint is reverse-engineered from Codex Desktop behavior and may change without notice

Audio formats

The Codex Desktop endpoint appears to inspect the actual audio container, not only the multipart content type.

Formats tested successfully when uploaded directly:

Container / codec Suggested content type
WAV PCM audio/wav
MP3 audio/mpeg
M4A or MP4 AAC audio/mp4
FLAC audio/flac
Ogg Opus audio/ogg
WebM Opus audio/webm

Formats supported through local preprocessing:

Input Handling
SILK v3 (#!SILK_V3) decoded to temporary WAV with rust-silk
WeChat/Tencent SILK (0x02 + #!SILK_V3) decoded to temporary WAV with rust-silk

Formats rejected by the upstream endpoint during local direct-upload tests:

Format Result
ADTS AAC (.aac) HTTP 500, ASR API error
AIFF HTTP 500, ASR API error
CAF AAC HTTP 500, ASR API error
Raw PCM stream HTTP 500, ASR API error
SILK v3 (#!SILK_V3) HTTP 500, ASR API error
WeChat/Tencent SILK (0x02 + #!SILK_V3) HTTP 500, ASR API error

Files with no recognizable audio extension should be uploaded with a known --content-type; the CLI will add a matching multipart filename extension. You can override it explicitly:

codex-asr raw-audio --content-type audio/wav --filename voice.wav

Known limits

  • Empty files return HTTP 500 with Error in ASR API
  • One second of silence returns HTTP 200 with an empty transcript
  • Very short non-silent clips can return unstable text
  • prompt, temperature, and timestamp_granularities are accepted only for SDK compatibility
  • Local short-WAV probes reached 96 concurrent requests without 429/5xx, but that is not a public API contract
  • Keep long-audio REST usage bounded; the default is --concurrency 16
  • The REST server defaults to a 50 MiB upload limit and a 300 second upstream request timeout

Rust library

Library consumers that do not need the local REST server can avoid Axum/Tokio:

codex-asr = { version = "0.1", default-features = false }
use codex_asr::{CodexAsrClient, TranscribeOptions};

let client = CodexAsrClient::from_codex_home()?;
let result = client.transcribe_file("audio.wav", TranscribeOptions::default())?;
println!("{}", result.text);
# Ok::<(), Box<dyn std::error::Error>>(())

Platforms

  • CLI and library support macOS, Linux, and Windows
  • Prebuilt binaries are published for Apple Silicon macOS, Intel macOS, Linux x64, Linux ARM64, and Windows x64
  • serve is enabled by default; library users can disable it with default-features = false
  • SILK support depends on an external rust-silk CLI; regular binaries do not bundle a decoder, while Docker images include one by default

Run from source

cargo run -- --help
cargo run -- audio.wav --json
cargo run -- serve --api-key local_dev_key

Visual assets

  • The top logo.png was generated with gpt-image-2-skill
  • assets/codex.svg and assets/codex-color.svg are from the LobeHub Codex (OpenAI) icon
  • assets/codex-color-reference.png is rendered from assets/codex-color.svg and was passed as the --ref-image when generating logo.png. Passing the SVG directly was rejected by the upstream image-editing API because it only accepts JPEG, PNG, GIF, and WebP reference images.