<p align="center">
<img src="./logo.png" width="220" alt="codex-asr logo">
</p>
<h1 align="center">codex-asr</h1>
<p align="center">
Reuse your local Codex Desktop ChatGPT login for one-shot ASR.
</p>
<p align="center">
<a href="https://github.com/Wangnov/codex-asr/releases/latest"><img src="https://img.shields.io/github/v/release/Wangnov/codex-asr?logo=github" alt="Latest release"></a>
<a href="https://crates.io/crates/codex-asr"><img src="https://img.shields.io/crates/v/codex-asr?logo=rust" alt="Crates.io"></a>
<a href="https://docs.rs/codex-asr"><img src="https://img.shields.io/docsrs/codex-asr?logo=docs.rs" alt="docs.rs"></a>
<a href="https://github.com/Wangnov/codex-asr/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/Wangnov/codex-asr/ci.yml?branch=main&label=ci" alt="CI"></a>
<a href="https://github.com/Wangnov/codex-asr/actions/workflows/docker.yml"><img src="https://img.shields.io/github/actions/workflow/status/Wangnov/codex-asr/docker.yml?branch=main&label=docker" alt="Docker"></a>
<a href="https://github.com/Wangnov/codex-asr/actions/workflows/release.yml"><img src="https://img.shields.io/badge/release-cargo--dist-2ea44f" alt="cargo-dist release automation"></a>
<a href="https://github.com/Wangnov/codex-asr/blob/main/Cargo.toml"><img src="https://img.shields.io/badge/license-MIT-2ea44f" alt="MIT license"></a>
</p>
<p align="center">
<a href="#readme-cn">中文</a> · <a href="#readme-en">English</a>
</p>
<p align="center">
crates.io · cargo-binstall · Homebrew · Docker / GHCR · Rust library · local REST
</p>
---
<a id="readme-cn"></a>
# 中文
Codex.app 里有一个一次性语音输入接口:它把录音上传到
`https://chatgpt.com/backend-api/transcribe`,再拿回文本。这个接口不是
OpenAI Whisper API,也不是公开稳定 API,但它可以复用你已经登录在本机
Codex Desktop / ChatGPT 里的账号。
`codex-asr` 把这件事拆成一个 Rust CLI、一个可发布的 crate,以及一个本地
OpenAI Whisper 风格 REST 包装层。你可以直接转写音频文件,也可以让现有
OpenAI SDK 指向本机 `codex-asr serve`。
## 适合谁用
- 你已经在 Codex Desktop 里登录了 ChatGPT 账号
- 你想在脚本、CLI、Agent 或本地服务里复用 Codex App 的 ASR 能力
- 你只想提供最小输入面,例如一个本地音频文件,或显式传入一个 bearer token
- 你想把 `.silk` / 微信语音先经本机 `rust-silk` 解码,再上传转写
- 你接受这是逆向自 Codex Desktop 行为的本地自动化工具,而不是官方 API
## 安装
### Homebrew
```bash
brew tap wangnov/tap
brew install codex-asr
```
### cargo-binstall
```bash
cargo binstall codex-asr
```
### cargo install
```bash
cargo install codex-asr
```
### GitHub Release 安装脚本
```bash
curl --proto '=https' --tlsv1.2 -LsSf \
https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.sh \
| sh
```
PowerShell(cargo-dist 安装脚本的 Windows 形式,`.ps1` 文件名与上面的 `.sh` 同模式):
```powershell
powershell -ExecutionPolicy Bypass -c "irm https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.ps1 | iex"
```
### 直接下载二进制
从 [最新 GitHub Release](https://github.com/Wangnov/codex-asr/releases/latest)
下载对应平台的压缩包或安装脚本。
### Docker / GHCR
```bash
docker pull ghcr.io/wangnov/codex-asr:latest
docker run --rm ghcr.io/wangnov/codex-asr:latest --version
```
`main` 分支会通过 GitHub Actions 发布 `latest`、`main` 和 `sha-*`
标签;正式 release tag 会额外发布 `vX.Y.Z`、`X.Y.Z` 和 `X.Y` 标签。
Docker 镜像默认内置 `rust-silk`,路径是 `/usr/local/bin/rust-silk`。
### 从当前源码安装
```bash
cargo install --path .
```
## 快速上手
```bash
# 直接转写,认证默认读取 $CODEX_HOME/auth.json 或 ~/.codex/auth.json
codex-asr audio.wav
# 给上游 ASR 一个语言提示
codex-asr audio.wav --language zh
# JSON 输出,方便脚本消费
codex-asr audio.wav --json
# 扩展名缺失时显式告诉 multipart content type
codex-asr raw-audio --content-type audio/wav
# 微信 / SILK 语音先通过外部 rust-silk CLI 解码成临时 WAV
codex-asr voice.silk --silk-decoder /path/to/rust-silk
```
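脚本侧消费 `--json` 输出时,核心只是解析 `{"text":"..."}` 这一层结构。下面是一个 Python 示意(假设 `codex-asr` 已在 PATH 中、本机 Codex auth 可用):

```python
import json
import subprocess

def parse_transcript(raw: str) -> str:
    """解析 `codex-asr --json` 输出的 {"text": "..."} 结构。"""
    return json.loads(raw)["text"]

if __name__ == "__main__":
    # 假设:codex-asr 在 PATH 中,且默认 auth 文件存在
    out = subprocess.run(
        ["codex-asr", "audio.wav", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(parse_transcript(out))
```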
## 本地 REST / OpenAI SDK
启动一个本地 Whisper 风格 REST 端点:
```bash
codex-asr serve --api-key local_dev_key --host 127.0.0.1 --port 8788 --concurrency 16
```
然后用 OpenAI 风格 multipart 请求调用:
```bash
curl http://127.0.0.1:8788/v1/audio/transcriptions \
-H 'Authorization: Bearer local_dev_key' \
-F model=whisper-1 \
-F file=@audio.wav
```
OpenAI SDK 示例在 `examples/`:
```bash
python3 -m pip install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
python3 examples/python_openai_sdk.py audio.wav
npm install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
node examples/node_openai_sdk.mjs audio.wav
```
REST 路由:
| 路由 | 说明 |
| --- | --- |
| `GET /healthz` | 健康检查,不需要认证 |
| `POST /v1/audio/transcriptions` | OpenAI Whisper 风格路由 |
| `POST /audio/transcriptions` | 短别名 |
REST 字段兼容范围:
| 字段 | 行为 |
| --- | --- |
| `file` | 必填 |
| `model` | 为 SDK 兼容而接受,忽略 |
| `language` | 转发给 Codex `/transcribe` |
| `response_format` | 支持 `json`、`text`、`verbose_json` |
| `prompt`、`temperature`、`timestamp_granularities` | 为 SDK 兼容而接受,忽略 |
`srt` 和 `vtt` 会返回 HTTP 400,因为 Codex 这个端点不返回时间戳。
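这套 `response_format` 分发逻辑可以用下面的 Python 草图来理解(仅为说明行为的示意,并非 codex-asr 实际实现;`verbose_json` 的具体字段是假设):

```python
def render_response(text: str, response_format: str = "json"):
    """按 OpenAI Whisper 风格渲染转写结果,返回 (状态码, 响应体)。"""
    if response_format == "text":
        return 200, text
    if response_format == "json":
        return 200, {"text": text}
    if response_format == "verbose_json":
        # Codex /transcribe 不返回时间戳,segments 只能为空(字段为假设)
        return 200, {"task": "transcribe", "text": text, "segments": []}
    if response_format in ("srt", "vtt"):
        # 没有时间戳就无法生成字幕,直接 400
        return 400, {"error": f"{response_format} requires timestamps"}
    return 400, {"error": f"unknown response_format: {response_format}"}
```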
## Docker 公网部署
这个部署形态只承诺三件事:公网端口、HTTPS、api-key。`deploy/compose.public.yml`
用 Caddy 自动申请证书,把 80/443 暴露到公网,再反代到容器里的
`codex-asr serve`。应用层鉴权使用 `Authorization: Bearer <CODEX_ASR_SERVER_KEY>`。
前置条件:
- 域名 A/AAAA 记录已经指向服务器
- 服务器公网防火墙放行 80 和 443
- 服务器上已有可用的 Codex / ChatGPT auth 文件
- 已准备一个容器用户可读的 auth 副本
```bash
sudo install -d -m 750 /opt/codex-asr
sudo install -o 10001 -g 65534 -m 0400 ~/.codex/auth.json /opt/codex-asr/auth.json
cp deploy/env.public.example deploy/.env
$EDITOR deploy/.env
docker compose --env-file deploy/.env -f deploy/compose.public.yml up -d
```
`deploy/.env` 里至少要设置:
```bash
CODEX_ASR_DOMAIN=asr.example.com
CODEX_ASR_AUTH_FILE=/opt/codex-asr/auth.json
CODEX_ASR_SERVER_KEY=replace-with-a-long-random-api-key
CODEX_ASR_MAX_UPLOAD_MB=50
CODEX_ASR_REQUEST_TIMEOUT_SECONDS=300
```
调用示例:
```bash
curl https://asr.example.com/healthz
curl https://asr.example.com/v1/audio/transcriptions \
-H "Authorization: Bearer replace-with-a-long-random-api-key" \
-F model=whisper-1 \
-F file=@audio.wav
```
Docker 镜像默认内置 `rust-silk`,所以 `.silk` / 微信语音会在容器内先解码成
临时 WAV 再上传。只有要替换成自定义解码器时,才需要覆盖
`CODEX_ASR_SILK_DECODER`。
## 命令说明
- `codex-asr <audio>`:默认等同于 `codex-asr transcribe <audio>`
- `transcribe`:上传一个音频文件并返回文本
- `serve`:启动本地 OpenAI Whisper 风格 REST 包装层
常用选项:
| 选项 | 说明 |
| --- | --- |
| `--bearer` | 显式传入 ChatGPT bearer token;默认读取本机 Codex auth |
| `--account-id` | 覆盖 `ChatGPT-Account-Id`;通常会从 bearer token 自动解析 |
| `--auth-file` | 指定 Codex auth 文件 |
| `--endpoint` | 覆盖上游 `/backend-api/transcribe` URL |
| `--proxy` | 指定 HTTPS 代理 |
| `--request-timeout-seconds` | 上游请求总超时,默认 `300` |
| `--connect-timeout-seconds` | 上游连接超时,默认 `15` |
| `--language` | 语言提示,例如 `zh` 或 `en` |
| `--content-type` | 覆盖音频 content type |
| `--filename` | 覆盖 multipart 文件名 |
| `--json` | 输出 `{"text":"..."}` |
| `--silk-decoder` | 指定外部 `rust-silk` CLI |
| `--silk-sample-rate` | SILK 解码成 WAV 时的采样率,默认 `24000` |
| `--no-silk-decode` | 直接上传 `.silk`,不做本地解码 |
`serve` 额外支持:
| 选项 | 说明 |
| --- | --- |
| `--host` | 绑定地址,默认 `127.0.0.1` |
| `--port` | 绑定端口,默认 `8788` |
| `--api-key` | 本地 REST API key |
| `--no-api-key` | 关闭本地 REST 认证,仅限可信 loopback |
| `--concurrency` | 上游 ASR 并发上限,默认 `16` |
| `--max-upload-mb` | REST 上传体积上限,默认 `50` MiB |
## 认证模型
默认情况下,`codex-asr` 会读取:
1. `$CODEX_HOME/auth.json`
2. `~/.codex/auth.json`
如果你想把输入面压到最小,可以只传 bearer:
```bash
codex-asr audio.wav --bearer "$TOKEN"
CODEX_ASR_BEARER="$TOKEN" codex-asr audio.wav
```
`ChatGPT-Account-Id` 通常会从 bearer token 的 JWT payload 中自动解析。只有当
token 里没有这个 claim 时,才需要手动传:
```bash
codex-asr audio.wav --bearer "$TOKEN" --account-id acct_...
```
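自动解析的原理就是解码 JWT 的 payload 段(不验证签名)。下面是一个 Python 示意;具体读取哪个 claim 由 codex-asr 内部决定,这里只演示解码本身:

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """解码 JWT 中间的 payload 段(不验签),用于查看账号相关 claim。"""
    payload_b64 = token.split(".")[1]
    # base64url 解码前补齐 '=' padding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```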
可用环境变量:
| 环境变量 | 说明 |
| --- | --- |
| `CODEX_ASR_BEARER` | 默认 bearer token |
| `CODEX_ASR_SERVER_KEY` | 本地 REST API key |
| `CODEX_ASR_PROXY` | HTTPS 代理 |
| `CODEX_ASR_REQUEST_TIMEOUT_SECONDS` | 上游请求总超时,默认 `300` |
| `CODEX_ASR_CONNECT_TIMEOUT_SECONDS` | 上游连接超时,默认 `15` |
| `CODEX_ASR_MAX_UPLOAD_MB` | REST 上传体积上限,默认 `50` MiB |
| `CODEX_ASR_SILK_DECODER` | 外部 `rust-silk` CLI 路径;Docker 镜像默认 `/usr/local/bin/rust-silk` |
## 它会改什么
- 读取本机 Codex / ChatGPT auth 文件,但不会写入或刷新它
- `serve` 会在每次转写前重新读取 auth 文件,以便容器中的 auth 副本更新后无需重启服务
- 上传你指定的音频文件到 `https://chatgpt.com/backend-api/transcribe`
- `.silk` / `.slk` 输入默认先在本地解码成临时 WAV,再上传 WAV
- REST server 只做本地包装、字段兼容、并发限制和响应格式转换
- 不会保存转写结果,除非你的调用方自己保存
## 安全边界
`codex-asr` 会复用你的个人 Codex / ChatGPT 登录 token。不要裸露 HTTP 或无
api-key 的 REST server;公网部署至少使用 HTTPS 和应用层 api-key。
- REST 默认绑定 `127.0.0.1`
- 本地 REST key 只是本地访问控制,不是 OpenAI 或 ChatGPT token
- 只有在可信本机回环环境里才使用 `--no-api-key`
- `deploy/compose.public.yml` 的公网形态默认启用 Caddy HTTPS 和 REST api-key
- 这个端点是从 Codex Desktop 行为逆向出来的,可能随 Codex App 更新而变化
## 音频格式
Codex Desktop 的上游端点会检查真实音频容器,不只看 multipart content type。
本地测试中可直接上传的格式:
| 格式 | Content type |
| --- | --- |
| WAV PCM | `audio/wav` |
| MP3 | `audio/mpeg` |
| M4A 或 MP4 AAC | `audio/mp4` |
| FLAC | `audio/flac` |
| Ogg Opus | `audio/ogg` |
| WebM Opus | `audio/webm` |
通过本地预处理支持的格式:
| 格式 | 处理方式 |
| --- | --- |
| SILK v3 (`#!SILK_V3`) | 用外部 `rust-silk` 解码成临时 WAV |
| WeChat/Tencent SILK (`0x02 + #!SILK_V3`) | 用外部 `rust-silk` 解码成临时 WAV |
本地测试中直接上传会被上游拒绝的格式:
| 格式 | 上游响应 |
| --- | --- |
| ADTS AAC (`.aac`) | HTTP 500, ASR API error |
| AIFF | HTTP 500, ASR API error |
| CAF AAC | HTTP 500, ASR API error |
| Raw PCM stream | HTTP 500, ASR API error |
| SILK v3 (`#!SILK_V3`) | HTTP 500, ASR API error |
| WeChat/Tencent SILK (`0x02 + #!SILK_V3`) | HTTP 500, ASR API error |
如果文件名没有可识别音频扩展名,请传 `--content-type`;CLI 会补一个合适的
multipart 文件名。也可以显式传:
```bash
codex-asr raw-audio --content-type audio/wav --filename voice.wav
```
## 已知边界
- 空文件会返回 HTTP 500 `Error in ASR API`
- 一秒静音 WAV 会返回 HTTP 200 和空文本
- 非常短的非静音片段可能返回不稳定文本
- `prompt`、`temperature`、`timestamp_granularities` 只为 SDK 兼容而接受,不会影响上游
- 本地短 WAV 并发测试到 96 个请求没有出现 429/5xx,但这不是公开 API 承诺
- 建议 REST wrapper 对长音频保持有界并发,默认 `--concurrency 16`
- REST server 默认限制上传体积为 `50` MiB,上游请求总超时为 `300` 秒
## Rust library
不需要 REST server 的库用户可以关闭默认特性,避免引入 Axum/Tokio:
```toml
codex-asr = { version = "0.1", default-features = false }
```
```rust
use codex_asr::{CodexAsrClient, TranscribeOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = CodexAsrClient::from_codex_home()?;
    let result = client.transcribe_file("audio.wav", TranscribeOptions::default())?;
    println!("{}", result.text);
    Ok(())
}
```
## 平台支持
- CLI 和 library 支持 macOS、Linux、Windows
- Release 二进制覆盖 Apple Silicon macOS、Intel macOS、Linux x64、Linux ARM64、Windows x64
- `serve` 功能默认开启;作为库使用时可通过 `default-features = false` 关闭
- SILK 支持依赖外部 `rust-silk` CLI;普通二进制不内置解码器,Docker 镜像默认内置
## 从源码运行
```bash
cargo run -- --help
cargo run -- audio.wav --json
cargo run -- serve --api-key local_dev_key
```
## 视觉资产
- 顶部 `logo.png` 由 `gpt-image-2-skill` 生成
- `assets/codex.svg` 与 `assets/codex-color.svg` 来自
[LobeHub Codex (OpenAI) icon](https://lobehub.com/icons/codex)
- `assets/codex-color-reference.png` 是从 `assets/codex-color.svg` 渲染出的参考图;
生成 `logo.png` 时作为 `--ref-image` 传入。直接上传 SVG 会被上游图片编辑
接口拒绝,因为它只接受 JPEG、PNG、GIF、WebP 参考图。
---
<a id="readme-en"></a>
# English
Codex.app has a one-shot dictation endpoint: it uploads recorded audio to
`https://chatgpt.com/backend-api/transcribe` and receives text back. This is not
the OpenAI Whisper API and it is not a public stable API, but it can reuse the
ChatGPT account that is already signed in on your local Codex Desktop app.
`codex-asr` packages that behavior as a Rust CLI, a publishable crate, and a
local OpenAI Whisper-style REST shim. You can transcribe files directly, or point
existing OpenAI SDK clients at `codex-asr serve` on localhost.
## Who this is for
- You already use Codex Desktop with a signed-in ChatGPT account
- You want to reuse Codex App ASR from scripts, CLIs, agents, or local services
- You want a tiny input surface: a local audio file, or an explicit bearer token
- You want `.silk` / WeChat voice files decoded through your local `rust-silk` CLI
- You accept that this is a reverse-engineered local automation surface, not an official API
## Install
### Homebrew
```bash
brew tap wangnov/tap
brew install codex-asr
```
### cargo-binstall
```bash
cargo binstall codex-asr
```
### cargo install
```bash
cargo install codex-asr
```
### GitHub Release installer
```bash
curl --proto '=https' --tlsv1.2 -LsSf \
https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.sh \
| sh
```
PowerShell (the cargo-dist installer's Windows form; the `.ps1` name follows the same pattern as the `.sh` above):
```powershell
powershell -ExecutionPolicy Bypass -c "irm https://github.com/Wangnov/codex-asr/releases/latest/download/codex-asr-installer.ps1 | iex"
```
### Direct binary download
Download the matching archive or installer from the
[latest GitHub Release](https://github.com/Wangnov/codex-asr/releases/latest).
### Docker / GHCR
```bash
docker pull ghcr.io/wangnov/codex-asr:latest
docker run --rm ghcr.io/wangnov/codex-asr:latest --version
```
The `main` branch publishes `latest`, `main`, and `sha-*` tags through GitHub
Actions. Release tags additionally publish `vX.Y.Z`, `X.Y.Z`, and `X.Y` tags.
Docker images include `rust-silk` at `/usr/local/bin/rust-silk` by default.
### From this checkout
```bash
cargo install --path .
```
## Quick start
```bash
# Transcribe directly, reading $CODEX_HOME/auth.json or ~/.codex/auth.json
codex-asr audio.wav
# Send a language hint upstream
codex-asr audio.wav --language zh
# JSON output for scripts
codex-asr audio.wav --json
# Extensionless input with explicit multipart content type
codex-asr raw-audio --content-type audio/wav
# Decode WeChat / SILK audio through an external rust-silk CLI first
codex-asr voice.silk --silk-decoder /path/to/rust-silk
```
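Consuming the `--json` output from a script comes down to parsing the `{"text":"..."}` envelope. A Python sketch (assuming `codex-asr` is on PATH and local Codex auth is available):

```python
import json
import subprocess

def parse_transcript(raw: str) -> str:
    """Parse the {"text": "..."} structure printed by `codex-asr --json`."""
    return json.loads(raw)["text"]

if __name__ == "__main__":
    # Assumption: codex-asr is on PATH and the default auth file exists
    out = subprocess.run(
        ["codex-asr", "audio.wav", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(parse_transcript(out))
```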
## Local REST / OpenAI SDK
Start a local Whisper-style REST endpoint:
```bash
codex-asr serve --api-key local_dev_key --host 127.0.0.1 --port 8788 --concurrency 16
```
Call it with an OpenAI-style multipart request:
```bash
curl http://127.0.0.1:8788/v1/audio/transcriptions \
-H 'Authorization: Bearer local_dev_key' \
-F model=whisper-1 \
-F file=@audio.wav
```
OpenAI SDK examples live in `examples/`:
```bash
python3 -m pip install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
python3 examples/python_openai_sdk.py audio.wav
npm install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
node examples/node_openai_sdk.mjs audio.wav
```
Routes:
| Route | Description |
| --- | --- |
| `GET /healthz` | health check, no auth required |
| `POST /v1/audio/transcriptions` | OpenAI Whisper-style route |
| `POST /audio/transcriptions` | short alias |
Multipart fields:
| Field | Behavior |
| --- | --- |
| `file` | required |
| `model` | accepted for SDK compatibility, ignored |
| `language` | forwarded to Codex `/transcribe` |
| `response_format` | supports `json`, `text`, `verbose_json` |
| `prompt`, `temperature`, `timestamp_granularities` | accepted for SDK compatibility, ignored |
`srt` and `vtt` return HTTP 400 because the Codex endpoint does not return
timestamps.
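The `response_format` dispatch can be pictured with the following Python sketch (illustrative only, not the actual codex-asr implementation; the exact `verbose_json` fields are assumptions):

```python
def render_response(text: str, response_format: str = "json"):
    """Render a transcript in OpenAI Whisper style; returns (status, body)."""
    if response_format == "text":
        return 200, text
    if response_format == "json":
        return 200, {"text": text}
    if response_format == "verbose_json":
        # Codex /transcribe returns no timestamps, so segments can only be
        # empty (field names are an assumption here)
        return 200, {"task": "transcribe", "text": text, "segments": []}
    if response_format in ("srt", "vtt"):
        # Subtitles cannot be produced without timestamps, so reject with 400
        return 400, {"error": f"{response_format} requires timestamps"}
    return 400, {"error": f"unknown response_format: {response_format}"}
```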
## Public Docker Deployment
This deployment shape is intentionally small: a public port, HTTPS, and an
application API key. `deploy/compose.public.yml` uses Caddy to obtain TLS
certificates, exposes ports 80/443 to the public, and reverse-proxies to
`codex-asr serve` inside the private Docker network. App auth is
`Authorization: Bearer <CODEX_ASR_SERVER_KEY>`.
Prerequisites:
- The domain A/AAAA record points at the server
- The server firewall allows ports 80 and 443
- A usable Codex / ChatGPT auth file exists on the server
- A container-readable auth copy has been prepared
```bash
sudo install -d -m 750 /opt/codex-asr
sudo install -o 10001 -g 65534 -m 0400 ~/.codex/auth.json /opt/codex-asr/auth.json
cp deploy/env.public.example deploy/.env
$EDITOR deploy/.env
docker compose --env-file deploy/.env -f deploy/compose.public.yml up -d
```
Set at least these values in `deploy/.env`:
```bash
CODEX_ASR_DOMAIN=asr.example.com
CODEX_ASR_AUTH_FILE=/opt/codex-asr/auth.json
CODEX_ASR_SERVER_KEY=replace-with-a-long-random-api-key
CODEX_ASR_MAX_UPLOAD_MB=50
CODEX_ASR_REQUEST_TIMEOUT_SECONDS=300
```
Example requests:
```bash
curl https://asr.example.com/healthz
curl https://asr.example.com/v1/audio/transcriptions \
-H "Authorization: Bearer replace-with-a-long-random-api-key" \
-F model=whisper-1 \
-F file=@audio.wav
```
Docker images include `rust-silk` by default, so `.silk` / WeChat voice files
are decoded to temporary WAV inside the container before upload. Override
`CODEX_ASR_SILK_DECODER` only when you want to use a custom decoder.
## Commands
- `codex-asr <audio>`: same as `codex-asr transcribe <audio>`
- `transcribe`: upload one audio file and return text
- `serve`: run the local OpenAI Whisper-style REST shim
Common options:
| Option | Description |
| --- | --- |
| `--bearer` | explicit ChatGPT bearer token; defaults to local Codex auth |
| `--account-id` | override `ChatGPT-Account-Id`; usually decoded from the bearer token |
| `--auth-file` | custom Codex auth file |
| `--endpoint` | override the upstream `/backend-api/transcribe` URL |
| `--proxy` | HTTPS proxy |
| `--request-timeout-seconds` | total upstream request timeout, default `300` |
| `--connect-timeout-seconds` | upstream connect timeout, default `15` |
| `--language` | language hint, for example `zh` or `en` |
| `--content-type` | audio content type override |
| `--filename` | multipart filename override |
| `--json` | print `{"text":"..."}` |
| `--silk-decoder` | external `rust-silk` CLI path |
| `--silk-sample-rate` | WAV sample rate for SILK decode, default `24000` |
| `--no-silk-decode` | upload `.silk` directly without local decoding |
`serve` also supports:
| Option | Description |
| --- | --- |
| `--host` | bind host, default `127.0.0.1` |
| `--port` | bind port, default `8788` |
| `--api-key` | local REST API key |
| `--no-api-key` | disable local REST auth, only for trusted loopback |
| `--concurrency` | maximum concurrent upstream ASR requests, default `16` |
| `--max-upload-mb` | REST upload size limit, default `50` MiB |
## Auth model
By default, `codex-asr` reads:
1. `$CODEX_HOME/auth.json`
2. `~/.codex/auth.json`
For the smallest explicit input surface, pass only a bearer token:
```bash
codex-asr audio.wav --bearer "$TOKEN"
CODEX_ASR_BEARER="$TOKEN" codex-asr audio.wav
```
`ChatGPT-Account-Id` is usually decoded from the JWT payload. Override it only
when the token does not contain that claim:
```bash
codex-asr audio.wav --bearer "$TOKEN" --account-id acct_...
```
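The automatic decoding is just base64url-decoding the JWT payload segment (without signature verification). A Python sketch; which claim codex-asr actually reads is an implementation detail, this only demonstrates the decoding itself:

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT, without verifying it."""
    payload_b64 = token.split(".")[1]
    # Re-add '=' padding before base64url decoding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```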
Environment variables:
| Variable | Description |
| --- | --- |
| `CODEX_ASR_BEARER` | default bearer token |
| `CODEX_ASR_SERVER_KEY` | local REST API key |
| `CODEX_ASR_PROXY` | HTTPS proxy |
| `CODEX_ASR_REQUEST_TIMEOUT_SECONDS` | total upstream request timeout, default `300` |
| `CODEX_ASR_CONNECT_TIMEOUT_SECONDS` | upstream connect timeout, default `15` |
| `CODEX_ASR_MAX_UPLOAD_MB` | REST upload size limit, default `50` MiB |
| `CODEX_ASR_SILK_DECODER` | external `rust-silk` CLI path; Docker defaults to `/usr/local/bin/rust-silk` |
## What it changes
- Reads your local Codex / ChatGPT auth file, but does not write or refresh it
- `serve` rereads the auth file before each transcription so an updated container auth copy takes effect without restarting
- Uploads the audio file you provide to `https://chatgpt.com/backend-api/transcribe`
- Decodes `.silk` / `.slk` inputs to temporary WAV files before upload by default
- The REST server only wraps requests, normalizes SDK fields, limits concurrency, and converts responses
- It does not persist transcripts unless your caller does so
## Safety
`codex-asr` reuses a personal Codex / ChatGPT login token. Do not expose plain
HTTP or an unauthenticated REST server; public deployment should use at least
HTTPS and an application API key.
- REST binds to `127.0.0.1` by default
- The REST API key is local access control only; it is not an OpenAI or ChatGPT token
- Use `--no-api-key` only on trusted loopback
- `deploy/compose.public.yml` enables Caddy HTTPS and REST api-key auth by default
- This endpoint is reverse-engineered from Codex Desktop behavior and may change without notice
## Audio formats
The Codex Desktop endpoint appears to inspect the actual audio container, not
only the multipart content type.
Formats tested successfully when uploaded directly:
| Format | Content type |
| --- | --- |
| WAV PCM | `audio/wav` |
| MP3 | `audio/mpeg` |
| M4A or MP4 AAC | `audio/mp4` |
| FLAC | `audio/flac` |
| Ogg Opus | `audio/ogg` |
| WebM Opus | `audio/webm` |
Formats supported through local preprocessing:
| Format | Handling |
| --- | --- |
| SILK v3 (`#!SILK_V3`) | decoded to temporary WAV with `rust-silk` |
| WeChat/Tencent SILK (`0x02 + #!SILK_V3`) | decoded to temporary WAV with `rust-silk` |
Formats rejected by the upstream endpoint during local direct-upload tests:
| Format | Upstream response |
| --- | --- |
| ADTS AAC (`.aac`) | HTTP 500, ASR API error |
| AIFF | HTTP 500, ASR API error |
| CAF AAC | HTTP 500, ASR API error |
| Raw PCM stream | HTTP 500, ASR API error |
| SILK v3 (`#!SILK_V3`) | HTTP 500, ASR API error |
| WeChat/Tencent SILK (`0x02 + #!SILK_V3`) | HTTP 500, ASR API error |
Files with no recognizable audio extension should be uploaded with a known
`--content-type`; the CLI will add a matching multipart filename extension. You
can override it explicitly:
```bash
codex-asr raw-audio --content-type audio/wav --filename voice.wav
```
## Known limits
- Empty files return HTTP 500 with `Error in ASR API`
- One second of silence returns HTTP 200 with an empty transcript
- Very short non-silent clips can return unstable text
- `prompt`, `temperature`, and `timestamp_granularities` are accepted only for SDK compatibility
- Local short-WAV probes reached 96 concurrent requests without 429/5xx, but that is not a public API contract
- Keep long-audio REST usage bounded; the default is `--concurrency 16`
- The REST server defaults to a `50` MiB upload limit and a `300` second upstream request timeout
## Rust library
Library consumers that do not need the local REST server can avoid Axum/Tokio:
```toml
codex-asr = { version = "0.1", default-features = false }
```
```rust
use codex_asr::{CodexAsrClient, TranscribeOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = CodexAsrClient::from_codex_home()?;
    let result = client.transcribe_file("audio.wav", TranscribeOptions::default())?;
    println!("{}", result.text);
    Ok(())
}
```
## Platforms
- CLI and library support macOS, Linux, and Windows
- Prebuilt binaries are published for Apple Silicon macOS, Intel macOS, Linux x64, Linux ARM64, and Windows x64
- `serve` is enabled by default; library users can disable it with `default-features = false`
- SILK support depends on an external `rust-silk` CLI; regular binaries do not bundle a decoder, while Docker images include one by default
## Run from source
```bash
cargo run -- --help
cargo run -- audio.wav --json
cargo run -- serve --api-key local_dev_key
```
## Visual assets
- The top `logo.png` was generated with `gpt-image-2-skill`
- `assets/codex.svg` and `assets/codex-color.svg` are from the
[LobeHub Codex (OpenAI) icon](https://lobehub.com/icons/codex)
- `assets/codex-color-reference.png` is rendered from `assets/codex-color.svg`
and was passed as the `--ref-image` when generating `logo.png`. Passing the
SVG directly was rejected by the upstream image-editing API because it only
accepts JPEG, PNG, GIF, and WebP reference images.