codex-asr

Unofficial Rust CLI/library for Codex Desktop's one-shot dictation ASR endpoint.

It reuses a local Codex ChatGPT login from $CODEX_HOME/auth.json or ~/.codex/auth.json, then uploads an audio file to:

https://chatgpt.com/backend-api/transcribe

This is not an official OpenAI API surface. Treat it as a local automation tool for an already-signed-in Codex Desktop environment.

Safety

codex-asr reuses a personal Codex/ChatGPT login token. Do not expose it, or a REST server backed by it, to the public internet.

Bind REST servers to loopback by default (127.0.0.1).
Use --api-key or CODEX_ASR_SERVER_KEY for the local REST wrapper.
Treat the REST API key as local access control only; it is not an upstream OpenAI or ChatGPT token.
Avoid --no-api-key unless the server is reachable only by trusted local processes.
This endpoint is reverse-engineered from Codex Desktop behavior and may change without notice.

Install From Source

cargo install --path .

When published:

cargo install codex-asr

GitHub Release installers are generated with cargo-dist:

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/wangnov/codex-asr/releases/download/v0.1.0/codex-asr-installer.sh \
  | sh

PowerShell:

powershell -ExecutionPolicy ByPass -c "irm https://github.com/wangnov/codex-asr/releases/download/v0.1.0/codex-asr-installer.ps1 | iex"

Library consumers that do not need the local REST server can avoid the Axum/Tokio server dependencies:

codex-asr = { version = "0.1", default-features = false }

CLI

codex-asr audio.wav
codex-asr transcribe audio.wav --language zh
codex-asr audio.wav --content-type audio/webm --json
codex-asr raw-audio --content-type audio/wav
codex-asr voice.silk --silk-decoder /path/to/rust-silk

Auth defaults to local Codex auth. For the smallest explicit input surface, pass only a bearer token:

codex-asr audio.wav --bearer "$TOKEN"
CODEX_ASR_BEARER="$TOKEN" codex-asr audio.wav

ChatGPT-Account-Id is decoded from the bearer token when possible. Override it only if your token does not contain that claim:

codex-asr audio.wav --bearer "$TOKEN" --account-id acct_...

.silk and .slk inputs are decoded with an external rust-silk CLI before upload. The decoder is resolved in this order:

--silk-decoder <path>
CODEX_ASR_SILK_DECODER
rust-silk on PATH
$HOME/rust-silk/target/release/rust-silk
$HOME/rust-silk/target/debug/rust-silk

The default SILK decode sample rate is 24000 Hz. Override it with:

codex-asr voice.silk --silk-sample-rate 16000

The backend is sensitive to multipart filenames. If an input path has no useful audio extension but you pass a known --content-type, codex-asr sends a synthetic filename such as raw-audio.wav. You can override it explicitly:

codex-asr raw-audio --content-type audio/wav --filename voice.wav

REST Server

codex-asr can also serve a small OpenAI Whisper-compatible REST surface:

codex-asr serve --api-key local_dev_key --host 127.0.0.1 --port 8788 --concurrency 16

Then call it with an OpenAI-style multipart request:

curl http://127.0.0.1:8788/v1/audio/transcriptions \
  -H 'Authorization: Bearer local_dev_key' \
  -F model=whisper-1 \
  -F file=@audio.wav

Implemented routes:

Route	Notes
`GET /healthz`	no auth required
`POST /v1/audio/transcriptions`	OpenAI-style route
`POST /audio/transcriptions`	short alias

Implemented multipart fields:

Field	Handling
`file`	required
`model`	accepted for SDK compatibility, ignored
`language`	forwarded to Codex `/transcribe`
`response_format`	supports `json`, `text`, `verbose_json`
`prompt`, `temperature`, `timestamp_granularities`	accepted and ignored

srt and vtt response formats return HTTP 400 because the Codex endpoint does not provide timestamps. REST auth defaults to CODEX_ASR_SERVER_KEY or --api-key; use --no-api-key only on trusted loopback.

OpenAI SDK examples live in examples/:

python3 -m pip install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  python3 examples/python_openai_sdk.py audio.wav

npm install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  node examples/node_openai_sdk.mjs audio.wav

The Python example disables environment proxy discovery for the local SDK client because some systems route localhost traffic through a proxy unless trust_env is disabled.

Library

use codex_asr::{CodexAsrClient, TranscribeOptions};

let client = CodexAsrClient::from_codex_home()?;
let result = client.transcribe_file("audio.wav", TranscribeOptions::default())?;
println!("{}", result.text);
# Ok::<(), Box<dyn std::error::Error>>(())

Audio Formats

The Codex Desktop endpoint appears to inspect the actual audio container, not only the multipart content type.

These formats were tested successfully when uploaded directly:

Container / codec	Suggested content type
WAV PCM	`audio/wav`
MP3	`audio/mpeg`
M4A or MP4 AAC	`audio/mp4`
FLAC	`audio/flac`
Ogg Opus	`audio/ogg`
WebM Opus	`audio/webm`

Files with no recognizable audio extension should be uploaded with a known --content-type; the CLI will add a matching multipart filename extension.

These formats are supported by the codex-asr CLI through local preprocessing:

Input	Handling
SILK v3 (`#!SILK_V3`)	decoded to temporary WAV with `rust-silk`
WeChat/Tencent SILK (`0x02 + #!SILK_V3`)	decoded to temporary WAV with `rust-silk`

These were rejected by the endpoint during local testing when uploaded directly:

Format	Result
ADTS AAC (`.aac`)	HTTP 500, ASR API error
AIFF	HTTP 500, ASR API error
CAF AAC	HTTP 500, ASR API error
Raw PCM stream	HTTP 500, ASR API error
SILK v3 (`#!SILK_V3`)	HTTP 500, ASR API error
WeChat/Tencent SILK (`0x02 + #!SILK_V3`)	HTTP 500, ASR API error

Endpoint Notes

Local probes against the Codex Desktop endpoint showed these practical edges:

Empty files return HTTP 500 with Error in ASR API.
One second of silence returns HTTP 200 with an empty transcript.
Very short non-silent clips can return unstable text; avoid treating tiny snippets as reliable.
A missing or misleading multipart filename extension can make an otherwise valid audio file fail. codex-asr compensates when --content-type is known.
Parallel batches up to 96 short WAV uploads succeeded locally without 429 or 5xx responses, but this is not a public API contract. Keep any REST wrapper concurrency bounded, especially for longer audio.

Shape

Default shape is CLI + library crate + local REST wrapper. The REST wrapper is kept local-first because this tool handles a user bearer token.

codex-asr 0.1.0