codex-asr 0.1.0

Unofficial Codex Desktop ASR client that reuses local ChatGPT auth for one-shot transcription
Documentation

codex-asr

Unofficial Rust CLI/library for Codex Desktop's one-shot dictation ASR endpoint.

It reuses a local Codex ChatGPT login from $CODEX_HOME/auth.json or ~/.codex/auth.json, then uploads an audio file to:

https://chatgpt.com/backend-api/transcribe

This is not an official OpenAI API surface. Treat it as a local automation tool for an already-signed-in Codex Desktop environment.

Safety

codex-asr reuses a personal Codex/ChatGPT login token. Do not expose it, or a REST server backed by it, to the public internet.

  • Bind REST servers to loopback by default (127.0.0.1).
  • Use --api-key or CODEX_ASR_SERVER_KEY for the local REST wrapper.
  • Treat the REST API key as local access control only; it is not an upstream OpenAI or ChatGPT token.
  • Avoid --no-api-key unless the server is reachable only by trusted local processes.
  • This endpoint is reverse-engineered from Codex Desktop behavior and may change without notice.

Install From Source

cargo install --path .

When published:

cargo install codex-asr

GitHub Release installers are generated with cargo-dist:

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/wangnov/codex-asr/releases/download/v0.1.0/codex-asr-installer.sh \
  | sh

PowerShell:

powershell -ExecutionPolicy ByPass -c "irm https://github.com/wangnov/codex-asr/releases/download/v0.1.0/codex-asr-installer.ps1 | iex"

Library consumers that do not need the local REST server can avoid the Axum/Tokio server dependencies:

codex-asr = { version = "0.1", default-features = false }

CLI

codex-asr audio.wav
codex-asr transcribe audio.wav --language zh
codex-asr audio.wav --content-type audio/webm --json
codex-asr raw-audio --content-type audio/wav
codex-asr voice.silk --silk-decoder /path/to/rust-silk

Auth defaults to local Codex auth. For the smallest explicit input surface, pass only a bearer token:

codex-asr audio.wav --bearer "$TOKEN"
CODEX_ASR_BEARER="$TOKEN" codex-asr audio.wav

ChatGPT-Account-Id is decoded from the bearer token when possible. Override it only if your token does not contain that claim:

codex-asr audio.wav --bearer "$TOKEN" --account-id acct_...

.silk and .slk inputs are decoded with an external rust-silk CLI before upload. The decoder is resolved in this order:

  1. --silk-decoder <path>
  2. CODEX_ASR_SILK_DECODER
  3. rust-silk on PATH
  4. $HOME/rust-silk/target/release/rust-silk
  5. $HOME/rust-silk/target/debug/rust-silk

The default SILK decode sample rate is 24000 Hz. Override it with:

codex-asr voice.silk --silk-sample-rate 16000

The backend is sensitive to multipart filenames. If an input path has no useful audio extension but you pass a known --content-type, codex-asr sends a synthetic filename such as raw-audio.wav. You can override it explicitly:

codex-asr raw-audio --content-type audio/wav --filename voice.wav

REST Server

codex-asr can also serve a small OpenAI Whisper-compatible REST surface:

codex-asr serve --api-key local_dev_key --host 127.0.0.1 --port 8788 --concurrency 16

Then call it with an OpenAI-style multipart request:

curl http://127.0.0.1:8788/v1/audio/transcriptions \
  -H 'Authorization: Bearer local_dev_key' \
  -F model=whisper-1 \
  -F file=@audio.wav

Implemented routes:

Route Notes
GET /healthz no auth required
POST /v1/audio/transcriptions OpenAI-style route
POST /audio/transcriptions short alias

Implemented multipart fields:

Field Handling
file required
model accepted for SDK compatibility, ignored
language forwarded to Codex /transcribe
response_format supports json, text, verbose_json
prompt, temperature, timestamp_granularities accepted and ignored

srt and vtt response formats return HTTP 400 because the Codex endpoint does not provide timestamps. REST auth defaults to CODEX_ASR_SERVER_KEY or --api-key; use --no-api-key only on trusted loopback.

OpenAI SDK examples live in examples/:

python3 -m pip install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  python3 examples/python_openai_sdk.py audio.wav

npm install openai
CODEX_ASR_SERVER_KEY=local_dev_key \
  node examples/node_openai_sdk.mjs audio.wav

The Python example disables environment proxy discovery for the local SDK client because some systems route localhost traffic through a proxy unless trust_env is disabled.

Library

use codex_asr::{CodexAsrClient, TranscribeOptions};

let client = CodexAsrClient::from_codex_home()?;
let result = client.transcribe_file("audio.wav", TranscribeOptions::default())?;
println!("{}", result.text);
# Ok::<(), Box<dyn std::error::Error>>(())

Audio Formats

The Codex Desktop endpoint appears to inspect the actual audio container, not only the multipart content type.

These formats were tested successfully when uploaded directly:

Container / codec Suggested content type
WAV PCM audio/wav
MP3 audio/mpeg
M4A or MP4 AAC audio/mp4
FLAC audio/flac
Ogg Opus audio/ogg
WebM Opus audio/webm

Files with no recognizable audio extension should be uploaded with a known --content-type; the CLI will add a matching multipart filename extension.

These formats are supported by the codex-asr CLI through local preprocessing:

Input Handling
SILK v3 (#!SILK_V3) decoded to temporary WAV with rust-silk
WeChat/Tencent SILK (0x02 + #!SILK_V3) decoded to temporary WAV with rust-silk

These were rejected by the endpoint during local testing when uploaded directly:

Format Result
ADTS AAC (.aac) HTTP 500, ASR API error
AIFF HTTP 500, ASR API error
CAF AAC HTTP 500, ASR API error
Raw PCM stream HTTP 500, ASR API error
SILK v3 (#!SILK_V3) HTTP 500, ASR API error
WeChat/Tencent SILK (0x02 + #!SILK_V3) HTTP 500, ASR API error

Endpoint Notes

Local probes against the Codex Desktop endpoint showed these practical edges:

  • Empty files return HTTP 500 with Error in ASR API.
  • One second of silence returns HTTP 200 with an empty transcript.
  • Very short non-silent clips can return unstable text; avoid treating tiny snippets as reliable.
  • A missing or misleading multipart filename extension can make an otherwise valid audio file fail. codex-asr compensates when --content-type is known.
  • Parallel batches up to 96 short WAV uploads succeeded locally without 429 or 5xx responses, but this is not a public API contract. Keep any REST wrapper concurrency bounded, especially for longer audio.

Shape

Default shape is CLI + library crate + local REST wrapper. The REST wrapper is kept local-first because this tool handles a user bearer token.