transcribe-cli 0.0.6

Native Rust CLI transcription pipeline with GigaAM v3 ONNX

transcribe-cli is a native Rust transcription tool built around ONNX Runtime.

Crate:

  • crates.io: transcribe-cli
  • repository: https://github.com/qwertyu1opz/transcribe-cli

Current backends:

  • ru -> GigaAM v3 e2e CTC
  • en -> Parakeet TDT v2

It supports:

  • local audio and video files
  • http/https media URLs
  • chunked transcript output with --stream
  • live URL transcription with --live
  • optional lightweight VAD for live mode with --vad
  • REST server mode with --server
  • CPU and NVIDIA GPU execution

REST documentation: see REST_API.md.

Requirements

  • Rust 1.85+
  • Linux is the primary target
  • no external ffmpeg dependency is required

For GPU runs:

  • install with --features cuda
  • NVIDIA driver must be available
  • the project downloads the required ONNX Runtime CUDA libraries into its sandbox automatically

Install

From crates.io:

cargo install transcribe-cli --locked

With GPU support from crates.io:

cargo install transcribe-cli --locked --features cuda

From a local checkout:

cargo install --path . --locked

With GPU support:

cargo install --path . --locked --features cuda
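After any of the install commands above, a quick check like the following confirms the binary landed on PATH (cargo places binaries in ~/.cargo/bin, which must be on PATH; the check itself is just a sketch, not part of the tool):

```shell
# Verify that cargo install put transcribe-cli on PATH
if command -v transcribe-cli >/dev/null 2>&1; then
  echo "installed: $(command -v transcribe-cli)"
else
  echo "transcribe-cli not found; add ~/.cargo/bin to PATH" >&2
fi
```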

First run

Models and runtime files are stored in a sandbox next to the installed binary:

<binary_dir>/transcribe_sandbox/

This sandbox is used for:

  • downloaded models
  • ONNX Runtime shared libraries

If needed, you can override only the model storage directory:

transcribe-cli --models-dir /path/to/models file.wav

Basic usage

Russian transcription:

transcribe-cli /path/to/file.wav
transcribe-cli --language ru /path/to/file.mp3

English transcription:

transcribe-cli --language en /path/to/file.wav

Video input:

transcribe-cli movie.mp4

Remote media URL:

transcribe-cli https://example.com/audio.mp3

Use GPU:

transcribe-cli --gpu /path/to/file.wav
transcribe-cli --gpu --gpu-device 0 /path/to/file.wav

Override compute type:

transcribe-cli --compute-type int8 /path/to/file.wav
transcribe-cli --compute-type float32 --gpu /path/to/file.wav

Stream transcript while decoding:

transcribe-cli --stream /path/to/file.wav
transcribe-cli --language en --stream song.mp3

Live URL transcription:

transcribe-cli --live http://127.0.0.1:8765/stream
transcribe-cli --live --gpu --language ru http://127.0.0.1:8765/stream

Live URL transcription with VAD:

transcribe-cli --live --vad http://127.0.0.1:8765/stream

Start REST server:

transcribe-cli --server 8787

Main arguments

  • MEDIA Path or URL to an audio or video source.
  • --language <ru|en> Selects the backend.
  • --gpu Enables NVIDIA GPU execution.
  • --gpu-device <N> Selects CUDA device index.
  • --compute-type <auto|int8|float32> Overrides compute mode.
  • --stream Prints transcript chunk-by-chunk during normal file/URL transcription.
  • --live Treats MEDIA as a live http/https stream.
  • --vad Enables lightweight speech segmentation for --live.
  • --server <PORT> Starts the REST API instead of a one-shot CLI transcription.
  • --models-dir <DIR> Overrides the model directory.
  • --remove-model Removes the selected model and its related artifacts.
  • --remove-all Removes the whole model directory.
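The one-shot CLI mode composes naturally with shell loops. A sketch for batch transcription, assuming one transcript file per input (the directory path and output naming are illustrative, not part of the CLI):

```shell
# Transcribe every .wav in a directory, writing foo.wav -> foo.txt
for f in /path/to/audio/*.wav; do
  transcribe-cli --language ru "$f" > "${f%.wav}.txt"
done
```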

Cleanup

Remove the current language-selected model:

transcribe-cli --language ru --remove-model
transcribe-cli --language en --remove-model

Remove all downloaded models:

transcribe-cli --remove-all

Notes

  • ru and en are the only supported language values right now.
  • --live supports direct http/https streams, not playlist protocols such as HLS/DASH.
  • --vad is currently valid only together with --live.
  • Audio/video decoding is done inside the Rust pipeline through symphonia.
  • The REST API is documented separately in REST_API.md.