transcribe-cli 0.0.6

Native Rust CLI transcription pipeline with GigaAM v3 ONNX
# transcribe-cli

`transcribe-cli` is a native Rust transcription tool built around ONNX Runtime.

Project links:

- crates.io: `transcribe-cli`
- repository: `https://github.com/qwertyu1opz/transcribe-cli`

Current backends:

- `ru` -> `GigaAM v3 e2e CTC`
- `en` -> `Parakeet TDT v2`

It supports:

- local audio and video files
- `http/https` media URLs
- chunked transcript output with `--stream`
- live URL transcription with `--live`
- optional lightweight voice activity detection (VAD) for live mode with `--vad`
- REST server mode with `--server`
- CPU and NVIDIA GPU execution

REST documentation:

- [REST_API.md](REST_API.md)

## Requirements

- Rust `1.85+`
- Linux is the primary target
- no external `ffmpeg` dependency is required

For GPU runs:

- install with `--features cuda`
- an NVIDIA driver must be installed and available
- the project downloads the required ONNX Runtime CUDA libraries into its sandbox automatically

## Install

From crates.io:

```bash
cargo install transcribe-cli --locked
```

With GPU support from crates.io:

```bash
cargo install transcribe-cli --locked --features cuda
```

From a local checkout:

```bash
cargo install --path . --locked
```

With GPU support from a local checkout:

```bash
cargo install --path . --locked --features cuda
```

## First run

Models and runtime files are stored in a sandbox next to the installed binary:

```text
<binary_dir>/transcribe_sandbox/
```

This sandbox is used for:

- downloaded models
- ONNX Runtime shared libraries

Only the model storage directory can be overridden; the ONNX Runtime libraries always stay in the sandbox:

```bash
transcribe-cli --models-dir /path/to/models file.wav
```
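A minimal sketch of how the sandbox location described above can be derived from the running binary, assuming the sandbox is simply a `transcribe_sandbox` directory beside the executable and that `--models-dir`, when given, replaces the default location outright. `sandbox_dir_for`, `current_sandbox_dir`, and `resolve_models_dir` are hypothetical helpers, not the crate's API, and the sketch does not model any subdirectory layout inside the sandbox.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Name of the sandbox directory placed next to the binary (taken from the README).
const SANDBOX_NAME: &str = "transcribe_sandbox";

/// Returns `<binary_dir>/transcribe_sandbox` for a given executable path.
/// Returns `None` if the path has no parent directory.
pub fn sandbox_dir_for(exe: &Path) -> Option<PathBuf> {
    exe.parent().map(|dir| dir.join(SANDBOX_NAME))
}

/// Resolves the sandbox for the currently running executable.
pub fn current_sandbox_dir() -> io::Result<PathBuf> {
    let exe = std::env::current_exe()?;
    sandbox_dir_for(&exe).ok_or_else(|| {
        io::Error::new(io::ErrorKind::NotFound, "executable has no parent directory")
    })
}

/// Hypothetical precedence rule: an explicit `--models-dir` wins;
/// otherwise models fall back to the sandbox next to the binary.
pub fn resolve_models_dir(exe: &Path, models_dir_flag: Option<&Path>) -> Option<PathBuf> {
    match models_dir_flag {
        Some(dir) => Some(dir.to_path_buf()),
        None => sandbox_dir_for(exe),
    }
}
```

For example, a binary installed at `/usr/local/bin/transcribe-cli` would resolve its sandbox to `/usr/local/bin/transcribe_sandbox`.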

## Basic usage

Russian transcription (`ru` is used when `--language` is omitted):

```bash
transcribe-cli /path/to/file.wav
transcribe-cli --language ru /path/to/file.mp3
```

English transcription:

```bash
transcribe-cli --language en /path/to/file.wav
```

Video input:

```bash
transcribe-cli movie.mp4
```

Remote media URL:

```bash
transcribe-cli https://example.com/audio.mp3
```

Run on an NVIDIA GPU:

```bash
transcribe-cli --gpu /path/to/file.wav
transcribe-cli --gpu --gpu-device 0 /path/to/file.wav
```

Override compute type:

```bash
transcribe-cli --compute-type int8 /path/to/file.wav
transcribe-cli --compute-type float32 --gpu /path/to/file.wav
```

Stream the transcript while decoding:

```bash
transcribe-cli --stream /path/to/file.wav
transcribe-cli --language en --stream song.mp3
```
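The README does not say how `--stream` sizes its chunks. A minimal sketch of chunk-by-chunk output, assuming fixed-duration windows over mono PCM samples: `chunk_bounds` and `stream_transcript` are hypothetical helpers, and the per-chunk transcriber is passed in as a closure standing in for the real model call.

```rust
/// Computes `[start, end)` sample ranges for fixed-duration chunks.
/// The last chunk may be shorter than `chunk_secs`.
pub fn chunk_bounds(total_samples: usize, sample_rate: u32, chunk_secs: f32) -> Vec<(usize, usize)> {
    // Convert the window duration to a sample count; never allow a zero-length step.
    let step = ((sample_rate as f32 * chunk_secs) as usize).max(1);
    (0..total_samples)
        .step_by(step)
        .map(|start| (start, (start + step).min(total_samples)))
        .collect()
}

/// Transcribes each chunk in order and emits its text as soon as it is ready,
/// instead of waiting for the whole file to finish.
pub fn stream_transcript<T, E>(
    samples: &[f32],
    sample_rate: u32,
    chunk_secs: f32,
    mut transcribe_chunk: T,
    mut emit: E,
) where
    T: FnMut(&[f32]) -> String,
    E: FnMut(&str),
{
    for (start, end) in chunk_bounds(samples.len(), sample_rate, chunk_secs) {
        let text = transcribe_chunk(&samples[start..end]);
        emit(&text);
    }
}
```

With 2.5 s of 16 kHz audio and 1 s windows, `chunk_bounds(40_000, 16_000, 1.0)` yields three ranges, the last one half-length. A real implementation would likely also overlap windows to avoid cutting words at chunk edges.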

Live URL transcription:

```bash
transcribe-cli --live http://127.0.0.1:8765/stream
transcribe-cli --live --gpu --language ru http://127.0.0.1:8765/stream
```

Live URL transcription with VAD:

```bash
transcribe-cli --live --vad http://127.0.0.1:8765/stream
```
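The README only describes the `--vad` segmentation as "lightweight" and does not name its algorithm. As an illustration, here is a simple RMS-energy gate, a common lightweight approach; it is not necessarily what `--vad` does. `speech_segments`, the frame length, and the threshold are all hypothetical.

```rust
/// Root-mean-square energy of one frame of samples.
fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Energy-gate VAD: marks a frame as speech when its RMS exceeds `threshold`,
/// then merges consecutive speech frames into `[start, end)` sample ranges.
pub fn speech_segments(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<(usize, usize)> {
    assert!(frame_len > 0, "frame_len must be positive");
    let mut segments = Vec::new();
    let mut current: Option<usize> = None; // start of the open segment, if any
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let start = i * frame_len;
        let is_speech = rms(frame) > threshold;
        match (is_speech, current) {
            // Silence -> speech: open a new segment.
            (true, None) => current = Some(start),
            // Speech -> silence: close the open segment at this frame's start.
            (false, Some(seg_start)) => {
                segments.push((seg_start, start));
                current = None;
            }
            _ => {}
        }
    }
    // Close a segment that runs to the end of the buffer.
    if let Some(seg_start) = current {
        segments.push((seg_start, samples.len()));
    }
    segments
}
```

Only the returned speech ranges would then be sent to the model. Production VADs usually add a short hangover so brief pauses inside a phrase do not split it.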

Start REST server:

```bash
transcribe-cli --server 8787
```

## Main arguments

- `MEDIA`
  Path or URL to an audio or video source.
- `--language <ru|en>`
  Selects the backend.
- `--gpu`
  Enables NVIDIA GPU execution.
- `--gpu-device <N>`
  Selects the CUDA device index.
- `--compute-type <auto|int8|float32>`
  Overrides the compute mode.
- `--stream`
  Prints the transcript chunk by chunk during normal file/URL transcription.
- `--live`
  Treats `MEDIA` as a live `http/https` stream.
- `--vad`
  Enables lightweight speech segmentation for `--live`.
- `--server <PORT>`
  Starts the REST API server on `PORT` instead of running a one-shot CLI transcription.
- `--models-dir <DIR>`
  Overrides the model directory.
- `--remove-model`
  Removes the selected model and its related artifacts.
- `--remove-all`
  Removes the whole model directory.
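The accepted values for `--language` and `--compute-type` can be modeled as enums. A minimal sketch using only the standard library, assuming exact lowercase matching; the `Language` and `ComputeType` types and the `backend` method are hypothetical, not the crate's actual definitions, though the value sets and backend names come from this README.

```rust
use std::str::FromStr;

/// Values accepted by `--language`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    Ru,
    En,
}

impl Language {
    /// Backend selected for each language, as listed in this README.
    pub fn backend(self) -> &'static str {
        match self {
            Language::Ru => "GigaAM v3 e2e CTC",
            Language::En => "Parakeet TDT v2",
        }
    }
}

impl FromStr for Language {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ru" => Ok(Language::Ru),
            "en" => Ok(Language::En),
            other => Err(format!("unsupported language `{other}` (expected ru or en)")),
        }
    }
}

/// Values accepted by `--compute-type`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ComputeType {
    Auto,
    Int8,
    Float32,
}

impl FromStr for ComputeType {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(ComputeType::Auto),
            "int8" => Ok(ComputeType::Int8),
            "float32" => Ok(ComputeType::Float32),
            other => Err(format!(
                "unsupported compute type `{other}` (expected auto, int8 or float32)"
            )),
        }
    }
}
```

Parsing into enums up front means an unsupported value such as `--language de` fails before any model is downloaded.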

## Cleanup

Remove the model for the selected language:

```bash
transcribe-cli --language ru --remove-model
transcribe-cli --language en --remove-model
```

Remove all downloaded models:

```bash
transcribe-cli --remove-all
```

## Notes

- `ru` and `en` are the only supported language values right now.
- `--live` supports direct `http/https` streams, not playlist protocols such as HLS/DASH.
- `--vad` is currently valid only together with `--live`.
- Audio/video decoding is done inside the Rust pipeline through `symphonia`.
- The REST API is documented separately in [REST_API.md](REST_API.md).
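The `--live` and `--vad` constraints above can be checked before any network or model work starts. A minimal sketch, where `check_live_flags` is a hypothetical helper; rejecting URLs whose path ends in `.m3u8` or `.mpd` is an assumed heuristic for spotting HLS/DASH playlists, not documented behavior of the tool.

```rust
/// Checks the `--live` / `--vad` rules from the notes above:
/// `--vad` requires `--live`, and `--live` expects a direct http/https stream.
pub fn check_live_flags(media: &str, live: bool, vad: bool) -> Result<(), String> {
    if vad && !live {
        return Err("--vad is only valid together with --live".to_string());
    }
    if !live {
        // Non-live input (local file or URL) is not restricted here.
        return Ok(());
    }
    let lower = media.to_ascii_lowercase();
    if !(lower.starts_with("http://") || lower.starts_with("https://")) {
        return Err(format!("--live expects an http/https URL, got `{media}`"));
    }
    // Hypothetical heuristic: drop any query string or fragment, then
    // treat .m3u8 (HLS) and .mpd (DASH) paths as unsupported playlists.
    let path = lower.split(|c: char| c == '?' || c == '#').next().unwrap_or("");
    if path.ends_with(".m3u8") || path.ends_with(".mpd") {
        return Err("HLS/DASH playlists are not supported by --live".to_string());
    }
    Ok(())
}
```

For example, `http://127.0.0.1:8765/stream` passes with both flags set, while `--vad` on a local file is rejected.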