transcribe-rs
Multi-engine speech-to-text library for Rust. Supports Parakeet, Canary, Moonshine, SenseVoice, GigaAM, Whisper, Whisperfile, and OpenAI.
Breaking Changes in 0.3.0
Version 0.3.0 changes the SpeechModel trait. If you need the old API, pin to version = "=0.2.9".
- transcribe() and transcribe_file() now take &TranscribeOptions instead of Option<&str> for language
- SpeechModel requires Send, enabling Box<dyn SpeechModel + Send> across threads
- TranscribeOptions includes a translate field for Whisper/Whisperfile translation support
- WhisperEngine::capabilities() now returns actual model language support (English-only vs multilingual) instead of always reporting 99 languages
Note: 0.3.0 is a large migration. We believe correctness is preserved for all engines, but expect potential issues as this stabilizes. Please report any problems on GitHub.
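To illustrate why the Send requirement listed above matters, here is a minimal sketch that moves a boxed model into a worker thread. The trait below is a stand-in written for this example: its single transcribe method, DummyModel, and transcribe_on_worker are hypothetical and are not the crate's actual definitions.

```rust
use std::thread;

// Stand-in trait for illustration only. The real SpeechModel trait in
// transcribe-rs has a different signature; only the `Send` bound is
// taken from the changelog above.
trait SpeechModel: Send {
    // Hypothetical method: returns a fake transcript for the samples.
    fn transcribe(&mut self, samples: &[f32]) -> String;
}

// Hypothetical engine that only reports how many samples it received.
struct DummyModel;

impl SpeechModel for DummyModel {
    fn transcribe(&mut self, samples: &[f32]) -> String {
        format!("{} samples", samples.len())
    }
}

// Moves the boxed model into a worker thread and returns its output.
// This compiles because the trait object is `Send`.
fn transcribe_on_worker(mut model: Box<dyn SpeechModel + Send>, samples: Vec<f32>) -> String {
    thread::spawn(move || model.transcribe(&samples))
        .join()
        .expect("worker thread panicked")
}

fn main() {
    let model: Box<dyn SpeechModel + Send> = Box::new(DummyModel);
    let text = transcribe_on_worker(model, vec![0.0; 16_000]);
    println!("{text}");
}
```

Without the Send bound, the closure passed to thread::spawn would be rejected at compile time, so callers would have to keep each model on the thread that created it.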
Installation
[dependencies]
transcribe-rs = { version = "0.3", features = ["onnx"] }
No features are enabled by default. Pick the engines you need:
| Feature | Engines |
|---|---|
| onnx | Parakeet, Canary, Moonshine, SenseVoice, GigaAM (via ONNX Runtime) |
| whisper-cpp | Whisper (local, GGML via whisper.cpp with Metal/Vulkan) |
| whisperfile | Whisperfile (local server wrapper) |
| openai | OpenAI API (remote, async) |
| all | Everything above |
GPU accelerator features for ORT engines:
| Feature | Backend |
|---|---|
| ort-cuda | NVIDIA CUDA |
| ort-rocm | AMD ROCm |
| ort-directml | Microsoft DirectML (Windows) |
Quick Start
use ;
use Quantization;
use PathBuf;
let mut model = load?;
let samples = read_wav_samples?;
let result = model.transcribe_with?;
println!;
All local engines implement the SpeechModel trait. Remote engines (OpenAI) implement RemoteTranscriptionEngine separately because they are async and file-based.
Hardware Acceleration
Engines run on the CPU by default. For GPU acceleration, turn on the matching Cargo feature and set the accelerator preference before loading any models:
use ;
// Use CUDA for all ORT engines (SenseVoice, GigaAM, Parakeet, Moonshine)
set_ort_accelerator;
// Or auto-detect the best available GPU
set_ort_accelerator;
For whisper.cpp, GPU backend (Metal, Vulkan) is selected at compile time. You can control whether GPU is used at runtime:
use ;
set_whisper_accelerator; // force CPU
DirectML note: DirectML requires special ORT session settings (parallel_execution(false), memory_pattern(false)) that would hurt performance on other backends. Because of this, Auto mode does not include DirectML — you must explicitly select it with OrtAccelerator::DirectMl.
Query which ORT accelerators are compiled in with OrtAccelerator::available().
Usage by Engine
Canary
use ;
use Quantization;
use PathBuf;
let mut model = load?;
let samples = read_wav_samples?;
let result = model.transcribe_with?;
Canary supports translation via target_language:
let result = model.transcribe_with?;
Model variant (Flash vs V2) is auto-detected from vocabulary size. Flash models support en/de/es/fr; V2 supports 25 languages.
Features:
- PnC (punctuation and capitalization): enabled by default. When on, the model adds proper punctuation and capitalization. Set use_pnc: false for raw output.
- ITN (inverse text normalization): enabled by default. Converts spoken numbers to written form (e.g. "one hundred twenty three" becomes "123"). Set use_itn: false to disable. Only supported on V2 models; silently ignored on Flash.
- Translation: set target_language to translate between supported languages.
SenseVoice
use ;
use Quantization;
use PathBuf;
let mut model = load?;
let samples = read_wav_samples?;
let result = model.transcribe_with?;
Moonshine
use ;
use Quantization;
use SpeechModel;
use PathBuf;
let mut model = load?;
let result = model.transcribe_file?;
Streaming variant:
use StreamingModel;
use Quantization;
use SpeechModel;
use PathBuf;
let mut model = load?;
let result = model.transcribe_file?;
GigaAM
use GigaAMModel;
use Quantization;
use SpeechModel;
use PathBuf;
let mut model = load?;
let result = model.transcribe_file?;
Whisper (whisper.cpp)
use ;
use PathBuf;
let mut engine = load?;
let samples = read_wav_samples?;
let result = engine.transcribe_with?;
Whisperfile
use ;
use PathBuf;
let mut engine = load_with_params?;
let samples = read_wav_samples?;
let result = engine.transcribe_with?;
// Server shuts down automatically when engine is dropped.
OpenAI (Remote)
use ;
use ;
use PathBuf;
async
Models
All audio input must be 16 kHz, mono, 16-bit PCM WAV.
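Since the input format is strict, it can help to check a file's header before handing it to an engine. The following is a minimal sketch using only the standard library; check_wav_format and wav_header are hypothetical helpers written for this example and are not part of transcribe-rs, and whether the crate performs a similar check itself is not stated here.

```rust
// Little-endian readers; callers guarantee the indices are in bounds.
fn le_u16(b: &[u8], at: usize) -> u16 {
    u16::from_le_bytes([b[at], b[at + 1]])
}

fn le_u32(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

// Returns Ok(()) if `bytes` begins a RIFF/WAVE file whose `fmt ` chunk
// declares PCM, 1 channel, 16 000 Hz, 16 bits per sample.
fn check_wav_format(bytes: &[u8]) -> Result<(), String> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return Err("not a RIFF/WAVE file".to_string());
    }
    // Walk the chunk list until the `fmt ` chunk is found.
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let size = le_u32(bytes, pos + 4) as usize;
        let body = pos + 8;
        if &bytes[pos..pos + 4] == b"fmt " {
            if size < 16 || body + 16 > bytes.len() {
                return Err("truncated fmt chunk".to_string());
            }
            let format = le_u16(bytes, body); // 1 means integer PCM
            let channels = le_u16(bytes, body + 2);
            let rate = le_u32(bytes, body + 4);
            let bits = le_u16(bytes, body + 14);
            if format != 1 || channels != 1 || rate != 16_000 || bits != 16 {
                return Err(format!(
                    "expected PCM/1 ch/16000 Hz/16 bit, got format {format}, \
                     {channels} ch, {rate} Hz, {bits} bit"
                ));
            }
            return Ok(());
        }
        // Chunk bodies are padded to an even number of bytes.
        pos = body + size + (size & 1);
    }
    Err("no fmt chunk found".to_string())
}

// Builds a 44-byte PCM WAV header with an empty data chunk (demo only).
fn wav_header(channels: u16, rate: u32, bits: u16) -> Vec<u8> {
    let block_align = channels * (bits / 8);
    let mut h = Vec::with_capacity(44);
    h.extend_from_slice(b"RIFF");
    h.extend_from_slice(&36u32.to_le_bytes()); // file size minus 8
    h.extend_from_slice(b"WAVEfmt ");
    h.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size for PCM
    h.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = PCM
    h.extend_from_slice(&channels.to_le_bytes());
    h.extend_from_slice(&rate.to_le_bytes());
    h.extend_from_slice(&(rate * block_align as u32).to_le_bytes()); // byte rate
    h.extend_from_slice(&block_align.to_le_bytes());
    h.extend_from_slice(&bits.to_le_bytes());
    h.extend_from_slice(b"data");
    h.extend_from_slice(&0u32.to_le_bytes()); // no audio samples
    h
}

fn main() {
    println!("{:?}", check_wav_format(&wav_header(1, 16_000, 16)));
    println!("{:?}", check_wav_format(&wav_header(2, 44_100, 16)));
}
```

In practice the bytes would come from std::fs::read on the audio file; files that fail the check need resampling or downmixing with an external tool first.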
Model Downloads
| Engine | Download |
|---|---|
| Parakeet (int8) | blob.handy.computer / HuggingFace |
| Canary 180M Flash | HuggingFace |
| Canary 1B Flash | HuggingFace |
| Canary 1B v2 | HuggingFace |
| SenseVoice (int8) | blob.handy.computer / sherpa-onnx |
| Moonshine | HuggingFace |
| GigaAM | HuggingFace |
| Whisper (GGML) | HuggingFace |
| Whisperfile binary | GitHub |
Directory Layouts
Parakeet (directory):
models/parakeet-tdt-0.6b-v3-int8/
├── encoder-model.int8.onnx
├── decoder_joint-model.int8.onnx
├── nemo128.onnx
└── vocab.txt
Canary (directory):
models/canary-1b-v2/
├── encoder-model.int8.onnx
├── decoder-model.int8.onnx
├── nemo128.onnx
└── vocab.txt
SenseVoice (directory):
models/sense-voice/
├── model.int8.onnx
└── tokens.txt
Moonshine (directory):
models/moonshine-base/
├── encoder_model.onnx
├── decoder_model_merged.onnx
└── tokenizer.json
Moonshine Streaming (directory):
models/moonshine-streaming/moonshine-tiny-streaming-en/
├── encoder.onnx
├── decoder.onnx
├── streaming_config.json
└── tokenizer.json
GigaAM (directory):
models/giga-am-v3/
├── model.onnx (or model.int8.onnx)
└── vocab.txt
Whisper: single file (e.g. whisper-medium-q4_1.bin).
Moonshine Variants
| Variant | Language |
|---|---|
| Tiny | English |
| TinyAr | Arabic |
| TinyZh | Chinese |
| TinyJa | Japanese |
| TinyKo | Korean |
| TinyUk | Ukrainian |
| TinyVi | Vietnamese |
| Base | English |
| BaseEs | Spanish |
Examples and Tests
Each engine has an example in examples/. Run with the appropriate feature flag:
Tests are also feature-gated. Models must be present locally; tests skip gracefully if not found.
Whisperfile tests look for the binary at models/whisperfile-0.9.3 (override with WHISPERFILE_BIN) and model at models/ggml-small.bin (override with WHISPERFILE_MODEL). GigaAM tests require samples/russian.wav.
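The override-then-skip behavior described above can be sketched as follows. This is an assumption about how such a lookup might be written, not the crate's actual test code: resolve_path is a hypothetical helper, and treating an empty variable as unset is a choice made for this example.

```rust
use std::env;
use std::path::PathBuf;

// Picks the override when it is set and non-empty, otherwise the
// documented default path. (Ignoring empty values is an assumption.)
fn resolve_path(override_value: Option<String>, default: &str) -> PathBuf {
    match override_value {
        Some(v) if !v.is_empty() => PathBuf::from(v),
        _ => PathBuf::from(default),
    }
}

fn main() {
    // Environment variable names and defaults are the ones listed above.
    let bin = resolve_path(env::var("WHISPERFILE_BIN").ok(), "models/whisperfile-0.9.3");
    let model = resolve_path(env::var("WHISPERFILE_MODEL").ok(), "models/ggml-small.bin");

    // Skip instead of failing when the binary or model is not present.
    if !bin.exists() || !model.exists() {
        println!("skipping: {} or {} not found", bin.display(), model.display());
        return;
    }
    println!("would run with {} and {}", bin.display(), model.display());
}
```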
Development aliases from .cargo/config.toml:
Performance
Parakeet int8 benchmarks:
| Platform | Speed |
|---|---|
| MBP M4 Max | ~30x real-time |
| Zen 3 (5700X) | ~20x real-time |
| Skylake (i5-6500) | ~5x real-time |
| Jetson Nano CPU | ~5x real-time |
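To read these figures as wall-clock time, divide the audio duration by the speed factor. The helper below is a hypothetical illustration of that arithmetic, not part of the crate, and the factors it uses are the approximate values from the table.

```rust
use std::time::Duration;

// Estimated wall-clock time to transcribe `audio` when the engine runs
// `speed_factor` times faster than real time. The factor must be
// positive and finite, otherwise `div_f64` panics.
fn estimated_processing_time(audio: Duration, speed_factor: f64) -> Duration {
    audio.div_f64(speed_factor)
}

fn main() {
    let audio = Duration::from_secs(60);
    for (platform, factor) in [("MBP M4 Max", 30.0), ("Zen 3 (5700X)", 20.0), ("Skylake (i5-6500)", 5.0)] {
        let t = estimated_processing_time(audio, factor);
        println!("{platform}: about {t:?} for one minute of audio");
    }
}
```

At roughly 30x real-time, one minute of audio takes about two seconds; at roughly 5x it takes about twelve.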
Acknowledgments
- istupakov for the ONNX Parakeet, Canary, and GigaAM exports
- NVIDIA for Parakeet and Canary
- whisper.cpp
- jart / Mozilla AI for llamafile and Whisperfile
- UsefulSensors for Moonshine
- FunASR / sherpa-onnx for SenseVoice
- SberDevices for GigaAM