parakeet-rs
Fast speech recognition with NVIDIA's Parakeet models via ONNX Runtime. Note: CoreML isn't stable with this model, so stick with the CPU (or another GPU EP such as CUDA). Even so, it's incredibly fast on my Mac M3 (16 GB) CPU compared to Whisper on Metal! :-)
Models
CTC (English-only): Fast & accurate
use parakeet_rs::Parakeet;
let mut parakeet = from_pretrained?;
let result = parakeet.transcribe?;
println!;
// Token-level timestamps
for token in result.tokens
TDT (Multilingual): 25 languages with auto-detection
use parakeet_rs::ParakeetTDT;
let mut parakeet = from_pretrained?;
let result = parakeet.transcribe?;
println!;
// Token-level timestamps
for token in result.tokens
Setup
CTC: Download from HuggingFace: model.onnx, model.onnx_data, tokenizer.json
TDT: Download from HuggingFace: encoder-model.onnx, encoder-model.onnx.data, decoder_joint-model.onnx, vocab.txt
Quantized versions available (int8). All files must be in the same directory.
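Since a missing file is an easy mistake to make, a pre-flight check can help. The following is a minimal sketch using only the standard library; `missing_model_files`, `CTC_FILES` and `TDT_FILES` are hypothetical names for illustration and are not part of parakeet-rs. The file names are the non-quantized ones listed above; quantized downloads may be named differently.

```rust
use std::path::Path;

/// Files listed above for the CTC model.
pub const CTC_FILES: [&str; 3] = ["model.onnx", "model.onnx_data", "tokenizer.json"];

/// Files listed above for the TDT model.
pub const TDT_FILES: [&str; 4] = [
    "encoder-model.onnx",
    "encoder-model.onnx.data",
    "decoder_joint-model.onnx",
    "vocab.txt",
];

/// Returns the names from `required` that are not regular files directly
/// inside `dir`. Hypothetical helper, not part of parakeet-rs.
pub fn missing_model_files(dir: &Path, required: &[&str]) -> Vec<String> {
    required
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .map(String::from)
        .collect()
}
```

Calling `missing_model_files(Path::new("."), &TDT_FILES)` before loading a model returns an empty list when everything is in place, and otherwise names the files that still need to be downloaded.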
GPU support (automatically falls back to CPU if the GPU provider fails):
parakeet-rs = { version = "0.1", features = ["cuda"] } # or tensorrt, webgpu, directml, rocm
use ;
let config = new.with_execution_provider;
let mut parakeet = from_pretrained?;
Features
- CTC: English with punctuation & capitalization
- TDT: 25 languages (bg, hr, cs, da, nl, en, et, fi, fr, de, el, hu, it, lv, lt, mt, pl, pt, ro, sk, sl, es, sv, ru, uk)
- Token-level timestamps
- Speaker diarization: see examples/pyannote.rs
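One thing token-level timestamps make possible is splitting a transcript into segments wherever the speaker pauses. The sketch below illustrates the idea on a hypothetical `TimedToken` type. The struct, its field names, and `group_by_pause` are assumptions made for this example and do not describe the library's actual token type. It also assumes each token's text carries its own spacing.

```rust
/// Hypothetical token with start/end times in seconds.
/// This is NOT the parakeet-rs token type; field names are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct TimedToken {
    pub text: String,
    pub start: f32,
    pub end: f32,
}

/// Merges consecutive tokens into segments, starting a new segment whenever
/// the silence between two tokens is longer than `max_gap` seconds.
pub fn group_by_pause(tokens: &[TimedToken], max_gap: f32) -> Vec<TimedToken> {
    let mut segments: Vec<TimedToken> = Vec::new();
    for tok in tokens {
        if let Some(seg) = segments.last_mut() {
            if tok.start - seg.end <= max_gap {
                // Close enough to the previous token: extend the current segment.
                seg.text.push_str(&tok.text);
                seg.end = tok.end;
                continue;
            }
        }
        // First token, or a pause longer than `max_gap`: open a new segment.
        segments.push(tok.clone());
    }
    segments
}
```

With a `max_gap` of around half a second, the resulting segments could serve as rough subtitle lines; the right threshold depends on the audio.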
Notes
- Audio: 16kHz mono WAV (16-bit PCM or 32-bit float)
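To check a file against this requirement before transcribing, the WAV header can be inspected directly. This is a minimal sketch using only the standard library; `WavFormat`, `parse_wav_format` and `is_supported_input` are hypothetical names, not part of parakeet-rs. It reads the standard RIFF `fmt ` chunk and does not handle the `WAVE_FORMAT_EXTENSIBLE` header variant, which it simply reports as unsupported.

```rust
/// Fields of interest from a WAV file's `fmt ` chunk.
#[derive(Debug, PartialEq)]
pub struct WavFormat {
    pub format_tag: u16, // 1 = integer PCM, 3 = IEEE float
    pub channels: u16,
    pub sample_rate: u32,
    pub bits_per_sample: u16,
}

/// Walks the RIFF chunks in `bytes` and decodes the `fmt ` chunk, if any.
pub fn parse_wav_format(bytes: &[u8]) -> Option<WavFormat> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12usize;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32::from_le_bytes([
            bytes[pos + 4],
            bytes[pos + 5],
            bytes[pos + 6],
            bytes[pos + 7],
        ]) as usize;
        let body = pos + 8;
        if id == b"fmt " {
            // The basic fmt chunk is 16 bytes, all little-endian.
            let f = bytes.get(body..body.checked_add(16)?)?;
            return Some(WavFormat {
                format_tag: u16::from_le_bytes([f[0], f[1]]),
                channels: u16::from_le_bytes([f[2], f[3]]),
                sample_rate: u32::from_le_bytes([f[4], f[5], f[6], f[7]]),
                bits_per_sample: u16::from_le_bytes([f[14], f[15]]),
            });
        }
        // Skip this chunk; chunk bodies are padded to an even length.
        pos = body.checked_add(size)?.checked_add(size & 1)?;
    }
    None
}

/// True for 16 kHz mono audio stored as 16-bit PCM or 32-bit float.
pub fn is_supported_input(fmt: &WavFormat) -> bool {
    let sample_ok = matches!((fmt.format_tag, fmt.bits_per_sample), (1, 16) | (3, 32));
    fmt.sample_rate == 16_000 && fmt.channels == 1 && sample_ok
}
```

Reading the file with `std::fs::read` and passing the bytes to `parse_wav_format` is enough for a quick check; audio that fails `is_supported_input` would need to be resampled or downmixed first.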
License
Code: MIT OR Apache-2.0
FYI: The Parakeet ONNX models (downloaded separately from HuggingFace) are licensed under CC-BY-4.0 by NVIDIA. This library does not distribute the models.