parakeet-rs
Fast English speech recognition with NVIDIA's Parakeet model via ONNX Runtime. Note: CoreML doesn't work with this model, so stick with the CPU (or another GPU execution provider such as CUDA). Even so, it's incredibly fast on my Mac M3 (16 GB) CPU compared to Whisper on Metal! :-)
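use parakeet_rs::Parakeet;

// Argument values and field names here are illustrative; see the crate docs for the exact API.
let mut parakeet = Parakeet::from_pretrained(".")?; // directory holding the model files
let result = parakeet.transcribe("audio.wav")?;
println!("{}", result.text);

// Token-level timestamps
for token in result.tokens {
    println!("[{:.3}s - {:.3}s] {}", token.start, token.end, token.text);
}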
Setup
Download from HuggingFace: model.onnx, model.onnx_data, tokenizer.json
Quantized versions are also available (fp16, int8, q4). All 3 files must be in the same directory.
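Since all three files have to sit next to each other, a quick check before loading can save a confusing runtime error. The snippet below is a minimal, standard-library-only sketch; `missing_model_files` is a hypothetical helper (not part of parakeet-rs), and it assumes the default file names listed above. Quantized downloads may use different names.

```rust
use std::path::{Path, PathBuf};

/// File names from the Setup section (assumed unchanged after download).
const REQUIRED_FILES: [&str; 3] = ["model.onnx", "model.onnx_data", "tokenizer.json"];

/// Hypothetical helper: returns the required files that are missing from `dir`.
/// An empty result means the directory looks complete.
fn missing_model_files(dir: &Path) -> Vec<PathBuf> {
    REQUIRED_FILES
        .iter()
        .map(|name| dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}
```

Calling `missing_model_files(Path::new("."))` and printing the result before loading the model makes it obvious which download was skipped.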
GPU support:
parakeet-rs = { version = "0.x", features = ["cuda"] }
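// Type names are illustrative; see the crate docs for the exact API.
use parakeet_rs::{ExecutionConfig, ExecutionProvider, Parakeet};

let config = ExecutionConfig::new().with_execution_provider(ExecutionProvider::Cuda);
let mut parakeet = Parakeet::from_pretrained_with_config(".", config)?;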
Features
- English transcription with punctuation & capitalization
- Token-level timestamps from CTC output
- Batch processing: transcribe_batch(&["a.wav", "b.wav"]), etc.
- See examples/pyannote.rs for speaker diarization + transcription.
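For readers wondering how token-level timestamps can fall out of CTC output, here is a generic greedy-decoding sketch. It is not taken from this crate's source: `TimedToken`, `ctc_greedy_with_times`, the `blank_id` argument, and the `frame_secs` frame duration are all assumptions made for illustration. The idea is to pick the best class per frame, merge runs of the same class, drop blanks, and convert frame indices to seconds.

```rust
/// One decoded token id with its time span in seconds (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct TimedToken {
    id: usize,
    start: f32,
    end: f32,
}

/// Index of the largest value in a frame of scores (0 for an empty frame).
fn argmax(row: &[f32]) -> usize {
    let mut best = 0;
    for (i, &v) in row.iter().enumerate() {
        if v > row[best] {
            best = i;
        }
    }
    best
}

/// Greedy CTC decoding with timestamps.
/// `frames[t]` holds per-class scores for frame `t`; `frame_secs` is the
/// assumed duration of one output frame and depends on the model.
fn ctc_greedy_with_times(frames: &[Vec<f32>], blank_id: usize, frame_secs: f32) -> Vec<TimedToken> {
    let mut tokens = Vec::new();
    let mut run: Option<(usize, usize)> = None; // (class id, first frame of the run)
    // A trailing `None` flushes the last run after the final frame.
    let best_ids = frames
        .iter()
        .map(|f| Some(argmax(f)))
        .chain(std::iter::once(None));
    for (t, id) in best_ids.enumerate() {
        if let Some((run_id, start)) = run {
            if id == Some(run_id) {
                continue; // same class as the previous frame: extend the run
            }
            if run_id != blank_id {
                tokens.push(TimedToken {
                    id: run_id,
                    start: start as f32 * frame_secs,
                    end: t as f32 * frame_secs,
                });
            }
        }
        run = id.map(|i| (i, t));
    }
    tokens
}
```

The resulting ids would still need to be mapped to text pieces with the vocabulary in tokenizer.json. A blank between two identical ids keeps them as separate tokens, which is the usual CTC convention.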
Notes
- Audio: 16kHz mono WAV (16-bit PCM or 32-bit float)
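To check whether a file matches the audio requirement before transcribing, the WAV header can be inspected directly. This is a standard-library-only sketch, not part of the crate: `WavFormat`, `parse_wav_format`, and `is_supported` are hypothetical names. It reads the `fmt ` chunk of a RIFF/WAVE file and accepts only 16 kHz mono in 16-bit PCM (format tag 1) or 32-bit float (format tag 3); files using the WAVE_FORMAT_EXTENSIBLE tag would be rejected by this simple check.

```rust
/// Fields of the WAV `fmt ` chunk that matter for the check (hypothetical type).
#[derive(Debug, PartialEq)]
struct WavFormat {
    format_tag: u16, // 1 = integer PCM, 3 = IEEE float
    channels: u16,
    sample_rate: u32,
    bits_per_sample: u16,
}

/// Walks the RIFF chunks in `bytes` and parses the first `fmt ` chunk.
/// Returns `None` if the data is not a WAVE file or has no complete `fmt ` chunk.
fn parse_wav_format(bytes: &[u8]) -> Option<WavFormat> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32::from_le_bytes([
            bytes[pos + 4],
            bytes[pos + 5],
            bytes[pos + 6],
            bytes[pos + 7],
        ]) as usize;
        let body = pos + 8;
        if id == b"fmt " {
            let f = bytes.get(body..body + 16)?;
            return Some(WavFormat {
                format_tag: u16::from_le_bytes([f[0], f[1]]),
                channels: u16::from_le_bytes([f[2], f[3]]),
                sample_rate: u32::from_le_bytes([f[4], f[5], f[6], f[7]]),
                bits_per_sample: u16::from_le_bytes([f[14], f[15]]),
            });
        }
        // Chunk bodies are padded to an even number of bytes.
        pos = body + size + (size & 1);
    }
    None
}

/// True for 16 kHz mono audio stored as 16-bit PCM or 32-bit float.
fn is_supported(fmt: &WavFormat) -> bool {
    let sample_ok = matches!((fmt.format_tag, fmt.bits_per_sample), (1, 16) | (3, 32));
    fmt.sample_rate == 16_000 && fmt.channels == 1 && sample_ok
}
```

Reading the first few kilobytes of a file with `std::fs::read` and passing them to `parse_wav_format` is enough in practice, since the `fmt ` chunk normally sits near the start of the file.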
License
Code: MIT OR Apache-2.0
FYI: The Parakeet ONNX models (downloaded separately from HuggingFace) are licensed under CC-BY-4.0 by NVIDIA. This library does not distribute the models.