parakeet-rs
Fast speech recognition with NVIDIA's Parakeet models via ONNX Runtime. Note: CoreML isn't stable with this model, so stick with CPU (or another GPU execution provider such as CUDA). Even on CPU it's incredibly fast on my Mac M3 (16 GB) compared to Whisper on Metal! :-)
Models
CTC (English-only): Fast & accurate
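use parakeet_rs::Parakeet;
// "." (model directory) and "audio.wav" are placeholder paths; `None` uses the default execution config
let mut parakeet = Parakeet::from_pretrained(".", None)?;
let result = parakeet.transcribe_file("audio.wav")?;
println!("{}", result.text);
// Or transcribe in-memory audio
// let result = parakeet.transcribe_samples(audio, 16000, 1)?;
// Token-level timestamps
for token in result.tokens {
    println!("[{:.3}s - {:.3}s] {}", token.start, token.end, token.text);
}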
TDT (Multilingual): 25 languages with auto-detection
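use parakeet_rs::ParakeetTDT;
// "./tdt" (model directory) and "audio.wav" are placeholder paths; `None` uses the default execution config
let mut parakeet = ParakeetTDT::from_pretrained("./tdt", None)?;
let result = parakeet.transcribe_file("audio.wav")?;
println!("{}", result.text);
// Or transcribe in-memory audio
// let result = parakeet.transcribe_samples(audio, 16000, 1)?;
// Token-level timestamps
for token in result.tokens {
    println!("[{:.3}s - {:.3}s] {}", token.start, token.end, token.text);
}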
EOU (Streaming): Real-time ASR with end-of-utterance detection
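use parakeet_rs::ParakeetEOU;
// "./eou" is a placeholder model directory; `None` uses the default execution config
let mut parakeet = ParakeetEOU::from_pretrained("./eou", None)?;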
// Prepare your audio (Vec<f32>, 16kHz mono, normalized)
let audio: Vec<f32> = /* your audio samples */;
// Process in 160ms chunks for streaming
const CHUNK_SIZE: usize = 2560; // 160ms at 16kHz
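for chunk in audio.chunks(CHUNK_SIZE) {
    // The `false` flag is assumed to leave decoder state untouched at end of utterance; check the crate docs
    let text = parakeet.transcribe(chunk, false)?;
    print!("{}", text);
}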
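The chunk size above follows directly from the sample rate: 160 ms at 16 kHz is 16000 × 0.160 = 2560 samples. The sketch below uses only the standard library to show that arithmetic and what `slice::chunks` yields for a buffer whose length is not a multiple of the chunk size. The helper names `samples_per_chunk` and `chunk_lengths` are illustrative and not part of parakeet-rs.

```rust
/// Number of samples in a chunk of `chunk_ms` milliseconds at `sample_rate` Hz.
fn samples_per_chunk(sample_rate: usize, chunk_ms: usize) -> usize {
    sample_rate * chunk_ms / 1000
}

/// Lengths of the chunks that `slice::chunks` yields for `audio`.
/// The final chunk is shorter when the buffer length is not a multiple
/// of `chunk_size`. `chunk_size` must be non-zero, otherwise `chunks` panics.
fn chunk_lengths(audio: &[f32], chunk_size: usize) -> Vec<usize> {
    audio.chunks(chunk_size).map(|chunk| chunk.len()).collect()
}
```

For a 6000-sample buffer this gives two full 2560-sample chunks followed by one 880-sample remainder, so streaming code should expect a short last chunk.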
Sortformer v2 & v2.1 (Speaker Diarization): Streaming 4-speaker diarization
parakeet-rs = { version = "0.2", features = ["sortformer"] }
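use parakeet_rs::sortformer::{DiarizationConfig, Sortformer};
let mut sortformer = Sortformer::with_config(
    "diar_streaming_sortformer_4spk-v2.onnx", // or the v2.1 model file
    None, // default execution config
    DiarizationConfig::callhome(), // one preset; pick the config that matches your data
)?;
let segments = sortformer.diarize(audio, 16000, 1)?;
for seg in segments {
    println!("Speaker {} [{:.2}s - {:.2}s]", seg.speaker_id, seg.start, seg.end);
}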
See examples/diarization.rs for combining with TDT transcription.
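One common way to combine the two outputs is to label each transcribed token with the speaker whose segment overlaps it the most in time. The sketch below illustrates that idea only. It is not taken from examples/diarization.rs, and `Token`, `Segment`, `overlap`, and `assign_speakers` are hypothetical stand-ins defined here, not types or functions from parakeet-rs.

```rust
/// Hypothetical stand-in for a transcribed token (times in seconds).
struct Token {
    text: String,
    start: f32,
    end: f32,
}

/// Hypothetical stand-in for a diarization segment (times in seconds).
struct Segment {
    speaker_id: usize,
    start: f32,
    end: f32,
}

/// Length of the overlap between two intervals, or 0.0 if they are disjoint.
fn overlap(a_start: f32, a_end: f32, b_start: f32, b_end: f32) -> f32 {
    (a_end.min(b_end) - a_start.max(b_start)).max(0.0)
}

/// Pair each token's text with the speaker whose segment overlaps it most.
/// Tokens that overlap no segment get `None`.
fn assign_speakers<'a>(
    tokens: &'a [Token],
    segments: &[Segment],
) -> Vec<(&'a str, Option<usize>)> {
    tokens
        .iter()
        .map(|t| {
            let speaker = segments
                .iter()
                .map(|s| (s.speaker_id, overlap(t.start, t.end, s.start, s.end)))
                .filter(|&(_, ov)| ov > 0.0)
                .max_by(|a, b| a.1.total_cmp(&b.1))
                .map(|(id, _)| id);
            (t.text.as_str(), speaker)
        })
        .collect()
}
```

Maximum overlap is a simple heuristic: a token that straddles a speaker change goes to whichever speaker covers more of it.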
Setup
CTC: Download from HuggingFace: model.onnx, model.onnx_data, tokenizer.json
TDT: Download from HuggingFace: encoder-model.onnx, encoder-model.onnx.data, decoder_joint-model.onnx, vocab.txt
EOU: Download from HuggingFace: encoder.onnx, decoder_joint.onnx, tokenizer.json
Diarization (Sortformer v2 & v2.1): Download from HuggingFace: diar_streaming_sortformer_4spk-v2.onnx or v2.1.onnx.
Quantized versions available (int8). All files must be in the same directory.
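Since a missing file only shows up when the model is loaded, it can help to check the directory first. The following is a minimal standard-library sketch, not part of parakeet-rs: the `missing_files` helper and the constant names are made up for illustration, while the file names are the ones listed above.

```rust
use std::path::Path;

/// File names listed in the Setup section for each model family.
const CTC_FILES: &[&str] = &["model.onnx", "model.onnx_data", "tokenizer.json"];
const TDT_FILES: &[&str] = &[
    "encoder-model.onnx",
    "encoder-model.onnx.data",
    "decoder_joint-model.onnx",
    "vocab.txt",
];
const EOU_FILES: &[&str] = &["encoder.onnx", "decoder_joint.onnx", "tokenizer.json"];

/// Return the names from `required` that are not regular files inside `dir`,
/// in the same order as they appear in `required`.
fn missing_files(dir: &Path, required: &[&str]) -> Vec<String> {
    required
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .map(String::from)
        .collect()
}
```

Calling `missing_files(Path::new("./tdt"), TDT_FILES)` (with `./tdt` as a placeholder directory) returns an empty list when everything is in place, or the names still to be downloaded.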
GPU support (automatically falls back to CPU if the GPU provider fails):
parakeet-rs = { version = "0.1", features = ["cuda"] } # or tensorrt, webgpu, directml, rocm
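use parakeet_rs::{ExecutionConfig, ExecutionProvider, Parakeet};
let config = ExecutionConfig::new().with_execution_provider(ExecutionProvider::Cuda);
// "." is a placeholder model directory
let mut parakeet = Parakeet::from_pretrained(".", Some(config))?;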
Features
- CTC: English with punctuation & capitalization
- TDT: Multilingual (auto lang detection)
- EOU: Streaming ASR with end-of-utterance detection
- Sortformer v2 & v2.1: Streaming speaker diarization (up to 4 speakers). The v2.1 model can be downloaded the same way as v2.
- Token-level timestamps (CTC, TDT)
Notes
- Audio: 16kHz mono WAV (16-bit PCM or 32-bit float)
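If you build the in-memory buffer yourself (for example for the streaming API, which expects normalized `Vec<f32>` mono samples), 16-bit PCM has to be scaled to floating point and multi-channel audio averaged down to one channel. The sketch below shows one conventional way to do both with the standard library. The helper names are illustrative, and the library may already handle some of this when given the sample rate and channel count.

```rust
/// Convert 16-bit PCM samples to f32 in the range [-1.0, 1.0).
fn pcm16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}

/// Downmix interleaved multi-channel audio to mono by averaging each frame.
/// Trailing samples that do not form a complete frame are dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channels must be at least 1");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

Neither helper resamples: audio recorded at another rate still needs converting to 16 kHz separately.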
License
Code: MIT OR Apache-2.0
FYI: The Parakeet ONNX models (downloaded separately from HuggingFace) are licensed under CC-BY-4.0 by NVIDIA. This library does not distribute the models.