§parakeet-rs
Rust bindings for NVIDIA’s Parakeet speech recognition model using ONNX Runtime.
Parakeet is a state-of-the-art automatic speech recognition (ASR) model developed by NVIDIA, based on the FastConformer-TDT architecture with 600 million parameters.
§Features
- Easy-to-use API for speech-to-text transcription
- Support for ONNX format models
- 16kHz mono audio input
- Punctuation and capitalization included in output
- Fast inference using ONNX Runtime
§Quick Start
use parakeet_rs::{Parakeet, Transcriber, TimestampMode};
// Load the model
let mut parakeet = Parakeet::from_pretrained(".")?;
// Transcribe audio samples (see examples/raw.rs for audio loading)
let result = parakeet.transcribe_samples(audio, sample_rate, channels, Some(TimestampMode::Words))?;
println!("Transcription: {}", result.text);
§Model Requirements
Your model directory should contain:
- model.onnx - The ONNX model file
- model.onnx_data - External model weights
- config.json - Model configuration
- preprocessor_config.json - Audio preprocessing configuration
- tokenizer.json - Tokenizer vocabulary
- tokenizer_config.json - Tokenizer configuration
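Before loading, it can help to confirm that the directory actually holds every file listed above. The following is a minimal sketch using only the standard library. `REQUIRED_FILES` and `missing_model_files` are hypothetical names introduced here for illustration and are not part of the parakeet-rs API.

```rust
use std::path::{Path, PathBuf};

// File names taken from the "Model Requirements" list.
const REQUIRED_FILES: [&str; 6] = [
    "model.onnx",
    "model.onnx_data",
    "config.json",
    "preprocessor_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
];

/// Hypothetical helper (not provided by parakeet-rs): returns the paths of
/// required files that do not exist as regular files inside `dir`.
pub fn missing_model_files(dir: &Path) -> Vec<PathBuf> {
    REQUIRED_FILES
        .iter()
        .map(|name| dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}
```

Calling `missing_model_files(Path::new("."))` before `Parakeet::from_pretrained(".")` and reporting any returned paths gives a clearer message than a failure deep inside model loading.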
§Audio Requirements
- Format: WAV
- Sample Rate: 16kHz
- Channels: Mono (stereo will be converted automatically)
- Bit Depth: 16-bit PCM or 32-bit float
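To make the requirements above concrete, here is a small sketch of two common preprocessing steps: normalizing 16-bit PCM to floating point and averaging interleaved channels down to mono. Both functions are hypothetical helpers written for illustration. The crate states that stereo is converted automatically, and this sketch does not claim to reproduce how it does so internally.

```rust
// Hypothetical helpers for illustration; not part of parakeet-rs.

/// Converts 16-bit PCM samples to f32 values in the range [-1.0, 1.0).
pub fn pcm16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| f32::from(s) / 32768.0).collect()
}

/// Downmixes interleaved multi-channel audio to mono by averaging the
/// samples of each frame. Mono (or zero-channel) input is returned as-is.
/// Any trailing partial frame is dropped.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    if channels <= 1 {
        return interleaved.to_vec();
    }
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

Averaging keeps the result within the same amplitude range as the input channels, which is why it is a common default for downmixing. Note that neither helper resamples, so audio recorded at a rate other than 16kHz would still need a separate resampling step.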
Structs§
- ExecutionConfig
- ModelConfigJson
- Nemotron - Nemotron streaming ASR model (0.6B parameters). We don't apply mel normalization unlike others…
- NemotronEncoderCache - Encoder cache state for Nemotron streaming inference.
- NemotronModel - Nemotron ONNX wrapper. We handle the encoder and decoder_joint sessions separately.
- NemotronModelConfig - Configuration for Nemotron model dimensions.
- Parakeet
- ParakeetDecoder
- ParakeetEOU - Parakeet RealTime EOU model for streaming ASR with end-of-utterance detection. Uses cache-aware streaming with audio buffering for pre-encode context.
- ParakeetEOUModel
- ParakeetModel
- ParakeetTDT - Parakeet TDT model for multilingual ASR
- PreprocessorConfig
- SentencePieceVocab - Minimal SentencePiece vocabulary loader. Parses the protobuf .model file to extract token strings. Note that our vocab.rs cannot parse the protobuf format. I haven't tested it with digit spacing yet, at least for this initial implementation.
- TimedToken
- TranscriptionResult
Enums§
- Error
- ExecutionProvider
- TimestampMode - Timestamp output mode for transcription results
Traits§
- Transcriber - Trait for common transcription functionality