Crate parakeet_rs

§parakeet-rs

Rust bindings for NVIDIA’s Parakeet speech recognition model using ONNX Runtime.

Parakeet is a state-of-the-art automatic speech recognition (ASR) model developed by NVIDIA, based on the FastConformer-TDT architecture with 600 million parameters.

§Features

  • Easy-to-use API for speech-to-text transcription
  • Support for ONNX format models
  • 16 kHz mono audio input
  • Punctuation and capitalization included in output
  • Fast inference using ONNX Runtime

§Quick Start

use parakeet_rs::{Parakeet, Transcriber, TimestampMode};

// Load the model
let mut parakeet = Parakeet::from_pretrained(".")?;

// Transcribe audio samples (see examples/raw.rs for audio loading)
let result = parakeet.transcribe_samples(audio, sample_rate, channels, Some(TimestampMode::Words))?;
println!("Transcription: {}", result.text);

§Model Requirements

Your model directory should contain:

  • model.onnx - The ONNX model file
  • model.onnx_data - External model weights
  • config.json - Model configuration
  • preprocessor_config.json - Audio preprocessing configuration
  • tokenizer.json - Tokenizer vocabulary
  • tokenizer_config.json - Tokenizer configuration
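
A missing file in the model directory is easier to diagnose before loading than after. The sketch below checks a directory against the file list above using only the standard library. `missing_model_files` and `REQUIRED_FILES` are hypothetical names introduced for illustration; they are not part of the parakeet_rs API, and the crate may perform its own validation.

```rust
use std::path::{Path, PathBuf};

/// File names taken from the "Model Requirements" list.
pub const REQUIRED_FILES: [&str; 6] = [
    "model.onnx",
    "model.onnx_data",
    "config.json",
    "preprocessor_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
];

/// Hypothetical helper (not provided by parakeet_rs): returns the paths of
/// required files that do not exist as regular files inside `model_dir`.
pub fn missing_model_files(model_dir: &Path) -> Vec<PathBuf> {
    REQUIRED_FILES
        .iter()
        .map(|name| model_dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}
```

Calling `missing_model_files(Path::new("."))` before `Parakeet::from_pretrained(".")` and reporting any returned paths gives a clearer message than a failed load would.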

§Audio Requirements

  • Format: WAV
  • Sample Rate: 16 kHz
  • Channels: Mono (stereo will be converted automatically)
  • Bit Depth: 16-bit PCM or 32-bit float
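
For callers who prepare samples themselves, the requirements above amount to two small conversions: scaling 16-bit PCM to floating point and averaging interleaved channels into mono. The sketch below shows one common way to do both with the standard library. The function names are hypothetical, and averaging the channels is an assumption about what a reasonable downmix looks like, not a description of how parakeet_rs converts stereo internally. Resampling to 16 kHz is not covered.

```rust
/// Convert 16-bit PCM samples to f32 in the range [-1.0, 1.0).
pub fn pcm16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| f32::from(s) / 32768.0).collect()
}

/// Average interleaved channels into a single mono channel.
/// Input with zero or one channel is returned unchanged; a trailing
/// incomplete frame is dropped.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    if channels <= 1 {
        return interleaved.to_vec();
    }
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

Since the crate states that stereo input is converted automatically, a manual downmix like this is only needed when the audio pipeline has to produce mono samples before they reach the transcriber.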

Structs§

ExecutionConfig
ModelConfigJson
Nemotron
Nemotron streaming ASR model (0.6B parameters). We don't apply mel normalization, unlike others…
NemotronEncoderCache
Encoder cache state for Nemotron streaming inference.
NemotronModel
Nemotron ONNX wrapper. The encoder and decoder_joint sessions are handled separately.
NemotronModelConfig
Configuration for Nemotron model dimensions.
Parakeet
ParakeetDecoder
ParakeetEOU
Parakeet RealTime EOU model for streaming ASR with end-of-utterance detection. Uses cache-aware streaming with audio buffering for pre-encode context.
ParakeetEOUModel
ParakeetModel
ParakeetTDT
Parakeet TDT model for multilingual ASR
PreprocessorConfig
SentencePieceVocab
Minimal SentencePiece vocabulary loader. Parses the protobuf .model file to extract token strings. Note that our vocab.rs cannot parse the protobuf format. This has not been tested with digit spacing yet, at least in this initial implementation.
TimedToken
TranscriptionResult

Enums§

Error
ExecutionProvider
TimestampMode
Timestamp output mode for transcription results

Traits§

Transcriber
Trait for common transcription functionality

Type Aliases§

Result