Crate parakeet_rs

§parakeet-rs

Rust bindings for NVIDIA’s Parakeet speech recognition model using ONNX Runtime.

Parakeet is a state-of-the-art automatic speech recognition (ASR) model developed by NVIDIA, based on the FastConformer-TDT architecture with 600 million parameters.

§Features

Easy-to-use API for speech-to-text transcription
Support for ONNX format models
16kHz mono audio input
Punctuation and capitalization included in output
Fast inference using ONNX Runtime

§Quick Start

use parakeet_rs::{Parakeet, Transcriber, TimestampMode};

// Load the model
let mut parakeet = Parakeet::from_pretrained(".")?;

// Transcribe audio samples (see examples/raw.rs for audio loading)
let result = parakeet.transcribe_samples(audio, sample_rate, channels, Some(TimestampMode::Words))?;
println!("Transcription: {}", result.text);

§Model Requirements

Your model directory should contain:

model.onnx - The ONNX model file
model.onnx_data - External model weights
config.json - Model configuration
preprocessor_config.json - Audio preprocessing configuration
tokenizer.json - Tokenizer vocabulary
tokenizer_config.json - Tokenizer configuration
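
A quick pre-flight check can make a missing file easier to diagnose before loading the model. The sketch below uses only the standard library and the file names listed above; `missing_model_files` and `REQUIRED_FILES` are hypothetical names introduced for illustration and are not part of parakeet_rs.

```rust
use std::path::Path;

/// File names taken from the "Model Requirements" list above.
const REQUIRED_FILES: [&str; 6] = [
    "model.onnx",
    "model.onnx_data",
    "config.json",
    "preprocessor_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
];

/// Hypothetical helper (not part of parakeet_rs): returns the required
/// file names that are not present as regular files in `model_dir`,
/// in the same order as `REQUIRED_FILES`.
pub fn missing_model_files(model_dir: &Path) -> Vec<&'static str> {
    REQUIRED_FILES
        .iter()
        .copied()
        .filter(|name| !model_dir.join(name).is_file())
        .collect()
}
```

Calling `missing_model_files(Path::new("."))` before `Parakeet::from_pretrained(".")` and reporting any returned names gives a clearer message than a generic load failure would.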

§Audio Requirements

Format: WAV
Sample Rate: 16kHz
Channels: Mono (stereo will be converted automatically)
Bit Depth: 16-bit PCM or 32-bit float
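
To make the requirements above concrete, here is a minimal standard-library sketch of the two conversions they imply: scaling 16-bit PCM to floating point and reducing interleaved stereo to mono. It illustrates the idea only. The function names are hypothetical, and averaging the channels is an assumption; the page does not say how the crate performs its automatic conversion.

```rust
/// Hypothetical helper: convert 16-bit PCM samples to f32 in [-1.0, 1.0).
pub fn pcm16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| f32::from(s) / 32768.0).collect()
}

/// Hypothetical helper: downmix interleaved multi-channel audio to mono
/// by averaging the samples of each frame (an assumed strategy).
/// Input with zero or one channel is returned unchanged.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    if channels <= 1 {
        return interleaved.to_vec();
    }
    interleaved
        .chunks_exact(channels) // any trailing partial frame is dropped
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

Because the crate converts stereo input on its own, a helper like this is mainly useful when preparing or inspecting audio outside the library.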

Structs§

ExecutionConfig
ModelConfigJson
Nemotron: Nemotron streaming ASR model (0.6B parameters). We don't apply mel normalization, unlike others…
NemotronEncoderCache: Encoder cache state for Nemotron streaming inference.
NemotronHandle: Shared handle to a loaded Nemotron model. The ONNX session is loaded once and reference-counted.
NemotronModel: Nemotron ONNX wrapper. The encoder and decoder_joint sessions are handled separately.
NemotronModelConfig: Configuration for Nemotron model dimensions.
Parakeet
ParakeetDecoder
ParakeetEOU: Parakeet RealTime EOU model for streaming ASR with end-of-utterance detection. Uses cache-aware streaming with audio buffering for pre-encode context.
ParakeetEOUHandle: Shared handle to a loaded ParakeetEOU model. The ONNX session is loaded once and reference-counted.
ParakeetEOUModel
ParakeetModel
ParakeetTDT: Parakeet TDT model for multilingual ASR
ParakeetUnified
ParakeetUnifiedHandle: Shared handle to a loaded ParakeetUnified model. The ONNX session is loaded once and reference-counted.
ParakeetUnifiedModel
PreprocessorConfig
SentencePieceVocab: Minimal SentencePiece vocabulary loader. Parses the protobuf .model file to extract token strings. Note that our vocab.rs cannot parse the protobuf format. Not yet tested with digit spacing, at least in this initial implementation.
TimedToken
TranscriptionResult
UnifiedModelConfig
UnifiedStreamingConfig

Enums§

Error
ExecutionProvider
TimestampMode: Timestamp output mode for transcription results

Traits§

Transcriber: Trait for common transcription functionality

Type Aliases§

Result
