transcribe-rs
A Rust library providing unified transcription capabilities using multiple speech recognition engines. Currently supports Whisper and Parakeet (NeMo) models for accurate speech-to-text transcription.
Features
- Multiple Engines: Support for both Whisper and Parakeet transcription engines
- Flexible Model Loading: Load models with custom parameters (quantization, etc.)
- Timestamped Results: Get detailed timing information for transcribed segments
- Audio Processing: Built-in WAV file processing with proper format validation
- Unified API: Common trait-based interface for all transcription engines
Model Format Requirements
- Whisper: Expects a single GGML format file (e.g., whisper-medium-q4_1.bin)
- Parakeet: Expects a directory containing the model files (e.g., parakeet-v0.3/)
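The file-versus-directory distinction above can be checked before a model is loaded. The following is a minimal sketch using only the standard library. `ModelKind` and `check_model_path` are hypothetical names introduced for illustration and are not part of transcribe-rs.

```rust
use std::path::Path;

/// Engine kinds named in this README.
/// Illustrative only: this enum is an assumption, not a transcribe-rs type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ModelKind {
    Whisper,
    Parakeet,
}

/// Checks that `path` has the shape the engine expects:
/// a single file for Whisper, a directory for Parakeet.
/// It does not inspect the contents of the file or directory.
pub fn check_model_path(kind: ModelKind, path: &Path) -> Result<(), String> {
    match kind {
        ModelKind::Whisper if path.is_file() => Ok(()),
        ModelKind::Parakeet if path.is_dir() => Ok(()),
        ModelKind::Whisper => Err(format!(
            "expected a single GGML file at {}",
            path.display()
        )),
        ModelKind::Parakeet => Err(format!(
            "expected a model directory at {}",
            path.display()
        )),
    }
}
```

Running a check like this first can turn a confusing load failure into a clear message about the path.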
Quick Start
[dependencies]
transcribe-rs = { version = "0.2", features = ["whisper"] }
use std::path::PathBuf;
use transcribe_rs::{engines::whisper::WhisperEngine, TranscriptionEngine};
let mut engine = WhisperEngine::new();
engine.load_model(&PathBuf::from("models/whisper-medium-q4_1.bin"))?;
let result = engine.transcribe_file(&PathBuf::from("audio.wav"), None)?;
println!("Transcription: {}", result.text);
if let Some(segments) = result.segments {
    for segment in segments {
        println!(
            "[{:.2}s - {:.2}s]: {}",
            segment.start, segment.end, segment.text
        );
    }
}
# Ok::<(), Box<dyn std::error::Error>>(())
Audio Requirements
Input audio files must be:
- WAV format
- 16 kHz sample rate
- 16-bit samples
- Mono (single channel)
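To check a file against these requirements before transcribing it, the WAV header can be read directly. The sketch below uses only the standard library and assumes a standard RIFF/WAVE layout with a `fmt ` chunk. `WavFormat`, `read_wav_format`, and `meets_requirements` are hypothetical helpers, not part of transcribe-rs, and the library's own validation may differ.

```rust
/// Format fields read from a WAV file's `fmt ` chunk.
/// Hypothetical helper type, not part of transcribe-rs.
#[derive(Debug, PartialEq)]
pub struct WavFormat {
    pub channels: u16,
    pub sample_rate: u32,
    pub bits_per_sample: u16,
}

/// Scans the RIFF chunks in `bytes` for `fmt ` and returns its fields.
/// Returns None if the bytes are not a RIFF/WAVE file or the chunk is missing.
pub fn read_wav_format(bytes: &[u8]) -> Option<WavFormat> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    // Each chunk starts with a 4-byte id and a 4-byte little-endian size.
    while pos <= bytes.len() - 8 {
        let id = &bytes[pos..pos + 4];
        let size = u32::from_le_bytes([
            bytes[pos + 4],
            bytes[pos + 5],
            bytes[pos + 6],
            bytes[pos + 7],
        ]) as usize;
        let body = pos + 8;
        if id == b"fmt " {
            if size < 16 || body + 16 > bytes.len() {
                return None;
            }
            let f = &bytes[body..body + 16];
            return Some(WavFormat {
                channels: u16::from_le_bytes([f[2], f[3]]),
                sample_rate: u32::from_le_bytes([f[4], f[5], f[6], f[7]]),
                bits_per_sample: u16::from_le_bytes([f[14], f[15]]),
            });
        }
        // Chunk bodies are padded to an even number of bytes.
        pos = body.saturating_add(size).saturating_add(size & 1);
    }
    None
}

/// True when the format matches the requirements listed above:
/// mono, 16 kHz, 16-bit samples.
pub fn meets_requirements(fmt: &WavFormat) -> bool {
    fmt.channels == 1 && fmt.sample_rate == 16_000 && fmt.bits_per_sample == 16
}
```

A caller could read the file with `std::fs::read`, pass the bytes to `read_wav_format`, and resample or convert the audio with an external tool when `meets_requirements` returns false.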