transcribe-rs

A Rust library that provides a single transcription API on top of multiple speech recognition engines. It currently supports Whisper and NVIDIA NeMo Parakeet models for speech-to-text transcription.

Features

Multiple Engines: Support for both Whisper and Parakeet transcription engines
Flexible Model Loading: Load models with custom parameters (quantization, etc.)
Timestamped Results: Get detailed timing information for transcribed segments
Audio Processing: Built-in WAV file loading with format validation
Unified API: A common trait (TranscriptionEngine) implemented by every engine
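The unified-API idea can be illustrated with a small, self-contained sketch. It does not use transcribe-rs: the trait Transcriber, the types Transcript and Segment, and the toy EchoEngine are hypothetical stand-ins, not the crate's real TranscriptionEngine API. The sketch shows how caller code written against one trait can drive several engines interchangeably, and how timestamped segments (start and end in seconds) can travel alongside the full text.

```rust
use std::path::Path;

/// Hypothetical timestamped segment: start/end in seconds plus its text.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start: f32,
    pub end: f32,
    pub text: String,
}

/// Hypothetical result type: full text plus optional per-segment timing.
#[derive(Debug, Clone, PartialEq)]
pub struct Transcript {
    pub text: String,
    pub segments: Option<Vec<Segment>>,
}

/// Hypothetical shared engine trait (not the crate's TranscriptionEngine).
pub trait Transcriber {
    fn load_model(&mut self, path: &Path) -> Result<(), String>;
    fn transcribe_samples(&mut self, samples: &[f32]) -> Result<Transcript, String>;
}

/// Toy engine that "transcribes" by reporting how much 16 kHz audio it received.
pub struct EchoEngine {
    label: String,
    loaded: bool,
}

impl EchoEngine {
    pub fn new(label: &str) -> Self {
        Self { label: label.to_string(), loaded: false }
    }
}

impl Transcriber for EchoEngine {
    fn load_model(&mut self, _path: &Path) -> Result<(), String> {
        // A real engine would read model weights here; the toy just flips a flag.
        self.loaded = true;
        Ok(())
    }

    fn transcribe_samples(&mut self, samples: &[f32]) -> Result<Transcript, String> {
        if !self.loaded {
            return Err(format!("{}: model not loaded", self.label));
        }
        let secs = samples.len() as f32 / 16_000.0;
        let text = format!("{} heard {:.1}s of audio", self.label, secs);
        Ok(Transcript {
            text: text.clone(),
            segments: Some(vec![Segment { start: 0.0, end: secs, text }]),
        })
    }
}

/// Caller code depends only on the trait, so engines can be swapped freely.
pub fn run_all(engines: &mut [Box<dyn Transcriber>], samples: &[f32]) -> Vec<Result<Transcript, String>> {
    engines.iter_mut().map(|e| e.transcribe_samples(samples)).collect()
}

fn main() {
    let mut engines: Vec<Box<dyn Transcriber>> = vec![
        Box::new(EchoEngine::new("whisper-like")),
        Box::new(EchoEngine::new("parakeet-like")),
    ];
    // Only the first engine gets a model, so the second reports an error.
    engines[0].load_model(Path::new("model.bin")).unwrap();
    let audio = vec![0.0_f32; 32_000]; // 2 seconds of silence at 16 kHz
    for result in run_all(&mut engines, &audio) {
        match result {
            Ok(t) => println!("{}", t.text),
            Err(e) => println!("error: {e}"),
        }
    }
}
```

Using trait objects (Box<dyn Transcriber>) keeps the engine choice a runtime decision; a generic function bounded by the trait would work equally well when the engine is known at compile time.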

Model Format Requirements

Whisper: Expects a single GGML format file (e.g., whisper-medium-q4_1.bin)
Parakeet: Expects a directory containing the model files (e.g., parakeet-v0.3/)
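A minimal sketch of checking the on-disk layout before loading, assuming only the file-versus-directory rule stated above. ModelKind and check_model_path are hypothetical helpers, not part of transcribe-rs, and the check says nothing about whether the contents are a valid model.

```rust
use std::path::Path;

/// Hypothetical engine kinds for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ModelKind {
    Whisper,
    Parakeet,
}

/// Checks the on-disk shape only: Whisper = one regular file, Parakeet = a directory.
pub fn check_model_path(kind: ModelKind, path: &Path) -> Result<(), String> {
    match kind {
        ModelKind::Whisper if path.is_file() => Ok(()),
        ModelKind::Whisper => Err(format!("Whisper expects a single GGML file, got {}", path.display())),
        ModelKind::Parakeet if path.is_dir() => Ok(()),
        ModelKind::Parakeet => Err(format!("Parakeet expects a model directory, got {}", path.display())),
    }
}

fn main() {
    // Create a throwaway directory and placeholder file under the OS temp dir.
    let dir = std::env::temp_dir().join("parakeet-v0.3-sketch");
    std::fs::create_dir_all(&dir).unwrap();
    let file = dir.join("whisper-medium-q4_1.bin");
    std::fs::write(&file, b"placeholder").unwrap();

    println!("{:?}", check_model_path(ModelKind::Whisper, &file));
    println!("{:?}", check_model_path(ModelKind::Parakeet, &dir));
    // Passing a directory where a single file is expected is rejected.
    println!("{:?}", check_model_path(ModelKind::Whisper, &dir));
}
```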

Quick Start

Add the dependency to Cargo.toml:

[dependencies]
transcribe-rs = { version = "0.2", features = ["whisper"] }

Then load a model and transcribe a WAV file:

use std::path::PathBuf;
use transcribe_rs::{engines::whisper::WhisperEngine, TranscriptionEngine};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load a GGML-format Whisper model from a single file.
    let mut engine = WhisperEngine::new();
    engine.load_model(&PathBuf::from("models/whisper-medium-q4_1.bin"))?;

    // Transcribe a 16 kHz, 16-bit, mono WAV file; `None` passes no custom parameters.
    let result = engine.transcribe_file(&PathBuf::from("audio.wav"), None)?;
    println!("Transcription: {}", result.text);

    // Print per-segment timestamps (in seconds) when the engine provides them.
    if let Some(segments) = result.segments {
        for segment in segments {
            println!(
                "[{:.2}s - {:.2}s]: {}",
                segment.start, segment.end, segment.text
            );
        }
    }

    Ok(())
}

Audio Requirements

Input audio files must be:

WAV format
16 kHz sample rate
16-bit samples
Mono (single channel)
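The following sketch shows one way to check these four requirements from a WAV header using only the standard library. parse_wav, check_requirements, read_fmt, and build_wav are hypothetical helpers written for illustration; this is not the crate's built-in validation, and it handles only uncompressed PCM files whose "fmt " chunk precedes the "data" chunk. It also derives the duration: at 16 kHz with 2 bytes per sample, the duration in seconds is the data size in bytes divided by 32,000.

```rust
/// Header fields read from a WAV file (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct WavInfo {
    pub channels: u16,
    pub sample_rate: u32,
    pub bits_per_sample: u16,
    pub data_bytes: u32,
}

fn u16_le(b: &[u8], at: usize) -> Option<u16> {
    let s = b.get(at..at + 2)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

fn u32_le(b: &[u8], at: usize) -> Option<u32> {
    let s = b.get(at..at + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Reads (audio format, channels, sample rate, bits per sample) from a fmt chunk body.
fn read_fmt(b: &[u8], body: usize) -> Option<(u16, u16, u32, u16)> {
    Some((u16_le(b, body)?, u16_le(b, body + 2)?, u32_le(b, body + 4)?, u16_le(b, body + 14)?))
}

/// Walks the RIFF chunk list and returns the "fmt " fields plus the "data" chunk size.
pub fn parse_wav(bytes: &[u8]) -> Result<WavInfo, String> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return Err("not a RIFF/WAVE file".into());
    }
    let mut fmt_fields: Option<(u16, u16, u32, u16)> = None;
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32_le(bytes, pos + 4).ok_or("truncated chunk header")? as usize;
        let body = pos + 8;
        if id == b"fmt " {
            fmt_fields = Some(read_fmt(bytes, body).ok_or("truncated fmt chunk")?);
        } else if id == b"data" {
            let (audio_format, channels, sample_rate, bits_per_sample) =
                fmt_fields.ok_or("data chunk appears before fmt chunk")?;
            if audio_format != 1 {
                return Err(format!("expected PCM (format 1), got format {audio_format}"));
            }
            return Ok(WavInfo { channels, sample_rate, bits_per_sample, data_bytes: size as u32 });
        }
        pos = body + size + (size & 1); // RIFF chunks are padded to an even length
    }
    Err("no data chunk found".into())
}

/// Enforces 16 kHz / 16-bit / mono and returns the duration in seconds.
pub fn check_requirements(info: &WavInfo) -> Result<f64, String> {
    if info.sample_rate != 16_000 {
        return Err(format!("sample rate {} Hz, expected 16000", info.sample_rate));
    }
    if info.bits_per_sample != 16 {
        return Err(format!("{}-bit samples, expected 16-bit", info.bits_per_sample));
    }
    if info.channels != 1 {
        return Err(format!("{} channels, expected mono", info.channels));
    }
    // 16_000 samples/s * 2 bytes/sample = 32_000 bytes per second of audio.
    Ok(info.data_bytes as f64 / 32_000.0)
}

/// Builds a minimal PCM WAV (44-byte header) in memory, for demonstration only.
fn build_wav(channels: u16, sample_rate: u32, bits: u16, data: &[u8]) -> Vec<u8> {
    let block_align = channels * bits / 8;
    let byte_rate = sample_rate * block_align as u32;
    let mut v = Vec::with_capacity(44 + data.len());
    v.extend_from_slice(b"RIFF");
    v.extend_from_slice(&(36 + data.len() as u32).to_le_bytes());
    v.extend_from_slice(b"WAVE");
    v.extend_from_slice(b"fmt ");
    v.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk body size
    v.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    v.extend_from_slice(&channels.to_le_bytes());
    v.extend_from_slice(&sample_rate.to_le_bytes());
    v.extend_from_slice(&byte_rate.to_le_bytes());
    v.extend_from_slice(&block_align.to_le_bytes());
    v.extend_from_slice(&bits.to_le_bytes());
    v.extend_from_slice(b"data");
    v.extend_from_slice(&(data.len() as u32).to_le_bytes());
    v.extend_from_slice(data);
    v
}

fn main() {
    // One second of 16 kHz, 16-bit mono silence: 16_000 samples * 2 bytes.
    let good = build_wav(1, 16_000, 16, &vec![0u8; 32_000]);
    let info = parse_wav(&good).unwrap();
    println!("{:?} -> {:?}", info, check_requirements(&info));
    let stereo = build_wav(2, 44_100, 16, &[0u8; 8]);
    println!("{:?}", parse_wav(&stereo).and_then(|i| check_requirements(&i)));
}
```

Splitting parsing from policy keeps the header walk reusable; only check_requirements encodes the 16 kHz, 16-bit, mono rule listed above.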

transcribe-rs 0.2.4
