transcribe-rs 0.2.4

A simple library to help you transcribe audio

transcribe-rs

A Rust library that provides a unified transcription interface over multiple speech recognition engines. It currently supports Whisper and Parakeet (NeMo) models for speech-to-text transcription.

Features

  • Multiple Engines: Support for both Whisper and Parakeet transcription engines
  • Flexible Model Loading: Load models with custom parameters (quantization, etc.)
  • Timestamped Results: Get detailed timing information for transcribed segments
  • Audio Processing: Built-in WAV file processing with proper format validation
  • Unified API: Common trait-based interface for all transcription engines
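
The benefit of a trait-based interface is that calling code is written once against the trait and then works with either engine. The sketch below shows that pattern using a hypothetical `Transcriber` trait and two dummy engines that return fixed strings. These names are illustrations only, not transcribe-rs types, and the crate's actual `TranscriptionEngine` trait has its own method signatures.

```rust
use std::path::Path;

/// Hypothetical trait for illustration only; transcribe-rs's real
/// `TranscriptionEngine` trait has different methods and signatures.
pub trait Transcriber {
    fn name(&self) -> &'static str;
    fn transcribe(&mut self, audio: &Path) -> Result<String, String>;
}

/// Dummy stand-in for a Whisper backend (performs no real inference).
pub struct FakeWhisper;

/// Dummy stand-in for a Parakeet backend (performs no real inference).
pub struct FakeParakeet;

impl Transcriber for FakeWhisper {
    fn name(&self) -> &'static str {
        "whisper"
    }
    fn transcribe(&mut self, audio: &Path) -> Result<String, String> {
        Ok(format!("whisper text for {}", audio.display()))
    }
}

impl Transcriber for FakeParakeet {
    fn name(&self) -> &'static str {
        "parakeet"
    }
    fn transcribe(&mut self, audio: &Path) -> Result<String, String> {
        Ok(format!("parakeet text for {}", audio.display()))
    }
}

/// Caller code written once against the trait: any engine can be passed in.
pub fn label_transcript<E: Transcriber>(engine: &mut E, audio: &Path) -> Result<String, String> {
    let text = engine.transcribe(audio)?;
    Ok(format!("[{}] {}", engine.name(), text))
}
```

Swapping engines then only changes which value is passed to `label_transcript`; the function body stays the same.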

Model Format Requirements

  • Whisper: Expects a single GGML format file (e.g., whisper-medium-q4_1.bin)
  • Parakeet: Expects a directory containing the model files (e.g., parakeet-v0.3/)
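
Because the two engines expect different on-disk layouts, it can help to check a model path before loading it. A minimal std-only sketch follows, assuming a hypothetical `ModelKind` enum and `check_model_path` helper (neither is part of transcribe-rs). It only distinguishes a file from a directory and does not inspect the model contents.

```rust
use std::path::Path;

/// Hypothetical enum for illustration; not part of transcribe-rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelKind {
    Whisper,
    Parakeet,
}

/// Checks the on-disk shape each engine expects:
/// Whisper wants a single GGML file, Parakeet wants a directory.
pub fn check_model_path(kind: ModelKind, path: &Path) -> Result<(), String> {
    match kind {
        ModelKind::Whisper if path.is_file() => Ok(()),
        ModelKind::Whisper => Err(format!(
            "Whisper expects a single GGML file, got {}",
            path.display()
        )),
        ModelKind::Parakeet if path.is_dir() => Ok(()),
        ModelKind::Parakeet => Err(format!(
            "Parakeet expects a model directory, got {}",
            path.display()
        )),
    }
}
```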

Quick Start

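Add the crate to Cargo.toml with the engine feature you need:

```toml
[dependencies]
transcribe-rs = { version = "0.2", features = ["whisper"] }
```

Then load a model and transcribe a file:

```rust
use std::path::PathBuf;
use transcribe_rs::{engines::whisper::WhisperEngine, TranscriptionEngine};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Whisper loads a single GGML model file.
    let mut engine = WhisperEngine::new();
    engine.load_model(&PathBuf::from("models/whisper-medium-q4_1.bin"))?;

    // Transcribe a WAV file; `None` passes no custom inference parameters.
    let result = engine.transcribe_file(&PathBuf::from("audio.wav"), None)?;
    println!("Transcription: {}", result.text);

    // Print per-segment timestamps (in seconds) when the engine returns them.
    if let Some(segments) = result.segments {
        for segment in segments {
            println!(
                "[{:.2}s - {:.2}s]: {}",
                segment.start, segment.end, segment.text
            );
        }
    }

    Ok(())
}
```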

Audio Requirements

Input audio files must be:

  • WAV format
  • 16 kHz sample rate
  • 16-bit samples
  • Mono (single channel)
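
To catch mismatched input before transcription, the header fields behind these requirements can be read from the WAV file's RIFF `fmt ` chunk. This is a std-only sketch with hypothetical names (`WavFormat`, `read_wav_format`, `check_requirements`). It does not reflect how transcribe-rs validates files internally, and it ignores the encoding tag, so it assumes the samples are integer PCM.

```rust
/// Fields from the `fmt ` chunk that correspond to the requirements above.
#[derive(Debug, PartialEq, Eq)]
pub struct WavFormat {
    pub channels: u16,
    pub sample_rate: u32,
    pub bits_per_sample: u16,
}

fn u16_le(b: &[u8], i: usize) -> u16 {
    u16::from_le_bytes([b[i], b[i + 1]])
}

fn u32_le(b: &[u8], i: usize) -> u32 {
    u32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]])
}

/// Walks the RIFF chunks after the 12-byte header and reads the `fmt ` chunk.
pub fn read_wav_format(bytes: &[u8]) -> Result<WavFormat, String> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return Err("not a RIFF/WAVE file".into());
    }
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32_le(bytes, pos + 4) as usize;
        let body = pos + 8;
        if id == b"fmt " {
            if size < 16 || body + 16 > bytes.len() {
                return Err("truncated fmt chunk".into());
            }
            // Layout: format tag (2), channels (2), sample rate (4),
            // byte rate (4), block align (2), bits per sample (2).
            return Ok(WavFormat {
                channels: u16_le(bytes, body + 2),
                sample_rate: u32_le(bytes, body + 4),
                bits_per_sample: u16_le(bytes, body + 14),
            });
        }
        // Skip this chunk; chunk bodies are padded to an even length.
        pos = body.saturating_add(size).saturating_add(size & 1);
    }
    Err("no fmt chunk found".into())
}

/// Returns Ok(()) for 16 kHz, 16-bit, mono audio; otherwise lists the mismatches.
pub fn check_requirements(fmt: &WavFormat) -> Result<(), String> {
    let mut problems = Vec::new();
    if fmt.sample_rate != 16_000 {
        problems.push(format!("sample rate is {} Hz, expected 16000", fmt.sample_rate));
    }
    if fmt.bits_per_sample != 16 {
        problems.push(format!("{}-bit samples, expected 16-bit", fmt.bits_per_sample));
    }
    if fmt.channels != 1 {
        problems.push(format!("{} channels, expected mono", fmt.channels));
    }
    if problems.is_empty() {
        Ok(())
    } else {
        Err(problems.join("; "))
    }
}
```

For a file on disk, the bytes from `std::fs::read(path)` can be passed to `read_wav_format`. Reporting every mismatch at once, rather than stopping at the first, makes it clearer how a file needs to be converted.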