§transcribe-rs
A Rust library providing unified transcription capabilities using multiple speech recognition engines.
§Features
- ONNX Models: SenseVoice, GigaAM, Parakeet, Moonshine (requires the `onnx` feature)
- Whisper: OpenAI Whisper via GGML (requires the `whisper-cpp` feature)
- Whisperfile: Mozilla Whisperfile server (requires the `whisperfile` feature)
- Remote: OpenAI API (requires the `openai` feature)
- Timestamped Results: Detailed timing information for transcribed segments
- Unified API: the `SpeechModel` trait for all local engines
- Hardware Acceleration: GPU support for ORT engines (`ort-cuda`, `ort-rocm`, `ort-directml`, `ort-coreml`, `ort-webgpu`) and whisper.cpp (Metal/Vulkan) via the `accel` module
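Engine features and acceleration features are separate flags, so a GPU build enables both. A sketch of a `Cargo.toml` combining them (feature names are taken from the list above; the version is assumed to match the Quick Start):

```toml
[dependencies]
# ONNX engines with CUDA execution, plus Whisper via whisper.cpp:
transcribe-rs = { version = "0.3", features = ["onnx", "ort-cuda", "whisper-cpp"] }
```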
§Backend Categories
This crate provides two categories of transcription backends:
- Local models implement `SpeechModel` and run inference in-process or via a local binary. This includes all ONNX models, Whisper (via whisper.cpp), and Whisperfile.
- Remote services implement `RemoteTranscriptionEngine` (requires the `openai` feature) and make async network calls to external APIs. This includes OpenAI.
These traits are intentionally separate because the execution model differs: local models are synchronous and take audio samples directly, while remote services are async and may only accept file uploads.
§Quick Start
```toml
[dependencies]
transcribe-rs = { version = "0.3", features = ["onnx"] }
```

```rust
use std::path::PathBuf;
use transcribe_rs::onnx::sense_voice::{SenseVoiceModel, SenseVoiceParams};
use transcribe_rs::onnx::Quantization;
use transcribe_rs::SpeechModel;

let mut model = SenseVoiceModel::load(
    &PathBuf::from("models/sense-voice"),
    &Quantization::Int8,
)?;
let result = model.transcribe(&samples, &transcribe_rs::TranscribeOptions::default())?;
println!("Transcription: {}", result.text);
```

§Audio Requirements
Input audio files must be:
- WAV format
- 16 kHz sample rate
- 16-bit samples
- Mono (single channel)
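Audio in other layouts has to be converted before transcription. A minimal, self-contained sketch (not part of the crate's API) that downmixes interleaved stereo 16-bit PCM to mono, assuming the engine takes `f32` samples in `[-1.0, 1.0]` as the Quick Start suggests:

```rust
/// Downmix interleaved stereo i16 PCM to mono f32 in [-1.0, 1.0].
/// Illustrative helper, not part of transcribe-rs.
fn stereo_i16_to_mono_f32(interleaved: &[i16]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|lr| {
            // Average the left/right pair, then scale to [-1.0, 1.0].
            let mixed = (lr[0] as f32 + lr[1] as f32) / 2.0;
            mixed / 32768.0
        })
        .collect()
}

fn main() {
    let stereo = [i16::MAX, i16::MAX, 0, 0, i16::MIN, i16::MIN];
    let mono = stereo_i16_to_mono_f32(&stereo);
    println!("{:?}", mono); // three mono samples
}
```

Resampling to 16 kHz is a separate step; a dedicated resampling crate is a better fit than a hand-rolled loop for that.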
§Migrating from 0.2.x to 0.3.0
Version 0.3.0 is a breaking release. If you need the old API, pin to `version = "=0.2.9"`.
`SpeechModel::transcribe` signature changed:
```rust
// Before (0.2.x):
model.transcribe(&samples, Some("en"))?;
model.transcribe_file(&path, None)?;

// After (0.3.0):
use transcribe_rs::TranscribeOptions;
model.transcribe(&samples, &TranscribeOptions { language: Some("en".into()), ..Default::default() })?;
model.transcribe_file(&path, &TranscribeOptions::default())?;
```

`SpeechModel` now requires `Send`, enabling `Box<dyn SpeechModel + Send>` for use across threads.
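The new `Send` bound matters when a model is handed to a worker thread. A self-contained sketch of the pattern the bound enables, using a stand-in trait (the real `SpeechModel` has more methods and different signatures):

```rust
use std::thread;

// Stand-in for transcribe_rs::SpeechModel, reduced to one method;
// the `Send` supertrait is the point being illustrated here.
trait SpeechModel: Send {
    fn transcribe(&mut self, samples: &[f32]) -> String;
}

struct DummyModel;

impl SpeechModel for DummyModel {
    fn transcribe(&mut self, samples: &[f32]) -> String {
        format!("{} samples", samples.len())
    }
}

fn main() {
    // `+ Send` lets the boxed trait object move into another thread.
    let mut model: Box<dyn SpeechModel + Send> = Box::new(DummyModel);
    let handle = thread::spawn(move || model.transcribe(&[0.0; 16_000]));
    println!("{}", handle.join().unwrap()); // prints "16000 samples"
}
```

Without the `Send` supertrait, the `thread::spawn` call would not compile, which is why adding the bound in 0.3.0 is a breaking change for existing implementors.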
`TranscribeOptions` includes a `translate` field (default `false`). Engines that support translation (Whisper, Whisperfile) translate to English when it is set to `true`.
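As a fragment (field names from the text above; struct update syntax as in the migration example), requesting translation looks like:

```rust
use transcribe_rs::TranscribeOptions;

// Ask a Whisper-family engine for an English translation of the input.
let opts = TranscribeOptions {
    translate: true,
    ..Default::default()
};
```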
Whisper capabilities are now dynamic. `WhisperEngine::capabilities()` returns the actual language support of the loaded model (English-only vs. multilingual) rather than always reporting all 99 languages.
Re-exports§
- `pub use accel::get_ort_accelerator;`
- `pub use accel::get_whisper_accelerator;`
- `pub use accel::get_whisper_gpu_device;`
- `pub use accel::set_ort_accelerator;`
- `pub use accel::set_whisper_accelerator;`
- `pub use accel::set_whisper_gpu_device;`
- `pub use accel::OrtAccelerator;`
- `pub use accel::WhisperAccelerator;`
- `pub use accel::GPU_DEVICE_AUTO;`
- `pub use error::TranscribeError;`
Modules§
- accel
- Per-engine accelerator preferences.
- audio
- Audio processing utilities for transcription.
- error
- transcriber
- Chunked transcription strategies.
- vad
- Voice Activity Detection (VAD).
Structs§
- `ModelCapabilities` - Describes the capabilities of a speech model.
- `TranscribeOptions` - Options for transcription.
- `TranscriptionResult` - The result of a transcription operation.
- `TranscriptionSegment` - A single transcribed segment with timing information.
Traits§
- `SpeechModel` - Unified interface for speech-to-text models.