transcribe-rs
A Rust library for audio transcription supporting multiple engines including Whisper, Parakeet, and Moonshine.
This library was extracted from the Handy project to help other developers integrate transcription capabilities into their applications. We hope to support additional ASR models in the future and may expand to include features like microphone input and real-time transcription.
Features
- Multiple Transcription Engines: Support for Whisper, Whisperfile, Parakeet, and Moonshine models
- Cross-platform: Works on macOS, Windows, and Linux with optimized backends
- Hardware Acceleration: Metal on macOS, Vulkan on Windows/Linux
- Flexible API: Common interface for different transcription engines
- Multi-language Support: Moonshine supports English, Arabic, Chinese, Japanese, Korean, Ukrainian, Vietnamese, and Spanish
- Opt-in Dependencies: Only compile and link the engines you need via Cargo features
Installation
Add transcribe-rs to your Cargo.toml with the features you need:
[dependencies]
# Include only the engines you want to use
transcribe-rs = { version = "0.1.5", features = ["parakeet", "moonshine"] }
# Or enable all engines
transcribe-rs = { version = "0.1.5", features = ["all"] }
Available Features
| Feature | Description | Dependencies |
|---|---|---|
| whisper | OpenAI Whisper (local, GGML format) | whisper-rs with Metal/Vulkan |
| parakeet | NVIDIA Parakeet (ONNX) | ort, ndarray |
| moonshine | UsefulSensors Moonshine (ONNX) | ort, ndarray, tokenizers |
| whisperfile | Mozilla whisperfile server wrapper | reqwest |
| openai | OpenAI API (remote) | async-openai, tokio |
| all | All engines enabled | All of the above |
Note: By default, no features are enabled. You must explicitly choose which engines to include.
Parakeet Performance
Benchmark results with the int8 quantized Parakeet model:
- 30x real time on MBP M4 Max
- 20x real time on Zen 3 (5700X)
- 5x real time on Skylake (i5-6500)
- 5x real time on Jetson Nano CPU
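An "Nx real time" figure is usually read as audio duration divided by processing time, so 30x means one minute of audio is transcribed in about two seconds. The sketch below shows that arithmetic with the standard library only. The function names are illustrative assumptions and are not part of transcribe-rs.

```rust
use std::time::Duration;

/// Real-time factor: seconds of audio processed per second of wall-clock time.
/// Returns `None` when no time elapsed, because the ratio is undefined.
fn realtime_factor(audio: Duration, elapsed: Duration) -> Option<f64> {
    if elapsed.as_secs_f64() == 0.0 {
        return None;
    }
    Some(audio.as_secs_f64() / elapsed.as_secs_f64())
}

/// Estimated processing time for a clip at a given speedup (30.0 for "30x").
/// Returns `None` for a speedup that is zero, negative, NaN, or infinite.
fn estimated_processing_time(audio: Duration, speedup: f64) -> Option<Duration> {
    if !(speedup > 0.0) || !speedup.is_finite() {
        return None;
    }
    Some(Duration::from_secs_f64(audio.as_secs_f64() / speedup))
}
```

For example, measuring a run with std::time::Instant and passing the elapsed time together with the clip length to realtime_factor gives a number directly comparable to the list above.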
Required Model Files
Parakeet Model Directory Structure:
models/parakeet-v0.3/
├── encoder-model.onnx # Encoder model (FP32)
├── encoder-model.int8.onnx # Encoder model (int8 quantized)
├── decoder_joint-model.onnx # Decoder/joint model (FP32)
├── decoder_joint-model.int8.onnx # Decoder/joint model (int8 quantized)
├── nemo128.onnx # Audio preprocessor
└── vocab.txt # Vocabulary file
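Before loading a Parakeet model it can help to confirm the directory is complete. The following is a minimal sketch that reports which of the files listed above are missing. It assumes the quantized setup needs only the int8 encoder and decoder and the FP32 setup needs only the FP32 pair, which the listing suggests but does not state. The function name is hypothetical and not part of transcribe-rs.

```rust
use std::path::Path;

// Files needed regardless of precision (per the directory listing above).
const SHARED_FILES: [&str; 2] = ["nemo128.onnx", "vocab.txt"];
// Assumption: each precision needs only its own encoder/decoder pair.
const FP32_FILES: [&str; 2] = ["encoder-model.onnx", "decoder_joint-model.onnx"];
const INT8_FILES: [&str; 2] = ["encoder-model.int8.onnx", "decoder_joint-model.int8.onnx"];

/// Returns the expected file names that are not present in `dir`.
/// An empty result means the directory looks complete.
fn missing_parakeet_files(dir: &Path, quantized: bool) -> Vec<&'static str> {
    let weights = if quantized { &INT8_FILES } else { &FP32_FILES };
    SHARED_FILES
        .iter()
        .chain(weights.iter())
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}
```

Calling missing_parakeet_files(Path::new("models/parakeet-v0.3"), true) and printing the result gives a clearer error than a failed model load.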
Whisper Model:
- Single GGML file (e.g., whisper-medium-q4_1.bin)
Whisperfile:
- Requires whisperfile binary and a Whisper GGML model
- Whisperfile manages a local server that handles transcription requests
Moonshine Model Directory Structure:
models/moonshine-tiny/
├── encoder_model.onnx # Audio encoder
├── decoder_model_merged.onnx # Decoder with KV cache support
└── tokenizer.json # BPE tokenizer vocabulary
Moonshine Model Variants:
| Variant | Language | Model Folder |
|---|---|---|
| Tiny | English | moonshine-tiny |
| TinyAr | Arabic | moonshine-tiny-ar |
| TinyZh | Chinese | moonshine-tiny-zh |
| TinyJa | Japanese | moonshine-tiny-ja |
| TinyKo | Korean | moonshine-tiny-ko |
| TinyUk | Ukrainian | moonshine-tiny-uk |
| TinyVi | Vietnamese | moonshine-tiny-vi |
| Base | English | moonshine-base |
| BaseEs | Spanish | moonshine-base-es |
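The variant table maps cleanly onto an enum. The sketch below mirrors it so application code can turn a variant into its model folder and back. `Variant` here is a local, hypothetical type written for illustration. It is not the crate's own variant type, whose exact shape is not shown in this README.

```rust
use std::path::{Path, PathBuf};

/// Local mirror of the variant table above (hypothetical, not the crate's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Variant {
    Tiny, TinyAr, TinyZh, TinyJa, TinyKo, TinyUk, TinyVi, Base, BaseEs,
}

impl Variant {
    const ALL: [Variant; 9] = [
        Variant::Tiny, Variant::TinyAr, Variant::TinyZh, Variant::TinyJa, Variant::TinyKo,
        Variant::TinyUk, Variant::TinyVi, Variant::Base, Variant::BaseEs,
    ];

    /// Model folder name from the table.
    fn folder(self) -> &'static str {
        match self {
            Variant::Tiny => "moonshine-tiny",
            Variant::TinyAr => "moonshine-tiny-ar",
            Variant::TinyZh => "moonshine-tiny-zh",
            Variant::TinyJa => "moonshine-tiny-ja",
            Variant::TinyKo => "moonshine-tiny-ko",
            Variant::TinyUk => "moonshine-tiny-uk",
            Variant::TinyVi => "moonshine-tiny-vi",
            Variant::Base => "moonshine-base",
            Variant::BaseEs => "moonshine-base-es",
        }
    }

    /// Language from the table.
    fn language(self) -> &'static str {
        match self {
            Variant::Tiny | Variant::Base => "English",
            Variant::TinyAr => "Arabic",
            Variant::TinyZh => "Chinese",
            Variant::TinyJa => "Japanese",
            Variant::TinyKo => "Korean",
            Variant::TinyUk => "Ukrainian",
            Variant::TinyVi => "Vietnamese",
            Variant::BaseEs => "Spanish",
        }
    }

    /// Reverse lookup from a folder name; `None` for unknown folders.
    fn from_folder(name: &str) -> Option<Variant> {
        Variant::ALL.iter().copied().find(|v| v.folder() == name)
    }

    /// Full model directory under a models root such as `models/`.
    fn model_dir(self, root: &Path) -> PathBuf {
        root.join(self.folder())
    }
}
```

Keeping the mapping in one place avoids scattering folder-name strings through the code that loads models.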
Audio Requirements:
- Format: WAV
- Sample Rate: 16 kHz
- Channels: Mono (1 channel)
- Bit Depth: 16-bit
- Encoding: PCM
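Input that does not match these requirements is a common source of bad transcriptions, so a quick header check before transcribing can save time. The sketch below reads the fmt chunk of a WAV file already loaded into memory and tests it against the list above. It uses only the standard library. The type and function names are assumptions for illustration, not transcribe-rs API.

```rust
/// The fields of a WAV `fmt ` chunk that matter for the requirements above.
#[derive(Debug, PartialEq)]
struct WavFormat {
    audio_format: u16, // 1 means integer PCM
    channels: u16,
    sample_rate: u32,
    bits_per_sample: u16,
}

impl WavFormat {
    /// True for 16 kHz, mono, 16-bit PCM.
    fn meets_requirements(&self) -> bool {
        self.audio_format == 1
            && self.channels == 1
            && self.sample_rate == 16_000
            && self.bits_per_sample == 16
    }
}

fn le_u16(b: &[u8]) -> u16 {
    u16::from_le_bytes([b[0], b[1]])
}

fn le_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Walks the RIFF chunks and returns the parsed `fmt ` chunk, or `None`
/// if the data is not a RIFF/WAVE file or the chunk is missing or truncated.
fn parse_wav_format(bytes: &[u8]) -> Option<WavFormat> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12usize;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = le_u32(&bytes[pos + 4..pos + 8]) as usize;
        let body = pos + 8;
        if id == b"fmt " {
            let f = bytes.get(body..body.checked_add(16)?)?;
            return Some(WavFormat {
                audio_format: le_u16(&f[0..2]),
                channels: le_u16(&f[2..4]),
                sample_rate: le_u32(&f[4..8]),
                bits_per_sample: le_u16(&f[14..16]),
            });
        }
        // Chunks are padded to an even number of bytes.
        pos = body.checked_add(size)?.checked_add(size & 1)?;
    }
    None
}
```

A caller would read the file with std::fs::read and then check parse_wav_format(&bytes).map_or(false, |f| f.meets_requirements()). Files that use the extensible format tag are rejected by this check even when they hold PCM data, which is a deliberate simplification.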
Model Downloads
- Parakeet:
  - Pre-packaged int8 quantized model: https://blob.handy.computer/parakeet-v3-int8.tar.gz
  - Original model files: https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx/tree/main
- Whisper: https://huggingface.co/ggerganov/whisper.cpp/tree/main
- Whisperfile Binary: https://github.com/mozilla-ai/llamafile/releases/download/0.9.3/whisperfile-0.9.3
- Moonshine: https://huggingface.co/UsefulSensors/moonshine/tree/main/onnx/merged
Usage
Parakeet Engine
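// Paths are placeholders. Import paths and call arguments are assumed; check the crate docs for your version.
use transcribe_rs::{TranscriptionEngine, engines::parakeet::ParakeetEngine};
use std::path::PathBuf;
let mut engine = ParakeetEngine::new();
engine.load_model(&PathBuf::from("path/to/model"))?;
let result = engine.transcribe_file(&PathBuf::from("audio.wav"), None)?;
println!("{}", result.text);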
Moonshine Engine
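// Paths are placeholders. Import paths, parameter types, and call arguments are assumed; check the crate docs for your version.
use transcribe_rs::{TranscriptionEngine, engines::moonshine::{MoonshineEngine, MoonshineModelParams, ModelVariant}};
use std::path::PathBuf;
let mut engine = MoonshineEngine::new();
engine.load_model_with_params(
    &PathBuf::from("path/to/model"),
    MoonshineModelParams::variant(ModelVariant::Tiny),
)?;
let result = engine.transcribe_file(&PathBuf::from("audio.wav"), None)?;
println!("{}", result.text);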
Whisperfile Engine
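// Paths are placeholders. Import paths, parameter types, and call arguments are assumed; check the crate docs for your version.
use transcribe_rs::{TranscriptionEngine, engines::whisperfile::{WhisperfileEngine, WhisperfileModelParams}};
use std::path::PathBuf;
let mut engine = WhisperfileEngine::new(PathBuf::from("path/to/whisperfile"));
engine.load_model_with_params(
    &PathBuf::from("path/to/model.bin"),
    WhisperfileModelParams::default(),
)?;
let result = engine.transcribe_file(&PathBuf::from("audio.wav"), None)?;
println!("{}", result.text);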
Running the Examples
Setup
- Create the models directory:
- Download models for the engine you want to use (links are listed under Model Downloads):
For Parakeet:
For Whisper:
For Whisperfile:
First, download the whisperfile binary:
Then download a Whisper GGML model:
For Moonshine:
Download the required model files from Hugging Face.
For the Tiny English model:
For other variants (TinyAr, TinyZh, Base, etc.), replace tiny in the URLs with the appropriate variant folder name (e.g., tiny-ar, tiny-zh, base, base-es).
Running the Examples
Each engine has its own example file. You must specify the required feature when running:
# Run Parakeet example (recommended for performance)
# Run Whisper example
# Run Whisperfile example
# Run Moonshine example
# Run OpenAI API example
Each example will:
- Load the specified model
- Transcribe a sample audio file
- Display timing information and transcription results
- Show real-time speedup factor
Running Tests
Running Individual Engine Tests
Tests are feature-gated and require you to specify which engine to test:
# Test a specific engine
# Test multiple engines
# Test all engines
Local Development Shortcuts
The .cargo/config.toml file provides convenient aliases for local development:
# Run all tests with all features enabled
# Check compilation with all features
# Build with all features
Test Environment Setup
For Whisperfile tests:
The whisperfile tests require:
- The whisperfile binary at models/whisperfile-0.9.3 (or set the WHISPERFILE_BIN env var)
- A Whisper GGML model at models/ggml-small.bin (or set the WHISPERFILE_MODEL env var)
# Download whisperfile binary
# Download a model
# Run tests
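The env-var-or-default lookup described for the Whisperfile tests can be sketched as follows. This is an illustration of the pattern with the standard library, not the crate's actual test code. The helper names are hypothetical, and treating an empty variable as unset is a design choice made here.

```rust
use std::env;
use std::path::PathBuf;

/// Returns the path from `var` if it is set and non-empty, otherwise `default`.
fn path_from_env(var: &str, default: &str) -> PathBuf {
    match env::var_os(var) {
        Some(value) if !value.is_empty() => PathBuf::from(value),
        _ => PathBuf::from(default),
    }
}

/// Whisperfile binary location, using the default path named above.
fn whisperfile_bin() -> PathBuf {
    path_from_env("WHISPERFILE_BIN", "models/whisperfile-0.9.3")
}

/// Whisper GGML model location, using the default path named above.
fn whisperfile_model() -> PathBuf {
    path_from_env("WHISPERFILE_MODEL", "models/ggml-small.bin")
}
```

Using var_os rather than var keeps paths that are not valid UTF-8 usable, which matters on some Linux setups.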
For Moonshine tests:
Download the Moonshine base model:
# Run tests
For Parakeet tests:
Download the int8 quantized Parakeet model:
# Run tests
For Whisper tests:
Whisper tests will skip if models are not available in the expected locations.
Acknowledgments
- Big thanks to istupakov for the excellent ONNX implementation of Parakeet
- Thanks to NVIDIA for releasing the Parakeet model
- Thanks to the whisper.cpp project for the Whisper implementation
- Big thanks to jart for llamafile. Thanks to Mozilla AI for maintaining the Whisperfile implementation
- Thanks to UsefulSensors for the Moonshine models and ONNX exports