whisper-cpp-plus
Pinned to whisper.cpp v1.8.3 (fork: rmorse/whisper.cpp, branch: stream-pcm)
Safe Rust bindings for whisper.cpp with real-time PCM streaming and VAD support.
Highlights
- Real-time PCM streaming — feed raw audio chunks, get transcription as you go
- VAD integration — Silero-based voice activity detection for intelligent chunking
- Full whisper.cpp API — batch transcription, timestamps, language detection
- GPU acceleration — CUDA, Metal, OpenBLAS support
PCM Streaming (VAD-driven)
Process raw PCM from any Read source (file, stdin, socket, microphone) with automatic VAD segmentation. Port of stream-pcm.cpp.
use ;
let ctx = new?;
let params = new.language;
let config = WhisperStreamPcmConfig ;
// Any Read source — here a file, but could be stdin, socket, etc.
let source = open?;
let reader = new;
let mut stream = new?;
stream.run?;
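To illustrate what "raw PCM from any Read source" involves, here is a minimal standalone sketch that pulls signed 16-bit little-endian samples from a generic reader and converts them to `f32`. The function name `read_pcm_chunk` and the s16le input format are assumptions made for illustration; this is not the crate's reader type and may differ from how it handles input.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of whisper-cpp-plus): reads up to
/// `max_samples` signed 16-bit little-endian samples from `source` and
/// converts them to f32 in [-1.0, 1.0). Returns an empty Vec at end of input.
fn read_pcm_chunk<R: Read>(source: &mut R, max_samples: usize) -> io::Result<Vec<f32>> {
    let mut bytes = vec![0u8; max_samples * 2];
    let mut filled = 0;

    // `read` may return fewer bytes than requested, so keep reading until
    // the buffer is full or the source reports end of input (Ok(0)).
    while filled < bytes.len() {
        match source.read(&mut bytes[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }

    // `chunks_exact(2)` drops a trailing odd byte (half a sample).
    let samples = bytes[..filled]
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0)
        .collect();
    Ok(samples)
}
```

Because the helper is generic over `Read`, the same call works for a `File`, `std::io::stdin()`, a `TcpStream`, or an in-memory `Cursor`.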
File Transcription
For pre-recorded audio files: load a WAV (16 kHz mono) and transcribe it in one shot:
use ;
let ctx = new?;
// Load WAV file as 16kHz mono f32 samples (using hound crate)
let mut reader = open?;
let audio: = reader.
.map
.collect;
// Simple — just get the text
let text = ctx.transcribe?;
println!;
// With parameters — get timestamped segments
let params = new
.language
.no_timestamps;
let result = ctx.transcribe_with_full_params?;
for seg in &result.segments
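The transcription calls above expect mono `f32` samples. If a recording is stored as interleaved 16-bit PCM with more than one channel, it has to be downmixed first. The following is a minimal sketch of that step; `to_mono_f32` is a hypothetical helper, not part of this crate or of hound, and resampling to 16 kHz is not covered.

```rust
/// Hypothetical helper (not part of whisper-cpp-plus or hound): converts
/// interleaved 16-bit PCM to mono f32 in [-1.0, 1.0) by averaging the
/// channels of each frame. An incomplete trailing frame is ignored.
fn to_mono_f32(interleaved: &[i16], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| {
            // Normalize each sample, then average across the frame.
            let sum: f32 = frame.iter().map(|&s| s as f32 / 32768.0).sum();
            sum / channels as f32
        })
        .collect()
}
```

With `channels = 1` this reduces to plain normalization, so the same helper can be used for mono and stereo input.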
Sliding Window Streaming
Feed audio chunks and process with configurable step/overlap. Port of stream.cpp.
use ;
let ctx = new?;
let params = default.language;
let config = WhisperStreamConfig ;
let mut stream = with_config?;
// Feed audio chunks as they arrive
stream.feed_audio;
// Process when ready
while let Some = stream.process_step?
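The step/overlap idea can be shown independently of the crate. The sketch below assumes a simplified model: a window is emitted once `step` new samples have been fed, and each window is prefixed with the last `overlap` samples of the previous one. `SlidingWindow` and its methods are hypothetical names used for illustration; the actual `WhisperStream` buffering may differ.

```rust
/// Hypothetical illustration of step/overlap buffering
/// (not the crate's WhisperStream implementation).
struct SlidingWindow {
    step: usize,       // new samples required before a window is emitted
    overlap: usize,    // samples carried over from the previous window
    pending: Vec<f32>, // fed but not yet emitted
    tail: Vec<f32>,    // last `overlap` samples of the previous window
}

impl SlidingWindow {
    fn new(step: usize, overlap: usize) -> Self {
        assert!(step > 0, "step must be non-zero");
        Self { step, overlap, pending: Vec::new(), tail: Vec::new() }
    }

    /// Appends newly arrived audio to the pending buffer.
    fn feed(&mut self, samples: &[f32]) {
        self.pending.extend_from_slice(samples);
    }

    /// Returns the next window if at least `step` new samples are pending.
    fn next_window(&mut self) -> Option<Vec<f32>> {
        if self.pending.len() < self.step {
            return None;
        }
        // Start with the carried-over tail, then append `step` new samples.
        let mut window = std::mem::take(&mut self.tail);
        window.extend(self.pending.drain(..self.step));
        // Remember the end of this window for the next one.
        let keep_from = window.len().saturating_sub(self.overlap);
        self.tail = window[keep_from..].to_vec();
        Some(window)
    }
}
```

Carrying a short tail into the next window gives the recognizer context across chunk boundaries, at the cost of re-processing `overlap` samples per step.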
Features
| Feature | Description |
|---|---|
| `cuda` | NVIDIA GPU acceleration via CUDA |
| `metal` | Apple Metal acceleration (macOS) |
| `openblas` | OpenBLAS acceleration (Linux) |
| `async` | Async transcription API via tokio |
Enable in Cargo.toml:
[dependencies]
whisper-cpp-plus = { version = "0.1.4", features = ["cuda"] }
Modules
- Transcription — `WhisperContext`, `WhisperState`, `FullParams`, `TranscriptionParams` builder
- Streaming — `WhisperStream` for chunked real-time transcription
- StreamPCM — `WhisperStreamPcm` for raw PCM input with VAD-driven processing
- VAD — `WhisperVadProcessor` for Silero-based voice activity detection
- Enhanced — Temperature fallback + enhanced VAD aggregation for improved quality
- Quantization — `WhisperQuantize` for model compression (feature = `quantization`)
Examples
License
MIT