whisper-cpp-plus
Pinned to whisper.cpp v1.8.3 (fork: rmorse/whisper.cpp, branch: stream-pcm)
Safe Rust bindings for whisper.cpp with real-time PCM streaming and VAD support.
Highlights
- Real-time PCM streaming — feed raw audio chunks, get transcription as you go
- VAD integration — Silero-based voice activity detection for intelligent chunking
- Full whisper.cpp API — batch transcription, timestamps, language detection
- GPU acceleration — CUDA, Metal, OpenBLAS support
Real-time Streaming
Two streaming APIs for different use cases:
WhisperStream — Sliding Window (port of stream.cpp)
Feed audio chunks and process with configurable step/overlap:
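use whisper_cpp_plus::{FullParams, WhisperContext, WhisperStream, WhisperStreamConfig};
let ctx = WhisperContext::new(…)?;
let params = FullParams::default().language(…);
let config = WhisperStreamConfig { … };
let mut stream = WhisperStream::with_config(…)?;
// Feed audio chunks as they arrive
stream.feed_audio(…);
// Process when ready
while let Some(…) = stream.process_step()? { … }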
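To illustrate the step/overlap idea behind a sliding window, here is a minimal standalone sketch that uses only the standard library. `SlidingWindow`, `step`, and `keep` are hypothetical names chosen for this example; they are not the crate's types or config fields, and the crate's actual buffering may differ.

```rust
/// Hypothetical sliding-window buffer (illustration only, not the crate's API).
#[derive(Debug)]
struct SlidingWindow {
    step: usize,       // new samples required before each processing pass
    keep: usize,       // samples carried over from the previous window as overlap
    pending: Vec<f32>, // samples fed since the last pass
    tail: Vec<f32>,    // overlap retained from the previous window
}

impl SlidingWindow {
    fn new(step: usize, keep: usize) -> Self {
        Self { step, keep, pending: Vec::new(), tail: Vec::new() }
    }

    /// Buffers an incoming chunk of any size.
    fn feed(&mut self, chunk: &[f32]) {
        self.pending.extend_from_slice(chunk);
    }

    /// Returns the next window (previous overlap followed by one step of new
    /// audio), or `None` while fewer than `step` new samples are buffered.
    fn next_window(&mut self) -> Option<Vec<f32>> {
        if self.pending.len() < self.step {
            return None;
        }
        let fresh: Vec<f32> = self.pending.drain(..self.step).collect();
        let mut window = std::mem::take(&mut self.tail);
        window.extend_from_slice(&fresh);
        // Remember the last `keep` samples so the next window overlaps this one.
        let start = window.len().saturating_sub(self.keep);
        self.tail = window[start..].to_vec();
        Some(window)
    }
}
```

Calling `feed` as chunks arrive and then draining `next_window` in a loop mirrors the feed/process pattern above: each returned window would be handed to the transcriber, and the overlap helps avoid cutting words at window boundaries.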
WhisperStreamPcm — VAD-driven (port of stream-pcm.cpp)
Process raw PCM from any Read source with automatic VAD segmentation:
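use whisper_cpp_plus::{FullParams, WhisperContext, WhisperStreamPcm, WhisperStreamPcmConfig, …};
use std::fs::File;
let ctx = WhisperContext::new(…)?;
let params = FullParams::default().language(…);
let config = WhisperStreamPcmConfig { … };
// Create PCM reader from any Read source (file, stdin, socket, etc.)
let file = File::open(…)?;
let reader = …::new(…);
let mut stream = WhisperStreamPcm::new(…)?;
// Process until EOF
stream.run(…)?;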
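For readers unfamiliar with raw PCM, the following standalone sketch shows one way to pull samples from any `Read` source and convert them to `f32`. It assumes little-endian signed 16-bit input, which is an assumption of this example rather than something stated above; `read_pcm_chunk` is a hypothetical helper, not part of the crate.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not the crate's API): reads up to `max_samples`
/// little-endian signed 16-bit samples from `source` and scales them to
/// [-1.0, 1.0). Returns an empty vector once the source is exhausted.
fn read_pcm_chunk<R: Read>(source: &mut R, max_samples: usize) -> io::Result<Vec<f32>> {
    let mut buf = vec![0u8; max_samples * 2];
    let mut filled = 0;
    // A single read() may return fewer bytes than requested (pipes, sockets),
    // so keep reading until the buffer is full or EOF is reached.
    while filled < buf.len() {
        match source.read(&mut buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    // chunks_exact(2) ignores a trailing odd byte, which can only occur at EOF.
    Ok(buf[..filled]
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0)
        .collect())
}
```

Because the function is generic over `Read`, the same code works for a `File`, `std::io::stdin()`, or a `TcpStream`; looping until it returns an empty vector corresponds to processing until EOF.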
Batch Transcription
For pre-recorded audio files:
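use whisper_cpp_plus::{TranscriptionParams, WhisperContext};
let ctx = WhisperContext::new(…)?;
let audio: Vec<f32> = load_audio(…); // 16 kHz mono f32
// Simple
let text = ctx.transcribe(&audio)?;
// With parameters
let params = TranscriptionParams::builder()
    .language(…)
    .temperature(…)
    .enable_timestamps()
    .build();
let result = ctx.transcribe_with_params(&audio, params)?;
for seg in &result.segments { … }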
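The batch API expects 16 kHz mono `f32` samples. As a rough sketch of the preparation step, the helper below downmixes interleaved stereo 16-bit samples to mono `f32`. `stereo_i16_to_mono_f32` is a hypothetical name for this example and is not provided by the crate; resampling to 16 kHz is out of scope here and would need a separate step.

```rust
/// Hypothetical helper (not part of the crate): averages each interleaved
/// left/right pair of 16-bit samples and scales the result to [-1.0, 1.0).
/// A trailing unpaired sample is ignored.
fn stereo_i16_to_mono_f32(interleaved: &[i16]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| (frame[0] as f32 + frame[1] as f32) / 2.0 / 32768.0)
        .collect()
}
```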
Features
| Feature | Description |
|---|---|
| `cuda` | NVIDIA GPU acceleration via CUDA |
| `metal` | Apple Metal acceleration (macOS) |
| `openblas` | OpenBLAS acceleration (Linux) |
| `async` | Async transcription API via tokio |
Enable in Cargo.toml:
[dependencies]
whisper-cpp-plus = { version = "0.1.3", features = ["cuda"] }
Modules
- Transcription: `WhisperContext`, `WhisperState`, `FullParams`, `TranscriptionParams` builder
- Streaming: `WhisperStream` for chunked real-time transcription
- StreamPCM: `WhisperStreamPcm` for raw PCM input with VAD-driven processing
- VAD: `WhisperVadProcessor` for Silero-based voice activity detection
- Enhanced: temperature fallback + enhanced VAD aggregation for improved quality
- Quantization: `WhisperQuantize` for model compression (feature = `quantization`)
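To give a feel for what VAD-driven chunking does, here is a deliberately simplified sketch that splits audio into speech segments with a per-frame RMS energy gate. This is a stand-in technique for illustration only: the crate's VAD is Silero-based (a neural model), and `speech_segments`, `frame_len`, and `threshold` are hypothetical names introduced for this example.

```rust
/// Hypothetical energy-gate segmenter (a simple stand-in, not Silero VAD).
/// Returns half-open `(start, end)` sample ranges in which every frame's RMS
/// is at or above `threshold`. `frame_len` must be non-zero.
fn speech_segments(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<(usize, usize)> {
    let mut segments = Vec::new();
    let mut start: Option<usize> = None;
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();
        let offset = i * frame_len;
        match (rms >= threshold, start) {
            // Speech begins: remember where the segment started.
            (true, None) => start = Some(offset),
            // Speech ends: close the segment at the start of this quiet frame.
            (false, Some(s)) => {
                segments.push((s, offset));
                start = None;
            }
            _ => {}
        }
    }
    // Close a segment that is still open when the audio ends.
    if let Some(s) = start {
        segments.push((s, samples.len()));
    }
    segments
}
```

Each returned range could then be transcribed on its own, which is the general idea behind chunking on speech boundaries rather than at fixed intervals.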
Examples
License
MIT