Voice Activity Detection library for Rust with multiple backend support.
Quick Start
use VoiceActivityDetector;
use ;
let mut vad = new.unwrap;
let samples: = vec!; // 10ms at 16kHz
let probability = vad.process.unwrap;
Backends
| Backend | Feature | Sample Rates | Frame Size | Output |
|---|---|---|---|---|
| WebRTC | `webrtc` (default) | 8/16/32/48 kHz | 10, 20, or 30ms | Binary (0.0 or 1.0) |
| Silero | `silero` | 8/16 kHz | 32ms (256 or 512 samples) | Continuous (0.0–1.0) |
| TEN-VAD | `ten-vad` | 16 kHz only | 16ms (256 samples) | Continuous (0.0–1.0) |
| FireRedVAD | `firered` | 16 kHz only | 10ms (160 samples) | Continuous (0.0–1.0) |
[]
= "0.1" # WebRTC only (default)
= { = "0.1", = ["silero"] }
= { = "0.1", = ["ten-vad"] }
= { = "0.1", = ["firered"] }
= { = "0.1", = ["webrtc", "silero", "ten-vad", "firered"] } # all backends
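The sample counts in the backend table follow from the sample rate and frame duration. The sketch below is only that arithmetic; `frame_samples` is a hypothetical helper for illustration and is not part of this crate's API.

```rust
/// Number of samples in one frame lasting `frame_ms` milliseconds
/// at `sample_rate_hz`. Hypothetical helper, not a crate function.
fn frame_samples(sample_rate_hz: u32, frame_ms: u32) -> usize {
    (sample_rate_hz as usize * frame_ms as usize) / 1000
}

/// Prints the frame sizes listed in the backend table.
fn print_frame_sizes() {
    // (backend, sample rate in Hz, frame duration in ms)
    let cases = [
        ("WebRTC", 16_000, 10),
        ("Silero", 16_000, 32),
        ("Silero", 8_000, 32),
        ("TEN-VAD", 16_000, 16),
        ("FireRedVAD", 16_000, 10),
    ];
    for (name, rate, ms) in cases {
        println!("{name}: {} samples per {ms} ms at {rate} Hz", frame_samples(rate, ms));
    }
}
```

This also shows why Silero lists two frame sizes: 32 ms is 256 samples at 8 kHz and 512 samples at 16 kHz.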
Benchmarks
Performance measured against the TEN-VAD testset — 30 audio files from LibriSpeech, GigaSpeech, and DNS Challenge with manual speech/non-speech annotations. Threshold: 0.5.
v0.1.13
| Backend | Precision | Recall | F1 Score | Frame Size | Avg Inference | RTF |
|---|---|---|---|---|---|---|
| WebRTC | 0.821 | 0.983 | 0.895 | 480 (30 ms) | 2.7 µs | 0.0001 |
| Silero | 0.938 | 0.938 | 0.938 | 512 (32 ms) | 117.4 µs | 0.0037 |
| TEN-VAD | 0.942 | 0.915 | 0.928 | 256 (16 ms) | 61.5 µs | 0.0038 |
| FireRedVAD | 0.950 | 0.879 | 0.913 | 160 (10 ms) | 540.1 µs | 0.0540 |
Accuracy metrics are deterministic; inference times are approximate and vary by hardware. Measured with `--release` on GitHub Actions `ubuntu-latest` runners. Run locally: `make accuracy` or `make bench`
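For readers who want to reproduce the columns by hand, the sketch below shows how such metrics are conventionally computed. It assumes frame-level scoring, treats a probability at or above the threshold as speech, and takes RTF as inference time divided by frame duration; these are assumptions about the benchmark, and every name here is hypothetical rather than taken from the benchmark code.

```rust
/// Frame-level confusion counts (true negatives are not needed for P/R/F1).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Counts {
    tp: u32,
    fp: u32,
    fn_: u32,
}

/// Compare thresholded probabilities with reference labels.
/// Assumption: `p >= threshold` counts as speech.
fn count(probs: &[f32], labels: &[bool], threshold: f32) -> Counts {
    let mut c = Counts::default();
    for (&p, &is_speech) in probs.iter().zip(labels) {
        match (p >= threshold, is_speech) {
            (true, true) => c.tp += 1,
            (true, false) => c.fp += 1,
            (false, true) => c.fn_ += 1,
            (false, false) => {}
        }
    }
    c
}

/// Fraction of predicted speech frames that really are speech.
fn precision(c: Counts) -> f64 {
    let denom = c.tp + c.fp;
    if denom == 0 { 0.0 } else { c.tp as f64 / denom as f64 }
}

/// Fraction of real speech frames that were detected.
fn recall(c: Counts) -> f64 {
    let denom = c.tp + c.fn_;
    if denom == 0 { 0.0 } else { c.tp as f64 / denom as f64 }
}

/// Harmonic mean of precision and recall.
fn f1(p: f64, r: f64) -> f64 {
    if p + r == 0.0 { 0.0 } else { 2.0 * p * r / (p + r) }
}

/// Real-time factor: time spent on inference per unit of audio time.
fn rtf(inference_us: f64, frame_ms: f64) -> f64 {
    inference_us / (frame_ms * 1000.0)
}
```

Plugging the table's precision and recall into `f1`, and its inference time and frame duration into `rtf`, gives values that round to the F1 and RTF columns.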
WebRTC
Google's WebRTC VAD. Fast and lightweight, returns binary speech/silence detection. Supports four aggressiveness modes.
use VoiceActivityDetector;
use ;
// Default 30ms frame duration
let mut vad = new.unwrap;
// Or specify frame duration (10, 20, or 30ms)
let mut vad = with_frame_duration.unwrap;
let samples = vec!; // 20ms at 16kHz
let result = vad.process.unwrap; // 0.0 or 1.0
Silero
Neural network (LSTM) via ONNX Runtime. Returns a continuous probability and has the best F1 score in the benchmarks above. Only supports 8kHz and 16kHz.
use VoiceActivityDetector;
use SileroVad;
let mut vad = new.unwrap;
let samples = vec!; // 32ms at 16kHz
let probability = vad.process.unwrap; // 0.0–1.0
// Or load a custom model
let vad = from_file.unwrap;
TEN-VAD
Agora's TEN-VAD with pure Rust preprocessing (no C dependency). Returns continuous probability, 16kHz only.
use VoiceActivityDetector;
use TenVad;
let mut vad = new.unwrap;
let samples = vec!; // 16ms at 16kHz
let probability = vad.process.unwrap; // 0.0–1.0
FireRedVAD
Xiaohongshu's FireRedVAD using a DFSMN architecture with pure Rust FBank preprocessing. Returns continuous probability, 16kHz only.
use VoiceActivityDetector;
use FireRedVad;
let mut vad = new.unwrap;
let samples = vec!; // 10ms at 16kHz
let probability = vad.process.unwrap; // 0.0–1.0
The VoiceActivityDetector Trait
All backends implement a common trait, so you can write code that is generic over backends:
use ;
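As an illustration of the pattern only, the sketch below writes a function that is generic over a detector trait. `Detector` and `EnergyGate` are local stand-ins invented for this example so that it compiles on its own; the crate's real trait is `VoiceActivityDetector`, and its exact method signatures and sample type are not reproduced here.

```rust
/// Stand-in trait for this example (hypothetical, not the crate's trait).
trait Detector {
    /// Returns a speech probability for one frame.
    fn process(&mut self, samples: &[i16]) -> Result<f32, String>;
}

/// Toy backend: reports speech when the mean absolute amplitude is high enough.
struct EnergyGate {
    min_mean_abs: f32,
}

impl Detector for EnergyGate {
    fn process(&mut self, samples: &[i16]) -> Result<f32, String> {
        if samples.is_empty() {
            return Err("empty frame".to_string());
        }
        let sum: f32 = samples.iter().map(|&s| (s as f32).abs()).sum();
        let mean_abs = sum / samples.len() as f32;
        Ok(if mean_abs >= self.min_mean_abs { 1.0 } else { 0.0 })
    }
}

/// Works with any backend that implements the trait.
fn count_speech_frames<D: Detector>(
    vad: &mut D,
    frames: &[Vec<i16>],
    threshold: f32,
) -> Result<usize, String> {
    let mut speech = 0;
    for frame in frames {
        if vad.process(frame)? >= threshold {
            speech += 1;
        }
    }
    Ok(speech)
}
```

Because the function only depends on the trait, swapping one backend for another should not require changes to the calling code.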
FrameAdapter
Real-world audio arrives in arbitrary chunk sizes. FrameAdapter buffers incoming samples and feeds correctly-sized frames to the backend automatically.
use FrameAdapter;
use SileroVad;
let vad = new.unwrap;
let mut adapter = new;
// Feed arbitrary-sized chunks — adapter handles buffering
let chunk = vec!; // not a multiple of 512
// Get all complete frame results at once
let probabilities = adapter.process_all.unwrap;
// Or get just the latest result (convenient for real-time)
let latest = adapter.process_latest.unwrap;
// Or process one frame at a time
let result = adapter.process.unwrap; // Some(prob) or None
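To make the buffering idea concrete, here is a minimal sketch of how an adapter of this kind could work. `FrameBuffer` is a hypothetical type written for this example; it is not the crate's `FrameAdapter`, and the `i16` sample type is an assumption.

```rust
/// Accumulates arbitrary-sized chunks and hands back complete frames.
/// Hypothetical illustration, not the crate's implementation.
struct FrameBuffer {
    frame_size: usize,
    pending: Vec<i16>,
}

impl FrameBuffer {
    fn new(frame_size: usize) -> Self {
        assert!(frame_size > 0, "frame size must be non-zero");
        Self { frame_size, pending: Vec::new() }
    }

    /// Append a chunk and return every frame that is now complete.
    /// Leftover samples stay buffered for the next call.
    fn push(&mut self, chunk: &[i16]) -> Vec<Vec<i16>> {
        self.pending.extend_from_slice(chunk);
        let complete = self.pending.len() / self.frame_size * self.frame_size;
        let frames: Vec<Vec<i16>> = self.pending[..complete]
            .chunks_exact(self.frame_size)
            .map(|frame| frame.to_vec())
            .collect();
        self.pending.drain(..complete);
        frames
    }

    /// Number of samples waiting for the next frame boundary.
    fn buffered(&self) -> usize {
        self.pending.len()
    }
}
```

With a 512-sample frame, pushing 1000 samples yields one frame and leaves 488 buffered; a later chunk of 100 samples completes the second frame.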
Preprocessing
Optional audio preprocessing to improve VAD accuracy. Available stages: high-pass filter, noise suppression, and amplitude normalization.
use ;
// Use a preset
let config = raw_mic; // 80Hz HP + normalize + denoise
// let config = PreprocessorConfig::telephony(); // 200Hz HP only
// Or configure manually
let config = PreprocessorConfig ;
let mut preprocessor = new;
let raw_audio: = vec!;
let cleaned = preprocessor.process;
// feed `cleaned` to your VAD
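For intuition about two of the stages, the sketch below implements a first-order high-pass filter and peak normalization. This is a textbook formulation chosen for illustration; the filter order, the `f32` sample type, and all names are assumptions, and the crate's preprocessor may work differently.

```rust
use std::f32::consts::PI;

/// First-order (single-pole RC) high-pass filter. Illustrative only.
struct HighPass {
    alpha: f32,
    prev_in: f32,
    prev_out: f32,
}

impl HighPass {
    fn new(cutoff_hz: f32, sample_rate_hz: f32) -> Self {
        let rc = 1.0 / (2.0 * PI * cutoff_hz);
        let dt = 1.0 / sample_rate_hz;
        Self { alpha: rc / (rc + dt), prev_in: 0.0, prev_out: 0.0 }
    }

    /// Filters in place; state carries over between calls, so
    /// consecutive chunks of one stream can be processed in order.
    fn process(&mut self, samples: &mut [f32]) {
        for s in samples.iter_mut() {
            let out = self.alpha * (self.prev_out + *s - self.prev_in);
            self.prev_in = *s;
            self.prev_out = out;
            *s = out;
        }
    }
}

/// Scales the buffer so its largest absolute sample equals `target_peak`.
/// Silent buffers are left untouched.
fn normalize_peak(samples: &mut [f32], target_peak: f32) {
    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    if peak > 0.0 {
        let gain = target_peak / peak;
        for s in samples.iter_mut() {
            *s *= gain;
        }
    }
}
```

A high-pass stage like this removes DC offset and low-frequency rumble, which is the kind of content an 80 Hz or 200 Hz cutoff targets.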
Feature Flags
| Feature | Default | Description |
|---|---|---|
| `webrtc` | Yes | WebRTC VAD backend |
| `silero` | No | Silero VAD backend (ONNX model downloaded at build time) |
| `ten-vad` | No | TEN-VAD backend (ONNX model downloaded at build time) |
| `firered` | No | FireRedVAD backend (ONNX model downloaded at build time) |
| `denoise` | No | RNNoise-based noise suppression in the preprocessing pipeline |
| `serde` | No | Serialize/Deserialize for config types |
ONNX Model Downloads
Silero, TEN-VAD, and FireRedVAD models are downloaded automatically at build time. The Silero backend is pinned to v6.2.1 by default.
For offline or CI builds, point to a local model file:
SILERO_MODEL_PATH=/path/to/silero_vad.onnx
TEN_VAD_MODEL_PATH=/path/to/ten-vad.onnx
FIRERED_MODEL_PATH=/path/to/fireredvad.onnx FIRERED_CMVN_PATH=/path/to/cmvn.ark
To use a different Silero model version, override the download URL:
SILERO_MODEL_URL=https://github.com/snakers4/silero-vad/raw/v6.0/src/silero_vad/data/silero_vad.onnx
Error Handling
All backends return Result<f32, VadError>. The error type covers:
- `VadError::InvalidSampleRate(u32)`: unsupported sample rate for the backend
- `VadError::InvalidFrameSize { got, expected }`: wrong number of samples
- `VadError::BackendError(String)`: backend-specific error (e.g., ONNX failure)
Use capabilities() to check a backend's requirements before processing.
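A caller will typically match on these variants to decide how to react. The sketch below declares a local enum with the same variant names so that it compiles on its own; the `usize` field types and the helper functions `describe` and `check_frame` are assumptions made for illustration, not the crate's definitions.

```rust
/// Local stand-in mirroring the variants listed above.
/// Field types are assumed; check the crate docs for the real ones.
#[derive(Debug, Clone, PartialEq)]
enum VadError {
    InvalidSampleRate(u32),
    InvalidFrameSize { got: usize, expected: usize },
    BackendError(String),
}

/// Turn an error into a message suitable for logging.
fn describe(err: &VadError) -> String {
    match err {
        VadError::InvalidSampleRate(rate) => format!("unsupported sample rate: {rate} Hz"),
        VadError::InvalidFrameSize { got, expected } => {
            format!("expected {expected} samples, got {got}")
        }
        VadError::BackendError(msg) => format!("backend failure: {msg}"),
    }
}

/// Hypothetical pre-check a caller might run before handing a frame to a backend.
fn check_frame(samples: &[i16], expected: usize) -> Result<(), VadError> {
    if samples.len() == expected {
        Ok(())
    } else {
        Err(VadError::InvalidFrameSize { got: samples.len(), expected })
    }
}
```

Frame-size and sample-rate errors usually point to a caller bug that can be fixed up front, while a backend error is more likely a runtime failure worth logging and surfacing.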
vad-lab
Dev tool for live VAD experimentation. Captures audio server-side and streams results to a web UI.
Quick Start
Videos
| Video | Description |
|---|---|
| Adding FireRedVAD as the 4th backend | Benchmarking Xiaohongshu's FireRedVAD against Silero, TEN VAD, and WebRTC across accuracy and latency. |
| VAD Lab: Real-time multi-backend comparison | Live demo of VAD Lab comparing WebRTC, Silero, and TEN VAD side by side with real-time waveform visualization. |
License
Apache-2.0
TEN-VAD model notice
The TEN-VAD ONNX model (used by the ten-vad feature) is licensed under Apache-2.0 with a non-compete clause by the TEN-framework / Agora. It restricts deployment that competes with Agora's offerings and limits deployment to "solely for your benefit and the benefit of your direct End Users." This is not standard open-source despite the Apache-2.0 label. Review the TEN-VAD license before using in production.
Acknowledgements
This project wraps and builds on several upstream projects:
- webrtc-vad — Rust bindings for Google's WebRTC VAD
- Silero VAD — neural network VAD by the Silero team
- TEN-VAD — lightweight VAD by TEN-framework / Agora
- FireRedVAD — DFSMN-based VAD by the FireRedTeam
- ort — ONNX Runtime bindings for Rust
- nnnoiseless — Rust port of RNNoise for noise suppression