Voice Activity Detection library for Rust with multiple backend support.
Quick Start
use VoiceActivityDetector;
use ;
let mut vad = new.unwrap;
let samples: = vec!; // 10ms at 16kHz
let probability = vad.process.unwrap;
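Every backend returns an `f32` between 0.0 and 1.0, and WebRTC only ever returns 0.0 or 1.0. Turning that into a speech/silence decision is therefore a comparison against a threshold. The sketch below is minimal and assumes the 0.5 threshold used in the Benchmarks section. It also assumes that a value exactly at the threshold counts as speech, because the crate's own convention is not stated here. `is_speech` is a hypothetical helper and is not part of the crate.

```rust
/// Hypothetical helper (not part of the crate): map a VAD output to a decision.
/// Assumption: values at or above `threshold` count as speech.
fn is_speech(probability: f32, threshold: f32) -> bool {
    probability >= threshold
}

fn main() {
    // Threshold used for the benchmark numbers in this README.
    let threshold = 0.5;

    // WebRTC output is binary; continuous backends may return any value in 0.0..=1.0.
    let webrtc_output = 1.0_f32;
    let silero_output = 0.37_f32;

    println!("webrtc speech: {}", is_speech(webrtc_output, threshold));
    println!("silero speech: {}", is_speech(silero_output, threshold));
}
```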
Backends
| Backend | Feature | Sample Rates | Frame Size | Output |
|---|---|---|---|---|
| WebRTC | webrtc (default) | 8/16/32/48 kHz | 10, 20, or 30 ms | Binary (0.0 or 1.0) |
| Silero | silero | 8/16 kHz | 32 ms (256 samples at 8 kHz, 512 at 16 kHz) | Continuous (0.0–1.0) |
| TEN-VAD | ten-vad | 16 kHz only | 16 ms (256 samples) | Continuous (0.0–1.0) |
| FireRedVAD | firered | 16 kHz only | 10 ms (160 samples) | Continuous (0.0–1.0) |
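The sample counts in the table follow from samples = sample rate × frame duration. The sketch below uses two hypothetical helpers that are not part of the crate. `samples_per_frame` reproduces the sample counts, and `is_valid_webrtc_frame` encodes the WebRTC row of the table.

```rust
/// Hypothetical helper: number of samples in one frame.
fn samples_per_frame(sample_rate_hz: u32, frame_ms: u32) -> u32 {
    sample_rate_hz * frame_ms / 1000
}

/// Hypothetical check mirroring the WebRTC row of the table
/// (8/16/32/48 kHz with 10, 20, or 30 ms frames).
fn is_valid_webrtc_frame(sample_rate_hz: u32, frame_ms: u32) -> bool {
    matches!(sample_rate_hz, 8_000 | 16_000 | 32_000 | 48_000) && matches!(frame_ms, 10 | 20 | 30)
}

fn main() {
    // (backend, sample rate in Hz, frame duration in ms) taken from the table.
    let cases = [
        ("WebRTC", 16_000, 30),
        ("Silero", 8_000, 32),
        ("Silero", 16_000, 32),
        ("TEN-VAD", 16_000, 16),
        ("FireRedVAD", 16_000, 10),
    ];
    for (name, rate, ms) in cases {
        println!("{name}: {ms} ms at {rate} Hz = {} samples", samples_per_frame(rate, ms));
    }

    // 22.05 kHz is not in the WebRTC list, so this prints false.
    println!("WebRTC 48 kHz / 20 ms ok: {}", is_valid_webrtc_frame(48_000, 20));
    println!("WebRTC 22.05 kHz / 10 ms ok: {}", is_valid_webrtc_frame(22_050, 10));
}
```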
[]
= "0.1" # WebRTC only (default)
= { = "0.1", = ["silero"] }
= { = "0.1", = ["ten-vad"] }
= { = "0.1", = ["firered"] }
= { = "0.1", = ["webrtc", "silero", "ten-vad", "firered"] } # all backends
Benchmarks
Performance is measured on the TEN-VAD test set: 30 audio files from LibriSpeech, GigaSpeech, and DNS Challenge with manual speech/non-speech annotations. Decision threshold: 0.5.
v0.1.14
| Backend | Precision | Recall | F1 Score | Frame Size (samples) | Avg Inference (per frame) | RTF (real-time factor) |
|---|---|---|---|---|---|---|
| WebRTC | 0.821 | 0.983 | 0.895 | 480 (30 ms) | 2.7 µs | 0.0001 |
| Silero | 0.938 | 0.938 | 0.938 | 512 (32 ms) | 118.4 µs | 0.0037 |
| TEN-VAD | 0.942 | 0.915 | 0.928 | 256 (16 ms) | 62.0 µs | 0.0039 |
| FireRedVAD | 0.950 | 0.879 | 0.913 | 160 (10 ms) | 542.8 µs | 0.0543 |
Accuracy metrics are deterministic; inference times are approximate and vary by hardware. Measured with `--release` on GitHub Actions `ubuntu-latest` runners. Run locally: `make accuracy` or `make bench`.
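The derived columns can be checked against the raw ones. F1 is the harmonic mean of precision and recall. The RTF values are consistent with average inference time divided by frame duration. The short sketch below recomputes both from the v0.1.14 table; the helper names `f1` and `rtf` are illustrative only.

```rust
/// F1 score: harmonic mean of precision and recall.
fn f1(precision: f64, recall: f64) -> f64 {
    2.0 * precision * recall / (precision + recall)
}

/// Real-time factor: inference time divided by the audio duration it covers.
fn rtf(avg_inference_us: f64, frame_ms: f64) -> f64 {
    avg_inference_us / (frame_ms * 1000.0)
}

fn main() {
    // (backend, precision, recall, frame duration in ms, avg inference in µs) from the table.
    let rows = [
        ("WebRTC", 0.821, 0.983, 30.0, 2.7),
        ("Silero", 0.938, 0.938, 32.0, 118.4),
        ("TEN-VAD", 0.942, 0.915, 16.0, 62.0),
        ("FireRedVAD", 0.950, 0.879, 10.0, 542.8),
    ];
    for (name, p, r, frame_ms, us) in rows {
        println!("{name}: F1 = {:.3}, RTF = {:.4}", f1(p, r), rtf(us, frame_ms));
    }
}
```

An RTF below 1.0 means inference runs faster than real time.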
WebRTC
Google's WebRTC VAD. Fast and lightweight; returns a binary speech/silence decision. Supports four aggressiveness modes.
use VoiceActivityDetector;
use ;
// Default 30ms frame duration
let mut vad = new.unwrap;
// Or specify frame duration (10, 20, or 30ms)
let mut vad = with_frame_duration.unwrap;
let samples = vec!; // 20ms at 16kHz
let result = vad.process.unwrap; // 0.0 or 1.0
Silero
Neural network (LSTM) via ONNX Runtime. Returns a continuous probability and has the best overall F1 score in the benchmarks above. Supports 8 kHz and 16 kHz only.
use VoiceActivityDetector;
use SileroVad;
let mut vad = new.unwrap;
let samples = vec!; // 32ms at 16kHz
let probability = vad.process.unwrap; // 0.0–1.0
// Or load a custom model
let vad = from_file.unwrap;
TEN-VAD
Agora's TEN-VAD with pure Rust preprocessing (no C dependency). Returns a continuous probability; 16 kHz only.
use VoiceActivityDetector;
use TenVad;
let mut vad = new.unwrap;
let samples = vec!; // 16ms at 16kHz
let probability = vad.process.unwrap; // 0.0–1.0
FireRedVAD
Xiaohongshu's FireRedVAD, a DFSMN-based model, with pure Rust FBank preprocessing. Returns a continuous probability; 16 kHz only.
use VoiceActivityDetector;
use FireRedVad;
let mut vad = new.unwrap;
let samples = vec!; // 10ms at 16kHz
let probability = vad.process.unwrap; // 0.0–1.0
The VoiceActivityDetector Trait
All backends implement a common trait, so you can write code that is generic over backends:
use ;
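The trait definition itself is not reproduced here. The self-contained sketch below illustrates the pattern instead, using its own stand-in trait, error type and toy backend. `Detector`, `StubError`, `EnergyDetector` and `speech_frames` are all hypothetical names and signatures, not the crate's. The sketch assumes `process` takes one frame of `i16` samples and returns a probability as `Result<f32, _>`, which matches the `Result<f32, VadError>` return type described under Error Handling. `speech_frames` is written once against the trait bound and accepts any implementation.

```rust
// Stand-in error and trait; names and signatures are assumptions for illustration.
#[derive(Debug)]
enum StubError {
    InvalidFrameSize { got: usize, expected: usize },
}

trait Detector {
    fn frame_len(&self) -> usize;
    fn process(&mut self, frame: &[i16]) -> Result<f32, StubError>;
}

/// Toy backend: reports speech (1.0) when mean absolute amplitude exceeds `level`.
struct EnergyDetector {
    frame_len: usize,
    level: f32,
}

impl Detector for EnergyDetector {
    fn frame_len(&self) -> usize {
        self.frame_len
    }

    fn process(&mut self, frame: &[i16]) -> Result<f32, StubError> {
        if frame.len() != self.frame_len {
            return Err(StubError::InvalidFrameSize { got: frame.len(), expected: self.frame_len });
        }
        let mean = frame.iter().map(|&s| (s as f32).abs()).sum::<f32>() / frame.len() as f32;
        Ok(if mean > self.level { 1.0 } else { 0.0 })
    }
}

/// Generic over any detector: count complete frames whose output reaches `threshold`.
fn speech_frames<D: Detector>(vad: &mut D, audio: &[i16], threshold: f32) -> Result<usize, StubError> {
    let mut count = 0;
    for frame in audio.chunks_exact(vad.frame_len()) {
        if vad.process(frame)? >= threshold {
            count += 1;
        }
    }
    Ok(count)
}

fn main() {
    let mut vad = EnergyDetector { frame_len: 160, level: 500.0 };
    // 10 silent frames followed by 5 loud frames (10 ms each at 16 kHz).
    let mut audio = vec![0i16; 160 * 10];
    audio.extend(std::iter::repeat(2_000i16).take(160 * 5));
    let n = speech_frames(&mut vad, &audio, 0.5).unwrap();
    println!("{n} speech frames");
}
```

The same shape applies with the crate's real trait: write the function once against the trait bound and pass in any backend.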
FrameAdapter
Real-world audio arrives in arbitrary chunk sizes. FrameAdapter buffers incoming samples and feeds correctly-sized frames to the backend automatically.
use FrameAdapter;
use SileroVad;
let vad = new.unwrap;
let mut adapter = new;
// Feed arbitrary-sized chunks — adapter handles buffering
let chunk = vec!; // not a multiple of 512
// Get all complete frame results at once
let probabilities = adapter.process_all.unwrap;
// Or get just the latest result (convenient for real-time)
let latest = adapter.process_latest.unwrap;
// Or process one frame at a time
let result = adapter.process.unwrap; // Some(prob) or None
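The buffering idea can be sketched without any VAD dependency. The example below illustrates the technique and is not `FrameAdapter`'s source. The hypothetical `FrameBuffer` accumulates chunks of any size and hands out complete fixed-size frames. Any remainder stays buffered for the next call. The sketch assumes `i16` samples.

```rust
/// Hypothetical frame buffer illustrating the idea behind FrameAdapter.
struct FrameBuffer {
    frame_len: usize,
    pending: Vec<i16>,
}

impl FrameBuffer {
    fn new(frame_len: usize) -> Self {
        Self { frame_len, pending: Vec::with_capacity(frame_len * 2) }
    }

    /// Append a chunk and return every complete frame now available.
    /// Samples that do not fill a whole frame stay buffered.
    fn push(&mut self, chunk: &[i16]) -> Vec<Vec<i16>> {
        self.pending.extend_from_slice(chunk);
        let complete = self.pending.len() / self.frame_len * self.frame_len;
        let frames: Vec<Vec<i16>> = self.pending[..complete]
            .chunks_exact(self.frame_len)
            .map(|f| f.to_vec())
            .collect();
        // Drop the samples that were handed out, keeping the remainder.
        self.pending.drain(..complete);
        frames
    }

    fn buffered(&self) -> usize {
        self.pending.len()
    }
}

fn main() {
    // 512-sample frames, matching Silero at 16 kHz.
    let mut buf = FrameBuffer::new(512);
    for chunk_len in [300, 300, 1000] {
        let frames = buf.push(&vec![0i16; chunk_len]);
        println!("pushed {chunk_len}: {} frame(s), {} buffered", frames.len(), buf.buffered());
    }
}
```

Copying each frame into its own `Vec` keeps the sketch simple. A real adapter would more likely pass each frame straight to the backend instead of allocating.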
Preprocessing
Optional audio preprocessing to improve VAD accuracy. Available stages: high-pass filter, noise suppression, and amplitude normalization.
use ;
// Use a preset
let config = raw_mic; // 80Hz HP + normalize + denoise
// let config = PreprocessorConfig::telephony(); // 200Hz HP only
// Or configure manually
let config = PreprocessorConfig ;
let mut preprocessor = new;
let raw_audio: = vec!;
let cleaned = preprocessor.process;
// feed `cleaned` to your VAD
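The sketch below illustrates two of the listed stages with generic textbook versions: a first-order (RC) high-pass filter and peak normalization. These may differ from the filters the crate actually uses. `high_pass` and `normalize_peak` are hypothetical helpers, and the `f32` sample type is an assumption. The 80 Hz cutoff mirrors the `raw_mic` preset comment.

```rust
use std::f32::consts::PI;

/// First-order (RC) high-pass filter, applied in place.
/// y[n] = alpha * (y[n-1] + x[n] - x[n-1]), with alpha = RC / (RC + dt).
fn high_pass(samples: &mut [f32], cutoff_hz: f32, sample_rate_hz: f32) {
    let rc = 1.0 / (2.0 * PI * cutoff_hz);
    let dt = 1.0 / sample_rate_hz;
    let alpha = rc / (rc + dt);
    let mut prev_x = 0.0;
    let mut prev_y = 0.0;
    for s in samples.iter_mut() {
        let x = *s;
        let y = alpha * (prev_y + x - prev_x);
        prev_x = x;
        prev_y = y;
        *s = y;
    }
}

/// Scale samples so the largest absolute value equals `target` (silence is left unchanged).
fn normalize_peak(samples: &mut [f32], target: f32) {
    let peak = samples.iter().fold(0.0_f32, |m, &s| m.max(s.abs()));
    if peak > 0.0 {
        let gain = target / peak;
        for s in samples.iter_mut() {
            *s *= gain;
        }
    }
}

fn main() {
    let sample_rate = 16_000.0;
    // 100 ms of a quiet 1 kHz tone riding on a DC offset.
    let mut audio: Vec<f32> = (0..1600)
        .map(|n| 0.5 + 0.1 * (2.0 * PI * 1000.0 * n as f32 / sample_rate).sin())
        .collect();
    high_pass(&mut audio, 80.0, sample_rate); // removes the DC offset over time
    normalize_peak(&mut audio, 0.9);
    let mean = audio.iter().sum::<f32>() / audio.len() as f32;
    println!("mean after filtering: {mean:.4}");
}
```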
Feature Flags
| Feature | Default | Description |
|---|---|---|
| webrtc | Yes | WebRTC VAD backend |
| silero | No | Silero VAD backend (ONNX model downloaded at build time) |
| ten-vad | No | TEN-VAD backend (ONNX model downloaded at build time) |
| firered | No | FireRedVAD backend (ONNX model downloaded at build time) |
| denoise | No | RNNoise-based noise suppression in the preprocessing pipeline |
| serde | No | Serialize/Deserialize for config types |
ONNX Model Downloads
Silero, TEN-VAD, and FireRedVAD models are downloaded automatically at build time. The Silero backend is pinned to v6.2.1 by default.
For offline or CI builds, point to a local model file:
SILERO_MODEL_PATH=/path/to/silero_vad.onnx
TEN_VAD_MODEL_PATH=/path/to/ten-vad.onnx
FIRERED_MODEL_PATH=/path/to/fireredvad.onnx FIRERED_CMVN_PATH=/path/to/cmvn.ark
To use a different Silero model version, override the download URL:
SILERO_MODEL_URL=https://github.com/snakers4/silero-vad/raw/v6.0/src/silero_vad/data/silero_vad.onnx
Error Handling
All backends return Result<f32, VadError>. The error type covers:
- `VadError::InvalidSampleRate(u32)`: unsupported sample rate for the backend
- `VadError::InvalidFrameSize { got, expected }`: wrong number of samples
- `VadError::BackendError(String)`: backend-specific error (e.g., ONNX failure)
Use capabilities() to check a backend's requirements before processing.
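The sketch below shows how a caller might branch on the three variants. To keep it self-contained, it defines a local stand-in enum with the same variant names instead of importing the crate's type. The `usize` field types for `got`/`expected` are an assumption, and `describe` is a hypothetical helper.

```rust
// Local stand-in mirroring the documented variants; field types are assumptions.
enum VadError {
    InvalidSampleRate(u32),
    InvalidFrameSize { got: usize, expected: usize },
    BackendError(String),
}

/// Hypothetical helper: turn each error case into a user-facing message.
fn describe(err: &VadError) -> String {
    match err {
        VadError::InvalidSampleRate(rate) => format!("unsupported sample rate: {rate} Hz"),
        VadError::InvalidFrameSize { got, expected } => {
            format!("wrong frame size: got {got} samples, expected {expected}")
        }
        VadError::BackendError(msg) => format!("backend failure: {msg}"),
    }
}

fn main() {
    let errors = [
        VadError::InvalidSampleRate(44_100),
        VadError::InvalidFrameSize { got: 480, expected: 512 },
        VadError::BackendError("session failed".to_string()),
    ];
    for e in &errors {
        println!("{}", describe(e));
    }
}
```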
vad-lab
vad-lab has moved to wavekat/wavekat-lab.
It is now a standalone repo so it can grow to cover other WaveKat libraries (turn detection, etc.) without being tied to this crate.
See wavekat/wavekat-lab for setup and usage.
Videos
| Video | Description |
|---|---|
| Adding FireRedVAD as the 4th backend | Benchmarking Xiaohongshu's FireRedVAD against Silero, TEN VAD, and WebRTC across accuracy and latency. |
| VAD Lab: Real-time multi-backend comparison | Live demo of VAD Lab comparing WebRTC, Silero, and TEN VAD side by side with real-time waveform visualization. |
License
Apache-2.0
TEN-VAD model notice
The TEN-VAD ONNX model (used by the ten-vad feature) is released by TEN-framework / Agora under Apache-2.0 with an additional non-compete clause. The clause restricts deployments that compete with Agora's offerings and limits deployment to "solely for your benefit and the benefit of your direct End Users." Despite the Apache-2.0 label, this is not a standard open-source license. Review the TEN-VAD license before using it in production.
Acknowledgements
This project wraps and builds on several upstream projects:
- webrtc-vad — Rust bindings for Google's WebRTC VAD
- Silero VAD — neural network VAD by the Silero team
- TEN-VAD — lightweight VAD by TEN-framework / Agora
- FireRedVAD — DFSMN-based VAD by the FireRedTeam
- ort — ONNX Runtime bindings for Rust
- nnnoiseless — Rust port of RNNoise for noise suppression