WaveKat VAD — Unified voice activity detection with multiple backends.
This crate provides a common VoiceActivityDetector trait with
implementations for different VAD backends, enabling experimentation
and benchmarking across technologies.
§Backends
| Backend | Feature | Sample Rates | Frame Size | Output |
|---|---|---|---|---|
| WebRTC | webrtc (default) | 8/16/32/48 kHz | 10, 20, or 30ms | Binary (0.0 or 1.0) |
| Silero | silero | 8/16 kHz | 32ms | Continuous (0.0–1.0) |
| TEN-VAD | ten-vad | 16 kHz only | 16ms | Continuous (0.0–1.0) |
| FireRedVAD | firered | 16 kHz only | 10ms | Continuous (0.0–1.0) |
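The table gives frame sizes as durations, while the `process` calls in the examples that follow take slices of samples. Converting between the two is plain arithmetic: `samples = sample_rate * frame_ms / 1000`. A small sketch of that conversion, where `samples_per_frame` is a hypothetical helper written for this illustration and not part of the crate:

```rust
/// Number of samples in one frame of `frame_ms` milliseconds at `sample_rate` Hz.
/// Hypothetical helper for illustration; not part of the crate's API.
fn samples_per_frame(sample_rate: u32, frame_ms: u32) -> usize {
    (sample_rate as u64 * frame_ms as u64 / 1000) as usize
}

fn main() {
    // (backend, sample rate in Hz, frame duration in ms), taken from the table above.
    let rows = [
        ("WebRTC", 16_000, 30),
        ("Silero", 16_000, 32),
        ("TEN-VAD", 16_000, 16),
        ("FireRedVAD", 16_000, 10),
    ];
    for (name, rate, ms) in rows {
        println!("{name}: {} samples per frame", samples_per_frame(rate, ms));
    }
}
```

At 16 kHz a 30 ms WebRTC frame works out to 480 samples, which matches the buffer size used in the quick start. In real code, the frame size reported by `capabilities()` is the safer source than a hand-computed value.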
§Quick start
Add the crate with the backend you need:
[dependencies]
wavekat-vad = "0.1" # WebRTC only (default)
wavekat-vad = { version = "0.1", features = ["silero"] } # Silero
wavekat-vad = { version = "0.1", features = ["ten-vad"] } # TEN-VAD
wavekat-vad = { version = "0.1", features = ["firered"] } # FireRedVAD
Then create a detector and process audio frames:
use wavekat_vad::VoiceActivityDetector;
use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};
let mut vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
let samples = vec![0i16; 480]; // 30ms at 16kHz
let probability = vad.process(&samples, 16000).unwrap();
println!("Speech probability: {probability}");
§Writing backend-generic code
All backends implement VoiceActivityDetector, so you can write code
that works with any backend:
use wavekat_vad::VoiceActivityDetector;
fn detect_speech(vad: &mut dyn VoiceActivityDetector, audio: &[i16], sample_rate: u32) {
    let caps = vad.capabilities();
    for frame in audio.chunks_exact(caps.frame_size) {
        let prob = vad.process(frame, sample_rate).unwrap();
        if prob > 0.5 {
            println!("Speech detected!");
        }
    }
}
§Handling arbitrary chunk sizes
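The `chunks_exact` loop above silently skips any trailing samples that do not fill a whole frame. When audio arrives as a stream, those leftovers have to be carried into the next chunk instead. A minimal sketch of that buffering idea, using only the standard library; `FrameBuffer` is a hypothetical type invented for this illustration and does not describe how the crate implements it:

```rust
/// Hypothetical carry-over buffer: collects samples and hands out full frames.
/// This illustrates the idea only; it is not the crate's `FrameAdapter`.
struct FrameBuffer {
    frame_size: usize,
    pending: Vec<i16>,
}

impl FrameBuffer {
    fn new(frame_size: usize) -> Self {
        assert!(frame_size > 0, "frame_size must be non-zero");
        Self { frame_size, pending: Vec::new() }
    }

    /// Appends `chunk` and returns every complete frame now available.
    /// Leftover samples stay buffered for the next call.
    fn push(&mut self, chunk: &[i16]) -> Vec<Vec<i16>> {
        self.pending.extend_from_slice(chunk);
        // Largest multiple of `frame_size` that fits in the buffer.
        let full = self.pending.len() / self.frame_size * self.frame_size;
        let frames = self.pending[..full]
            .chunks_exact(self.frame_size)
            .map(|f| f.to_vec())
            .collect();
        self.pending.drain(..full);
        frames
    }

    /// Number of samples waiting for the next frame to fill up.
    fn buffered(&self) -> usize {
        self.pending.len()
    }
}

fn main() {
    let chunk = vec![0i16; 1000];

    // Plain chunks_exact: two frames of 480, the rest is ignored.
    let exact = chunk.chunks_exact(480);
    println!("ignored per chunk: {}", exact.remainder().len());

    // With carry-over, the leftover is kept for the next chunk.
    let mut buf = FrameBuffer::new(480);
    let first = buf.push(&chunk);
    let second = buf.push(&chunk);
    println!(
        "{} + {} frames, {} samples still buffered",
        first.len(),
        second.len(),
        buf.buffered()
    );
}
```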
Real-world audio often arrives in chunks that don’t match the backend’s
required frame size. Use FrameAdapter to buffer and split automatically:
use wavekat_vad::FrameAdapter;
use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};
let vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
let mut adapter = FrameAdapter::new(Box::new(vad));
let chunk = vec![0i16; 1000]; // arbitrary size
let results = adapter.process_all(&chunk, 16000).unwrap();
for prob in &results {
    println!("{prob:.3}");
}
§Audio preprocessing
Optional preprocessing stages can improve accuracy with noisy input.
See the preprocessing module for details.
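To give a feel for what such stages do, here is a minimal sketch of a first-order high-pass filter followed by peak normalization, the two steps named in the comment on the `raw_mic` preset in the crate example further down. It uses only the standard library. The function names, the filter design, and the peak-based normalization are assumptions made for illustration; the crate's actual stages may work differently.

```rust
use std::f32::consts::PI;

/// First-order (RC) high-pass filter. Hypothetical helper, not the crate's filter.
fn high_pass(input: &[i16], sample_rate: u32, cutoff_hz: f32) -> Vec<f32> {
    let dt = 1.0 / sample_rate as f32;
    let rc = 1.0 / (2.0 * PI * cutoff_hz);
    let alpha = rc / (rc + dt);
    let mut out = Vec::with_capacity(input.len());
    let (mut prev_in, mut prev_out) = (0.0f32, 0.0f32);
    for &s in input {
        let x = s as f32;
        // y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        let y = alpha * (prev_out + x - prev_in);
        out.push(y);
        prev_in = x;
        prev_out = y;
    }
    out
}

/// Scales the signal so its largest absolute value equals `target_peak`.
/// Hypothetical helper; an all-zero signal is returned unchanged.
fn normalize_peak(signal: &[f32], target_peak: f32) -> Vec<i16> {
    let peak = signal.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    if peak == 0.0 {
        return vec![0; signal.len()];
    }
    let gain = target_peak / peak;
    signal.iter().map(|v| (v * gain).round() as i16).collect()
}

fn main() {
    // A constant (DC) offset: the high-pass filter should remove it over time.
    let dc = vec![1000i16; 16_000];
    let filtered = high_pass(&dc, 16_000, 80.0);
    println!("first sample: {:.1}", filtered[0]);
    println!("last sample:  {:.6}", filtered[filtered.len() - 1]);

    let cleaned = normalize_peak(&filtered, 16_384.0);
    println!("normalized peak: {}", cleaned[0]);
}
```

With the crate itself, the corresponding step is configured and called as follows: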
use wavekat_vad::preprocessing::{Preprocessor, PreprocessorConfig};
let config = PreprocessorConfig::raw_mic(); // 80Hz HP + normalization
let mut preprocessor = Preprocessor::new(&config, 16000);
let raw: Vec<i16> = vec![0; 512];
let cleaned = preprocessor.process(&raw);
// feed `cleaned` to your VAD
§Feature flags
| Feature | Default | Description |
|---|---|---|
| webrtc | Yes | WebRTC VAD backend |
| silero | No | Silero VAD backend (ONNX model downloaded at build time) |
| ten-vad | No | TEN-VAD backend (ONNX model downloaded at build time) |
| firered | No | FireRedVAD backend (ONNX model + CMVN downloaded at build time) |
| denoise | No | RNNoise-based noise suppression in preprocessing |
| serde | No | Serialize/Deserialize for config types |
§ONNX model downloads
The Silero, TEN-VAD, and FireRedVAD backends download their ONNX models automatically at build time. The Silero backend is pinned to v6.2.1 by default.
For offline or CI builds, set environment variables to point to local model files:
SILERO_MODEL_PATH=/path/to/silero_vad.onnx cargo build --features silero
TEN_VAD_MODEL_PATH=/path/to/ten-vad.onnx cargo build --features ten-vad
FIRERED_MODEL_PATH=/path/to/model.onnx FIRERED_CMVN_PATH=/path/to/cmvn.ark cargo build --features firered
To use a different Silero model version, override the download URL:
SILERO_MODEL_URL=https://github.com/snakers4/silero-vad/raw/v6.0/src/silero_vad/data/silero_vad.onnx cargo build --features silero
§Error handling
All backends return Result<f32, VadError>. Check a backend’s
requirements with VoiceActivityDetector::capabilities() before processing:
- VadError::InvalidSampleRate: unsupported sample rate
- VadError::InvalidFrameSize: wrong number of samples
- VadError::BackendError: backend-specific error (e.g. ONNX failure)
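The first two errors can be ruled out up front by comparing each frame against the backend's requirements. The sketch below shows that check in isolation. `CheckError` and `Requirements` are local stand-ins defined here for illustration; they only mirror the names in the list above, and the crate's real `VadError` and `VadCapabilities` may have different fields and payloads.

```rust
/// Local stand-in that mirrors two of the variant names listed above.
/// The crate's real `VadError` may carry different payloads.
#[derive(Debug, PartialEq)]
enum CheckError {
    InvalidSampleRate(u32),
    InvalidFrameSize { expected: usize, got: usize },
}

/// Hypothetical description of what a backend accepts.
struct Requirements {
    sample_rates: &'static [u32],
    frame_size: usize,
}

/// Checks a frame against the requirements before handing it to a detector.
fn validate(req: &Requirements, frame: &[i16], sample_rate: u32) -> Result<(), CheckError> {
    if !req.sample_rates.contains(&sample_rate) {
        return Err(CheckError::InvalidSampleRate(sample_rate));
    }
    if frame.len() != req.frame_size {
        return Err(CheckError::InvalidFrameSize {
            expected: req.frame_size,
            got: frame.len(),
        });
    }
    Ok(())
}

fn main() {
    // 16 kHz only, 256 samples (16 ms), like the TEN-VAD row in the backend table.
    let req = Requirements { sample_rates: &[16_000], frame_size: 256 };
    let frame = vec![0i16; 480];

    match validate(&req, &frame, 16_000) {
        Ok(()) => println!("frame accepted"),
        Err(CheckError::InvalidSampleRate(rate)) => println!("unsupported rate: {rate} Hz"),
        Err(CheckError::InvalidFrameSize { expected, got }) => {
            println!("expected {expected} samples, got {got}")
        }
    }
}
```

Matching on the variants rather than calling `unwrap()` lets the caller tell a configuration mistake (wrong rate or frame size) apart from a backend failure at runtime.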
§Examples
Runnable examples are in the examples/ directory:
- detect_speech: Detect speech in a WAV file using any backend
- ten_vad_file: Process a WAV file with TEN-VAD directly
cargo run --example detect_speech -- audio.wav
cargo run --example detect_speech --features silero -- -b silero audio.wav
cargo run --example ten_vad_file --features ten-vad -- audio.wav
§TEN-VAD model license
The TEN-VAD ONNX model is released by TEN-framework / Agora under Apache-2.0 with an additional non-compete clause, which restricts deployments that compete with Agora’s offerings. Review the TEN-VAD license before using it in production.
Re-exports§
pub use adapter::FrameAdapter;
pub use error::VadError;
Modules§
- adapter
- Frame adapter for matching audio frames to VAD backend requirements.
- backends
- VAD backend implementations.
- error
- frame
- preprocessing
- Audio preprocessing pipeline for improving VAD accuracy.
Structs§
- ProcessTimings
- Accumulated processing time breakdown by named pipeline stage.
- VadCapabilities
- Describes the audio requirements of a VAD backend.
Traits§
- VoiceActivityDetector
- Common interface for voice activity detection backends.