WaveKat VAD — Unified voice activity detection with multiple backends.
This crate provides a common VoiceActivityDetector trait with
implementations for different VAD backends, enabling experimentation
and benchmarking across technologies.
§Backends
| Backend | Feature | Sample Rates | Frame Size | Output |
|---|---|---|---|---|
| WebRTC | webrtc (default) | 8/16/32/48 kHz | 10, 20, or 30ms | Binary (0.0 or 1.0) |
| Silero | silero | 8/16 kHz | 32ms | Continuous (0.0–1.0) |
| TEN-VAD | ten-vad | 16 kHz only | 16ms | Continuous (0.0–1.0) |
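The frame sizes in the table are durations, while the processing calls take slices of samples. As a minimal sketch of the conversion (the helper `frame_samples` is hypothetical and not part of this crate), the sample count is the sample rate multiplied by the frame duration:

```rust
/// Number of samples in one frame of `frame_ms` milliseconds at `sample_rate_hz`.
/// Hypothetical helper for illustration; not part of the crate's API.
fn frame_samples(sample_rate_hz: u32, frame_ms: u32) -> usize {
    (sample_rate_hz as usize * frame_ms as usize) / 1000
}

fn main() {
    // WebRTC accepts 10, 20, or 30 ms frames.
    for ms in [10, 20, 30] {
        println!("WebRTC  16 kHz, {ms} ms: {} samples", frame_samples(16_000, ms));
    }
    // Silero and TEN-VAD each use a single fixed frame duration.
    println!("Silero  16 kHz, 32 ms: {} samples", frame_samples(16_000, 32));
    println!("TEN-VAD 16 kHz, 16 ms: {} samples", frame_samples(16_000, 16));
}
```

In real code, prefer the frame size reported by `capabilities()` over a value computed by hand.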
§Quick start
Add the crate with the backend you need (pick one of the following lines):
[dependencies]
wavekat-vad = "0.1" # WebRTC only (default)
wavekat-vad = { version = "0.1", features = ["silero"] } # Silero
wavekat-vad = { version = "0.1", features = ["ten-vad"] } # TEN-VAD
Then create a detector and process audio frames:
use wavekat_vad::VoiceActivityDetector;
use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};
let mut vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
let samples = vec![0i16; 480]; // 30ms at 16kHz
let probability = vad.process(&samples, 16000).unwrap();
println!("Speech probability: {probability}");
§Writing backend-generic code
All backends implement VoiceActivityDetector, so you can write code
that works with any backend:
use wavekat_vad::VoiceActivityDetector;
fn detect_speech(vad: &mut dyn VoiceActivityDetector, audio: &[i16], sample_rate: u32) {
    let caps = vad.capabilities();
    for frame in audio.chunks_exact(caps.frame_size) {
        let prob = vad.process(frame, sample_rate).unwrap();
        if prob > 0.5 {
            println!("Speech detected!");
        }
    }
}
§Handling arbitrary chunk sizes
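One detail of the generic loop above is worth spelling out: the standard library's `chunks_exact` yields only full frames, so trailing samples that do not fill a frame are never processed. The sketch below uses only the standard library; `split_frames` is a hypothetical helper, and its `frame_size` argument stands in for `caps.frame_size`.

```rust
/// Returns (number of full frames, number of leftover samples).
/// Hypothetical helper; `frame_size` must be non-zero or `chunks_exact` panics.
fn split_frames(audio: &[i16], frame_size: usize) -> (usize, usize) {
    let chunks = audio.chunks_exact(frame_size);
    // `remainder()` exposes the tail that is too short to form a frame.
    let leftover = chunks.remainder().len();
    (chunks.count(), leftover)
}

fn main() {
    let audio = vec![0i16; 1000]; // arbitrary chunk size
    let (frames, leftover) = split_frames(&audio, 480); // 30 ms at 16 kHz
    println!("{frames} full frames, {leftover} samples left over");
}
```

With 1000 samples and 480-sample frames, two frames are processed and 40 samples are skipped.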
Real-world audio often arrives in chunks that don’t match the backend’s
required frame size. Use FrameAdapter to buffer and split automatically:
use wavekat_vad::FrameAdapter;
use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};
let vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
let mut adapter = FrameAdapter::new(Box::new(vad));
let chunk = vec![0i16; 1000]; // arbitrary size
let results = adapter.process_all(&chunk, 16000).unwrap();
for prob in &results {
    println!("{prob:.3}");
}
§Audio preprocessing
Optional preprocessing stages can improve accuracy with noisy input.
See the preprocessing module for details.
use wavekat_vad::preprocessing::{Preprocessor, PreprocessorConfig};
let config = PreprocessorConfig::raw_mic(); // 80Hz HP + normalization
let mut preprocessor = Preprocessor::new(&config, 16000);
let raw: Vec<i16> = vec![0; 512];
let cleaned = preprocessor.process(&raw);
// feed `cleaned` to your VAD
§Feature flags
| Feature | Default | Description |
|---|---|---|
| webrtc | Yes | WebRTC VAD backend |
| silero | No | Silero VAD backend (ONNX model downloaded at build time) |
| ten-vad | No | TEN-VAD backend (ONNX model downloaded at build time) |
| denoise | No | RNNoise-based noise suppression in preprocessing |
| serde | No | Serialize/Deserialize for config types |
§ONNX model downloads
The Silero and TEN-VAD backends download their ONNX models automatically at build time. For offline or CI builds, set environment variables to point to local model files:
SILERO_MODEL_PATH=/path/to/silero_vad.onnx cargo build --features silero
TEN_VAD_MODEL_PATH=/path/to/ten-vad.onnx cargo build --features ten-vad
§Error handling
All backends return Result<f32, VadError>. Check a backend’s
requirements with VoiceActivityDetector::capabilities() before processing
to avoid the first two of these errors:
- VadError::InvalidSampleRate: unsupported sample rate
- VadError::InvalidFrameSize: wrong number of samples
- VadError::BackendError: backend-specific error (e.g. ONNX failure)
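The check-before-processing pattern can be sketched without the crate. Everything below is a local stand-in: `SketchVadError` borrows two variant names from the list above but its payloads are assumptions, and `SketchCapabilities` is a hypothetical record whose fields may not match the real `VadCapabilities`.

```rust
/// Stand-in error type; variant payloads are assumptions made for this sketch.
#[derive(Debug, PartialEq)]
enum SketchVadError {
    InvalidSampleRate(u32),
    InvalidFrameSize { expected: usize, got: usize },
}

/// Hypothetical requirements record; the real `VadCapabilities` may differ.
struct SketchCapabilities {
    sample_rates: &'static [u32],
    frame_size: usize,
}

/// Validates a frame against the stated requirements before any processing.
fn check_frame(
    caps: &SketchCapabilities,
    frame: &[i16],
    sample_rate: u32,
) -> Result<(), SketchVadError> {
    if !caps.sample_rates.contains(&sample_rate) {
        return Err(SketchVadError::InvalidSampleRate(sample_rate));
    }
    if frame.len() != caps.frame_size {
        return Err(SketchVadError::InvalidFrameSize {
            expected: caps.frame_size,
            got: frame.len(),
        });
    }
    Ok(())
}

fn main() {
    // Assumed requirements: 16 kHz only, 512 samples per frame.
    let caps = SketchCapabilities { sample_rates: &[16_000], frame_size: 512 };
    let frame = vec![0i16; 480];
    match check_frame(&caps, &frame, 16_000) {
        Ok(()) => println!("frame accepted"),
        Err(e) => println!("rejected: {e:?}"),
    }
}
```

Returning the error to the caller, as here, is generally preferable to calling `.unwrap()` on each result outside of short examples.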
§Examples
Runnable examples are in the examples/ directory:
- detect_speech: Detect speech in a WAV file using any backend
- ten_vad_file: Process a WAV file with TEN-VAD directly
cargo run --example detect_speech -- audio.wav
cargo run --example detect_speech --features silero -- -b silero audio.wav
cargo run --example ten_vad_file --features ten-vad -- audio.wav
§TEN-VAD model license
The TEN-VAD ONNX model is distributed by the TEN-framework / Agora under Apache-2.0 with an additional non-compete clause, which restricts deployments that compete with Agora’s offerings. Review the TEN-VAD license before using the model in production.
Re-exports§
pub use adapter::FrameAdapter;
pub use error::VadError;
Modules§
- adapter
- Frame adapter for matching audio frames to VAD backend requirements.
- backends
- VAD backend implementations.
- error
- frame
- preprocessing
- Audio preprocessing pipeline for improving VAD accuracy.
Structs§
- VadCapabilities
- Describes the audio requirements of a VAD backend.
Traits§
- VoiceActivityDetector
- Common interface for voice activity detection backends.