Crate wavekat_vad


WaveKat VAD — Unified voice activity detection with multiple backends.

This crate provides a common VoiceActivityDetector trait with implementations for different VAD backends, enabling experimentation and benchmarking across technologies.

§Backends

Backend	Feature	Sample Rates	Frame Size	Output
WebRTC	`webrtc` (default)	8/16/32/48 kHz	10, 20, or 30ms	Binary (0.0 or 1.0)
Silero	`silero`	8/16 kHz	32ms	Continuous (0.0–1.0)
TEN-VAD	`ten-vad`	16 kHz only	16ms	Continuous (0.0–1.0)
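The frame sizes in the table are durations, but backends consume samples. The sample count for a frame is sample_rate × duration_ms / 1000. The sketch below does that arithmetic for each row and checks a rate/duration pair against the WebRTC row. `frame_samples` and `webrtc_supports` are illustrative helpers and are not part of the crate.

```rust
/// Number of samples in a frame of `frame_ms` milliseconds at `sample_rate_hz`.
fn frame_samples(sample_rate_hz: u32, frame_ms: u32) -> usize {
    (sample_rate_hz as usize * frame_ms as usize) / 1000
}

/// Checks a (rate, duration) pair against the WebRTC row of the backends table.
fn webrtc_supports(sample_rate_hz: u32, frame_ms: u32) -> bool {
    matches!(sample_rate_hz, 8000 | 16000 | 32000 | 48000) && matches!(frame_ms, 10 | 20 | 30)
}

fn main() {
    println!("WebRTC 30 ms @ 16 kHz: {} samples", frame_samples(16_000, 30)); // 480
    println!("Silero 32 ms @ 16 kHz: {} samples", frame_samples(16_000, 32)); // 512
    println!("Silero 32 ms @ 8 kHz: {} samples", frame_samples(8_000, 32)); // 256
    println!("TEN-VAD 16 ms @ 16 kHz: {} samples", frame_samples(16_000, 16)); // 256
    println!("WebRTC at 44.1 kHz: {}", webrtc_supports(44_100, 30)); // false
}
```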

§Quick start

Add the crate with the backend you need, using one of the following lines:

[dependencies]
wavekat-vad = "0.1"                                  # WebRTC only (default)
wavekat-vad = { version = "0.1", features = ["silero"] }  # Silero
wavekat-vad = { version = "0.1", features = ["ten-vad"] } # TEN-VAD

Then create a detector and process audio frames:

use wavekat_vad::VoiceActivityDetector;
use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};

let mut vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
let samples = vec![0i16; 480]; // 30ms at 16kHz
let probability = vad.process(&samples, 16000).unwrap();
println!("Speech probability: {probability}");

§Writing backend-generic code

All backends implement VoiceActivityDetector, so you can write code that works with any backend:

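use wavekat_vad::{VadError, VoiceActivityDetector};

fn detect_speech(
    vad: &mut dyn VoiceActivityDetector,
    audio: &[i16],
    sample_rate: u32,
) -> Result<(), VadError> {
    let caps = vad.capabilities();
    // chunks_exact ignores any trailing partial frame (FrameAdapter buffers it instead)
    for frame in audio.chunks_exact(caps.frame_size) {
        let prob = vad.process(frame, sample_rate)?; // propagate errors instead of panicking
        if prob > 0.5 {
            println!("Speech detected!");
        }
    }
    Ok(())
}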
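Because the loop above depends only on the trait, a test double can stand in for a real backend. The sketch below uses a hypothetical, simplified `Detector` trait (a frame-size getter instead of `capabilities()`, and `String` errors instead of `VadError`) so that it compiles without the crate. `EnergyDetector` is a toy backend that scores a frame 1.0 when its mean absolute amplitude exceeds a threshold. Neither type is part of wavekat_vad.

```rust
/// Hypothetical, simplified stand-in for `VoiceActivityDetector` (not the crate's trait).
trait Detector {
    /// Required frame length in samples.
    fn frame_size(&self) -> usize;
    /// Speech probability in 0.0..=1.0 for one frame.
    fn process(&mut self, frame: &[i16], sample_rate: u32) -> Result<f32, String>;
}

/// Toy backend: 1.0 if the frame's mean absolute amplitude exceeds `threshold`, else 0.0.
struct EnergyDetector {
    frame: usize,
    threshold: f32,
}

impl Detector for EnergyDetector {
    fn frame_size(&self) -> usize {
        self.frame
    }

    fn process(&mut self, frame: &[i16], _sample_rate: u32) -> Result<f32, String> {
        let mean_abs = frame.iter().map(|&s| (s as f32).abs()).sum::<f32>() / frame.len() as f32;
        Ok(if mean_abs > self.threshold { 1.0 } else { 0.0 })
    }
}

/// Backend-generic loop: returns the indices of frames scored above 0.5.
fn speech_frames(
    vad: &mut dyn Detector,
    audio: &[i16],
    sample_rate: u32,
) -> Result<Vec<usize>, String> {
    let n = vad.frame_size();
    let mut hits = Vec::new();
    // chunks_exact ignores any trailing partial frame.
    for (i, frame) in audio.chunks_exact(n).enumerate() {
        if vad.process(frame, sample_rate)? > 0.5 {
            hits.push(i);
        }
    }
    Ok(hits)
}

fn main() {
    let mut vad = EnergyDetector { frame: 160, threshold: 500.0 };
    // Silent frame, loud frame, silent frame, then 10 leftover samples.
    let mut audio = vec![0i16; 160];
    audio.extend(std::iter::repeat(1000i16).take(160));
    audio.extend(vec![0i16; 170]);
    let hits = speech_frames(&mut vad, &audio, 16_000).unwrap();
    println!("speech in frames {hits:?}"); // speech in frames [1]
}
```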

§Handling arbitrary chunk sizes

Real-world audio often arrives in chunks that don’t match the backend’s required frame size. Use FrameAdapter to buffer and split automatically:

use wavekat_vad::FrameAdapter;
use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};

let vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
let mut adapter = FrameAdapter::new(Box::new(vad));

let chunk = vec![0i16; 1000]; // arbitrary size
let results = adapter.process_all(&chunk, 16000).unwrap();
for prob in &results {
    println!("{prob:.3}");
}
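To show what "buffer and split" means, here is a simplified sketch of the idea: append each incoming chunk to a buffer, emit every complete frame, and keep the remainder for the next call. `FrameBuffer` is a hypothetical stand-in, not FrameAdapter's implementation, and the 480-sample frame (30 ms at 16 kHz) is an assumption for the example.

```rust
/// Hypothetical illustration of frame buffering (not FrameAdapter's code).
struct FrameBuffer {
    frame_size: usize,
    pending: Vec<i16>,
}

impl FrameBuffer {
    fn new(frame_size: usize) -> Self {
        assert!(frame_size > 0, "frame_size must be non-zero");
        Self { frame_size, pending: Vec::with_capacity(frame_size) }
    }

    /// Appends `chunk` and returns every complete frame now available.
    fn push(&mut self, chunk: &[i16]) -> Vec<Vec<i16>> {
        self.pending.extend_from_slice(chunk);
        // Largest multiple of frame_size that fits in the buffer.
        let full = self.pending.len() / self.frame_size * self.frame_size;
        let frames: Vec<Vec<i16>> = self.pending[..full]
            .chunks_exact(self.frame_size)
            .map(|f| f.to_vec())
            .collect();
        // Leftover samples stay in `pending` for the next call.
        self.pending.drain(..full);
        frames
    }

    /// Samples waiting for a frame to fill up.
    fn buffered(&self) -> usize {
        self.pending.len()
    }
}

fn main() {
    let mut buf = FrameBuffer::new(480);
    let first = buf.push(&vec![0i16; 1000]);
    println!("frames: {}, buffered: {}", first.len(), buf.buffered()); // frames: 2, buffered: 40
    let second = buf.push(&vec![0i16; 440]);
    println!("frames: {}, buffered: {}", second.len(), buf.buffered()); // frames: 1, buffered: 0
}
```

Carrying the remainder across calls is what lets chunk boundaries fall anywhere without dropping samples.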

§Audio preprocessing

Optional preprocessing stages can improve accuracy with noisy input. See the preprocessing module for details.

use wavekat_vad::preprocessing::{Preprocessor, PreprocessorConfig};

let config = PreprocessorConfig::raw_mic(); // 80 Hz high-pass filter + normalization
let mut preprocessor = Preprocessor::new(&config, 16000);
let raw: Vec<i16> = vec![0; 512];
let cleaned = preprocessor.process(&raw);
// feed `cleaned` to your VAD
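The `raw_mic` preset includes an 80 Hz high-pass stage, which strips DC offset and low-frequency rumble before detection. One common way to build such a stage is a first-order (one-pole) high-pass filter, y[n] = α(y[n-1] + x[n] - x[n-1]) with α = RC / (RC + 1/fs) and RC = 1 / (2π·fc). The sketch below is that textbook filter and not necessarily the design the crate uses; `HighPass` is a hypothetical name. It works on f32 samples, so i16 input would be converted with `s as f32` first.

```rust
use std::f32::consts::PI;

/// First-order (one-pole) high-pass filter; illustrative, not the crate's design.
struct HighPass {
    alpha: f32,
    prev_in: f32,
    prev_out: f32,
}

impl HighPass {
    fn new(cutoff_hz: f32, sample_rate_hz: f32) -> Self {
        let rc = 1.0 / (2.0 * PI * cutoff_hz); // time constant of the equivalent RC filter
        let dt = 1.0 / sample_rate_hz; // sampling interval
        Self { alpha: rc / (rc + dt), prev_in: 0.0, prev_out: 0.0 }
    }

    /// Applies y[n] = alpha * (y[n-1] + (x[n] - x[n-1])), keeping state across calls.
    fn process(&mut self, input: &[f32]) -> Vec<f32> {
        input
            .iter()
            .map(|&x| {
                // Computing x - prev_in first avoids cancellation error on constant input.
                let y = self.alpha * (self.prev_out + (x - self.prev_in));
                self.prev_in = x;
                self.prev_out = y;
                y
            })
            .collect()
    }
}

fn main() {
    let mut hp = HighPass::new(80.0, 16_000.0);
    // A constant (DC) offset of 1000 decays toward zero.
    let out = hp.process(&vec![1000.0f32; 512]);
    println!("first: {:.1}, last: {:.6}", out[0], out[511]);
}
```

Because the filter keeps its state between calls, it can be fed frame by frame in the same way as a streaming detector.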

§Feature flags

Feature	Default	Description
`webrtc`	Yes	WebRTC VAD backend
`silero`	No	Silero VAD backend (ONNX model downloaded at build time)
`ten-vad`	No	TEN-VAD backend (ONNX model downloaded at build time)
`denoise`	No	RNNoise-based noise suppression in `preprocessing`
`serde`	No	`Serialize`/`Deserialize` for config types

§ONNX model downloads

The Silero and TEN-VAD backends download their ONNX models automatically at build time. For offline or CI builds, set environment variables to point to local model files:

SILERO_MODEL_PATH=/path/to/silero_vad.onnx cargo build --features silero
TEN_VAD_MODEL_PATH=/path/to/ten-vad.onnx cargo build --features ten-vad

§Error handling

VoiceActivityDetector::process returns Result<f32, VadError> for every backend. The first two errors below can be avoided by checking a backend’s requirements with VoiceActivityDetector::capabilities() before processing:

VadError::InvalidSampleRate — unsupported sample rate
VadError::InvalidFrameSize — wrong number of samples
VadError::BackendError — backend-specific error (e.g. ONNX failure)
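The first two errors can be caught before calling process by comparing the input against the backend's requirements. The sketch below shows that check with hypothetical `Requirements` and `CheckError` types that mirror the documented variants. Only `frame_size` is shown among the real VadCapabilities fields in these docs, so the other field names here are assumptions. The example values follow the TEN-VAD row of the backends table (16 kHz only, 16 ms = 256 samples).

```rust
/// Hypothetical mirror of a backend's requirements (field names are assumptions).
struct Requirements {
    sample_rates: &'static [u32],
    frame_size: usize,
}

/// Mirrors the first two VadError cases.
#[derive(Debug, PartialEq)]
enum CheckError {
    InvalidSampleRate(u32),
    InvalidFrameSize { expected: usize, got: usize },
}

/// Validates a frame up front instead of waiting for the backend to reject it.
fn check(req: &Requirements, sample_rate: u32, frame: &[i16]) -> Result<(), CheckError> {
    if !req.sample_rates.contains(&sample_rate) {
        return Err(CheckError::InvalidSampleRate(sample_rate));
    }
    if frame.len() != req.frame_size {
        return Err(CheckError::InvalidFrameSize { expected: req.frame_size, got: frame.len() });
    }
    Ok(())
}

fn main() {
    let ten_vad_like = Requirements { sample_rates: &[16_000], frame_size: 256 };
    println!("{:?}", check(&ten_vad_like, 16_000, &[0i16; 256])); // Ok(())
    println!("{:?}", check(&ten_vad_like, 8_000, &[0i16; 256])); // Err(InvalidSampleRate(8000))
    // Err(InvalidFrameSize { expected: 256, got: 480 })
    println!("{:?}", check(&ten_vad_like, 16_000, &[0i16; 480]));
}
```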

§Examples

Runnable examples are in the examples/ directory:

detect_speech — Detect speech in a WAV file using any backend
ten_vad_file — Process a WAV file with TEN-VAD directly

cargo run --example detect_speech -- audio.wav
cargo run --example detect_speech --features silero -- -b silero audio.wav
cargo run --example ten_vad_file --features ten-vad -- audio.wav

§TEN-VAD model license

The TEN-VAD ONNX model is published by TEN-framework / Agora under Apache-2.0 with an additional non-compete clause, which restricts deployments that compete with Agora’s offerings. Review the TEN-VAD license before using the model in production.

Re-exports§

pub use adapter::FrameAdapter;
pub use error::VadError;

Modules§

adapter: Frame adapter for matching audio frames to VAD backend requirements.
backends: VAD backend implementations.
error
frame
preprocessing: Audio preprocessing pipeline for improving VAD accuracy.

Structs§

VadCapabilities: Describes the audio requirements of a VAD backend.

Traits§

VoiceActivityDetector: Common interface for voice activity detection backends.