Crate wavekat_vad


WaveKat VAD — Unified voice activity detection with multiple backends.

This crate provides a common VoiceActivityDetector trait with implementations for different VAD backends, enabling experimentation and benchmarking across technologies.

§Backends

| Backend | Feature | Sample Rates | Frame Size | Output |
|---|---|---|---|---|
| WebRTC | webrtc (default) | 8/16/32/48 kHz | 10, 20, or 30 ms | Binary (0.0 or 1.0) |
| Silero | silero | 8/16 kHz | 32 ms | Continuous (0.0–1.0) |
| TEN-VAD | ten-vad | 16 kHz only | 16 ms | Continuous (0.0–1.0) |
| FireRedVAD | firered | 16 kHz only | 10 ms | Continuous (0.0–1.0) |
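
Frame sizes are listed as durations, while audio buffers are measured in samples. The conversion is plain arithmetic; the helper below is a small sketch (`frame_samples` is a hypothetical name, not part of the crate) for working out how many samples a frame holds:

```rust
/// Number of samples in one frame lasting `frame_ms` milliseconds
/// at `sample_rate` Hz. Hypothetical helper, not a crate API.
fn frame_samples(sample_rate: u32, frame_ms: u32) -> usize {
    (sample_rate as usize * frame_ms as usize) / 1000
}
```

For example, a 30 ms frame at 16 kHz is 480 samples, and a 32 ms frame at 16 kHz is 512 samples.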

§Quick start

Add the crate with the backend you need:

[dependencies]
wavekat-vad = "0.1"                                  # WebRTC only (default)
wavekat-vad = { version = "0.1", features = ["silero"] }  # Silero
wavekat-vad = { version = "0.1", features = ["ten-vad"] } # TEN-VAD
wavekat-vad = { version = "0.1", features = ["firered"] } # FireRedVAD

Then create a detector and process audio frames:

use wavekat_vad::VoiceActivityDetector;
use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};

let mut vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
let samples = vec![0i16; 480]; // 30ms at 16kHz
let probability = vad.process(&samples, 16000).unwrap();
println!("Speech probability: {probability}");

§Writing backend-generic code

All backends implement VoiceActivityDetector, so you can write code that works with any backend:

use wavekat_vad::VoiceActivityDetector;

fn detect_speech(vad: &mut dyn VoiceActivityDetector, audio: &[i16], sample_rate: u32) {
    let caps = vad.capabilities();
    for frame in audio.chunks_exact(caps.frame_size) {
        let prob = vad.process(frame, sample_rate).unwrap();
        if prob > 0.5 {
            println!("Speech detected!");
        }
    }
}
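
One thing to keep in mind with the loop above: `chunks_exact` yields only complete frames and skips any trailing samples that do not fill one. The following standard-library-only sketch (`split_frames` is a hypothetical helper) shows how many samples would be left unprocessed:

```rust
/// Returns (number of full frames, number of leftover samples).
/// Panics if `frame_size` is zero, as `chunks_exact` does.
fn split_frames(audio: &[i16], frame_size: usize) -> (usize, usize) {
    let chunks = audio.chunks_exact(frame_size);
    // `remainder()` is the tail that did not fit into a full frame.
    let leftover = chunks.remainder().len();
    (chunks.count(), leftover)
}
```

With 1000 samples and a 480-sample frame this gives two full frames and 40 leftover samples, which the caller would need to carry over to the next call.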

§Handling arbitrary chunk sizes

Real-world audio often arrives in chunks that don’t match the backend’s required frame size. Use FrameAdapter to buffer and split automatically:

use wavekat_vad::FrameAdapter;
use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};

let vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
let mut adapter = FrameAdapter::new(Box::new(vad));

let chunk = vec![0i16; 1000]; // arbitrary size
let results = adapter.process_all(&chunk, 16000).unwrap();
for prob in &results {
    println!("{prob:.3}");
}
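
To make the buffering idea concrete, here is a minimal sketch of how such an adapter could work. It is an illustration under assumptions, not the crate's `FrameAdapter` implementation; `FrameBuffer`, `push`, and `buffered` are hypothetical names:

```rust
/// Hypothetical stand-in for a frame-buffering adapter.
struct FrameBuffer {
    frame_size: usize,
    pending: Vec<i16>,
}

impl FrameBuffer {
    fn new(frame_size: usize) -> Self {
        assert!(frame_size > 0, "frame_size must be non-zero");
        Self { frame_size, pending: Vec::new() }
    }

    /// Appends `chunk` and returns every complete frame now available.
    /// Samples that do not fill a frame stay buffered for the next call.
    fn push(&mut self, chunk: &[i16]) -> Vec<Vec<i16>> {
        self.pending.extend_from_slice(chunk);
        // Largest prefix length that is a whole number of frames.
        let full = self.pending.len() / self.frame_size * self.frame_size;
        let frames: Vec<Vec<i16>> = self.pending[..full]
            .chunks_exact(self.frame_size)
            .map(|frame| frame.to_vec())
            .collect();
        self.pending.drain(..full);
        frames
    }

    /// Number of samples waiting for the next frame to fill up.
    fn buffered(&self) -> usize {
        self.pending.len()
    }
}
```

Pushing two 1000-sample chunks through a 480-sample buffer yields two frames each time, with 40 and then 80 samples held back. A real adapter would pass each frame to the wrapped detector instead of returning it.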

§Audio preprocessing

Optional preprocessing stages can improve accuracy with noisy input. See the preprocessing module for details.

use wavekat_vad::preprocessing::{Preprocessor, PreprocessorConfig};

let config = PreprocessorConfig::raw_mic(); // 80 Hz high-pass filter + normalization
let mut preprocessor = Preprocessor::new(&config, 16000);
let raw: Vec<i16> = vec![0; 512];
let cleaned = preprocessor.process(&raw);
// feed `cleaned` to your VAD
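
For intuition about what a high-pass stage does, the sketch below implements a generic first-order high-pass filter. It is illustrative only: the crate's own filter design may differ, and `HighPass` is a hypothetical type, not a crate API.

```rust
use std::f32::consts::PI;

/// Illustrative first-order (RC-style) high-pass filter.
struct HighPass {
    alpha: f32,
    prev_in: f32,
    prev_out: f32,
}

impl HighPass {
    fn new(cutoff_hz: f32, sample_rate: u32) -> Self {
        let rc = 1.0 / (2.0 * PI * cutoff_hz);
        let dt = 1.0 / sample_rate as f32;
        Self { alpha: rc / (rc + dt), prev_in: 0.0, prev_out: 0.0 }
    }

    /// Filters `samples`, keeping state so consecutive calls stay continuous.
    fn process(&mut self, samples: &[i16]) -> Vec<i16> {
        samples
            .iter()
            .map(|&s| {
                let x = s as f32;
                // y[n] = alpha * (y[n-1] + x[n] - x[n-1])
                let y = self.alpha * (self.prev_out + x - self.prev_in);
                self.prev_in = x;
                self.prev_out = y;
                y.round().clamp(i16::MIN as f32, i16::MAX as f32) as i16
            })
            .collect()
    }
}
```

Feeding a constant (DC) signal through `HighPass::new(80.0, 16000)` shows the effect: the output starts near the input level and decays toward zero, which is how low-frequency offset and rumble get removed before detection.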

§Feature flags

| Feature | Default | Description |
|---|---|---|
| webrtc | Yes | WebRTC VAD backend |
| silero | No | Silero VAD backend (ONNX model downloaded at build time) |
| ten-vad | No | TEN-VAD backend (ONNX model downloaded at build time) |
| firered | No | FireRedVAD backend (ONNX model + CMVN downloaded at build time) |
| denoise | No | RNNoise-based noise suppression in preprocessing |
| serde | No | Serialize/Deserialize for config types |

§ONNX model downloads

The Silero, TEN-VAD, and FireRedVAD backends download their ONNX models automatically at build time. The Silero model is pinned to v6.2.1 by default.

For offline or CI builds, set environment variables to point to local model files:

SILERO_MODEL_PATH=/path/to/silero_vad.onnx cargo build --features silero
TEN_VAD_MODEL_PATH=/path/to/ten-vad.onnx cargo build --features ten-vad
FIRERED_MODEL_PATH=/path/to/model.onnx FIRERED_CMVN_PATH=/path/to/cmvn.ark cargo build --features firered

To use a different Silero model version, override the download URL:

SILERO_MODEL_URL=https://github.com/snakers4/silero-vad/raw/v6.0/src/silero_vad/data/silero_vad.onnx cargo build --features silero
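
The overrides above suggest a simple resolution order. The sketch below shows one plausible way a build script could choose between them. The precedence (local path first, then URL override, then the pinned default) is an assumption, and `ModelSource` and `resolve_model_source` are hypothetical names, not the crate's build logic:

```rust
/// Where a build script would obtain the model from (hypothetical type).
#[derive(Debug, PartialEq)]
enum ModelSource {
    LocalFile(String),
    Download(String),
}

/// Assumed precedence: local file, then URL override, then default URL.
fn resolve_model_source(
    local_path: Option<String>,
    url_override: Option<String>,
    default_url: &str,
) -> ModelSource {
    match (local_path, url_override) {
        (Some(path), _) => ModelSource::LocalFile(path),
        (None, Some(url)) => ModelSource::Download(url),
        (None, None) => ModelSource::Download(default_url.to_string()),
    }
}
```

In a build script the two options would typically come from calls such as `std::env::var("SILERO_MODEL_PATH").ok()` and `std::env::var("SILERO_MODEL_URL").ok()`.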

§Error handling

All backends return Result<f32, VadError> from process. Check a backend’s audio requirements with VoiceActivityDetector::capabilities() before processing.
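
As a sketch of the kind of up-front check this enables, the example below validates a frame against a backend's requirements before any processing happens. The types are local stand-ins: `Requirements`, `FrameError`, and `check_frame` are hypothetical, and the crate's `VadCapabilities` and `VadError` may be shaped differently.

```rust
/// Hypothetical stand-in for a backend's audio requirements.
struct Requirements {
    sample_rate: u32,
    frame_size: usize,
}

/// Hypothetical stand-in for a validation error.
#[derive(Debug, PartialEq)]
enum FrameError {
    WrongSampleRate { expected: u32, got: u32 },
    WrongFrameSize { expected: usize, got: usize },
}

/// Rejects a frame whose sample rate or length does not match `req`.
fn check_frame(req: &Requirements, frame: &[i16], sample_rate: u32) -> Result<(), FrameError> {
    if sample_rate != req.sample_rate {
        return Err(FrameError::WrongSampleRate { expected: req.sample_rate, got: sample_rate });
    }
    if frame.len() != req.frame_size {
        return Err(FrameError::WrongFrameSize { expected: req.frame_size, got: frame.len() });
    }
    Ok(())
}
```

Returning a typed error rather than calling `unwrap()` lets the caller decide whether to resample, re-buffer, or report the mismatch.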

§Examples

Runnable examples are in the examples/ directory:

cargo run --example detect_speech -- audio.wav
cargo run --example detect_speech --features silero -- -b silero audio.wav
cargo run --example ten_vad_file --features ten-vad -- audio.wav

§TEN-VAD model license

The TEN-VAD ONNX model is distributed by TEN-framework / Agora under Apache-2.0 with an added non-compete clause, which restricts deployments that compete with Agora’s offerings. Review the TEN-VAD license before using the model in production.

Re-exports§

pub use adapter::FrameAdapter;
pub use error::VadError;

Modules§

adapter
Frame adapter for matching audio frames to VAD backend requirements.
backends
VAD backend implementations.
error
frame
preprocessing
Audio preprocessing pipeline for improving VAD accuracy.

Structs§

ProcessTimings
Accumulated processing time breakdown by named pipeline stage.
VadCapabilities
Describes the audio requirements of a VAD backend.

Traits§

VoiceActivityDetector
Common interface for voice activity detection backends.