// wavekat_vad/lib.rs

//! WaveKat VAD — Unified voice activity detection with multiple backends.
//!
//! This crate provides a common [`VoiceActivityDetector`] trait with
//! implementations for different VAD backends, enabling experimentation
//! and benchmarking across technologies.
//!
//! # Backends
//!
//! | Backend | Feature | Sample Rates | Frame Duration | Output |
//! |---------|---------|--------------|----------------|--------|
//! | [WebRTC](`backends::webrtc`) | `webrtc` (default) | 8/16/32/48 kHz | 10, 20, or 30 ms | Binary (0.0 or 1.0) |
//! | [Silero](`backends::silero`) | `silero` | 8/16 kHz | 32 ms | Continuous (0.0–1.0) |
//! | [TEN-VAD](`backends::ten_vad`) | `ten-vad` | 16 kHz only | 16 ms | Continuous (0.0–1.0) |
//!
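//! Frame duration and frame size in samples are linked by the sample rate
//! (`frame_size = sample_rate * frame_ms / 1000`). A quick sketch of that
//! arithmetic for the table above (the helper name is illustrative, not part
//! of this crate's API):
//!
//! ```rust
//! // Samples per frame for a given rate and frame duration.
//! fn frame_samples(sample_rate: u32, frame_ms: u32) -> usize {
//!     (sample_rate * frame_ms / 1000) as usize
//! }
//!
//! assert_eq!(frame_samples(16_000, 30), 480); // WebRTC, 30 ms frame
//! assert_eq!(frame_samples(16_000, 32), 512); // Silero
//! assert_eq!(frame_samples(16_000, 16), 256); // TEN-VAD
//! ```
//!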
//! # Quick start
//!
//! Add the crate with the backend you need (pick one line; the variants
//! below are alternatives, since duplicate dependency keys are invalid TOML):
//!
//! ```toml
//! [dependencies]
//! wavekat-vad = "0.1"                                         # WebRTC only (default)
//! # wavekat-vad = { version = "0.1", features = ["silero"] }  # Silero instead
//! # wavekat-vad = { version = "0.1", features = ["ten-vad"] } # TEN-VAD instead
//! ```
//!
//! Then create a detector and process audio frames:
//!
//! ```no_run
//! # #[cfg(feature = "webrtc")]
//! # {
//! use wavekat_vad::VoiceActivityDetector;
//! use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};
//!
//! let mut vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
//! let samples = vec![0i16; 480]; // 30ms at 16kHz
//! let probability = vad.process(&samples, 16000).unwrap();
//! println!("Speech probability: {probability}");
//! # }
//! ```
//!
//! # Writing backend-generic code
//!
//! All backends implement [`VoiceActivityDetector`], so you can write code
//! that works with any backend:
//!
//! ```no_run
//! use wavekat_vad::VoiceActivityDetector;
//!
//! fn detect_speech(vad: &mut dyn VoiceActivityDetector, audio: &[i16], sample_rate: u32) {
//!     let caps = vad.capabilities();
//!     for frame in audio.chunks_exact(caps.frame_size) {
//!         let prob = vad.process(frame, sample_rate).unwrap();
//!         if prob > 0.5 {
//!             println!("Speech detected!");
//!         }
//!     }
//! }
//! ```
//!
//! # Handling arbitrary chunk sizes
//!
//! Real-world audio often arrives in chunks that don't match the backend's
//! required frame size. Use [`FrameAdapter`] to buffer and split automatically:
//!
//! ```no_run
//! # #[cfg(feature = "webrtc")]
//! # {
//! use wavekat_vad::FrameAdapter;
//! use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};
//!
//! let vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
//! let mut adapter = FrameAdapter::new(Box::new(vad));
//!
//! let chunk = vec![0i16; 1000]; // arbitrary size
//! let results = adapter.process_all(&chunk, 16000).unwrap();
//! for prob in &results {
//!     println!("{prob:.3}");
//! }
//! # }
//! ```
//!
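//! Under the hood, an adapter of this kind just buffers incoming samples and
//! drains complete frames. A minimal self-contained sketch of that idea (not
//! the actual [`FrameAdapter`] implementation):
//!
//! ```rust
//! // Accumulate arbitrary-size chunks; emit only complete frames.
//! struct FrameBuffer {
//!     pending: Vec<i16>,
//!     frame_size: usize,
//! }
//!
//! impl FrameBuffer {
//!     fn push(&mut self, chunk: &[i16]) -> Vec<Vec<i16>> {
//!         self.pending.extend_from_slice(chunk);
//!         let mut frames = Vec::new();
//!         while self.pending.len() >= self.frame_size {
//!             frames.push(self.pending.drain(..self.frame_size).collect());
//!         }
//!         frames
//!     }
//! }
//!
//! let mut buf = FrameBuffer { pending: Vec::new(), frame_size: 480 };
//! let frames = buf.push(&[0i16; 1000]); // 1000 samples = 2 full frames + 40 held back
//! assert_eq!(frames.len(), 2);
//! assert_eq!(buf.pending.len(), 40);
//! ```
//!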
//! # Audio preprocessing
//!
//! Optional preprocessing stages can improve accuracy with noisy input.
//! See the [`preprocessing`] module for details.
//!
//! ```
//! use wavekat_vad::preprocessing::{Preprocessor, PreprocessorConfig};
//!
//! let config = PreprocessorConfig::raw_mic(); // 80Hz HP + normalization
//! let mut preprocessor = Preprocessor::new(&config, 16000);
//! let raw: Vec<i16> = vec![0; 512];
//! let cleaned = preprocessor.process(&raw);
//! // feed `cleaned` to your VAD
//! ```
//!
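//! To illustrate what a normalization stage does, here is a standalone peak
//! normalization sketch (the function is an assumption for illustration, not
//! this crate's `Preprocessor`, which may normalize differently):
//!
//! ```rust
//! // Scale samples so the loudest one hits `target_peak`.
//! fn normalize(samples: &[i16], target_peak: i16) -> Vec<i16> {
//!     let peak = samples.iter().map(|s| s.unsigned_abs()).max().unwrap_or(0);
//!     if peak == 0 {
//!         return samples.to_vec(); // all-zero input: nothing to scale
//!     }
//!     let gain = target_peak as f32 / peak as f32;
//!     samples
//!         .iter()
//!         .map(|s| ((*s as f32 * gain) as i32).clamp(-32768, 32767) as i16)
//!         .collect()
//! }
//!
//! assert_eq!(normalize(&[0, 1000, -2000], 16000), vec![0, 8000, -16000]);
//! ```
//!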
//! # Feature flags
//!
//! | Feature | Default | Description |
//! |---------|---------|-------------|
//! | `webrtc` | Yes | WebRTC VAD backend |
//! | `silero` | No | Silero VAD backend (ONNX model downloaded at build time) |
//! | `ten-vad` | No | TEN-VAD backend (ONNX model downloaded at build time) |
//! | `denoise` | No | RNNoise-based noise suppression in [`preprocessing`] |
//! | `serde` | No | `Serialize`/`Deserialize` for config types |
//!
//! ## ONNX model downloads
//!
//! The Silero and TEN-VAD backends download their ONNX models automatically
//! at build time. For offline or CI builds, set environment variables to
//! point to local model files:
//!
//! ```sh
//! SILERO_MODEL_PATH=/path/to/silero_vad.onnx cargo build --features silero
//! TEN_VAD_MODEL_PATH=/path/to/ten-vad.onnx cargo build --features ten-vad
//! ```
//!
//! # Error handling
//!
//! All backends return [`Result<f32, VadError>`]. Check a backend's
//! requirements with [`VoiceActivityDetector::capabilities()`] before processing:
//!
//! - [`VadError::InvalidSampleRate`] — unsupported sample rate
//! - [`VadError::InvalidFrameSize`] — wrong number of samples
//! - [`VadError::BackendError`] — backend-specific error (e.g. ONNX failure)
//!
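//! A hedged sketch of a call site that sizes its frame from `capabilities()`
//! and handles the remaining errors (this assumes [`VadError`] implements
//! `Display`, as Rust error types conventionally do):
//!
//! ```rust,no_run
//! # #[cfg(feature = "webrtc")]
//! # {
//! use wavekat_vad::VoiceActivityDetector;
//! use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};
//!
//! let mut vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
//! let caps = vad.capabilities();
//! // Sizing the frame from capabilities avoids InvalidFrameSize up front.
//! let frame = vec![0i16; caps.frame_size];
//! match vad.process(&frame, caps.sample_rate) {
//!     Ok(prob) => println!("speech probability: {prob:.3}"),
//!     Err(e) => eprintln!("VAD error: {e}"),
//! }
//! # }
//! ```
//!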
//! # Examples
//!
//! Runnable examples are in the
//! [`examples/`](https://github.com/wavekat/wavekat-vad/tree/main/crates/wavekat-vad/examples)
//! directory:
//!
//! - **[`detect_speech`](https://github.com/wavekat/wavekat-vad/blob/main/crates/wavekat-vad/examples/detect_speech.rs)** —
//!   Detect speech in a WAV file using any backend
//! - **[`ten_vad_file`](https://github.com/wavekat/wavekat-vad/blob/main/crates/wavekat-vad/examples/ten_vad_file.rs)** —
//!   Process a WAV file with TEN-VAD directly
//!
//! ```sh
//! cargo run --example detect_speech -- audio.wav
//! cargo run --example detect_speech --features silero -- -b silero audio.wav
//! cargo run --example ten_vad_file --features ten-vad -- audio.wav
//! ```
//!
//! # TEN-VAD model license
//!
//! The TEN-VAD ONNX model is licensed by the TEN-framework / Agora under
//! Apache-2.0 with a non-compete clause that restricts deployments competing
//! with Agora's offerings. Review the
//! [TEN-VAD license](https://github.com/TEN-framework/ten-vad) before using
//! it in production.

pub mod adapter;
pub mod backends;
pub mod error;
pub mod frame;
pub mod preprocessing;

pub use adapter::FrameAdapter;

pub use error::VadError;
/// Describes the audio requirements of a VAD backend.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VadCapabilities {
    /// Sample rate in Hz.
    pub sample_rate: u32,
    /// Required frame size in samples.
    pub frame_size: usize,
    /// Frame duration in milliseconds (derived from sample_rate and frame_size).
    pub frame_duration_ms: u32,
}

/// Common interface for voice activity detection backends.
///
/// Each backend implements this trait, allowing callers to swap
/// implementations without changing their processing logic.
pub trait VoiceActivityDetector: Send {
    /// Returns the audio requirements of this detector.
    ///
    /// Use this to determine the expected sample rate and frame size
    /// before calling [`process`](Self::process).
    fn capabilities(&self) -> VadCapabilities;

    /// Process an audio frame and return the probability of speech.
    ///
    /// Returns a value between `0.0` (silence) and `1.0` (speech).
    /// Some backends (e.g. WebRTC) return only binary values (`0.0` or `1.0`),
    /// while others (e.g. Silero) return continuous probabilities.
    ///
    /// # Arguments
    ///
    /// * `samples` — Audio samples as 16-bit signed integers, mono channel.
    ///   Must match the `frame_size` from [`capabilities`](Self::capabilities).
    /// * `sample_rate` — Sample rate in Hz (must match the rate the detector was created with).
    ///
    /// # Errors
    ///
    /// Returns [`VadError`] if the sample rate or frame size is invalid,
    /// or if the backend encounters a processing error.
    fn process(&mut self, samples: &[i16], sample_rate: u32) -> Result<f32, VadError>;

    /// Reset the detector's internal state.
    ///
    /// Call this when starting a new audio stream or after a long pause.
    fn reset(&mut self);
}