//! WaveKat VAD — Unified voice activity detection with multiple backends.
//!
//! This crate provides a common [`VoiceActivityDetector`] trait with
//! implementations for different VAD backends, enabling experimentation
//! and benchmarking across technologies.
//!
//! # Backends
//!
//! | Backend | Feature | Sample Rates | Frame Size | Output |
//! |---------|---------|-------------|------------|--------|
//! | [WebRTC](`backends::webrtc`) | `webrtc` (default) | 8/16/32/48 kHz | 10, 20, or 30ms | Binary (0.0 or 1.0) |
//! | [Silero](`backends::silero`) | `silero` | 8/16 kHz | 32ms | Continuous (0.0–1.0) |
//! | [TEN-VAD](`backends::ten_vad`) | `ten-vad` | 16 kHz only | 16ms | Continuous (0.0–1.0) |
//! | [FireRedVAD](`backends::firered`) | `firered` | 16 kHz only | 10ms | Continuous (0.0–1.0) |
//!
//! # Quick start
//!
//! Add the crate with the backend you need:
//!
//! ```toml
//! [dependencies]
//! # Pick exactly one of the following (duplicate keys are invalid TOML):
//! wavekat-vad = "0.1"                                       # WebRTC only (default)
//! wavekat-vad = { version = "0.1", features = ["silero"] }  # Silero
//! wavekat-vad = { version = "0.1", features = ["ten-vad"] } # TEN-VAD
//! wavekat-vad = { version = "0.1", features = ["firered"] } # FireRedVAD
//! ```
//!
//! Then create a detector and process audio frames:
//!
//! ```no_run
//! # #[cfg(feature = "webrtc")]
//! # {
//! use wavekat_vad::VoiceActivityDetector;
//! use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};
//!
//! let mut vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
//! let samples = vec![0i16; 480]; // 30ms at 16kHz
//! let probability = vad.process(&samples, 16000).unwrap();
//! println!("Speech probability: {probability}");
//! # }
//! ```
//!
//! # Writing backend-generic code
//!
//! All backends implement [`VoiceActivityDetector`], so you can write code
//! that works with any backend:
//!
//! ```no_run
//! use wavekat_vad::VoiceActivityDetector;
//!
//! fn detect_speech(vad: &mut dyn VoiceActivityDetector, audio: &[i16], sample_rate: u32) {
//!     let caps = vad.capabilities();
//!     for frame in audio.chunks_exact(caps.frame_size) {
//!         let prob = vad.process(frame, sample_rate).unwrap();
//!         if prob > 0.5 {
//!             println!("Speech detected!");
//!         }
//!     }
//! }
//! ```
//!
//! # Handling arbitrary chunk sizes
//!
//! Real-world audio often arrives in chunks that don't match the backend's
//! required frame size. Use [`FrameAdapter`] to buffer and split automatically:
//!
//! ```no_run
//! # #[cfg(feature = "webrtc")]
//! # {
//! use wavekat_vad::FrameAdapter;
//! use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};
//!
//! let vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
//! let mut adapter = FrameAdapter::new(Box::new(vad));
//!
//! let chunk = vec![0i16; 1000]; // arbitrary size
//! let results = adapter.process_all(&chunk, 16000).unwrap();
//! for prob in &results {
//!     println!("{prob:.3}");
//! }
//! # }
//! ```
//!
//! # Audio preprocessing
//!
//! Optional preprocessing stages can improve accuracy with noisy input.
//! See the [`preprocessing`] module for details.
//!
//! ```
//! use wavekat_vad::preprocessing::{Preprocessor, PreprocessorConfig};
//!
//! let config = PreprocessorConfig::raw_mic(); // 80 Hz high-pass filter + normalization
//! let mut preprocessor = Preprocessor::new(&config, 16000);
//! let raw: Vec<i16> = vec![0; 512];
//! let cleaned = preprocessor.process(&raw);
//! // feed `cleaned` to your VAD
//! ```
//!
//! # Feature flags
//!
//! | Feature | Default | Description |
//! |---------|---------|-------------|
//! | `webrtc` | Yes | WebRTC VAD backend |
//! | `silero` | No | Silero VAD backend (ONNX model downloaded at build time) |
//! | `ten-vad` | No | TEN-VAD backend (ONNX model downloaded at build time) |
//! | `firered` | No | FireRedVAD backend (ONNX model + CMVN downloaded at build time) |
//! | `denoise` | No | RNNoise-based noise suppression in [`preprocessing`] |
//! | `serde` | No | `Serialize`/`Deserialize` for config types |
//!
//! ## ONNX model downloads
//!
//! The Silero, TEN-VAD, and FireRedVAD backends download their ONNX models
//! automatically at build time. The Silero backend is pinned to **v6.2.1** by
//! default.
//!
//! For offline or CI builds, set environment variables to point to local model
//! files:
//!
//! ```sh
//! SILERO_MODEL_PATH=/path/to/silero_vad.onnx cargo build --features silero
//! TEN_VAD_MODEL_PATH=/path/to/ten-vad.onnx cargo build --features ten-vad
//! FIRERED_MODEL_PATH=/path/to/model.onnx FIRERED_CMVN_PATH=/path/to/cmvn.ark cargo build --features firered
//! ```
//!
//! To use a different Silero model version, override the download URL:
//!
//! ```sh
//! SILERO_MODEL_URL=https://github.com/snakers4/silero-vad/raw/v6.0/src/silero_vad/data/silero_vad.onnx cargo build --features silero
//! ```
//!
//! # Error handling
//!
//! All backends return [`Result<f32, VadError>`]. Check a backend's
//! requirements with [`VoiceActivityDetector::capabilities()`] before
//! processing to avoid the first two errors below (a short sketch follows the
//! list):
//!
//! - [`VadError::InvalidSampleRate`] — unsupported sample rate
//! - [`VadError::InvalidFrameSize`] — wrong number of samples
//! - [`VadError::BackendError`] — backend-specific error (e.g. ONNX failure)
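//!
//! A minimal defensive-processing sketch, building on the `caps.frame_size`
//! field used in the generic example above; errors are reported via their
//! `Display` output rather than matched by variant:
//!
//! ```no_run
//! # #[cfg(feature = "webrtc")]
//! # {
//! use wavekat_vad::VoiceActivityDetector;
//! use wavekat_vad::backends::webrtc::{WebRtcVad, WebRtcVadMode};
//!
//! let mut vad = WebRtcVad::new(16000, WebRtcVadMode::Quality).unwrap();
//! let caps = vad.capabilities();
//! let samples = vec![0i16; caps.frame_size]; // exactly one frame
//! match vad.process(&samples, 16000) {
//!     Ok(prob) => println!("speech probability: {prob:.3}"),
//!     Err(e) => eprintln!("VAD error: {e}"), // InvalidSampleRate, InvalidFrameSize, or BackendError
//! }
//! # }
//! ```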
//!
//! # Examples
//!
//! Runnable examples are in the
//! [`examples/`](https://github.com/wavekat/wavekat-vad/tree/main/crates/wavekat-vad/examples)
//! directory:
//!
//! - **[`detect_speech`](https://github.com/wavekat/wavekat-vad/blob/main/crates/wavekat-vad/examples/detect_speech.rs)** —
//! Detect speech in a WAV file using any backend
//! - **[`ten_vad_file`](https://github.com/wavekat/wavekat-vad/blob/main/crates/wavekat-vad/examples/ten_vad_file.rs)** —
//! Process a WAV file with TEN-VAD directly
//!
//! ```sh
//! cargo run --example detect_speech -- audio.wav
//! cargo run --example detect_speech --features silero -- -b silero audio.wav
//! cargo run --example ten_vad_file --features ten-vad -- audio.wav
//! ```
//!
//! # TEN-VAD model license
//!
//! The TEN-VAD ONNX model is licensed by the TEN-framework / Agora under
//! Apache-2.0 with an additional non-compete clause that restricts deployments
//! competing with Agora's offerings. Review the
//! [TEN-VAD license](https://github.com/TEN-framework/ten-vad) before using it
//! in production.
// Source module paths for these re-exports were lost in extraction; the paths
// below are assumptions, not confirmed crate layout.
pub use crate::frame_adapter::FrameAdapter;
pub use crate::error::VadError;
use std::time::Duration;
/// Accumulated processing time breakdown by named pipeline stage.
///
/// Each backend defines its own stages (e.g. `"fbank"`, `"cmvn"`, `"onnx"`),
/// so you can see exactly where time is spent without hardcoding a fixed set
/// of fields. Stages are returned in pipeline order.
///
/// Call [`VoiceActivityDetector::timings()`] to retrieve the current values.
/// Timings accumulate across all calls to [`process()`](VoiceActivityDetector::process)
/// and are **not** reset by [`reset()`](VoiceActivityDetector::reset).
///
/// # Example
///
/// ```ignore
/// let t = vad.timings();
/// for (name, dur) in &t.stages {
///     let avg_us = dur.as_secs_f64() * 1_000_000.0 / t.frames as f64;
///     println!("{name}: {avg_us:.1} µs/frame");
/// }
/// ```
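// Sketch of the struct the doc comment above describes; the item body was
// missing from the extracted source. Field names follow the example above
// (`t.stages`, `t.frames`); the exact types are assumptions.
#[derive(Debug, Clone, Default)]
pub struct Timings {
    /// Accumulated duration per named pipeline stage, in pipeline order.
    pub stages: Vec<(String, Duration)>,
    /// Total number of frames processed so far.
    pub frames: u64,
}
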
/// Describes the audio requirements of a VAD backend.
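// Sketch reconstructed from usage in the module docs (`caps.frame_size`); the
// struct body was missing from the extracted source, and the name and second
// field are assumptions suggested by the sample-rate errors documented above.
#[derive(Debug, Clone)]
pub struct VadCapabilities {
    /// Number of samples the backend expects per call to `process()`.
    pub frame_size: usize,
    /// Sample rates the backend accepts, in Hz.
    pub supported_sample_rates: Vec<u32>,
}
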
/// Common interface for voice activity detection backends.
///
/// Each backend implements this trait, allowing callers to swap
/// implementations without changing their processing logic.
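// Trait sketch; the body was missing from the extracted source. Method
// signatures are reconstructed from the module docs and examples above and
// are assumptions where the docs do not pin them down.
pub trait VoiceActivityDetector {
    /// Process one frame of 16-bit PCM samples and return a speech
    /// probability in `0.0..=1.0` (binary backends return 0.0 or 1.0).
    fn process(&mut self, samples: &[i16], sample_rate: u32) -> Result<f32, VadError>;

    /// Audio requirements of this backend.
    fn capabilities(&self) -> VadCapabilities;

    /// Accumulated per-stage timings; not reset by [`reset()`](Self::reset).
    fn timings(&self) -> Timings;

    /// Clear internal state (e.g. between utterances). Does not clear timings.
    fn reset(&mut self);
}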