# speech-prep
[CI](https://github.com/dnvt/speech-prep/actions/workflows/ci.yml)
[License](LICENSE-MIT)
Speech-focused audio preprocessing for Rust.
## Features
- **Voice Activity Detection** — dual-metric (energy + spectral flux) with adaptive thresholds
- **Format handling** — common-format detection plus WAV decoding to 16kHz mono PCM
- **Preprocessing** — DC removal, high-pass filter, spectral noise reduction, normalization
- **Chunking** — speech-aligned segmentation with configurable duration and overlap
- **Quality assessment** — signal quality metrics for speech-oriented pipelines
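
To make the two VAD metrics concrete, here is a minimal standalone sketch of short-time energy and spectral flux using only the standard library. It does not use this crate's API: the function names (`frame_energy`, `magnitude_spectrum`, `spectral_flux`) are hypothetical, and the naive DFT is an illustrative stand-in, so the crate's actual computation may differ.

```rust
use std::f32::consts::PI;

/// Short-time energy: mean squared amplitude of one frame.
fn frame_energy(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32
}

/// Magnitude spectrum via a naive O(n^2) DFT (bins 0..=n/2).
/// Fine for a sketch; a real pipeline would use an FFT.
fn magnitude_spectrum(frame: &[f32]) -> Vec<f32> {
    let n = frame.len();
    (0..n / 2 + 1)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (t, &x) in frame.iter().enumerate() {
                let angle = -2.0 * PI * (k * t) as f32 / n as f32;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            (re * re + im * im).sqrt()
        })
        .collect()
}

/// Spectral flux: summed positive change in magnitude between two frames.
fn spectral_flux(prev: &[f32], curr: &[f32]) -> f32 {
    prev.iter().zip(curr).map(|(p, c)| (c - p).max(0.0)).sum()
}

fn main() {
    let n = 64;
    let silence = vec![0.0f32; n];
    // 1 kHz tone sampled at 16 kHz, amplitude 0.5.
    let tone: Vec<f32> = (0..n)
        .map(|t| 0.5 * (2.0 * PI * 1000.0 * t as f32 / 16_000.0).sin())
        .collect();

    let flux = spectral_flux(&magnitude_spectrum(&silence), &magnitude_spectrum(&tone));
    println!("energy(silence) = {:.4}", frame_energy(&silence));
    println!("energy(tone)    = {:.4}", frame_energy(&tone));
    println!("flux(silence -> tone) = {:.4}", flux);
}
```

Energy alone reacts to any loud sound, while flux reacts to spectral change such as speech onsets, which is one plausible reason to combine the two.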
## Quick start
```bash
cargo run --example vad_detect
```
```
Detected 1 speech segment(s):
Segment 1: 0.290s — 1.540s (confidence: 1.00, energy: 0.0362)
```
## Usage
```rust
use std::sync::Arc;

use speech_prep::{NoopVadMetricsCollector, VadConfig, VadDetector, VadMetricsCollector};

fn main() -> Result<(), speech_prep::Error> {
    let config = VadConfig::default();
    // No-op collector: discards VAD metrics.
    let metrics: Arc<dyn VadMetricsCollector> = Arc::new(NoopVadMetricsCollector);
    let detector = VadDetector::new(config, metrics)?;

    // Placeholder input: 16,000 zero samples (silence). Replace with real audio.
    let audio_samples = vec![0.0; 16_000];
    let segments = detector.detect(&audio_samples)?;

    for seg in &segments {
        println!("{:.3}s — {:.3}s", seg.start_time.as_secs(), seg.end_time.as_secs());
    }
    Ok(())
}
```
## Pipeline
```
Raw audio bytes
       │
       ▼
Format detection ─→ Decoding ─→ Resampling ─→ Channel mixing
  (format.rs)        (WAV)       (16kHz)          (mono)
       │
       ▼
Preprocessing    ─→  VAD   ─→  Chunking
(preprocessing/)    (vad/)    (chunker/)
       │
       ▼
Processed audio chunks with speech metadata
```
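
The resampling and channel-mixing stages can be illustrated with a short standalone sketch. It is not the crate's `converter` implementation: `downmix_to_mono` and `resample_linear` are hypothetical names, linear interpolation is a deliberately simple stand-in with no anti-aliasing filter, and the sketch mixes to mono before resampling only because that halves the work.

```rust
/// Average interleaved multi-channel samples into a single mono channel.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample with linear interpolation (illustration only, no low-pass filter).
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            // Fractional position of output sample `i` in the input signal.
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}

fn main() {
    // One second of 48 kHz stereo silence, interleaved as L, R, L, R, ...
    let stereo = vec![0.0f32; 48_000 * 2];
    let mono = downmix_to_mono(&stereo, 2);
    let at_16k = resample_linear(&mono, 48_000, 16_000);
    println!(
        "{} interleaved samples -> {} mono samples at 16 kHz",
        stereo.len(),
        at_16k.len()
    );
}
```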
## Modules
| Module | Purpose |
|--------|---------|
| `vad` | Voice activity detection with energy + spectral flux |
| `converter` | WAV decoding, resampling, and channel mixing to the crate's standard format (16 kHz mono PCM) |
| `format` | Audio format detection for WAV, MP3, FLAC, Opus, WebM, and AAC |
| `preprocessing` | DC removal, high-pass filter, noise reduction, normalization |
| `chunker` | Speech-aligned segmentation with overlap handling |
| `pipeline` | End-to-end processing coordinator |
| `buffer` | Owned sample buffers with processing metadata |
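
As a rough illustration of what the `preprocessing` steps do, the sketch below chains DC removal, a first-order high-pass filter, and peak normalization on a plain `f32` buffer. The function names, the 80 Hz cutoff, and the filter design are assumptions made for this example, not the crate's API or defaults; spectral noise reduction is omitted.

```rust
use std::f32::consts::PI;

/// Remove DC offset by subtracting the mean of the buffer.
fn remove_dc(samples: &mut [f32]) {
    if samples.is_empty() {
        return;
    }
    let mean = samples.iter().sum::<f32>() / samples.len() as f32;
    for s in samples.iter_mut() {
        *s -= mean;
    }
}

/// First-order RC high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).
fn high_pass(samples: &mut [f32], cutoff_hz: f32, sample_rate_hz: f32) {
    let rc = 1.0 / (2.0 * PI * cutoff_hz);
    let dt = 1.0 / sample_rate_hz;
    let alpha = rc / (rc + dt);
    let mut prev_in = 0.0;
    let mut prev_out = 0.0;
    for s in samples.iter_mut() {
        let x = *s;
        let y = alpha * (prev_out + x - prev_in);
        prev_in = x;
        prev_out = y;
        *s = y;
    }
}

/// Scale the buffer so its largest absolute sample equals `target_peak`.
fn normalize_peak(samples: &mut [f32], target_peak: f32) {
    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    if peak > 0.0 {
        let gain = target_peak / peak;
        for s in samples.iter_mut() {
            *s *= gain;
        }
    }
}

fn main() {
    // A 440 Hz tone at 16 kHz riding on a constant 0.3 offset.
    let mut samples: Vec<f32> = (0..16_000)
        .map(|t| 0.3 + 0.1 * (2.0 * PI * 440.0 * t as f32 / 16_000.0).sin())
        .collect();

    remove_dc(&mut samples);
    high_pass(&mut samples, 80.0, 16_000.0); // 80 Hz cutoff is an illustrative choice
    normalize_peak(&mut samples, 0.9);

    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    println!("peak after preprocessing: {:.3}", peak);
}
```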
## Configuration
```rust
use speech_prep::VadConfig;

let config = VadConfig {
    base_threshold: 0.02, // energy threshold for speech detection
    energy_weight: 0.6,   // weight of energy vs spectral flux
    ..VadConfig::default()
};
```
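
One way to read `energy_weight` is as a blend factor between the two metrics. The sketch below shows that interpretation with a hypothetical `SketchConfig` struct that mirrors only the two fields above. The linear blend and the comparison against `base_threshold` are assumptions for illustration; the crate's actual decision rule, including its adaptive thresholding, may differ.

```rust
/// Hypothetical stand-in for the two fields shown above; not the crate's type.
struct SketchConfig {
    base_threshold: f32,
    energy_weight: f32,
}

/// Blend two metrics into one score; `energy_weight` of 1.0 ignores flux.
fn combined_score(cfg: &SketchConfig, energy: f32, flux: f32) -> f32 {
    cfg.energy_weight * energy + (1.0 - cfg.energy_weight) * flux
}

/// Assumed decision rule: a frame is speech when the score exceeds the threshold.
fn is_speech(cfg: &SketchConfig, energy: f32, flux: f32) -> bool {
    combined_score(cfg, energy, flux) > cfg.base_threshold
}

fn main() {
    let cfg = SketchConfig {
        base_threshold: 0.02,
        energy_weight: 0.6,
    };
    // (energy, flux) pairs chosen by hand for illustration.
    let frames = [(0.001, 0.002), (0.05, 0.01), (0.01, 0.08)];
    for (energy, flux) in frames {
        println!(
            "energy={:.3} flux={:.3} score={:.4} speech={}",
            energy,
            flux,
            combined_score(&cfg, energy, flux),
            is_speech(&cfg, energy, flux)
        );
    }
}
```

Under this reading, raising `energy_weight` makes detection track loudness more closely, and lowering it favors spectral change.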
```rust
use speech_prep::ChunkerConfig;
let config = ChunkerConfig::default(); // 500ms target chunks
```
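
For intuition about duration and overlap, the sketch below splits a sample count into fixed-length ranges that share a few samples with their neighbor. It is a simplification: `chunk_ranges` is a hypothetical helper, the 50 ms overlap is an arbitrary value, and the crate's chunker aligns boundaries to detected speech rather than cutting at fixed offsets.

```rust
use std::ops::Range;

/// Split `total` samples into ranges of at most `chunk_len` samples,
/// each starting `overlap` samples before the previous range ends.
fn chunk_ranges(total: usize, chunk_len: usize, overlap: usize) -> Vec<Range<usize>> {
    assert!(
        chunk_len > 0 && overlap < chunk_len,
        "overlap must be smaller than chunk length"
    );
    let hop = chunk_len - overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total {
        let end = (start + chunk_len).min(total);
        ranges.push(start..end);
        if end == total {
            break; // the final (possibly shorter) chunk has been emitted
        }
        start += hop;
    }
    ranges
}

fn main() {
    let sample_rate = 16_000;
    let chunk_len = sample_rate / 2; // 500 ms at 16 kHz
    let overlap = sample_rate / 20; // 50 ms, an illustrative value
    for r in chunk_ranges(20_000, chunk_len, overlap) {
        println!("{}..{} ({} samples)", r.start, r.end, r.len());
    }
}
```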
## License
MIT OR Apache-2.0