speech-prep
Speech-focused audio preprocessing for Rust.
Features
- Voice Activity Detection — dual-metric (energy + spectral flux) with adaptive thresholds
- Format handling — common-format detection plus WAV decoding to 16kHz mono PCM
- Preprocessing — DC removal, high-pass filter, spectral noise reduction, normalization
- Chunking — speech-aligned segmentation with configurable duration and overlap
- Quality assessment — signal quality metrics for speech-oriented pipelines
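The preprocessing steps listed above (DC removal, high-pass filtering, normalization) can be sketched in plain Rust. This is a minimal, std-only illustration and not the crate's implementation. The function names, the first-order RC high-pass design, the 80 Hz cutoff and the 0.9 peak target are all assumptions. Spectral noise reduction is left out because it needs an FFT-based noise estimate.

```rust
use std::f32::consts::PI;

/// DC removal: subtract the mean so the waveform is centred on zero.
fn remove_dc(samples: &mut [f32]) {
    if samples.is_empty() {
        return;
    }
    let mean = samples.iter().sum::<f32>() / samples.len() as f32;
    for s in samples.iter_mut() {
        *s -= mean;
    }
}

/// First-order RC high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1]).
fn high_pass(samples: &mut [f32], cutoff_hz: f32, sample_rate: f32) {
    let rc = 1.0 / (2.0 * PI * cutoff_hz);
    let dt = 1.0 / sample_rate;
    let a = rc / (rc + dt);
    let (mut prev_x, mut prev_y) = (0.0f32, 0.0f32);
    for s in samples.iter_mut() {
        let x = *s;
        let y = a * (prev_y + x - prev_x);
        prev_x = x;
        prev_y = y;
        *s = y;
    }
}

/// Peak normalization: scale so the largest absolute sample equals `target_peak`.
fn normalize_peak(samples: &mut [f32], target_peak: f32) {
    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    if peak > 0.0 {
        let gain = target_peak / peak;
        for s in samples.iter_mut() {
            *s *= gain;
        }
    }
}

/// Hypothetical chain: DC removal, an assumed 80 Hz high-pass, then normalization to 0.9.
fn preprocess(samples: &mut [f32], sample_rate: f32) {
    remove_dc(samples);
    high_pass(samples, 80.0, sample_rate);
    normalize_peak(samples, 0.9);
}
```

Removing the mean first matters for the one-pole filter: a large offset would otherwise produce a decaying transient at the start of the output.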
Quick start
```text
Detected 1 speech segment(s):
Segment 1: 0.290s — 1.540s (confidence: 1.00, energy: 0.0362)
```
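Segment timestamps like these come from collapsing per-frame speech decisions into contiguous runs and converting frame indices to seconds. The sketch below is illustrative only: `Segment`, `flags_to_segments` and the fixed hop between frames are hypothetical, and `speech-prep` may compute its segments differently.

```rust
/// Hypothetical segment type: a contiguous run of speech frames, in seconds.
#[derive(Debug, PartialEq)]
struct Segment {
    start_s: f64,
    end_s: f64,
}

/// Collapse per-frame speech flags into segments.
/// `hop` is the number of samples between frame starts (assumed constant).
fn flags_to_segments(flags: &[bool], hop: usize, sample_rate: u32) -> Vec<Segment> {
    let to_s = |frame: usize| (frame * hop) as f64 / sample_rate as f64;
    let mut segments = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &speech) in flags.iter().enumerate() {
        match (speech, start) {
            // A speech run begins.
            (true, None) => start = Some(i),
            // A speech run ends at the first non-speech frame.
            (false, Some(s)) => {
                segments.push(Segment { start_s: to_s(s), end_s: to_s(i) });
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that lasts until the final frame.
    if let Some(s) = start {
        segments.push(Segment { start_s: to_s(s), end_s: to_s(flags.len()) });
    }
    segments
}
```

As a worked example, at 16 kHz a start time of 0.290 s is sample 4,640 (0.290 × 16,000); with an assumed 160-sample (10 ms) hop, that is frame 29.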
Usage
```rust
use Arc;
use ;
```
Pipeline
```text
Raw audio bytes
    │
    ▼
Format detection ─→ Decoding ─→ Resampling ─→ Channel mixing
(format.rs)         (WAV)       (16kHz)       (mono)
    │
    ▼
Preprocessing ─→ VAD ─→ Chunking
(preprocessing/) (vad/) (chunker/)
    │
    ▼
Processed audio chunks with speech metadata
```
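The resampling and channel-mixing stages in the first row of the pipeline can be approximated as follows. This sketch assumes interleaved `f32` samples and uses linear interpolation, which is simple but aliases when downsampling without a preceding low-pass filter. `mix_to_mono` and `resample_linear` are hypothetical names; the crate's actual converter is not described here.

```rust
/// Average interleaved multi-channel samples down to mono.
fn mix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample with linear interpolation between neighbouring input samples.
/// No anti-aliasing filter is applied; production resamplers typically use
/// windowed-sinc or polyphase filters instead.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let ratio = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            // Position of output sample i on the input time axis.
            let pos = i as f64 * ratio;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // Hold the last sample at the right edge.
            let b = *input.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac
        })
        .collect()
}
```

Both operations are linear, so resampling each channel and then mixing gives the same result as mixing first; mixing first simply leaves the resampler fewer samples to process.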
Modules
| Module | What it does |
|---|---|
| `vad` | Voice activity detection with energy + spectral flux |
| `converter` | WAV decoding, resampling, and channel mixing to 16kHz mono PCM |
| `format` | Audio format detection for WAV, MP3, FLAC, Opus, WebM, and AAC |
| `preprocessing` | DC removal, high-pass filter, noise reduction, normalization |
| `chunker` | Speech-aligned segmentation with overlap handling |
| `pipeline` | End-to-end processing coordinator |
| `buffer` | Owned sample buffers with processing metadata |
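Format detection of this kind usually works by checking magic bytes at the start of the file. The sketch below is a minimal illustration that assumes a hypothetical `AudioFormat` enum and `detect_format` function; the crate's own `format` module API may differ. It recognises only ADTS-framed AAC; AAC inside an MP4/M4A container would need an `ftyp` box check instead.

```rust
/// Hypothetical format enum for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum AudioFormat {
    Wav,
    Mp3,
    Flac,
    Opus,
    WebM,
    Aac,
    Unknown,
}

/// True if `needle` occurs anywhere in `haystack`.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|w| w == needle)
}

/// Guess the format from the leading bytes ("magic numbers") of a file.
fn detect_format(bytes: &[u8]) -> AudioFormat {
    // Only a small header window is inspected.
    let head = &bytes[..bytes.len().min(64)];
    if head.len() >= 12 && &head[0..4] == b"RIFF" && &head[8..12] == b"WAVE" {
        AudioFormat::Wav
    } else if head.starts_with(b"fLaC") {
        AudioFormat::Flac
    } else if head.starts_with(b"OggS") && contains(head, b"OpusHead") {
        // Ogg page whose first packet is an Opus identification header.
        AudioFormat::Opus
    } else if head.starts_with(&[0x1A, 0x45, 0xDF, 0xA3]) && contains(head, b"webm") {
        // EBML header with DocType "webm" (plain Matroska uses "matroska").
        AudioFormat::WebM
    } else if head.starts_with(b"ID3") {
        // ID3v2 tag, which normally precedes MP3 frames.
        AudioFormat::Mp3
    } else if head.len() >= 2 && head[0] == 0xFF && (head[1] & 0xF6) == 0xF0 {
        // ADTS AAC: 12-bit sync word 0xFFF followed by layer bits "00".
        AudioFormat::Aac
    } else if head.len() >= 2 && head[0] == 0xFF && (head[1] & 0xE0) == 0xE0 && (head[1] & 0x06) != 0 {
        // MPEG audio frame: 11-bit sync word and a non-reserved layer field.
        AudioFormat::Mp3
    } else {
        AudioFormat::Unknown
    }
}
```

The ADTS check has to run before the generic MPEG frame check, because both start with `0xFF` and the MPEG sync pattern also matches ADTS headers.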
Configuration
```rust
use VadConfig;
let config = VadConfig ;
```
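A sketch of the dual-metric VAD described under Features makes concrete what adaptive thresholds mean in practice. Everything here is an assumption for illustration and not taken from `VadConfig`: the frame size, the smoothing factor `alpha`, the `ratio` multiplier, the OR rule for combining the two metrics, and seeding the noise floors from the first frame. A naive DFT stands in for an FFT to keep the example dependency-free.

```rust
use std::f32::consts::PI;

/// Short-time energy: mean squared amplitude of one frame.
fn frame_energy(frame: &[f32]) -> f32 {
    frame.iter().map(|s| s * s).sum::<f32>() / frame.len().max(1) as f32
}

/// Magnitude spectrum via a naive O(n^2) DFT; a real implementation would use an FFT.
fn magnitude_spectrum(frame: &[f32]) -> Vec<f32> {
    let n = frame.len() as f32;
    (0..frame.len() / 2 + 1)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (t, &x) in frame.iter().enumerate() {
                let angle = -2.0 * PI * (k * t) as f32 / n;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            (re * re + im * im).sqrt()
        })
        .collect()
}

/// Spectral flux: summed positive magnitude change between consecutive frames.
fn spectral_flux(prev: &[f32], cur: &[f32]) -> f32 {
    prev.iter().zip(cur).map(|(p, c)| (c - p).max(0.0)).sum()
}

/// Flags each non-overlapping frame as speech (true) or non-speech (false).
/// `alpha` smooths the noise floors (0..1); `ratio` is how far a metric must
/// rise above its floor for the frame to count as speech.
fn detect_speech(samples: &[f32], frame_len: usize, alpha: f32, ratio: f32) -> Vec<bool> {
    const MIN_THRESHOLD: f32 = 1e-6; // keeps thresholds non-zero over digital silence
    let mut flags = Vec::new();
    let mut prev_spec: Option<Vec<f32>> = None;
    let mut floors: Option<(f32, f32)> = None; // (energy floor, flux floor)

    for frame in samples.chunks_exact(frame_len) {
        let energy = frame_energy(frame);
        let spec = magnitude_spectrum(frame);
        let flux = prev_spec.as_deref().map_or(0.0, |p| spectral_flux(p, &spec));
        // Seed the floors from the first frame, assumed to contain no speech.
        let (e_floor, f_floor) = *floors.get_or_insert((energy, flux));

        let is_speech = energy > e_floor * ratio + MIN_THRESHOLD
            || flux > f_floor * ratio + MIN_THRESHOLD;
        if !is_speech {
            // Adapt the floors only on non-speech frames so speech does not inflate them.
            floors = Some((
                alpha * e_floor + (1.0 - alpha) * energy,
                alpha * f_floor + (1.0 - alpha) * flux,
            ));
        }
        flags.push(is_speech);
        prev_spec = Some(spec);
    }
    flags
}
```

Updating the floors only during non-speech keeps the thresholds tracking background noise rather than the speaker, which is what lets a fixed `ratio` work across quiet and noisy recordings.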
```rust
use ChunkerConfig;
let config = default; // 500ms target chunks
```
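Speech-aligned chunking with a 500 ms target and overlap can be sketched as below. It assumes speech segments arrive as half-open sample ranges and that chunks never cross a segment boundary. `Chunk`, `chunk_segments` and the 100 ms overlap used in the example are hypothetical and are not the crate's `ChunkerConfig` API.

```rust
/// Hypothetical chunk type: a half-open sample range [start, end).
#[derive(Debug, PartialEq)]
struct Chunk {
    start: usize,
    end: usize,
}

/// Split each speech segment into chunks of about `target_ms`, with consecutive
/// chunks inside a segment overlapping by `overlap_ms`. A short remainder at the
/// end of a segment becomes its own, shorter chunk.
fn chunk_segments(
    segments: &[(usize, usize)],
    sample_rate: usize,
    target_ms: usize,
    overlap_ms: usize,
) -> Vec<Chunk> {
    let target = sample_rate * target_ms / 1000;
    let overlap = sample_rate * overlap_ms / 1000;
    assert!(overlap < target, "overlap must be shorter than the chunk");
    let step = target - overlap;
    let mut chunks = Vec::new();
    for &(seg_start, seg_end) in segments {
        let mut start = seg_start;
        while start < seg_end {
            // Clamp the last chunk to the segment end instead of crossing it.
            let end = (start + target).min(seg_end);
            chunks.push(Chunk { start, end });
            if end == seg_end {
                break;
            }
            start += step;
        }
    }
    chunks
}
```

For a segment spanning samples 4,640 to 24,640 at 16 kHz, a 500 ms target (8,000 samples) and a 100 ms overlap (1,600 samples) give a 6,400-sample step and three chunks: 4,640 to 12,640, 11,040 to 19,040, and 17,440 to 24,640.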
License
MIT OR Apache-2.0