speech-prep
Speech-focused audio preprocessing for Rust.
Features
- Voice Activity Detection — dual-metric (energy + spectral flux) with adaptive thresholds
- Multi-format decoding — WAV, MP3, FLAC, OGG, M4A, Opus → 16kHz mono PCM
- Preprocessing — DC removal, high-pass filter, spectral noise reduction, normalization
- Chunking — speech-aligned segmentation with configurable duration and overlap
- Quality assessment — signal metrics for downstream processing gates
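The preprocessing steps listed above (DC removal, high-pass filtering, normalization) can be illustrated with a minimal, std-only sketch. The function names `remove_dc`, `high_pass`, and `normalize_peak` are hypothetical and are not taken from the speech-prep API; the one-pole filter and peak normalization are assumed stand-ins for whatever the crate actually implements.

```rust
use std::f32::consts::PI;

/// Subtract the mean so the signal is centred on zero (DC removal).
pub fn remove_dc(samples: &mut [f32]) {
    if samples.is_empty() {
        return;
    }
    let mean = samples.iter().sum::<f32>() / samples.len() as f32;
    for s in samples.iter_mut() {
        *s -= mean;
    }
}

/// One-pole RC high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).
/// `cutoff_hz` and `sample_rate` are illustrative parameters.
pub fn high_pass(samples: &mut [f32], cutoff_hz: f32, sample_rate: f32) {
    let rc = 1.0 / (2.0 * PI * cutoff_hz);
    let dt = 1.0 / sample_rate;
    let alpha = rc / (rc + dt);
    let (mut prev_in, mut prev_out) = (0.0f32, 0.0f32);
    for s in samples.iter_mut() {
        let x = *s;
        let y = alpha * (prev_out + x - prev_in);
        prev_in = x;
        prev_out = y;
        *s = y;
    }
}

/// Scale the signal so its largest absolute sample equals `target_peak`.
/// An all-zero signal is left unchanged.
pub fn normalize_peak(samples: &mut [f32], target_peak: f32) {
    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    if peak > 0.0 {
        let gain = target_peak / peak;
        for s in samples.iter_mut() {
            *s *= gain;
        }
    }
}
```

Applying the three functions in that order to a mutable `f32` buffer mirrors the order in which the feature list names the steps; spectral noise reduction is omitted because it needs an FFT and a noise model.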
Quick start
Detected 1 speech segment(s):
Segment 1: 0.290s — 1.540s (confidence: 1.00, energy: 0.0362)
Usage
use std::sync::Arc;
use ;
let config = VadConfig::default();
let metrics: = new;
let detector = new?;
let segments = detector.detect?;
for seg in &segments
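To make the idea of dual-metric detection more concrete, here is a rough sketch of frame-level scoring with RMS energy and spectral flux. It does not use the speech-prep API: `rms`, `magnitude_spectrum`, `spectral_flux`, and `detect_frames` are hypothetical names, the naive DFT stands in for a real FFT, and the rule for combining the two metrics (energy above an adaptive noise floor, or flux above a fixed threshold) is an assumption rather than the crate's documented behaviour.

```rust
use std::f32::consts::PI;

/// Root-mean-square energy of one frame.
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
}

/// Magnitude spectrum (bins 0..n/2) via a naive O(n^2) DFT.
pub fn magnitude_spectrum(frame: &[f32]) -> Vec<f32> {
    let n = frame.len();
    (0..n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (i, &x) in frame.iter().enumerate() {
                // Reduce k*i modulo n to keep the phase argument small.
                let phase = -2.0 * PI * ((k * i) % n) as f32 / n as f32;
                re += x * phase.cos();
                im += x * phase.sin();
            }
            (re * re + im * im).sqrt()
        })
        .collect()
}

/// Spectral flux: sum of the positive magnitude changes between two spectra.
pub fn spectral_flux(prev: &[f32], cur: &[f32]) -> f32 {
    prev.iter().zip(cur).map(|(p, c)| (c - p).max(0.0)).sum()
}

/// Returns one speech/non-speech flag per complete frame.
pub fn detect_frames(samples: &[f32], frame_len: usize, flux_threshold: f32) -> Vec<bool> {
    assert!(frame_len > 0, "frame length must be non-zero");
    const MIN_FLOOR: f32 = 1e-4; // assumed lower bound for the noise estimate
    let mut noise_floor = MIN_FLOOR;
    let mut prev_spec: Option<Vec<f32>> = None;
    let mut flags = Vec::new();
    for frame in samples.chunks_exact(frame_len) {
        let energy = rms(frame);
        let spec = magnitude_spectrum(frame);
        let flux = prev_spec.as_ref().map_or(0.0, |p| spectral_flux(p, &spec));
        // Adaptive energy threshold: a fixed multiple of the tracked noise floor.
        let is_speech = energy > 3.0 * noise_floor || flux > flux_threshold;
        if !is_speech {
            // Track the noise floor only on frames judged to be non-speech.
            noise_floor = (0.95 * noise_floor + 0.05 * energy).max(MIN_FLOOR);
        }
        flags.push(is_speech);
        prev_spec = Some(spec);
    }
    flags
}
```

With 16 kHz audio, a `frame_len` of 160 corresponds to 10 ms frames. Consecutive `true` flags could then be merged into segments with start and end times, similar in spirit to the segment listing shown under Quick start.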
Pipeline
Raw audio bytes
       │
       ▼
Format detection ─→ Decoding ─→ Resampling ─→ Channel mixing
(format.rs)         (decoder/)  (16kHz)       (mono)
       │
       ▼
Preprocessing ─→ VAD ─→ Chunking
(preprocessing/) (vad/) (chunker/)
       │
       ▼
Processed audio chunks with speech metadata
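The resampling and channel-mixing stages of the pipeline can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `downmix_to_mono` and `resample_linear` are hypothetical names, the input is assumed to be interleaved `f32` samples, and plain linear interpolation is used for brevity even though a production resampler would normally low-pass filter before downsampling to avoid aliasing.

```rust
/// Average interleaved channels into a single mono channel.
/// Trailing samples that do not form a complete frame are dropped.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample a mono signal by linear interpolation between neighbouring samples.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            // Fractional read position in the input signal.
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

For example, a 48 kHz stereo buffer could be passed through `downmix_to_mono(&buf, 2)` and then `resample_linear(&mono, 48_000, 16_000)` to reach the 16 kHz mono layout the pipeline targets.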
Modules
| Module | What it does |
|---|---|
| vad | Voice activity detection with energy + spectral flux |
| decoder | WAV/PCM decoding, sample rate conversion, channel mixing |
| converter | Unified format conversion pipeline |
| format | Audio format detection (6 formats) |
| preprocessing | DC removal, high-pass filter, noise reduction, normalization |
| chunker | Speech-aligned segmentation with overlap handling |
| pipeline | End-to-end processing coordinator |
| buffer | Audio buffer types with metadata |
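As an illustration of what format detection for the six listed formats might involve, the sketch below inspects well-known container signatures at the start of a byte buffer. The `AudioFormat` enum and `sniff_format` function are hypothetical and do not reflect the actual `format` module; the MP3 frame-sync and `ftyp` checks are heuristics that can also match related formats (for example other ISO base media files).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioFormat {
    Wav,
    Mp3,
    Flac,
    Ogg,
    Opus,
    M4a,
    Unknown,
}

/// Guess the container or codec from the leading bytes of a file.
pub fn sniff_format(bytes: &[u8]) -> AudioFormat {
    // RIFF container with a WAVE form type.
    if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WAVE" {
        return AudioFormat::Wav;
    }
    // Native FLAC stream marker.
    if bytes.starts_with(b"fLaC") {
        return AudioFormat::Flac;
    }
    // Ogg page capture pattern; Opus streams carry "OpusHead" in the first packet.
    if bytes.starts_with(b"OggS") {
        let head = &bytes[..bytes.len().min(64)];
        if head.windows(8).any(|w| w == b"OpusHead") {
            return AudioFormat::Opus;
        }
        return AudioFormat::Ogg;
    }
    // ISO base media file: a box of type "ftyp" starts at byte offset 4.
    if bytes.len() >= 8 && &bytes[4..8] == b"ftyp" {
        return AudioFormat::M4a;
    }
    // MP3: an ID3v2 tag, or an MPEG audio frame sync (11 set bits).
    let has_frame_sync = bytes.len() >= 2 && bytes[0] == 0xFF && (bytes[1] & 0xE0) == 0xE0;
    if bytes.starts_with(b"ID3") || has_frame_sync {
        return AudioFormat::Mp3;
    }
    AudioFormat::Unknown
}
```

Checking the more specific signatures first keeps the loose MP3 frame-sync test from claiming buffers that belong to another container.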
Configuration
use VadConfig;
let config = VadConfig ;
use ChunkerConfig;
let config = ChunkerConfig::default(); // 500 ms target chunks
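To show how a target chunk duration and an overlap translate into sample ranges, here is a small sketch. It covers only fixed-size chunking and ignores speech alignment; `ms_to_samples` and `chunk_ranges` are hypothetical helpers, not part of `ChunkerConfig` or the `chunker` module.

```rust
use std::ops::Range;

/// Convert a duration in milliseconds to a sample count.
pub fn ms_to_samples(ms: u32, sample_rate: u32) -> usize {
    (ms as u64 * sample_rate as u64 / 1000) as usize
}

/// Split `total` samples into ranges of `chunk_len` samples, where each range
/// starts `overlap` samples before the previous one ends. The final range is
/// shortened to fit the signal.
pub fn chunk_ranges(total: usize, chunk_len: usize, overlap: usize) -> Vec<Range<usize>> {
    assert!(
        chunk_len > 0 && overlap < chunk_len,
        "overlap must be shorter than the chunk length"
    );
    let hop = chunk_len - overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total {
        let end = (start + chunk_len).min(total);
        ranges.push(start..end);
        if end == total {
            break;
        }
        start += hop;
    }
    ranges
}
```

At 16 kHz, a 500 ms target corresponds to `ms_to_samples(500, 16_000)`, which is 8000 samples per chunk; a speech-aligned chunker would additionally move the boundaries toward pauses reported by the VAD.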
License
MIT OR Apache-2.0