speech-prep

Speech-focused audio preprocessing for Rust.

Features

Voice Activity Detection — dual-metric (energy + spectral flux) with adaptive thresholds
Format handling — common-format detection plus WAV decoding to 16kHz mono PCM
Preprocessing — DC removal, high-pass filter, spectral noise reduction, normalization
Chunking — speech-aligned segmentation with configurable duration and overlap
Quality assessment — signal quality metrics for speech-oriented pipelines
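
To make the preprocessing stage more concrete, here is a minimal, dependency-free sketch of three of the steps named above: DC removal, a first-order high-pass filter, and peak normalization. It is not the crate's implementation. The function names (`remove_dc`, `high_pass`, `normalize_peak`) and the one-pole filter design are assumptions chosen for illustration, and spectral noise reduction is left out.

```rust
use std::f32::consts::PI;

/// Subtract the mean so the signal is centred on zero (hypothetical helper).
fn remove_dc(samples: &mut [f32]) {
    if samples.is_empty() {
        return;
    }
    let mean = samples.iter().sum::<f32>() / samples.len() as f32;
    for s in samples.iter_mut() {
        *s -= mean;
    }
}

/// First-order RC high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).
/// An assumed design; the crate may use a different filter.
fn high_pass(samples: &[f32], cutoff_hz: f32, sample_rate: f32) -> Vec<f32> {
    let rc = 1.0 / (2.0 * PI * cutoff_hz);
    let dt = 1.0 / sample_rate;
    let alpha = rc / (rc + dt);
    let mut out = Vec::with_capacity(samples.len());
    let (mut prev_x, mut prev_y) = (0.0f32, 0.0f32);
    for (i, &x) in samples.iter().enumerate() {
        // The first output sample passes through unchanged.
        let y = if i == 0 { x } else { alpha * (prev_y + x - prev_x) };
        out.push(y);
        prev_x = x;
        prev_y = y;
    }
    out
}

/// Scale the signal so its largest absolute sample equals `target_peak`.
fn normalize_peak(samples: &mut [f32], target_peak: f32) {
    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    if peak > 0.0 {
        let gain = target_peak / peak;
        for s in samples.iter_mut() {
            *s *= gain;
        }
    }
}
```

Running the steps in this order means the high-pass filter sees a zero-centred signal and normalization sees the filtered peak rather than the raw one.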

Quick start

cargo run --example vad_detect

Example output:

Detected 1 speech segment(s):
  Segment 1: 0.290s — 1.540s  (confidence: 1.00, energy: 0.0362)

Usage

use std::sync::Arc;
use speech_prep::{NoopVadMetricsCollector, VadConfig, VadDetector, VadMetricsCollector};

fn main() -> Result<(), speech_prep::Error> {
    let config = VadConfig::default();
    let metrics: Arc<dyn VadMetricsCollector> = Arc::new(NoopVadMetricsCollector);
    let detector = VadDetector::new(config, metrics)?;

    let audio_samples = vec![0.0; 16_000];
    let segments = detector.detect(&audio_samples)?;
    for seg in &segments {
        println!("{:.3}s — {:.3}s", seg.start_time.as_secs(), seg.end_time.as_secs());
    }

    Ok(())
}

Pipeline

Raw audio bytes
    │
    ▼
Format detection ─→ Decoding ─→ Resampling ─→ Channel mixing
  (format.rs)      (WAV)         (16kHz)       (mono)
    │
    ▼
Preprocessing ─→ VAD ─→ Chunking
  (preprocessing/)  (vad/)  (chunker/)
    │
    ▼
Processed audio chunks with speech metadata
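
As a rough illustration of the decoding and channel-mixing stages in the diagram, the sketch below converts little-endian 16-bit PCM bytes to `f32` samples and averages interleaved channels down to mono. Both function names are hypothetical and are not part of the crate's API. WAV header parsing and resampling to 16 kHz are omitted.

```rust
/// Convert little-endian signed 16-bit PCM bytes to f32 samples in [-1.0, 1.0).
/// A trailing odd byte, if any, is ignored.
fn pcm16_le_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0)
        .collect()
}

/// Average each interleaved frame of `channels` samples into one mono sample.
/// An incomplete trailing frame, if any, is ignored.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

Averaging keeps the mixed signal within the original amplitude range, whereas summing the channels could clip.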

Modules

Module	What it does
`vad`	Voice activity detection with energy + spectral flux
`converter`	WAV decoding, resampling, and channel mixing to the crate's standard format
`format`	Audio format detection for WAV, MP3, FLAC, Opus, WebM, and AAC
`preprocessing`	DC removal, high-pass filter, noise reduction, normalization
`chunker`	Speech-aligned segmentation with overlap handling
`pipeline`	End-to-end processing coordinator
`buffer`	Owned sample buffers with processing metadata
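
Format detection of the kind the `format` module performs is commonly done by inspecting the leading "magic" bytes of a file. The sketch below shows the idea for a few well-known container signatures. The `SniffedFormat` enum and `sniff_format` function are hypothetical names, not the crate's API, and the check is deliberately incomplete: it does not look inside Ogg or Matroska containers to identify the codec, and it does not scan for MP3 or AAC frame sync words.

```rust
#[derive(Debug, PartialEq)]
enum SniffedFormat {
    Wav,
    Flac,
    Ogg,        // container often used for Opus
    Matroska,   // EBML header, shared by WebM
    Mp3WithId3, // MP3 file that begins with an ID3v2 tag
    Unknown,
}

/// Guess a container format from the first bytes of a file (illustrative only).
fn sniff_format(bytes: &[u8]) -> SniffedFormat {
    if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WAVE" {
        SniffedFormat::Wav
    } else if bytes.starts_with(b"fLaC") {
        SniffedFormat::Flac
    } else if bytes.starts_with(b"OggS") {
        SniffedFormat::Ogg
    } else if bytes.starts_with(&[0x1A, 0x45, 0xDF, 0xA3]) {
        SniffedFormat::Matroska
    } else if bytes.starts_with(b"ID3") {
        SniffedFormat::Mp3WithId3
    } else {
        SniffedFormat::Unknown
    }
}
```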

Configuration

use speech_prep::VadConfig;

let config = VadConfig {
    base_threshold: 0.02,      // energy threshold for speech detection
    energy_weight: 0.6,        // weight of energy vs spectral flux
    ..VadConfig::default()
};
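
The two fields above suggest that the detector blends an energy metric and a spectral flux metric into one score and compares it against a threshold. The sketch below shows one plausible way to compute such a score for a single frame. It is an assumption about the general technique, not the crate's actual algorithm: all function names are hypothetical, the adaptive part of the threshold is omitted, and a naive DFT stands in for the FFT a real implementation would use.

```rust
use std::f32::consts::PI;

/// Mean squared amplitude of one frame.
fn frame_energy(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32
}

/// Magnitude spectrum via a naive O(n^2) DFT, scaled by 1/n.
/// Returns bins 0..=n/2.
fn magnitude_spectrum(frame: &[f32]) -> Vec<f32> {
    let n = frame.len();
    if n == 0 {
        return Vec::new();
    }
    (0..=n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (t, &x) in frame.iter().enumerate() {
                // Reduce the phase index modulo n to keep the angle small.
                let angle = -2.0 * PI * ((k * t) % n) as f32 / n as f32;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            (re * re + im * im).sqrt() / n as f32
        })
        .collect()
}

/// Spectral flux: sum of the positive magnitude increases between two frames.
fn spectral_flux(prev: &[f32], curr: &[f32]) -> f32 {
    prev.iter().zip(curr).map(|(p, c)| (c - p).max(0.0)).sum()
}

/// Weighted blend of the two metrics compared against a fixed threshold.
fn is_speech(energy: f32, flux: f32, energy_weight: f32, threshold: f32) -> bool {
    let score = energy_weight * energy + (1.0 - energy_weight) * flux;
    score > threshold
}
```

With `energy_weight` at 0.6, the energy term contributes 60% of the score and spectral flux the remaining 40%, so raising the weight makes the decision more sensitive to loudness and less to spectral change.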

use speech_prep::ChunkerConfig;

let config = ChunkerConfig::default(); // 500ms target chunks
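
At 16 kHz, a 500 ms target chunk is 8,000 samples. The sketch below shows how a target duration and an overlap translate into sample ranges. It produces fixed-size windows only and makes no attempt at the speech-aligned boundaries the crate's chunker provides. `ms_to_samples` and `chunk_bounds` are hypothetical helpers, and the 50 ms overlap used in the usage note is an arbitrary illustration value.

```rust
/// Convert a duration in milliseconds to a sample count.
fn ms_to_samples(ms: usize, sample_rate: usize) -> usize {
    ms * sample_rate / 1000
}

/// Split `total` samples into half-open ranges of `chunk_len` samples,
/// each overlapping the previous one by `overlap` samples.
/// The final range is shortened to fit.
fn chunk_bounds(total: usize, chunk_len: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(
        chunk_len > 0 && overlap < chunk_len,
        "overlap must be smaller than the chunk length"
    );
    let step = chunk_len - overlap;
    let mut bounds = Vec::new();
    let mut start = 0;
    while start < total {
        let end = (start + chunk_len).min(total);
        bounds.push((start, end));
        if end == total {
            break;
        }
        start += step;
    }
    bounds
}
```

For one second of 16 kHz audio, `chunk_bounds(16_000, ms_to_samples(500, 16_000), ms_to_samples(50, 16_000))` yields three ranges: `(0, 8000)`, `(7200, 15200)`, and `(14400, 16000)`.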

License

MIT OR Apache-2.0

speech-prep 0.1.4
