speech-prep 0.1.4

Speech-focused audio preprocessing — VAD, WAV decoding, format detection, noise reduction, chunking

speech-prep

Speech-focused audio preprocessing for Rust.

Features

  • Voice Activity Detection — dual-metric (energy + spectral flux) with adaptive thresholds
  • Format handling — common-format detection plus WAV decoding to 16kHz mono PCM
  • Preprocessing — DC removal, high-pass filter, spectral noise reduction, normalization
  • Chunking — speech-aligned segmentation with configurable duration and overlap
  • Quality assessment — signal quality metrics for speech-oriented pipelines
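
As a rough illustration of two of the preprocessing steps listed above, here is a minimal, standard-library-only sketch of DC removal and peak normalization. The function names `remove_dc` and `normalize_peak` and the `target_peak` parameter are hypothetical and are not part of the speech-prep API; the crate's own implementation may differ.

```rust
/// Subtracts the mean so the signal is centred on zero (DC removal).
/// Hypothetical helper, not a speech-prep function.
fn remove_dc(samples: &mut [f32]) {
    if samples.is_empty() {
        return;
    }
    let mean = samples.iter().sum::<f32>() / samples.len() as f32;
    for s in samples.iter_mut() {
        *s -= mean;
    }
}

/// Scales the signal so its largest absolute sample equals `target_peak`.
/// Silent input is left unchanged to avoid dividing by zero.
/// Hypothetical helper, not a speech-prep function.
fn normalize_peak(samples: &mut [f32], target_peak: f32) {
    let peak = samples.iter().fold(0.0_f32, |m, s| m.max(s.abs()));
    if peak > 0.0 {
        let gain = target_peak / peak;
        for s in samples.iter_mut() {
            *s *= gain;
        }
    }
}
```

Removing the DC offset before normalizing matters: an offset inflates the measured peak, so the gain applied to the actual speech would otherwise be too small.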

Quick start

$ cargo run --example vad_detect
Detected 1 speech segment(s):
  Segment 1: 0.290s — 1.540s  (confidence: 1.00, energy: 0.0362)

Usage

use std::sync::Arc;
use speech_prep::{NoopVadMetricsCollector, VadConfig, VadDetector, VadMetricsCollector};

fn main() -> Result<(), speech_prep::Error> {
    let config = VadConfig::default();
    let metrics: Arc<dyn VadMetricsCollector> = Arc::new(NoopVadMetricsCollector);
    let detector = VadDetector::new(config, metrics)?;

    // Placeholder input: one second of silence, assuming 16 kHz mono samples.
    let audio_samples = vec![0.0; 16_000];
    let segments = detector.detect(&audio_samples)?;
    for seg in &segments {
        println!("{:.3}s — {:.3}s", seg.start_time.as_secs(), seg.end_time.as_secs());
    }

    Ok(())
}
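
To give a feel for what a dual-metric detector measures, the sketch below computes frame energy and spectral flux and blends them with a weight. It is an assumption-laden illustration, not the crate's algorithm: the helper names, the naive DFT, and the linear blend in `combined_score` are all hypothetical, and the adaptive thresholding that speech-prep describes is not modelled.

```rust
use std::f32::consts::PI;

/// Root-mean-square energy of one frame.
fn rms_energy(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
}

/// Magnitude spectrum (bins 0..=n/2) via a naive O(n^2) DFT.
/// A real implementation would use an FFT; this keeps the sketch dependency-free.
fn magnitude_spectrum(frame: &[f32]) -> Vec<f32> {
    let n = frame.len();
    if n == 0 {
        return Vec::new();
    }
    (0..=n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0_f32, 0.0_f32);
            for (t, s) in frame.iter().enumerate() {
                let angle = -2.0 * PI * (k * t) as f32 / n as f32;
                re += s * angle.cos();
                im += s * angle.sin();
            }
            (re * re + im * im).sqrt() / n as f32
        })
        .collect()
}

/// Spectral flux: sum of positive magnitude increases between two frames.
fn spectral_flux(prev: &[f32], curr: &[f32]) -> f32 {
    prev.iter().zip(curr).map(|(p, c)| (c - p).max(0.0)).sum()
}

/// Hypothetical blend of the two metrics; `energy_weight` is assumed to lie in [0, 1].
fn combined_score(energy: f32, flux: f32, energy_weight: f32) -> f32 {
    energy_weight * energy + (1.0 - energy_weight) * flux
}
```

Energy alone tends to fire on steady background noise, while flux reacts to spectral change, which is one common motivation for combining the two.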

Pipeline

Raw audio bytes
    │
    ▼
Format detection ─→ Decoding ─→ Resampling ─→ Channel mixing
  (format.rs)      (WAV)         (16kHz)       (mono)
    │
    ▼
Preprocessing ─→ VAD ─→ Chunking
  (preprocessing/)  (vad/)  (chunker/)
    │
    ▼
Processed audio chunks with speech metadata
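
The channel-mixing and resampling stages in the diagram can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `downmix_to_mono` and `resample_linear` are hypothetical names, the input is assumed to be interleaved `f32` samples, and linear interpolation is used only because it is short. A production resampler would low-pass filter before reducing the sample rate, and the README does not say which method the crate uses.

```rust
/// Averages interleaved channels into a single mono stream.
/// Trailing samples that do not form a full frame are dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    if channels == 0 {
        return Vec::new();
    }
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resamples by linear interpolation between neighbouring input samples.
/// No anti-aliasing filter is applied, so this is a sketch, not a quality resampler.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

For example, stereo 48 kHz input would first pass through `downmix_to_mono(samples, 2)` and then `resample_linear(&mono, 48_000, 16_000)` to reach the 16 kHz mono layout shown in the diagram.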

Modules

Module         What it does
vad            Voice activity detection with energy + spectral flux
converter      WAV decoding, resampling, and channel mixing to the crate's standard format
format         Audio format detection for WAV, MP3, FLAC, Opus, WebM, and AAC
preprocessing  DC removal, high-pass filter, noise reduction, normalization
chunker        Speech-aligned segmentation with overlap handling
pipeline       End-to-end processing coordinator
buffer         Owned sample buffers with processing metadata
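
Format detection of the kind the `format` module performs is typically done by inspecting the first few bytes of the input. The sketch below shows that idea using well-known container signatures. The `AudioFormat` enum and `detect_format` function are hypothetical and are not the crate's API; the crate may check more bytes or handle more edge cases.

```rust
/// Hypothetical result type for this sketch (not the speech-prep type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum AudioFormat {
    Wav,
    Mp3,
    Flac,
    Ogg,     // container commonly used for Opus
    WebM,    // EBML header, shared with Matroska
    AacAdts, // AAC in an ADTS stream
    Unknown,
}

/// Guesses the format from leading magic bytes.
fn detect_format(bytes: &[u8]) -> AudioFormat {
    // RIFF container with a WAVE form type at offset 8.
    if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WAVE" {
        return AudioFormat::Wav;
    }
    if bytes.starts_with(b"fLaC") {
        return AudioFormat::Flac;
    }
    if bytes.starts_with(b"OggS") {
        return AudioFormat::Ogg;
    }
    if bytes.starts_with(&[0x1A, 0x45, 0xDF, 0xA3]) {
        return AudioFormat::WebM;
    }
    // MP3 files often begin with an ID3v2 tag.
    if bytes.starts_with(b"ID3") {
        return AudioFormat::Mp3;
    }
    if bytes.len() >= 2 && bytes[0] == 0xFF {
        // ADTS: 12 sync bits set and the 2 layer bits equal to 00.
        if (bytes[1] & 0xF6) == 0xF0 {
            return AudioFormat::AacAdts;
        }
        // MPEG audio frame: 11 sync bits set.
        if (bytes[1] & 0xE0) == 0xE0 {
            return AudioFormat::Mp3;
        }
    }
    AudioFormat::Unknown
}
```

The ADTS check runs before the MPEG frame check because an ADTS sync word also satisfies the looser 11-bit MPEG sync pattern.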

Configuration

use speech_prep::VadConfig;

let config = VadConfig {
    base_threshold: 0.02,      // energy threshold for speech detection
    energy_weight: 0.6,        // weight of energy vs spectral flux
    ..VadConfig::default()
};

use speech_prep::ChunkerConfig;

let config = ChunkerConfig::default(); // 500ms target chunks
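
To make the duration and overlap settings concrete, here is a small sketch of how overlapping chunk boundaries could be computed in samples. It is deliberately simpler than what the crate describes: `chunk_ranges` is a hypothetical helper that produces fixed-size windows and does not align boundaries to speech, and the 50 ms overlap used in the note after the code is an assumed value rather than a documented default.

```rust
/// Returns half-open `(start, end)` sample ranges of up to `chunk_len` samples,
/// where consecutive ranges share `overlap` samples.
/// Hypothetical helper, not the speech-prep chunker.
fn chunk_ranges(total: usize, chunk_len: usize, overlap: usize) -> Vec<(usize, usize)> {
    // The overlap must be smaller than the chunk, otherwise the window never advances.
    if chunk_len == 0 || overlap >= chunk_len {
        return Vec::new();
    }
    let hop = chunk_len - overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total {
        let end = (start + chunk_len).min(total);
        ranges.push((start, end));
        if end == total {
            break; // the final (possibly shorter) chunk reached the end of the audio
        }
        start += hop;
    }
    ranges
}
```

At 16 kHz, a 500 ms chunk is 8,000 samples and a 50 ms overlap is 800 samples, so `chunk_ranges(20_000, 8_000, 800)` advances by 7,200 samples per chunk and ends with a shorter final chunk.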

License

MIT OR Apache-2.0