Module audio_processor

Audio preprocessing for Whisper ASR.

Load audio files → decode → resample to 16kHz mono → f32 PCM samples. Supports WAV natively; M4A/MP3/FLAC/OGG via ffmpeg auto-conversion.
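As an illustration of the "mono f32 PCM" end of this pipeline, here is a minimal sketch of one step: turning interleaved 16-bit PCM into mono f32 samples. The helper name `downmix_to_mono_f32` and the channel-averaging strategy are assumptions made for this example, not the module's confirmed implementation. Decoding and resampling are not shown.

```rust
/// Hypothetical helper (not part of the documented API).
/// Converts interleaved 16-bit PCM frames into mono f32 samples
/// by scaling each sample to [-1.0, 1.0) and averaging the channels.
fn downmix_to_mono_f32(interleaved: &[i16], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        // One frame holds one sample per channel; a trailing partial frame is dropped.
        .chunks_exact(channels)
        .map(|frame| {
            let sum: f32 = frame.iter().map(|&s| s as f32 / 32768.0).sum();
            sum / channels as f32
        })
        .collect()
}
```

Averaging the channels is one common way to downmix. A real decoder may weight channels differently or read the sample format from the file header.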

Constants§

CHUNK_SAMPLES
Whisper processes 30-second chunks; at 16kHz that is 480,000 samples.
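The value follows directly from sample rate times chunk duration. The sketch below spells out that arithmetic. `SAMPLE_RATE_HZ`, `CHUNK_SECONDS`, and `chunk_count` are hypothetical names introduced here, and the `usize` type of the constant is an assumption.

```rust
// Hypothetical constants used to derive the documented value.
const SAMPLE_RATE_HZ: usize = 16_000;
const CHUNK_SECONDS: usize = 30;

// 16,000 samples/second * 30 seconds = 480,000 samples.
const CHUNK_SAMPLES: usize = SAMPLE_RATE_HZ * CHUNK_SECONDS;

/// Hypothetical helper: number of chunks needed to cover `total_samples`,
/// counting a final partial chunk as one chunk.
fn chunk_count(total_samples: usize) -> usize {
    (total_samples + CHUNK_SAMPLES - 1) / CHUNK_SAMPLES
}
```

For example, one minute of 16kHz audio (960,000 samples) would span two chunks under this definition.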

Functions§

chunk_pcm
Split PCM samples into 30-second chunks for Whisper processing.
load_audio
Load audio file and return 16kHz mono f32 PCM samples.
load_audio_at_rate
Load audio file and return mono f32 PCM samples at a configurable sample rate.
load_audio_bytes
Load audio from raw bytes. Tries WAV first; if that fails and bytes look non-WAV, tries ffmpeg.
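The signatures of these functions are not shown on this page, so the following is only a sketch of two of the ideas described above: splitting samples into 30-second chunks, and checking whether raw bytes look like WAV before falling back to another decoder. `chunk_pcm_sketch` and `looks_like_wav` are hypothetical names. Leaving the last chunk shorter than 480,000 samples (rather than zero-padding it) is an assumption, as is using the RIFF/WAVE header as the detection rule.

```rust
// Matches the documented chunk size: 30 seconds at 16kHz.
const CHUNK_SAMPLES: usize = 480_000;

/// Hypothetical stand-in for `chunk_pcm`: splits samples into consecutive
/// chunks of at most CHUNK_SAMPLES. The final chunk may be shorter.
fn chunk_pcm_sketch(samples: &[f32]) -> Vec<Vec<f32>> {
    samples
        .chunks(CHUNK_SAMPLES)
        .map(|chunk| chunk.to_vec())
        .collect()
}

/// Hypothetical check for a WAV container: a RIFF file starts with the
/// ASCII tag "RIFF", a 4-byte size field, then the form type "WAVE".
fn looks_like_wav(bytes: &[u8]) -> bool {
    bytes.len() >= 12 && &bytes[..4] == b"RIFF" && &bytes[8..12] == b"WAVE"
}
```

A caller in the spirit of `load_audio_bytes` might try its WAV parser first and hand the bytes to ffmpeg only when a check like `looks_like_wav` returns false. The actual fallback logic in this module may differ.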