Audio preprocessing for Whisper ASR.
Load audio files → decode → resample to 16 kHz mono → f32 PCM samples. WAV is supported natively; M4A/MP3/FLAC/OGG are converted automatically via ffmpeg.
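The decode → mono → resample steps can be illustrated with a minimal sketch. The helper names `downmix_to_mono_f32` and `resample_linear` are hypothetical and are not part of this module. The sketch assumes 16-bit interleaved input and uses naive linear interpolation with no anti-aliasing filter; the module's actual resampler may work differently.

```rust
/// Hypothetical helper: convert interleaved 16-bit PCM to mono f32
/// in [-1.0, 1.0) by averaging the channels of each frame.
pub fn downmix_to_mono_f32(interleaved: &[i16], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels) // a trailing partial frame is dropped
        .map(|frame| {
            let sum: f32 = frame.iter().map(|&s| s as f32 / 32768.0).sum();
            sum / channels as f32
        })
        .collect()
}

/// Hypothetical helper: resample mono samples by linear interpolation.
/// No low-pass filtering is applied, so downsampling may alias.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    assert!(from_hz > 0 && to_hz > 0, "sample rates must be non-zero");
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            // Clamp indices so the final output samples stay in bounds.
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

For example, one second of 48 kHz stereo audio passed through both helpers with a target of 16,000 Hz yields 16,000 mono samples.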
Constants
- CHUNK_SAMPLES - Whisper processes audio in 30-second chunks. At 16 kHz, one chunk is 480,000 samples.
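The arithmetic behind the constant, as a short sketch. `SAMPLE_RATE_HZ`, `CHUNK_SECONDS`, and `chunk_count` are hypothetical names introduced for illustration, and the `usize` type of `CHUNK_SAMPLES` is an assumption; the page does not show the actual declaration.

```rust
/// Assumed target sample rate (16 kHz), taken from the module description.
pub const SAMPLE_RATE_HZ: usize = 16_000;
/// Assumed chunk duration in seconds, taken from the module description.
pub const CHUNK_SECONDS: usize = 30;
/// 16,000 samples/s * 30 s = 480,000 samples per chunk.
pub const CHUNK_SAMPLES: usize = SAMPLE_RATE_HZ * CHUNK_SECONDS;

/// Hypothetical helper: number of chunks needed to cover `n_samples`,
/// counting a short final chunk as one chunk.
pub fn chunk_count(n_samples: usize) -> usize {
    (n_samples + CHUNK_SAMPLES - 1) / CHUNK_SAMPLES
}
```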
Functions
- chunk_pcm - Split PCM samples into 30-second chunks for Whisper processing.
- load_audio - Load an audio file and return 16 kHz mono f32 PCM samples.
- load_audio_at_rate - Load an audio file and return mono f32 PCM samples at a configurable sample rate.
- load_audio_bytes - Load audio from raw bytes. Tries to parse the bytes as WAV first; if that fails and the bytes do not look like WAV, falls back to ffmpeg.
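Two of the behaviours listed above can be sketched without knowing the real signatures, which this page does not show. `split_into_chunks` and `looks_like_wav` are hypothetical stand-ins, not this module's functions. The sketch assumes the final chunk is left short rather than zero-padded, and that "looks like WAV" means the standard RIFF/WAVE container header is present.

```rust
/// Hypothetical stand-in for chunking: split samples into pieces of at most
/// `chunk_len` samples. The last piece may be shorter (assumption: no padding).
pub fn split_into_chunks(samples: &[f32], chunk_len: usize) -> Vec<Vec<f32>> {
    assert!(chunk_len > 0, "chunk length must be non-zero");
    samples.chunks(chunk_len).map(|c| c.to_vec()).collect()
}

/// Hypothetical WAV sniffing: a WAV file starts with the ASCII tag "RIFF",
/// a 4-byte size field, and then the ASCII tag "WAVE".
pub fn looks_like_wav(bytes: &[u8]) -> bool {
    bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WAVE"
}
```

With a chunk length of 480,000, a 70-second recording at 16 kHz (1,120,000 samples) splits into two full chunks and one 160,000-sample remainder.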