Audio chunking aligned to VAD segments.
This module segments standardized PCM audio into speech-aligned chunks with timing and overlap metadata.
§Architecture
The chunker follows a streaming-first design:
- Accept VAD boundaries (SpeechChunk) + raw PCM samples
- Generate fixed-duration chunks (default 500ms) aligned to speech boundaries
- Attach temporal metadata (AudioTimestamp) for deterministic testing
- Attach quality metrics such as energy and speech ratio
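The design above can be illustrated with a minimal sketch. It is hypothetical and not the crate's implementation. It works in sample indices rather than `AudioTimestamp` and treats VAD segments as non-overlapping `[start, end)` ranges. It cuts fixed-size windows between consecutive VAD boundaries, so no chunk straddles a speech edge. It also interprets "speech ratio" as the fraction of a chunk covered by VAD segments. `Segment`, `SketchChunk`, `chunk_aligned`, and `make_chunk` are illustrative names only.

```rust
/// Hypothetical VAD segment in sample indices: [start, end).
#[derive(Debug, Clone, Copy)]
struct Segment {
    start: usize,
    end: usize,
}

/// Hypothetical chunk record; NOT the crate's `ProcessedChunk`.
#[derive(Debug, Clone, PartialEq)]
struct SketchChunk {
    start: usize,      // first sample (inclusive)
    end: usize,        // last sample (exclusive)
    rms_energy: f32,   // root-mean-square amplitude of the chunk
    speech_ratio: f32, // fraction of the chunk covered by VAD segments
}

/// Split `samples` into windows of `chunk_ms`, restarting the window grid at
/// every VAD boundary so chunk edges line up with speech edges.
fn chunk_aligned(
    samples: &[f32],
    sample_rate: usize,
    chunk_ms: usize,
    segments: &[Segment],
) -> Vec<SketchChunk> {
    let chunk_len = (sample_rate * chunk_ms / 1000).max(1);
    // Cut points: input edges plus every VAD boundary, clamped, sorted, deduplicated.
    let mut cuts = vec![0, samples.len()];
    for s in segments {
        cuts.push(s.start.min(samples.len()));
        cuts.push(s.end.min(samples.len()));
    }
    cuts.sort_unstable();
    cuts.dedup();

    let mut chunks = Vec::new();
    for pair in cuts.windows(2) {
        let (mut pos, stop) = (pair[0], pair[1]);
        // Fixed-size windows inside each interval; the last one may be shorter.
        while pos < stop {
            let end = (pos + chunk_len).min(stop);
            chunks.push(make_chunk(samples, pos, end, segments));
            pos = end;
        }
    }
    chunks
}

/// Compute quality metrics for samples[start..end] (assumes end > start).
fn make_chunk(samples: &[f32], start: usize, end: usize, segments: &[Segment]) -> SketchChunk {
    let window = &samples[start..end];
    let rms_energy = (window.iter().map(|x| x * x).sum::<f32>() / window.len() as f32).sqrt();
    // Overlap with each segment; summing is only correct for non-overlapping segments.
    let speech: usize = segments
        .iter()
        .map(|s| end.min(s.end).saturating_sub(start.max(s.start)))
        .sum();
    SketchChunk {
        start,
        end,
        rms_energy,
        speech_ratio: speech as f32 / (end - start) as f32,
    }
}
```

With 16000 samples at 16 kHz, a 500ms window, and one segment spanning the whole second, this sketch yields two 8000-sample chunks. Restarting the grid at each boundary trades a possibly shorter final window per interval for chunk edges that coincide exactly with VAD edges.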
§Performance Contracts
- Latency: <15ms total processing per chunk
- Alignment: chunk boundaries fall within ±20ms of VAD boundaries
- Coverage: Chunks cover 100% of input duration (no gaps)
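One way to check these contracts in a test is sketched below. These are hypothetical helpers, not part of the crate. Chunks are plain `(start, end)` sample ranges (end exclusive), sorted by start. Overlapping ranges are accepted, since the chunker attaches overlap between neighbours. VAD boundaries are given as sample indices, and the ±20ms tolerance is converted to samples with integer division. The names `check_coverage`, `check_alignment`, and `within_budget` are illustrative.

```rust
use std::time::{Duration, Instant};

/// Coverage: every sample in `0..total_len` lies inside some chunk (no gaps).
/// `chunks` are hypothetical `(start, end)` ranges, end exclusive, sorted by start.
fn check_coverage(chunks: &[(usize, usize)], total_len: usize) -> Result<(), String> {
    let mut covered_to = 0; // all samples before this index are covered
    for &(start, end) in chunks {
        if start > covered_to {
            return Err(format!("gap: samples {covered_to}..{start} are not covered"));
        }
        covered_to = covered_to.max(end); // overlapping chunks are allowed
    }
    if covered_to < total_len {
        return Err(format!("gap: samples {covered_to}..{total_len} are not covered"));
    }
    Ok(())
}

/// Alignment: each VAD boundary has a chunk edge within `tol_ms` milliseconds.
fn check_alignment(
    chunks: &[(usize, usize)],
    vad_boundaries: &[usize],
    sample_rate: usize,
    tol_ms: usize,
) -> Result<(), String> {
    let tol = sample_rate * tol_ms / 1000; // e.g. 20 ms at 16 kHz = 320 samples
    for &b in vad_boundaries {
        let nearest = chunks
            .iter()
            .flat_map(|&(s, e)| [s, e])
            .map(|edge| edge.abs_diff(b))
            .min();
        match nearest {
            Some(d) if d <= tol => {}
            _ => return Err(format!("VAD boundary at sample {b} has no chunk edge within {tol_ms} ms")),
        }
    }
    Ok(())
}

/// Latency: run `f` once and report whether it finished within `budget`.
fn within_budget<T>(budget: Duration, f: impl FnOnce() -> T) -> (T, bool) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed() <= budget)
}
```

A single timed run is noisy. A latency check meant to back the 15ms figure would more likely aggregate many timings, for example by comparing a high percentile against the budget.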
§Example
```rust
use speech_prep::{Chunker, ChunkerConfig, SpeechChunk};
use speech_prep::time::{AudioDuration, AudioTimestamp};

let config = ChunkerConfig::default(); // 500ms chunks
let chunker = Chunker::new(config);

let audio: Vec<f32> = vec![0.0; 16000]; // 1 second @ 16kHz
let vad_segments = vec![SpeechChunk {
    start_time: AudioTimestamp::EPOCH,
    end_time: AudioTimestamp::EPOCH.add_duration(AudioDuration::from_secs(1)),
    confidence: 0.9,
    avg_energy: 0.5,
    frame_count: 50,
}];

// `?` requires an enclosing function that returns a compatible Result
let chunks = chunker.chunk(&audio, 16000, &vad_segments)?;
assert_eq!(chunks.len(), 2); // Two 500ms chunks from 1s of speech

// Overlaps are automatically added between chunks
assert!(chunks[0].overlap_next.is_some()); // First chunk carries overlap for the next one
assert!(chunks[1].overlap_prev.is_some()); // Second chunk carries overlap from the previous one
```

Structs§
- Chunker - Audio chunker for segmenting streams into processing units.
- ChunkerConfig - Configuration for the audio chunker.
- ProcessedChunk - A processed audio chunk with temporal and quality metadata.
Enums§
- ChunkBoundary - Type of boundary at chunk edges.