pub struct PhasedStream { /* private fields */ }
A wrapper around multiple phase-shifted OnlineStream states. The use case is latency reduction at the cost of additional compute load.
For example, a transducer with a chunk size of 320ms has a worst-case transcription latency of 320ms: it must be fed a full 320ms chunk of audio before it produces any results. If an utterance lies at the beginning of a chunk, you must wait until the rest of the chunk arrives before it can be transcribed.
In a PhasedStream with n_phase == 2, the worst-case latency is reduced to 160ms, though compute
utilization is approximately doubled.
This does not mean latency due to compute is doubled; if used correctly, that remains constant. Let Q be the amount of time it takes to transcribe a 320ms chunk: we can feed the transducer with 160ms chunks and expect processing to take Q as well. Instead of paying Q every 320ms we now pay it every 160ms.
Likewise, with n_phase == 3, we could feed 106.7ms chunks and expect to pay Q every 106.7ms. More
generally, the computational cost of transcribing a chunk remains constant while the chunk count in
a given time window scales linearly with the number of phases.
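The arithmetic above can be sketched as follows. This is only an illustration and not part of this crate's API: the function names are hypothetical, and it assumes the per-chunk cost Q is independent of the number of phases, as described above.

```rust
/// Worst-case wait in milliseconds before buffered audio can be transcribed,
/// assuming the phases are evenly spaced across one chunk.
/// (Hypothetical helper for illustration only.)
fn worst_case_latency_ms(chunk_ms: f64, n_phase: u32) -> f64 {
    assert!(n_phase > 0, "n_phase must be at least 1");
    chunk_ms / n_phase as f64
}

/// Fraction of wall-clock time spent on compute when a cost of `q_ms`
/// is paid once per phase interval instead of once per full chunk.
/// (Hypothetical helper for illustration only.)
fn compute_load(chunk_ms: f64, n_phase: u32, q_ms: f64) -> f64 {
    q_ms / worst_case_latency_ms(chunk_ms, n_phase)
}
```

With a 320ms chunk and an assumed Q of 32ms, `compute_load` gives roughly 0.1 for one phase and 0.2 for two, while `worst_case_latency_ms` drops from 320 to 160.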
For most zipformer transducers, the real-time factor (RTF) is favorable (Q is low) and the extra load can be an acceptable trade-off for the observed latency improvement.
Created by Model::phased_stream.
Implementations
impl PhasedStream
pub fn accept_waveform(&mut self, sample_rate: usize, samples: &[f32])
Accept input audio samples (normalized to the range (-1, 1)) and buffer the computed feature frames.
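If the audio source delivers 16-bit PCM, the samples need to be scaled before they are passed in. The helper below is a hypothetical sketch and not part of this crate; it assumes signed 16-bit input and maps it into [-1, 1) by dividing by 32768.

```rust
/// Convert signed 16-bit PCM samples to f32 values in [-1, 1).
/// (Hypothetical helper; the resulting slice could then be passed as
/// the `samples` argument of `accept_waveform`.)
fn pcm16_to_f32(pcm: &[i16]) -> Vec<f32> {
    pcm.iter().map(|&s| s as f32 / 32768.0).collect()
}
```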