Streaming speech recognition on top of whisper-rs, using the
LocalAgreement-2 policy from Macháček et al. 2023.
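As a rough illustration of the LocalAgreement-2 idea (a word is committed only once two consecutive hypotheses agree on it), here is a minimal, self-contained sketch. It is not this crate's implementation: the type LocalAgreement2 and its methods are hypothetical names, and the sketch assumes that every hypothesis is a plain word list covering the whole utterance from the start, with no timestamps or buffer trimming.

```rust
/// Hypothetical sketch of the LocalAgreement-2 commit rule.
/// Not the crate's API; names and simplifications are assumptions.
#[derive(Debug, Default)]
pub struct LocalAgreement2 {
    /// Words already committed and handed to the caller.
    committed: Vec<String>,
    /// Uncommitted tail of the previous hypothesis.
    previous_tail: Vec<String>,
}

impl LocalAgreement2 {
    pub fn new() -> Self {
        Self::default()
    }

    /// Feed the full hypothesis for the current pass and return the
    /// words that became committed because two passes agree on them.
    pub fn insert(&mut self, hypothesis: &[&str]) -> Vec<String> {
        // Skip the prefix that has already been committed.
        let start = self.committed.len().min(hypothesis.len());
        let tail = &hypothesis[start..];

        // Length of the longest common prefix of the previous
        // uncommitted tail and the current one.
        let agreed = self
            .previous_tail
            .iter()
            .zip(tail.iter())
            .take_while(|(a, b)| a.as_str() == **b)
            .count();

        let newly: Vec<String> = tail[..agreed].iter().map(|w| w.to_string()).collect();
        self.committed.extend(newly.iter().cloned());

        // Whatever was not agreed on waits for the next pass.
        self.previous_tail = tail[agreed..].iter().map(|w| w.to_string()).collect();
        newly
    }

    /// All words committed so far.
    pub fn committed(&self) -> &[String] {
        &self.committed
    }
}
```

With this sketch, the first pass never commits anything, and a word that changes between passes (for example a misrecognised trailing word) stays uncommitted until two consecutive passes produce the same word at that position.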
whisper-rs is fully encapsulated: callers do not need to add it
to their own Cargo.toml or import any of its types. Build a
processor with OnlineAsrModel::create_processor or
OnlineAsrProcessor::from_model_path, then either feed it
16 kHz mono f32 PCM via OnlineAsrProcessor::insert_audio_chunk
or let AsrPipeline handle downmixing, resampling, and chunking
for a microphone, file, or network source.
Accelerated whisper.cpp backends are exposed as Cargo features
(cuda, vulkan, metal, coreml, hipblas, intel-sycl,
openblas, openmp). Use BackendConfig when loading a model
to select a GPU device or force CPU execution.
Structs§
- AsrPipeline
- High-level audio-to-ASR pipeline.
- AudioInputConfig
- Source audio format accepted by AsrPipeline.
- BackendConfig
- Runtime options for the compiled whisper.cpp backend.
- LinearResampler
- Streaming linear-interpolation resampler.
- OnlineAsrConfig
- Tunable parameters for a streaming ASR processor.
- OnlineAsrModel
- Loaded Whisper model that can create multiple streaming processors without reloading model weights from disk.
- OnlineAsrProcessor
- Streaming wrapper around whisper-rs implementing LocalAgreement-2.
- ProcessOutput
- Result of one streaming processing pass.
- VadConfig
- Tunable parameters for the integrated Silero VAD.
Default matches whisper.cpp’s recommended Silero settings (250 ms minimum speech, 100 ms minimum silence, 0.5 probability threshold) plus a 2 s silence-to-reset window suitable for live microphone use.
- VadModel
- Loaded Silero VAD model that can be shared by multiple processors.
- Word
- One recognised word with absolute (offset-applied) timestamps in seconds.
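To make the idea of a streaming linear-interpolation resampler concrete, the following is a minimal sketch of one way such a component could work. It is not the crate's LinearResampler: the type LinearResamplerSketch, its constructor, and its process method are hypothetical, and the sketch applies no anti-aliasing filter. The only state carried between chunks is the last input sample and a fractional read position.

```rust
/// Hypothetical streaming resampler using linear interpolation.
/// Not the crate's LinearResampler; all names are assumptions.
pub struct LinearResamplerSketch {
    /// Input samples advanced per output sample (source / target rate).
    step: f64,
    /// Fractional read position relative to the start of the next buffer.
    pos: f64,
    /// Last input sample of the previous chunk, kept for interpolation.
    last: Option<f32>,
}

impl LinearResamplerSketch {
    pub fn new(source_rate: u32, target_rate: u32) -> Self {
        assert!(source_rate > 0 && target_rate > 0, "rates must be non-zero");
        Self {
            step: source_rate as f64 / target_rate as f64,
            pos: 0.0,
            last: None,
        }
    }

    /// Resample one chunk of mono samples; state carries over to the next call.
    pub fn process(&mut self, chunk: &[f32]) -> Vec<f32> {
        // Working buffer: carried-over sample (if any) followed by the chunk.
        let mut buf = Vec::with_capacity(chunk.len() + 1);
        buf.extend(self.last);
        buf.extend_from_slice(chunk);

        let mut out = Vec::new();
        if buf.len() < 2 {
            // Not enough samples to interpolate yet; remember what we have.
            self.last = buf.last().copied();
            return out;
        }

        // Emit outputs while both neighbours of the read position exist.
        while (self.pos as usize) + 1 < buf.len() {
            let i = self.pos as usize;
            let frac = (self.pos - i as f64) as f32;
            out.push(buf[i] + (buf[i + 1] - buf[i]) * frac);
            self.pos += self.step;
        }

        // Re-base the position onto the last sample, which is carried over.
        self.pos -= (buf.len() - 1) as f64;
        self.last = buf.last().copied();
        out
    }
}
```

Because the last sample and the fractional position are carried over, feeding the same audio in one call or in several smaller chunks yields the same output in this sketch. For non-trivial rate ratios the accumulated f64 position can drift slightly over very long streams; a production resampler would typically track the position with integers.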
Enums§
- DecodingStrategy
- Whisper decoding strategy used for each streaming pass.
- Error
- All failures surfaced by yamabiko_whisper.
Constants§
- SAMPLE_RATE
- Audio sample rate expected by OnlineAsrProcessor::insert_audio_chunk.
Traits§
- AudioSample
- Primitive audio sample types that can be normalized to f32 PCM.
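For readers unfamiliar with sample normalization, the sketch below shows one common way to map primitive sample types onto f32 PCM in the range [-1.0, 1.0). It is an illustration only: the trait SampleToF32 and the function normalize are hypothetical names, and the set of types and scaling constants shown here are assumptions, not a description of which types the crate's AudioSample trait actually covers.

```rust
/// Hypothetical trait for converting primitive samples to f32 PCM.
/// Not the crate's AudioSample trait; names and scaling are assumptions.
pub trait SampleToF32: Copy {
    fn to_f32(self) -> f32;
}

impl SampleToF32 for f32 {
    /// f32 PCM is assumed to be normalized already.
    fn to_f32(self) -> f32 {
        self
    }
}

impl SampleToF32 for i16 {
    /// Signed 16-bit PCM: divide by 2^15 so i16::MIN maps to -1.0.
    fn to_f32(self) -> f32 {
        self as f32 / 32768.0
    }
}

impl SampleToF32 for u8 {
    /// Unsigned 8-bit PCM is centred on 128.
    fn to_f32(self) -> f32 {
        (self as f32 - 128.0) / 128.0
    }
}

/// Convert a slice of any supported sample type to f32 PCM.
pub fn normalize<S: SampleToF32>(samples: &[S]) -> Vec<f32> {
    samples.iter().map(|s| s.to_f32()).collect()
}
```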
Functions§
- downmix_interleaved
- Downmix interleaved source audio to mono f32 PCM.
- install_log_hooks
- Forward whisper.cpp / GGML / VAD logs to a log/tracing backend. Without calling this they are silently dropped, which is usually what you want; call this once at startup if you do want to see them. Thin wrapper around whisper_rs::install_logging_hooks so callers do not need to depend on whisper-rs directly.