
Crate yamabiko_whisper


Streaming speech recognition on top of whisper-rs, using the LocalAgreement-2 policy from Macháček et al. 2023.
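The idea behind LocalAgreement-2 can be sketched as follows: a word is committed only once two consecutive hypotheses agree on it. This is a simplified illustration, not this crate's implementation. The type `LocalAgreement2` and its `update` method are hypothetical names, and the sketch assumes each hypothesis re-transcribes the audio from the same starting point, so committed words can be skipped by count rather than by timestamp.

```rust
/// Hypothetical illustration type; not part of yamabiko_whisper.
#[derive(Default)]
pub struct LocalAgreement2 {
    /// Words already confirmed and emitted.
    committed: Vec<String>,
    /// Unconfirmed tail of the previous hypothesis.
    previous: Vec<String>,
}

impl LocalAgreement2 {
    /// Takes the latest full hypothesis and returns the words that became
    /// confirmed because the previous hypothesis agreed on them.
    pub fn update(&mut self, hypothesis: &[&str]) -> Vec<String> {
        // Assumption: the hypothesis starts with the words committed so far.
        let tail: Vec<String> = hypothesis
            .iter()
            .skip(self.committed.len())
            .map(|w| w.to_string())
            .collect();

        // Longest common prefix of the previous and the current tail.
        let agreed = self
            .previous
            .iter()
            .zip(tail.iter())
            .take_while(|(a, b)| a == b)
            .count();

        let newly: Vec<String> = tail[..agreed].to_vec();
        self.committed.extend(newly.iter().cloned());
        // Whatever was not agreed on waits for the next hypothesis.
        self.previous = tail[agreed..].to_vec();
        newly
    }
}
```

The first pass never commits anything, because there is no earlier hypothesis to agree with. That delay is the price paid for stable output.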

whisper-rs is fully encapsulated: callers do not need to add it to their own Cargo.toml or import any of its types. Build a processor with OnlineAsrModel::create_processor or OnlineAsrProcessor::from_model_path, then feed it 16 kHz mono f32 PCM via OnlineAsrProcessor::insert_audio_chunk. Alternatively, let AsrPipeline handle downmixing, resampling, and chunking for a microphone, file, or network source.

Accelerated whisper.cpp backends are exposed as Cargo features (cuda, vulkan, metal, coreml, hipblas, intel-sycl, openblas, openmp). Use BackendConfig when loading a model to select a GPU device or force CPU execution.

Structs§

AsrPipeline
High-level audio-to-ASR pipeline.
AudioInputConfig
Source audio format accepted by AsrPipeline.
BackendConfig
Runtime options for the compiled whisper.cpp backend.
LinearResampler
Streaming linear-interpolation resampler.
OnlineAsrConfig
Tunable parameters for a streaming ASR processor.
OnlineAsrModel
Loaded Whisper model that can create multiple streaming processors without reloading model weights from disk.
OnlineAsrProcessor
Streaming wrapper around whisper-rs implementing LocalAgreement-2.
ProcessOutput
Result of one streaming processing pass.
VadConfig
Tunable parameters for the integrated Silero VAD. The default matches whisper.cpp’s recommended Silero settings (250 ms minimum speech, 100 ms minimum silence, 0.5 probability threshold) and adds a 2 s silence-to-reset window suited to live microphone use.
VadModel
Loaded Silero VAD model that can be shared by multiple processors.
Word
One recognised word with absolute (offset-applied) timestamps in seconds.
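LinearResampler above is described as a streaming linear-interpolation resampler. The technique itself can be sketched in a one-shot form. The function `resample_linear` below is a hypothetical helper, not this crate's API, and it processes a whole buffer at once rather than carrying state between chunks as a streaming resampler would.

```rust
/// Hypothetical one-shot helper; the crate's LinearResampler may differ.
/// Resamples mono audio from `from_hz` to `to_hz` by linear interpolation.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // Distance between output samples, measured in input samples.
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;

    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            // Clamp at the end of the buffer so the final samples hold the last value.
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

Linear interpolation is cheap and adds no latency, but it does not low-pass filter the signal, so downsampling by a large ratio can introduce aliasing.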

Enums§

DecodingStrategy
Whisper decoding strategy used for each streaming pass.
Error
All failures surfaced by yamabiko_whisper.

Constants§

SAMPLE_RATE
Audio sample rate expected by OnlineAsrProcessor::insert_audio_chunk.

Traits§

AudioSample
Primitive audio sample types that can be normalized to f32 PCM.
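As an illustration of what normalizing primitive samples to f32 PCM usually involves, here is a minimal sketch. The trait `ToF32Sample` and the function `normalize` are hypothetical stand-ins; the actual methods and implementing types of AudioSample are not shown on this page and may differ.

```rust
/// Hypothetical trait for illustration; not the crate's AudioSample.
pub trait ToF32Sample: Copy {
    /// Maps the sample into the nominal range [-1.0, 1.0].
    fn to_f32_sample(self) -> f32;
}

impl ToF32Sample for i16 {
    fn to_f32_sample(self) -> f32 {
        // Signed 16-bit PCM: divide by 2^15.
        self as f32 / 32768.0
    }
}

impl ToF32Sample for u8 {
    fn to_f32_sample(self) -> f32 {
        // Unsigned 8-bit PCM is centred on 128.
        (self as f32 - 128.0) / 128.0
    }
}

impl ToF32Sample for f32 {
    fn to_f32_sample(self) -> f32 {
        self
    }
}

/// Converts a buffer of any supported sample type to f32 PCM.
pub fn normalize<S: ToF32Sample>(samples: &[S]) -> Vec<f32> {
    samples.iter().map(|s| s.to_f32_sample()).collect()
}
```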

Functions§

downmix_interleaved
Downmix interleaved source audio to mono f32 PCM.
install_log_hooks
Forward whisper.cpp / GGML / VAD logs to a log / tracing backend. Unless this function is called, those logs are silently dropped, which is usually what you want; call it once at startup if you do want to see them. This is a thin wrapper around whisper_rs::install_logging_hooks, so callers do not need to depend on whisper-rs directly.
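For downmix_interleaved listed above, the usual approach to downmixing interleaved audio is to average the channels of each frame. The sketch below shows that technique under a different, hypothetical name (`downmix_to_mono`), because the real function's signature is not shown on this page. It assumes f32 input and drops any trailing partial frame.

```rust
/// Hypothetical helper; not the crate's downmix_interleaved.
/// Averages each frame of `channels` interleaved samples into one mono sample.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    if channels == 0 {
        return Vec::new();
    }
    interleaved
        // chunks_exact ignores a trailing incomplete frame.
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

Averaging rather than summing keeps the result within the input range, so a full-scale stereo signal cannot clip after downmixing.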