Crate yamabiko_whisper

Expand description

Streaming speech recognition on top of whisper-rs, using the LocalAgreement-2 policy from Macháček et al. 2023.

whisper-rs is fully encapsulated: callers do not need to add it to their own Cargo.toml or import any of its types. Build a processor with OnlineAsrModel::create_processor or OnlineAsrProcessor::from_model_path, feed it 16 kHz mono f32 PCM via OnlineAsrProcessor::insert_audio_chunk, or let AsrPipeline handle downmixing, resampling, and chunking for a microphone/file/network source.

Accelerated whisper.cpp backends are exposed as Cargo features (cuda, vulkan, metal, coreml, hipblas, intel-sycl, openblas, openmp). Use BackendConfig when loading a model to select a GPU device or force CPU execution.

Structs§

AsrPipeline: High-level audio-to-ASR pipeline.
AudioInputConfig: Source audio format accepted by AsrPipeline.
BackendConfig: Runtime options for the compiled whisper.cpp backend.
LinearResampler: Streaming linear-interpolation resampler.
OnlineAsrConfig: Tunable parameters for a streaming ASR processor.
OnlineAsrModel: Loaded Whisper model that can create multiple streaming processors without reloading model weights from disk.
OnlineAsrProcessor: Streaming wrapper around whisper-rs implementing LocalAgreement-2.
ProcessOutput: Result of one streaming processing pass.
VadConfig: Tunable parameters for the integrated Silero VAD. Default matches whisper.cpp’s recommended Silero settings (250 ms minimum speech, 100 ms minimum silence, 0.5 probability threshold) plus a 2 s silence-to-reset window suitable for live microphone use.
VadModel: Loaded Silero VAD model that can be shared by multiple processors.
Word: One recognised word with absolute (offset-applied) timestamps in seconds.

Enums§

DecodingStrategy: Whisper decoding strategy used for each streaming pass.
Error: All failures surfaced by yamabiko_whisper.

Constants§

SAMPLE_RATE: Audio sample rate expected by OnlineAsrProcessor::insert_audio_chunk.

Traits§

AudioSample: Primitive audio sample types that can be normalized to f32 PCM.

Functions§

downmix_interleaved: Downmix interleaved source audio to mono f32 PCM.
install_log_hooks: Forward whisper.cpp / GGML / VAD logs to a log / tracing backend. Without calling this they are silently dropped, which is usually what you want; call this once at startup if you do want to see them. Thin wrapper around whisper_rs::install_logging_hooks so callers do not need to depend on whisper-rs directly.

Crate yamabiko_whisper

Crate yamabiko_whisper Copy item path

Structs§

Enums§

Constants§

Traits§

Functions§

Crate yamabiko_whisper