§OxiWhisper
Pure Rust Whisper speech-to-text inference engine with zero C/C++ dependencies.
OxiWhisper loads GGML-format Whisper models and transcribes audio to text, supporting quantized inference (Q4_0, Q5_0, Q8_0), streaming, beam search, word-level timestamps, and SIMD-accelerated kernels (AVX2, NEON, WASM simd128).
§Quick Start
use oxiwhisper::{WhisperModel, TranscribeOptions};
use std::path::Path;
let model = WhisperModel::from_file(Path::new("ggml-tiny.bin"))?;
let audio = oxiwhisper::audio::load_wav(Path::new("audio.wav"))?;
let text = model.transcribe(&audio, &TranscribeOptions::default())?;
println!("{text}");
Re-exports§
pub use types::*;
Modules§
- attention
- Multi-head attention primitives shared by encoder and decoder.
- audio
- Audio I/O helpers: a pure Rust WAV file loader and PCM resampling.
- beam_search
- Beam search decoder for multi-hypothesis Whisper decoding.
- decode_utils
- Token-level helper functions for the Whisper decoder: argmax, sampling, logit manipulation, and n-gram blocking.
- decoder
- Core Whisper text decoder: forward pass, KV cache, greedy and sampling-based decoding, and language detection.
- dtw
- Dynamic Time Warping (DTW) token alignment algorithms for word-level timestamps.
- encoder
- Whisper audio encoder (CNN + Transformer).
- fft
- FFT utilities backed by OxiFFT. Provides the Complex type and the FFT functions used by the mel spectrogram pipeline and the rest of the codebase.
- hallucination
- Hallucination detection heuristics for Whisper output segments, based on character entropy and compression ratio analysis.
- linear
- Linear (dense) layer kernels with optional quantized weight support.
- mel
- Log-mel spectrogram computation from 16 kHz PCM audio.
- mel_filters
- Whisper mel filterbank: pre-computed coefficients and programmatic generation of the filter bank.
- model
- GGML model loader and weight storage types.
- quantize
- GGML block quantization support for Q4_0, Q5_0, and Q8_0: quantization types, dequantization kernels, and SIMD-accelerated dot products.
- stream
- Streaming transcription for incremental audio processing; audio is accumulated in 30-second chunks.
- subtitle
- Subtitle export: SRT and WebVTT formatting from timed segments.
- tensor
- Minimal f32 tensor type used throughout the inference pipeline.
- threading
- Thread-count management and a parallel iteration shim.
- tokenizer
- Token-ID to text decoding (pure vocab pass-through over Whisper's BPE token vocabulary) and segment parsing.
- types
- Public types, error enum, and option validation for oxiwhisper.
- vad
- Voice activity detection (VAD) via energy-based silence segmentation.
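As an illustration of the energy-based silence segmentation mentioned for the vad module, the following is a minimal standalone sketch using only the standard library. It is not oxiwhisper's implementation: the function name `detect_speech`, its parameters, and the fixed RMS threshold are all assumptions made for this example.

```rust
/// Hypothetical helper (not part of oxiwhisper's API): returns half-open
/// `(start, end)` sample ranges whose per-frame RMS energy reaches `threshold`.
fn detect_speech(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<(usize, usize)> {
    let mut regions = Vec::new();
    if frame_len == 0 {
        return regions;
    }
    // Start index of the speech region currently being extended, if any.
    let mut current: Option<usize> = None;
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let start = i * frame_len;
        // Root-mean-square energy of this frame.
        let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();
        match (rms >= threshold, current) {
            // Silence -> speech: open a new region.
            (true, None) => current = Some(start),
            // Speech -> silence: close the open region at this frame's start.
            (false, Some(begin)) => {
                regions.push((begin, start));
                current = None;
            }
            _ => {}
        }
    }
    // Close a region that runs to the end of the input.
    if let Some(begin) = current {
        regions.push((begin, samples.len()));
    }
    regions
}
```

With 16 kHz audio, a `frame_len` of 160 corresponds to 10 ms frames. A practical detector would likely add smoothing or a minimum region length, which this sketch leaves out.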
Structs§
- WhisperModel
- Main entry point for Whisper speech-to-text inference.
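Downstream of transcription, timed segments can be rendered as subtitles, as the subtitle module listed above does. The sketch below shows the general shape of SRT output using only the standard library. The `Segment` struct and both function names are hypothetical and do not reflect oxiwhisper's actual types or signatures.

```rust
/// Hypothetical timed segment (assumed shape, not oxiwhisper's type).
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Formats milliseconds as an SRT timestamp: `HH:MM:SS,mmm`.
fn srt_timestamp(ms: u64) -> String {
    let hours = ms / 3_600_000;
    let minutes = ms / 60_000 % 60;
    let seconds = ms / 1_000 % 60;
    let millis = ms % 1_000;
    format!("{hours:02}:{minutes:02}:{seconds:02},{millis:03}")
}

/// Renders segments as SRT cues: a 1-based index, a timing line,
/// the trimmed text, and a blank line separating cues.
fn to_srt(segments: &[Segment]) -> String {
    let mut out = String::new();
    for (i, seg) in segments.iter().enumerate() {
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            i + 1,
            srt_timestamp(seg.start_ms),
            srt_timestamp(seg.end_ms),
            seg.text.trim()
        ));
    }
    out
}
```

WebVTT cue timings differ mainly in using `.` rather than `,` before the milliseconds and in requiring a `WEBVTT` header line at the top of the file.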