§OxiWhisper
Pure Rust Whisper speech-to-text inference engine with zero C/C++ dependencies.
OxiWhisper loads GGML-format Whisper models and transcribes audio to text, supporting quantized inference (Q4_0, Q5_0, Q8_0), streaming, beam search, word-level timestamps, and SIMD-accelerated kernels (AVX2, NEON, WASM simd128).
§Quick Start
use oxiwhisper::{WhisperModel, TranscribeOptions};
use std::path::Path;
let model = WhisperModel::from_file(Path::new("ggml-tiny.bin"))?;
let audio = oxiwhisper::audio::load_wav(Path::new("audio.wav"))?;
let text = model.transcribe(&audio, &TranscribeOptions::default())?;
println!("{text}");
Re-exports§
pub use types::*;
Modules§
- attention
- audio
- Pure Rust WAV file loader for oxiwhisper.
- beam_search
- Beam search decoder for Whisper.
- decode_utils
- Helper functions for Whisper decoder: logit manipulation, sampling, n-gram blocking.
- decoder
- Core Whisper text decoder: forward pass, greedy/sample decoding, language detection.
- dtw
- Dynamic Time Warping (DTW) for word-level timestamp alignment.
- encoder
- fft
- FFT module backed by OxiFFT. Provides Complex type and FFT functions used throughout the codebase.
- hallucination
- Hallucination detection via character entropy and compression ratio analysis.
- linear
- mel
- mel_filters
- Programmatic generation of the Whisper mel filter bank.
- model
- quantize
- GGML quantization support: Q4_0, Q5_0, and Q8_0 block quantization and dequantization.
- stream
- Streaming transcription support for incremental audio processing.
- subtitle
- Subtitle export: SRT and WebVTT formats.
- tensor
- tokenizer
- types
- Public types, error handling, and validation for oxiwhisper.
- vad
- Voice Activity Detection (VAD) module.
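The hallucination module above is described as using character entropy. As a rough illustration of that idea only, the following is a minimal sketch of Shannon entropy over characters, on the assumption that repetitive, looped output shows up as unusually low entropy. The names `char_entropy` and `looks_repetitive` and the `min_bits` threshold are hypothetical and are not taken from the oxiwhisper API; the crate's actual implementation may differ.

```rust
use std::collections::HashMap;

/// Shannon entropy of the character distribution, in bits per character.
/// Hypothetical helper for illustration; not part of the oxiwhisper API.
/// Returns 0.0 for empty input.
pub fn char_entropy(text: &str) -> f64 {
    // Count how often each Unicode scalar value occurs.
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for ch in text.chars() {
        *counts.entry(ch).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    let n = total as f64;
    // H = -sum(p * log2(p)) over all distinct characters.
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Flags non-empty text whose entropy falls below `min_bits`.
/// The threshold is an assumed tuning parameter, not a documented default.
pub fn looks_repetitive(text: &str, min_bits: f64) -> bool {
    !text.is_empty() && char_entropy(text) < min_bits
}
```

Under these assumptions, a string made of a single repeated character has entropy 0, while four distinct, equally frequent characters give 2 bits per character. A real detector would likely combine this signal with the compression ratio analysis the module description also mentions.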