Crate oxiwhisper

OxiWhisper

Pure Rust Whisper speech-to-text inference engine with zero C/C++ dependencies.

OxiWhisper loads GGML-format Whisper models and transcribes audio to text, supporting quantized inference (Q4_0, Q5_0, Q8_0), streaming, beam search, word-level timestamps, and SIMD-accelerated kernels (AVX2, NEON, WASM simd128).
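The GGML Q4_0 and Q8_0 formats named above group weights into blocks of 32 that share one scale factor. The following is a minimal scalar sketch of dequantizing a single block, following GGML's reference layout. The struct and function names are hypothetical, not oxiwhisper's `quantize` API. The scale is assumed to have already been converted from f16 to f32, and Q5_0 (which stores a fifth high bit per weight) is omitted. The SIMD kernels listed above presumably vectorize loops of this kind.

```rust
/// Weights per GGML block for Q4_0 and Q8_0 (QK4_0 = QK8_0 = 32 in GGML).
const QK: usize = 32;

/// Q4_0 block: one scale plus 32 packed 4-bit values.
/// GGML stores `d` as f16; this sketch assumes it was already widened to f32.
struct BlockQ4_0 {
    d: f32,
    qs: [u8; QK / 2],
}

/// Q8_0 block: one scale plus 32 signed 8-bit values (scale again assumed f32).
struct BlockQ8_0 {
    d: f32,
    qs: [i8; QK],
}

/// Q4_0: byte j holds element j in its low nibble and element j + 16 in its
/// high nibble; subtracting 8 maps each nibble to a signed value in [-8, 7].
fn dequantize_q4_0(block: &BlockQ4_0, out: &mut [f32; QK]) {
    for j in 0..QK / 2 {
        let lo = (block.qs[j] & 0x0F) as i32 - 8;
        let hi = (block.qs[j] >> 4) as i32 - 8;
        out[j] = lo as f32 * block.d;
        out[j + QK / 2] = hi as f32 * block.d;
    }
}

/// Q8_0: each element is its int8 value multiplied by the block scale.
fn dequantize_q8_0(block: &BlockQ8_0, out: &mut [f32; QK]) {
    for (o, &q) in out.iter_mut().zip(block.qs.iter()) {
        *o = q as f32 * block.d;
    }
}

fn main() {
    // 0x9F: low nibble 0xF -> 15 - 8 = 7, high nibble 0x9 -> 9 - 8 = 1.
    let q4 = BlockQ4_0 { d: 0.5, qs: [0x9F; QK / 2] };
    let mut out = [0.0f32; QK];
    dequantize_q4_0(&q4, &mut out);
    println!("q4_0: first half {}, second half {}", out[0], out[QK / 2]);

    // Every weight is -4, so every dequantized value is -4 * 0.25.
    let q8 = BlockQ8_0 { d: 0.25, qs: [-4; QK] };
    dequantize_q8_0(&q8, &mut out);
    println!("q8_0: {}", out[0]);
}
```

Storing one shared scale per 32 weights is what keeps these formats compact: a Q4_0 block packs 32 weights into 16 bytes plus the scale.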

Quick Start

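use oxiwhisper::{WhisperModel, TranscribeOptions};
use std::path::Path;

// Assumes oxiwhisper's error type implements std::error::Error, so `?`
// can convert it into Box<dyn Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model = WhisperModel::from_file(Path::new("ggml-tiny.bin"))?;
    let audio = oxiwhisper::audio::load_wav(Path::new("audio.wav"))?;
    let text = model.transcribe(&audio, &TranscribeOptions::default())?;
    println!("{text}");
    Ok(())
}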
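The quick start passes the loaded samples straight to `transcribe`. Whisper models operate on 16 kHz mono PCM (see the `mel` module below), and the `audio` module lists PCM resampling among its helpers. The sketch below shows one simple way such a conversion can work: stereo downmixing followed by linear-interpolation resampling. It is an illustration under assumptions, not oxiwhisper's implementation. `downmix_stereo` and `resample_linear` are hypothetical names, and the crate's actual resampling method is not documented here.

```rust
/// Whisper models consume 16 kHz mono audio.
const WHISPER_SAMPLE_RATE: u32 = 16_000;

/// Average interleaved stereo frames (L, R, L, R, ...) into mono.
fn downmix_stereo(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| 0.5 * (frame[0] + frame[1]))
        .collect()
}

/// Resample mono PCM by linear interpolation between neighbouring samples.
/// No anti-aliasing filter is applied, so downsampling can alias.
fn resample_linear(input: &[f32], from_rate: u32, to_rate: u32) -> Vec<f32> {
    if input.is_empty() || from_rate == to_rate {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_rate as u64 / from_rate as u64) as usize;
    // Distance, in input samples, between consecutive output samples.
    let step = from_rate as f64 / to_rate as f64;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // Past the last sample, hold the final value.
            let b = *input.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac
        })
        .collect()
}

fn main() {
    // Hypothetical 32 kHz stereo input: interleaved L/R pairs.
    let stereo = [1.0f32, 3.0, -1.0, 1.0, 0.5, 0.5, 2.0, 2.0];
    let mono = downmix_stereo(&stereo); // [2.0, 0.0, 0.5, 2.0]
    let resampled = resample_linear(&mono, 32_000, WHISPER_SAMPLE_RATE);
    println!("{mono:?} -> {resampled:?}");
}
```

Linear interpolation keeps the sketch short; production resamplers usually low-pass filter before downsampling to avoid aliasing.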

Re-exports

pub use types::*;

Modules

attention
Multi-head attention primitives shared by encoder and decoder.
audio
Audio I/O helpers: a pure Rust WAV file loader and PCM resampling.
beam_search
Beam search decoder for multi-hypothesis Whisper decoding.
decode_utils
Token-level decoding utilities: logit manipulation, argmax and sampling, and n-gram suppression (blocking).
decoder
Core Whisper text decoder: forward pass, KV cache, greedy and sampled decoding, and language detection.
dtw
Dynamic Time Warping (DTW) token alignment for word-level timestamps.
encoder
Whisper audio encoder (CNN + Transformer).
fft
FFT utilities backed by OxiFFT, providing the Complex type and FFT functions used by the mel spectrogram pipeline and elsewhere in the codebase.
hallucination
Hallucination detection heuristics for Whisper output segments, based on character entropy and compression ratio analysis.
linear
Linear (dense) layer kernels with optional quantized weight support.
mel
Log-mel spectrogram computation from 16 kHz PCM audio.
mel_filters
Programmatic generation of the pre-computed Whisper mel filterbank coefficients.
model
GGML model loader and weight storage types.
quantize
GGML quantization support: Q4_0, Q5_0, and Q8_0 block quantization types, dequantization kernels, and SIMD-accelerated dot products.
stream
Streaming transcription for incremental audio processing, accumulating audio in 30-second chunks.
subtitle
Subtitle export: SRT and WebVTT formatting from timed segments.
tensor
Minimal f32 tensor type used throughout the inference pipeline.
threading
Thread-count management and a parallel iteration shim.
tokenizer
Token-ID to text decoding over Whisper’s BPE vocabulary (a pure vocabulary pass-through) and segment parsing.
types
Public types, the error enum and error handling, and option validation for oxiwhisper.
vad
Voice activity detection (VAD): energy-based silence segmentation.
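The `vad` entry above describes energy-based silence segmentation. A minimal sketch of that idea follows, assuming fixed-length frames scored by RMS energy against a fixed threshold. `detect_speech`, `SpeechSegment`, the 160-sample frame length (10 ms at 16 kHz), and the 0.1 threshold are hypothetical choices, not oxiwhisper's API or defaults.

```rust
/// A detected speech region as sample indices [start, end).
#[derive(Debug, PartialEq)]
struct SpeechSegment {
    start: usize,
    end: usize,
}

/// Root-mean-square energy of one frame.
fn frame_rms(frame: &[f32]) -> f32 {
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Label each frame as speech when its RMS exceeds `threshold`, and merge
/// runs of consecutive speech frames into segments.
fn detect_speech(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<SpeechSegment> {
    assert!(frame_len > 0, "frame_len must be positive");
    let mut segments = Vec::new();
    let mut open: Option<usize> = None; // start of the segment in progress
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let frame_start = i * frame_len;
        let is_speech = frame_rms(frame) > threshold;
        match (is_speech, open) {
            (true, None) => open = Some(frame_start),
            (false, Some(start)) => {
                segments.push(SpeechSegment { start, end: frame_start });
                open = None;
            }
            _ => {}
        }
    }
    // Close a segment that runs to the end of the audio.
    if let Some(start) = open {
        segments.push(SpeechSegment { start, end: samples.len() });
    }
    segments
}

fn main() {
    // 10 ms of silence, 20 ms of loud samples, 10 ms of silence at 16 kHz.
    let mut samples = vec![0.0f32; 160];
    samples.extend(std::iter::repeat(0.5).take(320));
    samples.extend(std::iter::repeat(0.0).take(160));
    let segments = detect_speech(&samples, 160, 0.1);
    println!("{segments:?}");
}
```

Practical detectors usually add smoothing, such as a minimum segment length or a hangover period, so that brief dips in energy do not split a word.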

Structs

WhisperModel
Main entry point for Whisper speech-to-text inference.
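Word-level timestamps rely on the `dtw` module listed above. The classic approach, used for example by OpenAI's reference Whisper implementation (which derives the cost matrix from cross-attention weights), finds a monotonic minimum-cost path that pairs each text token with a run of audio frames. The sketch below implements plain DTW over an arbitrary row-major cost matrix and backtracks the path. `dtw_path` is a hypothetical name, and how oxiwhisper builds its cost matrix is not stated here.

```rust
/// Dynamic Time Warping over a `rows` x `cols` row-major cost matrix
/// (rows = tokens, cols = audio frames). Returns the minimum-cost monotonic
/// path from (0, 0) to (rows - 1, cols - 1) as (row, col) pairs.
fn dtw_path(cost: &[f32], rows: usize, cols: usize) -> Vec<(usize, usize)> {
    assert!(rows > 0 && cols > 0 && cost.len() == rows * cols);
    let at = |i: usize, j: usize| i * cols + j;

    // acc[at(i, j)] = cheapest cumulative cost of any path ending at (i, j).
    let mut acc = vec![f32::INFINITY; rows * cols];
    for i in 0..rows {
        for j in 0..cols {
            let best_prev = if i == 0 && j == 0 {
                0.0
            } else {
                let diag = if i > 0 && j > 0 { acc[at(i - 1, j - 1)] } else { f32::INFINITY };
                let up = if i > 0 { acc[at(i - 1, j)] } else { f32::INFINITY };
                let left = if j > 0 { acc[at(i, j - 1)] } else { f32::INFINITY };
                diag.min(up).min(left)
            };
            acc[at(i, j)] = cost[at(i, j)] + best_prev;
        }
    }

    // Backtrack from the bottom-right corner, always stepping to the
    // cheapest predecessor (preferring the diagonal on ties).
    let (mut i, mut j) = (rows - 1, cols - 1);
    let mut path = vec![(i, j)];
    while i > 0 || j > 0 {
        if i == 0 {
            j -= 1;
        } else if j == 0 {
            i -= 1;
        } else {
            let diag = acc[at(i - 1, j - 1)];
            let up = acc[at(i - 1, j)];
            let left = acc[at(i, j - 1)];
            if diag <= up && diag <= left {
                i -= 1;
                j -= 1;
            } else if up <= left {
                i -= 1;
            } else {
                j -= 1;
            }
        }
        path.push((i, j));
    }
    path.reverse();
    path
}

fn main() {
    // 2 tokens (rows) x 4 audio frames (cols); lower cost = better match.
    let cost = [
        0.0f32, 0.0, 1.0, 1.0, // token 0 matches frames 0-1
        1.0, 1.0, 0.0, 0.0, // token 1 matches frames 2-3
    ];
    let path = dtw_path(&cost, 2, 4);
    println!("{path:?}"); // prints [(0, 0), (0, 1), (1, 2), (1, 3)]
}
```

Each token's time span can then be read off as the first and last frame paired with its row, multiplied by the duration of one frame.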