Crate oxiwhisper

§OxiWhisper

Pure Rust Whisper speech-to-text inference engine with zero C/C++ dependencies.

OxiWhisper loads GGML-format Whisper models and transcribes audio to text, supporting quantized inference (Q4_0, Q5_0, Q8_0), streaming, beam search, word-level timestamps, and SIMD-accelerated kernels (AVX2, NEON, WASM simd128).

§Quick Start

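use oxiwhisper::{TranscribeOptions, WhisperModel};
use std::path::Path;

// `?` requires an enclosing function that returns a compatible `Result`.
// Load a GGML-format Whisper model from disk.
let model = WhisperModel::from_file(Path::new("ggml-tiny.bin"))?;
// Load the WAV file to transcribe.
let audio = oxiwhisper::audio::load_wav(Path::new("audio.wav"))?;
// Transcribe with default options and print the resulting text.
let text = model.transcribe(&audio, &TranscribeOptions::default())?;
println!("{text}");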

Re-exports§

pub use dtw::WordSegment;
pub use word_timestamps::WordTimedTranscript;
pub use types::*;

Modules§

attention: Multi-head attention primitives shared by encoder and decoder.
audio: Audio I/O helpers: pure Rust WAV file loading and PCM resampling.
beam_search: Beam search decoder for multi-hypothesis Whisper decoding.
decode_utils: Token-level decoding helpers: logit manipulation, argmax, sampling, and n-gram suppression.
decoder: Core Whisper text decoder: forward pass, KV cache, greedy and sampled decoding, and language detection.
dtw: Dynamic Time Warping token alignment for word-level timestamps.
encoder: Whisper audio encoder (CNN + Transformer).
fft: FFT utilities backed by OxiFFT. Provides the Complex type and FFT functions used throughout the codebase, including the mel spectrogram pipeline.
hallucination: Hallucination detection heuristics for Whisper output segments, based on character entropy and compression ratio analysis.
linear: Linear (dense) layer kernels with optional quantized weight support.
mel: Log-mel spectrogram computation from 16 kHz PCM audio.
mel_filters: Pre-computed Whisper mel filter bank coefficients, with programmatic generation of the filter bank.
model: GGML model loader and weight storage types.
quantize: GGML Q4_0, Q5_0, and Q8_0 block quantization: quantization types, dequantization kernels, and SIMD-accelerated dot products.
stream: Streaming transcription for incremental audio processing; accumulates audio in 30-second chunks.
subtitle: Subtitle export: SRT and WebVTT formatting from timed segments.
tensor: Minimal f32 tensor type used throughout the inference pipeline.
threading: Thread-count management and a parallel iteration shim.
tokenizer: BPE token-ID-to-text decoding (pure vocabulary pass-through) and segment parsing for Whisper.
types: Public types, error enum, and option validation for oxiwhisper.
vad: Voice activity detection (VAD) via energy-based silence segmentation.
word_timestamps: Word-level timestamp alignment via cross-attention DTW; wires dtw into the decoder pipeline.
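
To make the quantize entry more concrete, here is a minimal sketch of Q8_0 dequantization. It assumes the usual GGML Q8_0 block layout (a little-endian f16 scale followed by 32 signed 8-bit weights, 34 bytes per block). The names `f16_bits_to_f32` and `dequantize_q8_0` are hypothetical and are not taken from this crate's API; the crate's own kernels may be organized differently.

```rust
/// Number of weights per Q8_0 block (GGML convention, assumed here).
const QK8_0: usize = 32;
/// Bytes per block: a 2-byte f16 scale plus 32 one-byte weights.
const BLOCK_BYTES: usize = 2 + QK8_0;

/// Convert IEEE 754 half-precision bits to f32 (hypothetical helper).
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 10) & 0x1F) as i32;
    let frac = (bits & 0x03FF) as f32;
    let magnitude = match exp {
        // Subnormal: (frac / 1024) * 2^-14
        0 => frac * 2f32.powi(-24),
        // All-ones exponent encodes infinity or NaN.
        31 => {
            if frac == 0.0 {
                f32::INFINITY
            } else {
                f32::NAN
            }
        }
        // Normal number with an implicit leading 1.
        _ => (1.0 + frac / 1024.0) * 2f32.powi(exp - 15),
    };
    sign * magnitude
}

/// Dequantize a byte slice of Q8_0 blocks into f32 weights.
/// Returns `None` if the input is not a whole number of blocks.
fn dequantize_q8_0(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % BLOCK_BYTES != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(bytes.len() / BLOCK_BYTES * QK8_0);
    for block in bytes.chunks_exact(BLOCK_BYTES) {
        // The first two bytes hold the per-block scale.
        let d = f16_bits_to_f32(u16::from_le_bytes([block[0], block[1]]));
        // Each remaining byte is a signed weight; the real value is scale * weight.
        out.extend(block[2..].iter().map(|&q| d * f32::from(q as i8)));
    }
    Some(out)
}
```

Q4_0 and Q5_0 follow the same scale-times-integer idea but pack the integers into fewer bits, so they need an extra unpacking step that this sketch does not cover.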
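
The vad entry describes energy-based silence segmentation. The sketch below shows one common way to do that: split the signal into fixed-size frames, compute each frame's RMS energy, and report runs of frames above a threshold. The function name `detect_speech`, its parameters, and the thresholding rule are assumptions for illustration and may not match how the vad module actually works.

```rust
/// Return `(start, end)` sample ranges (end exclusive) whose frame RMS
/// energy exceeds `threshold`. Hypothetical helper, not the crate's API.
fn detect_speech(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<(usize, usize)> {
    let mut segments = Vec::new();
    if frame_len == 0 {
        return segments;
    }
    // Start index of the speech segment currently being built, if any.
    let mut current: Option<usize> = None;
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        // Root-mean-square energy of this frame.
        let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();
        let start = i * frame_len;
        match (rms > threshold, current) {
            // Silence -> speech: open a new segment.
            (true, None) => current = Some(start),
            // Speech -> silence: close the open segment at this frame's start.
            (false, Some(begin)) => {
                segments.push((begin, start));
                current = None;
            }
            _ => {}
        }
    }
    // Close a segment that runs to the end of the input.
    if let Some(begin) = current {
        segments.push((begin, samples.len()));
    }
    segments
}
```

With the 16 kHz PCM input mentioned for the mel module, a `frame_len` of 480 samples would correspond to 30 ms frames; both that value and the threshold are tuning choices rather than anything documented here.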
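
For the subtitle entry, the following sketch shows how timed segments map onto SRT and WebVTT timestamps. The two formats differ in the millisecond separator: SRT uses a comma and WebVTT uses a period. `format_timestamp` and `srt_cue` are hypothetical names chosen for this example; the subtitle module's real functions and segment types are not shown on this page.

```rust
/// Format a millisecond offset as `HH:MM:SS<sep>mmm`.
/// Pass ',' for SRT or '.' for WebVTT. Hypothetical helper.
fn format_timestamp(total_ms: u64, separator: char) -> String {
    let ms = total_ms % 1000;
    let secs = (total_ms / 1000) % 60;
    let mins = (total_ms / 60_000) % 60;
    let hours = total_ms / 3_600_000;
    format!("{hours:02}:{mins:02}:{secs:02}{separator}{ms:03}")
}

/// Render one SRT cue: a 1-based index line, a time range line,
/// and the cue text. Cues are separated by a blank line when joined.
fn srt_cue(index: usize, start_ms: u64, end_ms: u64, text: &str) -> String {
    format!(
        "{index}\n{} --> {}\n{text}\n",
        format_timestamp(start_ms, ','),
        format_timestamp(end_ms, ',')
    )
}
```

A full SRT file is then the cues joined with blank lines; a WebVTT file additionally starts with a `WEBVTT` header line.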

Structs§

WhisperModel: Main entry point for Whisper speech-to-text inference.
