whisperforge-core 0.4.0

GPU-accelerated Whisper model inference with streaming audio, quantization, and KV-cached decoding

Features

  • All Whisper model sizes (tiny.en through large-v2/v3)
  • GPU acceleration via WGPU (Vulkan/DX12/Metal)
  • burn-flex backend: CPU + automatic GPU dispatch
  • INT8 quantization (~4× compression relative to FP32 weights)
  • Streaming audio pipeline with resampling
  • KV-cached decoder: O(n) attention work per generated token
  • Per-token timestamps via cross-attention
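
To give a feel for where the roughly 4× figure for INT8 comes from, here is a minimal sketch of symmetric per-tensor quantization using only the standard library. It is not taken from whisperforge-core: the names `QuantizedTensor`, `quantize_int8`, `dequantize_int8`, and `compression_ratio` are hypothetical, and the crate may well use a different scheme (per-channel scales, zero points, and so on).

```rust
/// Hypothetical container: one shared f32 scale plus one i8 per weight.
/// Illustrative only; not whisperforge-core's actual storage format.
pub struct QuantizedTensor {
    pub scale: f32,
    pub values: Vec<i8>,
}

/// Symmetric per-tensor quantization: the largest magnitude maps to 127.
pub fn quantize_int8(weights: &[f32]) -> QuantizedTensor {
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero for an all-zero tensor.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let values = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedTensor { scale, values }
}

/// Recover approximate f32 weights; the error is at most about scale / 2 per value.
pub fn dequantize_int8(q: &QuantizedTensor) -> Vec<f32> {
    q.values.iter().map(|&v| v as f32 * q.scale).collect()
}

/// Storage ratio of f32 weights to i8 weights plus the single f32 scale.
pub fn compression_ratio(len: usize) -> f32 {
    let fp32_bytes = len * std::mem::size_of::<f32>();
    let int8_bytes = len * std::mem::size_of::<i8>() + std::mem::size_of::<f32>();
    fp32_bytes as f32 / int8_bytes as f32
}
```

Each weight shrinks from four bytes to one, and the only overhead in this sketch is the shared scale, so the ratio approaches 4 as the tensor grows.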

Usage

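use std::path::Path;
use whisperforge_core::{Model, WhisperConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder input (assumed mono f32 PCM): one second of silence at 16 kHz.
    let sample_rate = 16_000;
    let audio_samples = vec![0.0_f32; 16_000];

    let config = WhisperConfig::tiny_en(); // built here but not passed to any call in this snippet
    let model = Model::load(Path::new("models/tiny_en_converted"))?;
    let transcript = model.transcribe(audio_samples, sample_rate)?;
    println!("{}", transcript);
    Ok(())
}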
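
The usage snippet takes audio samples and a sample rate but does not show where the samples come from. As a hedged illustration, the sketch below converts interleaved 16-bit PCM into mono `f32` samples. It assumes `transcribe` accepts mono `f32` samples in the range [-1.0, 1.0), which the source does not confirm. The helper `pcm16_to_mono_f32` is hypothetical and is not part of whisperforge-core.

```rust
/// Hypothetical helper: convert interleaved 16-bit PCM to mono f32 in [-1.0, 1.0).
/// Channels are averaged per frame; a trailing incomplete frame is dropped.
pub fn pcm16_to_mono_f32(interleaved: &[i16], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| {
            // Scale each sample by 1/32768 so that i16::MIN maps to -1.0.
            let sum: f32 = frame.iter().map(|&s| s as f32 / 32768.0).sum();
            sum / channels as f32
        })
        .collect()
}
```

A stereo buffer would be passed as `pcm16_to_mono_f32(&pcm, 2)`. The resulting vector has one sample per frame, at the original sample rate.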

See Also

For full documentation, visit the WhisperForge repository.