Crate qwen_asr

CPU-only Qwen3-ASR speech recognition in pure Rust.

BLAS and SIMD optimizations are selected automatically at compile time based on the target platform: for example, Accelerate + NEON on macOS/aarch64 and OpenBLAS + AVX2 on Linux/x86_64. For best performance on x86_64, build with:

RUSTFLAGS="-C target-cpu=native" cargo build --release

Important: Always build in release mode (--release). Debug builds are 10–50x slower and unusable for real-time inference.
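To illustrate what compile-time selection means in practice, here is a minimal sketch of a dot product that compiles to an AVX2 path only when the build enables that target feature (for instance through `-C target-cpu=native`) and to a portable fallback otherwise. The names `dot` and `backend` are hypothetical and are not taken from this crate; its real kernels and dispatch logic may look quite different.

```rust
/// AVX2 build: compiled only when the target is x86_64 and AVX2 is enabled
/// at compile time. Hypothetical example, not the crate's kernel.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::{
        _mm256_add_ps, _mm256_loadu_ps, _mm256_mul_ps, _mm256_setzero_ps, _mm256_storeu_ps,
    };
    let n = a.len().min(b.len());
    let chunks = n / 8;
    let mut lanes = [0.0f32; 8];
    // SAFETY: the cfg above guarantees AVX2 (and therefore AVX) is available,
    // and every 8-wide load stays within the first `n` elements.
    #[allow(unused_unsafe)]
    unsafe {
        let mut acc = _mm256_setzero_ps();
        for i in 0..chunks {
            let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
            acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
        }
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    }
    // Handle the elements that did not fill a whole 8-lane vector.
    let tail: f32 = (chunks * 8..n).map(|i| a[i] * b[i]).sum();
    lanes.iter().sum::<f32>() + tail
}

/// Portable fallback: compiled on every other target or feature set.
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Reports which of the two variants above was compiled in.
pub fn backend() -> &'static str {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        "avx2"
    } else {
        "scalar"
    }
}
```

Because the choice is made by `cfg`, only one `dot` exists in the final binary and there is no runtime branch. This is also why a default x86_64 build may miss the faster path unless `target-cpu` or the relevant target features are set.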

§Quick Start

use qwen_asr::context::QwenCtx;
use qwen_asr::transcribe;

let mut ctx = QwenCtx::load("qwen3-asr-0.6b").expect("model not found");
let text = transcribe::transcribe(&mut ctx, "audio.wav").unwrap();
println!("{text}");

§Forced Alignment

With the aligner model variant you can obtain word-level timestamps for a known transcript:

use qwen_asr::context::QwenCtx;
use qwen_asr::align;

let mut ctx = QwenCtx::load("qwen3-aligner-0.6b").expect("aligner model not found");
let samples: Vec<f32> = vec![]; // placeholder: supply 16 kHz mono f32 PCM samples here
let results = align::forced_align(&mut ctx, &samples, "Hello world", "English").unwrap();
for r in &results {
    println!("{}: {:.0} – {:.0} ms", r.text, r.start_ms, r.end_ms);
}
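The alignment example expects 16 kHz mono `f32` PCM. As a rough sketch of how such a buffer might be prepared from interleaved 16-bit PCM, and how a millisecond timestamp maps back to a sample index, consider the helpers below. Both `pcm16_to_mono_f32` and `ms_to_sample` are hypothetical names, not functions of this crate; the crate's own `audio` module may already cover this. Resampling to 16 kHz is not shown, and treating timestamps as `f64` milliseconds is an assumption.

```rust
/// Down-mix interleaved 16-bit PCM to mono and scale it to [-1.0, 1.0).
/// Hypothetical helper; trailing samples that do not form a full frame are dropped.
pub fn pcm16_to_mono_f32(interleaved: &[i16], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| {
            // Average the channels of one frame after scaling each sample.
            let sum: f32 = frame.iter().map(|&s| f32::from(s) / 32768.0).sum();
            sum / channels as f32
        })
        .collect()
}

/// Map a timestamp in milliseconds to the nearest sample index.
/// Hypothetical helper; assumes the timestamp is a non-negative `f64`.
pub fn ms_to_sample(ms: f64, sample_rate: u32) -> usize {
    (ms / 1000.0 * f64::from(sample_rate)).round() as usize
}
```

For example, `pcm16_to_mono_f32(&stereo, 2)` halves the sample count of a stereo buffer, and `ms_to_sample(250.0, 16_000)` gives index 4000, which can be used to slice the audio for a single aligned word.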

§Module Guide

| Module | Purpose |
|---|---|
| context | Engine state; start here with context::QwenCtx::load |
| transcribe | Offline, segmented, and streaming transcription |
| audio | WAV loading, resampling, mel spectrogram |
| align | Forced alignment (word/character timestamps) |
| config | Model configuration and variant detection |
| tokenizer | GPT-2 byte-level BPE tokenizer |

The remaining modules (encoder, decoder, kernels, safetensors) are implementation details and not intended for direct use.

Modules§

align
Forced alignment: word- and character-level timestamps.
audio
WAV loading, resampling, and mel spectrogram computation.
config
Model configuration and automatic variant detection.
context
Top-level engine state (QwenCtx) owning all loaded weights and runtime buffers.
decoder
Qwen3 LLM decoder with GQA, KV cache, and generation.
encoder
Audio encoder: Conv2D stem + windowed transformer + projection cascade.
kernels
BLAS/vDSP bindings, thread pool, and SIMD kernel dispatch.
safetensors
Safetensors mmap reader with multi-shard support.
tokenizer
GPT-2 byte-level BPE tokenizer for Qwen.
transcribe
Offline, segmented, and streaming transcription orchestration.

Functions§

optimization_flags
Returns a list of compile-time optimization flags enabled for this build.
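The signature and return type of `optimization_flags` are not shown on this page. As a hedged illustration of how a function of this kind can be built, the sketch below gathers a few compile-time facts with the standard `cfg!` macro. The name `build_flags` and the flag strings are hypothetical; the crate's actual output may differ.

```rust
/// Collect a few compile-time properties of the current build.
/// Hypothetical stand-in, not the crate's `optimization_flags`.
pub fn build_flags() -> Vec<&'static str> {
    let mut flags = Vec::new();
    // `cfg!` is evaluated by the compiler, so each check is a constant.
    if cfg!(debug_assertions) {
        flags.push("debug-assertions");
    }
    if cfg!(target_arch = "x86_64") {
        flags.push("x86_64");
    }
    if cfg!(target_arch = "aarch64") {
        flags.push("aarch64");
    }
    if cfg!(target_feature = "avx2") {
        flags.push("avx2");
    }
    if cfg!(target_feature = "fma") {
        flags.push("fma");
    }
    if cfg!(target_feature = "neon") {
        flags.push("neon");
    }
    flags
}
```

Printing such a list at startup, for example with `println!("{:?}", build_flags())`, is a quick way to confirm that a binary was built with the intended target features rather than a generic baseline.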