Crate qwen_asr


CPU-only Qwen3-ASR speech recognition in pure Rust.

BLAS and SIMD optimizations are selected automatically at compile time based on the target platform: for example, Accelerate + NEON on macOS/aarch64 and OpenBLAS + AVX2 on Linux/x86_64. For best performance on x86_64, build with:

RUSTFLAGS="-C target-cpu=native" cargo build --release

Important: Always build in release mode (--release). Debug builds are 10–50x slower and unusable for real-time inference.
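
Because the selection happens at compile time, the flags passed to the compiler decide which code paths exist in the binary. The sketch below illustrates the general mechanism using only the standard `cfg!` macro; `active_simd_features` is a hypothetical helper written for this example and is not part of `qwen_asr`, whose internal dispatch may work differently.

```rust
/// Lists the SIMD target features this binary was compiled with.
/// Hypothetical helper for illustration; not a qwen_asr API.
pub fn active_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    // `cfg!` is resolved by the compiler, so each check reflects the
    // target features enabled at build time (for example through
    // `-C target-cpu=native`), not what the CPU reports at run time.
    if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        features.push("avx2");
    }
    if cfg!(all(target_arch = "x86_64", target_feature = "fma")) {
        features.push("fma");
    }
    if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        features.push("neon");
    }
    // No known SIMD feature was enabled for this build.
    if features.is_empty() {
        features.push("scalar");
    }
    features
}
```

On a default x86_64 build the baseline target enables only SSE2, so a check like this would typically report `scalar` unless `target-cpu=native` (or an explicit `target-feature`) is set on a machine that supports AVX2.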

§Quick Start

use qwen_asr::context::QwenCtx;
use qwen_asr::transcribe;

let mut ctx = QwenCtx::load("qwen3-asr-0.6b").expect("model not found");
let text = transcribe::transcribe(&mut ctx, "audio.wav").unwrap();
println!("{text}");

§Forced Alignment

With the aligner model variant you can obtain word-level timestamps for a known transcript:

use qwen_asr::context::QwenCtx;
use qwen_asr::align;

let mut ctx = QwenCtx::load("qwen3-aligner-0.6b").expect("aligner model not found");
let samples: Vec<f32> = vec![]; // placeholder: fill with 16 kHz mono f32 PCM samples
let results = align::forced_align(&mut ctx, &samples, "Hello world", "English").unwrap();
for r in &results {
    println!("{}: {:.0} – {:.0} ms", r.text, r.start_ms, r.end_ms);
}
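
The `samples` buffer above is expected to hold 16 kHz mono f32 PCM. If your audio arrives as interleaved 16-bit integers, a conversion along these lines may help. This is a minimal sketch: `pcm16_to_mono_f32` is a hypothetical helper, not part of `qwen_asr`, and scaling to the range [-1.0, 1.0) is an assumption about what the aligner expects. It does not resample, so the input must already be at 16 kHz.

```rust
/// Converts interleaved 16-bit PCM to mono f32 samples in [-1.0, 1.0).
/// Hypothetical helper for illustration; not a qwen_asr API.
/// Assumes the input is already sampled at 16 kHz (no resampling here).
pub fn pcm16_to_mono_f32(interleaved: &[i16], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        // One chunk per frame; a trailing partial frame is dropped.
        .chunks_exact(channels)
        .map(|frame| {
            // Average the channels of the frame, then scale by the
            // magnitude of i16::MIN so the result lies in [-1.0, 1.0).
            let sum: f32 = frame.iter().map(|&s| f32::from(s)).sum();
            sum / (channels as f32 * 32768.0)
        })
        .collect()
}
```

For example, `pcm16_to_mono_f32(&stereo, 2)` downmixes a stereo buffer by averaging the two channels of each frame. For WAV files, the crate's own `audio` module is described as handling loading and resampling.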

§Module Guide

Module	Purpose
`context`	Engine state — start here with `context::QwenCtx::load`
`transcribe`	Offline, segmented, and streaming transcription
`audio`	WAV loading, resampling, mel spectrogram
`align`	Forced alignment (word/character timestamps)
`config`	Model configuration and variant detection
`tokenizer`	GPT-2 byte-level BPE tokenizer

The remaining modules (encoder, decoder, kernels, safetensors) are implementation details and not intended for direct use.

Modules§

align: Forced alignment: word- and character-level timestamps.
audio: WAV loading, resampling, and mel spectrogram computation.
config: Model configuration and automatic variant detection.
context: Top-level engine state split into an immutable, shareable QwenModel (loaded weights) and a per-session QwenCtx (KV cache, scratch, settings).
decoder: Qwen3 LLM decoder with GQA, KV cache, and generation.
encoder: Audio encoder: Conv2D stem + windowed transformer + projection cascade.
int8_sidecar: Memory-mappable INT8 weight sidecar for the decoder (R12-H1).
kernels: BLAS/vDSP bindings, thread pool, and SIMD kernel dispatch.
output: Structured transcription output and JSON serialization.
safetensors: Safetensors mmap reader with multi-shard support.
subtitle: Subtitle cue grouping and SRT/WebVTT formatting.
tokenizer: GPT-2 byte-level BPE tokenizer for Qwen.
transcribe: Offline, segmented, and streaming transcription orchestration.
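
To give a sense of what SRT formatting of timestamps involves, here is a small standalone sketch. `srt_timestamp` and `srt_cue` are hypothetical names chosen for this example; they are not the `subtitle` module's API, which is not shown on this page. The helpers take whole milliseconds as `u64`, so floating-point timestamps such as those printed in the Forced Alignment example would need rounding first.

```rust
/// Formats a millisecond offset as an SRT timestamp (HH:MM:SS,mmm).
/// Hypothetical helper for illustration; not a qwen_asr API.
pub fn srt_timestamp(ms: u64) -> String {
    let (hours, rem) = (ms / 3_600_000, ms % 3_600_000);
    let (minutes, rem) = (rem / 60_000, rem % 60_000);
    let (seconds, millis) = (rem / 1_000, rem % 1_000);
    // SRT separates seconds and milliseconds with a comma.
    format!("{hours:02}:{minutes:02}:{seconds:02},{millis:03}")
}

/// Builds one SRT cue: index line, time range line, then the text.
pub fn srt_cue(index: usize, start_ms: u64, end_ms: u64, text: &str) -> String {
    format!(
        "{index}\n{} --> {}\n{text}\n",
        srt_timestamp(start_ms),
        srt_timestamp(end_ms)
    )
}
```

WebVTT timestamps follow the same layout but use a period instead of a comma before the milliseconds.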

Functions§

optimization_flags: Returns a list of compile-time optimization flags enabled for this build.
