livekit-wakeword 0.1.3

ONNX-based wake word detection for LiveKit
Documentation

LiveKit Wake Word

Rust inference engine for wake word detection, powered by ONNX Runtime. This crate runs classifier models trained with livekit-wakeword to detect wake words (e.g. "hey livekit") from raw PCM audio.

How it works

Audio is processed through a three-stage ML pipeline:

  1. Mel Spectrogram - Converts raw 16 kHz PCM audio into a mel-frequency spectrogram (32 bins)
  2. Embedding - Extracts 96-dimensional speech embeddings from the spectrogram using a sliding window
  3. Classification - Runs one or more classifier models on the embeddings to produce per-wake-word confidence scores

The mel spectrogram and embedding models are embedded in the binary at compile time. Classifier models (.onnx files) are trained using the livekit-wakeword Python toolkit and loaded from disk at runtime, making it easy to add or swap wake words without recompiling.

Usage

Add the dependency to your Cargo.toml:

[dependencies]
livekit-wakeword = "0.1.0"

ONNX backend

By default the crate uses ort-tract (pure-Rust ONNX inference), so no native libraries are needed. On aarch64-pc-windows-msvc, where tract cannot compile due to MSVC-incompatible assembly, the crate automatically falls back to native ONNX Runtime. This is handled at build time — no feature flags or configuration required.

Detect a wake word:

use livekit_wakeword::wakeword::WakeWordModel;

// Load one or more classifier ONNX models, specifying the input sample rate
let mut model = WakeWordModel::new(&["path/to/hey_livekit.onnx"], 48000)?;

// Feed i16 PCM audio — resampling to 16 kHz is handled internally
let predictions = model.predict(&audio)?;

for (name, score) in &predictions {
    println!("{name}: {score:.4}");
}

You can load additional classifiers at runtime:

model.load_model("path/to/another.onnx", "custom_wakeword")?;

Audio requirements

Parameter Value
Sample rate 16,000 / 22,050 / 32,000 / 44,100 / 48,000 / 88,200 / 96,000 / 176,400 / 192,000 / 384,000 Hz
Format i16 PCM
Minimum duration ~2 seconds at the input sample rate

Pass the input sample rate to WakeWordModel::new(). Non-16 kHz audio is resampled internally. Audio shorter than the minimum duration will return a score of 0.0 for all classifiers.

Pre-trained models

The onnx/ directory contains pre-trained models:

File Purpose
melspectrogram.onnx Mel spectrogram extraction (embedded at compile time)
embedding_model.onnx Speech embedding generation (embedded at compile time)
hey_livekit.onnx "Hey LiveKit" wake word classifier (loaded at runtime)

Training custom wake words

To train your own wake word classifiers, see the livekit-wakeword Python toolkit. The exported .onnx classifier models can be loaded directly by this crate.