pyannote-rs 0.1.3

Speaker diarization using pyannote in Rust

pyannote-rs

Install

cargo add pyannote-rs

Usage

See Building

Examples

See examples

How it works

pyannote-rs uses two models for speaker diarization. The first, segmentation-3.0, performs segmentation (detecting when speech occurs).
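Segmentation output usually ends up as a list of time spans containing speech. The sketch below shows one way to turn per-frame speech decisions into such spans. It is illustrative only. It assumes you already have a boolean per frame and a fixed frame duration. `Segment`, `frames_to_segments`, and `frame_sec` are hypothetical names, not the pyannote-rs API, and the sketch does not decode the raw segmentation-3.0 output.

```rust
/// A time span (in seconds) during which speech was detected.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Segment {
    start: f64,
    end: f64,
}

/// Merge runs of consecutive speech frames into segments.
/// `is_speech[i]` says whether frame i contains speech; `frame_sec` is the
/// (hypothetical) duration covered by each frame.
fn frames_to_segments(is_speech: &[bool], frame_sec: f64) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &speech) in is_speech.iter().enumerate() {
        match (speech, start) {
            // Speech begins: remember the first frame of the run.
            (true, None) => start = Some(i),
            // Speech ends: emit the run [s, i) as a segment.
            (false, Some(s)) => {
                segments.push(Segment { start: s as f64 * frame_sec, end: i as f64 * frame_sec });
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that lasts until the final frame.
    if let Some(s) = start {
        segments.push(Segment {
            start: s as f64 * frame_sec,
            end: is_speech.len() as f64 * frame_sec,
        });
    }
    segments
}

fn main() {
    let frames = [false, true, true, false, true];
    for seg in frames_to_segments(&frames, 0.5) {
        println!("speech from {:.1}s to {:.1}s", seg.start, seg.end);
    }
}
```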

The second, wespeaker-voxceleb-resnet34-LM, is used to identify who is speaking.

All inference runs in onnxruntime.

The segmentation model accepts at most 10 s of audio per input, so we feed it the audio through a 10 s sliding window, advancing 10 s at a time.
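Here is a minimal sketch of the windowing step. It assumes 16 kHz mono `f32` samples, and the sample rate is an assumption on my part. It splits the signal into consecutive, non-overlapping 10 s chunks and records each chunk's start offset so results can be mapped back to the full recording. `ten_second_windows` is a hypothetical helper, not part of pyannote-rs.

```rust
/// Sample rate assumed for the input audio (16 kHz mono is an assumption).
const SAMPLE_RATE: usize = 16_000;
/// Maximum input length accepted by the segmentation model, in seconds.
const WINDOW_SEC: usize = 10;

/// Split `samples` into consecutive windows of at most 10 s.
/// Returns (offset_in_seconds, window) pairs; the last window may be shorter.
fn ten_second_windows(samples: &[f32]) -> Vec<(f64, &[f32])> {
    let window_len = SAMPLE_RATE * WINDOW_SEC;
    samples
        .chunks(window_len)
        .enumerate()
        .map(|(i, chunk)| ((i * WINDOW_SEC) as f64, chunk))
        .collect()
}

fn main() {
    // 25 s of silence -> windows starting at 0 s, 10 s and 20 s.
    let audio = vec![0.0f32; 25 * SAMPLE_RATE];
    for (offset, window) in ten_second_windows(&audio) {
        println!("window at {offset}s with {} samples", window.len());
    }
}
```

Because the step equals the window length, no audio is processed twice. A smaller step would produce overlapping windows, at the cost of extra inference calls. If the model needs a fixed input length, one option is to zero-pad the shorter final window, though that is also an assumption.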

The embedding model expects filter-bank features extracted from the audio as input, so we use knf-rs to compute them.
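Filter-bank extraction begins by slicing the signal into short, overlapping frames. Each frame is then windowed, transformed with an FFT, and passed through mel filters. The sketch below covers only the framing step. It assumes Kaldi's default settings: 25 ms frames, a 10 ms shift, and `snip_edges = true`, which keeps only frames that fit entirely. It does not reproduce knf-rs, and `FrameOpts` and `frames` are hypothetical names. Check knf-rs or kaldi-native-fbank for the options the embedding model actually uses.

```rust
/// Kaldi-style framing parameters (Kaldi defaults: 25 ms frames, 10 ms shift).
struct FrameOpts {
    sample_rate: usize,
    frame_length_ms: usize,
    frame_shift_ms: usize,
}

impl FrameOpts {
    fn frame_length(&self) -> usize {
        self.sample_rate * self.frame_length_ms / 1000
    }

    fn frame_shift(&self) -> usize {
        self.sample_rate * self.frame_shift_ms / 1000
    }

    /// Frame count with snip_edges = true: only frames that fit entirely.
    fn num_frames(&self, num_samples: usize) -> usize {
        let len = self.frame_length();
        if num_samples < len {
            0
        } else {
            1 + (num_samples - len) / self.frame_shift()
        }
    }
}

/// Slice the signal into overlapping frames. Each frame would then be
/// windowed, FFT'd and mapped through mel filters to yield one fbank vector.
fn frames<'a>(samples: &'a [f32], opts: &FrameOpts) -> Vec<&'a [f32]> {
    let (len, shift) = (opts.frame_length(), opts.frame_shift());
    (0..opts.num_frames(samples.len()))
        .map(|i| &samples[i * shift..i * shift + len])
        .collect()
}

fn main() {
    let opts = FrameOpts { sample_rate: 16_000, frame_length_ms: 25, frame_shift_ms: 10 };
    let audio = vec![0.0f32; 16_000]; // 1 s at 16 kHz
    println!("{} frames of {} samples", frames(&audio, &opts).len(), opts.frame_length());
}
```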

To compare speakers (e.g., did Alice speak again?), we use cosine similarity.
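The comparison can be sketched as follows. Cosine similarity is `dot(a, b) / (|a| * |b|)`. A new embedding is assigned to the most similar known speaker when the similarity reaches a threshold, and is otherwise registered as a new speaker. `assign_speaker` and the 0.5 threshold in the example are hypothetical choices for illustration, not pyannote-rs defaults.

```rust
/// Cosine similarity between two embeddings: dot(a, b) / (|a| * |b|).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical speaker bookkeeping: return the index of the most similar
/// known speaker if the similarity reaches `threshold`; otherwise store the
/// embedding as a new speaker and return its index.
fn assign_speaker(known: &mut Vec<Vec<f32>>, embedding: &[f32], threshold: f32) -> usize {
    let best = known
        .iter()
        .enumerate()
        .map(|(i, k)| (i, cosine_similarity(k, embedding)))
        .max_by(|a, b| a.1.total_cmp(&b.1));
    match best {
        Some((i, sim)) if sim >= threshold => i,
        _ => {
            known.push(embedding.to_vec());
            known.len() - 1
        }
    }
}

fn main() {
    let mut speakers: Vec<Vec<f32>> = Vec::new();
    for emb in [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]] {
        println!("speaker {}", assign_speaker(&mut speakers, &emb, 0.5));
    }
}
```

This sketch keeps the first embedding seen for each speaker as its reference. Averaging embeddings per speaker would be a common alternative.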

Credits

Big thanks to pyannote-onnx and kaldi-native-fbank